
<p>The 15 papers in this episode are:</p><p>[00:22] 📊 V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models</p><p>[01:06] 🧠 Step-Audio-R1 Technical Report</p><p>[01:48] 🧭 Scaling Spatial Intelligence with Multimodal Foundation Models</p><p>[02:18] 🎬 First Frame Is the Place to Go for Video Content Customization</p><p>[02:49] 🎬 Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO</p><p>[03:29] 🔮 SAM 3D: 3Dfy Anything in Images</p><p>[04:03] 🚀 MiMo-Embodied: X-Embodied Foundation Model Technical Report</p><p>[04:38] 🧠 Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation</p><p>[05:10] 🏆 TurkColBERT: A Benchmark of Dense and Late-Interaction Models for Turkish Information Retrieval</p><p>[05:53] 🌀 Nemotro...