2025.11.21 | V-ReasonBench tests video models' reasoning; Step-Audio-R1 makes speech models stronger the more they "think" - HuggingFace Daily AI Paper Digest

The 15 papers in this episode are:
[00:22] 📊 V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models
[01:06] 🧠 Step-Audio-R1 Technical Report
[01:48] 🧭 Scaling Spatial Intelligence with Multimodal Foundation Models
[02:18] 🎬 First Frame Is the Place to Go for Video Content Customization
[02:49] 🎬 Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO
[03:29] 🔮 SAM 3D: 3Dfy Anything in Images
[04:03] 🚀 MiMo-Embodied: X-Embodied Foundation Model Technical Report
[04:38] 🧠 Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation
[05:10] 🏆 TurkColBERT: A Benchmark of Dense and Late-Interaction Models for Turkish Information Retrieval
[05:53] 🌀 Nemotro...