
<p>The 15 papers in this episode are as follows:</p><p>[00:26] 🚀 Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer</p><p>[01:00] 🤔 REASONEDIT: Towards Reasoning-Enhanced Image Editing Models</p><p>[01:25] 🎬 AnyTalker: Scaling Multi-Person Talking Video Generation with Interactivity Refinement</p><p>[01:59] 🌉 Vision Bridge Transformer at Scale</p><p>[02:35] 🔍 Architecture Decoupling Is Not All You Need For Unified Multimodal Model</p><p>[03:23] ⚡ DiP: Taming Diffusion Models in Pixel Space</p><p>[03:49] 🧠 Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models</p><p>[04:19] 🤖 DualVLA: Building a Generalizable Embodied Agent via Partial Decoupling of Reasoning and Action</p><p>[05:02] ⚡ Adversarial Flow Models</p><p>[05:29] 🔬 D...