2025.11.11 | 小窗口勤总结刷新深度研究;先广撒网再啃难题激活代码竞赛

2025.11.11 | 小窗口勤总结刷新深度研究;先广撒网再啃难题激活代码竞赛

Published on Nov 11
9分钟
HuggingFace 每日AI论文速递
0:00
0:00
<p>本期的 13 篇论文如下:</p><p>[00:25] 🧩 IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction(IterResearch:基于马尔可夫状态重构的长程智能体再思考)</p><p>[01:16] 🏆 DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation(DRIVE:面向可验证奖励强化学习的竞赛级代码生成数据精选最佳实践)</p><p>[02:03] 🔬 The Station: An Open-World Environment for AI-Driven Discovery(“站”:面向AI驱动科学发现的开放世界环境)</p><p>[02:43] 🚀 RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services(RedOne 2.0:社交网络场景下领域大模型后训练新范式)</p><p>[03:15] 🧠 SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization(SofT-GRPO:用Gumbel重参数化软思考策略优化让离散Token强化学习望尘莫及)</p><p>[03:53] 🧭 Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs(路由流形对齐提升混合专家大语言模型的泛化能力)</p><p>[04:30] 🔍 Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads(以置信度推理:通过不确定性头高效验证大模型推理步骤)...