s1: Simple test-time scaling
ArXiv ID: 2501.19393
Authors: Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Candès, Tatsunori Hashimoto
Institutions: Stanford University, University of Washington, Hugging Face
Published: 2025-01-31
Model: s1-32B (based on Qwen2.5-32B-Instruct)
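Abstract
Reasoning models such as OpenAI o1 have shown the great potential of test-time compute scaling, but their training method (large-scale reinforcement learning) is costly and opaque. This paper shows that supervised fine-tuning on only 1,000 curated questions, ...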