Chain of Preference Optimization: Distilling Tree-of-Thought Reasoning Ability with Preference Learning

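Paper Information
Title: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs
Authors: Xuan Zhang, Chao Du, Tianyu Pang, Qian Liu, Wei Gao, Min Lin
Affiliations: Sea AI Lab (SAIL), Singapore Management University
Published: NeurIPS 2024
Links: arXiv | GitHub | PDF

Core Contribution
CPO uses preference optimization to distill the search capability of Tree-of-Thought (ToT) into Chain-of-Thought (CoT) reasoning. The resulting model matches or even surpasses ToT performance without incurring tree-search overhead at inference time. The key innovation is to exploit the preference information implicit in the tree-search process and train the model to align with high-quality reasoning paths.

Research Motivation
Limitations of CoT
Chain...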

