Strategic Chain-of-Thought: A Strategy-First Reasoning Paradigm

Posted on September 5, 2024

Paper Information

Title: Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation
Authors: Yu Wang, Shiwan Zhao, Zhihu Wang, Heyuan Huang, Ming Fan
Affiliations: Harbin Institute of Technology, Tencent AI Lab
Published: arXiv preprint
Links: arXiv | PDF

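Core Contribution

Strategic Chain-of-Thought (SCoT) has the model work out a problem-solving strategy before it generates any reasoning steps. With a two-stage prompt design, it improves accuracy on GSM8K by 21.05 points. The key idea is to separate strategy planning from execution, which addresses the instability of conventional CoT reasoning.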

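Problem and Solution

The Instability of Conventional CoT

Core problem:
The effectiveness of Chain-of-Thought prompting is highly unstable:

On the same problems, different random seeds shift performance by 10-15%
The quality of reasoning paths varies widely
Mistakes made in early steps are easy to make and cannot be recovered in later steps

Root cause:
The model starts reasoning right away, with no overall plan. It is like solving a math problem without reading it carefully first: the wrong method is easy to pick.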

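The Core Idea of SCoT

Two-stage design:

Stage 1: Strategy formulation
"What strategy should I use to solve this problem?"
Model output: a solution strategy (e.g. "find the key information first, then set up equations")

Stage 2: Strategy execution
"Following the strategy [the strategy above], let's solve step by step:"
Model output: reasoning steps that follow the strategy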
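
The two stages can be sketched as two successive model calls. This is a minimal sketch under stated assumptions, not the paper's implementation: `generate` is a hypothetical callable that takes a prompt string and returns the model's text, `two_stage_scot` and `fake_generate` are illustrative names, and the prompt wording simply reuses the paraphrase above.

```python
from typing import Callable


def two_stage_scot(question: str, generate: Callable[[str], str]) -> dict:
    """Run strategy elicitation first, then strategy-guided reasoning."""
    # Stage 1: ask only for a strategy, not for a solution.
    strategy_prompt = (
        f"Question: {question}\n"
        "What strategy should I use to solve this problem?"
    )
    strategy = generate(strategy_prompt).strip()

    # Stage 2: feed the elicited strategy back in and ask for the steps.
    solve_prompt = (
        f"Question: {question}\n"
        f"Following the strategy [{strategy}], let's solve step by step:"
    )
    reasoning = generate(solve_prompt).strip()
    return {"strategy": strategy, "reasoning": reasoning}


# Stand-in for a real model so the sketch runs end to end (an assumption).
def fake_generate(prompt: str) -> str:
    if "What strategy" in prompt:
        return "Find the key quantities, then set up an equation."
    return "Step 1: add the two numbers. Therefore, the answer is: 4"


result = two_stage_scot("What is 2 + 2?", fake_generate)
print(result["strategy"])
print(result["reasoning"])
```

Two separate calls make the split easy to see; a single prompt that asks for the strategy and then the steps is an equally valid way to realize the same idea.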

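Why it works:

The strategy acts as a scaffold for the reasoning
The model is forced to think about method first and execute second
The strategy can serve as a checkpoint during reasoning (has the reasoning drifted away from the strategy?)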

Method in Detail

Method Design

Prompt Templates

SCoT Zero-Shot:

Let's first devise a strategy to solve this problem.
Strategy: [model generates a strategy]

Now, let's follow this strategy step by step:
Step 1: [model reasons according to the strategy]
...
Therefore, the answer is: [answer]

SCoT Few-Shot:
Adds an example-matching mechanism:

Here are some examples with their strategies:
[Example 1: question → strategy → reasoning]
[Example 2: question → strategy → reasoning]

Now for the new problem:
Strategy: [generated]
Reasoning: [generated]
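
One way to assemble such a few-shot prompt programmatically is sketched below. The `Demo` class and `build_few_shot_prompt` function are hypothetical names introduced for illustration, and the exact layout of each example is an assumption; only the opening and closing lines come from the template above.

```python
from dataclasses import dataclass


@dataclass
class Demo:
    """One worked example: question, the strategy used, and the reasoning."""
    question: str
    strategy: str
    reasoning: str


def build_few_shot_prompt(demos: list, new_question: str) -> str:
    """Concatenate strategy-annotated examples, then open a slot for the new problem."""
    parts = ["Here are some examples with their strategies:"]
    for i, demo in enumerate(demos, start=1):
        parts.append(
            f"Example {i}\n"
            f"Question: {demo.question}\n"
            f"Strategy: {demo.strategy}\n"
            f"Reasoning: {demo.reasoning}"
        )
    # End on "Strategy:" so the model writes its strategy before any reasoning.
    parts.append(f"Now for the new problem:\nQuestion: {new_question}\nStrategy:")
    return "\n\n".join(parts)


demos = [Demo("What is 1 + 1?", "Add the numbers directly.", "1 + 1 = 2")]
prompt = build_few_shot_prompt(demos, "What is 2 + 3?")
print(prompt)
```

Ending the prompt on `Strategy:` keeps the strategy-before-reasoning order that the examples demonstrate.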

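Automatic Example Matching

The authors propose selecting examples automatically according to problem type:

Classify the problem (arithmetic / logic / commonsense)
Retrieve examples of the same type from an example pool
Extract the successful strategy patterns from those examples

Benefit: no hand-crafted examples are needed, so few-shot optimization is automated.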
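
The three steps above can be sketched as follows. The post does not say how classification or retrieval is done, so everything here is an assumption made for illustration: the keyword-based `classify` function stands in for whatever classifier is actually used, and `EXAMPLE_POOL` is a tiny made-up pool.

```python
# Hypothetical keyword hints; a real system would use a stronger classifier.
ARITHMETIC_HINTS = ("how many", "total", "sum", "price", "cost")
LOGIC_HINTS = ("if ", "then", "all ", "none", "either")

# Hypothetical example pool: task type -> list of (question, strategy) pairs.
EXAMPLE_POOL = {
    "arithmetic": [("Tom has 3 pens and buys 4 more. How many pens?",
                    "Identify the quantities, then add them.")],
    "logic": [("If all A are B and x is A, is x B?",
               "List the premises, then apply them in order.")],
    "commonsense": [("Where would you store milk?",
                     "Recall typical everyday practice.")],
}


def classify(question: str) -> str:
    """Step 1: assign a coarse problem type."""
    q = question.lower()
    if any(ch.isdigit() for ch in q) or any(h in q for h in ARITHMETIC_HINTS):
        return "arithmetic"
    if any(h in q for h in LOGIC_HINTS):
        return "logic"
    return "commonsense"


def retrieve_strategies(question: str, k: int = 2) -> list:
    """Steps 2 and 3: fetch same-type examples and return their strategies."""
    examples = EXAMPLE_POOL[classify(question)][:k]
    return [strategy for _, strategy in examples]


print(classify("A shirt costs 12 dollars. What is the cost of 3 shirts?"))
print(retrieve_strategies("If it rains then the ground is wet. Is the ground wet?"))
```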

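Experimental Results

Main Performance Gains

Arithmetic reasoning (GSM8K):

Llama3-8b CoT baseline: 72.1%
SCoT (Zero-Shot): 82.5% (+10.4 points)
SCoT (Few-Shot): 93.15% (+21.05 points) ✓

Object tracking (Tracking_Objects):

CoT baseline: 61.2%
SCoT: 85.33% (+24.13 points)

Logical reasoning (Date Understanding):

CoT baseline: 68.4%
SCoT: 79.2% (+10.8 points)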

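Generalization Across Models

Model	CoT baseline	SCoT
GPT-3.5-turbo	75.3%	84.1%
Llama3-8b	72.1%	93.2%
Llama2-70b	83.2%	89.7%

Trend: Llama3-8b benefits the most (+21.1 points), GPT-3.5-turbo gains 8.8 points, and the much larger Llama2-70b gains the least (+6.5 points).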
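
The gains in the table can be recomputed directly, which also makes the difference between absolute (percentage-point) and relative improvement explicit. The accuracy figures are copied from the table; the function and variable names are illustrative.

```python
# (CoT baseline, SCoT) accuracy in percent, copied from the table above.
RESULTS = {
    "GPT-3.5-turbo": (75.3, 84.1),
    "Llama3-8b": (72.1, 93.2),
    "Llama2-70b": (83.2, 89.7),
}


def gains(cot: float, scot: float) -> tuple:
    """Return (absolute gain in points, relative gain in percent), rounded to 0.1."""
    absolute = round(scot - cot, 1)
    relative = round(100 * (scot - cot) / cot, 1)
    return absolute, relative


for model, (cot, scot) in RESULTS.items():
    absolute, relative = gains(cot, scot)
    print(f"{model}: +{absolute} points, +{relative}% relative")
```

For Llama3-8b this gives +21.1 points, or roughly 29% relative to the baseline, so the two ways of quoting the gain should not be mixed.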

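Ablation Studies

Is the strategy necessary?

No strategy (standard CoT): 72.1%
Strategy only, no reasoning: 45.3% (a strategy alone is not enough)
Strategy + reasoning (SCoT): 93.2% ✓

Effect of strategy quality:

Random strategy: 76.5% (+4.4 points, still an improvement)
Relevant strategy: 88.3%
Optimal strategy: 93.2%

Insight: even a random strategy improves performance, which suggests that simply having a strategy is itself a useful structure for reasoning.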

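In-Depth Analysis

Why Do Strategies Work?

Hypothesis 1: A smaller search space

A strategy restricts the possible reasoning paths. As an illustration of the order of magnitude:

Without a strategy: 10^6 possible reasoning sequences
With a strategy: 10^4 sequences that conform to the strategy

It is like looking at the map before walking into a maze.

Hypothesis 2: Activating relevant knowledge

Generating the strategy activates task-relevant knowledge from pretraining: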

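# Sketch: the implicit role of the strategy (`generate` is any prompt -> text callable)
def solve_with_strategy(generate, problem):
    strategy = generate(f"What strategy should be used to solve: {problem}")
    # -> activates knowledge such as "similar problems are usually solved with method X"

    reasoning = generate(f"Strategy: {strategy}\nSolve step by step: {problem}")
    # -> the reasoning now starts from the relevant region of knowledge
    return reasoning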

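Hypothesis 3: Consistency checking

The strategy provides a reference for checking whether the reasoning has drifted:

After each reasoning step, the model implicitly checks "does this step follow the strategy?"
This improves the self-consistency of the reasoning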

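Comparison with Related Work

vs Plan-and-Solve:

Plan-and-Solve: uses a fixed planning template
SCoT: lets the model generate its own strategy (more flexible)

vs Least-to-Most Prompting:

Least-to-Most: decomposes the problem into subproblems
SCoT: formulates a solution strategy (a more abstract form of planning)

vs Tree-of-Thoughts:

ToT: searches over multiple reasoning paths
SCoT: guides a single path with a strategy (more efficient)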

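Failure Cases

Case: a problem that needs more than one strategy

Problem: "Alice has 5 apples, gives 2 to Bob who then gives half to Charlie. If Charlie buys 3 more, how many does each have?"

SCoT strategy: "Track each person's apples separately"
Reasoning: Alice and Bob are tracked correctly, but the count goes wrong at Charlie

Cause: a single strategy does not cover every aspect of the problem. A hierarchical strategy may be needed.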
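
For reference, the bookkeeping the strategy calls for can be written out explicitly. This sketch rests on an assumption the problem statement leaves open: Bob and Charlie start with no apples. The function name `track_apples` is illustrative.

```python
def track_apples() -> dict:
    """Apply each event in the problem to a per-person tally, in order."""
    # Assumption: Bob and Charlie start with 0 apples (the problem does not say).
    apples = {"Alice": 5, "Bob": 0, "Charlie": 0}

    # Alice gives 2 apples to Bob.
    apples["Alice"] -= 2
    apples["Bob"] += 2

    # Bob gives half of what he now holds to Charlie.
    half = apples["Bob"] // 2
    apples["Bob"] -= half
    apples["Charlie"] += half

    # Charlie buys 3 more.
    apples["Charlie"] += 3
    return apples


print(track_apples())
```

Under that assumption the tally ends at Alice 3, Bob 1, Charlie 4. Charlie's count depends on two earlier transfers plus a purchase, which is the step where the single "track separately" strategy is reported to slip.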

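Practical Value

When to Use It

✓ Recommended:

Mathematical reasoning, logic puzzles, code debugging
Tasks that need multi-step planning
Scenarios where CoT gives unstable results

✗ Not recommended:

Simple lookup or classification tasks
Problems that already have a well-defined algorithm
Tasks that call for creativity rather than logic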

Implementation Tips

# Basic SCoT implementation
def scot_prompt(question):
    """Build a single-prompt SCoT template: strategy first, then step-by-step solving."""
    return f"""Question: {question}

Let's first understand what strategy would work best for this problem.
Strategy: [Think about the approach]

Now following this strategy, let's solve step by step:
[Work through the solution]

Therefore, the final answer is: [Answer]
"""

# Refinement: a library of strategy templates
STRATEGY_TEMPLATES = {
    "arithmetic": "Break down into smaller calculations",
    "logic": "List knowns and unknowns, then reason",
    "word_problem": "Extract key information, then formulate"
}

def scot_with_template(question, task_type):
    """Inject a predefined strategy; unknown task types fall back to a generic one."""
    template = STRATEGY_TEMPLATES.get(task_type, "Think systematically")
    return f"""Question: {question}

Suggested strategy: {template}
Let's apply this strategy:
...
"""

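Combining with Self-Consistency

# SCoT + Self-Consistency (uses scot_prompt from the block above)
import re
from collections import Counter

def extract_answer(response):
    # Take the text after the last "final answer is:" marker, if there is one
    matches = re.findall(r"final answer is:\s*(.+)", response)
    return matches[-1].strip() if matches else None

def scot_self_consistency(question, generate, n=5):
    # `generate` is any prompt -> text callable that samples with temperature > 0,
    # so each call can produce a different strategy
    answers = []
    for _ in range(n):
        answer = extract_answer(generate(scot_prompt(question)))
        if answer is not None:
            answers.append(answer)
    # Majority vote over the extracted answers
    return Counter(answers).most_common(1)[0][0] if answers else None

# Experiment: SCoT + SC improves a further 3-5% over SCoT alone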

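Cost and Benefit

Aspect	CoT	SCoT	Change
Token consumption	100	150	+50%
Accuracy	72%	93%	+21 points (+29% relative)
Latency	1x	1.3x	+30%
ROI	-	-	Good value

Conclusion: cost rises by 50%, but accuracy rises by 21 points (about 29% relative), which is well worth it for critical tasks.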
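
A rough way to put cost and accuracy on one scale is tokens spent per correct answer. The sketch below uses only the figures in the table; treating "tokens per correct answer" as the relevant metric is an assumption of this example, not something the paper reports.

```python
def tokens_per_correct(tokens: float, accuracy: float) -> float:
    """Expected tokens spent for each correctly answered question."""
    return tokens / accuracy


cot_cost = tokens_per_correct(100, 0.72)
scot_cost = tokens_per_correct(150, 0.93)
overhead = scot_cost / cot_cost - 1      # extra tokens per correct answer
error_ratio = (1 - 0.72) / (1 - 0.93)    # how many times fewer errors SCoT makes

print(f"CoT:  {cot_cost:.1f} tokens per correct answer")
print(f"SCoT: {scot_cost:.1f} tokens per correct answer")
print(f"Overhead: {overhead:.1%}, errors reduced {error_ratio:.1f}x")
```

With these numbers SCoT spends about 16% more tokens per correct answer while cutting the error rate from 28% to 7%, a fourfold reduction. That fits the conclusion above: the trade pays off mainly where a wrong answer is expensive.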

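Limitations

Strategy quality depends on model capability: small models may generate ineffective strategies
Higher token consumption: the strategy itself takes 50-100 tokens
Not suited to creative tasks: too much structure may limit originality
No reference code: the paper is not open-sourced, so you have to implement it yourself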

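Summary

With the simple idea of putting strategy first, SCoT markedly improves the stability and accuracy of CoT reasoning:

Key strengths:

✓ Simple to implement (only the prompt changes)
✓ Large effect (+21 points on GSM8K)
✓ Works across models
✓ Compatible with other techniques (can be combined with Self-Consistency and Tree-of-Thoughts)

Best practices:

For complex reasoning tasks, try SCoT first
Hand-written strategy templates can speed things up
Combining it with Self-Consistency works even better

Takeaway:
When people solve problems, they first ask "which method should I use?" and only then execute. SCoT makes the LLM do the same, which is a natural and effective improvement.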