Collected from "Navigate through Enigmatic Labyrinth A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future".

Scores are accuracy (%); "-" marks a result that is not reported. GSM8K, SVAMP, ASDiv, and AQuA are mathematical reasoning benchmarks; CSQA and StrategyQA are commonsense reasoning benchmarks; LastLetterConcat and CoinFlip are symbolic reasoning benchmarks.

| Method | Year | Setting | Model | GSM8K | SVAMP | ASDiv | AQuA | CSQA | StrategyQA | LastLetterConcat | CoinFlip | Paper |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| I-O Prompting | 2020 | fewshot | text-davinci-002 | 19.70 | 69.90 | 74.00 | 29.50 | 79.50 | 65.90 | 5.80 | 49.00 | Language Models are Few-Shot Learners |
| Fewshot CoT | 2022 | fewshot | text-davinci-002 | 63.10 | 76.40 | 80.40 | 45.30 | 73.50 | 65.40 | 77.50 | 99.60 | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models |
| PoT | 2022 | fewshot | text-davinci-002 | 80.00 | 89.10 | - | 58.60 | - | - | - | - | Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks |
| Complex CoT | 2023 | fewshot | text-davinci-002 | 72.60 | - | - | - | - | 77.00 | - | - | Complexity-Based Prompting for Multi-step Reasoning |
| Automate CoT | 2023 | fewshot | text-davinci-002 | 49.70 | 73.30 | 74.20 | 37.90 | 76.10 | 67.90 | 58.90 | - | Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data |
| Fewshot CoT | 2022 | fewshot | text-davinci-003 | 66.83 | 69.06 | - | 29.13 | - | - | - | - | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models |
| PHP | 2023 | fewshot | text-davinci-003 | 79.00 | 84.70 | - | 58.60 | - | - | - | - | Progressive-Hint Prompting Improves Reasoning in Large Language Models |
| Self-Consistency | 2023 | fewshot | text-davinci-003 | 67.93 | 83.11 | - | 55.12 | - | - | - | - | Self-Consistency Improves Chain of Thought Reasoning in Language Models |
| Active Prompting | 2023 | fewshot | text-davinci-003 | 65.60 | 80.50 | 79.80 | 48.00 | 78.90 | 74.20 | 71.20 | - | Active Prompting with Chain-of-Thought for Large Language Models |
| Synthetic Prompt | 2023 | fewshot | text-davinci-003 | 73.90 | 81.80 | 80.70 | - | - | - | - | - | Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models |
| FOBAR | 2023 | fewshot | text-davinci-003 | 79.50 | 86.00 | - | 58.66 | - | - | - | - | Forward-Backward Reasoning in Large Language Models for Verification |
| Boosted Prompting | 2023 | fewshot | text-davinci-003 | 71.60 | - | - | 55.10 | - | - | - | - | Boosted Prompt Ensembles for Large Language Models |
| Fewshot CoT | 2022 | fewshot | code-davinci-002 | 60.10 | 75.80 | 80.10 | 39.80 | 79.00 | 73.40 | 70.40 | 99.00 | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models |
| Self-Consistency | 2023 | fewshot | code-davinci-002 | 78.00 | 86.80 | 87.80 | 52.00 | 81.50 | 79.80 | 73.40 | 99.50 | Self-Consistency Improves Chain of Thought Reasoning in Language Models |
| PAL | 2023 | fewshot | code-davinci-002 | 72.00 | 79.40 | 79.60 | - | - | - | - | - | PAL: Program-aided Language Models |
| Resprompt | 2023 | fewshot | code-davinci-002 | 66.60 | - | - | 45.30 | - | - | - | - | Resprompt: Residual Connection Prompting Advances Multi-Step Reasoning in Large Language Models |
| DIVERSE | 2023 | fewshot | code-davinci-002 | 82.30 | 87.00 | 88.70 | - | 79.90 | 78.60 | - | - | Making Language Models Better Reasoners with Step-Aware Verifier |
| Least-to-Most | 2023 | fewshot | code-davinci-002 | 68.01 | - | - | - | - | - | 94.00 | - | Least-to-Most Prompting Enables Complex Reasoning in Large Language Models |
| Boosted Prompting | 2023 | fewshot | code-davinci-002 | 83.30 | 88.60 | - | 61.70 | - | - | - | - | Boosted Prompt Ensembles for Large Language Models |
| Fewshot CoT | 2022 | fewshot | gpt-3.5-turbo | 76.50 | 81.90 | - | 54.30 | 78.00 | 63.70 | 73.20 | 99.00 | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models |
| Self-Consistency | 2023 | fewshot | gpt-3.5-turbo | 81.90 | 86.40 | - | 62.60 | - | - | - | - | Self-Consistency Improves Chain of Thought Reasoning in Language Models |
| MetaCoT | 2023 | fewshot | gpt-3.5-turbo | 75.10 | 88.60 | - | 54.70 | 72.40 | 64.50 | 77.20 | 100.00 | Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models |
| Verify CoT | 2023 | fewshot | gpt-3.5-turbo | 86.00 | - | - | 69.50 | - | - | 92.60 | - | Deductive Verification of Chain-of-Thought Reasoning |
| Active Prompting | 2023 | fewshot | gpt-3.5-turbo | 81.80 | 82.50 | 87.90 | 55.30 | - | - | - | - | Active Prompting with Chain-of-Thought for Large Language Models |
| RCoT | 2023 | fewshot | gpt-3.5-turbo | 84.60 | 84.90 | 89.30 | 57.10 | - | - | - | - | RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought |
| FOBAR | 2023 | fewshot | gpt-3.5-turbo | 87.40 | 87.40 | - | 57.50 | - | - | - | - | Forward-Backward Reasoning in Large Language Models for Verification |
| Memory-of-Thought | 2023 | fewshot | gpt-3.5-turbo | - | - | - | 54.10 | - | - | - | - | MoT: Pre-thinking and Recalling Enable ChatGPT to Self-Improve with Memory-of-Thoughts |
| Adaptive-Consistency | 2023 | fewshot | gpt-3.5-turbo | 82.70 | 85.00 | 83.00 | - | - | 67.90 | - | - | Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning with LLMs |
| Boosted Prompting | 2023 | fewshot | gpt-3.5-turbo | 87.10 | - | - | 72.80 | - | - | - | - | Boosted Prompt Ensembles for Large Language Models |
| Zeroshot CoT | 2022 | zeroshot | text-davinci-002 | 40.50 | 63.70 | - | 31.90 | 64.00 | 52.30 | 57.60 | 87.80 | Large Language Models are Zero-Shot Reasoners |
| PoT | 2022 | zeroshot | text-davinci-002 | 57.00 | 70.80 | - | 43.90 | - | - | - | - | Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks |
| AutoCoT | 2023 | zeroshot | text-davinci-002 | 47.90 | 69.50 | - | 36.50 | 74.40 | 65.40 | 59.70 | 99.90 | Automatic Chain of Thought Prompting in Large Language Models |
| COSP | 2023 | zeroshot | code-davinci-001 | 8.70 | - | - | | 55.40 | 52.80 | - | - | Better Zero-Shot Reasoning with Self-Adaptive Prompting |
| Plan-and-Solve | 2023 | zeroshot | text-davinci-003 | 58.20 | 72.00 | - | 42.50 | 65.20 | 63.80 | 64.80 | 96.80 | Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models |
| Agent-Instruct | 2023 | zeroshot | gpt-3.5-turbo | 73.40 | 80.80 | - | 57.90 | 74.10 | 69.00 | 99.80 | 95.20 | Agent Instructs Large Language Models to be General Zero-Shot Reasoners |
| Self-Refine | 2023 | zeroshot | gpt-3.5-turbo | 64.10 | - | - | - | - | - | - | - | Self-Refine: Iterative Refinement with Self-Feedback |
| RCoT | 2023 | zeroshot | gpt-3.5-turbo | 82.00 | 79.60 | 86.00 | 55.50 | - | - | - | - | RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought |