Collected from *Navigate through Enigmatic Labyrinth: A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future*.

GSM8K, SVAMP, ASDiv, and AQuA are mathematical reasoning benchmarks. CSQA and StrategyQA are commonsense reasoning benchmarks. Last Letter Concatenation and Coin Flip are symbolic reasoning benchmarks. Scores are accuracy (%), and "-" marks a result that is not reported.

| Method | Year | Setting | Model | GSM8K | SVAMP | ASDiv | AQuA | CSQA | StrategyQA | Last Letter Concat. | Coin Flip | Paper |
|---|---|---|---|---:|---:|---:|---:|---:|---:|---:|---:|---|
| I-O Prompting | 2020 | few-shot | text-davinci-002 | 19.70 | 69.90 | 74.00 | 29.50 | 79.50 | 65.90 | 5.80 | 49.00 | Language Models are Few-Shot Learners |
| Few-shot CoT | 2022 | few-shot | text-davinci-002 | 63.10 | 76.40 | 80.40 | 45.30 | 73.50 | 65.40 | 77.50 | 99.60 | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models |
| PoT | 2022 | few-shot | text-davinci-002 | 80.00 | 89.10 | - | 58.60 | - | - | - | - | Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks |
| Complex CoT | 2023 | few-shot | text-davinci-002 | 72.60 | - | - | - | - | 77.00 | - | - | Complexity-Based Prompting for Multi-step Reasoning |
| Automate CoT | 2023 | few-shot | text-davinci-002 | 49.70 | 73.30 | 74.20 | 37.90 | 76.10 | 67.90 | 58.90 | - | Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data |
| Few-shot CoT | 2022 | few-shot | text-davinci-003 | 66.83 | 69.06 | - | 29.13 | - | - | - | - | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models |
| PHP | 2023 | few-shot | text-davinci-003 | 79.00 | 84.70 | - | 58.60 | - | - | - | - | Progressive-Hint Prompting Improves Reasoning in Large Language Models |
| Self-Consistency | 2023 | few-shot | text-davinci-003 | 67.93 | 83.11 | - | 55.12 | - | - | - | - | Self-Consistency Improves Chain of Thought Reasoning in Language Models |
| Active Prompting | 2023 | few-shot | text-davinci-003 | 65.60 | 80.50 | 79.80 | 48.00 | 78.90 | 74.20 | 71.20 | - | Active Prompting with Chain-of-Thought for Large Language Models |
| Synthetic Prompting | 2023 | few-shot | text-davinci-003 | 73.90 | 81.80 | 80.70 | - | - | - | - | - | Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models |
| FOBAR | 2023 | few-shot | text-davinci-003 | 79.50 | 86.00 | - | 58.66 | - | - | - | - | Forward-Backward Reasoning in Large Language Models for Verification |
| Boosted Prompting | 2023 | few-shot | text-davinci-003 | 71.60 | - | - | 55.10 | - | - | - | - | Boosted Prompt Ensembles for Large Language Models |
| Few-shot CoT | 2022 | few-shot | code-davinci-002 | 60.10 | 75.80 | 80.10 | 39.80 | 79.00 | 73.40 | 70.40 | 99.00 | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models |
| Self-Consistency | 2023 | few-shot | code-davinci-002 | 78.00 | 86.80 | 87.80 | 52.00 | 81.50 | 79.80 | 73.40 | 99.50 | Self-Consistency Improves Chain of Thought Reasoning in Language Models |
| PAL | 2023 | few-shot | code-davinci-002 | 72.00 | 79.40 | 79.60 | - | - | - | - | - | PAL: Program-aided Language Models |
| Resprompt | 2023 | few-shot | code-davinci-002 | 66.60 | - | - | 45.30 | - | - | - | - | Resprompt: Residual Connection Prompting Advances Multi-Step Reasoning in Large Language Models |
| DIVERSE | 2023 | few-shot | code-davinci-002 | 82.30 | 87.00 | 88.70 | - | 79.90 | 78.60 | - | - | Making Language Models Better Reasoners with Step-Aware Verifier |
| Least-to-Most | 2023 | few-shot | code-davinci-002 | 68.01 | - | - | - | - | - | 94.00 | - | Least-to-Most Prompting Enables Complex Reasoning in Large Language Models |
| Boosted Prompting | 2023 | few-shot | code-davinci-002 | 83.30 | 88.60 | - | 61.70 | - | - | - | - | Boosted Prompt Ensembles for Large Language Models |
| Few-shot CoT | 2022 | few-shot | gpt-3.5-turbo | 76.50 | 81.90 | - | 54.30 | 78.00 | 63.70 | 73.20 | 99.00 | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models |
| Self-Consistency | 2023 | few-shot | gpt-3.5-turbo | 81.90 | 86.40 | - | 62.60 | - | - | - | - | Self-Consistency Improves Chain of Thought Reasoning in Language Models |
| MetaCoT | 2023 | few-shot | gpt-3.5-turbo | 75.10 | 88.60 | - | 54.70 | 72.40 | 64.50 | 77.20 | 100.00 | Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models |
| Verify CoT | 2023 | few-shot | gpt-3.5-turbo | 86.00 | - | - | 69.50 | - | - | 92.60 | - | Deductive Verification of Chain-of-Thought Reasoning |
| Active Prompting | 2023 | few-shot | gpt-3.5-turbo | 81.80 | 82.50 | 87.90 | 55.30 | - | - | - | - | Active Prompting with Chain-of-Thought for Large Language Models |
| RCoT | 2023 | few-shot | gpt-3.5-turbo | 84.60 | 84.90 | 89.30 | 57.10 | - | - | - | - | RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought |
| FOBAR | 2023 | few-shot | gpt-3.5-turbo | 87.40 | 87.40 | - | 57.50 | - | - | - | - | Forward-Backward Reasoning in Large Language Models for Verification |
| Memory-of-Thought | 2023 | few-shot | gpt-3.5-turbo | - | - | - | 54.10 | - | - | - | - | MoT: Pre-thinking and Recalling Enable ChatGPT to Self-Improve with Memory-of-Thoughts |
| Adaptive-Consistency | 2023 | few-shot | gpt-3.5-turbo | 82.70 | 85.00 | 83.00 | - | - | 67.90 | - | - | Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning with LLMs |
| Boosted Prompting | 2023 | few-shot | gpt-3.5-turbo | 87.10 | - | - | 72.80 | - | - | - | - | Boosted Prompt Ensembles for Large Language Models |
| Zero-shot CoT | 2022 | zero-shot | text-davinci-002 | 40.50 | 63.70 | - | 31.90 | 64.00 | 52.30 | 57.60 | 87.80 | Large Language Models are Zero-Shot Reasoners |
| PoT | 2022 | zero-shot | text-davinci-002 | 57.00 | 70.80 | - | 43.90 | - | - | - | - | Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks |
| AutoCoT | 2023 | zero-shot | text-davinci-002 | 47.90 | 69.50 | - | 36.50 | 74.40 | 65.40 | 59.70 | 99.90 | Automatic Chain of Thought Prompting in Large Language Models |
| COSP | 2023 | zero-shot | code-davinci-001 | 8.70 | - | - | - | 55.40 | 52.80 | - | - | Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning with LLMs |
| Plan-and-Solve | 2023 | zero-shot | text-davinci-003 | 58.20 | 72.00 | - | 42.50 | 65.20 | 63.80 | 64.80 | 96.80 | Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models |
| Agent-Instruct | 2023 | zero-shot | gpt-3.5-turbo | 73.40 | 80.80 | - | 57.90 | 74.10 | 69.00 | 99.80 | 95.20 | Agent Instructs Large Language Models to be General Zero-Shot Reasoners |
| Self-Refine | 2023 | zero-shot | gpt-3.5-turbo | 64.10 | - | - | - | - | - | - | - | Self-Refine: Iterative Refinement with Self-Feedback |
| RCoT | 2023 | zero-shot | gpt-3.5-turbo | 82.00 | 79.60 | 86.00 | 55.50 | - | - | - | - | RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought |
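The table can also be read as data. The Python snippet below is a minimal sketch of that. It uses the few-shot GSM8K scores for two models, transcribed by hand from the rows, and computes each method's gain over the Few-shot CoT baseline on the same model, in percentage points. The dictionary `GSM8K_FEWSHOT` and the helper `gains_over_baseline` are hypothetical names introduced for illustration. Because the rows come from different papers, these differences are indicative rather than controlled comparisons.

```python
from typing import Dict, Optional

# Few-shot GSM8K accuracies (%) transcribed from the table; None marks a
# "-" (not reported) cell. Hypothetical structure, for illustration only.
GSM8K_FEWSHOT: Dict[str, Dict[str, Optional[float]]] = {
    "code-davinci-002": {
        "Few-shot CoT": 60.10,
        "Self-Consistency": 78.00,
        "PAL": 72.00,
        "Resprompt": 66.60,
        "DIVERSE": 82.30,
        "Least-to-Most": 68.01,
        "Boosted Prompting": 83.30,
    },
    "gpt-3.5-turbo": {
        "Few-shot CoT": 76.50,
        "Self-Consistency": 81.90,
        "MetaCoT": 75.10,
        "Verify CoT": 86.00,
        "Active Prompting": 81.80,
        "RCoT": 84.60,
        "FOBAR": 87.40,
        "Memory-of-Thought": None,
        "Adaptive-Consistency": 82.70,
        "Boosted Prompting": 87.10,
    },
}


def gains_over_baseline(
    scores: Dict[str, Optional[float]], baseline: str = "Few-shot CoT"
) -> Dict[str, float]:
    """Return each method's accuracy minus the baseline's, in percentage
    points, skipping the baseline itself and unreported (None) scores."""
    base = scores[baseline]
    if base is None:
        raise ValueError(f"baseline {baseline!r} has no reported score")
    return {
        method: round(acc - base, 2)
        for method, acc in scores.items()
        if method != baseline and acc is not None
    }


if __name__ == "__main__":
    for model, scores in GSM8K_FEWSHOT.items():
        print(model)
        # Largest gain first; negative values mean below the baseline.
        ranked = sorted(gains_over_baseline(scores).items(), key=lambda kv: -kv[1])
        for method, gain in ranked:
            print(f"  {method:<22} {gain:+.2f}")
```

Cells reported as "-" are stored as `None` and skipped, so Memory-of-Thought does not appear in the gpt-3.5-turbo ranking.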
