Evaluation_of_LLMs_on_NLG

Evaluation of LLMs on NLG benchmarks.

What is NLG?

Natural Language Generation (NLG) is the process of producing natural language text to meet specified communicative goals. Generated texts range from a single phrase answering a question, through multi-sentence remarks and questions within a dialog, to full-page explanations.

Tasks and Datasets

Text Summarization

Language	Dataset	Test Input	Test Output	Train Input	Train Output	Data
English	CNN/DailyMail	3628; 175; 1088.84	1101; 12; 84.47			Paper & Data
	XSum (Extreme summarization)	18962; 95; 661.01	107; 4; 31.63	23397; 95; 656.37	120; 2; 31.68	Paper & Data
Chinese	LCSTS	786; 191; 329.88	72; 6; 32.20	2736; 171; 329.37	107; 4; 32.23	Raw data; processed data (extraction code: duba)
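
The three numbers in each Input/Output cell look like maximum, minimum, and mean lengths, but the README does not state this or say how length is measured. The following is a minimal sketch under that assumption, using whitespace tokenization by default; the helper name `length_stats` and the `tokenize` parameter are hypothetical and not taken from the repository.

```python
from statistics import mean


def length_stats(texts, tokenize=str.split):
    """Return (max, min, mean) token counts over a non-empty list of texts.

    `tokenize` is an assumption: whitespace splitting for English,
    e.g. `list` for character-level counts on Chinese text.
    """
    lengths = [len(tokenize(t)) for t in texts]
    if not lengths:
        raise ValueError("texts must be non-empty")
    return max(lengths), min(lengths), round(mean(lengths), 2)


if __name__ == "__main__":
    samples = ["a short summary", "another slightly longer summary here", "one"]
    # Format the triple the same way the tables do: "max; min; mean"
    print("; ".join(str(v) for v in length_stats(samples)))
```

If the original statistics were computed with a model tokenizer rather than whitespace or characters, the numbers would differ, so treat this only as a way to reproduce the shape of the table, not its exact values.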

Open-domain Dialogue System

Language	Dataset	Test Input	Test Output	Train Input	Train Output	Data
English	DailyDialog	663, 129, 246.57	max:246, 3, 16.55	1099, 128, 247.60	313, 2, 16.19	Paper & Data
	PersonaChat	529, 273, 372.28	32, 5, 13.53			Paper & Data
	EmpatheticDialogues	380, 148, 201.64	110, 2, 17.85	400, 146, 193.13	135, 2, 16.80	Paper & Data
Chinese	Tsinghua LCCC	848, 214, 264.73	347, 3, 28.27	1666, 213, 278.59	314, 2, 26.98	Paper & Data

Controllable Story Generation

Language	Dataset	Test Input	Test Output	Train Input	Train Output	Paper
English	ROCStories	203; 115; 140.44	34; 4; 12.76	182; 115; 140.56	29; 4; 12.74	Paper & Data
	WritingPrompts	179; 95; 125.64	2912; 132; 793.14	208; 95; 126.23	8308; 115; 789.08	Paper & Data
Chinese	LOT	329; 215; 254.73	568; 112; 275.07	324; 207; 253.90	624; 113; 273.10	Paper & Data

Data-to-Text

Language	Dataset	Test Input	Test Output	Paper
English	WebNLG	324; 156; 201.68	158; 7; 38.55	Paper & Data
	Rotowire	813; 177; 434.31	998; 191; 460.73	Paper & Data
Chinese	Advertisement	453; 222; 288	364; 93; 193	Paper & Data

Style Transfer (Paraphrasing & Simplification & formal-informal)

Language	Dataset	Paper
English	Yelp	Paper & Data
Chinese (Formal-Informal)	CLSD	Paper & Data
English (Paraphrasing)	ParaNMT	Paper (ParaNMT-50M: Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations) & Data
English (Sentence Simplification)	WikiLarge	Paper (Sentence Simplification with Deep Reinforcement Learning) & Data

LLMs

ChatGPT
ChatGLM2-6B $\checkmark$
Flan-T5-XXL $\dots$
LLaMA2-7b-chat $\checkmark$
LLaMA2-13b-chat $\checkmark$
Llama2-Chinese-13b-Chat $\checkmark$
Vicuna-13B-v1.5-16k $\checkmark$
Chinese-Alpaca-2-13b $\checkmark$
Qwen-7b-chat $\checkmark$
Baichuan2-13b-chat $\checkmark$
Oasst-Pythia-12B

Experimental setup: chat models

Experimental goals:

Chinese vs. English models (llama2-7b-chat vs. baichuan2 vs. Qwen)
Model scale (llama2-7b-chat vs. llama2-13b-chat)
Different model architectures (chatglm2-6b vs. llama2-7b-chat, llama2-13b-chat vs. oasst-pythia-12b, flan-t5-xxl)
Different fine-tuning datasets (Qwen, Baichuan2, chinese-alpaca2, vicuna)
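
The four comparison axes listed above can be captured in a small configuration so that each model is run only once even when it appears in several comparisons. This is an illustrative sketch: the dictionary layout, the key names, and the helper `models_to_run` are assumptions, not part of the repository; only the model names come from the list above.

```python
# Comparison groups, using the model names as written in the README.
COMPARISONS = {
    "chinese_vs_english": ["llama2-7b-chat", "baichuan2", "Qwen"],
    "model_scale": ["llama2-7b-chat", "llama2-13b-chat"],
    "architecture": [
        "chatglm2-6b", "llama2-7b-chat",
        "llama2-13b-chat", "oasst-pythia-12b", "flan-t5-xxl",
    ],
    "finetuning_data": ["Qwen", "Baichuan2", "chinese-alpaca2", "vicuna"],
}


def models_to_run(comparisons):
    """Return each model once (case-insensitive), in first-seen order."""
    seen = {}
    for models in comparisons.values():
        for name in models:
            # Keep the first spelling encountered for each model.
            seen.setdefault(name.lower(), name)
    return list(seen.values())


if __name__ == "__main__":
    for name in models_to_run(COMPARISONS):
        print(name)
```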

Patrick-Ni/Evaluation_of_LLMs_on_NLG