Patrick-Ni/Evaluation_of_LLMs_on_NLG

Evaluation_of_LLMs_on_NLG

Evaluation of LLMs on NLG benchmarks.

What is NLG?

Natural Language Generation (NLG) is the process of producing a natural language text to meet specified communicative goals. The texts that are generated may range from a single phrase given in answer to a question, through multi-sentence remarks and questions within a dialog, to full-page explanations.

Tasks and Datasets

Text Summarization

| Language | Dataset | Test Input | Test Output | Train Input | Train Output | Data |
|---|---|---|---|---|---|---|
| English | CNN/DailyMail | 3628; 175; 1088.84 | 1101; 12; 84.47 | | | Paper & Data |
| | XSum (Extreme summarization) | 18962; 95; 661.01 | 107; 4; 31.63 | 23397; 95; 656.37 | 120; 2; 31.68 | Paper & Data |
| Chinese | LCSTS | 786; 191; 329.88 | 72; 6; 32.20 | 2736; 171; 329.37 | 107; 4; 32.23 | Raw data; processed data (extraction code: duba) |
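The README does not define the three numbers in each cell. They look like the maximum, minimum, and mean length of the inputs or outputs in a split, and the sketch below shows how such a triple could be computed under that reading. The whitespace tokenizer, the function names `length_stats` and `format_cell`, and the sample texts are all hypothetical; the tokenizer actually used for these statistics is not stated.

```python
from statistics import mean


def length_stats(texts, tokenize=str.split):
    """Return (max, min, mean) length over a non-empty list of texts.

    ASSUMPTION: length is counted in whitespace-separated tokens.
    Pass a different `tokenize` callable to count another unit.
    """
    lengths = [len(tokenize(t)) for t in texts]
    if not lengths:
        raise ValueError("texts must be non-empty")
    return max(lengths), min(lengths), round(mean(lengths), 2)


def format_cell(stats, sep="; "):
    """Render a (max, min, mean) triple in the style of the table cells."""
    longest, shortest, average = stats
    return f"{longest}{sep}{shortest}{sep}{average:.2f}"


# Hypothetical sample, not taken from any of the datasets above.
sample_inputs = ["a b c", "a b c d e", "a"]
print(format_cell(length_stats(sample_inputs)))
```

For Chinese datasets such as LCSTS, a character-level count can be obtained by passing `tokenize=list`.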

Open-domain Dialogue System

| Language | Dataset | Test Input | Test Output | Train Input | Train Output | Data |
|---|---|---|---|---|---|---|
| English | DailyDialog | 663, 129, 246.57 | max: 246, 3, 16.55 | 1099, 128, 247.60 | 313, 2, 16.19 | Paper & Data |
| | PersonaChat | 529, 273, 372.28 | 32, 5, 13.53 | | | Paper & Data |
| | EmpatheticDialogues | 380, 148, 201.64 | 110, 2, 17.85 | 400, 146, 193.13 | 135, 2, 16.80 | Paper & Data |
| Chinese | Tsinghua LCCC | 848, 214, 264.73 | 347, 3, 28.27 | 1666, 213, 278.59 | 314, 2, 26.98 | Paper & Data |

Controllable Story Generation

| Language | Dataset | Test Input | Test Output | Train Input | Train Output | Paper |
|---|---|---|---|---|---|---|
| English | ROCStories | 203; 115; 140.44 | 34; 4; 12.76 | 182; 115; 140.56 | 29; 4; 12.74 | Paper & Data |
| | WritingPrompts | 179; 95; 125.64 | 2912; 132; 793.14 | 208; 95; 126.23 | 8308; 115; 789.08 | Paper & Data |
| Chinese | LOT | 329; 215; 254.73 | 568; 112; 275.07 | 324; 207; 253.90 | 624; 113; 273.10 | Paper & Data |

Data-to-Text

| Language | Dataset | Test Input | Test Output | Train Input | Train Output | Paper |
|---|---|---|---|---|---|---|
| English | WebNLG | 324; 156; 201.68 | 158; 7; 38.55 | | | Paper & Data |
| | Rotowire | 813; 177; 434.31 | 998; 191; 460.73 | | | Paper & Data |
| Chinese | Advertisement | 453; 222; 288 | 364; 93; 193 | | | Paper & Data |

Style Transfer (Paraphrasing, Simplification, Formal-Informal)

| Language | Dataset | Paper |
|---|---|---|
| English | Yelp | Paper & Data |
| Chinese (Formal-Informal) | CLSD | Paper & Data |
| English (Paraphrasing) | ParaNMT | Paper (ParaNMT-50M: Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations) & Data |
| English (Sentence Simplification) | WikiLarge | Paper (Sentence Simplification with Deep Reinforcement Learning) & Data |

LLMs

Experimental setup: chat models

Experimental objectives:

  • Chinese vs. English models (llama2-7b-chat vs baichuan2 vs Qwen)
  • Model scale (llama2-7b-chat vs llama2-13b-chat)
  • Model architecture (chatglm2-6b vs llama2-7b-chat, llama2-13b-chat vs oasst-pythia-12b, flan-t5-xxl)
  • Fine-tuning dataset (Qwen, Baichuan2, chinese-alpaca2, vicuna)
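The comparison axes listed above can be written down as a small lookup table, which makes it easy to see which models need to be run at all. This is only a sketch: the dictionary layout, the axis keys, and the helper `models_to_run` are hypothetical and not part of the repository. Placing flan-t5-xxl in the same group as llama2-13b-chat and oasst-pythia-12b is one reading of the architecture bullet, not something the README states outright.

```python
# Comparison axes and the model groups compared on each axis.
# Model names are copied from the list above and lowercased;
# the grouping inside "architecture" is an ASSUMPTION.
COMPARISONS = {
    "language": [["llama2-7b-chat", "baichuan2", "qwen"]],
    "scale": [["llama2-7b-chat", "llama2-13b-chat"]],
    "architecture": [
        ["chatglm2-6b", "llama2-7b-chat"],
        ["llama2-13b-chat", "oasst-pythia-12b", "flan-t5-xxl"],
    ],
    "finetuning_data": [["qwen", "baichuan2", "chinese-alpaca2", "vicuna"]],
}


def models_to_run(comparisons):
    """Return the sorted set of distinct models across all comparison groups."""
    return sorted(
        {model for groups in comparisons.values() for group in groups for model in group}
    )


if __name__ == "__main__":
    for name in models_to_run(COMPARISONS):
        print(name)
```

Because several models appear on more than one axis, each one only has to be evaluated once and its results reused across comparisons.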
