A Survey on Automatic Generation of Figurative Language: From Rule-based Systems to Large Language Models (ACM Computing Surveys)
| Figure of speech | Task | Dataset | Train | Valid | Test | Lang | Parallel |
|---|---|---|---|---|---|---|---|
| Simile | Literal↔Simile | Data | 82,687 | 5,145 | 150 | en | ✓ |
| | Simile↔Context | Data | 5.4M | 2,500 | 2,500 | zh | ✓ |
| | Narrative+Simile→Text | Data | 3,100 | 376 | 1,520 | en | ✓ |
| | Concept→Analogy + Explanation | Data | - | - | 148 | en | ✓ |
| Metaphor | Literal↔Metaphor | Data | 260k | 15,833 | 250 | en | ✓ |
| | | Data | 90k | 3,498 | 150 | en | ✓ |
| | | Data | 248k | - | 150 | en | ✓ |
| | | Data | - | - | 171 | en | ✓ |
| | | CMC | 3,554/2,703 | - | - | zh | ✗ |
| Hyperbole | Literal↔Hyperbole | Paper | 709 | - | - | en | ✓ |
| | | HYPO-cn | 2,082/2,680 | - | - | zh | ✗ |
| | | HYPO-red | 2,163/1,167 | - | - | en | ✗ |
| | | HYPO-XL | -/17,862 | - | - | en | ✗ |
| Idiom | Idiom↔Literal | Paper | 88 | - | 84 | en | ✓ |
| | Idiom (en)↔Literal (de) | Data | 1,998 | - | 1,500 | en/de | ✓ |
| | Idiom (de)↔Literal (en) | Data | 1,848 | - | 1,500 | de/en | ✓ |
| | Literal↔Idiom | PIE | 3,784 | 876 | 876 | en | ✓ |
| | Narrative+Idiom→Text | Data | 3,204 | 355 | 1,542 | en | ✓ |
| Irony (Sarcasm) | Literal↔Irony (Sarcasm) | Data | 2,400 | 300 | 300 | en | ✓ |
| | | Data | - | - | 203 | en | ✓ |
| | | Data | 112k/262k | - | - | en | ✗ |
| | | Data | 4,762 | - | - | en | ✓ |
| Pun | Word senses→Pun | Data | 1,274 | - | - | en | ✓ |
| | Context→Pun | Data | 2,753 | - | - | en | ✓ |
| Personification | Topic→Personification | Data | 67,441 | 3,747 | 3,747 | zh | ✓ |
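A parallel dataset (✓ in the last column) aligns each literal sentence with a figurative counterpart, so models can be trained on literal↔figurative pairs. A minimal sketch of this data shape and a standard train/valid/test split; the example sentences and field names are invented for illustration and are not drawn from any listed dataset:

```python
import random

# Hypothetical parallel examples: each item pairs a literal sentence
# with a figurative rewrite, as in the Literal<->Simile datasets above.
pairs = [
    {"literal": "The road was very long.",
     "figurative": "The road stretched on like an endless ribbon."},
    {"literal": "He ran very fast.",
     "figurative": "He ran like the wind."},
    {"literal": "The room was very quiet.",
     "figurative": "The room was as quiet as a library at midnight."},
    {"literal": "Her voice was soothing.",
     "figurative": "Her voice was as soothing as a lullaby."},
    {"literal": "The exam was easy.",
     "figurative": "The exam was as easy as pie."},
]

def split(data, valid_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle and cut the data into train/valid/test portions."""
    data = data[:]
    random.Random(seed).shuffle(data)
    n_valid = int(len(data) * valid_frac)
    n_test = int(len(data) * test_frac)
    return (data[n_valid + n_test:],            # train
            data[:n_valid],                     # valid
            data[n_valid:n_valid + n_test])     # test

train, valid, test = split(pairs)
```

Non-parallel resources (✗) instead provide separate pools of literal and figurative sentences, which is why several of them report two counts such as 2,163/1,167.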
We review the modelling approaches, from traditional methods to the current state of the art, and divide them into two categories: knowledge-based and neural-based approaches.
Knowledge-based approaches:

| Subcategory | Paper | Code | Form | Venue | Pros and Cons |
|---|---|---|---|---|---|
| Rule and template | Abe et al. | - | Metaphor | CSS 2006 | Pros: intuitive and simple; tailored to specific forms. Cons: poor flexibility and diversity. |
| | Terai et al. | - | Metaphor | ICANN 2010 | |
| | Joshi et al. | Code | Sarcasm | WISDOM 2015 | |
| | Veale et al. | - | Metaphor | Metaphor WS 2016 | |
| Knowledge resource | Pereira et al. | - | Metaphor | AAAI WS 2006 | Pros: exploits knowledge resources; high interpretability. Cons: requires prior linguistic knowledge; desired resources must be constructed. |
| | Veale et al. | - | Metaphor | COLING 2008 | |
| | Petrović et al. | - | Pun | ACL 2013 | |
| | Hong et al. | - | Pun | CALC 2009 | |
| | Shutova et al. | - | Metaphor | NAACL 2010 | |
| | Valitutti et al. | - | Pun | ACL 2013 | |
| | Liu et al. | - | Idiom | NAACL 2016 | |
| | Gero et al. | - | Metaphor | CHI 2019 | |
| | Stowe et al. | - | Metaphor | ACL 2021 | |
| | Hervas et al. | - | Metaphor | MICAI 2007 | |
| | Ovchinnikova et al. | - | Metaphor | arXiv 2014 | |
| | Harmon et al. | - | Simile | ICCC 2015 | |
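The two knowledge-based subcategories often work together: a knowledge resource supplies stereotypical associations, and a rule or template turns them into figurative text. A toy sketch of such a pipeline for similes; the resource entries and template are invented for illustration and do not reproduce any cited system:

```python
# Tiny hand-built knowledge resource: property -> stereotypical vehicle.
# Real systems mine such associations from lexicons or corpora.
STEREOTYPES = {
    "brave": "lion",
    "stubborn": "mule",
    "quiet": "mouse",
    "busy": "bee",
}

def simile_from_template(topic: str, prop: str) -> str:
    """Fill the fixed template '<topic> is as <prop> as a <vehicle>'."""
    vehicle = STEREOTYPES.get(prop)
    if vehicle is None:
        raise KeyError(f"no stereotype known for property {prop!r}")
    return f"{topic} is as {prop} as a {vehicle}"

print(simile_from_template("My neighbour", "stubborn"))
# -> My neighbour is as stubborn as a mule
```

The pros and cons in the table follow directly from this shape: the output is interpretable and easy to control, but limited to whatever the template and resource cover.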
Neural-based approaches:

| Subcategory | Paper | Code | Form | Venue | Pros and Cons |
|---|---|---|---|---|---|
| Training from scratch | Peled et al. | Code | Sarcasm | ACL 2017 | Pros: straightforward; combines well with retrieval approaches. Cons: requires large-scale training data; large computational resources. |
| | Fadaee et al. | Code | Idiom | LREC 2018 | |
| | Liu et al. | Code | Metaphor/Personification | ACL 2019 | |
| | Stowe et al. | Code | Metaphor | CoNLL 2021 | |
| | Yu et al. | - | Pun | ACL 2018 | |
| | Yu et al. | Code | Metaphor | NAACL 2019 | |
| | Li et al. | Code | Metaphor | INLG 2022 | |
| | He et al. | Code | Pun | NAACL 2019 | |
| | Yu et al. | Code | Pun | EMNLP 2020 | |
| | Zhou et al. | Code | Idiom | arXiv 2021 | |
| | Zhu et al. | Code | Irony | arXiv 2019 | |
| | Luo et al. | Code | Pun | EMNLP 2019 | |
| | Mishra et al. | Code | Sarcasm | EMNLP 2019 | |
| Fine-tuning PLMs | Zhang et al. | Code | Simile | AAAI 2021 | Pros: straightforward; leverages pre-trained knowledge; state-of-the-art results. Cons: large computational resources. |
| | Zhou et al. | Code | Idiom | AAAI 2022 | |
| | Zhang et al. | Code | Hyperbole | NAACL 2022 | |
| | Chakrabarty et al. | Code | Simile | EMNLP 2020 | |
| | Stowe et al. | Code | Metaphor | ACL 2021 | |
| | Chakrabarty et al. | Code | Metaphor | NAACL 2021 | |
| | Stowe et al. | Code | Metaphor | CoNLL 2021 | |
| | Tian et al. | Code | Hyperbole | EMNLP 2021 | |
| | Chakrabarty et al. | Code | Sarcasm | ACL 2020 | |
| | Mittal et al. | Code | Pun | NAACL 2022 | |
| | Chakrabarty et al. | Code | Idiom/Simile | TACL 2022 | |
| | Tian et al. | Code | Pun | EMNLP 2022 | |
| | Lai et al. | Code | Hyperbole/Sarcasm/Idiom/Metaphor/Simile | COLING 2022 | |
| Prompt learning | Chakrabarty et al. | Code | Idiom/Simile | TACL 2022 | Pros: straightforward; needs few/no labelled samples. Cons: prompt engineering; large computational resources. |
| | Reif et al. | - | Metaphor | ACL 2022 | |
| | Mittal et al. | Code | Pun | NAACL 2022 | |
| | Bhavya et al. | Code | Analogy (Simile) | INLG 2022 | |
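Prompt learning replaces task-specific training with a natural-language prompt containing a handful of demonstrations. A minimal sketch of few-shot prompt construction for literal→simile rewriting; the demonstrations and prompt format are invented for illustration, and the resulting string could be sent to any instruction-following language model:

```python
# Hypothetical few-shot demonstrations for literal -> simile rewriting.
DEMOS = [
    ("The road was very long.",
     "The road stretched on like an endless ribbon."),
    ("He ran very fast.",
     "He ran like the wind."),
]

def build_prompt(literal: str) -> str:
    """Assemble a few-shot prompt: an instruction, the demonstration
    pairs, and the new input with its completion slot left open."""
    lines = ["Rewrite each literal sentence as a simile.", ""]
    for lit, fig in DEMOS:
        lines.append(f"Literal: {lit}")
        lines.append(f"Simile: {fig}")
        lines.append("")
    lines.append(f"Literal: {literal}")
    lines.append("Simile:")
    return "\n".join(lines)

prompt = build_prompt("The coffee was very strong.")
```

This is why the table lists "few/no labelled samples" as a pro and "prompt engineering" as a con: no parameters are updated, but output quality hinges on the wording and choice of demonstrations.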
@article{lai-etal-2024-agfl,
  title     = {A Survey on Automatic Generation of Figurative Language: From Rule-based Systems to Large Language Models},
  author    = {Lai, Huiyuan and Nissim, Malvina},
  journal   = {ACM Computing Surveys},
  year      = {2024},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
}