A Survey on Automatic Generation of Figurative Language: From Rule-based Systems to Large Language Models (ACM Computing Surveys)
| Figure of speech | Task | Dataset | Train | Valid | Test | Lang | Parallel |
|---|---|---|---|---|---|---|---|
| Simile | Literal↔Simile | Data | 82,687 | 5,145 | 150 | en | ✓ |
| | Simile↔Context | Data | 5.4M | 2,500 | 2,500 | zh | ✓ |
| | Narrative+Simile→Text | Data | 3,100 | 376 | 1,520 | en | ✓ |
| | Concept→Analogy + Explanation | Data | - | - | 148 | en | ✓ |
| Metaphor | Literal↔Metaphor | Data | 260k | 15,833 | 250 | en | ✓ |
| | | Data | 90k | 3,498 | 150 | en | ✓ |
| | | Data | 248k | - | 150 | en | ✓ |
| | | Data | - | - | 171 | en | ✓ |
| | | CMC | 3,554/2,703 | - | - | zh | ✗ |
| Hyperbole | Literal↔Hyperbole | Paper | 709 | - | - | en | ✓ |
| | | HYPO-cn | 2,082/2,680 | - | - | zh | ✗ |
| | | HYPO-red | 2,163/1,167 | - | - | en | ✗ |
| | | HYPO-XL | -/17,862 | - | - | en | ✗ |
| Idiom | Idiom↔Literal | Paper | 88 | - | 84 | en | ✓ |
| | Idiom (en)↔Literal (de) | Data | 1,998 | - | 1,500 | en/de | ✓ |
| | Idiom (de)↔Literal (en) | Data | 1,848 | - | 1,500 | de/en | ✓ |
| | Literal↔Idiom | PIE | 3,784 | 876 | 876 | en | ✓ |
| | Narrative+Idiom→Text | Data | 3,204 | 355 | 1,542 | en | ✓ |
| Irony (Sarcasm) | Literal↔Irony (Sarcasm) | Data | 2,400 | 300 | 300 | en | ✓ |
| | | Data | - | - | 203 | en | ✓ |
| | | Data | 112k/262k | - | - | en | ✗ |
| | | Data | 4,762 | - | - | en | ✓ |
| Pun | Word senses→Pun | Data | 1,274 | - | - | en | ✓ |
| | Context→Pun | Data | 2,753 | - | - | en | ✓ |
| Personification | Topic→Personification | Data | 67,441 | 3,747 | 3,747 | zh | ✓ |
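A parallel dataset (✓ in the last column) aligns each literal sentence with a figurative counterpart, so models can be trained on literal↔figurative pairs. A minimal sketch of this data shape and a standard train/valid/test split; the example sentences and field names are invented for illustration and are not drawn from any listed dataset:

```python
import random

# Hypothetical parallel examples: each item pairs a literal sentence
# with a figurative rewrite, as in the Literal<->Simile datasets above.
pairs = [
    {"literal": "The road was very long.",
     "figurative": "The road stretched on like an endless ribbon."},
    {"literal": "He ran very fast.",
     "figurative": "He ran like the wind."},
    {"literal": "The room was very quiet.",
     "figurative": "The room was as quiet as a library at midnight."},
    {"literal": "Her voice was soothing.",
     "figurative": "Her voice was as soothing as a lullaby."},
    {"literal": "The exam was easy.",
     "figurative": "The exam was as easy as pie."},
]

def split(data, valid_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle and cut the data into train/valid/test portions."""
    data = data[:]
    random.Random(seed).shuffle(data)
    n_valid = int(len(data) * valid_frac)
    n_test = int(len(data) * test_frac)
    return (data[n_valid + n_test:],            # train
            data[:n_valid],                     # valid
            data[n_valid:n_valid + n_test])     # test

train, valid, test = split(pairs)
```

Non-parallel resources (✗) instead provide separate pools of literal and figurative sentences, which is why several of them report two counts such as 2,163/1,167.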
We review the modelling approaches, from traditional methods to the current state of the art, and divide them into two categories: knowledge-based and neural-based approaches.
Knowledge-based approaches:

| Subcategory | Paper | Code | Form | Venue | Pros and Cons |
|---|---|---|---|---|---|
| Rule and template | Abe et al. | - | Metaphor | CSS 2006 | Pros: intuitive and simple; tailored to specific forms. Cons: poor flexibility and diversity. |
| | Terai et al. | - | Metaphor | ICANN 2010 | |
| | Joshi et al. | Code | Sarcasm | WISDOM 2015 | |
| | Veale et al. | - | Metaphor | Metaphor WS 2016 | |
| Knowledge resource | Pereira et al. | - | Metaphor | AAAI WS 2006 | Pros: exploits knowledge resources; high interpretability. Cons: requires prior linguistic knowledge; desired resources must be constructed. |
| | Veale et al. | - | Metaphor | COLING 2008 | |
| | Petrović et al. | - | Pun | ACL 2013 | |
| | Hong et al. | - | Pun | CALC 2009 | |
| | Shutova et al. | - | Metaphor | NAACL 2010 | |
| | Valitutti et al. | - | Pun | ACL 2013 | |
| | Liu et al. | - | Idiom | NAACL 2016 | |
| | Gero et al. | - | Metaphor | CHI 2019 | |
| | Stowe et al. | - | Metaphor | ACL 2021 | |
| | Hervas et al. | - | Metaphor | MICAI 2007 | |
| | Ovchinnikova et al. | - | Metaphor | arXiv 2014 | |
| | Harmon et al. | - | Simile | ICCC 2015 | |
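The two knowledge-based subcategories often work together: a knowledge resource supplies stereotypical associations, and a rule or template turns them into figurative text. A toy sketch of such a pipeline for similes; the resource entries and template are invented for illustration and do not reproduce any cited system:

```python
# Tiny hand-built knowledge resource: property -> stereotypical vehicle.
# Real systems mine such associations from lexicons or corpora.
STEREOTYPES = {
    "brave": "lion",
    "stubborn": "mule",
    "quiet": "mouse",
    "busy": "bee",
}

def simile_from_template(topic: str, prop: str) -> str:
    """Fill the fixed template '<topic> is as <prop> as a <vehicle>'."""
    vehicle = STEREOTYPES.get(prop)
    if vehicle is None:
        raise KeyError(f"no stereotype known for property {prop!r}")
    return f"{topic} is as {prop} as a {vehicle}"

print(simile_from_template("My neighbour", "stubborn"))
# -> My neighbour is as stubborn as a mule
```

The pros and cons in the table follow directly from this shape: the output is interpretable and easy to control, but limited to whatever the template and resource cover.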
Neural-based approaches:

| Subcategory | Paper | Code | Form | Venue | Pros and Cons |
|---|---|---|---|---|---|
| Training from scratch | Peled et al. | Code | Sarcasm | ACL 2017 | Pros: straightforward; combines well with retrieval approaches. Cons: requires large-scale training data; large computational resources. |
| | Fadaee et al. | Code | Idiom | LREC 2018 | |
| | Liu et al. | Code | Metaphor/Personification | ACL 2019 | |
| | Stowe et al. | Code | Metaphor | CoNLL 2021 | |
| | Yu et al. | - | Pun | ACL 2018 | |
| | Yu et al. | Code | Metaphor | NAACL 2019 | |
| | Li et al. | Code | Metaphor | INLG 2022 | |
| | He et al. | Code | Pun | NAACL 2019 | |
| | Yu et al. | Code | Pun | EMNLP 2020 | |
| | Zhou et al. | Code | Idiom | arXiv 2021 | |
| | Zhu et al. | Code | Irony | arXiv 2019 | |
| | Luo et al. | Code | Pun | EMNLP 2019 | |
| | Mishra et al. | Code | Sarcasm | EMNLP 2019 | |
| Fine-tuning PLMs | Zhang et al. | Code | Simile | AAAI 2021 | Pros: straightforward; leverages pre-trained knowledge; state-of-the-art results. Cons: large computational resources. |
| | Zhou et al. | Code | Idiom | AAAI 2022 | |
| | Zhang et al. | Code | Hyperbole | NAACL 2022 | |
| | Chakrabarty et al. | Code | Simile | EMNLP 2020 | |
| | Stowe et al. | Code | Metaphor | ACL 2021 | |
| | Chakrabarty et al. | Code | Metaphor | NAACL 2021 | |
| | Stowe et al. | Code | Metaphor | CoNLL 2021 | |
| | Tian et al. | Code | Hyperbole | EMNLP 2021 | |
| | Chakrabarty et al. | Code | Sarcasm | ACL 2020 | |
| | Mittal et al. | Code | Pun | NAACL 2022 | |
| | Chakrabarty et al. | Code | Idiom/Simile | TACL 2022 | |
| | Tian et al. | Code | Pun | EMNLP 2022 | |
| | Lai et al. | Code | Hyperbole/Sarcasm/Idiom/Metaphor/Simile | COLING 2022 | |
| Prompt learning | Chakrabarty et al. | Code | Idiom/Simile | TACL 2022 | Pros: straightforward; needs few/no labelled samples. Cons: prompt engineering; large computational resources. |
| | Reif et al. | - | Metaphor | ACL 2022 | |
| | Mittal et al. | Code | Pun | NAACL 2022 | |
| | Bhavya et al. | Code | Analogy (Simile) | INLG 2022 | |
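Prompt learning replaces task-specific training with a natural-language prompt containing a handful of demonstrations. A minimal sketch of few-shot prompt construction for literal→simile rewriting; the demonstrations and prompt format are invented for illustration, and the resulting string could be sent to any instruction-following language model:

```python
# Hypothetical few-shot demonstrations for literal -> simile rewriting.
DEMOS = [
    ("The road was very long.",
     "The road stretched on like an endless ribbon."),
    ("He ran very fast.",
     "He ran like the wind."),
]

def build_prompt(literal: str) -> str:
    """Assemble a few-shot prompt: an instruction, the demonstration
    pairs, and the new input with its completion slot left open."""
    lines = ["Rewrite each literal sentence as a simile.", ""]
    for lit, fig in DEMOS:
        lines.append(f"Literal: {lit}")
        lines.append(f"Simile: {fig}")
        lines.append("")
    lines.append(f"Literal: {literal}")
    lines.append("Simile:")
    return "\n".join(lines)

prompt = build_prompt("The coffee was very strong.")
```

This is why the table lists "few/no labelled samples" as a pro and "prompt engineering" as a con: no parameters are updated, but output quality hinges on the wording and choice of demonstrations.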
@article{lai-etal-2024-agfl,
  title     = {A Survey on Automatic Generation of Figurative Language: From Rule-based Systems to Large Language Models},
  author    = {Lai, Huiyuan and Nissim, Malvina},
  journal   = {ACM Computing Surveys},
  year      = {2024},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
}