This repository contains the code and data for "Understanding Jargon: Combining Extraction and Generation for Definition Modeling" (EMNLP '22).
We propose to combine extraction and generation for jargon definition modeling: first, extract self- and correlative definitional information about the target jargon from the Web; then, generate the final definition by incorporating the extracted definitional information. Our framework is remarkably simple but effective: experiments demonstrate that our method generates high-quality definitions for jargon and significantly outperforms state-of-the-art models, e.g., improving the BLEU score from 8.76 to 22.66 and the human-annotated score from 2.34 to 4.04.
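To make the extract-then-generate idea concrete, here is a minimal Python sketch. It is not the code in this repo: the function names, the regular-expression heuristic, and the `[SEP]` separator are all assumptions chosen for illustration, and it covers only self-definitional sentences (not correlative ones). See ./extraction/ and ./generation/ for the actual implementation.

```python
import re


def extract_definitional_sentences(term, sentences, max_sentences=2):
    """Toy extractor (hypothetical): keep sentences that open by defining
    the term, e.g. "<term> is ...", "<term> are ...", "<term> refers to ..."."""
    pattern = re.compile(
        r"^\s*(?:an?\s+|the\s+)?" + re.escape(term) + r"\s+(?:is|are|refers to)\b",
        re.IGNORECASE,
    )
    return [s for s in sentences if pattern.search(s)][:max_sentences]


def build_generator_input(term, extracted, sep=" [SEP] "):
    """Join the term and the extracted sentences into one source string
    that a sequence-to-sequence generator could condition on.
    The separator is an assumption, not the repo's actual format."""
    return sep.join([term] + list(extracted))


if __name__ == "__main__":
    web_sentences = [
        "A hash table is a data structure that maps keys to values.",
        "We used a hash table in our project.",
        "Hash tables are widely used.",
    ]
    extracted = extract_definitional_sentences("hash table", web_sentences)
    print(build_generator_input("hash table", extracted))
```

In a real pipeline, the string built in the second step would be passed to a trained generation model; the sketch stops before that step because the model and its input format are specific to this repo.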
Please refer to the detailed README.md files in ./extraction/ and ./generation/.
Data can be downloaded from Google Drive
Sample generated definitions for CS terms are stored in ./sample/generated_definition_for_cs_term.txt
The details of this repo are described in the following paper. If you find this repo useful, please kindly cite it:
@inproceedings{huang2022understanding,
title={Understanding Jargon: Combining Extraction and Generation for Definition Modeling},
author={Huang, Jie and Shao, Hanyin and Chang, Kevin Chen-Chuan and Xiong, Jinjun and Hwu, Wen-mei},
booktitle={Proceedings of EMNLP},
year={2022}
}