This is an archive of code which was used to produce dataset and results available in our INLG 2020 paper: RecipeNLG: A Cooking Recipes Dataset for Semi-Structured Text Generation
The dataset we publish contains 2231142 cooking recipes (>2 millions). It's processed in more careful way and provides more samples than any other dataset in the area.
Please visit the website of our project: recipenlg.cs.put.poznan.pl to download it.
NOTE: The dataset contains all the data we gathered including from other datasets. To access only our gathered recipes (with no 12 instead of 1/2 etc), filter the dataset for source=Gathered. It results in approx 1.6M recipes of better quality.
Use the following BibTeX entry:
@inproceedings{bien-etal-2020-recipenlg,
title = "{R}ecipe{NLG}: A Cooking Recipes Dataset for Semi-Structured Text Generation",
author = "Bie{\'n}, Micha{\l} and
Gilski, Micha{\l} and
Maciejewska, Martyna and
Taisner, Wojciech and
Wisniewski, Dawid and
Lawrynowicz, Agnieszka",
booktitle = "Proceedings of the 13th International Conference on Natural Language Generation",
month = dec,
year = "2020",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.inlg-1.4",
pages = "22--28",
}
The pyTorch model is available in HuggingFace model hub as mbien/recipenlg. You can therefore easily import it into your solution as follows:
from transformers import AutoTokenizer, AutoModelWithLMHead
tokenizer = AutoTokenizer.from_pretrained("mbien/recipenlg")
model = AutoModelWithLMHead.from_pretrained("mbien/recipenlg")
You can also check the generation performance interactively on our website (link above).
The SpaCy NER model is available in the ner
directory
Yes, sure! If you feel some information is missing in our paper, please check first in our thesis, which is much more detailed. In case of further questions, you're invited to send us a github issue, we will respond as fast as we can!
We worked on the project interactively, and our core result is a new dataset. That's why the repo is rather a set of loosely connected python files and jupyter notebooks than a working runnable solution itself. However if you feel some part crucial for the reproduction is missing or you are dedicated to make the experience smoother, send us a feature request or (preferably), a pull request.