Set of scripts and notebooks used to produce results visible in RecipeNLG paper

Glorf/recipenlg

RecipeNLG: A Cooking Recipes Dataset for Semi-Structured Text Generation

This is an archive of the code used to produce the dataset and results presented in our INLG 2020 paper: RecipeNLG: A Cooking Recipes Dataset for Semi-Structured Text Generation

What's exciting about it?

The dataset we publish contains 2,231,142 cooking recipes (over 2 million). It is processed more carefully and provides more samples than any other dataset in the area.

Where is the dataset?

Please visit our project website, recipenlg.cs.put.poznan.pl, to download it.
NOTE: The dataset contains all the data we gathered, including recipes taken from other datasets. To access only the recipes we gathered ourselves (free of artifacts such as 12 in place of 1/2), filter the dataset for source=Gathered. This yields approximately 1.6M recipes of better quality.
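The source=Gathered filter from the note above can be sketched with Python's standard `csv` module. This is a minimal illustration, not the code we used: it assumes the download is a CSV file with a header row and a `source` column. The helper name `iter_gathered`, the sample rows and the non-Gathered source label are hypothetical.

```python
import csv
import io


def iter_gathered(csv_file, source_column="source", wanted="Gathered"):
    """Yield rows (as dicts) whose source column equals `wanted`.

    Assumes `csv_file` is a text-mode file object with a header row
    that contains `source_column`.
    """
    reader = csv.DictReader(csv_file)
    for row in reader:
        if row.get(source_column) == wanted:
            yield row


# Tiny in-memory stand-in for the real file; titles and the
# "Recipes1M" label are made-up placeholders.
sample = io.StringIO(
    "title,source\n"
    "Pancakes,Gathered\n"
    "Brownies,Recipes1M\n"
    "Tomato Soup,Gathered\n"
)

gathered = list(iter_gathered(sample))
print([row["title"] for row in gathered])  # ['Pancakes', 'Tomato Soup']
```

To filter the real download, pass a file opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever you saved the dataset. Rows are streamed one at a time, so the full file never has to fit in memory.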

I've used the dataset in my research. How to cite you?

Use the following BibTeX entry:

@inproceedings{bien-etal-2020-recipenlg,
    title = "{R}ecipe{NLG}: A Cooking Recipes Dataset for Semi-Structured Text Generation",
    author = "Bie{\'n}, Micha{\l}  and
      Gilski, Micha{\l}  and
      Maciejewska, Martyna  and
      Taisner, Wojciech  and
      Wisniewski, Dawid  and
      Lawrynowicz, Agnieszka",
    booktitle = "Proceedings of the 13th International Conference on Natural Language Generation",
    month = dec,
    year = "2020",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.inlg-1.4",
    pages = "22--28",
}

Where are your models?

The PyTorch model is available on the Hugging Face model hub as mbien/recipenlg, so you can import it into your own code as follows:

from transformers import AutoTokenizer, AutoModelForCausalLM

# AutoModelForCausalLM replaces the deprecated AutoModelWithLMHead class.
tokenizer = AutoTokenizer.from_pretrained("mbien/recipenlg")
model = AutoModelForCausalLM.from_pretrained("mbien/recipenlg")

You can also try recipe generation interactively on our website (link above).
The spaCy NER model is available in the ner directory.

Could you explain X and Y?

Yes, sure! If you feel some information is missing from our paper, please check our thesis first, as it is much more detailed. If you have further questions, feel free to open a GitHub issue and we will respond as fast as we can!

How to run the code?

We worked on the project interactively, and our core result is a new dataset. That is why the repo is a set of loosely connected Python files and Jupyter notebooks rather than a runnable solution. However, if you feel that a part crucial for reproduction is missing, or you would like to make the experience smoother, send us a feature request or (preferably) a pull request.
