Recipe-Infusion (Anshumaan-Chauhan02/Recipe-Infusion): persona-styled recipes generated from a list of ingredients

Project Description

This project introduces Recipe Infusion, a framework for generating style-infused recipes. The framework has two main components: Recipe Generation and Style Infusion.

In the Recipe Generation component, a DistilGPT2 model is fine-tuned on a custom dataset built by combining and preprocessing the RecipeBox and RecipeNLG data sources. The fine-tuned DistilGPT2 model generates coherent, sensible recipes from a list of ingredients.

In the Style Infusion component, a T5-small conditional generation model is fine-tuned for text style transfer. Parallel datasets are unavailable for most of the selected personas (Taylor Swift, Donald Trump and Michael Scott; the Shakespeare data is already parallel), so the project uses back translation to create them: each styled English sentence is translated into French and back into English with MarianMT, and the resulting (back-translated, original styled) sentence pairs form the parallel dataset. This dataset is used to train the T5 model, which is then applied to the generated recipes. By leveraging the learned style representations, the framework infuses different styles into the recipe content, giving users recipe variations that reflect the styles associated with the selected personas or other sources.

Overall, Recipe Infusion combines recipe generation and style transfer into a single approach for producing style-infused recipes. The project's results, evaluated with BLEU score and perplexity for recipe generation and with human evaluation for style transfer, demonstrate the effectiveness of the approach and its potential to enhance recipe personalization and creativity.
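The two components compose into one pipeline: a generator turns ingredients into a recipe, and a styler rewrites that recipe either sentence by sentence or as a whole. The sketch below shows only that composition; `infuse_recipe`, `split_sentences` and the naive regex sentence splitter are hypothetical names and choices, and the two stages are passed in as plain callables (in the project they would wrap the fine-tuned DistilGPT2 and T5-small models).

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter (an assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def infuse_recipe(
    ingredients: List[str],
    generate_recipe: Callable[[List[str]], str],  # stage 1, e.g. fine-tuned DistilGPT2
    style_text: Callable[[str], str],  # stage 2, e.g. fine-tuned T5-small
    sentence_wise: bool = True,
) -> str:
    """Generate a recipe, then style it sentence-wise or as an entire recipe."""
    recipe = generate_recipe(ingredients)
    if not sentence_wise:
        return style_text(recipe)
    return " ".join(style_text(s) for s in split_sentences(recipe))


if __name__ == "__main__":
    fake_generator = lambda ings: f"Mix the {' and '.join(ings)}. Bake for 20 minutes."
    shout = lambda s: s.upper()  # stand-in for a real style-transfer model
    print(infuse_recipe(["flour", "eggs"], fake_generator, shout))
```

Keeping both stages as callables makes it easy to swap personas (one styler per fine-tuned T5 model) without touching the generation step.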

Dataset Information

Recipe Generation
- RecipeNLG : https://recipenlg.cs.put.poznan.pl/
- RecipeBox : https://eightportions.com/datasets/Recipes/
Text Style Transfer
- William Shakespeare :
  - Translations of Shakespeare plays to Modern English
  - https://www.kaggle.com/datasets/garnavaurha/shakespearify
- Taylor Swift :
  - Taylor Swift Song Lyrics
  - https://www.kaggle.com/datasets/PromptCloudHQ/taylor-swift-song-lyrics-from-all-the-albums
- Donald Trump :
  - Donald Trump tweets through June 2020
  - https://www.kaggle.com/datasets/austinreese/trump-tweets
- Michael Scott :
  - Complete script of The Office
  - https://www.kaggle.com/datasets/nasirkhalid24/the-office-us-complete-dialoguetranscript

Dependencies

Numpy : Perform several mathematical evaluations in the preprocessing of the datasets

pip install numpy

Pandas : Loading, processing and storing of the different datasets

pip install pandas

Itertools : Easy iteration over large lists (part of the Python standard library, so no installation is needed)

Sklearn : Cosine Similarity and TF-IDF

pip install scikit-learn

Transformers : DistilGPT2, T5-small, MarianMT (both models and tokenizers)

pip install transformers

SentencePiece : Used by MarianMT's tokenizer (Back Translation)

pip install sentencepiece

Evaluate : BLEU Score evaluation

pip install evaluate

Matplotlib : Plotting of the training curves

pip install matplotlib
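Because some import names differ from their pip package names (the `sklearn` module is installed as `scikit-learn`) and `itertools` ships with Python, a quick check before opening the notebooks can save a failed run. This is a minimal sketch; `REQUIRED_MODULES` and `missing_packages` are hypothetical helpers, not part of the repository.

```python
import importlib.util

# Import name -> pip package name (None means standard library, nothing to install).
REQUIRED_MODULES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "sklearn": "scikit-learn",
    "transformers": "transformers",
    "sentencepiece": "sentencepiece",
    "evaluate": "evaluate",
    "matplotlib": "matplotlib",
    "itertools": None,
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the pip package names whose import module cannot be found."""
    missing = []
    for module, pip_name in modules.items():
        if pip_name is not None and importlib.util.find_spec(module) is None:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    todo = missing_packages()
    print("pip install " + " ".join(todo) if todo else "All dependencies found.")
```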

Files

RecipeDataset.ipynb :
- Loading of both recipe datasets
- Preprocessing datasets to get into a common format
- Performing statistical analysis on the data
- Storing the final concatenated dataset
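Bringing the two recipe sources into a common format mostly means mapping their fields onto the same columns. The sketch below assumes RecipeNLG rows carry `title`, `ingredients` and `directions` (the last two stored as stringified Python lists) and that RecipeBox JSON maps recipe IDs to records with `title`, `ingredients` (a list, sometimes padded with "ADVERTISEMENT") and `instructions` (a string). These field names, the helper names and the de-duplication rule are assumptions, not the notebook's exact code.

```python
import ast

import pandas as pd

COLUMNS = ["title", "ingredients", "instructions"]


def from_recipenlg(df: pd.DataFrame) -> pd.DataFrame:
    """Assumed: 'ingredients' and 'directions' hold stringified Python lists."""
    return pd.DataFrame({
        "title": df["title"].str.strip(),
        "ingredients": df["ingredients"].apply(ast.literal_eval),
        "instructions": df["directions"].apply(lambda s: " ".join(ast.literal_eval(s))),
    })


def from_recipebox(raw: dict) -> pd.DataFrame:
    """Assumed: {recipe_id: {"title": str, "ingredients": [str], "instructions": str}}."""
    rows = []
    for rec in raw.values():
        if not rec.get("title") or not rec.get("ingredients") or not rec.get("instructions"):
            continue  # drop incomplete records
        ingredients = [i.replace("ADVERTISEMENT", "").strip() for i in rec["ingredients"]]
        rows.append({
            "title": rec["title"].strip(),
            "ingredients": [i for i in ingredients if i],
            "instructions": " ".join(rec["instructions"].split()),  # normalize whitespace
        })
    return pd.DataFrame(rows, columns=COLUMNS)


def build_final_dataset(nlg: pd.DataFrame, box: pd.DataFrame) -> pd.DataFrame:
    """Concatenate both sources and drop exact title + instructions duplicates."""
    final = pd.concat([nlg[COLUMNS], box[COLUMNS]], ignore_index=True)
    # Lists are unhashable, so the ingredients column is left out of the duplicate key.
    return final.drop_duplicates(subset=["title", "instructions"]).reset_index(drop=True)
```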
Statistics.ipynb :
- Statistical analysis on the preprocessed datasets and the final concatenated dataset
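Typical statistics for a recipe corpus are the number of ingredients per recipe and the length of the instructions. A minimal sketch with pandas, assuming the common-format columns `ingredients` (a list) and `instructions` (a string); the notebook may report different measures.

```python
import pandas as pd


def recipe_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Summary statistics of per-recipe lengths (count, mean, std, quartiles, max)."""
    lengths = pd.DataFrame({
        "n_ingredients": df["ingredients"].apply(len),
        "n_instruction_words": df["instructions"].str.split().apply(len),
    })
    return lengths.describe()
```

Long tails in these distributions are a common reason to cap sequence length before fine-tuning a small language model such as DistilGPT2.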
Recipe_Generation_DistilGPT.ipynb :
- Loading of the final recipe dataset
- Data Preparation of the final dataset
- Training of DistilGPT2 Model
- Testing of the fine-tuned (FT) model and the baseline model
- Evaluation of the models - BLEU Score and Perplexity
- Generation of Recipe dataset for Style Transfer
- Error Analysis on Adversarial inputs
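Fine-tuning DistilGPT2 needs each recipe flattened into one training string, and the same layout (cut off before the instructions) serves as the generation prompt. Perplexity is the exponential of the mean per-token negative log-likelihood; with Hugging Face causal LMs, `model(input_ids, labels=input_ids).loss` already returns that mean, so `math.exp(loss.item())` gives a sequence's perplexity. The template below is hypothetical (only `<|endoftext|>` is GPT-2's real end-of-sequence token), and `perplexity` works from raw natural-log token probabilities.

```python
import math
from typing import List, Sequence


def prompt_from_ingredients(ingredients: List[str]) -> str:
    """Hypothetical prompt layout: the model continues after 'Instructions:'."""
    return "Ingredients: " + ", ".join(i.strip() for i in ingredients) + " Instructions:"


def to_training_text(ingredients: List[str], instructions: str) -> str:
    """Hypothetical training layout: prompt, instructions, GPT-2 end-of-sequence token."""
    return prompt_from_ingredients(ingredients) + " " + instructions.strip() + "<|endoftext|>"


def perplexity(token_log_probs: Sequence[float]) -> float:
    """exp(mean negative log-likelihood) over all tokens, using natural logs."""
    if len(token_log_probs) == 0:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)
```

For a corpus-level figure, pool the log-probabilities of all tokens before averaging rather than averaging per-recipe perplexities, since the latter weights short recipes too heavily.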
Preprocess_TST_dataset.ipynb :
- Loading the non-parallel data - Taylor and Trump
- Preprocess the datasets
- Extract statistical info about the dataset
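Tweets and lyrics contain noise that carries no style signal, such as links, @-mentions, HTML entities and very short lines. The sketch below shows one plausible cleaning pass over a text column (for the tweets, assumed to be the `content` column of the Kaggle CSV); the specific rules and the `min_words` threshold are assumptions, not the notebook's exact preprocessing.

```python
import html
import re

import pandas as pd

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
WS_RE = re.compile(r"\s+")


def clean_line(text: str) -> str:
    """Remove URLs and @-mentions, decode HTML entities, collapse whitespace."""
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    text = html.unescape(text)  # e.g. "&amp;" -> "&"
    return WS_RE.sub(" ", text).strip()


def preprocess(series: pd.Series, min_words: int = 4) -> pd.Series:
    """Clean every line, drop lines shorter than min_words, and de-duplicate."""
    cleaned = series.astype(str).map(clean_line)
    keep = cleaned.str.split().str.len() >= min_words
    return cleaned[keep].drop_duplicates().reset_index(drop=True)
```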
Shakespeare_and_Scripts_Preprocessing.ipynb :
- Loading the non-parallel data - Michael
- Load the parallel data - Shakespeare
- Preprocess the datasets
- Extract some statistical info about the dataset
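For Michael Scott, the styled sentences come from filtering the full script down to his lines. A minimal sketch, assuming the script CSV has a `speaker` column and a `line` column and that stage directions appear in square brackets; the column names, the bracket convention and the `min_words` cut-off are assumptions to check against the actual file.

```python
import pandas as pd


def character_lines(script: pd.DataFrame, speaker: str = "Michael",
                    speaker_col: str = "speaker", line_col: str = "line",
                    min_words: int = 3) -> pd.Series:
    """Return one character's cleaned dialogue lines from a full script."""
    mask = script[speaker_col].astype(str).str.strip().str.lower() == speaker.lower()
    lines = script.loc[mask, line_col].astype(str)
    # Strip bracketed stage directions such as "[to camera]".
    lines = lines.str.replace(r"\[[^\]]*\]", "", regex=True)
    lines = lines.str.replace(r"\s+", " ", regex=True).str.strip()
    return lines[lines.str.split().str.len() >= min_words].reset_index(drop=True)
```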
BackTranslation.ipynb :
- Load the MarianMT models for Fr-En and En-Fr
- Perform back translation to generate synthetic parallel data - Michael, Taylor and Trump
- Store the parallel dataset
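Back translation turns non-parallel styled text into training pairs: each styled English sentence goes English to French to English, and the round-tripped sentence is paired with the original as (source, target). The sketch below wraps a MarianMT checkpoint as a batch translator and builds the pairs; `marian_translator`, `back_translate_pairs` and the choice to drop pairs where the round trip changed nothing are assumptions, while `MarianMTModel`, `MarianTokenizer` and the `Helsinki-NLP/opus-mt-en-fr` / `opus-mt-fr-en` checkpoints are real Hugging Face names.

```python
from typing import Callable, List, Tuple

Translator = Callable[[List[str]], List[str]]


def marian_translator(model_name: str, batch_size: int = 16) -> Translator:
    """Wrap a Hugging Face MarianMT checkpoint as a batch translation function."""
    # Imported lazily so the pairing logic below can run without transformers.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)  # needs sentencepiece
    model = MarianMTModel.from_pretrained(model_name)

    def translate(texts: List[str]) -> List[str]:
        outputs: List[str] = []
        for i in range(0, len(texts), batch_size):
            batch = tokenizer(texts[i:i + batch_size], return_tensors="pt",
                              padding=True, truncation=True)
            generated = model.generate(**batch)
            outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
        return outputs

    return translate


def back_translate_pairs(styled: List[str], en_to_fr: Translator,
                         fr_to_en: Translator) -> List[Tuple[str, str]]:
    """Return (back-translated source, original styled target) pairs."""
    neutral = fr_to_en(en_to_fr(styled))
    # Pairs where the round trip changed nothing carry no style signal (an assumption).
    return [(n, s) for n, s in zip(neutral, styled)
            if n.strip() and n.strip() != s.strip()]
```

In use, `back_translate_pairs(lines, marian_translator("Helsinki-NLP/opus-mt-en-fr"), marian_translator("Helsinki-NLP/opus-mt-fr-en"))` would produce the pairs for one persona; keeping the translators injectable lets the pairing logic be tested with simple stand-ins.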
TST_Architecture.ipynb :
- Load all the parallel datasets
- Fine-tune a separate T5-small model on each dataset
- Generate styled recipes - Sentence-wise and Entire Recipe
- Test the performance (Human Evaluation) on the styled recipes (Sentence-wise)
- Check for style infusion on random sentences
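Each T5-small model learns a text-to-text mapping from the plain (back-translated or modern English) sentence to the persona's styled sentence. The sketch below formats the parallel pairs as source/target records and loads a fine-tuned checkpoint as a sentence-level styler; the `transfer to {persona}: ` prefix, the helper names and the generation settings are hypothetical, while `AutoTokenizer`, `AutoModelForSeq2SeqLM` and `generate` are real Hugging Face APIs.

```python
from typing import Callable, Dict, List, Tuple


def task_prefix(persona: str) -> str:
    """Hypothetical T5 task prefix; with one model per persona it is optional."""
    return f"transfer to {persona}: "


def t5_examples(pairs: List[Tuple[str, str]], persona: str) -> List[Dict[str, str]]:
    """Turn (plain, styled) pairs into source/target records for seq2seq fine-tuning."""
    prefix = task_prefix(persona)
    return [{"source": prefix + plain, "target": styled} for plain, styled in pairs]


def t5_styler(model_dir: str, persona: str, max_new_tokens: int = 64,
              num_beams: int = 4) -> Callable[[str], str]:
    """Load a fine-tuned T5 checkpoint and return a sentence-level style function."""
    # Imported lazily so the formatting helpers run without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)

    def style(sentence: str) -> str:
        inputs = tokenizer(task_prefix(persona) + sentence, return_tensors="pt",
                           truncation=True)
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                    num_beams=num_beams)
        return tokenizer.decode(output_ids[0], skip_special_tokens=True)

    return style
```

With a hypothetical checkpoint directory, `t5_styler("path/to/taylor_t5", "Taylor")("Preheat the oven to 350 degrees.")` would return one styled sentence; the same prefix must be used at training and inference time.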
Supplementary/Adversarial Inputs.xlsx :
- Adversarial examples for the recipe generation model. Contains 120 examples for which the model's output differs from the expected behavior and is of low quality
Supplementary/Sentence_Styled_Recipes.xlsx :
- Human evaluations of the styled recipes generated by the fine-tuned T5 models

How to Run

Apart from the training of the LLMs (due to computational limitations), all of the code was implemented in Google Colab. The steps below must be followed for a successful run of the project.

1. Download all the .ipynb files and upload them to a new folder on Google Drive named 'Project 685'.
2. Download all the recipe datasets and add them to the top-level folder 'Project 685'.
3. Run RecipeDataset.ipynb to get the 'Final_dataset' file, which contains the preprocessed, concatenated dataset.
4. [OPTIONAL] Run Statistics.ipynb to display some statistics about the datasets.
5. Run Recipe_Generation_DistilGPT.ipynb to get the fine-tuned recipe generation model and the recipe generations.
6. Download the Text Style Transfer datasets and create a new sub-folder named {persona}_TST (e.g. Taylor_TST).
7. Upload the .zip datasets for Taylor and Trump to their respective sub-folders. For Shakespeare and Michael, add the unzipped .csv files to the top-level folder.
8. Run Preprocess_TST_dataset.ipynb and Shakespeare_and_Scripts_Preprocessing.ipynb to get the datasets formatted for back translation.
9. Run BackTranslation.ipynb to get a parallel dataset for Taylor, Trump and Michael.
10. Run TST_Architecture.ipynb to get the fine-tuned TST models and generate the final outputs.
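The folder layout from the steps above can also be created from a notebook cell instead of by hand. This is a minimal sketch: `make_project_layout` is a hypothetical helper, and in Colab the `root` would typically be the mounted Drive folder (`/content/drive/MyDrive` after `from google.colab import drive` and `drive.mount("/content/drive")`). Only Taylor and Trump get `{persona}_TST` sub-folders by default, since the Shakespeare and Michael .csv files go to the top-level folder.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def make_project_layout(root: Path, personas: Iterable[str] = ("Taylor", "Trump")) -> List[Path]:
    """Create 'Project 685' and one '{persona}_TST' sub-folder per persona."""
    project = Path(root) / "Project 685"
    created = [project] + [project / f"{persona}_TST" for persona in personas]
    for path in created:
        path.mkdir(parents=True, exist_ok=True)  # safe to re-run
    return created


if __name__ == "__main__":
    # Demo in a throwaway directory; in Colab, pass the mounted Drive path instead.
    with tempfile.TemporaryDirectory() as tmp:
        for path in make_project_layout(Path(tmp)):
            print(path.relative_to(tmp))
```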
