View Presentation Here!

Ultimate Recipe Recommender

Matteo Fortier

Abstract

The goal of this project was to explore whether unsupervised learning could group recipes by their ingredients, and whether the resulting topics could drive a recommendation system. This question matters to meal kit delivery companies, which may be able to increase customer retention by suggesting recipes that customers are likely to enjoy. The project used the Recipe1M+ dataset provided by MIT, with NLP and topic modelling forming the basis of the recommender. A prototype was deployed on Streamlit.

Design

Meal kit delivery companies such as Blue Apron, HelloFresh and Plated have all experienced poor customer retention, with analysts estimating a churn rate of 70% after 12 months (Source).

According to a survey of people who have stopped using meal kits, the biggest reason for churn is value for money. However, 32% of respondents gave reasons related to the food and recipes themselves, such as 'flavour of the finished recipe', 'ability to choose meals on diet', or 'difficulty level of recipe'. A recommender system built on recipes that users have previously liked could therefore have a significant impact on customer retention.

Data

The Recipe1M+ dataset provided by MIT was used for this project. It includes each recipe's title, instructions, ingredients and URL. Instructions and ingredients are stored as lists of free-text strings with no standardised format, because the recipes were scraped from many different recipe websites. As a result, substantial preprocessing and NLP were needed before modelling.

The dataset contains roughly 1 million recipes. Because of its size, cloud computing tools such as Google Cloud Platform were used to process the data, and parallel processing tools such as spaCy's nlp.pipe and Swifter (which can parallelise pandas apply calls using Dask) were used to speed up processing.

Algorithms

Natural Language Processing

Per-ingredient processing (one token per ingredient string, e.g. red bell peppers -> RedBellPeppers)
Per-word processing (several tokens per ingredient string, e.g. parmesan cheese -> Parmesan Cheese)
spaCy processing to keep only nouns and adjectives
spaCy processing for lemmatization
Alternatively, SnowballStemmer for stemming
General preprocessing steps such as removing text inside brackets, removing punctuation, etc.
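The cleaning and tokenisation steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions: the function names (clean_ingredient, per_ingredient_token, per_word_tokens) and the regular expression are hypothetical, and the spaCy noun/adjective filtering, lemmatization and stemming steps are left out, so it only illustrates bracket and punctuation removal plus the two tokenisation schemes.

```python
import re
import string

# Hypothetical helper table: deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_ingredient(text: str) -> str:
    """Lowercase, drop bracketed asides and punctuation, collapse whitespace."""
    text = text.lower()
    # Remove anything inside (), [] or {}, e.g. "(diced)" or "[to taste]".
    text = re.sub(r"\([^)]*\)|\[[^\]]*\]|\{[^}]*\}", " ", text)
    text = text.translate(_PUNCT_TABLE)
    return " ".join(text.split())


def per_ingredient_token(text: str) -> str:
    """One token per ingredient string: 'red bell peppers' -> 'RedBellPeppers'."""
    return "".join(word.capitalize() for word in clean_ingredient(text).split())


def per_word_tokens(text: str) -> list[str]:
    """Several tokens per ingredient string: 'parmesan cheese' -> ['Parmesan', 'Cheese']."""
    return [word.capitalize() for word in clean_ingredient(text).split()]


ingredients = ["Red bell peppers (diced)", "Parmesan cheese (grated)"]
print([per_ingredient_token(i) for i in ingredients])  # ['RedBellPeppers', 'ParmesanCheese']
print([per_word_tokens(i) for i in ingredients])  # [['Red', 'Bell', 'Peppers'], ['Parmesan', 'Cheese']]
```

Per-ingredient tokens keep multi-word ingredients such as red bell peppers distinct from plain peppers, while per-word tokens let related ingredients (parmesan cheese, cheddar cheese) share a feature.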

Unsupervised Learning Models

Several models were tried. Both a count vectorizer and a TF-IDF vectorizer were tested, and TF-IDF was selected because it produced better recommendations for the test set of recipes. This is probably because TF-IDF accounts for document frequency: ingredients that appear in almost every recipe, such as salt, are down-weighted relative to more distinctive ones.

NMF and SVD were both tried as topic models and compared on the test set, where SVD appeared to perform better. A possible explanation is that SVD allows negative coefficients, whereas NMF constrains every weight to be non-negative. This makes sense in the context of food, since some ingredients may count against a topic: an ingredient typical of desserts, for instance, could carry a negative weight in a savoury topic.

Model Evaluation and Selection

Models were evaluated against a test set of recipes covering a range of ingredient counts and cuisines. Evaluation was qualitative: each model was judged by inspecting the recommendations it returned for the test recipes and assessing them by intuition.
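The write-up does not spell out how recommendations are produced from the topics. One common approach, assumed here rather than taken from the project, is to project every recipe into the SVD topic space and return the recipes with the highest cosine similarity to the query recipe. The function names, titles and recipes below are hypothetical.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data: titles plus per-ingredient token strings.
titles = ["Tomato basil pasta", "Garlic parmesan pasta", "Sponge cake", "Chocolate brownies"]
recipes = [
    "Salt OliveOil Garlic Basil Tomato",
    "Salt OliveOil Garlic ParmesanCheese",
    "Salt Butter Sugar Flour Egg",
    "Salt Sugar Cocoa Butter",
]


def build_topic_matrix(docs, n_topics=2, seed=0):
    """TF-IDF followed by truncated SVD; returns one topic vector per recipe."""
    vectorizer = TfidfVectorizer(token_pattern=r"\S+", lowercase=False)
    svd = TruncatedSVD(n_components=n_topics, random_state=seed)
    return svd.fit_transform(vectorizer.fit_transform(docs))


def recommend(topic_matrix, query_idx, k=3):
    """Indices of the k recipes most cosine-similar to recipe query_idx."""
    sims = cosine_similarity(topic_matrix[query_idx:query_idx + 1], topic_matrix).ravel()
    sims[query_idx] = -np.inf  # exclude the query recipe itself
    return [int(i) for i in np.argsort(sims)[::-1][:k]]


topics = build_topic_matrix(recipes)
print([titles[i] for i in recommend(topics, query_idx=0, k=1)])  # ['Garlic parmesan pasta']
```

Running the same queries through recommenders built with different vectorizers or topic models is one way to compare them side by side on a test set, as described above.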

Tools

Python, pandas, NumPy, SciPy, scikit-learn
Google Cloud Platform
Swifter (Dask Pandas)
spaCy
SVD, NMF, TFIDF Vectorizer
Streamlit
WordCloud

Communication

The presentation was made in PowerPoint, with visuals produced using Python visualisation libraries.

WordCloud was used to generate a word cloud.

Streamlit was used to deploy a prototype application for the recommender.


Folders and files

Latest commit

History

Repository files navigation

View Presentation Here!

Ultimate Recipe Recommender

Abstract

Design

Data

Algorithms

Tools

Communication

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages