This project aims to recommend recipes to users based on their preferences using PySpark and GPU acceleration. By leveraging PySpark's distributed computing capabilities and GPU-accelerated libraries like cuDF and cuML, the system can efficiently process large-scale data and perform similarity calculations for recipe recommendations.
To run the project, follow these steps:
- Install necessary packages:
```python
!pip install pyspark
!pip install cudf-cu12 --extra-index-url=https://pypi.nvidia.com
!pip install cuml-cu12 --extra-index-url=https://pypi.nvidia.com --no-cache-dir
```
- Install additional libraries:
```python
!pip install termcolor nltk wordcloud numpy matplotlib seaborn pandas tqdm
```
- Download NLTK data:
```python
import nltk
nltk.download('wordnet')
```
- Setting up Spark Session: Initialize a Spark session to start working with Spark dataframes.
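A minimal sketch of this step (the app name is illustrative):

```python
from pyspark.sql import SparkSession

# Create (or reuse) a Spark session; the app name is just an example.
spark = SparkSession.builder.appName("RecipeRecommender").getOrCreate()
```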
- Loading Data: Load the recipe data into a Spark dataframe from a CSV file.
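For example, assuming the recipes live in a local `recipes.csv` with a header row (the file name is a placeholder):

```python
# Load the recipe data into a Spark DataFrame; schema inference is optional.
df = spark.read.csv("recipes.csv", header=True, inferSchema=True)
df.printSchema()
```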
- Text Cleaning: Perform text cleaning on the ingredients column to remove numbers and punctuation.
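One way to do this with Spark SQL functions, assuming the raw column is named `ingredients`:

```python
from pyspark.sql.functions import col, lower, regexp_replace

# Lowercase the text, then replace anything that is not a letter or
# whitespace (i.e. numbers and punctuation) with a space.
df = df.withColumn(
    "ingredients_clean",
    regexp_replace(lower(col("ingredients")), r"[^a-z\s]", " "),
)
```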
- Word Cloud and Word Frequency Analysis: Visualize the most common ingredients with a word cloud and word frequency charts.
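A sketch of the word cloud step, collecting a sample of the cleaned text to the driver (fine for a sample, not for the full dataset):

```python
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Concatenate a sample of cleaned ingredient strings on the driver.
rows = df.select("ingredients_clean").limit(1000).collect()
text = " ".join(r["ingredients_clean"] for r in rows if r["ingredients_clean"])

wc = WordCloud(width=800, height=400, background_color="white").generate(text)
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
```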
- Text Preprocessing Pipeline: Build a text preprocessing pipeline including tokenization, stop word removal, and lemmatization.
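A sketch of the pipeline, combining Spark ML stages with NLTK's lemmatizer wrapped in a UDF (column names carry over from the steps above):

```python
from pyspark.ml.feature import StopWordsRemover, Tokenizer
from pyspark.sql.functions import col, udf
from pyspark.sql.types import ArrayType, StringType

tokenizer = Tokenizer(inputCol="ingredients_clean", outputCol="tokens")
remover = StopWordsRemover(inputCol="tokens", outputCol="tokens_filtered")

@udf(returnType=ArrayType(StringType()))
def lemmatize(tokens):
    # Instantiated inside the UDF so each worker builds its own lemmatizer;
    # the wordnet corpus must be available on every worker.
    from nltk.stem import WordNetLemmatizer
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(t) for t in tokens] if tokens else []

df = tokenizer.transform(df)
df = remover.transform(df)
df = df.withColumn("tokens_lemmatized", lemmatize(col("tokens_filtered")))
```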
- GPU Setup: Check GPU availability and setup for accelerated processing.
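A quick availability check, assuming CuPy is installed alongside cuDF and cuML:

```python
import cupy as cp

# Report the number of visible CUDA devices and the name of device 0.
n_gpus = cp.cuda.runtime.getDeviceCount()
print(f"CUDA devices available: {n_gpus}")
if n_gpus > 0:
    props = cp.cuda.runtime.getDeviceProperties(0)
    print("Device 0:", props["name"].decode())
```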
- Cosine Similarity Calculation: Calculate cosine similarity between recipes using GPU acceleration.
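One way to compute pairwise cosine similarity on the GPU with CuPy, assuming the recipes have already been vectorized (e.g. TF-IDF) into a dense matrix; this is a sketch, not the project's exact implementation:

```python
import cupy as cp

def cosine_similarity_gpu(X):
    """Pairwise cosine similarity for an (n_recipes, n_features) matrix."""
    X = cp.asarray(X, dtype=cp.float32)
    # Normalize each row to unit length; the similarity matrix is then
    # a single matrix multiplication.
    norms = cp.linalg.norm(X, axis=1, keepdims=True)
    X_norm = X / cp.maximum(norms, 1e-12)
    return X_norm @ X_norm.T
```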
- Recommendation Generation: Generate recipe recommendations based on similarity scores.
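Given the similarity matrix, recommendations are the highest-scoring recipes excluding the query itself; a hypothetical helper:

```python
import cupy as cp

def recommend(sim_matrix, recipe_idx, top_n=5):
    """Return indices of the top_n recipes most similar to recipe_idx."""
    scores = sim_matrix[recipe_idx].copy()
    scores[recipe_idx] = -1.0  # exclude the query recipe itself
    top = cp.argsort(scores)[::-1][:top_n]
    return cp.asnumpy(top)
```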
- Ending Spark Session: Stop the Spark session to release resources.
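This final step is a single call:

```python
# Release the cluster resources held by the session.
spark.stop()
```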
Key features:
- Efficient handling of large-scale data using PySpark.
- GPU-accelerated processing for faster similarity calculations.
- Text preprocessing pipeline for cleaning and tokenizing recipe ingredients.
- Visualization of ingredient frequencies with a word cloud and word frequency charts.
- On-the-fly similarity calculations for real-time recommendation generation.
The project showcases the use of PySpark and GPU acceleration for recipe recommendations, walking through the full text processing pipeline and the similarity calculations behind the suggestions. Natural next steps include model persistence, hyperparameter tuning, user feedback integration, and improved scalability.
This project is licensed under the MIT License.