Course Project in “Data Mining and Exploration”
The code in our project is organized into three sections, which correspond to Parts 2, 3 and 4 of our report, respectively:
- Section1 EDA: 'section1_EDA.ipynb'
- Section2 cuisine prediction: 'section2_Cuisine_prediction.ipynb'
- Section3 collaborative filtering: 'section3_CF.py' and 'cf_module.py' (cf_module.py contains the supporting functions for Section3)
Instructions:
- Put the data in the '042' folder.
- Set up a Python 3 virtual environment and install the following packages:
  - SciPy: 1.5.4
  - NumPy: 1.19.5
  - pandas: 1.1.5
  - Matplotlib: 3.3.4
  - seaborn: 0.10.0
  - scikit-learn (sklearn): 0.24.1
  - xgboost: 1.19.2
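To confirm that the virtual environment matches the versions listed above, you can run a small check like the one below inside it. This is only a sketch: check_versions and EXPECTED_VERSIONS are hypothetical names. The check compares each package's __version__ string exactly, so it also reports patch-level differences.

```python
import importlib

# Versions listed in this README, keyed by import name.
# Note: scikit-learn is imported as "sklearn".
EXPECTED_VERSIONS = {
    "scipy": "1.5.4",
    "numpy": "1.19.5",
    "pandas": "1.1.5",
    "matplotlib": "3.3.4",
    "seaborn": "0.10.0",
    "sklearn": "0.24.1",
    "xgboost": "1.19.2",
}


def check_versions(expected):
    """Return {module: (expected, found)} for each module whose version differs.

    `found` is None if the module cannot be imported or has no __version__.
    """
    mismatches = {}
    for name, want in expected.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            mismatches[name] = (want, None)
            continue
        found = getattr(module, "__version__", None)
        if found != want:
            mismatches[name] = (want, found)
    return mismatches


if __name__ == "__main__":
    problems = check_versions(EXPECTED_VERSIONS)
    if not problems:
        print("All listed packages match.")
    for name, (want, found) in sorted(problems.items()):
        print(f"{name}: expected {want}, found {found or 'not installed'}")
```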
For Section1 EDA and Section2 cuisine prediction:
- Start a Jupyter Notebook server and run the notebooks 'section1_EDA.ipynb' and 'section2_Cuisine_prediction.ipynb'.
- View the results in the notebooks and in the files they generate in '042'.
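You can also execute the notebooks without an interactive server. The sketch below uses nbformat and the ExecutePreprocessor from nbconvert. Both are extra packages that do not appear in the list above. The helper names execute_notebook and executed_path are hypothetical. Saving the output to a '.executed.ipynb' copy is a choice made here so that the original notebooks stay untouched.

```python
from pathlib import Path

NOTEBOOKS = ["section1_EDA.ipynb", "section2_Cuisine_prediction.ipynb"]


def executed_path(notebook):
    """Path of the executed copy, so the original notebook is not overwritten."""
    p = Path(notebook)
    return p.with_name(p.stem + ".executed" + p.suffix)


def execute_notebook(notebook, timeout=600, kernel_name="python3"):
    """Run every cell of `notebook` and save the executed copy next to it."""
    # Imported lazily: nbformat and nbconvert are extra dependencies.
    import nbformat
    from nbconvert.preprocessors import ExecutePreprocessor

    with open(notebook, encoding="utf-8") as f:
        nb = nbformat.read(f, as_version=4)
    ep = ExecutePreprocessor(timeout=timeout, kernel_name=kernel_name)
    # Use the notebook's folder as the working directory so that
    # relative paths such as '042' resolve as they do in Jupyter.
    ep.preprocess(nb, {"metadata": {"path": str(Path(notebook).parent)}})
    out = executed_path(notebook)
    with open(out, "w", encoding="utf-8") as f:
        nbformat.write(nb, f)
    return out


if __name__ == "__main__":
    for name in NOTEBOOKS:
        print("wrote", execute_notebook(name))
```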
For Section3 collaborative filtering:
- Make sure that section3_CF.py, cf_module.py and the data file recipes.csv are in the same folder.
- Run section3_CF.py, which imports its supporting functions from cf_module.py.
- After the script finishes, inspect the corresponding variables in the Python environment to see the results.
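The results are left in variables, so one option is to run the script programmatically and collect its top-level namespace. This is a minimal sketch built on runpy.run_path. It assumes that section3_CF.py opens recipes.csv through a path relative to the current directory and imports cf_module from its own folder. The helper run_section3 is hypothetical. The names of the result variables depend on section3_CF.py and are not listed here.

```python
import os
import runpy
import sys
from pathlib import Path
from types import ModuleType

REQUIRED_FILES = ["section3_CF.py", "cf_module.py", "recipes.csv"]


def run_section3(folder="."):
    """Run section3_CF.py inside `folder` and return its top-level variables."""
    folder = Path(folder).resolve()
    missing = [n for n in REQUIRED_FILES if not (folder / n).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {folder}: {missing}")
    old_cwd = os.getcwd()
    sys.path.insert(0, str(folder))  # assumption: lets `import cf_module` resolve
    os.chdir(folder)  # assumption: lets a relative 'recipes.csv' path resolve
    try:
        namespace = runpy.run_path(str(folder / "section3_CF.py"), run_name="__main__")
    finally:
        os.chdir(old_cwd)
        sys.path.remove(str(folder))
    # Keep the script's own names; drop dunders and imported modules.
    return {k: v for k, v in namespace.items()
            if not k.startswith("__") and not isinstance(v, ModuleType)}


if __name__ == "__main__":
    results = run_section3()
    for name, value in results.items():
        print(name, type(value).__name__)
```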
'DME_Report.pdf' is the final report. Its complete LaTeX source is in the report folder.