eda_*
- Files that perform exploratory data analysis
model_*
- Files implementing a specific model
data/download.py
- Interactive tool for downloading data from the Amazon reviews dataset source
data/merge_datasets.py
- Combines selected .json.gz files into combined.csv
data/combined.csv
- Combination of the Fashion, Beauty, Appliances, Arts and Crafts, Musical Instruments, and Software product categories
models/
- Pickle files for models and vectorizers
results/
- Results of EDA, testing, modelling, etc.
requirements.txt
- Lists all required libraries, except cuML and cuDF, which are needed for Google Colab
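
For orientation, the merge step (several .json.gz files combined into one CSV) might look roughly like the sketch below. This is not the code in data/merge_datasets.py. The function name `merge_reviews`, the default column names, and the idea of taking the category from the file name are all assumptions made for illustration, and the input files are assumed to hold one JSON object per line.

```python
import csv
import gzip
import json
from pathlib import Path


def merge_reviews(gz_paths, out_csv,
                  fields=("category", "overall", "reviewText", "summary")):
    """Merge JSON-lines .json.gz review files into a single CSV.

    Hypothetical helper: the category column is derived from each file
    name, and fields missing from a record are written as empty strings.
    Returns the number of data rows written.
    """
    rows_written = 0
    with open(out_csv, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=list(fields))
        writer.writeheader()
        for gz_path in gz_paths:
            # "Software.json.gz" -> "Software" (assumed naming scheme)
            category = Path(gz_path).name.split(".")[0]
            with gzip.open(gz_path, "rt", encoding="utf-8") as fh:
                for line in fh:
                    line = line.strip()
                    if not line:
                        continue  # skip blank lines
                    record = json.loads(line)
                    record["category"] = category
                    # keep only the requested columns
                    writer.writerow({f: record.get(f, "") for f in fields})
                    rows_written += 1
    return rows_written
```

A call such as `merge_reviews(["data/Software.json.gz"], "data/combined.csv")` would then write one row per review, with paths adjusted to wherever the downloaded files live.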
model_logreg.pkl
- Training data was based on the baseline reviewTextSummary, transformed by a TfidfVectorizer(). Running the notebook should properly create the test data.
model_svm_clf.pkl and modle_svm_vectorizer.pkl
- Running the model_svm_final.ipynb notebook on Google Colab, with the cells executed in order, runs up to the pickle file step as long as the file is found at the specified path
  - need to change work_path to the appropriate value
- need to change the
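
As a rough illustration of the path requirement above, the sketch below checks that the pickle files exist under a given directory before loading them. The helper name `load_artifacts` is hypothetical and not part of the notebooks. The file names are copied from this README, and `work_path` stands in for whatever value the notebook is configured with.

```python
import pickle
from pathlib import Path


def load_artifacts(work_path,
                   names=("model_svm_clf.pkl", "modle_svm_vectorizer.pkl")):
    """Load pickled objects from work_path, failing early if any are missing.

    Hypothetical helper: returns a dict mapping each file name to the
    unpickled object.
    """
    base = Path(work_path)
    missing = [n for n in names if not (base / n).is_file()]
    if missing:
        raise FileNotFoundError(
            f"Not found under {base}: {', '.join(missing)}"
        )
    loaded = {}
    for name in names:
        with open(base / name, "rb") as fh:
            loaded[name] = pickle.load(fh)
    return loaded
```

Unpickling a model also requires the library that created it to be installed in the current environment, and pickle files should only be loaded from a trusted source.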
Data Analytics Processes Project. Jan 2024 - May 2024