CS 2470 Final Project: Predicting NYT Bestsellers Using Natural Language Processing

Step 1: Web Scraping

Our group began by scraping book metadata from the web, drawing on GoodReads, the NYT API, and Amazon Books. For the technical implementation, see the following notebooks: Amazon_Web_Scrapper_Code.ipynb and Scraper.ipynb.
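
As a rough illustration of the metadata-fetching step, here is a minimal sketch of querying one bestseller list. It is not the code from the notebooks. The endpoint pattern, the `api-key` query parameter, and the response keys (`results`, `books`, `title`, `author`, `primary_isbn13`) are assumptions about the NYT Books API and should be checked against its official documentation. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint pattern for the NYT Books API; verify against the official docs.
NYT_LIST_URL = "https://api.nytimes.com/svc/books/v3/lists/current/{list_name}.json"


def build_list_url(list_name, api_key):
    """Return the request URL for one current bestseller list."""
    query = urllib.parse.urlencode({"api-key": api_key})
    return NYT_LIST_URL.format(list_name=urllib.parse.quote(list_name)) + "?" + query


def extract_books(payload):
    """Pull a few metadata fields out of a decoded JSON response.

    The keys used here are assumptions about the response layout, so missing
    keys fall back to None (or an empty list) instead of raising.
    """
    books = payload.get("results", {}).get("books", [])
    return [
        {
            "title": book.get("title"),
            "author": book.get("author"),
            "isbn13": book.get("primary_isbn13"),
        }
        for book in books
    ]


def fetch_bestsellers(list_name, api_key):
    """Download one list and return its books (performs a network request)."""
    url = build_list_url(list_name, api_key)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return extract_books(json.load(resp))
```

Keeping the URL construction and the parsing separate from the network call makes both easy to check offline with a hand-written payload.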

The next step was to gain a deeper understanding of our dataset and to remove any features that we did not expect to help predict whether a book will be a bestseller. For the technical implementation, see the following notebook: EDA_Part1.ipynb.
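
One common way to prune unhelpful features is to drop columns that are mostly missing or that hold a single value. The sketch below assumes pandas and uses a toy frame with hypothetical column names. It does not reflect the actual schema of books_datasetv2.csv or the criteria used in EDA_Part1.ipynb, and the 50% missing-value threshold is an arbitrary illustration.

```python
import pandas as pd


def drop_uninformative_columns(df, max_missing=0.5):
    """Drop columns that are mostly missing or contain a single distinct value."""
    missing_share = df.isna().mean()  # fraction of missing values per column
    mostly_missing = missing_share[missing_share > max_missing].index
    constant = [col for col in df.columns if df[col].nunique(dropna=True) <= 1]
    to_drop = sorted(set(mostly_missing) | set(constant))
    return df.drop(columns=to_drop), to_drop


# Toy frame with hypothetical column names, not the real dataset schema.
toy = pd.DataFrame(
    {
        "title": ["A", "B", "C", "D"],
        "language": ["en", "en", "en", "en"],   # constant column
        "subtitle": [None, None, None, "x"],    # 75% missing
        "is_bestseller": [1, 0, 0, 1],
    }
)
cleaned, dropped = drop_uninformative_columns(toy)
print(dropped)
print(list(cleaned.columns))
```

Returning the list of dropped columns alongside the cleaned frame makes it easy to review each removal by hand before committing to it.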

The last step was to implement four models, each of which yielded a higher test accuracy than the one before it: GRU, CNN + GRU, Single-Headed Attention, and Multi-Headed Attention. For the technical implementation, see the following notebooks: Model_GRU_GRUCNN.ipynb and Model_Attention.ipynb.
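
To make the difference between single-headed and multi-headed attention concrete, here is a minimal NumPy sketch of scaled dot-product self-attention. It is an illustration only, not the architecture in Model_Attention.ipynb. The function names, the random weights, and the toy sizes (5 tokens, 8-dimensional embeddings) are all assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention; q, k, v have shape (..., seq_len, d)."""
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """x: (seq_len, d_model); each weight matrix: (d_model, d_model)."""
    seq_len, d_model = x.shape
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads

    def split(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    heads, _ = attention(split(x @ w_q), split(x @ w_k), split(x @ w_v))
    # (num_heads, seq_len, d_head) -> (seq_len, d_model)
    merged = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ w_o


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))  # toy input: 5 tokens, 8-dim embeddings
w_q, w_k, w_v, w_o = (rng.normal(size=(8, 8)) for _ in range(4))

single = multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads=1)
multi = multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads=4)
print(single.shape, multi.shape)
```

With `num_heads=1` the function reduces to single-headed attention. With more heads, each head attends over a smaller slice of the projected embedding, and the results are concatenated before the output projection.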
