Web scraper built with Python and Scrapy

Extracts mobile phone specifications from gsmarena.com

Scraped 10144 devices and their specifications. The crawl covered 10144 model specification pages, 116 brand pages (each split into multiple sub-pages listing that brand's models), and 1 page listing all the brands, which was the entry point for the spider.
Total run time was over 48 hours; the crawl was kept slow to avoid overloading the server and getting banned.

Risk of overloading the target server: to reduce it, the scrapy-rotating-proxies library was used with a list of free open proxies obtained online.
Limited understanding of how the CONCURRENT_REQUESTS settings interact with the proxies: with better tuning, the total run time could probably have been reduced.
Improper handling of the target pages' HTML: this resulted in some mismatches between columns and data.
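
As a rough illustration of how the proxy rotation and the concurrency settings fit together, here is a minimal settings.py sketch. It is not the configuration used for this crawl: the proxy addresses are placeholders and the numeric values are assumptions chosen only to show the shape. The setting names and middleware paths are the ones documented by Scrapy and scrapy-rotating-proxies.

```python
# settings.py (sketch): polite crawling through rotating proxies.
# All values are illustrative assumptions, not this project's actual settings.

# Placeholder proxies; a real list would come from a proxy provider or file.
ROTATING_PROXY_LIST = [
    "proxy1.example.com:8000",
    "proxy2.example.com:8031",
]

# Middlewares provided by scrapy-rotating-proxies, at their documented priorities.
DOWNLOADER_MIDDLEWARES = {
    "rotating_proxies.middlewares.RotatingProxyMiddleware": 610,
    "rotating_proxies.middlewares.BanDetectionMiddleware": 620,
}

# Global cap on requests in flight across the whole crawler.
CONCURRENT_REQUESTS = 8
# Cap per download slot; kept low to stay gentle on the target server.
CONCURRENT_REQUESTS_PER_DOMAIN = 2
# Seconds to wait between requests in the same download slot.
DOWNLOAD_DELAY = 1.0
# Let Scrapy adapt the delay to observed server latency.
AUTOTHROTTLE_ENABLED = True
```

According to the scrapy-rotating-proxies documentation, the per-domain limit and the delay are applied per proxy for proxied requests, so overall throughput depends on the number of working proxies as well as on CONCURRENT_REQUESTS. That interaction is likely where the run time could have been tuned.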

files/specs_extracted.csv - specifications extracted into separate columns
files/gsmarena_data.csv - raw scraped data, with the specifications stored as nested dictionaries in a single column
files/gsmarena_brands.csv - raw data listing all the brands and the number of models per brand according to the site
analysis.ipynb - data cleaning and extraction of the specifications into columns
files/visited_models.txt - URLs of all the visited model specification pages
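
The step from nested dictionaries to separate columns can be sketched as below. This is a simplified stand-in for what analysis.ipynb does, not a copy of it: the function names, the `Category_Field` column naming, and the sample devices are all assumptions made up for the example. Writing each row by key, rather than by position, keeps values under the right column even when a device is missing a section.

```python
import csv
import io


def flatten_specs(nested):
    """Flatten {"Display": {"Size": "..."}} into {"Display_Size": "..."}."""
    flat = {}
    for category, fields in nested.items():
        for field, value in fields.items():
            flat[f"{category}_{field}"] = value
    return flat


def write_rows(rows, out):
    """Write dict rows as CSV, using the union of all keys as columns."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)  # keep first-seen order
    # restval fills missing fields with "" instead of shifting later values.
    writer = csv.DictWriter(out, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return columns


# Made-up sample data; the second device has no "Display" section.
devices = [
    {"Display": {"Size": "6.1 inches"}, "Battery": {"Type": "Li-Ion"}},
    {"Battery": {"Type": "Li-Po"}},
]
rows = [flatten_specs(device) for device in devices]
buffer = io.StringIO()
columns = write_rows(rows, buffer)
print(buffer.getvalue())
```

The second device ends up with an empty Display_Size cell and its battery type still under Battery_Type, which is the alignment property the column/data mismatch noted above would need.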
