pscraper

Overview

The goal of this project is to understand how the supply side of vehicle availability affects the electric vehicle market. It tracks all vehicles available for sale on online marketplaces (Cars.com, AutoTrader) across the country, using a web scraper that collects publicly available data.

The main variables that are tracked are:

Variable	Description
`first_date`	Date this vehicle was first available for sale
`last_date`	Date this vehicle was last available for sale
`duration`	Number of days this vehicle was available for sale
`price`	Price of the vehicle; all price changes are tracked
`seller_id`	Seller of the vehicle; all seller changes are tracked
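The `duration` column can be derived from the two date columns. The sketch below assumes that duration is the number of whole days between the first and last sighting, so a vehicle seen on a single day has a duration of 0. The project may instead count both endpoints inclusively. `listing_duration` is a hypothetical helper, not part of pscraper:

```python
from datetime import date


def listing_duration(first_date: date, last_date: date) -> int:
    """Days between the first and last sighting of a listing.

    Assumption: duration = last_date - first_date in whole days,
    so a vehicle seen on only one day has a duration of 0.
    """
    if last_date < first_date:
        raise ValueError("last_date must not be earlier than first_date")
    return (last_date - first_date).days


print(listing_duration(date(2020, 3, 1), date(2020, 3, 15)))  # 14
```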

Components

There are four main project components:

Django Backend Application

This is a backend app written in Python 3.7 using Django 3 and Django REST Framework. It provides access to a MySQL database containing the data through three main API endpoints. The backend app is hosted on Heroku, and the MySQL database is hosted on Google Cloud SQL.

It has three main models: vehicle, seller, and history.

The vehicle model has the most up-to-date information on the vehicle, such as VIN, make, model, and price.
The seller model has information on the sellers, such as name and address.
The history model is used to track changes in price and seller.
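The split between a current-state vehicle record and a change log can be illustrated with plain dataclasses. This is a minimal sketch, not the project's Django models: the field names, the `apply_scrape` helper, and the choice to store the *previous* price or seller in each history entry are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class HistoryEntry:
    # One recorded change (hypothetical schema). A field is None if it did not change.
    vin: str
    day: date
    price: Optional[int] = None
    seller_id: Optional[int] = None


@dataclass
class Vehicle:
    # Current state only; past values live in `history`.
    vin: str
    price: int
    seller_id: int
    history: List[HistoryEntry] = field(default_factory=list)


def apply_scrape(vehicle: Vehicle, price: int, seller_id: int, day: date) -> Optional[HistoryEntry]:
    """Update the vehicle with freshly scraped values and log any change."""
    price_changed = price != vehicle.price
    seller_changed = seller_id != vehicle.seller_id
    if not (price_changed or seller_changed):
        return None  # nothing changed, so no history row is written
    # Assumption: store the values being replaced, so the vehicle row
    # plus its history entries reconstruct the full timeline.
    entry = HistoryEntry(
        vin=vehicle.vin,
        day=day,
        price=vehicle.price if price_changed else None,
        seller_id=vehicle.seller_id if seller_changed else None,
    )
    vehicle.history.append(entry)
    vehicle.price = price
    vehicle.seller_id = seller_id
    return entry
```

Keeping only the latest values on the vehicle record makes the common query ("what is listed now?") cheap, while the history table answers questions about price drops and dealer-to-dealer transfers.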

For more information on this project, see pscraper-db.

Python Library

This is a python library with three packages:

`pscraper.scraper`

The main package of the library. Among other things, it provides the main API, `scrape`, which runs the scraping process with the given parameters and saves the results to the database. Vehicles are currently processed sequentially; work is in progress to switch to a multithreaded approach to improve performance.
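The planned move from sequential to multithreaded processing could look roughly like the sketch below. It assumes that per-vehicle work is dominated by network I/O (requests to the marketplace and to the backend), which is the case where Python threads help despite the GIL. `process_all` and the `process_one` callback are hypothetical names, not the library's actual API:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_all(items: Iterable[T], process_one: Callable[[T], R], max_workers: int = 8) -> List[R]:
    """Apply process_one to every item using a pool of worker threads.

    executor.map yields results in input order, and an exception raised
    while processing any item is re-raised when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(process_one, items))
```

For example, `process_all(listing_urls, scrape_listing)` would replace a plain `for` loop, where `listing_urls` and `scrape_listing` stand in for whatever the scraper iterates over. `max_workers` should stay modest so the marketplaces are not flooded with concurrent requests.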

The marketplaces scraped are Cars.com and AutoTrader.

`pscraper.api`

This package provides functions that interact with the endpoints exposed by the backend application. They are used to create, update, or retrieve vehicle and seller records.
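A thin client for create/update/retrieve calls mostly comes down to choosing the HTTP method and URL. The sketch below builds, but does not send, such requests with the standard library. The resource paths (`vehicles/`, `sellers/`), the trailing-slash URL layout (a Django REST Framework router default), and the use of PATCH for updates are assumptions; `build_request` is hypothetical and not part of `pscraper.api`:

```python
import json
from typing import Optional
from urllib.request import Request


def build_request(base_url: str, resource: str, payload: Optional[dict] = None,
                  record_id: Optional[str] = None) -> Request:
    """Build a request against a hypothetical REST endpoint.

    no payload               -> GET   (retrieve)
    payload, no record_id    -> POST  (create)
    payload and record_id    -> PATCH (update)
    """
    url = f"{base_url.rstrip('/')}/{resource}/"
    if record_id is not None:
        url += f"{record_id}/"
    if payload is None:
        return Request(url, method="GET")
    method = "PATCH" if record_id is not None else "POST"
    body = json.dumps(payload).encode("utf-8")
    return Request(url, data=body, method=method,
                   headers={"Content-Type": "application/json"})
```

Passing the result to `urllib.request.urlopen` sends it. Authentication headers, which the real backend likely requires, are omitted here.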

`pscraper.utils`

Provides several miscellaneous helpers used in the scraping and reporting process.

Because the scraping process is designed to run unattended, the functions are configured to send a Slack message to a specific channel whenever a failure occurs.
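One way to wire up the failure alerts is a decorator that reports any exception and then re-raises it. The sketch below posts to a Slack incoming webhook, which accepts a JSON body with a `text` field. The decorator, the helper names, and the webhook URL are assumptions for illustration, not the library's actual helpers:

```python
import functools
import json
import traceback
from typing import Callable
from urllib.request import Request, urlopen


def post_to_slack(webhook_url: str, text: str) -> None:
    """Send a plain-text message through a Slack incoming webhook."""
    body = json.dumps({"text": text}).encode("utf-8")
    req = Request(webhook_url, data=body, method="POST",
                  headers={"Content-Type": "application/json"})
    with urlopen(req, timeout=10):
        pass


def notify_on_failure(send: Callable[[str], None]):
    """Decorator: report any exception through `send`, then re-raise it."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                send(f"{func.__name__} failed:\n{traceback.format_exc()}")
                raise  # do not hide the failure from the caller
        return wrapper
    return decorator
```

Usage would look like `@notify_on_failure(lambda text: post_to_slack(WEBHOOK_URL, text))` above the scraping entry point, with `WEBHOOK_URL` read from configuration. Taking `send` as a parameter keeps the decorator testable without network access.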

For more information on this project see pscraper-lib.

Daily Scraping Tool

This project contains the script that performs the daily web scraping. At the end of each scraping run, a report is built and sent to a Slack channel. The command to start the scheduled scraping is:

$ nohup ./scrape.py &

The script runs inside a tmux session on a Google Compute Engine instance, so the process can run without supervision.
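How `scrape.py` schedules its daily run is not described here. A minimal sketch, assuming a simple sleep loop that wakes once a day at a fixed local hour, is shown below. `seconds_until`, `run_daily`, and the default hour are hypothetical. Naive local times are used, so daylight-saving transitions can shift a run by an hour:

```python
import time
from datetime import datetime, timedelta
from typing import Callable


def seconds_until(hour: int, now: datetime) -> float:
    """Seconds from `now` until the next occurrence of hour:00."""
    target = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return (target - now).total_seconds()


def run_daily(job: Callable[[], None], hour: int = 2) -> None:
    """Run `job` once a day at the given local hour, forever."""
    while True:
        time.sleep(seconds_until(hour, datetime.now()))
        job()
```

A cron entry would achieve the same without a long-running process. The loop approach matches the `nohup ... &` style of launching shown above.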

For more information on this project, see pscraper-tool.

Data Dashboard

WIP

A frontend dashboard, built with ReactJS. It may serve other purposes later, but for now it is planned mainly as a data visualization tool. Users are authenticated through the existing Django backend, and the dashboard will show different views and layouts based on each user's permissions.

For more information on this project see pscraper-dashboard.

eneakllomollari/pscraper

Folders and files

Latest commit

History

Repository files navigation

pscraper

Overview

Components

Django Backend Application

Python Library

pscraper.scraper

pscraper.api

pscraper.utils

Daily Scraping Tool

Data Dashboard

Screenshots:

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

`pscraper.scraper`

`pscraper.api`

`pscraper.utils`

Packages