The stock scraper behind StockStalker

StockStalker - Stock Scraper

Important Files & Folders

The structure of Stock Scraper is intended to be fairly straightforward. To run the scripts, clone this repository:

git clone https://github.com/Stock-Stalker/stock-scraper

You'll have access to two main subfolders:

data and scripts

As the names suggest, data should be written to the data folder, and scripts should be run from the scripts folder.

Data

While you're welcome to write new data to the data folder, there are a few existing files already which you can use to run the scripts.

djia_news.csv contains scraped and labeled headlines for the top thirty companies on the Dow Jones Industrial Average index. This is a portion of the full dataset used to train the initial StockStalker algorithm, available in full on Kaggle. This file contains nearly 2,500 data points.

dow.csv contains ticker symbols and full company names of the top 30 Dow Jones Industrial Average index companies.

nasdaqlisted.csv contains ticker symbols and full company names of the nearly 3,000 NASDAQ index companies.

nasdaq_news.csv contains scraped and labeled headlines for the nearly 3,000 NASDAQ index companies. This is the other (larger) portion of the full dataset available on Kaggle (also linked above). This file contains about 13,400 data points.
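If you'd like to look at one of these files before running the scripts, a small standard-library helper is enough. The sketch below is only an illustration: preview_csv is a hypothetical name that is not part of this repository, and it deliberately makes no assumption about the column names in any of the files.

```python
import csv
from itertools import islice
from pathlib import Path


def preview_csv(path, limit=5):
    """Return the header row and the first `limit` data rows of a CSV file.

    Hypothetical helper for illustration; it is not part of stock-scraper
    and does not assume any particular column names.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])          # empty list if the file is empty
        rows = list(islice(reader, limit))  # at most `limit` rows after the header
    return header, rows
```

From the repository root, calling preview_csv("data/dow.csv") should return the header and the first few rows, which is a quick way to confirm which columns a file actually has.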

Scripts

Right now, get_data is the only script we've set up to be called directly. The purpose of this service is largely for us to obtain training data, so little effort went into making the other scraping functions flexible or callable on their own, although adapting them to your needs should take only minimal adjustments.

To call get_data, first open the scripts folder and look at the function call at the bottom of get_data.py. Ensure that the file path passed into the function call is the file you'd like to write to. Also check the get_data() function and ensure that it reads from the .csv file you want to read from.

Once all settings are to your liking, you can run the following from the terminal:

If you aren't already in the stock-scraper directory:

cd stock-scraper

Then,

python3 -m scripts.get_data

As the script runs, each time it enters its except block it prints a count of errors encountered, in the format:

DataError: 19

The news_fetchers utilities that get_data() depends on also print a message when they enter an except block. This gives you a rudimentary way to anticipate the eventual output while the script runs, and execution does not stop when the functions encounter empty strings or bad inputs.
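The count-and-continue behavior described above can be sketched roughly as follows. This is not the actual get_data implementation: collect and fetch are hypothetical names, and the only detail taken from this README is the "DataError: N" output format.

```python
def collect(items, fetch):
    """Apply `fetch` to every item, counting failures instead of stopping.

    Hypothetical sketch of the pattern described in this README;
    the real get_data() may be structured differently.
    """
    results = []
    error_count = 0
    for item in items:
        try:
            results.append(fetch(item))
        except Exception:
            # Report the running total and move on to the next item.
            error_count += 1
            print(f"DataError: {error_count}")
    return results, error_count
```

Because every failure is caught and counted, a handful of bad inputs costs only the rows they would have produced, and the final count tells you roughly how many rows to expect to be missing.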

Once again, this could easily be adapted into a slightly easier-to-configure CLI tool if you'd like.
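As one possible starting point for such a tool, the input and output paths could be taken from the command line with argparse instead of being edited in get_data.py. Everything in this sketch is an assumption: the --input and --output flags are invented for illustration, and the commented-out get_data call guesses at a signature that this README does not document.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical stock-scraper command line."""
    parser = argparse.ArgumentParser(
        description="Scrape headlines for the companies listed in a CSV file."
    )
    parser.add_argument(
        "--input",
        default="data/dow.csv",
        help="CSV file to read companies from (default: data/dow.csv)",
    )
    parser.add_argument(
        "--output",
        required=True,
        help="CSV file to write scraped headlines to",
    )
    return parser


def main(argv=None):
    """Parse arguments and report what would be run."""
    args = build_parser().parse_args(argv)
    # Assumption: get_data accepts the two paths; check its real signature first.
    # get_data(args.input, args.output)
    print(f"Would read {args.input} and write {args.output}")
    return args
```

To use it as a script, add the usual if __name__ == "__main__": main() guard and replace the print with the real call once you've confirmed how get_data takes its paths.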