This program scrapes Mars data from the websites below to build a Mars fact sheet, with images, served as a local web page.
- NASA Mars News
- Jet Propulsion Laboratory (JPL)
- A Twitter account with Mars weather
- Mars Facts
- USGS Astrogeology site for images of Mars' hemispheres
Note: Twitter disabled access to tweet text, so I used Tweepy and the Twitter API to get the weather, which is technically not scraping (see the sketch below).
(Written in Python)
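As a minimal sketch of the Tweepy approach, assuming the credential names live in config.py; the account handle, function name, and "sol" filter are illustrative assumptions, not the exact code in scrape_mars.py:

```python
# Minimal sketch: fetch the latest Mars weather tweet with Tweepy.
# Credential names are assumed to come from config.py; the account handle
# and the "sol" filter are illustrative assumptions.
import tweepy
from config import consumer_key, consumer_secret, access_token, access_token_secret

def get_mars_weather():
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth)

    # Look at the most recent tweets from the Mars weather account and
    # return the first one that looks like a weather report.
    for tweet in api.user_timeline(screen_name="MarsWxReport",
                                   count=10, tweet_mode="extended"):
        text = tweet.full_text
        if "sol" in text.lower():
            return text
    return None
```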
- app.py is the Flask application server (see the sketch after this list)
- scrape_mars.py is the scraping code, converted from the Jupyter notebook
- mission_to_mars.ipynb is the Jupyter notebook
- template\index.html is the home page for the server
- chromedriver.exe -- required by Splinter to drive Chrome when scraping
- MongoDB -- the database is called mission_to_mars and the collection is called mars_info. Every time new data is scraped, the old data is dropped and replaced with the new results.
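A minimal sketch of how app.py can tie Flask, PyMongo, and scrape_mars.py together. The route names and the scrape_mars.scrape() entry point are assumptions; the database and collection names follow the description above, and Flask's default templates folder is assumed:

```python
# Minimal sketch of app.py: serve the stored Mars data and re-scrape on demand.
# Route names and the scrape_mars.scrape() entry point are assumptions.
from flask import Flask, render_template, redirect
import pymongo
import scrape_mars

app = Flask(__name__)
client = pymongo.MongoClient("mongodb://localhost:27017")
db = client.mission_to_mars

@app.route("/")
def home():
    # Render the latest stored document (None before the first scrape).
    mars_info = db.mars_info.find_one()
    return render_template("index.html", mars=mars_info)

@app.route("/scrape")
def scrape():
    # Drop the old data and insert a freshly scraped document.
    data = scrape_mars.scrape()
    db.mars_info.drop()
    db.mars_info.insert_one(data)
    return redirect("/", code=302)

if __name__ == "__main__":
    app.run(debug=True)
```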
Prerequisites:
- All of the files listed above, downloaded to a new subdirectory
- Python
- ChromeDriver (drives the Chrome browser to the target websites for scraping)
- Tweepy (to get data from the Twitter API)
- Pandas
- Beautiful Soup (tool for parsing scraped data)
- requests
- Splinter (browser automation, used with ChromeDriver; see the sketch after this list)
- re
- A config.py file with your Twitter authorization credentials
- PyMongo (to interact with MongoDB)
- Jupyter Notebook (if you want to run pieces of code in mission_to_mars.ipynb to see how things work)
- Chrome browser
- A Twitter account with access credentials for the tweets you want
- Flask (to create a web server to host your web page)
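A minimal sketch of how Splinter, ChromeDriver, and Beautiful Soup work together in the scraping code. The URL, CSS selectors, and function name are illustrative assumptions and may not match the current NASA page layout, and an older Splinter/Selenium combination is assumed for the executable_path argument (newer versions locate the driver differently):

```python
# Minimal sketch: drive Chrome with Splinter, then parse the page with Beautiful Soup.
# The URL and selectors are illustrative assumptions.
from splinter import Browser
from bs4 import BeautifulSoup

def scrape_latest_news():
    # chromedriver.exe is assumed to be in the working directory (or on PATH).
    browser = Browser("chrome", executable_path="chromedriver.exe", headless=True)
    try:
        browser.visit("https://mars.nasa.gov/news/")
        soup = BeautifulSoup(browser.html, "html.parser")

        # Pull the first headline and its teaser paragraph.
        title = soup.find("div", class_="content_title").get_text(strip=True)
        teaser = soup.find("div", class_="article_teaser_body").get_text(strip=True)
        return {"news_title": title, "news_teaser": teaser}
    finally:
        browser.quit()
```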
Run the application:
- Open a terminal application, such as Git Bash
- Go to the subdirectory containing your Mars scraping files
- At the command line, run the Flask application: `python app.py`
- In the Chrome browser, go to http://127.0.0.1:5000/ and the Mars Facts page appears.
- Click "Scrape New Data" and the program will fetch the latest information from the Mars websites.