A project analyzing public comments hosted on regulations.gov about the regression of Title IX by United States Secretary of Education Betsy DeVos.
This project is in conjunction with Honors College Assistant Dean Sarah Cook, Ph.D. and Rebecca Wilson, M.A. at Georgia State University.
The data analysis for this research project is split into four sections. Each section is a runnable Jupyter (Python) notebook hosted in this repository, and the code can be inspected in a web browser by clicking on one of the .ipynb files.
4 Jupyter Python notebooks:
- [Data Cleanup](Data%20Cleanup.ipynb)
- [Download Attachments](Download%20Attachments.ipynb)
- [Meta Analysis](Meta%20Analysis.ipynb)
- [Random Samples](Random%20Samples.ipynb)
Each .ipynb file can be viewed by clicking the link in the files directory above. Each file is a step in the analysis of the 16,000+ comments regarding the Title IX regression and includes an introduction and, alongside the code, comments explaining each operation and its significance.
The comments were scraped from the regulations.gov website using webscraper.io and a sitemap similar to this one.
See issue #1 for more details.
The data folder includes:
- db.json: The original database exported from the CouchDB instance where web scraping results were stored.
- db2.json: A copy of db.json with the author's name, state, and zip code extracted from the comment body and placed into their respective fields.
  - Created by running `./extract_location.sh > db2.json`
- db3.json: A copy of db2.json with exact duplicates of comments removed. The script removes every other comment because the CouchDB instance held two copies of every comment next to each other.
  - Created by running `./remove_duplicates.sh > db3.json`
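The duplicate-removal step above can be sketched in Python. This is not the repository's remove_duplicates.sh, which drops every other record. As a more defensive variant, the sketch compares each record with its predecessor and keeps it only when the content differs. The sample records, their field names, and the choice to ignore the CouchDB bookkeeping fields `_id` and `_rev` are assumptions made for illustration.

```python
import json

# Assumption: CouchDB bookkeeping fields differ between otherwise identical copies.
IGNORED_FIELDS = ("_id", "_rev")


def content_of(doc):
    """Return the record without the bookkeeping fields."""
    return {k: v for k, v in doc.items() if k not in IGNORED_FIELDS}


def drop_adjacent_duplicates(docs):
    """Keep the first record of every run of neighbours with identical content."""
    kept = []
    for doc in docs:
        if not kept or content_of(doc) != content_of(kept[-1]):
            kept.append(doc)
    return kept


# Hypothetical records; the real field names in db2.json may differ.
docs = [
    {"_id": "1", "name": "A", "comment": "first"},
    {"_id": "2", "name": "A", "comment": "first"},
    {"_id": "3", "name": "B", "comment": "second"},
    {"_id": "4", "name": "B", "comment": "second"},
]

unique = drop_adjacent_duplicates(docs)
print(json.dumps(unique, indent=2))
```

Comparing neighbours rather than dropping every second record means an unpaired comment is never discarded, at the cost of one comparison per record.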
- Data Cleanup.ipynb
  - Creates two new datasets in /data:
    - `has_attachment`: contains comments that have attachments.
    - `no_attachment`: contains comments with no attachment.
- Download Attachments.ipynb
  - Creates the `/attachments` folder, which holds every attachment in the call for comments. Uses the `has_attachment` file.
- Meta Analysis.ipynb
  - Performs an analysis of the `has_attachment` and `no_attachment` comments.
  - Generates a heat map of the states from which comments were posted and a table of the most common zip codes for each of the datasets.
- Random Samples.ipynb
  - Creates samples in the /samples folder used for manual data comparison.
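As a rough illustration of the notebook steps listed above, the following sketch splits comments by attachment, counts the most common zip codes, and draws a reproducible random sample. It is not taken from the notebooks: the records and the field names `attachment` and `zipcode` are hypothetical, and the notebooks may use different libraries and fields.

```python
import random
from collections import Counter

# Hypothetical records; the field names are assumptions, not the real schema.
comments = [
    {"id": 1, "zipcode": "30303", "attachment": "letter.pdf"},
    {"id": 2, "zipcode": "30303", "attachment": None},
    {"id": 3, "zipcode": "10001", "attachment": None},
    {"id": 4, "zipcode": "30303", "attachment": None},
]

# Split into the two datasets described for Data Cleanup.ipynb.
has_attachment = [c for c in comments if c["attachment"]]
no_attachment = [c for c in comments if not c["attachment"]]


def most_common_zipcodes(records, n=3):
    """Return the n most frequent zip codes as (zipcode, count) pairs."""
    counts = Counter(r["zipcode"] for r in records if r.get("zipcode"))
    return counts.most_common(n)


# A fixed seed makes the sample reproducible for manual comparison.
rng = random.Random(0)
sample = rng.sample(no_attachment, k=2)

print(most_common_zipcodes(no_attachment))  # [('30303', 2), ('10001', 1)]
```

Seeding a dedicated `random.Random` instance, rather than the module-level generator, keeps the sample stable even if other code in the same session also draws random numbers.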
The repository can be downloaded from GitHub using the "Download as .zip" option above.
- Python3
- Pip
- Virtualenv
After installing the required packages, set up the environment: (These commands can be copied into a terminal)
virtualenv env #create a new virtualenv
source env/bin/activate #activate the virtualenv
pip install -r requirements.txt #install libraries
jupyter notebook #open a notebook
Package versions are listed in the requirements.txt file.
This project is licensed under the MIT License - see the LICENSE file for details.