Final project for INFO-664: Programming for Cultural Heritage
By Emma Powell, Pratt MSLIS '24
This project explores representation on Sesame Street through its guest stars over time. The data captures demographic information on gender, ethnicity, and sexuality, along with occupation, religion, military branch, and more. All of the data comes from the crowdsourced sources Muppet Wiki, Wikipedia, IMDb, and Wikidata, so it is less complete than would be ideal.
You can view the visualization here.
sesame-street-guest-stars/data
Data is collected by scraping Muppet Wiki and Wikipedia and by querying IMDb (through the Cinemagoer Python package) and the Wikidata API.
.../sesame_guest_stars.csv
CSV file containing the list of names from Muppet Wiki, the corresponding identifiers from Wikipedia, and the character names from IMDb.
.../sesame_guest_stars_revised_csv.txt
Cleaned data from sesame_guest_stars_updated.csv, processed through OpenRefine and now stored as CSV data in a TXT file.
.../sesame_guest_stars_updated.csv
CSV file containing all data pulled from the Wikidata API alongside the data from sesame_guest_stars.csv.
sesame-street-guest-stars/scripts
A group of Python scripts used to collect all of the data. They are listed below in the order they should be run, since the scripts build on each other.
.../muppet_scraping_v1.py
A script to build the base list of guest stars. Scrapes Muppet Wiki for Seasons 1-39, 46, and 48, collecting each guest star's name, season, and Wikipedia link. (Note: At the time of creation, Seasons 40-45 listed guest stars in a different structure, and Seasons 47 and 49-54 did not have a full list of guest stars.)
.../muppet_scraping_v2.py
Builds on the base list of guest stars by adding Seasons 40-45, which are scraped separately because their pages use a different structure.
.../wikipedia_scraping.py
Scrapes Wikipedia using the links collected in the Muppet Wiki scrape. Finds each guest star's Wikidata QID and IMDb ID.
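As a rough illustration of this step, the sketch below pulls a Wikidata QID and an IMDb name ID out of a page's HTML with regular expressions. It assumes the page links to its Wikidata item through a `Special:EntityPage/Q...` URL and to IMDb through an `imdb.com/name/nm...` URL. The function name `extract_ids`, the sample HTML, and the IDs in it are hypothetical placeholders, and the actual script may locate these links differently.

```python
import re

# Assumed link patterns; neither is taken from wikipedia_scraping.py.
QID_PATTERN = re.compile(r"wikidata\.org/wiki/Special:EntityPage/(Q\d+)")
IMDB_PATTERN = re.compile(r"imdb\.com/name/(nm\d+)")


def extract_ids(html):
    """Return (qid, imdb_id) found in the HTML, with None for any ID that is missing."""
    qid_match = QID_PATTERN.search(html)
    imdb_match = IMDB_PATTERN.search(html)
    return (
        qid_match.group(1) if qid_match else None,
        imdb_match.group(1) if imdb_match else None,
    )


# Hypothetical snippet standing in for a downloaded Wikipedia page.
# The IDs are placeholders, not a real guest star's identifiers.
sample_html = """
<li><a href="https://www.wikidata.org/wiki/Special:EntityPage/Q12345">Wikidata item</a></li>
<li><a href="https://www.imdb.com/name/nm0000001/">Example Person at IMDb</a></li>
"""

qid, imdb_id = extract_ids(sample_html)
print(qid, imdb_id)
```

Returning `None` for a missing ID keeps guest stars without a Wikidata item or IMDb page in the list, so later steps can skip them rather than fail.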
.../get_imdb.py
Uses Cinemagoer and the IMDb ID to find each guest star's character name on Sesame Street.
.../get_wikidata.py
Requests information from Wikidata using the Wikidata QID for the following properties: instance of, gender, occupation, creator, from narrative universe, country of citizenship, sexual orientation, ethnic group, religion, convicted of, military branch, country of origin, has part(s).
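To make this lookup concrete, here is a minimal sketch of reading item-valued claims from a Wikidata entity record. The property IDs are believed to match the labels listed above but should be verified on Wikidata before use. The helper names (`fetch_entity`, `claim_values`, `summarize`), the User-Agent string, and the use of the `Special:EntityData` endpoint are assumptions for illustration, not a description of how get_wikidata.py works.

```python
import json
import urllib.request

# Property labels from the list above mapped to Wikidata property IDs.
# These IDs are an assumption; verify them on wikidata.org before relying on them.
PROPERTIES = {
    "instance of": "P31",
    "gender": "P21",
    "occupation": "P106",
    "creator": "P170",
    "from narrative universe": "P1080",
    "country of citizenship": "P27",
    "sexual orientation": "P91",
    "ethnic group": "P172",
    "religion": "P140",
    "convicted of": "P1399",
    "military branch": "P241",
    "country of origin": "P495",
    "has part(s)": "P527",
}


def fetch_entity(qid):
    """Download one entity record as a dict (requires network access)."""
    url = f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"
    request = urllib.request.Request(
        url, headers={"User-Agent": "sesame-guest-stars-example/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        entities = json.load(response)["entities"]
    # Take the single returned entity; its key can differ from qid after a redirect.
    return next(iter(entities.values()))


def claim_values(entity, property_id):
    """Return the QIDs of all item-valued statements for one property."""
    values = []
    for statement in entity.get("claims", {}).get(property_id, []):
        datavalue = statement.get("mainsnak", {}).get("datavalue")
        # Statements marked "no value" or "unknown value" carry no datavalue.
        if datavalue and datavalue.get("type") == "wikibase-entityid":
            values.append(datavalue["value"]["id"])
    return values


def summarize(entity):
    """Build one row: property label -> list of QIDs (empty if nothing is recorded)."""
    return {label: claim_values(entity, pid) for label, pid in PROPERTIES.items()}


# Hand-written stand-in for a fetched record, so the example runs offline.
sample_entity = {
    "claims": {
        "P31": [{"mainsnak": {"snaktype": "value", "datavalue": {
            "type": "wikibase-entityid", "value": {"id": "Q5"}}}}],
        "P140": [{"mainsnak": {"snaktype": "novalue"}}],
    }
}

row = summarize(sample_entity)
print(row["instance of"], row["religion"])
```

With network access, `summarize(fetch_entity(qid))` would produce the same kind of row for a real QID. The values are themselves QIDs (for example, Q5 is "human"), so a further label lookup is needed before the data is readable.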