# DataBased - A Hip-Hop Data Set

DataBased is a set of scripts that scrape a hip-hop dataset for you to use however you like. If you do not want to build the set yourself, exported collections are available in JSON format in the Raw_JSON archive. To build the set in MongoDB, see INSTRUCTIONS.md.
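For readers who only want the exported data, the following is a minimal sketch of reading the archive in Python. It assumes the archive file is named `Raw_JSON.tar.gz`, that the exported collections inside it end in `.json`, and that each file is either a JSON array or newline-delimited JSON (one document per line, the layout `mongoexport` writes by default). The function names `parse_export` and `load_archive` are hypothetical helpers, not part of the repository's scripts.

```python
import json
import tarfile


def parse_export(text):
    """Parse one exported file: a JSON array, a single object, or one document per line."""
    text = text.strip()
    if not text:
        return []
    try:
        data = json.loads(text)
        return data if isinstance(data, list) else [data]
    except json.JSONDecodeError:
        # Fall back to newline-delimited JSON (assumed mongoexport layout).
        return [json.loads(line) for line in text.splitlines() if line.strip()]


def load_archive(path="Raw_JSON.tar.gz"):
    """Return {member name: list of documents} for every .json file in the archive."""
    collections = {}
    with tarfile.open(path, "r:gz") as archive:
        for member in archive.getmembers():
            # Skip directories and non-JSON files (the .json suffix is an assumption).
            if not member.isfile() or not member.name.endswith(".json"):
                continue
            raw = archive.extractfile(member).read().decode("utf-8")
            collections[member.name] = parse_export(raw)
    return collections
```

Calling `load_archive()` from the repository root and printing each key with `len(docs)` is a quick way to see which collections the archive contains and how large they are.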

# Schema

Artists
- genres (array of strings)
- related artists (array of up to 20 artists, each with genres, name, and Spotify info)
- Spotify ID (as "id")
- Genius ID
- last.fm tags (count, URL to tag, tag name)

Songs
- title
- URL to lyrics on Genius
- Genius name of the artist associated with the song
- Genius ID of the song

Lyrics
- Genius ID of the song
- text
- title of the song
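
To make the relationships between the collections concrete, here is a hedged sketch of what one document from each might look like and how Songs and Lyrics link through the Genius song ID. Apart from `id` (the Spotify ID), the key names (`genius_id`, `song_id`, `artist`, and so on) and all sample values are placeholders chosen for illustration, not the exact keys in the exported JSON; inspect an exported document before relying on them.

```python
# Hypothetical documents: key names and values are illustrative placeholders.
artist = {
    "name": "Example Artist",
    "id": "spotify-id-placeholder",  # Spotify ID, stored as "id"
    "genius_id": 1,                  # assumed key for the artist's ID on Genius
    "genres": ["hip hop"],
    "related_artists": [],           # at most 20 entries
    "lastfm_tags": [
        {"name": "rap", "count": 100, "url": "https://example.com/tag/rap"},
    ],
}
songs = [
    {
        "title": "Example Song",
        "url": "https://example.com/lyrics",
        "artist": "Example Artist",  # Genius name of the artist
        "song_id": 10,               # assumed key for the Genius song ID
    },
]
lyrics = [
    {"song_id": 10, "title": "Example Song", "text": "placeholder lyric text"},
]


def lyrics_for_artist(artist_name, songs, lyrics):
    """Join Songs to Lyrics on the Genius song ID and return the artist's lyric texts."""
    song_ids = {s["song_id"] for s in songs if s["artist"] == artist_name}
    return [entry["text"] for entry in lyrics if entry["song_id"] in song_ids]
```

Because Lyrics carries only the song ID and title, any per-artist analysis has to go through Songs first, as `lyrics_for_artist` does here.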

# Goodies

In the Goodies folder, you will find word clouds generated using WordCloud.py, a graph of related artists generated in R, and samples of lyrics generated by neural networks trained on specific artists (using char-rnn).

# TO-DO

- Scrape audio features for songs from Spotify
- Run my own analytics, including:
  - swearing metrics for songs/artists
  - unique word counts per artist over a fixed lyric sample size
  - references to places
  - etc.
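
The unique-word metric is only comparable across artists if every artist is measured over the same number of words. One possible way to compute it, sketched below, is to tokenize an artist's combined lyrics and count the distinct words among the first `sample_size` tokens. The function name, the tokenizer regex, and the default sample size are assumptions for illustration; none of this exists in the current scripts.

```python
import re


def unique_words(lyric_texts, sample_size=1000):
    """Count distinct words in the first `sample_size` tokens of the combined lyrics.

    Returns None when the lyrics contain fewer than `sample_size` tokens,
    so artists with too little material are not compared unfairly.
    """
    tokens = []
    for text in lyric_texts:
        # Lowercase and keep runs of letters, digits, and apostrophes as words.
        tokens.extend(re.findall(r"[a-z0-9']+", text.lower()))
        if len(tokens) >= sample_size:
            break
    if len(tokens) < sample_size:
        return None
    return len(set(tokens[:sample_size]))
```

The result depends on which lyrics come first in `lyric_texts`, so a real analysis would probably want a fixed ordering or a random sample of songs.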
