Abstract: This is an analysis of happiness in music and the external factors that drive it.
Hypothesis: Music is a representation of our mood. Therefore when the national mood is high, people listen to happy music.
Results: Using logistic regression, I found a negative relationship between the Consumer Confidence Index and whether the No. 1 Billboard hit is a happy song. I also found that people are especially likely to listen to happy music following a national tragedy.
See this work as a presentation in slide format.
Music is part of our daily lives. We listen to it when we work out. We listen to it when we're sad. This project attempts to predict what type of popular music is listened to based on external factors. For instance, if the Consumer Confidence Index is up, are the top Billboard hits happy songs? Conversely, if the Consumer Confidence Index is down, do people stay indoors and listen to The Cure and Morrissey on repeat?
Also, why on earth was The Macarena the No. 1 Billboard hit for 14 straight weeks in 1996?
Source | Description | Format |
---|---|---|
Billboard | Billboard Top 100 List going back to 1958 | CSV |
Consumer Confidence Index | Monthly Index value going back to 1960 | CSV |
Spotify API | Song valence and energy scores | API |
MetroLyrics, songlyrics, lyricsmode | Lyrics of Billboard hits | Scraped |
The lyrics in pop songs are, on average, positive. The most frequently used words sound almost like a pop song when read in order.
Spotify’s API provides a number of calculated musical qualities for each song. In this analysis, Energy (a score measuring the intensity of a track) and Valence (a score measuring how positive or peppy a track sounds) are used as representations of the musical happiness of a song.
The lyrical happiness of a song was calculated from the sentiment of its lyrics, using the natural language processing Python package TextBlob. In this analysis, lyrical happiness is called Text Polarity.
I combined the musical and lyrical scores to create an aggregate happiness score, which I called the Happy Index.
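The exact formula for the Happy Index is not given here. Purely as an illustrative assumption, the sketch below takes an equal-weight average of Valence, Energy, and Text Polarity. Spotify reports Valence and Energy on a 0 to 1 scale, while TextBlob polarity runs from -1 to 1, so polarity is rescaled first. The function name `happy_index` and the equal weighting are hypothetical.

```python
def happy_index(valence: float, energy: float, text_polarity: float) -> float:
    """Hypothetical aggregate: mean of valence, energy, and rescaled polarity."""
    # Valence and energy are already in [0, 1]; polarity is in [-1, 1],
    # so map it onto [0, 1] before averaging.
    polarity_01 = (text_polarity + 1.0) / 2.0
    # Equal weights are an assumption, not the author's stated formula.
    return (valence + energy + polarity_01) / 3.0


# Made-up scores for a single upbeat song.
score = happy_index(valence=0.9, energy=0.8, text_polarity=0.4)
print(round(score, 3))
```

A different weighting, for example one that favours Valence, would fit the same pattern; only the final line of the function would change.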
A Savitzky-Golay filter was applied to the CCI to smooth the curve, and the gradient of the smoothed curve was then used as a feature for the logistic regression.
Below, months with sad No. 1 Billboard hits are represented with purple lines, and dates when national tragedies occurred are highlighted.
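The smoothing and gradient step can be sketched with SciPy and NumPy as follows. The series is synthetic data standing in for the monthly CCI, and the `window_length` and `polyorder` values are illustrative assumptions rather than the settings used in this project.

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic monthly index values standing in for the CCI (not real data).
rng = np.random.default_rng(0)
months = np.arange(120)
cci = 100 + 10 * np.sin(months / 12.0) + rng.normal(0, 2, size=months.size)

# Fit a cubic polynomial in a sliding 11-month window to smooth the series.
# Both parameters are illustrative choices, not the author's.
cci_smooth = savgol_filter(cci, window_length=11, polyorder=3)

# Month-over-month slope of the smoothed curve: positive when confidence
# is rising, negative when it is falling.
cci_gradient = np.gradient(cci_smooth)
```

`savgol_filter` also accepts a `deriv=1` argument that returns the derivative of the fitted polynomial directly, which is an alternative to calling `np.gradient` on the smoothed output.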
My model resulted in an R-squared of 0.77.
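The write-up does not say how the R-squared was computed for a logistic model. One common stand-in is McFadden's pseudo-R-squared, which compares the fitted model's log-likelihood with that of an intercept-only model; the sketch below assumes that definition. The data is synthetic, with a negative relationship between the CCI gradient and the happy-song label built in for illustration, so the resulting number says nothing about the actual project.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic feature: one CCI gradient value per month (not real data).
rng = np.random.default_rng(42)
n_months = 300
cci_gradient = rng.normal(0.0, 1.0, size=n_months)

# Synthetic labels: 1 = happy No. 1 hit, 0 = sad. A negative relationship
# with the gradient is assumed here purely for illustration.
true_prob = 1.0 / (1.0 + np.exp(-(0.5 - 1.5 * cci_gradient)))
is_happy = (rng.random(n_months) < true_prob).astype(int)

# Fit the logistic regression on the single feature.
X = cci_gradient.reshape(-1, 1)
model = LogisticRegression()
model.fit(X, is_happy)

# Log-likelihood of the fitted model.
prob_happy = model.predict_proba(X)[:, 1]
ll_model = np.sum(
    is_happy * np.log(prob_happy) + (1 - is_happy) * np.log(1 - prob_happy)
)

# Log-likelihood of an intercept-only model that always predicts the base rate.
base_rate = is_happy.mean()
ll_null = np.sum(
    is_happy * np.log(base_rate) + (1 - is_happy) * np.log(1 - base_rate)
)

# McFadden's pseudo-R-squared (an assumed definition, see the note above).
mcfadden_r2 = 1.0 - ll_model / ll_null
print(model.coef_[0, 0], mcfadden_r2)
```

The sign of `model.coef_` is what corresponds to the "negative relationship" reported in the results: a negative coefficient means a rising CCI lowers the predicted probability of a happy No. 1 hit.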
- Try other economic indicators outside of the CCI
- Put interactive web-app online
- Discover why The Macarena was so popular
First of all, thank you to all of the students, instructors, and staff at Galvanize. I learned an incredible amount in three months. Also, thank you to the following data scientists for inspiration for the project.