Qatar World Cup was full of surprises! From Saudi Arabia shocking the world by upsetting Argentina to Morocco's historic run to the semifinals, you must heard or witnessed those moments during that soccer craze. In this project, I will first use BERTopic to model topics on tweets posted during the World Cup 2022. Then I will use Hugging Face API to do a sentiment analysis on those tweets. Finally, I will connect the topic modeling and sentiment analysis to conduct topic based sentiment analysis.
Figure 1: Topic Words for Top 10 Topics
Figure 2: Topics over Time - Dynamic Topic Modeling
Figure 3: Word Cloud for Tweets with Positive Sentiments (sample size: 10,000)
Figure 4: Word Cloud for Tweets with Negative Sentiments (sample size: 10,000; sensitive words removed)
Figure 5: Sentiment Distribution on All Tweets (sample size: 300)
Figure 6: Sentiment Distribution on Topics (sample size: 300)
- Based on the insights obtained from Figure 1, it appears that while there were initial controversies surrounding Qatar hosting the World Cup, the prevalent use of keywords such as "qatar2022" and "best" suggest that there is widespread appreciation for the efforts that Qatar has made in preparing for the event. Additionally, it is evident that among the many stars who played in the World Cup, Twitter users frequently discussed PSG teammates Mbappe and Messi, as indicated by the prominent appearance of their names as keywords.
- Figure 2 revealed that the topic "qatar2022" and "best" was highly frequent across time, suggesting a widespread appreciation for Qatar's efforts in hosting the event.
- After performing sentiment analysis using the Hugging Face API, we grouped the tweets based on their sentiments (positive and negative) and visualized the most frequent words that appeared in each sentiment category in Figures 3 and 4. These visualizations highlighted the prevalence of positive sentiments, with words such as "congrats", "Messi", and "winner" appearing frequently in positive tweets, indicating a general admiration for Leo Messi and appreciation for the tournament. Negative tweets also had mentions of "Messi", but with less frequency, and frequently used the word "rigged", indicating a level of dissatisfaction with the refereeing in the tournament.
- Figure 5 showed that 60% of the 10,000 analyzed tweets exhibited positive sentiments, suggesting that most people were happy about the Qatar World Cup. Finally, we connected topic modeling and sentiment analysis to calculate the average sentiment of tweets in each topic, which is shown in Figure 6. The majority of the sentiments for each topic were positive, with topic 1 (argentina_world_cup_messi) showing the most positive sentiment, indicating a high level of satisfaction with Argentina winning the tournament. These findings highlight the potential of BERTopic and sentiment analysis as powerful tools for generating insights from large volumes of text data.
https://huggingface.co/inference-api
Parameter | Type | Description |
---|---|---|
hf_token |
string |
Required. Your API key |
https://medium.com/@cd_24/using-bertopic-to-analyze-qatar-world-cup-twitter-data-part-3-b220630fd894
https://github.com/JustAnotherArchivist/snscrape
https://medium.com/@cd_24/using-bertopic-to-analyze-qatar-world-cup-twitter-data-a5956c4949f1
The codes programmed in this project are displayed in this repository. Feel free to check and use them.
If you are curious to know more details about topic based sentiment analysis, go check my posts on Medium.