Commit
Merge pull request #492 from adi271001/Commonwealth-Games-Tweets-Analysis
Commonwealth games tweets analysis
Showing 36 changed files with 100,031 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,153 @@ | ||
# Commonwealth Games 2022 Twitter Dataset | ||
|
||
## Overview | ||
|
||
This dataset contains Twitter data related to the Commonwealth Games 2022, including tweet content, engagement counts, user details, and more. It is suitable for analyses such as sentiment analysis, topic modeling, and user behavior studies. | ||
|
||
## Dataset Details | ||
|
||
### Columns: | ||
|
||
1. **`id`:** | ||
- Unique identifier for each tweet. | ||
|
||
2. **`conversation_id`:** | ||
- ID for the conversation to which the tweet belongs. | ||
|
||
3. **`created_at`:** | ||
- Timestamp indicating when the tweet was created. | ||
|
||
4. **`date`:** | ||
- Date of the tweet. | ||
|
||
5. **`time`:** | ||
- Time of the tweet. | ||
|
||
6. **`timezone`:** | ||
- Timezone information. | ||
|
||
7. **`user_id`:** | ||
- Unique identifier for the user. | ||
|
||
8. **`username`:** | ||
- Twitter username of the user. | ||
|
||
9. **`name`:** | ||
- Name of the user. | ||
|
||
10. **`place`:** | ||
- Geographical place information. | ||
|
||
11. **`tweet`:** | ||
- Text content of the tweet. | ||
|
||
12. **`language`:** | ||
- Language in which the tweet is written. | ||
|
||
13. **`mentions`:** | ||
- Users mentioned in the tweet. | ||
|
||
14. **`urls`:** | ||
- URLs included in the tweet. | ||
|
||
15. **`photos`:** | ||
- Photos attached to the tweet. | ||
|
||
16. **`replies_count`:** | ||
- Count of replies to the tweet. | ||
|
||
17. **`retweets_count`:** | ||
- Count of retweets of the tweet. | ||
|
||
18. **`likes_count`:** | ||
- Count of likes on the tweet. | ||
|
||
19. **`hashtags`:** | ||
- Hashtags used in the tweet. | ||
|
||
20. **`cashtags`:** | ||
- Cashtags used in the tweet. | ||
|
||
21. **`link`:** | ||
- Link to the tweet. | ||
|
||
22. **`retweet`:** | ||
- Indicates if the tweet is a retweet. | ||
|
||
23. **`quote_url`:** | ||
- URL of the quoted tweet if applicable. | ||
|
||
24. **`video`:** | ||
- Indicates if the tweet contains a video. | ||
|
||
25. **`thumbnail`:** | ||
- Thumbnail of the video. | ||
|
||
26. **`near`:** | ||
- Location information. | ||
|
||
27. **`geo`:** | ||
- Geographical information. | ||
|
||
28. **`source`:** | ||
- Source platform for the tweet. | ||
|
||
29. **`user_rt_id`:** | ||
- ID of the user who retweeted. | ||
|
||
30. **`user_rt`:** | ||
- User who retweeted. | ||
|
||
31. **`retweet_id`:** | ||
- ID of the retweet. | ||
|
||
32. **`reply_to`:** | ||
- Users to whom the tweet is a reply. | ||
|
||
33. **`retweet_date`:** | ||
- Timestamp for retweet date. | ||
|
||
34. **`translate`:** | ||
- Indicates if translation is available. | ||
|
||
35. **`trans_src`:** | ||
- Source language for translation. | ||
|
||
36. **`trans_dest`:** | ||
- Destination language for translation. | ||
|
||
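As a quick sanity check after downloading, the header of the CSV can be compared against the columns documented above. The following is a minimal sketch using only the Python standard library; the helper name `missing_columns` and the two-column sample are hypothetical, and only the column names come from this README.

```python
import csv
import io

# The 36 column names documented in the list above.
EXPECTED_COLUMNS = [
    "id", "conversation_id", "created_at", "date", "time", "timezone",
    "user_id", "username", "name", "place", "tweet", "language",
    "mentions", "urls", "photos", "replies_count", "retweets_count",
    "likes_count", "hashtags", "cashtags", "link", "retweet", "quote_url",
    "video", "thumbnail", "near", "geo", "source", "user_rt_id", "user_rt",
    "retweet_id", "reply_to", "retweet_date", "translate", "trans_src",
    "trans_dest",
]


def missing_columns(csv_file):
    """Return documented columns that are absent from the CSV header."""
    header = next(csv.reader(csv_file), [])
    present = set(header)
    return [name for name in EXPECTED_COLUMNS if name not in present]


# Hypothetical two-column sample standing in for the real file.
sample = io.StringIO("id,tweet\n1,hello\n")
missing = missing_columns(sample)
print(len(EXPECTED_COLUMNS), len(missing))
```

For the real file, pass a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the downloaded CSV was saved. The UTF-8 encoding is an assumption about the export.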
### Source: | ||
[Kaggle - Tweets on Commonwealth Games 2022](https://www.kaggle.com/datasets/aneeshtickoo/tweets-on-common-wealth-games-2022) | ||
|
||
## Potential Use Cases | ||
|
||
1. **Sentiment Analysis:** | ||
- Analyze the sentiment of tweets to understand the overall mood regarding the Commonwealth Games. | ||
|
||
2. **Topic Modeling:** | ||
- Identify and explore the most common topics discussed by users. | ||
|
||
3. **User Engagement:** | ||
- Examine user engagement metrics such as likes, retweets, and replies. | ||
|
||
4. **Language Distribution:** | ||
- Study the distribution of languages used in tweets. | ||
|
||
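The engagement and language use cases can be sketched without any third-party dependency. The example below assumes the count columns hold plain numbers and that blank cells mean zero; the sample rows are hypothetical and only the column names (`language`, `likes_count`, `retweets_count`, `replies_count`) come from the column list.

```python
import csv
import io
from collections import Counter

# Hypothetical rows; only the column names come from the dataset description.
SAMPLE_CSV = """id,language,likes_count,retweets_count,replies_count
1,en,10,2,1
2,en,0,0,0
3,hi,5,1,0
"""

ENGAGEMENT_COLUMNS = ("likes_count", "retweets_count", "replies_count")


def summarize(csv_file):
    """Count tweets per language and average the engagement columns."""
    languages = Counter()
    totals = Counter()
    rows = 0
    for row in csv.DictReader(csv_file):
        rows += 1
        languages[row["language"]] += 1
        for column in ENGAGEMENT_COLUMNS:
            # Assumption: blank cells mean zero; values may be "3" or "3.0".
            totals[column] += int(float(row[column] or 0))
    averages = {c: totals[c] / rows for c in ENGAGEMENT_COLUMNS} if rows else {}
    return languages, averages


languages, averages = summarize(io.StringIO(SAMPLE_CSV))
print(languages.most_common())
print(averages)
```

The same summary can be produced with pandas (`value_counts` and `mean`) once the file is loaded into a DataFrame; the standard-library version is shown only to keep the sketch self-contained.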
## Usage | ||
|
||
1. **Download the Dataset:** | ||
- Access the dataset on [Kaggle](https://www.kaggle.com/datasets/aneeshtickoo/tweets-on-common-wealth-games-2022) and download the necessary files. | ||
|
||
2. **Explore the Dataset:** | ||
- Use pandas, NumPy, or your preferred data analysis library to explore and understand the dataset. | ||
|
||
3. **Contribute:** | ||
- If you discover interesting insights or create visualizations, consider sharing them with the community. | ||
|
||
## Acknowledgments | ||
|
||
- Special thanks to Aneesh Tickoo for providing this dataset on Kaggle. | ||
|
||
## License | ||
|
||
This dataset is distributed under the license specified on Kaggle. Please refer to the [Kaggle dataset page](https://www.kaggle.com/datasets/aneeshtickoo/tweets-on-common-wealth-games-2022) for more details. |
Commonwealth Games Tweets Analysis/Dataset/cwg.csv
35,061 changes: 35,061 additions & 0 deletions. Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,87 @@ | ||
# Commonwealth Games 2022 Twitter Analysis | ||
|
||
## Overview | ||
|
||
This project analyzes Twitter data related to the Commonwealth Games 2022. Features such as sentiment, topic, and word frequency are derived from the tweets, and several machine learning and deep learning models are applied to classify tweet sentiment. | ||
|
||
## Key Insights | ||
|
||
1. **Sentiment Analysis:** | ||
- The majority of tweets exhibit a positive sentiment. | ||
- Positive is the most common sentiment class, which gives the conversation an overall positive tone. | ||
- ![Graph of sentiment](https://github.com/adi271001/ML-Crate/blob/Commonwealth-Games-Tweets-Analysis/Commonwealth%20Games%20Tweets%20Analysis/Images/cgta12.PNG) | ||
|
||
2. **Word Frequency:** | ||
- "cwg2022" is the most frequently used word. | ||
- Word cloud and frequency distribution highlight prominent topics in tweets. | ||
- ![Frequency Graph](https://github.com/adi271001/ML-Crate/blob/Commonwealth-Games-Tweets-Analysis/Commonwealth%20Games%20Tweets%20Analysis/Images/cgta10.PNG) | ||
|
||
3. **Tweet Length:** | ||
- The majority of tweets are between 150 and 190 characters long. | ||
- Users express detailed thoughts and opinions about Commonwealth Games 2022. | ||
- ![Distribution of Tweet Lengths](https://github.com/adi271001/ML-Crate/blob/Commonwealth-Games-Tweets-Analysis/Commonwealth%20Games%20Tweets%20Analysis/Images/cgta9.PNG) | ||
|
||
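The word-frequency and tweet-length insights can be reproduced in outline as follows. This is a sketch under stated assumptions rather than the project's actual preprocessing: the tokenizer (lowercase, alphanumeric runs), the choice to measure length in characters, and the sample tweets are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical tweets used only to make the sketch runnable.
tweets = [
    "Great start to #CWG2022 in Birmingham!",
    "Medal number two at cwg2022",
    "Watching the hockey final tonight",
]

# Assumption: a token is a run of lowercase letters, digits, or underscores.
TOKEN_PATTERN = re.compile(r"[a-z0-9_]+")


def word_frequencies(texts):
    """Lowercase each text, split it into tokens, and count the tokens."""
    counts = Counter()
    for text in texts:
        counts.update(TOKEN_PATTERN.findall(text.lower()))
    return counts


def tweet_lengths(texts):
    """Return the length of each text in characters."""
    return [len(text) for text in texts]


frequencies = word_frequencies(tweets)
print(frequencies.most_common(1))
print(tweet_lengths(tweets))
```

A real analysis would typically also remove stop words and URLs before counting, which this sketch leaves out for brevity.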
## Models and Accuracies | ||
|
||
| Model | Accuracy | | ||
|-----------------------|----------| | ||
| Logistic Regression | 93% | | ||
| Decision Tree | 90% | | ||
| Gradient Boosting | 89% | | ||
| SVM | 86% | | ||
| Deep Learning Models | 41.76% | | ||
|
||
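The README does not state how accuracy was computed or how balanced the sentiment classes are. Assuming standard classification accuracy, the sketch below shows the metric next to a majority-class baseline, which is a useful reference point when one class (here, positive sentiment) dominates. The labels and predictions are hypothetical and are not taken from the project.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most frequent label."""
    if not y_true:
        raise ValueError("cannot compute a baseline on empty input")
    most_common_count = Counter(y_true).most_common(1)[0][1]
    return most_common_count / len(y_true)


# Hypothetical labels and predictions, not taken from the project.
y_true = ["positive", "positive", "neutral", "negative", "positive"]
y_pred = ["positive", "neutral", "neutral", "negative", "positive"]

print(accuracy(y_true, y_pred))
print(majority_baseline(y_true))
```

A model is only informative to the extent that its accuracy exceeds the baseline on the same test split.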
## Models Overview | ||
|
||
1. **Logistic Regression:** | ||
- Achieved the highest accuracy of 93%. | ||
- Demonstrates strong performance in classifying tweet sentiments. | ||
|
||
2. **Decision Tree:** | ||
- Achieved an accuracy of 90%. | ||
- Captures complex relationships within the dataset. | ||
|
||
3. **Gradient Boosting:** | ||
- Achieved an accuracy of 89%. | ||
- An ensemble method; on this dataset it scored slightly below the single Decision Tree. | ||
|
||
4. **SVM:** | ||
- Achieved an accuracy of 86%. | ||
- The lowest accuracy among the classical machine learning models. | ||
|
||
- ![Joint Accuracy Graph](https://github.com/adi271001/ML-Crate/blob/Commonwealth-Games-Tweets-Analysis/Commonwealth%20Games%20Tweets%20Analysis/Images/cgta3.png) | ||
|
||
5. **Deep Learning Models (ANN, CNN, DNN, RNN, LSTM):** | ||
- Achieved an accuracy of 41.76%. | ||
- Indicates challenges in capturing complex patterns in the dataset. | ||
- ![CNN Accuracy Graph](https://github.com/adi271001/ML-Crate/blob/Commonwealth-Games-Tweets-Analysis/Commonwealth%20Games%20Tweets%20Analysis/Images/cgta5.png) | ||
- ![ANN Accuracy Graph](https://github.com/adi271001/ML-Crate/blob/Commonwealth-Games-Tweets-Analysis/Commonwealth%20Games%20Tweets%20Analysis/Images/cgta6.png) | ||
- ![RNN Accuracy Graph](https://github.com/adi271001/ML-Crate/blob/Commonwealth-Games-Tweets-Analysis/Commonwealth%20Games%20Tweets%20Analysis/Images/cgta7.png) | ||
- ![DNN Accuracy Graph](https://github.com/adi271001/ML-Crate/blob/Commonwealth-Games-Tweets-Analysis/Commonwealth%20Games%20Tweets%20Analysis/Images/cgta8.png) | ||
- ![LSTM Accuracy Graph](https://github.com/adi271001/ML-Crate/blob/Commonwealth-Games-Tweets-Analysis/Commonwealth%20Games%20Tweets%20Analysis/Images/cgta4.png) | ||
|
||
|
||
## Word Cloud | ||
|
||
![Word Cloud of Most Common Words](https://github.com/adi271001/ML-Crate/blob/Commonwealth-Games-Tweets-Analysis/Commonwealth%20Games%20Tweets%20Analysis/Images/cgta11.PNG) | ||
|
||
## Topic Analysis | ||
|
||
![Distribution of Topics in Tweets](https://github.com/adi271001/ML-Crate/blob/Commonwealth-Games-Tweets-Analysis/Commonwealth%20Games%20Tweets%20Analysis/Images/cgta13.PNG) | ||
|
||
## Conclusions | ||
|
||
- Logistic Regression is the most accurate model for sentiment analysis (93%). | ||
- Decision Tree (90%) and Gradient Boosting (89%) also perform well. | ||
- The deep learning models (41.76%) lag far behind the classical models and need further work. | ||
|
||
## Usage | ||
1. Install dependencies: `pip install -r requirements.txt` | ||
2. Run the analysis scripts. | ||
|
||
3. View the generated visualizations in the plots directory. | ||
|
||
## Acknowledgement | ||
|
||
I would like to thank Kaggle for providing the dataset, the maintainer for assigning me this project, and KOSS for the opportunity to participate in KWOC 2023. |