Predicting the 2020 US Presidential Election Result Using Twitter Sentiment Analysis with Python
-
Using the Twitter API to scrape tweets
- Copy the “API Key”, “API Secret”, “Access Token”, and “Access Token Secret” to use as OAuth keys.
- Set up authentication with Twitter using the tweepy package.
- Extract the tweets addressed to both Donald Trump and Joe Biden.
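```python
import tweepy

# `api` is the authenticated tweepy.API object from the authentication step
candidate_name = ['realDonaldTrump', 'JoeBiden']
replies_trump = []
replies_biden = []

for candidate in candidate_name:
    # 'to:<screen_name>' matches tweets addressed (replying) to that account.
    # api.search is the tweepy 3.x name; tweepy 4.0 renamed it to api.search_tweets.
    for tweet in tweepy.Cursor(api.search, q='to:' + candidate,
                               result_type='recent', timeout=999999).items(10000):
        if candidate == "realDonaldTrump":
            replies_trump.append(tweet)
        elif candidate == "JoeBiden":
            replies_biden.append(tweet)
```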
- Convert the collected tweets into DataFrames and save them as CSV files.
```python
import pandas as pd

def tweets_to_df(tweets):
    # One row per tweet: the author's screen name and the text with newlines flattened.
    # Building the DataFrame from a list of dicts avoids DataFrame.append,
    # which was removed in pandas 2.0.
    rows = [{'user': tweet.user.screen_name, 'text': tweet.text.replace('\n', ' ')}
            for tweet in tweets]
    return pd.DataFrame(rows)

trump_df = tweets_to_df(replies_trump)
trump_df.to_csv(r'data/trump_data.csv')

biden_df = tweets_to_df(replies_biden)
biden_df.to_csv(r'data/biden_data.csv')
```
- The generated data is available in 'trump_data.csv' and 'biden_data.csv'.
- Each dataset contains 2 columns and 10,000 rows:
- 'text' - this column contains the tweets addressed to '@realDonaldTrump' or '@JoeBiden', respectively.
- 'user' - this column contains the username of the tweet's author.
-
Importing the datasets
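The code for this step is not shown. A minimal sketch, assuming the two CSV files written above and the 'Trump_reviews' / 'Biden_reviews' names that the following sections use; the 'load_reviews' helper and the choice to drop rows with empty text are assumptions.

```python
import pandas as pd

def load_reviews(path):
    # index_col=0 restores the unnamed index column that to_csv writes by default
    df = pd.read_csv(path, index_col=0)
    # Drop rows with missing text so every value passed to TextBlob is a string
    return df.dropna(subset=['text'])

# Hypothetical usage, assuming the CSV files from the previous step exist:
# Trump_reviews = load_reviews('data/trump_data.csv')
# Biden_reviews = load_reviews('data/biden_data.csv')
```

Reading the index column back as the index keeps exactly the 'user' and 'text' columns, instead of an extra 'Unnamed: 0' column.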
-
Sentiment analysis using TextBlob
- Polarity is a score from -1 to +1 that indicates whether a text expresses negative sentiment (below 0), positive sentiment (above 0), or neither (exactly 0).
- The 'polarity' function returns the polarity of each tweet.
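```python
from textblob import TextBlob

def polarity(review):
    # TextBlob's polarity score for one tweet, a float in [-1.0, 1.0]
    return TextBlob(review).sentiment.polarity

# Add a 'polarity' column to each dataset
Trump_reviews['polarity'] = Trump_reviews['text'].apply(polarity)
Biden_reviews['polarity'] = Biden_reviews['text'].apply(polarity)
```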
- Tag each tweet as 'Positive', 'Negative' or 'Neutral' according to its polarity.
```python
import numpy as np

# polarity > 0 -> 'Positive', otherwise 'Negative' ...
Trump_reviews['Expression'] = np.where(Trump_reviews['polarity'] > 0, 'Positive', 'Negative')
# ... then relabel tweets with polarity exactly 0 as 'Neutral'
Trump_reviews.loc[Trump_reviews.polarity == 0, 'Expression'] = 'Neutral'
Trump_reviews.head()

Biden_reviews['Expression'] = np.where(Biden_reviews['polarity'] > 0, 'Positive', 'Negative')
Biden_reviews.loc[Biden_reviews.polarity == 0, 'Expression'] = 'Neutral'
Biden_reviews.head()
```
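The two-step labelling above (np.where, then relabelling exact zeros) can also be written as a single 'np.select' call. This is an equivalent alternative, not the project's code; 'label_polarity' is a hypothetical helper. Calling 'value_counts()' on the result gives the Positive/Negative/Neutral counts that the visualization step works from.

```python
import numpy as np
import pandas as pd

def label_polarity(polarity):
    # One vectorized mapping: > 0 -> Positive, < 0 -> Negative, exactly 0 -> Neutral
    conditions = [polarity > 0, polarity < 0]
    labels = np.select(conditions, ['Positive', 'Negative'], default='Neutral')
    return pd.Series(labels, index=polarity.index)

# Made-up demo scores, not project data
scores = pd.Series([0.5, -0.2, 0.0])
labels = label_polarity(scores)
print(labels.value_counts())
```

With the project's data this would be, for example, Trump_reviews['Expression'] = label_polarity(Trump_reviews['polarity']).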
- Visualize the number of Positive, Negative and Neutral replies.
- Drop all neutral tweets, since they carry no positive or negative signal for this comparison.
```python
# Neutral tweets are exactly those with polarity == 0; drop them in place
Trump_reviews.drop(Trump_reviews[Trump_reviews['polarity'] == 0].index, inplace=True)
print(Trump_reviews.shape)
Biden_reviews.drop(Biden_reviews[Biden_reviews['polarity'] == 0].index, inplace=True)
print(Biden_reviews.shape)
```
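- After dropping the neutral tweets, the two datasets have different sizes. To balance them, I use the 'balanced_data' function, which removes 'n' randomly chosen rows from a dataset.
```python
import numpy as np

def balanced_data(reviews, n):
    # Fixed seed so the same rows are dropped on every run
    np.random.seed(10)
    # Pick n distinct index labels at random and drop those rows
    drop = np.random.choice(reviews.index, n, replace=False)
    review_subset = reviews.drop(drop)
    return review_subset

Trump_subset = balanced_data(Trump_reviews, 99)
print(Trump_subset.shape)
Biden_subset = balanced_data(Biden_reviews, 300)
print(Biden_subset.shape)
```
- After balancing, each dataset has 4,000 rows (99 and 300 are the numbers of surplus rows in the Trump and Biden datasets, respectively).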
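'balanced_data' hard-codes how many rows to drop from each dataset (99 and 300). A variant that derives the target size from the data instead, using pandas' 'DataFrame.sample' (a swapped-in technique, not the project's function), might look like this; 'balance_datasets' is hypothetical.

```python
import pandas as pd

def balance_datasets(*frames, target=None, seed=10):
    # Default target: the size of the smallest DataFrame
    if target is None:
        target = min(len(df) for df in frames)
    # sample() draws `target` rows without replacement; random_state makes it reproducible
    return [df.sample(n=target, random_state=seed) for df in frames]

# Made-up demo frames, not project data
a = pd.DataFrame({'polarity': range(5)})
b = pd.DataFrame({'polarity': range(8)})
a_bal, b_bal = balance_datasets(a, b)
print(len(a_bal), len(b_bal))  # 5 5
```

Passing target=4000 mirrors the 4,000-row subsets above, although a different random subset of rows is kept. 'sample' also shuffles the rows; call 'sort_index()' afterwards if the original order matters.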
-
Donald Trump
- The figure below shows that polarity ranges from -1 to +1 and that most replies are positive: the scores are concentrated between 0 and 0.5.
- The boxplot below shows that most polarity values lie between -0.25 and 0.50. It summarizes where the polarity scores are concentrated rather than the full shape of the distribution.
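The figures referred to above are not included here. A sketch of how a polarity histogram and boxplot could be drawn with matplotlib follows; the 'plot_polarity' helper, the 20-bin choice, and the demo scores are assumptions, not the project's plotting code.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_polarity(reviews, title):
    # Two panels side by side: histogram (left) and boxplot (right) of 'polarity'
    scores = reviews['polarity'].to_numpy()
    fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(10, 4))
    # 20 equal-width bins over the full polarity range [-1, 1]
    ax_hist.hist(scores, bins=20, range=(-1, 1))
    ax_hist.set_xlabel('polarity')
    ax_hist.set_ylabel('number of replies')
    # The box spans the 25th to 75th percentile; whiskers reach the furthest
    # points within 1.5 x IQR of the box (matplotlib's default)
    ax_box.boxplot(scores)
    ax_box.set_ylabel('polarity')
    fig.suptitle(title)
    return fig

# Made-up demo scores, not project data
demo = pd.DataFrame({'polarity': [-0.5, -0.1, 0.2, 0.3, 0.5, 0.8]})
fig = plot_polarity(demo, 'Polarity of replies (demo data)')
```

With the project's data it could be called, for example, as plot_polarity(Trump_subset, 'Donald Trump'), then shown with plt.show() or saved with fig.savefig(...).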
-
Analyzing Most Positive and Most Negative replies
- Note: in this project, I found that the TextBlob sentiment analyzer does not reliably detect sarcastic comments, because it scores a sentence from the sentiment of its individual words (tokens) and classifies it accordingly.
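The most positive and most negative replies can be pulled out by sorting on the 'polarity' column. A minimal sketch using pandas' 'nlargest' and 'nsmallest'; 'extreme_replies' and the demo rows (with made-up scores, not TextBlob output) are hypothetical.

```python
import pandas as pd

def extreme_replies(reviews, n=5):
    # nlargest / nsmallest order rows by polarity and keep the n most extreme ones
    cols = ['text', 'polarity']
    most_positive = reviews.nlargest(n, 'polarity')[cols]
    most_negative = reviews.nsmallest(n, 'polarity')[cols]
    return most_positive, most_negative

# Made-up demo rows, not project data
demo = pd.DataFrame({'text': ['great', 'bad', 'ok', 'awful'],
                     'polarity': [0.8, -0.7, 0.1, -1.0]})
pos, neg = extreme_replies(demo, n=2)
print(pos['text'].tolist())  # ['great', 'ok']
print(neg['text'].tolist())  # ['awful', 'bad']
```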
-
In business, word clouds can help identify customers' pain points; here, I use them to gain insight into public opinion about the presidential candidate and the keywords citizens use most frequently.
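A word cloud is a picture of word frequencies, and the frequencies themselves can be computed with the standard library. A minimal sketch, assuming a simple regex tokenizer and a small hand-written stop-word list (both my assumptions, not the project's preprocessing); 'top_keywords' is hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); real projects usually use a fuller one
STOPWORDS = {'the', 'a', 'an', 'and', 'is', 'to', 'of', 'you', 'in', 'for', 'rt'}

def top_keywords(texts, k=10):
    counts = Counter()
    for text in texts:
        # Remove @mentions and URLs, lowercase, then split into alphabetic words
        text = re.sub(r'@\w+|https?://\S+', ' ', text.lower())
        words = re.findall(r'[a-z]+', text)
        # Skip stop words and very short words
        counts.update(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(k)

# Made-up demo tweets, not project data
tweets = ["@JoeBiden vote vote VOTE https://t.co/x", "Go vote today!"]
print(top_keywords(tweets, k=2))  # [('vote', 4), ('today', 1)]
```

If the 'wordcloud' package is used for the picture, a dict built from these counts can be passed to WordCloud().generate_from_frequencies(...).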
-
Joe Biden
- The figure below shows that polarity ranges from -1 to +1 and that most replies are positive: the scores are concentrated between 0 and 0.5.
- The boxplot below shows that most polarity values lie between -0.25 and 0.50. It summarizes where the polarity scores are concentrated rather than the full shape of the distribution.
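Concentration claims like the ones above come from reading figures. A small sketch that checks them numerically: a boxplot's box spans the 25th to 75th percentile, so 'Series.quantile' gives its edges directly. 'polarity_summary' and the demo scores are hypothetical.

```python
import pandas as pd

def polarity_summary(reviews):
    # Quartiles of 'polarity': q25 and q75 are the edges of the boxplot's box
    q = reviews['polarity'].quantile([0.25, 0.5, 0.75])
    return {
        'q25': q.loc[0.25],
        'median': q.loc[0.5],
        'q75': q.loc[0.75],
        # Fraction of replies whose polarity is strictly positive
        'share_positive': (reviews['polarity'] > 0).mean(),
    }

# Made-up demo scores, not project data
demo = pd.DataFrame({'polarity': [-0.5, -0.25, 0.25, 0.5, 1.0]})
print(polarity_summary(demo))
```

Applied to each balanced subset (for example polarity_summary(Biden_subset)), this turns "concentrated between -0.25 and 0.50" into concrete quartile values.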
-
Analyzing Most Positive and Most Negative replies
- Note: in this project, I found that the TextBlob sentiment analyzer does not reliably detect sarcastic comments, because it scores a sentence from the sentiment of its individual words (tokens) and classifies it accordingly.
-
In business, word clouds can help identify customers' pain points; here, I use them to gain insight into public opinion about the presidential candidate and the keywords citizens use most frequently.
-
Public Sentiment
- The figures below show that Joe Biden receives more positive replies than negative ones.
- Overall, the sentiment in these replies is more favourable to Joe Biden than to Donald Trump.
- Note: I assume that all users are unique. Hence, I have not removed users who replied to both Joe Biden and Donald Trump.
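The comparison above rests on the share of positive and negative replies per candidate. A sketch, assuming DataFrames with the 'user' and 'Expression' columns built earlier (such as the balanced subsets), that puts both candidates in one table and also shows what dropping users who replied to both candidates would involve; 'compare_sentiment' and 'drop_shared_users' are hypothetical helpers, and the demo rows are made up.

```python
import pandas as pd

def compare_sentiment(subsets):
    # subsets: {candidate name: DataFrame with an 'Expression' column}
    combined = pd.concat(
        [df.assign(candidate=name) for name, df in subsets.items()],
        ignore_index=True)
    # Row-normalized crosstab: share of each label within each candidate's replies
    return pd.crosstab(combined['candidate'], combined['Expression'], normalize='index')

def drop_shared_users(df_a, df_b):
    # Users who replied to both candidates; removing them drops the uniqueness assumption
    shared = set(df_a['user']) & set(df_b['user'])
    return df_a[~df_a['user'].isin(shared)], df_b[~df_b['user'].isin(shared)]

# Made-up demo rows, not project data
trump = pd.DataFrame({'user': ['u1', 'u2', 'u3'],
                      'Expression': ['Positive', 'Negative', 'Negative']})
biden = pd.DataFrame({'user': ['u3', 'u4', 'u5'],
                      'Expression': ['Positive', 'Positive', 'Negative']})
print(compare_sentiment({'Trump': trump, 'Biden': biden}))
t2, b2 = drop_shared_users(trump, biden)
print(len(t2), len(b2))  # 2 2
```

Normalizing per row makes the two candidates comparable even if their datasets had different sizes, which complements the balancing step.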