
Word-extraction task performed over tweets regarding the "Brexit" topic using nltk and the Twitter APIs. Word counting using MAP-REDUCE queries and plotting of histograms with respect to several analysis dimensions.


matbelcao/brexit-tweets-analysis


Brexit Tweets Analysis

This is an optional project developed for the course "054306 - UNSTRUCTURED AND STREAMING DATA ENGINEERING" during my studies at Politecnico di Milano.


OVERVIEW (full details in report.pdf)

The project analyzes tweets about Brexit and plots the most frequent words along several dimensions, such as political stance, sentiment and language. The original CSV data are available at: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/KP4XRP

The tweet data are gathered by a multithreaded Python script that interacts with the Twitter APIs in parallel, using several accounts. Using the nltk library, the script extracts the most salient words through intermediate filtering and transformation steps, and then stores the results at two granularities (per-tweet and per-user arrays of <word,count> tuples) in a MongoDB database.
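The filtering and counting step can be sketched roughly as follows. This is not the project's code: it uses only the standard library, with a tiny hard-coded stop-word set standing in for the nltk resources, and the names `extract_words` and `aggregate_by_user` are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word set (assumption); the project relies on nltk instead.
STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "for", "on", "it", "we"}

TOKEN_RE = re.compile(r"[a-z']+")


def extract_words(text):
    """Return [(word, count), ...] for the salient words of one tweet."""
    # Drop URLs and @mentions, and strip the '#' sign, before tokenizing.
    cleaned = re.sub(r"https?://\S+|@\w+", " ", text.lower()).replace("#", " ")
    tokens = [t.strip("'") for t in TOKEN_RE.findall(cleaned)]
    # Keep tokens longer than two characters that are not stop words.
    counts = Counter(t for t in tokens if len(t) > 2 and t not in STOPWORDS)
    # Most frequent first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def aggregate_by_user(tweets):
    """tweets: iterable of (user_id, text). Returns {user_id: [(word, count), ...]}."""
    per_user = {}
    for user_id, text in tweets:
        per_user.setdefault(user_id, Counter()).update(dict(extract_words(text)))
    return {
        user: sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        for user, counts in per_user.items()
    }
```

The two functions mirror the two granularities mentioned above: `extract_words` yields the per-tweet array, and `aggregate_by_user` sums those arrays per user.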



Finally, several Python scripts extract the most frequently used words from the tweets written in English by running a set of MAP-REDUCE queries over the MongoDB repository. The outputs are plots that take into account parameters such as political stance, sentiment and language.
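To illustrate the idea behind such a MAP-REDUCE word count, here is a minimal in-memory sketch in plain Python. It does not talk to MongoDB, and the document fields (`lang`, `stance`, `words`) are assumptions made for the example, not the actual schema of the repository.

```python
from collections import defaultdict


def map_tweet(doc):
    """Map step: emit ((stance, word), count) pairs for one stored tweet."""
    for word, count in doc["words"]:
        yield (doc["stance"], word), count


def reduce_counts(values):
    """Reduce step: sum all counts emitted for the same key."""
    return sum(values)


def map_reduce(docs, language="en"):
    """Group the mapped pairs by key, then reduce each group to a total."""
    grouped = defaultdict(list)
    for doc in docs:
        if doc.get("lang") != language:
            continue  # only the requested language is analyzed
        for key, value in map_tweet(doc):
            grouped[key].append(value)
    return {key: reduce_counts(values) for key, values in grouped.items()}


def top_words(totals, stance, n=10):
    """Return the n most frequent (word, total) pairs for one stance."""
    ranked = [(word, total) for (s, word), total in totals.items() if s == stance]
    return sorted(ranked, key=lambda wc: (-wc[1], wc[0]))[:n]
```

Changing the key emitted by `map_tweet` (for example, to a sentiment label instead of the stance) would give the totals for a different analysis dimension; the ranked output of `top_words` is the kind of data a histogram would be drawn from.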



The same kind of analysis is performed for the 4 main European languages (IT, FR, DE, ES), but in this case the output is limited to the most used words for each language.



How to run the code on your PC (Unix)

  1. install MongoDB (https://docs.mongodb.com/manual/administration/install-on-linux/)

  2. extract the database archive ./db/mongoDB_backup/db_compressed.rar

  3. import the database into MongoDB by running the following command from the main directory: mongorestore -d brexit ./db/mongoDB_backup/brexit/ -u Admin -p Password --authenticationDatabase admin

  4. add some Twitter API developer keys to the file ./twitter-analyzers/credentials.csv (the more accounts you use, the higher the interaction throughput)

  5. run ./twitter-analyzers/multithread-tweets-analyzer.py to fetch new tweet data

  6. run the Python scripts inside the ./analysis_scripts/ folder to produce the plots
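As a rough illustration of why more accounts in step 4 raise throughput, the sketch below gives each credential row its own worker thread and its own share of the tweet ids. It is not the project's script: the CSV column names and the `fetch_batch` placeholder are hypothetical, and no real Twitter API call is made.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def load_credentials(csv_file):
    """Read one credential set per row; the column names are assumed."""
    return list(csv.DictReader(csv_file))


def split_round_robin(items, n):
    """Deal items into n batches so every account gets a similar load."""
    return [items[i::n] for i in range(n)]


def fetch_batch(credential, tweet_ids):
    """Hypothetical placeholder for the real Twitter API request."""
    return [(tweet_id, credential["consumer_key"]) for tweet_id in tweet_ids]


def fetch_all(credentials, tweet_ids):
    """Run one worker thread per account, each handling its own batch."""
    batches = split_round_robin(tweet_ids, len(credentials))
    with ThreadPoolExecutor(max_workers=len(credentials)) as pool:
        results = pool.map(fetch_batch, credentials, batches)
    return [row for batch in results for row in batch]


if __name__ == "__main__":
    # In-memory stand-in for credentials.csv (assumed layout).
    demo = io.StringIO("consumer_key,consumer_secret\nk1,s1\nk2,s2\n")
    print(fetch_all(load_credentials(demo), [1, 2, 3, 4, 5]))
```

With two accounts, the five ids are split into batches of three and two, and the batches are fetched concurrently.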
