Brexit Tweets Analysis

This is an optional project developed for the course "054306 - UNSTRUCTURED AND STREAMING DATA ENGINEERING" during my studies at Politecnico di Milano.

OVERVIEW (full details in report.pdf)

The project analyzes tweets about Brexit and plots diagrams of the most frequent words across several dimensions: political stance, sentiment and language. The original CSV dataset is available at: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/KP4XRP.

The tweet data are gathered by a multithreaded Python script, which queries the Twitter APIs in parallel from multiple accounts. The script extracts the most salient words by applying intermediate filtering and transformation steps with the nltk library. It then stores the results in a MongoDB database at two granularities: per-tweet and per-user aggregate arrays of <word,count> tuples.
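The extraction step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code. The project uses nltk, but this sketch swaps in a regex tokenizer and a small hardcoded stopword set (a hypothetical stand-in for nltk's English stopword list) so that it runs offline. It shows how one tweet becomes a list of (word, count) tuples and how a user's tweets are merged into a single aggregate.

```python
import re
from collections import Counter

# Hypothetical stand-in for nltk's English stopword list,
# hardcoded so the sketch runs without downloading nltk corpora.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "it",
             "for", "on", "we", "be", "this", "that"}

TOKEN_RE = re.compile(r"[a-z']+")

def salient_words(text):
    """Lowercase, strip links and @mentions, tokenize, drop stopwords and short tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)  # remove URLs and @mentions
    text = text.replace("#", " ")                    # keep hashtag words, drop the '#'
    tokens = TOKEN_RE.findall(text)
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]

def tweet_word_counts(text):
    """Per-tweet granularity: sorted list of (word, count) tuples."""
    return sorted(Counter(salient_words(text)).items())

def user_word_counts(tweets):
    """Per-user granularity: counts summed over all of one user's tweets."""
    total = Counter()
    for text in tweets:
        total.update(salient_words(text))
    return sorted(total.items())
```

For example, `tweet_word_counts("Brexit means Brexit! #Brexit https://t.co/xyz @user")` yields `[("brexit", 3), ("means", 1)]`. Lists like these could then be written to MongoDB, where pymongo encodes Python tuples as BSON arrays.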

Finally, several Python scripts extract the most frequently used words in the English-language tweets by running a set of MapReduce queries over the MongoDB database. The outputs are plots that break down word frequencies by parameters such as political stance, sentiment and language.

The same kind of analysis is performed for four other major European languages (IT, FR, DE, ES). In this case, however, the output is limited to the most used words for each language.
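The aggregation described in the last two paragraphs follows a map, shuffle and reduce pattern. MongoDB's own mapReduce command runs JavaScript map/reduce functions on the server. The sketch below instead reproduces the same three phases in plain Python over in-memory documents, so it is only illustrative. The document fields (`lang`, `stance`, `words`) are hypothetical assumptions, not the project's actual schema.

```python
from collections import defaultdict

def map_phase(doc, dimension):
    """Emit ((group value, word), count) for every word stored on one tweet."""
    for word, count in doc["words"]:
        yield (doc[dimension], word), count

def reduce_phase(counts):
    """Sum all partial counts emitted for the same (group, word) key."""
    return sum(counts)

def top_words_by(docs, dimension, lang="en", k=3):
    """Top-k words per value of `dimension`, restricted to tweets in `lang`."""
    # Map + shuffle: collect the emitted counts under their key.
    grouped = defaultdict(list)
    for doc in docs:
        if doc["lang"] != lang:
            continue
        for key, count in map_phase(doc, dimension):
            grouped[key].append(count)
    # Reduce each key, then regroup the totals by dimension value.
    per_group = defaultdict(list)
    for (group, word), counts in grouped.items():
        per_group[group].append((word, reduce_phase(counts)))
    # Rank by descending count, breaking ties alphabetically.
    return {
        group: sorted(pairs, key=lambda p: (-p[1], p[0]))[:k]
        for group, pairs in per_group.items()
    }
```

Calling `top_words_by(docs, "stance")` gives the per-stance view for English tweets. Calling `top_words_by(docs, "lang", lang="it")` gives the simpler "most used words per language" output used for IT, FR, DE and ES.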

How to run the code on your PC (Unix)

Install MongoDB (https://docs.mongodb.com/manual/administration/install-on-linux/).
Extract the database archive ./db/mongoDB_backup/db_compressed.rar (it is a RAR archive, so use a tool such as unrar).
Import the database into MongoDB by running this command from the main directory, substituting your own MongoDB user and password: mongorestore -d brexit ./db/mongoDB_backup/brexit/ -u Admin -p Password --authenticationDatabase admin
Add one or more Twitter API developer keys to ./twitter_analyzers/credentials.csv (the more accounts you use, the higher the request throughput).
Run ./twitter_analyzers/multithread-tweets-analyzer.py to fetch new tweet data.
Run the Python scripts in the ./analysis_scripts/ folder to generate the plots.
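The multi-account throughput mentioned in the steps above comes from giving each account its own worker thread. The sketch below is a minimal illustration under stated assumptions. The CSV column names (`name`, `consumer_key`, `consumer_secret`) are hypothetical, since the real layout of credentials.csv is not documented here. `fetch_tweets` is also a hypothetical placeholder that makes no real API request.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor

def load_credentials(fileobj):
    """Read one API account per CSV row (column names are hypothetical)."""
    return list(csv.DictReader(fileobj))

def partition(items, n):
    """Split the work items round-robin into n chunks, one per account."""
    return [items[i::n] for i in range(n)]

def fetch_tweets(account, tweet_ids):
    """Hypothetical placeholder for the per-account Twitter API calls.

    A real worker would authenticate with this account's keys and request
    the tweets; here it only tags each ID with the account that handled it.
    """
    return [(account["name"], tweet_id) for tweet_id in tweet_ids]

def fetch_all(credentials, tweet_ids):
    chunks = partition(tweet_ids, len(credentials))
    # One worker thread per account: the calls are I/O-bound, so the threads
    # overlap their network waits and each account's rate limit is used in parallel.
    with ThreadPoolExecutor(max_workers=len(credentials)) as pool:
        results = list(pool.map(fetch_tweets, credentials, chunks))
    # Flatten the per-account results; pool.map preserves account order.
    return [row for chunk in results for row in chunk]

if __name__ == "__main__":
    sample = io.StringIO("name,consumer_key,consumer_secret\nacc1,k1,s1\nacc2,k2,s2\n")
    accounts = load_credentials(sample)
    print(fetch_all(accounts, [101, 102, 103, 104, 105]))
```

With two accounts, the five IDs are split as [101, 103, 105] and [102, 104]. Adding a third row to the CSV would spread the same work over three threads.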

Repository contents

analysis_scripts/
db/
readme_src/
twitter_analyzers/
README.md
explorative_analysis.ipynb
report.pdf

Repository: matbelcao/brexit-tweets-analysis
