abhijajal/Social-Media-Data-Analysis-for-Accident-Reports

This project collects social media data (from Twitter) and analyzes it in real time to retrieve information about roadside traffic accidents, such as the location of the incident, its severity, and the type of injuries. Built with Python, Spark, and Apache Kafka.

GPL-3.0 license

Files in this repository:

.gitattributes
Air.wav
Boston4C.csv
Chicago4C.csv
LICENSE
Memphis4C.csv
NYC4C.csv
Project Proposal_Social Media Data Analysis for Accident Reports.pdf
SanFrancisco4Classes.csv
Seattle4Classes.csv
WED night.txt
accidentalTweetsLR.txt
accidentalTweetsNB.txt
analyzingEachTxtFile.py
analyzingLabeledTweetsDataset.py
archiveProducer.py
archiveSearch.py
big_data.py
combineDataset.py
count.py
data.json
divideDataset.py
finalDataset.txt
gatheringTweets.py
labeledTweetDataset.txt
oldTweets.txt
pom.xml
pos_loc.py
positiveDataset.py
positiveDataset.txt
readme.txt
sample.txt
setup.cfg
setup.py
severity_classifier.py
spacy_test.py
test.sh
trainModels.py
tweetDataset.txt
tweetDataset_Banerjee.txt
tweetDataset_Divya.txt
tweetDataset_Jacky.txt
tweetDataset_Jajal.txt
tweetDataset_Matt.txt
tweet_producer_archiveKeyword.py
tweet_producer_keyword.py
tweet_producer_location.py
tweets with less keyworfs.txt
tweets.txt
twitter_consumer.py

Steps for running this project:

1. Run "trainModels.py" to train the models offline on the data stored in finalDataset.txt.
2. Running it creates two folders that store the trained models.
3. Open a terminal, start ZooKeeper and the Kafka server, and create a topic named "twitter".
4. Run "twitter_producer.py".
5. Run "twitter_consumer.py".
6. Visualize the data in Kibana under the indexes twitternb and twitterlr.
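
Steps 4 and 5 pass tweets through the "twitter" topic. The following is a minimal sketch of that hand-off, not the code in this repository: it assumes the third-party kafka-python package, a broker on localhost:9092, and tweets represented as plain dicts. The function names and the "id" and "text" fields are hypothetical.

```python
import json


def encode_tweet(tweet: dict) -> bytes:
    """Serialize a tweet dict to UTF-8 JSON bytes for use as a Kafka message value."""
    return json.dumps(tweet, ensure_ascii=False).encode("utf-8")


def decode_tweet(raw: bytes) -> dict:
    """Inverse of encode_tweet: parse a Kafka message value back into a dict."""
    return json.loads(raw.decode("utf-8"))


def send_tweets(tweets, topic="twitter", servers="localhost:9092"):
    """Publish an iterable of tweet dicts to the topic.

    Assumption: kafka-python is installed and a broker is reachable.
    """
    from kafka import KafkaProducer  # third-party; imported lazily

    producer = KafkaProducer(bootstrap_servers=servers,
                             value_serializer=encode_tweet)
    for tweet in tweets:
        producer.send(topic, value=tweet)
    producer.flush()  # block until buffered messages have been sent


def receive_tweets(topic="twitter", servers="localhost:9092"):
    """Yield tweet dicts from the topic.

    Reading starts at the earliest offset when no committed offset exists.
    """
    from kafka import KafkaConsumer  # third-party; imported lazily

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=servers,
        auto_offset_reset="earliest",
        value_deserializer=decode_tweet,
    )
    for message in consumer:
        yield message.value


if __name__ == "__main__":
    # Round trip without a broker: only the serialization helpers are exercised.
    sample = {"id": 1, "text": "Crash on I-90 near exit 12"}
    print(decode_tweet(encode_tweet(sample)) == sample)
```

Keeping serialization in two small functions means the producer and the consumer agree on the message format, and the round trip can be checked without starting Kafka.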

Commands for running Kafka (run from the Kafka installation directory):
Starting ZooKeeper:
bin/zookeeper-server-start.sh config/zookeeper.properties

Starting the Kafka Server:
bin/kafka-server-start.sh config/server.properties

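Creating a topic:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic twitter
(Kafka 3.0 and later removed the --zookeeper option from kafka-topics.sh; on those versions, pass --bootstrap-server localhost:9092 instead. The same applies to the --list command below.)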

List all the topics:
bin/kafka-topics.sh --list --zookeeper localhost:2181

Running the console producer (to send test messages by hand):
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic twitter

Running the console consumer (to check what is on the topic):
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic twitter --from-beginning
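
To script the commands above instead of typing them, a small helper can build them as argument lists. This is a sketch under assumptions: the function names are hypothetical, kafka_home is wherever Kafka is unpacked, and the flags are copied unchanged from the commands above (so it carries the same Kafka version caveats).

```python
import shlex
import subprocess


def kafka_commands(kafka_home, topic="twitter"):
    """Return the Kafka commands from this readme as argv lists, keyed by step."""
    b = f"{kafka_home}/bin"
    c = f"{kafka_home}/config"
    return {
        "zookeeper": [f"{b}/zookeeper-server-start.sh", f"{c}/zookeeper.properties"],
        "server": [f"{b}/kafka-server-start.sh", f"{c}/server.properties"],
        "create_topic": [
            f"{b}/kafka-topics.sh", "--create",
            "--zookeeper", "localhost:2181",
            "--replication-factor", "1",
            "--partitions", "1",
            "--topic", topic,
        ],
        "list_topics": [f"{b}/kafka-topics.sh", "--list",
                        "--zookeeper", "localhost:2181"],
    }


def run_step(name, kafka_home, topic="twitter"):
    """Run one step and raise CalledProcessError if it exits non-zero.

    Note: the "zookeeper" and "server" steps run in the foreground, so this
    call blocks for as long as that process is alive.
    """
    cmd = kafka_commands(kafka_home, topic)[name]
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands for a hypothetical install path without running them.
    for step, cmd in kafka_commands("/opt/kafka").items():
        print(f"{step}: {shlex.join(cmd)}")
```

Passing argument lists to subprocess.run avoids shell quoting problems if a topic name ever contains unusual characters.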