Analysis of tweets related to Covid-19 extracted from Twitter API configured to run on Airflow

Shashank-sigmoid/covid-tweet-analysis-1

 
 


Covid Tweet Analysis

Problem Statement

Link

Architecture

[Figure: architecture diagram]

This repository is used to build the JAR file required for the Airflow deployment.

The configuration in every file is set up for the Airflow deployment.

Steps to create the JAR file:

  1. Install sbt on your system (on macOS, with Homebrew):
brew install sbt
  2. In the TwitterToKafka and Query5_TwitterToKafka files, change the Kafka bootstrap servers from localhost:9092 to kafka:9092, so that the producer connects to the kafka host of the Airflow deployment rather than to localhost:
props.put("bootstrap.servers", "kafka:9092")
  3. In KafkaToMongo and Query5_KafkaToMongo, update the Spark session builder configuration so that the MongoDB input and output URIs use the MongoDB container name (mongo):
  .config("spark.mongodb.input.uri", "mongodb://root:root@mongo:27017")
  .config("spark.mongodb.output.uri", "mongodb://root:root@mongo:27017")
  4. In a terminal, navigate to the project directory and run:
sbt clean assembly

This command creates a single JAR file that bundles the project's compiled classes, resources, and all library dependencies.
Building it takes some time (around 5 minutes) and the resulting JAR is large (200+ MB),
so it should not be pushed to the remote repository.
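Rather than editing the host names by hand before each build (localhost:9092 for local runs, kafka:9092 and mongo for the Airflow deployment), the connection settings could be read from environment variables at start-up. The sketch below is not part of this project: it assumes two hypothetical variables, KAFKA_BOOTSTRAP_SERVERS and MONGO_URI, and falls back to the Airflow values listed above when they are unset.

```scala
import java.util.Properties

// Hypothetical helper (not part of this project): reads connection settings
// from environment variables so the same JAR can run locally and in Airflow.
// KAFKA_BOOTSTRAP_SERVERS and MONGO_URI are assumed variable names.
object ConnectionSettings {
  // Kafka bootstrap servers; defaults to the Airflow value kafka:9092.
  def kafkaBootstrapServers(env: Map[String, String] = sys.env): String =
    env.getOrElse("KAFKA_BOOTSTRAP_SERVERS", "kafka:9092")

  // URI for spark.mongodb.input.uri / spark.mongodb.output.uri;
  // defaults to the mongo container used in the Airflow deployment.
  def mongoUri(env: Map[String, String] = sys.env): String =
    env.getOrElse("MONGO_URI", "mongodb://root:root@mongo:27017")

  // Producer properties with only bootstrap.servers set; serializers and
  // any other settings the producer needs still have to be added.
  def kafkaProducerProps(env: Map[String, String] = sys.env): Properties = {
    val props = new Properties()
    props.put("bootstrap.servers", kafkaBootstrapServers(env))
    props
  }
}
```

With a helper like this, the two .config calls in KafkaToMongo would take ConnectionSettings.mongoUri() as their value, and a local run could set KAFKA_BOOTSTRAP_SERVERS=localhost:9092 in the shell before starting the JAR, without rebuilding it.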

  5. To run an object contained in the JAR file, use the following command in a terminal:
java -cp <JAR_file_name> <object_name>
  6. In Airflow, this same command is run through a BashOperator.
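The <object_name> passed to java -cp is a Scala object that defines a main method: the Scala compiler gives such an object a static main that the JVM can start directly, and the assembled JAR already bundles the Scala library it needs. The hypothetical object below is itself runnable that way and prints one command per job object named in this README. The JAR file name is a placeholder, and if the objects are declared inside a package, their fully qualified names must be used instead.

```scala
// Hypothetical entry point (not part of this project):
// `java -cp <JAR_file_name> AirflowCommands [jar]` would run this main method,
// just as it runs any other object in the JAR.
object AirflowCommands {
  // Job objects named in this README. Use fully qualified names
  // (e.g. "pkg.TwitterToKafka") if they are declared inside a package.
  val jobs: Seq[String] = Seq(
    "TwitterToKafka",
    "KafkaToMongo",
    "Query5_TwitterToKafka",
    "Query5_KafkaToMongo"
  )

  // Builds the shell command that runs one object from the JAR.
  def command(jar: String, objectName: String): String =
    s"java -cp $jar $objectName"

  def main(args: Array[String]): Unit = {
    // Placeholder JAR name; pass the real file name as the first argument.
    val jar = args.headOption.getOrElse("covid-tweet-analysis.jar")
    jobs.foreach(job => println(command(jar, job)))
  }
}
```

Each printed line is the kind of string that could be passed as the bash_command of an Airflow BashOperator task. The list order is only illustrative and does not say in which order the jobs must run.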


Languages

  • Scala 89.4%
  • JavaScript 10.6%