Streaming Twitter data using Apache Spark


Synopsis


A simple Spark application that connects to Twitter and prints tweets, optionally restricted by a filter.
The application can be run as a standalone application or on Hadoop.

Motivation


This project aims to help developers and researchers connect to Twitter using Apache Spark.

Execution


Prerequisites:
1) If you are running on Hadoop, ensure that ${HADOOP_CONF_DIR} and ${HADOOP_HOME} are set.

Instructions to run the application using an IDE:
1) Edit the run configuration to include the following arguments, in this order: [args0 - consumerKey] [args1 - consumerSecret] [args2 - accessToken] [args3 - accessTokenSecret]
2) Run the SparkApplication class, which contains the main method. (Optional: edit the FILTERS array to filter the tweets received.)

Instructions to run the application on the command line:
1) Ensure Maven is installed, then run "mvn clean package".
2) The target folder should now contain a jar file with dependencies. Run "java -jar [generated_jar].jar [args0 - consumerKey] [args1 - consumerSecret] [args2 - accessToken] [args3 - accessTokenSecret]".
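
Both run modes pass the same four positional arguments. The sketch below shows one way such arguments might be validated and handed to Twitter4J, which reads OAuth credentials from system properties prefixed with twitter4j.oauth. The class name TwitterArgs and the method applyCredentials are hypothetical and used only for illustration; whether SparkApplication handles its arguments this way is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of this project: shows how the four
// command-line arguments could be mapped to Twitter4J OAuth properties.
class TwitterArgs {
    // Order matches the documented arguments: args0..args3.
    private static final List<String> KEYS = Arrays.asList(
            "consumerKey", "consumerSecret", "accessToken", "accessTokenSecret");

    // Validates the argument count and sets one system property per credential.
    static void applyCredentials(String[] args) {
        if (args == null || args.length < KEYS.size()) {
            throw new IllegalArgumentException(
                    "Usage: <consumerKey> <consumerSecret> <accessToken> <accessTokenSecret>");
        }
        for (int i = 0; i < KEYS.size(); i++) {
            System.setProperty("twitter4j.oauth." + KEYS.get(i), args[i]);
        }
    }

    public static void main(String[] args) {
        applyCredentials(args);
        // Assumption: the real application would create its streaming context here.
        System.out.println("Twitter4J OAuth properties set.");
    }
}
```

Failing fast on a missing argument gives a clear usage message instead of an authentication error later, once the stream has already been started.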