Streaming Twitter data using Apache Spark

Synopsis

A simple Spark application that connects to Twitter and prints tweets, optionally restricted by a filter.
The application can be run as a standalone application or on Hadoop.

Motivation

This project aims to help developers and researchers connect to Twitter using Apache Spark.

Execution

Prerequisites:
1) If you are running on Hadoop, ensure ${HADOOP_CONF_DIR} and ${HADOOP_HOME} are set.
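
A quick pre-flight check for the prerequisite above can be sketched in Java as follows. This is only an illustration and not part of the project: the class name HadoopEnvCheck and its method are hypothetical, and the check only confirms that the two variables are non-empty, not that they point to a valid Hadoop installation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of this project.
final class HadoopEnvCheck {
    // The two environment variables named in the prerequisites.
    static final String[] REQUIRED = {"HADOOP_CONF_DIR", "HADOOP_HOME"};

    // Returns the names of required variables that are absent or blank in env.
    static List<String> missing(Map<String, String> env) {
        List<String> result = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // System.getenv() returns the environment of the current process.
        List<String> unset = missing(System.getenv());
        if (unset.isEmpty()) {
            System.out.println("Hadoop environment variables are set.");
        } else {
            System.out.println("Missing environment variables: " + unset);
        }
    }
}
```

Passing the environment in as a map, rather than reading it inside the method, keeps the check easy to exercise without changing the real environment.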

Instructions to run the application using an IDE:
1) Edit the run configuration to pass the following program arguments, in this order: [args0 - consumerKey] [args1 - consumerSecret] [args2 - accessToken] [args3 - accessTokenSecret]
2) Run the SparkApplication class, which contains the main method (Optional: edit the FILTERS array to restrict which tweets are received)
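
To illustrate how the four positional arguments might be consumed, here is a minimal Java sketch. It assumes the credentials are handed to twitter4j through its `twitter4j.oauth.*` system properties, which is how Spark's bundled Twitter examples do it; whether SparkApplication does the same is an assumption, and the class TwitterCredentials and its methods are hypothetical names used only for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of this project.
final class TwitterCredentials {
    // System properties that twitter4j reads OAuth credentials from,
    // listed in the same order as the program arguments.
    static final String[] PROPERTY_NAMES = {
        "twitter4j.oauth.consumerKey",
        "twitter4j.oauth.consumerSecret",
        "twitter4j.oauth.accessToken",
        "twitter4j.oauth.accessTokenSecret"
    };

    // Maps args[0..3] to the property names above; fails fast if any are missing.
    static Map<String, String> fromArgs(String[] args) {
        if (args.length < PROPERTY_NAMES.length) {
            throw new IllegalArgumentException(
                "Usage: <consumerKey> <consumerSecret> <accessToken> <accessTokenSecret>");
        }
        Map<String, String> props = new LinkedHashMap<>();
        for (int i = 0; i < PROPERTY_NAMES.length; i++) {
            props.put(PROPERTY_NAMES[i], args[i]);
        }
        return props;
    }

    // Publishes the credentials as JVM system properties.
    static void apply(Map<String, String> props) {
        for (Map.Entry<String, String> e : props.entrySet()) {
            System.setProperty(e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        apply(fromArgs(args));
        System.out.println("Twitter OAuth properties set: " + PROPERTY_NAMES.length);
    }
}
```

Validating the argument count up front gives a clear usage message instead of an ArrayIndexOutOfBoundsException when a credential is left out of the run configuration.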

Instructions to run the application on the command line:
1) Ensure Maven is installed and enter "mvn clean package"
2) In the target folder, you should see a jar file with dependencies. Run "java -jar [generated_jar].jar [args0 - consumerKey] [args1 - consumerSecret] [args2 - accessToken] [args3 - accessTokenSecret]"
