
Thomas-George-T/Twitter-streaming-using-Flume-and-Hive

Description

This project streams and ingests the live Twitter feed using Apache Flume. The tweets are stored in a Hive data lake in Avro format. This data can be cleansed using tools such as OpenRefine or Pig, and the cleansed data can then be used for visualization.


Prerequisites

To run this software you need the following:

  1. Linux
  2. Hadoop 2.0
  3. Hive 2.0
  4. Flume
  5. Twitter Developer App Credentials

Steps

  1. Get credentials for a Twitter developer app.

  2. Write a twitter.conf file and replace the placeholder variables with the secret keys issued by Twitter; a sketch follows.
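
     A minimal sketch of what twitter.conf might look like, using Flume's
     bundled TwitterSource and an HDFS sink. The keys, keywords, and HDFS
     path below are placeholders and assumptions, not values taken from
     this repository:

    # Agent wiring: one Twitter source, one memory channel, one HDFS sink
    TwitterAgent.sources = Twitter
    TwitterAgent.channels = MemChannel
    TwitterAgent.sinks = HDFS

    # Flume's built-in Twitter source emits events in Avro format
    TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
    TwitterAgent.sources.Twitter.channels = MemChannel
    TwitterAgent.sources.Twitter.consumerKey = <consumerKey>
    TwitterAgent.sources.Twitter.consumerSecret = <consumerSecret>
    TwitterAgent.sources.Twitter.accessToken = <accessToken>
    TwitterAgent.sources.Twitter.accessTokenSecret = <accessTokenSecret>
    TwitterAgent.sources.Twitter.keywords = hadoop, hive, flume

    # Land the raw Avro events in HDFS where Hive can read them
    TwitterAgent.sinks.HDFS.type = hdfs
    TwitterAgent.sinks.HDFS.channel = MemChannel
    TwitterAgent.sinks.HDFS.hdfs.path = /user/flume/tweets
    TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
    TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

    TwitterAgent.channels.MemChannel.type = memory
    TwitterAgent.channels.MemChannel.capacity = 10000
    TwitterAgent.channels.MemChannel.transactionCapacity = 1000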

  3. Launch the Flume agent with twitter.conf from the terminal:

    flume-ng agent -n TwitterAgent -f $FLUME_CONF_DIR/twitter.conf
    
  4. Get the schema from the Avro log file; Avro embeds the schema as JSON in the file header, so it appears in the first lines:

    hdfs dfs -cat /user/flume/tweets/FlumeData.* | head
    
  5. Copy the schema and save it in a file called TwitterDataAvroSchema.avsc.

  6. Edit the file for readability.
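
     After formatting, the schema written by Flume's stock TwitterSource
     looks roughly like the sketch below. The field list here is abridged
     and illustrative; use the exact schema found in your own FlumeData
     files:

    {
      "type": "record",
      "name": "Doc",
      "fields": [
        {"name": "id", "type": "string"},
        {"name": "created_at", "type": ["string", "null"]},
        {"name": "user_screen_name", "type": ["string", "null"]},
        {"name": "text", "type": ["string", "null"]},
        {"name": "retweet_count", "type": ["long", "null"]}
      ]
    }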

  7. Write an HQL file called avrodataread.q that creates the tweets table using the AvroSerDe, referencing the Avro schema file in TBLPROPERTIES; see the sketch below.
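
     A minimal sketch of what avrodataread.q could contain. The HDFS
     locations for the Flume data and the schema file are assumptions;
     adjust them to match your setup:

    -- External table over the raw Avro files written by Flume.
    -- Columns come from the schema file named in TBLPROPERTIES,
    -- so no column list is declared here.
    CREATE EXTERNAL TABLE tweets
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    LOCATION '/user/flume/tweets'
    TBLPROPERTIES ('avro.schema.url'='hdfs:///user/flume/TwitterDataAvroSchema.avsc');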

  8. Execute the file from the terminal, quoting the path since it contains a space:

    hive -f "FlumeHiveTwitterApp/Hive scripts/avrodataread.q"
    
  9. To create a table for processing or for visualization, execute the file named create_tweets_avro_table.q, again quoting the path:

    hive -f "FlumeHiveTwitterApp/Hive scripts/create_tweets_avro_table.q"
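
     The actual script ships with the repository; a minimal version might
     simply materialize the columns of interest into a managed Avro table,
     along these lines (the column names assume the stock TwitterSource
     schema):

    -- Hypothetical sketch: copy selected fields into an Avro-backed
    -- table that cleansing and visualization tools can query directly
    CREATE TABLE tweets_avro
    STORED AS AVRO
    AS SELECT id, created_at, user_screen_name, text, retweet_count
    FROM tweets;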
    
  10. Cleanse the data using tools such as Pig or OpenRefine.

  11. Visualize the data in a dashboard using tools such as Tableau or D3.js.
