Skip to content

moki298/HadoopMapReduce

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 

Repository files navigation

Project3 Hadoop MapReduce Analysis - Books Set

The data set used in this project is available at openlibrary.org. The data will be analyzed using Hadoop.

Dump location: http://openlibrary.org/data/ol_dump_works_latest.txt.gz

The instructions for using this code on hadoop server that supports HadoopStreamingAPI are given below

As it has huge data we are going to analyse it using hadoop streaming API

Go to home directory

Intially, clone the files in this repository.

Once the repository is cloned, execute the below command on hadoop cluster, by changing the output path in the command

hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar -input /data/openlibrary/ol_dump_works_latest-20161202.txt -output /users-cloud-16fs/ballima/project3-out/output2 -mapper ~/project3/mapper.py -reducer ~/project3/reducer.py -file ~/project3/{mapper,reducer}.py

##Output

Once the job execution is complete the output can be seen in hdfs using below commad

hdfs dfs -cat /users-cloud-16fs/ballima/project3-out/output2/part-00000

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages