vikrambadhan/Hadoop_MapReduce_N-Gram_Count

MapReduce_N-Gram_Count

Hadoop MapReduce jobs to compute n-gram counts.

The submission is written in Python and was tested on the NYU Dataproc Hadoop cluster.

To run the code: mapred streaming -input hw1.txt -output <outputfile> -mapper "python mapper.py" -reducer "python reducer.py" -file mapper.py -file reducer.py

This runs the first job and stores its output in <outputfile>.
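For readers unfamiliar with Hadoop streaming, the following is a minimal sketch of what an n-gram mapper and reducer pair could look like. It is not the repository's actual mapper.py or reducer.py: the function names `map_lines` and `reduce_lines`, the tokenization rule, and the choice of n = 2 are all assumptions made for illustration.

```python
import re
from itertools import groupby

N = 2  # assumed n-gram size; the README does not state which n is used


def map_lines(lines, n=N):
    """Mapper step (hypothetical): emit 'ngram<TAB>1' for every n-gram in a line."""
    for line in lines:
        # Assumed tokenization: lowercase runs of letters, digits, apostrophes.
        words = re.findall(r"[a-z0-9']+", line.lower())
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n]) + "\t1"


def reduce_lines(lines):
    """Reducer step (hypothetical): sum the counts for each key.

    Input must be sorted by key, which Hadoop streaming does between
    the map and reduce phases.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"
```

In a streaming job each function would read from `sys.stdin` and print its results, one per line. The pipeline can be checked locally without a cluster by calling `reduce_lines(sorted(map_lines(open("hw1.txt"))))`, where `sorted` stands in for Hadoop's shuffle phase.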

Use this output as the input to the second job and run: mapred streaming -input -output -mapper "python mapper2.py" -reducer "python reducer2.py" -file mapper2.py -file reducer2.py

The output will be stored as a .txt file, which we can read to check the results.

We read it using the command:

hdfs dfs -cat .txt/par*
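The text printed by that command can also be inspected programmatically. The sketch below assumes each output line has the form `key<TAB>count`, which is the default key/value layout for Hadoop streaming; the actual format written by reducer2.py is not documented here, and the helper names `parse_counts` and `top_k` are hypothetical.

```python
def parse_counts(lines):
    """Parse 'key<TAB>count' lines into a dict (assumed output format)."""
    counts = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the last tab so keys containing tabs stay intact.
        key, _, value = line.rpartition("\t")
        counts[key] = counts.get(key, 0) + int(value)
    return counts


def top_k(counts, k=10):
    """Return the k most frequent keys, ties broken alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

One way to use it is to pipe the output of the `hdfs dfs -cat` command into a small script that calls `top_k(parse_counts(sys.stdin))` and prints the result.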
