Add word-count and n-gram proj dir
GuruMulay committed Sep 8, 2018
1 parent 5998062 commit 7f5add0
Showing 14 changed files with 169 additions and 0 deletions.
3 changes: 3 additions & 0 deletions README
@@ -0,0 +1,3 @@
This repo contains some of the projects from my Big Data class.
I am cleaning up my class folder before putting it on GitHub, so there will be several commits.
Some of the code was provided to us by the GTA to give us a head start.
Binary file added n-gram-analysis-of-gutenberg/n-gram.png
Binary file added word-count-hadoop-map-reduce/bin/MainClass.class
Binary file not shown.
34 changes: 34 additions & 0 deletions word-count-hadoop-map-reduce/src/MainClass.java
@@ -0,0 +1,34 @@
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MainClass {
    public static void main(String[] args) throws IOException, ClassNotFoundException,
            InterruptedException {
        if (args.length != 2) {
            System.out.printf("Usage: <jar file> <input dir> <output dir>\n");
            System.exit(-1);
        }
        Configuration conf = new Configuration(); // loads the Hadoop configuration files
        Job job = Job.getInstance(conf);
        job.setJarByClass(MainClass.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        // The mapper emits (Text, Text); the reducer emits (Text, IntWritable).
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Exit with 0 on success and 1 on failure.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

13 changes: 13 additions & 0 deletions word-count-hadoop-map-reduce/src/WordCountMapper.java
@@ -0,0 +1,13 @@
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Emit (word, "one") for every space-separated token in the input line.
        String[] words = value.toString().split(" ");
        for (String word : words) {
            if (word.isEmpty()) {
                continue; // consecutive spaces produce empty tokens; skip them
            }
            context.write(new Text(word), new Text("one"));
        }
    }
}
14 changes: 14 additions & 0 deletions word-count-hadoop-map-reduce/src/WordCountReducer.java
@@ -0,0 +1,14 @@
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, Text, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Each value marks one occurrence of the word, so the count is the number of values.
        int count = 0;
        for (Text val : values) {
            count++;
        }
        context.write(key, new IntWritable(count));
    }
}
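The data flow through the mapper and reducer above can be traced without a cluster. The following is a minimal, Hadoop-free sketch, not part of the commit: the class `WordCountSketch` and its methods are hypothetical names, and a `TreeMap` stands in for the framework's sort-and-group step between map and reduce. It is run here on the single line from test1/textfile.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free illustration; not part of the commit.
class WordCountSketch {

    // Map phase: emit one (word, "one") pair per space-separated token,
    // mirroring what WordCountMapper writes to its context.
    static List<String[]> map(String line) {
        List<String[]> pairs = new ArrayList<>();
        for (String word : line.split(" ")) {
            pairs.add(new String[] {word, "one"});
        }
        return pairs;
    }

    // Shuffle phase: group values by key. The TreeMap keeps keys sorted,
    // as a stand-in for the sort Hadoop performs before reduce.
    static Map<String, List<String>> shuffle(List<String[]> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : pairs) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return grouped;
    }

    // Reduce phase: the count for a key is the number of values it received.
    static Map<String, Integer> reduce(Map<String, List<String>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            counts.put(entry.getKey(), entry.getValue().size());
        }
        return counts;
    }

    // Convenience wrapper chaining the three phases.
    static Map<String, Integer> countWords(String line) {
        return reduce(shuffle(map(line)));
    }

    public static void main(String[] args) {
        String line = "this is a test text file. this file is the input to wordcount";
        countWords(line).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

Because the split is on single spaces only, `file` and `file.` remain distinct keys, which matches the two separate entries in test1/part-r-00000.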
Empty file.
11 changes: 11 additions & 0 deletions word-count-hadoop-map-reduce/test1/part-r-00000
@@ -0,0 +1,11 @@
a 1
file 1
file. 1
input 1
is 2
test 1
text 1
the 1
this 2
to 1
wordcount 1
1 change: 1 addition & 0 deletions word-count-hadoop-map-reduce/test1/textfile
@@ -0,0 +1 @@
this is a test text file. this file is the input to wordcount
Empty file.
93 changes: 93 additions & 0 deletions word-count-hadoop-map-reduce/test2/part-r-00000
@@ -0,0 +1,93 @@
7 1
a 1
against 1
alive 1
and 1
are 1
at 1
birds 1
books 1
borne 2
but 4
cage 1
caged 1
called 1
cannot 1
cent 1
century 1
consider 1
daughter 1
did 1
enough 1
even 1
felt 1
filled 1
for 3
give 1
grandmother 1
had 5
happening 1
have 2
her 2
his 1
house 2
hundred 1
husband 2
i 7
if 1
important 1
in 4
is 2
it 5
its 1
law 1
least 1
leave 1
left 1
liked 1
likewise 1
mine 1
more 1
much 1
my 2
never 1
not 3
of 5
on 1
others 1
pain 1
per 1
prepared 1
rajah 1
read 1
reason 1
refused 1
room 1
s 1
seclusion 1
she 3
so 1
speak 1
still 1
subject 1
talking 1
taste 1
than 1
that 4
the 9
then 1
there 1
this 3
to 2
twentieth 1
twenty 1
uncomplaining 1
universe 1
was 3
way 1
we 1
what 1
why 1
with 1
would 1
zenana 1
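The keys in this output are all lowercase and free of punctuation, and include fragments such as `s` and `7`, whereas the mapper in this commit only splits on single spaces. The commit does not show the test2 input or say how it was prepared, so the following is only a sketch of one normalization that could yield tokens of this shape. `TokenNormalizerSketch`, `tokenize`, and the sample string are hypothetical and are not taken from the repository.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration; the commit does not contain this code.
class TokenNormalizerSketch {

    // Lowercase the line, replace every run of characters other than
    // a-z and 0-9 with a single space, then split on spaces.
    static List<String> tokenize(String line) {
        String cleaned = line.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", " ")
                .trim();
        List<String> tokens = new ArrayList<>();
        if (cleaned.isEmpty()) {
            return tokens; // nothing but punctuation or whitespace
        }
        for (String token : cleaned.split(" ")) {
            tokens.add(token);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Made-up sample text, not the actual test2 input.
        System.out.println(tokenize("The Rajah's 7 books"));
    }
}
```

Under this assumption an apostrophe becomes a space, so a possessive such as `Rajah's` splits into `rajah` and `s`, which is one plausible source of the standalone `s` key above.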
Binary file added word-count-hadoop-map-reduce/word-count.png