This repository contains solutions to common mapper and reducer problems in Hadoop using Python. Most online resources for Hadoop are geared towards Java environments, so this repository aims to provide Python solutions for Hadoop streaming.
- Watch this video for Hadoop installation on Windows.
- Follow this video for Hadoop installation on Ubuntu.
- Format the NameNode:

  ```shell
  hdfs namenode -format
  ```

- Start the Hadoop services:

  ```shell
  start-all.sh
  ```

- Create an input directory in HDFS:

  ```shell
  hdfs dfs -mkdir /input
  ```

- Upload the input file to HDFS:

  ```shell
  hdfs dfs -put /path/to/input.txt /input/input.txt
  ```

- Run the Hadoop streaming job:

  ```shell
  hadoop jar /path/to/hadoop-streaming.jar \
      -input /input/input.txt \
      -output /output \
      -file "/path/to/mapper.py" \
      -mapper "python3 mapper.py" \
      -file "/path/to/reducer.py" \
      -reducer "python3 reducer.py"
  ```

- Copy the output from HDFS to a local file:

  ```shell
  hdfs dfs -text /output/* > /path/to/outputfile.txt
  ```

- Remove the output and input directories from HDFS:

  ```shell
  hadoop fs -rm -r /output
  hadoop fs -rm -r /input
  ```
You can test the mapper and reducer scripts separately to ensure they work correctly:
- Test the mapper script:

  ```shell
  cat /path/to/input.txt | python3 /path/to/mapper.py
  ```

- Test the reducer script:

  ```shell
  cat /path/to/mapper_output.txt | python3 /path/to/reducer.py
  ```

Note that Hadoop sorts the mapper output by key before it reaches the reducer, so pipe the mapper output through `sort` when testing the full pipeline locally.
**Recommendation System**

- Mapper: Preprocesses user-item ratings.
- Reducer: Generates recommendations based on similarity measures between users.
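One common similarity measure for this kind of reducer is cosine similarity between two users' rating vectors. The sketch below is illustrative only: the function name and the `{item: rating}` dict layout are assumptions, not taken from the repository's scripts.

```python
import math

def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity between two users' rating dicts {item: rating}.

    Only items rated by both users contribute to the dot product;
    users with no items in common get similarity 0.0.
    """
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = math.sqrt(sum(r * r for r in ratings_a.values()))
    norm_b = math.sqrt(sum(r * r for r in ratings_b.values()))
    return dot / (norm_a * norm_b)
```

Users with identical rating vectors score 1.0; users with no overlapping items score 0.0.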
**PageRank**

- Mapper: Prepares graph data with nodes and edges.
- Reducer: Applies the PageRank update to determine node importance in the graph.
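A single PageRank iteration maps naturally onto one map/reduce round: each node distributes its current rank across its out-links (map side), and the contributions arriving at each node are summed (reduce side). The following is a minimal in-memory sketch of that update, with assumed dict-based data structures rather than the repository's actual stream format:

```python
def pagerank_step(ranks, links, damping=0.85):
    """One PageRank iteration.

    ranks: {node: current rank}, links: {node: list of out-link targets}.
    Every node is assumed to have at least one out-link.
    """
    n = len(ranks)
    # Base teleportation share for every node.
    new = {node: (1 - damping) / n for node in ranks}
    # Each node splits its rank evenly across its out-links (the "map"),
    # and contributions are accumulated per target node (the "reduce").
    for node, outs in links.items():
        share = ranks[node] / len(outs)
        for target in outs:
            new[target] += damping * share
    return new
```

On a two-node cycle the ranks are already stable, so one step returns them unchanged; in general the step is repeated until the ranks converge.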
**K-Means Clustering**

- Mapper: Assigns data points to clusters based on centroid proximity.
- Reducer: Updates centroid positions based on cluster assignments.
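The core of both halves fits in a few lines: the mapper-side step picks the nearest centroid for each point, and the reducer-side step averages the points assigned to a cluster. A minimal sketch, assuming points and centroids are coordinate tuples (function names are illustrative):

```python
import math

def nearest_centroid(point, centroids):
    # Mapper side: index of the centroid closest to this point
    # by Euclidean distance.
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))

def update_centroid(points):
    # Reducer side: the new centroid is the coordinate-wise mean
    # of all points assigned to the cluster.
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))
```

In the streaming job, the mapper would emit `(cluster_index, point)` pairs and the reducer would apply `update_centroid` to each group; the job is rerun until the centroids stop moving.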
**Weather Data Analysis**

- Mapper: Extracts relevant weather data from input records.
- Reducer: Aggregates weather data and computes statistics like average temperature or precipitation.
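As an illustration of this pattern, the sketch below parses a hypothetical `station,date,temperature` CSV format (not necessarily the repository's input format) and averages temperatures per station:

```python
from collections import defaultdict

def mapper(records):
    # Emit (station, temperature) for each record.
    # Assumed record format: "station,date,temperature".
    for line in records:
        station, _date, temp = line.strip().split(",")
        yield station, float(temp)

def average_temperature(pairs):
    # Reducer side: accumulate (sum, count) per station,
    # then divide to get the mean temperature.
    totals = defaultdict(lambda: [0.0, 0])
    for station, temp in pairs:
        totals[station][0] += temp
        totals[station][1] += 1
    return {s: t / n for s, (t, n) in totals.items()}
```

The same sum-and-count accumulator works for precipitation or any other per-key average.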
**Word Count**

- Mapper: Splits text into words and emits key-value pairs for each word.
- Reducer: Counts the occurrences of each word.
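This mapper/reducer pair can be sketched as two small functions (combined into one file here for easy local testing; the actual `mapper.py` and `reducer.py` each read stdin line by line and print tab-separated key-value pairs):

```python
from itertools import groupby

def mapper(lines):
    # Emit (word, 1) for every whitespace-separated word.
    for line in lines:
        for word in line.strip().split():
            yield word, 1

def reducer(pairs):
    # Pairs must arrive sorted by key, as Hadoop guarantees
    # between the map and reduce phases.
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)
```

Sorting the mapper output before feeding it to the reducer mirrors Hadoop's shuffle phase, which is why local pipeline tests need a `sort` in between.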
You can find sample input and output files in the repository to test the scripts.