Kayahan Tasyaran edited this page Dec 17, 2017 · 12 revisions

Description

This project implements one of the most common uses of the MapReduce algorithm, counting word occurrences in a given speech or text, in the simplest possible manner using Open MPI. Open MPI is used because MapReduce mainly runs on distributed systems.
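To make the map and reduce steps concrete, here is a minimal single-process sketch in C. It is not the project's code and uses no MPI calls: in a distributed run each process would presumably apply the map step to its own share of the tokens before the partial results are merged. The names WordCount, MAX_WORD_LEN, map_tokens and reduce_pairs are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORD_LEN 64 /* assumed upper bound on word length (hypothetical) */

/* One (word, count) pair: what the map step emits and the reduce step merges. */
typedef struct {
    char word[MAX_WORD_LEN];
    int count;
} WordCount;

/* Map: turn every token into a (word, 1) pair.
 * Tokens longer than MAX_WORD_LEN - 1 characters are truncated.
 * Returns the number of pairs written to out. */
size_t map_tokens(const char **tokens, size_t n, WordCount *out) {
    assert(tokens != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        strncpy(out[i].word, tokens[i], MAX_WORD_LEN - 1);
        out[i].word[MAX_WORD_LEN - 1] = '\0';
        out[i].count = 1;
    }
    return n;
}

/* Orders pairs by word using byte-wise comparison. */
int compare_by_word(const void *a, const void *b) {
    return strcmp(((const WordCount *)a)->word, ((const WordCount *)b)->word);
}

/* Reduce: sort the pairs by word, then merge neighbours that share a word.
 * Works in place and returns the number of distinct words. */
size_t reduce_pairs(WordCount *pairs, size_t n) {
    if (n == 0) {
        return 0;
    }
    qsort(pairs, n, sizeof(WordCount), compare_by_word);
    size_t last = 0;
    for (size_t i = 1; i < n; i++) {
        if (strcmp(pairs[i].word, pairs[last].word) == 0) {
            pairs[last].count += pairs[i].count; /* same word: add counts */
        } else {
            pairs[++last] = pairs[i];            /* new word: start a new slot */
        }
    }
    return last + 1;
}
```

Because reduce_pairs adds counts rather than assuming every count is 1, the same function could also merge partial results that several processes have already reduced locally.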

Implementation

The project is implemented in C.

Input/Output Format

The input file is a tokenized version of a text file (no punctuation marks, and all letters are lowercase). Each line of the output file contains a word and its number of occurrences, and the lines are sorted in lexicographical order.
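As an illustration of the output format, the sketch below sorts a set of word counts and writes one word per line. It rests on assumptions: the separator between word and count is taken to be a single space, which this page does not specify, and the names Entry, entry_cmp and write_sorted are hypothetical rather than taken from the project.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type: a word and how many times it occurred. */
typedef struct {
    const char *word;
    int count;
} Entry;

/* Byte-wise comparison; for lowercase ASCII words this is lexicographical order. */
int entry_cmp(const void *a, const void *b) {
    return strcmp(((const Entry *)a)->word, ((const Entry *)b)->word);
}

/* Sorts the entries in place and writes one "word count" pair per line.
 * The single-space separator is an assumption.
 * Returns the number of lines written, or -1 if a write fails. */
int write_sorted(FILE *out, Entry *entries, size_t n) {
    assert(out != NULL);
    if (n > 0) {
        qsort(entries, n, sizeof(Entry), entry_cmp);
    }
    for (size_t i = 0; i < n; i++) {
        if (fprintf(out, "%s %d\n", entries[i].word, entries[i].count) < 0) {
            return -1;
        }
    }
    return (int)n;
}
```

Calling write_sorted with a file opened by fopen(path, "w") would produce a file in the shape described above, under the stated separator assumption.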

Example commands (compile and run):

Compile:

mpicc code.c -o executable_name

Run:

mpirun -np 3 ./executable_name tokenized_speech output_file

If the number of processes exceeds 4, add the --oversubscribe flag:

mpirun --oversubscribe -np 7 ./executable_name tokenized_speech output_file

Testing

To test the results, run MPITest.py with the input file, the example output file, and your generated output file as arguments. It prints to the console whether the result is true or false.

Example commands:

./MPITest.py input_file example_output_file generated_output_file_by_user

python MPITest.py input_file example_output_file generated_output_file_by_user
