Home

Description

This project is a minimal implementation of MapReduce, one of the most widely used algorithms, applied to counting word occurrences in a given speech or text using Open MPI. Open MPI is used because MapReduce is designed mainly for distributed systems.
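To make the idea concrete, here is a minimal serial sketch of the map and reduce steps for word counting. It is written in Python for brevity and is not the project's C/Open MPI code. The function names (`map_phase`, `reduce_phase`, `word_count`) and the way the words are split among workers are assumptions made for illustration only.

```python
from collections import defaultdict


def map_phase(words):
    """Map: emit a (word, 1) pair for every word in one worker's chunk."""
    return [(word, 1) for word in words]


def reduce_phase(pairs):
    """Reduce: sum the counts emitted for each distinct word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(words, num_workers=3):
    """Simulate splitting the words among workers, mapping, then reducing."""
    # Assumption: worker i gets every num_workers-th word starting at index i.
    chunks = [words[i::num_workers] for i in range(num_workers)]
    # Each simulated worker maps its own chunk independently of the others.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    # One reduce step merges the pairs from all workers into final counts.
    return reduce_phase(mapped)
```

In an MPI setting the chunks would be sent to separate processes instead of being handled in a loop, but the final counts do not depend on how the words are split.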

Implementation

The project is implemented in C.

Input/Output Format

The input file is a tokenized version of a text file (no punctuation marks, all letters lowercase). Each line of the output file contains a word and its occurrence count, and the lines are in lexicographical order.
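The sketch below shows one way such files could be produced, in Python rather than the project's C. The page does not specify how tokens are separated or how a word and its count are delimited, so whitespace-separated tokens and a single space between word and count are assumptions. `tokenize` and `format_counts` are hypothetical helper names.

```python
import string


def tokenize(text):
    """Lowercase the text, drop ASCII punctuation, and split on whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def format_counts(counts):
    """Return one 'word count' line per word, sorted lexicographically."""
    # Assumption: word and count are separated by a single space.
    return [f"{word} {count}" for word, count in sorted(counts.items())]
```

For example, `tokenize("Hello, world! Hello.")` returns `['hello', 'world', 'hello']`, and `format_counts({"world": 1, "hello": 2})` returns `['hello 2', 'world 1']`.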

Example commands (compile and run):

Compile:

mpicc code.c -o executable_name

Run:

mpirun -np 3 ./executable_name tokenized_speech output_file

If the number of processes exceeds the number of available processors (4 in this setup), add --oversubscribe:

mpirun --oversubscribe -np 7 ./executable_name tokenized_speech output_file

Testing

To test the results, run MPITest.py with the arguments shown in the example commands. It prints to the console whether the generated output is correct (true or false).

Example commands:

./MPITest.py input_file example_output_file generated_output_file_by_user

python MPITest.py input_file example_output_file generated_output_file_by_user
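The internals of MPITest.py are not described here. As a rough illustration of the kind of check such a script might perform, the sketch below compares an expected output against a generated one line by line. The function name `outputs_match` and the choice to ignore surrounding whitespace and blank lines are assumptions, not the script's actual behavior.

```python
def outputs_match(expected_text, generated_text):
    """Return True if both outputs contain the same non-empty lines in order."""

    def normalized_lines(text):
        # Assumption: trailing spaces and blank lines are not significant.
        return [line.strip() for line in text.splitlines() if line.strip()]

    return normalized_lines(expected_text) == normalized_lines(generated_text)
```

To compare two files, read each one into a string first and pass both strings to `outputs_match`. Because the comparison keeps line order, an output that has the right counts but is not sorted would be reported as a mismatch.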
