TF-IDF Calculation

This project was completed as the second homework assignment for the university course "Parallel Programming." The primary objective was to practice using Python's reduce function and lambda expressions.

Overview

The task is to implement the TF-IDF (term frequency–inverse document frequency) calculation following the MapReduce paradigm.

Tasks

Calculating TF Values from a Single Text
- The function should return a list of tuples of the form (file identifier, word, frequency of the word in the file).
- Calculate the word frequency using the formula:
```
tf(t, d) = Number of occurrences of word t in document d / Total number of words in document d
```
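As a rough illustration of this step, the sketch below counts words with `functools.reduce` and then maps each count to a relative frequency. The function name `tf_single`, the lowercasing, and the whitespace tokenization are assumptions made for the example; the assignment does not specify how the text is split into words.

```python
from functools import reduce


def tf_single(file_id, text):
    """Return a list of (file_id, word, tf) tuples for one document."""
    # Assumption: words are separated by whitespace and compared case-insensitively.
    words = text.lower().split()
    total = len(words)
    # Reduce step: fold the word list into a {word: count} dictionary.
    counts = reduce(
        lambda acc, w: {**acc, w: acc.get(w, 0) + 1},
        words,
        {},
    )
    # Map step: divide each count by the total number of words.
    return list(map(lambda item: (file_id, item[0], item[1] / total), counts.items()))


print(tf_single("doc1", "the cat saw the dog"))
# [('doc1', 'the', 0.4), ('doc1', 'cat', 0.2), ('doc1', 'saw', 0.2), ('doc1', 'dog', 0.2)]
```

Rebuilding the dictionary on every reduce step keeps the lambda free of side effects, at the cost of extra copying on large texts.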
Calculating TF Values from All Texts
- The result should be a single list containing the tuples from all files.
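One possible way to combine the per-file results is to map a single-document TF function over all files and concatenate the lists with `reduce`. In this sketch the input is assumed to be a dictionary from file identifier to text, and `tf_single` is a compact stand-in helper; neither is taken from the original code.

```python
from collections import Counter
from functools import reduce


def tf_single(file_id, text):
    """Hypothetical helper: (file_id, word, tf) tuples for one document."""
    words = text.lower().split()
    return [(file_id, w, c / len(words)) for w, c in Counter(words).items()]


def tf_all(documents):
    """documents: dict mapping a file identifier to its text (assumed shape)."""
    # Map step: compute the TF tuples of every file independently.
    per_file = map(lambda kv: tf_single(kv[0], kv[1]), documents.items())
    # Reduce step: concatenate the per-file lists into one flat list.
    return reduce(lambda acc, tuples: acc + tuples, per_file, [])


result = tf_all({"a.txt": "red red blue", "b.txt": "blue green"})
for row in result:
    print(row)
```

Because each file is processed independently in the map step, this is the part of the pipeline that could be distributed across workers.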

Calculating IDF Values

IDF is calculated using the formula:

idf(t) = log(Number of documents in the set / Number of documents where word t appears)
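The formula could be implemented along the following lines, starting from the flat list of TF tuples. The function name `idf` and the choice of the natural logarithm are assumptions; the formula above does not state the base of the logarithm.

```python
import math
from functools import reduce


def idf(tf_tuples, n_docs):
    """Return {word: idf} from (file_id, word, tf) tuples and the document count."""
    # Reduce step: for every word, collect the set of files it appears in.
    files_per_word = reduce(
        lambda acc, t: {**acc, t[1]: acc.get(t[1], set()) | {t[0]}},
        tf_tuples,
        {},
    )
    # Assumption: natural logarithm; swap in math.log10 or math.log2 if required.
    return {word: math.log(n_docs / len(files)) for word, files in files_per_word.items()}


tf_tuples = [
    ("a.txt", "red", 2 / 3),
    ("a.txt", "blue", 1 / 3),
    ("b.txt", "blue", 0.5),
    ("b.txt", "green", 0.5),
]
print(idf(tf_tuples, n_docs=2))
```

A word that occurs in every document gets an IDF of 0, so "blue" in this example contributes nothing to the final TF-IDF score.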

Calculating TF-IDF Values
- The result should be in the form of a list of tuples (word, file identifier, value).
- Sort the list so that all words belonging to one file appear consecutively and, within each file, the entries are ordered by descending TF-IDF value.
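The final step might look like the sketch below, which multiplies each TF value by the IDF of its word and sorts with a composite key. The function name `tf_idf` and the sample numbers are made up for illustration; ordering the files by their identifier is also an assumption, since the task only requires that entries of one file stay together.

```python
def tf_idf(tf_tuples, idf_values):
    """Return sorted (word, file_id, tf_idf) tuples."""
    # Map step: multiply each TF value by the IDF of its word.
    scored = map(lambda t: (t[1], t[0], t[2] * idf_values[t[1]]), tf_tuples)
    # Sort by file identifier first, then by descending TF-IDF within each file.
    return sorted(scored, key=lambda row: (row[1], -row[2]))


# Illustrative inputs, not real measurements.
tf_tuples = [
    ("b.txt", "blue", 1.0),
    ("a.txt", "blue", 0.5),
    ("a.txt", "red", 0.5),
]
idf_values = {"red": 1.0, "blue": 0.25}

for row in tf_idf(tf_tuples, idf_values):
    print(row)
# ('red', 'a.txt', 0.5)
# ('blue', 'a.txt', 0.125)
# ('blue', 'b.txt', 0.25)
```

Negating the score in the sort key gives descending order for the value while keeping ascending order for the file identifier in a single `sorted` call.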

irinatomic/TF-IDF_reduce
