GitHub - DennisDavydov/BM25: A school project of creating a BM25 ranking for information retrieval from a movie database

Overview

This project implements a simple text search engine built on an inverted index and the BM25 ranking algorithm. It indexes a collection of documents and ranks search results by their relevance to a query, demonstrating fundamental concepts in information retrieval.

The core of the project is split into two main components:

inverted_index.py: Builds an inverted index from a collection of documents. It uses the BM25 algorithm to calculate relevance scores between documents and queries.

evaluate.py: Evaluates the effectiveness of the inverted index by comparing search results against a benchmark dataset, using metrics such as precision at K, recall, and average precision.
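To make the idea behind inverted_index.py concrete, here is a minimal sketch of an inverted index with BM25 scoring. It is not the project's actual code: the function names (`tokenize`, `build_index`, `bm25_search`), the whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, and the sample documents are all assumptions chosen for illustration. The IDF term uses a common smoothed variant of BM25 that stays non-negative; the project may use a different one.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def build_index(docs):
    """Map each term to {doc_id: term frequency} and record document lengths."""
    index = defaultdict(dict)
    doc_len = {}
    for doc_id, text in enumerate(docs):
        tokens = tokenize(text)
        doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index, doc_len


def bm25_search(query, index, doc_len, k1=1.5, b=0.75):
    """Return (doc_id, score) pairs sorted by descending BM25 score."""
    n_docs = len(doc_len)
    if n_docs == 0:
        return []
    avgdl = sum(doc_len.values()) / n_docs
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any document
        df = len(postings)
        # Smoothed IDF: rare terms contribute more than common ones.
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        for doc_id, tf in postings.items():
            # Length normalization: longer documents are penalized via b.
            norm = tf + k1 * (1 - b + b * doc_len[doc_id] / avgdl)
            scores[doc_id] += idf * tf * (k1 + 1) / norm
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample documents, not taken from movies.tsv.
docs = [
    "a space adventure movie about a lost robot",
    "a romantic comedy set in paris",
    "robot uprising in a dystopian city",
]
index, doc_len = build_index(docs)
results = bm25_search("robot movie", index, doc_len)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

Only documents that contain at least one query term are visited, which is the main benefit of the inverted index: the cost of a query depends on the length of the matching posting lists rather than on the size of the whole collection.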

Usage

First, you need to build the inverted index from your dataset:

python inverted_index.py <path-to-your-dataset>

To evaluate the performance of your search engine:

python evaluate.py <path-to-your-dataset> <path-to-benchmark-data>

Replace <path-to-your-dataset> and <path-to-benchmark-data> with the actual paths to the required files.
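The metrics that evaluate.py reports (precision at K, recall, and average precision) can be sketched as follows. This is an illustrative version under stated assumptions, not the project's implementation: the function names, the representation of results as a ranked list of document IDs without duplicates, and the representation of the benchmark as a set of relevant IDs are all hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for doc_id in ranked[:k] if doc_id in relevant) / k


def recall(ranked, relevant):
    """Fraction of all relevant documents that appear in the results."""
    if not relevant:
        return 0.0
    return len(set(ranked) & set(relevant)) / len(relevant)


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant)


# Hypothetical example: search results for one query and its benchmark answers.
ranked = [3, 1, 7, 5]
relevant = {1, 5, 9}

print(precision_at_k(ranked, relevant, 2))  # 0.5
print(recall(ranked, relevant))             # 2 of 3 relevant documents found
print(average_precision(ranked, relevant))  # (1/2 + 2/4) / 3
```

Averaging `average_precision` over all benchmark queries gives mean average precision (MAP), a common single-number summary of ranking quality.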

Repository files:

README.md
evaluate.py
inverted_index.py
movies.tsv
