
A school project that implements BM25 ranking for information retrieval from a movie database


DennisDavydov/BM25


Overview

This project implements a small text search engine built on an inverted index and the BM25 ranking algorithm. It indexes a collection of documents and ranks them by their relevance to a query, demonstrating fundamental concepts of information retrieval systems.
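For reference, the standard Okapi BM25 score of a document D for a query Q is shown below. Here f(q, D) is the frequency of term q in D, |D| is the length of D in tokens, avgdl is the average document length, N is the number of documents and n(q) is the number of documents containing q. The README does not state which IDF variant or which values of the free parameters k_1 and b this project uses. The non-negative IDF form and the usual ranges (k_1 between 1.2 and 2.0, b = 0.75) are common defaults assumed here.

```latex
\mathrm{score}(D, Q) = \sum_{q \in Q} \mathrm{IDF}(q) \cdot
  \frac{f(q, D)\,(k_1 + 1)}{f(q, D) + k_1 \left(1 - b + b \,\frac{|D|}{\mathrm{avgdl}}\right)}

\mathrm{IDF}(q) = \ln\!\left(1 + \frac{N - n(q) + 0.5}{n(q) + 0.5}\right)
```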

The core of the project is split into two main components:

inverted_index.py: Builds an inverted index from a collection of documents and uses the BM25 algorithm to compute relevance scores between documents and queries.

evaluate.py: Measures the retrieval quality of the index by comparing its search results against a benchmark dataset, using metrics such as precision at K, recall and average precision.
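The following is a minimal sketch of what an inverted index with BM25 scoring can look like. It is not the code in inverted_index.py. The class name InvertedIndex, the tokenize helper, the in-memory dict of documents and the defaults k1 = 1.5 and b = 0.75 are all assumptions made for illustration. Each term maps to a posting list of (document id, term frequency), so a query only touches documents that contain at least one of its terms.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase alphanumeric tokens, no stemming or stop words.
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Hypothetical minimal inverted index with Okapi BM25 scoring."""

    def __init__(self, docs: dict[str, str], k1: float = 1.5, b: float = 0.75) -> None:
        # Assumed defaults: k1 = 1.5, b = 0.75 (common choices, not taken from the project).
        self.k1, self.b = k1, b
        # Posting lists: term -> {doc_id: frequency of the term in that document}.
        self.postings: dict[str, dict[str, int]] = defaultdict(dict)
        self.doc_len: dict[str, int] = {}
        for doc_id, text in docs.items():
            tokens = tokenize(text)
            self.doc_len[doc_id] = len(tokens)
            for term, tf in Counter(tokens).items():
                self.postings[term][doc_id] = tf
        self.n_docs = len(docs)
        self.avgdl = sum(self.doc_len.values()) / self.n_docs if self.n_docs else 0.0

    def idf(self, term: str) -> float:
        # Non-negative BM25 IDF: ln(1 + (N - n + 0.5) / (n + 0.5)).
        n = len(self.postings.get(term, {}))
        return math.log(1 + (self.n_docs - n + 0.5) / (n + 0.5))

    def search(self, query: str, k: int = 10) -> list[tuple[str, float]]:
        scores: dict[str, float] = defaultdict(float)
        # Each distinct query term contributes once, in query order.
        for term in dict.fromkeys(tokenize(query)):
            postings = self.postings.get(term)
            if not postings:
                continue
            idf = self.idf(term)
            for doc_id, tf in postings.items():
                # Length normalization: longer-than-average documents are penalized.
                norm = self.k1 * (1 - self.b + self.b * self.doc_len[doc_id] / self.avgdl)
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + norm)
        # Highest-scoring documents first, truncated to the top k.
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical toy "movie database" of plot summaries.
    movies = {
        "m1": "A hobbit carries a ring to Mordor",
        "m2": "A shark terrorizes a beach town",
        "m3": "Wizards and a ring of power",
    }
    index = InvertedIndex(movies)
    print(index.search("ring"))
```

In the toy example, m1 and m3 each contain "ring" once. m3 ranks higher because it is shorter than m1, which is the effect of the b length-normalization term. m2 is never scored because "ring" has no posting for it.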

Usage

First, you need to build the inverted index from your dataset:

python inverted_index.py <path-to-your-dataset>

To evaluate the performance of your search engine:

python evaluate.py <path-to-your-dataset> <path-to-benchmark-data>

Replace the <path-to-...> placeholders with the actual paths to the required files.
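The three metrics named for evaluate.py can be sketched as below. This is an illustration only. The README does not document the benchmark format or the exact definitions evaluate.py uses, so the function names, the assumption that a query's results arrive as a ranked list of document ids without duplicates, and the choice to divide average precision by the total number of relevant documents are all assumptions here.

```python
def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall(ranked: list[str], relevant: set[str]) -> float:
    """Fraction of all relevant documents that appear anywhere in `ranked`."""
    if not relevant:
        return 0.0
    return len(set(ranked) & relevant) / len(relevant)


def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """Sum of precision@i at every rank i that holds a relevant document,
    divided by the total number of relevant documents (missed ones add 0)."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)


if __name__ == "__main__":
    # Hypothetical ranked output for one query and its benchmark judgments.
    ranked = ["m3", "m7", "m1", "m9"]
    relevant = {"m1", "m3", "m5"}
    print(precision_at_k(ranked, relevant, 2), recall(ranked, relevant),
          average_precision(ranked, relevant))
```

For this example, precision at 2 is 0.5 (one of the top two is relevant), recall is 2/3 (m5 is never retrieved) and average precision is (1/1 + 2/3) / 3 = 5/9. Averaging average_precision over all benchmark queries gives mean average precision.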
