Passage Retrieval and Ranking System

This repository implements a modular passage retrieval and ranking system. The pipeline preprocesses text, builds embeddings, retrieves candidate passages for a given query, and ranks them with models such as BM25, Logistic Regression, LambdaMART, and a neural network.

Features

Preprocessing:
- Text tokenization, normalization, stopword removal, and lemmatization.
- Embedding generation using pre-trained models like FastText.
Retrieval Models:
- BM25, plus query likelihood language models with Laplace and Dirichlet smoothing, for passage retrieval.
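
As a rough illustration of the retrieval scoring listed above, here is a minimal sketch of Okapi BM25 and a Dirichlet-smoothed query likelihood score. It is not the code in retrieval/bm25.py or retrieval/smoothing.py: the function names, the defaults k1=1.2, b=0.75, and mu=2000, and the choice to skip query terms unseen in the collection are all assumptions made for this example.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, passages, k1=1.2, b=0.75):
    """Score each tokenized passage against the query with Okapi BM25.

    k1 and b are common default values, assumed here for illustration.
    """
    if not passages:
        return []
    n = len(passages)
    avg_len = sum(len(p) for p in passages) / n
    # Document frequency: number of passages that contain each term.
    df = Counter(term for p in passages for term in set(p))
    scores = []
    for p in passages:
        tf = Counter(p)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # The "+ 1.0" inside the log keeps the IDF non-negative.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(p) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def dirichlet_log_likelihood(query_tokens, passage, collection_tf,
                             collection_len, mu=2000.0):
    """Log query likelihood of one passage under Dirichlet smoothing.

    collection_tf maps each term to its count over the whole collection.
    Terms that never occur in the collection are skipped (an assumption
    of this sketch, made to avoid log(0)).
    """
    tf = Counter(passage)
    score = 0.0
    for term in query_tokens:
        p_collection = collection_tf.get(term, 0) / collection_len
        if p_collection == 0:
            continue
        score += math.log((tf[term] + mu * p_collection) / (len(passage) + mu))
    return score


# Usage on a toy collection of pre-tokenized passages.
toy_passages = [["neural", "ranking", "model"], ["bm25", "ranking"]]
print(bm25_scores(["bm25", "ranking"], toy_passages))
```

Both functions expect already tokenized input, so they would sit after the preprocessing step in a pipeline like the one described here.
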
Ranking Models:
- Logistic Regression for linear ranking.
- LambdaMART (XGBoost Ranker) for tree-based ranking.
- Neural Network (PyTorch) for deep learning-based ranking.
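
To make the idea of pointwise ranking with logistic regression concrete, the sketch below trains a small model with batch gradient descent and orders candidate passages by predicted relevance. It is only an illustration under stated assumptions: the function names, learning rate, and epoch count are hypothetical, and the repository's own model in models/logistic_regression.py may be structured differently.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(x, weights, bias):
    """Predicted relevance probability for one feature vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic_regression(features, labels, lr=0.1, epochs=200):
    """Fit weights and bias with batch gradient descent on the log loss."""
    dim = len(features[0])
    n = len(features)
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = predict(x, weights, bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def rank_passages(features, weights, bias):
    """Return candidate indices sorted by descending predicted relevance."""
    scores = [predict(x, weights, bias) for x in features]
    return sorted(range(len(features)), key=lambda i: scores[i], reverse=True)


# Usage: one feature per query-passage pair (for example, a similarity score).
train_x = [[0.9], [0.1], [0.8], [0.2]]
train_y = [1, 0, 1, 0]
w, b = train_logistic_regression(train_x, train_y)
print(rank_passages([[0.3], [0.95], [0.6]], w, b))
```

A pointwise model like this scores each query-passage pair independently, whereas LambdaMART optimizes a listwise ranking objective over all candidates for a query.
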
Evaluation:
- Metrics include Mean Average Precision (MAP) and Normalized Discounted Cumulative Gain (NDCG).
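
The two metrics can be sketched as follows. This is a minimal illustration, not the code in the evaluation module: the function names are hypothetical, average precision is normalized by the number of relevant passages present in the ranked list (which assumes every relevant passage was retrieved), and NDCG uses the exponential gain 2^rel - 1, which is one of several common conventions.

```python
import math


def average_precision(relevances):
    """Average precision for one query.

    relevances: binary labels (1 = relevant) in ranked order.
    """
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevances, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this relevant position
    return total / hits if hits else 0.0


def mean_average_precision(ranked_lists):
    """Mean of average precision over all queries."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)


def ndcg(relevances, k=None):
    """Normalized discounted cumulative gain at cutoff k.

    relevances: graded or binary labels in ranked order.
    """
    k = len(relevances) if k is None else k

    def dcg(rels):
        return sum((2 ** r - 1) / math.log2(i + 2)
                   for i, r in enumerate(rels[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Usage: labels of the passages returned for two queries, in ranked order.
print(mean_average_precision([[1, 0, 1], [0, 1, 0]]))
print(ndcg([0, 1, 0], k=3))
```
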
Feature Engineering:
- Cosine similarity and Word Mover’s Distance (WMD) between embeddings.
- Element-wise product features for ranking models.
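
The cosine similarity and element-wise product features can be sketched as below, assuming each query and passage is represented by the mean of its word vectors. The helper names, the averaging step, and the layout of the feature vector are assumptions for illustration. The sketch uses plain Python lists so it runs without dependencies; an actual implementation would more likely use NumPy arrays. Word Mover's Distance is omitted because it needs an optimal transport solver.

```python
import math


def mean_embedding(tokens, vectors, dim):
    """Average the vectors of known tokens; zero vector if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pair_features(query_vec, passage_vec):
    """Element-wise product followed by the cosine similarity."""
    product = [q * p for q, p in zip(query_vec, passage_vec)]
    return product + [cosine_similarity(query_vec, passage_vec)]


# Usage with a toy two-dimensional vocabulary (hypothetical vectors).
toy_vectors = {"passage": [1.0, 0.0], "ranking": [0.0, 1.0]}
query_vec = mean_embedding(["passage", "ranking"], toy_vectors, dim=2)
passage_vec = mean_embedding(["ranking"], toy_vectors, dim=2)
print(pair_features(query_vec, passage_vec))
```

A feature vector built this way has one more dimension than the embeddings and can be fed to any of the ranking models listed above.
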

Project Structure

project/
│
├── README.md                  # Project overview and usage instructions
├── requirements.txt           # Dependencies
├── main.py                    # Entry-point for executing the pipeline
│
├── preprocessing/             # Preprocessing-related modules
│   ├── text_processing.py     # Text cleaning, tokenization, lemmatization
│   ├── feature_engineering.py # Embedding and feature computation
│
├── retrieval/                 # Retrieval algorithms
│   ├── bm25.py                # BM25 ranking implementation
│   ├── smoothing.py           # Laplace and Dirichlet smoothing
│
├── models/                    # Ranking models
│   ├── logistic_regression.py # Custom logistic regression model
│   ├── lambdamart.py          # LambdaMART (XGBoost Ranker) model
│   ├── passage_ranking_nn.py  # PyTorch-based ranking neural network
│
├── evaluation/                # Evaluation metrics
│   ├── metrics.py             # MAP and NDCG calculations
│
├── utils/                     # Helper utilities
│   ├── io_operations.py       # File handling and data loading
│
└── tests/                     # Unit tests
    ├── test_metrics.py        # Tests for evaluation metrics
    ├── test_text_processing.py # Tests for preprocessing functions
    ├── test_models.py         # Tests for models

Repository: mertnba/sci-text-retrieval
