This project applies Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and Sentence-BERT (SBERT) to the MS MARCO dataset to enable semantic search with each model. As a baseline, it embeds the documents with GloVe and compares them against the provided queries using cosine similarity. Precision, Recall, F1-score, Average Precision, and Mean Average Precision (MAP) are computed to assess and compare model performance.
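A minimal sketch of that GloVe cosine-similarity baseline (the word-vector averaging, the 300-dimensional size, and all names here are assumptions for illustration, not the project's exact implementation):

import numpy as np

def embed(text, glove):
    # Average the GloVe vectors of in-vocabulary tokens (assumed pooling strategy).
    vecs = [glove[w] for w in text.lower().split() if w in glove]
    return np.mean(vecs, axis=0) if vecs else np.zeros(300)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def rank(query, docs, glove):
    # glove is a dict-like word -> vector mapping (e.g., gensim KeyedVectors).
    q = embed(query, glove)
    scores = [cosine(q, embed(d, glove)) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)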
-- nltk
-- tqdm
-- gensim
-- scipy
-- numpy
-- scikit-learn
-- sentence_transformers
-- PyTorch
-- GloVe embeddings
Clone the project
git clone https://github.com/zthsk/semantic_search.git
Go to the project directory
cd semantic_search
Install dependencies
pip install nltk
pip install tqdm
pip install gensim
pip install scipy
pip install numpy
pip install scikit-learn
pip install sentence-transformers
pip install torch torchvision torchaudio
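The GloVe embeddings listed under the requirements are not installed through pip. One convenient way to fetch a pre-trained set is gensim's downloader; which vector set the project actually expects is an assumption here:

import gensim.downloader as api
glove = api.load("glove-wiki-gigaword-300")  # downloads on first use; pick the set the project expects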
Train the LSA, LDA, SBERT, and GloVe models
python train_models.py --bert sbert_embeddings.npy
python train_models.py --lsa lsa_model.npy
python train_models.py --lda lda_model.npy
python train_models.py --glove glove_embeddings.npy
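To illustrate what the --bert and --lsa steps above plausibly do, here is a sketch under assumptions (the SBERT model name, preprocessing, and save formats are guesses, not train_models.py's actual code):

import numpy as np
from sentence_transformers import SentenceTransformer
from gensim import corpora, models

docs = ["first example passage", "second example passage"]  # MS MARCO passages in the real script

# --bert: encode every document once and cache the embedding matrix.
sbert = SentenceTransformer("all-MiniLM-L6-v2")  # model name is an assumption
np.save("sbert_embeddings.npy", sbert.encode(docs, show_progress_bar=True))

# --lsa: fit a bag-of-words LSA (LSI) model with gensim.
tokenized = [d.lower().split() for d in docs]
dictionary = corpora.Dictionary(tokenized)
bow = [dictionary.doc2bow(t) for t in tokenized]
lsa = models.LsiModel(bow, id2word=dictionary, num_topics=200)
lsa.save("lsa_model.npy")  # keeps the README's filename; gensim uses its own save format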
Query the model with a single query
python query.py --model [bert, lsa, lda] --query "your query"
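Internally, a single query against the cached SBERT embeddings might look roughly like this (a sketch; query.py's real logic may differ):

import numpy as np
from sentence_transformers import SentenceTransformer

emb = np.load("sbert_embeddings.npy")            # document matrix produced during training
model = SentenceTransformer("all-MiniLM-L6-v2")  # must match the model used at training time
q = model.encode(["your query"])[0]
scores = emb @ q / (np.linalg.norm(emb, axis=1) * np.linalg.norm(q) + 1e-9)
print(np.argsort(-scores)[:10])                  # indices of the ten best-matching documents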
Query the model with a list of queries
./run_queries.sh # update queries.txt with the queries you want to run
Use the analysis.ipynb notebook to compute the evaluation metrics and reproduce the comparison plots for the models.
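For reference, the headline metrics can be computed from binary relevance judgments roughly as follows (a sketch; the notebook's exact code and plots are not reproduced here):

def precision_recall_f1(relevant, retrieved):
    # relevant and retrieved are sets of document ids.
    tp = len(relevant & retrieved)
    p = tp / len(retrieved) if retrieved else 0.0
    r = tp / len(relevant) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def average_precision(relevant, ranking):
    # ranking is a list of document ids, best match first.
    hits, ap = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            ap += hits / k
    return ap / len(relevant) if relevant else 0.0

# MAP is the mean of average_precision over all evaluated queries.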