Building a Search Engine

Warning: work in progress!

A basic search engine that indexes a corpus so you can search and rank its documents. It is built with Python and object-oriented programming principles to keep the project extensible and maintainable.

Features:

Inverted Index - to improve search times.
Results Ranking - with term frequency–inverse document frequency (TF-IDF) to order results by relevance.
Query Expansion - to automatically add query terms (like synonyms) and improve result relevance (see my testing analysis).
Result Evaluation - to test and compare results against human-evaluated relevance scores to gauge performance.
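
To make the first three features concrete, here is a minimal sketch of how an inverted index, TF-IDF ranking, and synonym-based query expansion can fit together. It is an illustration only, not the code in this repository: the `tokenize` function, the `InvertedIndex` class, its method names, and the `synonyms` dictionary are all hypothetical, and the scoring uses a plain `tf * log(N / df)` weighting, which may differ from what the project actually does.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class InvertedIndex:
    """Maps each term to the documents that contain it, with term counts."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term count}
        self.doc_count = 0

    def add_document(self, doc_id, text):
        self.doc_count += 1
        for term, count in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = count

    def search(self, query, synonyms=None):
        terms = tokenize(query)
        # Optional query expansion: append known synonyms of each query term.
        if synonyms:
            terms += [s for t in terms for s in synonyms.get(t, [])]
        scores = defaultdict(float)
        for term in terms:
            docs = self.postings.get(term, {})
            if not docs:
                continue  # term does not occur anywhere in the corpus
            # Rare terms get a higher inverse document frequency.
            idf = math.log(self.doc_count / len(docs))
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf
        # Highest-scoring documents first.
        return sorted(scores, key=scores.get, reverse=True)


index = InvertedIndex()
index.add_document("d1", "python search engine")
index.add_document("d2", "python web framework")
index.add_document("d3", "cooking recipes")

# d1 ranks first because it matches both query terms.
print(index.search("python engine"))
```

Because lookups go through the postings dictionary, a query only touches documents that contain at least one query term instead of scanning the whole corpus.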

This started out as a course project, and I'm now building it out further and adding more features. I plan to build a front-end web interface so I can demo the project more easily.

ToDo:

Split up files and organize them into packages.
Write Documentation!
Finish implementing stop words functionality.
Build a front-end web interface to demo the project.
Result snippet generation.
Implement advanced search operators (OR, NOT).
Improve query normalization.
Ranking improvements.
Add caching and on-demand loading to improve memory efficiency.

I hope to write more comprehensive documentation for this project in the near future.

Stay tuned :)

