Simulating a vector database on the CoNaLa dataset.
- CoNaLa: the Code/Natural Language Challenge dataset, used here to retrieve program snippets relevant to user queries.
- Vector Database: an in-memory vector database built with the Qdrant library.
- Embeddings: Sentence Transformer (all-MiniLM-L6-v2).
- prepare_data.ipynb: Notebook to view the data and perform a simple analysis of the dataset.
- embeddings.ipynb: Contains the full code to create embeddings with sentence-transformers, build the vector database with Qdrant, and retrieve snippets by cosine similarity.
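
As a rough illustration of the retrieval idea, here is a minimal standard-library sketch of cosine-similarity search over an in-memory store. It is not the code from embeddings.ipynb. The `embed` function is a toy bag-of-words stand-in for all-MiniLM-L6-v2, `InMemoryVectorStore` is a hypothetical class standing in for Qdrant, and the example pairs are invented in the style of CoNaLa rather than taken from the dataset.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for a sentence embedding: a sparse word-count vector.
    # The notebooks use all-MiniLM-L6-v2; this is for illustration only.
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    # Dot product over shared terms, divided by the product of the norms.
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Hypothetical minimal store: keeps (vector, payload) pairs in a list."""

    def __init__(self):
        self.points = []

    def upsert(self, vector, payload):
        self.points.append((vector, payload))

    def search(self, query_vector, limit=3):
        # Brute-force scan: score every stored vector, return the best matches.
        scored = [(cosine_similarity(query_vector, v), p) for v, p in self.points]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:limit]


# Invented (intent, snippet) pairs in the style of CoNaLa.
examples = [
    ("sort a list in reverse order", "sorted(items, reverse=True)"),
    ("read a file line by line", "open(path).readlines()"),
    ("convert a string to an integer", "int(s)"),
]

store = InMemoryVectorStore()
for intent, snippet in examples:
    store.upsert(embed(intent), {"intent": intent, "snippet": snippet})

results = store.search(embed("how to sort a list"), limit=2)
for score, payload in results:
    print(f"{score:.3f}  {payload['snippet']}")
```

The notebook pipeline has the same shape: embed each intent, store the vector with its snippet as payload, then embed the query and rank stored vectors by cosine similarity. A dense sentence embedding would also match paraphrases that share no words with the query, which this word-count stand-in cannot do.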