To make LLM applications faster, we need a faster retrieval system. This is where embedding quantization comes in. Embedding quantization is a technique that lowers vector database costs and makes retrieval significantly faster while preserving retrieval performance.
Unofficial implementation of binary and scalar embedding quantization for significantly faster and cheaper retrieval, plus evaluation of a RAG system using the "SEMALEX" evaluation metric.
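To make the two quantization schemes concrete, here is a minimal NumPy sketch. It is not this repository's code: the function names (`binary_quantize`, `hamming_distances`, `scalar_quantize`), the zero threshold for binarization, and the per-dimension min/max calibration for int8 are all assumptions chosen for illustration, and the embeddings are random placeholders rather than real model outputs.

```python
import numpy as np


def binary_quantize(embeddings: np.ndarray) -> np.ndarray:
    """Keep only the sign of each dimension and pack 8 bits into one byte.

    Assumption: 0 is used as the threshold for every dimension.
    """
    bits = (embeddings > 0).astype(np.uint8)
    return np.packbits(bits, axis=-1)


def hamming_distances(query_code: np.ndarray, doc_codes: np.ndarray) -> np.ndarray:
    """Count differing bits between one packed query and every packed document."""
    xor = np.bitwise_xor(doc_codes, query_code)
    return np.unpackbits(xor, axis=-1).sum(axis=-1)


def scalar_quantize(embeddings: np.ndarray, calibration: np.ndarray) -> np.ndarray:
    """Map each float dimension onto 256 int8 buckets.

    Assumption: bucket ranges come from the per-dimension min/max of a
    calibration set of embeddings.
    """
    lo = calibration.min(axis=0)
    hi = calibration.max(axis=0)
    scale = (hi - lo) / 255.0
    scale = np.where(scale == 0, 1.0, scale)  # guard against constant dimensions
    q = np.round((embeddings - lo) / scale) - 128
    return np.clip(q, -128, 127).astype(np.int8)


# Placeholder data: 1000 random 256-dimensional "document" embeddings.
rng = np.random.default_rng(0)
docs = rng.standard_normal((1000, 256)).astype(np.float32)
# A query that is a slightly noisy copy of document 42.
query = docs[42] + 0.05 * rng.standard_normal(256).astype(np.float32)

# Binary: 1 bit per dimension, searched with Hamming distance.
doc_codes = binary_quantize(docs)
query_code = binary_quantize(query)
dists = hamming_distances(query_code, doc_codes)
top = int(np.argmin(dists))

# Scalar: 1 byte per dimension.
doc_int8 = scalar_quantize(docs, docs)

print("float32 bytes:", docs.nbytes)
print("binary bytes: ", doc_codes.nbytes)
print("int8 bytes:   ", doc_int8.nbytes)
print("nearest document by Hamming distance:", top)
```

In this sketch the binary codes take 1/32 of the float32 storage and the int8 codes take 1/4, which is where the cost savings come from. Quantized search is often followed by rescoring the top candidates with higher-precision vectors; that step is left out here.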