LintDB

LintDB is a multi-vector database built for Gen AI. It natively supports late-interaction retrieval such as ColBERT and PLAID.

Key Features

  • Multi-vector support: LintDB stores multiple vectors per document ID and scores relevance with the maximum similarity across vectors (MaxSim).
  • Bit-level compression: LintDB fully implements PLAID's bit compression, storing 128-dimensional embeddings in as little as 16 bytes.
  • Embedded: LintDB can be embedded directly into your Python application. No need to set up a separate database.
  • Full support for PLAID and ColBERT: LintDB is built around PLAID and ColBERT.
  • Filtering: LintDB supports filtering on any field in the schema.

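To illustrate the MaxSim scoring described above, here is a minimal NumPy sketch of late-interaction relevance (an illustration only, not LintDB's internal implementation):

```python
import numpy as np

def maxsim_score(query_embs: np.ndarray, doc_embs: np.ndarray) -> float:
    """Late-interaction (MaxSim) relevance: for each query token
    embedding, take the max similarity over all document token
    embeddings, then sum across query tokens."""
    # (num_query_tokens, num_doc_tokens) similarity matrix
    sims = query_embs @ doc_embs.T
    return float(sims.max(axis=1).sum())

# Toy example: 2 query tokens, 3 document tokens, 2-dim embeddings.
q = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([[0.9, 0.1], [0.0, -1.0], [0.5, 0.5]])
print(round(maxsim_score(q, d), 6))  # 1.4 (best match per query token: 0.9 + 0.5)
```

Each query token independently picks its best-matching document token, which is what lets late interaction capture fine-grained term-level matches that a single pooled vector would average away.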
Installation

LintDB relies on OpenBLAS for accelerated matrix multiplication. To simplify installation, we currently support conda only.

conda install lintdb -c deployql -c conda-forge

Usage

LintDB makes it easy to upload data, even if you have multiple tenants.

The example below creates a database. LintDB defines a schema for a given database that can index embeddings, floats, strings, and even dates. Fields can be indexed, stored, or used as filters.

```python
from lintdb.core import (
  Schema,
  ColbertField,
  DataType,
  QuantizerType,
  Configuration,
  IndexIVF,
)

schema = Schema(
  [
    ColbertField('colbert', DataType.TENSOR, {
      'dimensions': 128,
      'quantization': QuantizerType.BINARIZER,
      'num_centroids': 32768,
      'num_iterations': 10,
    })
  ]
)
config = Configuration()
# index_path is the directory where the index will be stored.
index = IndexIVF(index_path, schema, config)
```
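The `QuantizerType.BINARIZER` setting is where the "16 bytes per 128-dimensional embedding" figure in the features list comes from: one bit per dimension gives 128 bits = 16 bytes. A rough NumPy sketch of sign-based binarization (an illustration of the idea only, not LintDB's exact quantizer, which also uses centroids and residuals):

```python
import numpy as np

def binarize(embedding: np.ndarray) -> bytes:
    """Pack the sign of each dimension into one bit (128 dims -> 16 bytes)."""
    bits = (embedding > 0).astype(np.uint8)
    return np.packbits(bits).tobytes()

vec = np.random.default_rng(0).standard_normal(128).astype('float32')
print(len(binarize(vec)))  # 16
```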

Querying works similarly: any indexed field in the schema can be queried.

```python
import numpy as np

from lintdb.core import (
  Query,
  VectorQueryNode,
  TensorFieldValue,
)

# `data` (query ids and texts) and `checkpoint` (a ColBERT model
# checkpoint) come from your own evaluation setup.
for qid, query_text in zip(data.qids, data.queries):
  # Encode the query text into token-level embeddings.
  embedding = checkpoint.queryFromText(query_text)
  e = np.squeeze(embedding.cpu().numpy().astype('float32'))

  query = Query(
    VectorQueryNode(
      TensorFieldValue('colbert', e)
    )
  )
  # search(tenant, query, k): top 10 results for tenant 0.
  results = index.search(0, query, 10)
  print(results)
```

Late Interaction Model Support

LintDB aims to support late interaction and more advanced retrieval models.

  • ColBERTv2 with PLAID
  • XTR

Roadmap

LintDB aims to be a retrieval platform for Gen AI. We believe that to do this, we must support flexible retrieval and scoring methods while maintaining a high level of performance.

  • Improving performance and scalability
  • Improved benchmarks
  • Support CITADEL for scalable late interaction
  • Support learnable query adapters in the retrieval pipeline
  • Enhance support for arbitrary retrieval and ranking functions
  • Support learnable ranking functions

Comparison with other Vector Databases

LintDB is one of two databases that support token-level embeddings; the other is Vespa.

Token Level Embeddings

Vespa

Vespa is a robust, mature search engine with many features. However, the learning curve to get started with and operate Vespa is high. With embedded LintDB, there's no setup required: `conda install lintdb -c deployql` and get started.

Embedded

Chroma

Chroma is an embedded vector database available in Python and JavaScript, whereas LintDB currently supports Python only.

However, unlike Chroma, LintDB offers multi-tenancy support.

Documentation

For detailed documentation on using LintDB, refer to the official documentation.

License

LintDB is licensed under the Apache 2.0 License. See the LICENSE file for details.

We want to offer a managed service

We need your help! If you'd like a managed LintDB, reach out and let us know.

Book time on the founder's calendar: https://calendar.app.google/fsymSzTVT8sip9XX6