Add Weaviate integration (georgia-tech-db#1360)
Add Weaviate integration. Features include:
1. Initialize the required environment and connect to the Weaviate vector database;
2. Create a class;
3. Delete a class;
4. Add data;
5. Make similarity-based queries.

---------

Co-authored-by: Andy Xu <xzdandy@gmail.com>
hunteritself and xzdandy authored Nov 21, 2023
1 parent a323af3 commit 0268df2
Showing 16 changed files with 248 additions and 3 deletions.
1 change: 1 addition & 0 deletions docs/_toc.yml
@@ -90,6 +90,7 @@ parts:
- file: source/reference/vector_databases/pgvector
- file: source/reference/vector_databases/pinecone
- file: source/reference/vector_databases/milvus
- file: source/reference/vector_databases/weaviate

- file: source/reference/ai/index
title: AI Engines
2 changes: 1 addition & 1 deletion docs/source/reference/databases/hackernews.rst
@@ -18,7 +18,7 @@ Required:

Optional:

* ``tags`` is the tag used for filtering the query results. Check `available tags <https://hn.algolia.com/api#:~:text=filter%20on%20a%20specific%20tag.%20Available%20tags%3A>`_ to see a list of available filter tags.
* ``tags`` is the tag used for filtering the query results. Check `available tags <https://hn.algolia.com/api>`_ to see a list of available filter tags.

Create Connection
-----------------
31 changes: 31 additions & 0 deletions docs/source/reference/vector_databases/weaviate.rst
@@ -0,0 +1,31 @@
Weaviate
==========

Weaviate is an open-source vector database designed for scalability and rich querying. It supports semantic search, automatic vectorization, and integration with large language models (LLMs).
The connection to Weaviate is based on the `weaviate-client <https://weaviate.io/developers/weaviate/client-libraries/python>`_ library.

Dependency
----------

* weaviate-client

Parameters
----------

To use Weaviate, you need an API key and the URL of your Weaviate instance. Follow the `instructions for setting up a Weaviate instance <https://weaviate.io/developers/weaviate/quickstart>`_. Once your instance is running, you can find its API key and URL on the Details tab of the Weaviate Cloud Services (WCS) dashboard. Both are required to connect to the Weaviate server.

* ``WEAVIATE_API_KEY`` is the API key for your Weaviate instance.
* ``WEAVIATE_API_URL`` is the URL of your Weaviate instance.

These values can be set either with the ``SET`` statement or through the ``WEAVIATE_API_KEY`` and ``WEAVIATE_API_URL`` environment variables. If both are present, the ``SET`` value takes precedence.
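A minimal sketch of that lookup order, assuming it mirrors `WeaviateVectorStore.__init__` in this commit: the value passed in from the catalog (populated by `SET`) is used first, and an empty or missing value falls back to the environment variable. `resolve_setting` is a hypothetical helper, and it raises `ValueError` where the commit uses `assert`.

```python
import os
from typing import Mapping, Optional


def resolve_setting(name: str, configured: Mapping[str, Optional[str]]) -> str:
    """Return a Weaviate setting, preferring the SET/catalog value.

    Hypothetical helper: mirrors the kwargs-then-environment lookup used
    for WEAVIATE_API_KEY and WEAVIATE_API_URL.
    """
    # Catalog defaults are "" (see evadb_config.py), so treat empty as unset.
    value = configured.get(name) or os.environ.get(name)
    if not value:
        raise ValueError(
            f"Please set `{name}` using the SET statement or the "
            f"{name} environment variable."
        )
    return value


if __name__ == "__main__":
    os.environ["WEAVIATE_API_URL"] = "https://example.weaviate.network"
    # The catalog value is empty, so the environment variable is used.
    print(resolve_setting("WEAVIATE_API_URL", {"WEAVIATE_API_URL": ""}))
```

Treating the empty string as unset matters because every key in the EvaDB config starts out as `""`.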

Create Collection
-----------------

Weaviate stores data in collections (called classes in the Weaviate schema). To create a collection in Weaviate, use the following SQL statement in EvaDB:

.. code-block:: sql

   CREATE INDEX collection_name ON table_name (data) USING WEAVIATE;

This statement creates a Weaviate collection with the specified name, linked to the table in EvaDB. The underlying ``WeaviateVectorStore.create()`` method also accepts a vectorizer, a list of properties, and a module configuration for the collection.
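Under the hood, a collection is a Weaviate class definition. A minimal sketch of the dictionary that `WeaviateVectorStore.create()` in this commit assembles from its `vectorizer`, `properties`, and `module_config` arguments; `build_class_definition` is a hypothetical helper, and the `"none"` vectorizer and `row_id` property in the usage lines are illustrative assumptions, not values set by this commit.

```python
from typing import Any, Dict, List, Optional


def build_class_definition(
    collection_name: str,
    vectorizer: str = "text2vec-openai",
    properties: Optional[List[Dict[str, Any]]] = None,
    module_config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a Weaviate class definition shaped like the one in create()."""
    return {
        "class": collection_name,
        "properties": properties or [],
        "vectorizer": vectorizer,
        "moduleConfig": module_config or {},
    }


# Illustrative usage: EvaDB supplies the vectors, so no Weaviate-side vectorizer.
definition = build_class_definition(
    "MyCollection",
    vectorizer="none",
    properties=[{"name": "row_id", "dataType": ["int"]}],
)
print(definition)
```

Weaviate's `"none"` vectorizer fits the case where EvaDB computes the embeddings itself; the commit's default, `text2vec-openai`, configures Weaviate-side OpenAI vectorization instead.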
1 change: 1 addition & 0 deletions evadb/catalog/catalog_type.py
@@ -117,6 +117,7 @@ class VectorStoreType(EvaDBEnum):
    PINECONE  # noqa: F821
    PGVECTOR  # noqa: F821
    CHROMADB  # noqa: F821
    WEAVIATE  # noqa: F821
    MILVUS  # noqa: F821


2 changes: 2 additions & 0 deletions evadb/evadb_config.py
@@ -41,4 +41,6 @@
    "MILVUS_PASSWORD": "",
    "MILVUS_DB_NAME": "",
    "MILVUS_TOKEN": "",
    "WEAVIATE_API_KEY": "",
    "WEAVIATE_API_URL": "",
}
11 changes: 11 additions & 0 deletions evadb/executor/executor_utils.py
@@ -185,6 +185,17 @@ def handle_vector_store_params(
            ),
            "PINECONE_ENV": catalog().get_configuration_catalog_value("PINECONE_ENV"),
        }
    elif vector_store_type == VectorStoreType.WEAVIATE:
        # Weaviate configuration.
        # The API key and URL can be obtained from the cluster details in the
        # Weaviate Cloud Services (WCS) dashboard.
        return {
            "WEAVIATE_API_KEY": catalog().get_configuration_catalog_value(
                "WEAVIATE_API_KEY"
            ),
            "WEAVIATE_API_URL": catalog().get_configuration_catalog_value(
                "WEAVIATE_API_URL"
            ),
        }
    elif vector_store_type == VectorStoreType.MILVUS:
        return {
            "MILVUS_URI": catalog().get_configuration_catalog_value("MILVUS_URI"),
3 changes: 2 additions & 1 deletion evadb/interfaces/relational/db.py
@@ -268,7 +268,8 @@ def create_vector_index(
index_name (str): Name of the index.
table_name (str): Name of the table.
expr (str): Expression used to build the vector index.
using (str): Method used for indexing, can be `FAISS` or `QDRANT` or `PINECONE` or `CHROMADB` or `MILVUS`.
using (str): Method used for indexing, can be `FAISS` or `QDRANT` or `PINECONE` or `CHROMADB` or `WEAVIATE` or `MILVUS`.
Returns:
EvaDBCursor: The EvaDBCursor object.
3 changes: 2 additions & 1 deletion evadb/parser/evadb.lark
@@ -71,7 +71,7 @@ function_metadata_key: uid

function_metadata_value: constant

vector_store_type: USING (FAISS | QDRANT | PINECONE | PGVECTOR | CHROMADB | MILVUS)
vector_store_type: USING (FAISS | QDRANT | PINECONE | PGVECTOR | CHROMADB | WEAVIATE | MILVUS)

index_elem: ("(" uid_list ")"
| "(" function_call ")")
@@ -448,6 +448,7 @@ QDRANT: "QDRANT"i
PINECONE: "PINECONE"i
PGVECTOR: "PGVECTOR"i
CHROMADB: "CHROMADB"i
WEAVIATE: "WEAVIATE"i
MILVUS: "MILVUS"i

// Computer vision tasks
2 changes: 2 additions & 0 deletions evadb/parser/lark_visitor/_create_statements.py
@@ -300,6 +300,8 @@ def vector_store_type(self, tree):
            vector_store_type = VectorStoreType.PGVECTOR
        elif str.upper(token) == "CHROMADB":
            vector_store_type = VectorStoreType.CHROMADB
        elif str.upper(token) == "WEAVIATE":
            vector_store_type = VectorStoreType.WEAVIATE
        elif str.upper(token) == "MILVUS":
            vector_store_type = VectorStoreType.MILVUS
        return vector_store_type
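Because the grammar already restricts the token to the listed keywords (matched case-insensitively via terminals such as `"WEAVIATE"i`), the `elif` chain could equivalently be a lookup by member name. A minimal sketch under that assumption; the `VectorStoreType` below is a plain `enum.Enum` stand-in rather than EvaDB's `EvaDBEnum`, and `parse_vector_store_type` is a hypothetical name.

```python
from enum import Enum, auto


class VectorStoreType(Enum):
    # Simplified stand-in for EvaDB's VectorStoreType (an EvaDBEnum).
    FAISS = auto()
    QDRANT = auto()
    PINECONE = auto()
    PGVECTOR = auto()
    CHROMADB = auto()
    WEAVIATE = auto()
    MILVUS = auto()


def parse_vector_store_type(token: str) -> VectorStoreType:
    """Map a case-insensitive USING token to the enum member of the same name."""
    try:
        return VectorStoreType[token.upper()]
    except KeyError:
        raise ValueError(f"Unsupported vector store: {token}") from None


print(parse_vector_store_type("weaviate"))  # VectorStoreType.WEAVIATE
```

With a name lookup, adding a store needs only a new enum member and grammar terminal, at the cost of tying each SQL keyword's spelling to a member name.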
8 changes: 8 additions & 0 deletions evadb/third_party/vector_stores/utils.py
@@ -18,6 +18,7 @@
from evadb.third_party.vector_stores.milvus import MilvusVectorStore
from evadb.third_party.vector_stores.pinecone import PineconeVectorStore
from evadb.third_party.vector_stores.qdrant import QdrantVectorStore
from evadb.third_party.vector_stores.weaviate import WeaviateVectorStore
from evadb.utils.generic_utils import validate_kwargs


@@ -51,6 +52,12 @@ def init_vector_store(
            validate_kwargs(kwargs, required_params, required_params)
            return ChromaDBVectorStore(index_name, **kwargs)

        elif vector_store_type == VectorStoreType.WEAVIATE:
            from evadb.third_party.vector_stores.weaviate import required_params

            validate_kwargs(kwargs, required_params, required_params)
            return WeaviateVectorStore(index_name, **kwargs)

        elif vector_store_type == VectorStoreType.MILVUS:
            from evadb.third_party.vector_stores.milvus import (
                allowed_params,
@@ -59,5 +66,6 @@ def init_vector_store(

            validate_kwargs(kwargs, allowed_params, required_params)
            return MilvusVectorStore(index_name, **kwargs)

        else:
            raise Exception(f"Vector store {vector_store_type} not supported")
115 changes: 115 additions & 0 deletions evadb/third_party/vector_stores/weaviate.py
@@ -0,0 +1,115 @@
# coding=utf-8
# Copyright 2018-2023 EvaDB
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from typing import List, Optional

import numpy as np

from evadb.third_party.vector_stores.types import (
    FeaturePayload,
    VectorIndexQuery,
    VectorIndexQueryResult,
    VectorStore,
)
from evadb.utils.generic_utils import try_to_import_weaviate_client

required_params = []


class WeaviateVectorStore(VectorStore):
    def __init__(self, collection_name: str, **kwargs) -> None:
        try_to_import_weaviate_client()

        # Weaviate capitalizes the first letter of class names, so normalize
        # the name to match the key returned in GraphQL query responses.
        self._collection_name = collection_name[:1].upper() + collection_name[1:]

        # Get the API key: the SET value first, then the environment variable.
        self._api_key = kwargs.get("WEAVIATE_API_KEY")

        if not self._api_key:
            self._api_key = os.environ.get("WEAVIATE_API_KEY")

        assert (
            self._api_key
        ), "Please set your `WEAVIATE_API_KEY` using the SET command or the environment variable (WEAVIATE_API_KEY). It can be found on the Details tab in the WCS dashboard."

        # Get the API URL: the SET value first, then the environment variable.
        self._api_url = kwargs.get("WEAVIATE_API_URL")

        if not self._api_url:
            self._api_url = os.environ.get("WEAVIATE_API_URL")

        assert (
            self._api_url
        ), "Please set your `WEAVIATE_API_URL` using the SET command or the environment variable (WEAVIATE_API_URL). It can be found on the Details tab in the WCS dashboard."

        # Each store instance gets its own Weaviate client.
        import weaviate

        self._client = weaviate.Client(
            url=self._api_url,
            auth_client_secret=weaviate.AuthApiKey(api_key=self._api_key),
        )
        # Fail fast if the URL or API key is invalid.
        self._client.schema.get()

    def create(
        self,
        vector_dim: Optional[int] = None,
        vectorizer: str = "text2vec-openai",
        properties: list = None,
        module_config: dict = None,
    ):
        # vector_dim keeps the VectorStore.create() interface; Weaviate takes
        # the dimension from the first vector added to the collection.
        properties = properties or []
        module_config = module_config or {}

        collection_obj = {
            "class": self._collection_name,
            "properties": properties,
            "vectorizer": vectorizer,
            "moduleConfig": module_config,
        }

        # Recreate the collection if one with the same name already exists.
        if self._client.schema.exists(self._collection_name):
            self._client.schema.delete_class(self._collection_name)

        self._client.schema.create_class(collection_obj)

    def add(self, payload: List[FeaturePayload]) -> None:
        with self._client.batch as batch:
            for item in payload:
                # `id` is reserved in Weaviate, so the EvaDB row id is stored as
                # a property and the embedding is passed as the object's vector.
                batch.add_data_object(
                    {"row_id": int(item.id)},
                    self._collection_name,
                    vector=np.asarray(item.embedding).reshape(-1).tolist(),
                )

    def delete(self) -> None:
        self._client.schema.delete_class(self._collection_name)

    def query(self, query: VectorIndexQuery) -> VectorIndexQueryResult:
        response = (
            self._client.query.get(self._collection_name, ["row_id"])
            .with_near_vector(
                {"vector": np.asarray(query.embedding).reshape(-1).tolist()}
            )
            .with_limit(query.top_k)
            # _additional fields are only returned when explicitly requested.
            .with_additional(["distance"])
            .do()
        )

        data = response.get("data") or {}
        results = (data.get("Get") or {}).get(self._collection_name) or []

        # Weaviate returns distances (lower is closer), nearest first.
        similarities = [item["_additional"]["distance"] for item in results]
        ids = [int(item["row_id"]) for item in results]

        return VectorIndexQueryResult(similarities, ids)
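Weaviate reserves `id` for an object's UUID, takes the vector as a separate argument rather than as a property, and only returns `_additional` fields such as `distance` when they are requested. A minimal sketch of the data shaping that `add()` and `query()` need, assuming the EvaDB row id is stored in a property named `row_id` (a hypothetical name) and using a hand-built dict in the shape of a GraphQL `Get` response instead of a live client. `FeaturePayload` here is a simplified stand-in.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class FeaturePayload:
    # Stand-in for evadb.third_party.vector_stores.types.FeaturePayload.
    id: int
    embedding: List[float]


def to_weaviate_objects(
    payload: List[FeaturePayload],
) -> List[Tuple[Dict[str, Any], List[float]]]:
    """Split each payload into (properties, vector) for a batch insert."""
    return [({"row_id": item.id}, list(item.embedding)) for item in payload]


def parse_near_vector_response(
    response: Dict[str, Any], class_name: str
) -> Tuple[List[float], List[int]]:
    """Extract (distances, row ids) from a Get/nearVector response dict."""
    results = response.get("data", {}).get("Get", {}).get(class_name, [])
    distances = [item["_additional"]["distance"] for item in results]
    ids = [item["row_id"] for item in results]
    return distances, ids


objects = to_weaviate_objects([FeaturePayload(7, [0.1, 0.2])])
response = {
    "data": {
        "Get": {
            "MyCollection": [
                {"row_id": 7, "_additional": {"distance": 0.0}},
                {"row_id": 3, "_additional": {"distance": 0.42}},
            ]
        }
    }
}
print(objects)
print(parse_near_vector_response(response, "MyCollection"))
```

With the v3 client, each `(properties, vector)` pair maps onto `batch.add_data_object(properties, class_name, vector=vector)`, and the query needs `.with_additional(["distance"])` for `_additional.distance` to appear in the response.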
18 changes: 18 additions & 0 deletions evadb/utils/generic_utils.py
@@ -573,6 +573,16 @@ def try_to_import_chromadb_client():
        )


def try_to_import_weaviate_client():
    try:
        import weaviate  # noqa: F401
    except ImportError:
        raise ValueError(
            """Could not import weaviate python package.
                Please install it with `pip install weaviate-client`."""
        )


def try_to_import_milvus_client():
    try:
        import pymilvus  # noqa: F401
@@ -607,6 +617,14 @@ def is_chromadb_available() -> bool:
        return False


def is_weaviate_available() -> bool:
    try:
        try_to_import_weaviate_client()
        return True
    except ValueError:
        return False


def is_milvus_available() -> bool:
    try:
        try_to_import_milvus_client()
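The `try_to_import_*` / `is_*_available` pair imports the package in order to test for it. A hedged alternative sketch that checks availability with `importlib.util.find_spec` without importing the module, while keeping the same `ValueError` convention; `is_package_available` and `require_package` are hypothetical names. The module name (`weaviate`) differs from the pip distribution name (`weaviate-client`), which is why both are passed.

```python
import importlib.util


def is_package_available(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def require_package(module_name: str, pip_name: str) -> None:
    """Raise ValueError with an install hint if the module cannot be found."""
    if not is_package_available(module_name):
        raise ValueError(
            f"Could not import {module_name} python package. "
            f"Please install it with `pip install {pip_name}`."
        )


print(is_package_available("json"))  # True
print(is_package_available("weaviate"))  # depends on the environment
```

`find_spec` only confirms that the module can be located; a package that fails while loading (for example, because of a broken transitive dependency) would still only surface on the real import.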
5 changes: 5 additions & 0 deletions script/formatting/spelling.txt
@@ -975,10 +975,13 @@ VideoFormat
VideoStorageEngineTest
VideoWriter
VisionEncoderDecoderModel
WEAVIATE
WH
WIP
WMV
WeakValueDictionary
Weaviate
WeaviateVectorStore
XGBoost
XdistTests
Xeon
@@ -1731,6 +1734,7 @@ testRayErrorHandling
testSimilarityFeatureTable
testSimilarityImageDataset
testSimilarityTable
testWeaviateIndexImageDataset
testcase
testcases
testdeleteone
@@ -1814,6 +1818,7 @@ wal
warmup
wb
weakref
weaviate
westbrae
wget
whitespaces
4 changes: 4 additions & 0 deletions setup.py
@@ -112,8 +112,11 @@ def read(path, encoding="utf-8"):

chromadb_libs = ["chromadb"]

weaviate_libs = ["weaviate-client"]

milvus_libs = ["pymilvus>=2.3.0"]


postgres_libs = [
    "psycopg2",
]
@@ -173,6 +176,7 @@ def read(path, encoding="utf-8"):
    "pinecone": pinecone_libs,
    "chromadb": chromadb_libs,
    "milvus": milvus_libs,
    "weaviate": weaviate_libs,
    "postgres": postgres_libs,
    "ludwig": ludwig_libs,
    "sklearn": sklearn_libs,