k-NN Plugin

The k-NN plugin (k-NN is short for k-nearest neighbors) lets you search an index of vectors for the k nearest neighbors of a query point. See the k-NN documentation for more information.

Basic Approximate k-NN

In the following example we create a 5-dimensional k-NN index with random data. You can find a synchronous version of this working sample in samples/knn/knn-basics.py and an asynchronous one in samples/knn/knn-async-basics.py.

$ poetry run python knn/knn-basics.py

Searching for [0.61, 0.05, 0.16, 0.75, 0.49] ...
{'_index': 'my-index', '_id': '3', '_score': 0.9252405, '_source': {'values': [0.64, 0.3, 0.27, 0.68, 0.51]}}
{'_index': 'my-index', '_id': '4', '_score': 0.802375, '_source': {'values': [0.49, 0.39, 0.21, 0.42, 0.42]}}
{'_index': 'my-index', '_id': '8', '_score': 0.7826564, '_source': {'values': [0.33, 0.33, 0.42, 0.97, 0.56]}}
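
The `_score` values in this output are consistent with the score the k-NN plugin reports for the L2 (Euclidean) space, 1 / (1 + d²), where d is the distance between the query vector and the document vector. The following is a minimal sketch that recomputes the first hit's score locally. It assumes the index uses the L2 space type (the mapping in this guide does not set one explicitly), and `l2_score` is a hypothetical helper, not part of opensearch-py.

```python
def l2_score(query, doc):
    """Return 1 / (1 + squared Euclidean distance) between two vectors."""
    squared_distance = sum((q - d) ** 2 for q, d in zip(query, doc))
    return 1.0 / (1.0 + squared_distance)


query = [0.61, 0.05, 0.16, 0.75, 0.49]
first_hit = [0.64, 0.3, 0.27, 0.68, 0.51]  # '_source' values of the hit with '_id' 3

print(round(l2_score(query, first_hit), 4))  # 0.9252, in line with the reported 0.9252405
```

A score closer to 1 therefore means a smaller distance to the query vector.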

Create an Index

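dimensions = 5
client.indices.create(
    index=index_name,
    body={
        "settings": {
            # Enable k-NN search on this index
            "index.knn": True
        },
        "mappings": {
            "properties": {
                # Vector field; its dimension must match the indexed vectors
                "values": {
                    "type": "knn_vector",
                    "dimension": dimensions
                }
            }
        }
    }
)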

Index Vectors

Create 10 random vectors and insert them using the bulk API.

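import random
from opensearchpy import helpers

# Build one bulk action per document; keys without a leading underscore become the document source
vectors = []
for i in range(10):
    vec = [round(random.uniform(0, 1), 2) for _ in range(dimensions)]
    vectors.append({
        "_index": index_name,
        "_id": i,
        "values": vec,
    })

# Send all actions through the bulk helper
helpers.bulk(client, vectors)

# Refresh so the new documents are visible to search right away
client.indices.refresh(index=index_name)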

Search for Nearest Neighbors

Create a random vector of the same size and search for its nearest neighbors.

# Random query vector with the same dimension as the indexed vectors
vec = [round(random.uniform(0, 1), 2) for _ in range(dimensions)]

search_query = {
    "query": {
        "knn": {
            "values": {
                "vector": vec, 
                "k": 3
            }
        }
    }
}

results = client.search(index=index_name, body=search_query)
for hit in results["hits"]["hits"]:
    print(hit)
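
Because the search above is approximate, it can be useful on small data sets to compare its hits with an exact, brute-force ranking computed locally. The following is a minimal sketch of such a check, assuming Euclidean distance. `exact_knn` and `local_vectors` are hypothetical names introduced for illustration, and the data is generated locally rather than read from the index.

```python
import random


def exact_knn(query, vectors_by_id, k):
    """Return the ids of the k vectors closest to query by squared Euclidean distance."""
    def squared_distance(vec):
        return sum((q - v) ** 2 for q, v in zip(query, vec))

    ranked_ids = sorted(vectors_by_id, key=lambda doc_id: squared_distance(vectors_by_id[doc_id]))
    return ranked_ids[:k]


random.seed(0)  # fixed seed so the illustrative data is reproducible
dims = 5
local_vectors = {
    doc_id: [round(random.uniform(0, 1), 2) for _ in range(dims)]
    for doc_id in range(10)
}
query_vec = [round(random.uniform(0, 1), 2) for _ in range(dims)]

print(exact_knn(query_vec, local_vectors, k=3))
```

With only ten vectors, the ids returned by the approximate search would normally be expected to match this exact ranking.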

Approximate k-NN with a Boolean Filter

In the knn-boolean-filter.py sample we create a 5-dimensional k-NN index with random data and a metadata field that contains a book genre (e.g. fiction). The search query is a k-NN search filtered by genre. The filter clause is outside the k-NN query clause and is applied after the k-NN search, so a search can return fewer than k hits when some of the nearest neighbors belong to other genres.

$ poetry run python knn/knn-boolean-filter.py 

Searching for [0.08, 0.42, 0.04, 0.76, 0.41] with the 'romance' genre ...

{'_index': 'my-index', '_id': '445', '_score': 0.95886475, '_source': {'values': [0.2, 0.54, 0.08, 0.87, 0.43], 'metadata': {'genre': 'romance'}}}
{'_index': 'my-index', '_id': '2816', '_score': 0.95256233, '_source': {'values': [0.22, 0.36, 0.01, 0.75, 0.57], 'metadata': {'genre': 'romance'}}}
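
The query shape described in this section can be sketched as follows. This is an assumption about how such a query might look rather than a copy of the sample: the k-NN clause sits in `must`, the genre `term` filter sits beside it in the enclosing `bool` query, and `build_boolean_filter_query` is a hypothetical helper. The field names `values` and `metadata.genre` follow the sample output, and the default `k=5` is illustrative.

```python
def build_boolean_filter_query(vector, genre, k=5):
    """Build a k-NN query whose genre filter sits outside the knn clause."""
    return {
        "query": {
            "bool": {
                # Outside the knn clause, so it filters the k-NN results afterwards
                "filter": {"term": {"metadata.genre": genre}},
                "must": {"knn": {"values": {"vector": vector, "k": k}}},
            }
        }
    }


search_query = build_boolean_filter_query([0.08, 0.42, 0.04, 0.76, 0.41], "romance")
print(search_query)
```

The resulting dictionary can be passed as `body` to `client.search`, in the same way as the query in the basic example.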

Approximate k-NN with an Efficient Filter

In the knn-efficient-filter.py sample we implement the example from the k-NN documentation. It creates an index of hotel location, rating, and parking data whose mapping uses the Lucene engine with HNSW as the method, then searches for the top three hotels nearest to the coordinates [5, 4] that are rated between 8 and 10, inclusive, and provide parking.

$ poetry run python knn/knn-efficient-filter.py

{'_index': 'hotels-index', '_id': '3', '_score': 0.72992706, '_source': {'location': [4.9, 3.4], 'parking': 'true', 'rating': 9}}
{'_index': 'hotels-index', '_id': '6', '_score': 0.3012048, '_source': {'location': [6.4, 3.4], 'parking': 'true', 'rating': 9}}
{'_index': 'hotels-index', '_id': '5', '_score': 0.24154587, '_source': {'location': [3.3, 4.5], 'parking': 'true', 'rating': 8}}
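
The mapping and query for this kind of search can be sketched as follows. Unlike the boolean filter, the filter here sits inside the `knn` clause. The helper names are hypothetical, and the `l2` space type is an assumption, chosen because the scores above match 1 / (1 + d²) for the listed locations. The field names `location`, `rating`, and `parking` follow the sample output.

```python
def build_hotels_index_body():
    """Index body with a 2-dimensional vector field using HNSW on the Lucene engine."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "location": {
                    "type": "knn_vector",
                    "dimension": 2,
                    # space_type "l2" is an assumption, see the note above
                    "method": {"name": "hnsw", "space_type": "l2", "engine": "lucene"},
                }
            }
        },
    }


def build_efficient_filter_query(location, k=3):
    """k-NN query whose filter is nested inside the knn clause."""
    return {
        "size": k,
        "query": {
            "knn": {
                "location": {
                    "vector": location,
                    "k": k,
                    "filter": {
                        "bool": {
                            "must": [
                                {"range": {"rating": {"gte": 8, "lte": 10}}},
                                {"term": {"parking": "true"}},
                            ]
                        }
                    },
                }
            }
        },
    }


index_body = build_hotels_index_body()
hotel_query = build_efficient_filter_query([5, 4])
print(hotel_query)
```

Both dictionaries are plain request bodies: `index_body` could be passed to `client.indices.create` and `hotel_query` to `client.search`.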