Short for k-nearest neighbors, the k-NN plugin enables users to search for the k-nearest neighbors to a query point across an index of vectors. See documentation for more information.
In the following example we create a 5-dimensional k-NN index with random data. You can find a synchronous version of this working sample in samples/knn/knn-basics.py and an asynchronous one in samples/knn/knn-async-basics.py.
$ poetry run python knn/knn-basics.py
Searching for [0.61, 0.05, 0.16, 0.75, 0.49] ...
{'_index': 'my-index', '_id': '3', '_score': 0.9252405, '_source': {'values': [0.64, 0.3, 0.27, 0.68, 0.51]}}
{'_index': 'my-index', '_id': '4', '_score': 0.802375, '_source': {'values': [0.49, 0.39, 0.21, 0.42, 0.42]}}
{'_index': 'my-index', '_id': '8', '_score': 0.7826564, '_source': {'values': [0.33, 0.33, 0.42, 0.97, 0.56]}}
dimensions = 5
client.indices.create(index_name,
body={
"settings":{
"index.knn": True
},
"mappings":{
"properties": {
"values": {
"type": "knn_vector",
"dimension": dimensions
},
}
}
}
)
Create 10 random vectors and insert them using the bulk API.
vectors = []
for i in range(10):
vec = []
for j in range(dimensions):
vec.append(round(random.uniform(0, 1), 2))
vectors.append({
"_index": index_name,
"_id": i,
"values": vec,
})
helpers.bulk(client, vectors)
client.indices.refresh(index=index_name)
Create a random vector of the same size and search for its nearest neighbors.
vec = []
for j in range(dimensions):
vec.append(round(random.uniform(0, 1), 2))
search_query = {
"query": {
"knn": {
"values": {
"vector": vec,
"k": 3
}
}
}
}
results = client.search(index=index_name, body=search_query)
for hit in results["hits"]["hits"]:
print(hit)
In the boolean-filter.py sample we create a 5-dimensional k-NN index with random data and a metadata
field that contains a book genre (e.g. fiction
). The search query is a k-NN search filtered by genre. The filter clause is outside the k-NN query clause and is applied after the k-NN search.
$ poetry run python knn/knn-boolean-filter.py
Searching for [0.08, 0.42, 0.04, 0.76, 0.41] with the 'romance' genre ...
{'_index': 'my-index', '_id': '445', '_score': 0.95886475, '_source': {'values': [0.2, 0.54, 0.08, 0.87, 0.43], 'metadata': {'genre': 'romance'}}}
{'_index': 'my-index', '_id': '2816', '_score': 0.95256233, '_source': {'values': [0.22, 0.36, 0.01, 0.75, 0.57], 'metadata': {'genre': 'romance'}}}
In the lucene-filter.py sample we implement the example in the k-NN documentation, which creates an index that uses the Lucene engine and HNSW as the method in the mapping, containing hotel location and parking data, then search for the top three hotels near the location with the coordinates [5, 4]
that are rated between 8 and 10, inclusive, and provide parking.
$ poetry run python knn/knn-efficient-filter.py
{'_index': 'hotels-index', '_id': '3', '_score': 0.72992706, '_source': {'location': [4.9, 3.4], 'parking': 'true', 'rating': 9}}
{'_index': 'hotels-index', '_id': '6', '_score': 0.3012048, '_source': {'location': [6.4, 3.4], 'parking': 'true', 'rating': 9}}
{'_index': 'hotels-index', '_id': '5', '_score': 0.24154587, '_source': {'location': [3.3, 4.5], 'parking': 'true', 'rating': 8}}