[PIXELS-580] implement using LSH index to perform approximate nearest neighbour search on vector column #88

TiannanSha · 2024-02-18T09:45:44Z

(TODO: the ExactNNS PR will probably be merged first. For now, the first ~10 commits of this PR implement ExactNNS.)

Implement approximate nearest neighbour search on a vector column using an LSH index. This consists of two functions for the user to call:

To build the LSH index in a distributed fashion across multiple nodes:
select build_lsh(vec_col, numBits), where numBits specifies the number of bits in the hashed value. The rows of vec_col are divided into at most 2^numBits buckets.
To search the built LSH index, where each query is served by a single node but the queries are distributed evenly across all nodes:
lsh_search_single_node(input_vec, distFunc, test_schema.test_arr_table.arr_col, k)
e.g. lsh_search_single_node(array[3.5, 3.5], 'euc', 'test_schema.test_arr_table.arr_col', 3)
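
To illustrate how numBits relates to the 2^numBits buckets, here is a minimal single-process sketch. It assumes sign-of-random-projection hashing (one bit per random hyperplane) and Euclidean distance. The PR description does not say which LSH family the implementation uses, so treat that choice as an assumption. The class `LshSketch` and all of its methods are hypothetical names and are not taken from the PR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Illustrative only: random-hyperplane LSH. None of these names come from the PR. */
final class LshSketch {
    private final double[][] hyperplanes; // one random direction per hash bit
    private final Map<Integer, List<double[]>> buckets = new HashMap<>();

    LshSketch(int dimension, int numBits, long seed) {
        if (numBits < 1 || numBits > 30) {
            throw new IllegalArgumentException("numBits must be in [1, 30]");
        }
        Random rng = new Random(seed);
        hyperplanes = new double[numBits][dimension];
        for (double[] plane : hyperplanes) {
            for (int i = 0; i < dimension; i++) {
                plane[i] = rng.nextGaussian();
            }
        }
    }

    /** Bit b is set when vec lies on the non-negative side of hyperplane b. */
    int hash(double[] vec) {
        int code = 0;
        for (int b = 0; b < hyperplanes.length; b++) {
            double dot = 0.0;
            for (int i = 0; i < hyperplanes[b].length; i++) {
                dot += hyperplanes[b][i] * vec[i];
            }
            if (dot >= 0.0) {
                code |= 1 << b;
            }
        }
        return code; // always in [0, 2^numBits)
    }

    /** "Build" step: place a vector into the bucket named by its hash. */
    void add(double[] vec) {
        buckets.computeIfAbsent(hash(vec), key -> new ArrayList<>()).add(vec);
    }

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** "Search" step: rank only the vectors in the query's own bucket. */
    List<double[]> search(double[] query, int k) {
        List<double[]> candidates = new ArrayList<>(
                buckets.getOrDefault(hash(query), Collections.emptyList()));
        candidates.sort(Comparator.comparingDouble((double[] v) -> euclidean(v, query)));
        return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
    }
}
```

Because only one bucket is scanned, a true neighbour that falls on the other side of a hyperplane is missed. A larger numBits gives smaller buckets and faster searches at the cost of recall.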

bianhq

Also resolve the conflicts.

bianhq · 2024-05-11T05:47:40Z

connector/pom.xml

+            <scope>test</scope>
+        </dependency>
+        <dependency>
+            <groupId>com.amazonaws</groupId>


We should not use the AWS SDK for Java 1.x.
The AWS SDK for Java 2.x is already included as a dependency.

bianhq · 2024-05-11T05:55:11Z

connector/src/main/java/io/pixelsdb/pixels/trino/vector/lshnns/CachedLSHIndex.java

+import software.amazon.awssdk.services.s3.model.PutObjectRequest;
+
+public class CachedLSHIndex {
+


We should use the Storage API to access the underlying storage system such as S3, instead of directly using the S3 client.
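
One way to read this suggestion is that the index code should depend on a storage abstraction and leave the choice of backend to the caller. The sketch below uses a hypothetical `BucketStore` interface as a stand-in. It is not the Pixels Storage API, whose actual method signatures are not shown in this thread. `InMemoryBucketStore`, `LshBucketIo`, and the byte layout are likewise assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a storage abstraction; NOT the real Pixels Storage API. */
interface BucketStore {
    void write(String path, byte[] data);
    byte[] read(String path);
}

/** In-memory backend for the sketch; an S3-backed one would implement the same interface. */
final class InMemoryBucketStore implements BucketStore {
    private final Map<String, byte[]> objects = new HashMap<>();

    @Override
    public void write(String path, byte[] data) {
        objects.put(path, data.clone());
    }

    @Override
    public byte[] read(String path) {
        byte[] data = objects.get(path);
        if (data == null) {
            throw new UncheckedIOException(new IOException("no such object: " + path));
        }
        return data.clone();
    }
}

/** Persists LSH buckets through the abstraction only; no S3 client types appear here. */
final class LshBucketIo {
    private final BucketStore store;
    private final String baseDir;

    LshBucketIo(BucketStore store, String baseDir) {
        this.store = store;
        this.baseDir = baseDir;
    }

    String pathFor(int bucketId) {
        return baseDir + "/bucket-" + bucketId;
    }

    /** Assumed layout: vector count, dimension, then the values as doubles. */
    void saveBucket(int bucketId, double[][] vectors) {
        int dim = vectors.length == 0 ? 0 : vectors[0].length;
        ByteBuffer buf = ByteBuffer.allocate(
                2 * Integer.BYTES + vectors.length * dim * Double.BYTES);
        buf.putInt(vectors.length).putInt(dim);
        for (double[] vec : vectors) {
            for (double x : vec) {
                buf.putDouble(x);
            }
        }
        store.write(pathFor(bucketId), buf.array());
    }

    double[][] loadBucket(int bucketId) {
        ByteBuffer buf = ByteBuffer.wrap(store.read(pathFor(bucketId)));
        int count = buf.getInt();
        int dim = buf.getInt();
        double[][] vectors = new double[count][dim];
        for (int i = 0; i < count; i++) {
            for (int j = 0; j < dim; j++) {
                vectors[i][j] = buf.getDouble();
            }
        }
        return vectors;
    }
}
```

With this shape, switching from S3 to another storage system changes only which `BucketStore` implementation is passed in, and the index logic can be unit-tested without network access.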

TiannanSha added 23 commits January 16, 2024 15:27

implement a simple trino udf to calculate the sum of all the elements of an input vector

f88dedb

use array instead of features because apparently trino features don't support more than ten features

5a6d75c

implement three types of distances: euclidean, dotproduct and cosine similarity

df04bbf

make select exactNNS() udf work in trino

2300020

fix a test

1767b44

fix a comment

d52a31e

minor polish

2201829

make pixels vector type support trino array type

420408a

implement an exact NNS that acts as an aggregation function and should work on a multinode environment

abbbee7

clean up

4ca554d

pretty much finished LSH build; LSH search wip

5bc07ba

wip: lsh search

01a3903

lshSearch work in progress

19fedc8

implement LSH search, including updating mapping from col to buckets and persist the mapping when JVM shuts down

3e89598

fix the ser and deser for LSH index

0fc8cd1

auto decide s3dir for storing LSH buckets. clean up

9150fab

implement code for experiments; before fixing remote page too large

dd9016d

accidentally got ignored entire folder of lsh build

b843296

fix lsh_build

fad3f5d

implement LSH load and adjust lsh search

9eaf254

fix lsh_search() so that it works with lsh_load()

09afb3f

minor changes on lsh_load()

f231da9

add bucket write threshold to lsh_build() to avoid sending big messages

bbbb793

bianhq changed the title ~~Implement using LSH index to perform approximate nearest neighbour search on vector column~~ [PIXELS-580] implement using LSH index to perform approximate nearest neighbour search on vector column May 11, 2024

bianhq requested changes May 11, 2024

bianhq assigned TiannanSha Oct 16, 2024

bianhq added the enhancement New feature or request label Oct 16, 2024
