-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support sparsevec in weighted vector search function #328
base: main
Are you sure you want to change the base?
Conversation
Benchmarks
|
863cdc9
to
5e89295
Compare
5e89295
to
b689c2d
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅ 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Summary
This pull request adds support for sparsevec in weighted vector search functions, focusing on enhancing the functionality and flexibility of vector operations in the Lantern project. Here are the key points:
-
Introduced new overloads for weighted_vector_search functions to handle combinations of vector and sparsevec types in
sql/lantern.sql
andsql/updates/0.3.1--0.3.2.sql
. -
Added a new
small_world_sparsevec.sql
file with sample data for testing sparse vector functionality alongside dense vectors. -
Updated
test/sql/weighted_search.sql
to include conditional tests for sparsevec, ensuring compatibility with systems that may not have the type available. -
Modified
docker/Dockerfile.dev
to upgrade pgvector to version 0.6.1 and simplify the development environment setup process. -
Enhanced input validation, error handling, and vector type casting in the weighted vector search functions to improve robustness and flexibility.
6 file(s) reviewed, 24 comment(s)
Edit PR Review Bot Settings
@@ -31,7 +31,7 @@ RUN gem install pg -- --with-pg-include=/usr/local/pgsql/include/ --with-pg-lib= | |||
# hack to make sure postgres user has write access to externally mounted volumes | |||
RUN mkdir /lantern_shared && chown postgres:postgres /lantern_shared | |||
|
|||
RUN cd /root/postgresql-15.5/contrib && make install -j | |||
RUN cd /root/postgresql-15.5/contrib && make install |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
info: Removed '-j' flag from make install. This might slow down the build process.
@@ -55,7 +55,7 @@ COPY . . | |||
RUN sudo rm -rf build \ | |||
&& mkdir build \ | |||
&& cd build \ | |||
&& cmake -DCMAKE_BUILD_TYPE=Debug .. \ | |||
&& cmake .. \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
info: Removed '-DCMAKE_BUILD_TYPE=Debug'. This changes the build type from Debug to default (usually Release). Ensure this is intentional and won't affect development workflows.
@@ -65,7 +65,7 @@ RUN git clone https://github.com/lanterndata/benchmark build/benchmark \ | |||
&& pip install -r external/requirements.txt | |||
|
|||
# Install perf | |||
RUN sudo apt update && sudo apt install -y linux-tools-common linux-tools-generic linux-tools-`uname -r` | |||
RUN sudo apt update && sudo apt install -y linux-tools-common linux-tools-generic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
info: Removed 'linux-tools-uname -r
'. This might cause issues if specific kernel version tools are needed.
docker/Dockerfile.dev
Outdated
@@ -1,5 +1,5 @@ | |||
ARG VERSION=15 | |||
ARG PGVECTOR_VERSION=0.5.1 | |||
ARG PGVECTOR_VERSION=0.6.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
info: Upgraded pgvector from 0.5.1 to 0.6.1. Verify compatibility with the rest of the system.
@@ -667,6 +667,7 @@ CREATE OR REPLACE FUNCTION _lantern_internal.maybe_setup_weighted_vector_search( | |||
$weighted_vector_search$ | |||
DECLARE | |||
pgvector_exists boolean; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
info: Added check for pgvector sparsevec type existence
('000', TRUE, '[0,0,0]', '{}/3'), | ||
('001', TRUE, '[0,0,1]', '{3:1}/3'), | ||
('010', FALSE, '[0,1,0]' , '{2:1}/3'), | ||
('011', TRUE, '[0,1,1]', '{2:1,3:1}/3'), | ||
('100', FALSE, '[1,0,0]', '{1:1}/3'), | ||
('101', FALSE, '[1,0,1]', '{1:1,3:1}/3'), | ||
('110', FALSE, '[1,1,0]', '{1:1,2:1}/3'), | ||
('111', TRUE, '[1,1,1]', '{1:1,2:1,3:1}/3'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
info: Inserted test data covering all 3-bit combinations, with corresponding sparse vectors
test/sql/weighted_search.sql
Outdated
-- Check if the type 'sparsevec' exists and store the result in a variable | ||
SELECT EXISTS ( | ||
SELECT 1 | ||
FROM pg_type | ||
WHERE typname = 'sparsevec' | ||
) AS exists_sparsevec \gset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
info: Added check for 'sparsevec' type existence
test/sql/weighted_search.sql
Outdated
\if :exists_sparsevec | ||
\echo 'The sparsevec type exists. Running commands...' | ||
\ir utils/small_world_sparsevec.sql | ||
SELECT '{1:0.4,2:0.3,3:0.2}/3' AS s3 \gset | ||
SELECT '[-0.5,-0.1,-0.3]' AS v3 \gset | ||
SELECT | ||
id, | ||
0.9 * (s <-> :'s3'::sparsevec) + 0.1 * (v <-> :'v3'::vector) as dist | ||
FROM lantern.weighted_vector_search(CAST(NULL as "small_world"), exact => false, ef => 5, | ||
w1=> 0.9, col1=>'s'::text, vec1=>:'s3'::sparsevec, | ||
w2=> 0.1, col2=>'v'::text, vec2=>:'v3'::vector | ||
); | ||
\else | ||
\echo 'The sparsevec type does not exist. Skipping commands...' | ||
\endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
info: Introduced conditional execution for sparsevec tests
test/sql/weighted_search.sql
Outdated
SELECT | ||
id, | ||
0.9 * (s <-> :'s3'::sparsevec) + 0.1 * (v <-> :'v3'::vector) as dist | ||
FROM lantern.weighted_vector_search(CAST(NULL as "small_world"), exact => false, ef => 5, | ||
w1=> 0.9, col1=>'s'::text, vec1=>:'s3'::sparsevec, | ||
w2=> 0.1, col2=>'v'::text, vec2=>:'v3'::vector | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
info: New test case for weighted vector search with sparsevec and vector types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do something like this?
round(cast(0.9 * (s <-> :'s3'::sparsevec) + 0.1 * (v <-> :'v3'::vector) as numeric), 2) as dist
On arm processors the high precision floats seems to slightly differ from x86 ones which makes tests to fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed^
test/sql/weighted_search.sql
Outdated
SELECT '{1:0.4,2:0.3,3:0.2}/3' AS s3 \gset | ||
SELECT '[-0.5,-0.1,-0.3]' AS v3 \gset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Consider using more descriptive variable names than 's3' and 'v3'
test/sql/weighted_search.sql
Outdated
@@ -121,6 +120,30 @@ SELECT count(*) | |||
w3=> 0.52, col3=>'v_real'::text, vec3=>:'v444'::vector | |||
); | |||
|
|||
-- Check if the type 'sparsevec' exists and store the result in a variable | |||
SELECT EXISTS ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we follow the current pattern and not do this in the test itself?
Maybe, see how we filter out pgvector or lantern extras tests when relevant extensions are not installed and do something similar here
@@ -31,7 +31,7 @@ RUN gem install pg -- --with-pg-include=/usr/local/pgsql/include/ --with-pg-lib= | |||
# hack to make sure postgres user has write access to externally mounted volumes | |||
RUN mkdir /lantern_shared && chown postgres:postgres /lantern_shared | |||
|
|||
RUN cd /root/postgresql-15.5/contrib && make install -j | |||
RUN cd /root/postgresql-15.5/contrib && make install |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why change this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
440.5 /usr/bin/install -c -m 755 postgres_fdw.so '/usr/local/pgsql/lib/postgres_fdw.so'
440.5 /usr/bin/install -c -m 644 ./postgres_fdw.control '/usr/local/pgsql/share/extension/'
440.6 /usr/bin/install -c -m 644 ./postgres_fdw--1.0.sql ./postgres_fdw--1.0--1.1.sql '/usr/local/pgsql/share/extension/'
440.6 make[1]: Leaving directory '/root/postgresql-15.5/contrib/postgres_fdw'
------
Dockerfile.dev:34
--------------------
32 | RUN mkdir /lantern_shared && chown postgres:postgres /lantern_shared
33 |
34 | >>> RUN cd /root/postgresql-15.5/contrib && make install -j
35 |
36 | # allow non-root users to install in the container to make it easier to run update-tests
--------------------
ERROR: failed to solve: process "/bin/sh -c cd /root/postgresql-15.5/contrib && make install -j" did not complete successfully: exit code: 2
View build details: docker-desktop://dashboard/build/desktop-linux/desktop-linux/ojyc3i59rx9nve3a7qeqpdid7
diqi@Dis-Laptop lantern %
The error I get with -j
@@ -55,7 +55,7 @@ COPY . . | |||
RUN sudo rm -rf build \ | |||
&& mkdir build \ | |||
&& cd build \ | |||
&& cmake -DCMAKE_BUILD_TYPE=Debug .. \ | |||
&& cmake .. \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why change this?
The dockerfile is meant for development and I often use GDB from inside it, so having a build with symbols is often good.
Though, I usually attach a folder from host and rebuild the DB for debugging so not a big deal but would still like to know why the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The build did not succeed for me without it
4c52324
to
6d6c72d
Compare
6d6c72d
to
17bc1f5
Compare
No description provided.