db-benchmark
├── README.md
├── compose.yaml <- compose file for docker containers
├── data <- data folder containing .dvc files
│ └── Books.json.gz.dvc
├── dbs <- dbs that implement helpers.db_connector class
│ ├── chromadb.py
│ ├── milvus.py
│ ├── qdrant.py
│ ├── vespadb.py
│ └── pgvector.py
├── helpers <- helper classes for data loading and db interaction
│ ├── data_processor.py
│ └── dp_connector.py
├── dvc.yaml <- dvc pipeline
├── main.py <- main file
├── params.yaml <- dvc experiment parameters
└── requirements.txt
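The modules in `dbs` are described as implementing the `helpers.db_connector` class, whose interface is not shown here. The following is a minimal sketch of what such a connector contract could look like, assuming it exposes the three benchmarked operations (insert, query, remove). The names `DBConnector` and `InMemoryConnector` and all method signatures are hypothetical and not taken from the repository.

```python
from abc import ABC, abstractmethod
import math


class DBConnector(ABC):
    """Hypothetical base class; the real helpers.db_connector may differ."""

    @abstractmethod
    def insert(self, ids: list[str], vectors: list[list[float]]) -> None: ...

    @abstractmethod
    def query(self, vector: list[float], top_k: int = 5) -> list[str]: ...

    @abstractmethod
    def remove(self, ids: list[str]) -> None: ...


class InMemoryConnector(DBConnector):
    """Toy brute-force backend used only to illustrate the contract."""

    def __init__(self) -> None:
        self._store: dict[str, list[float]] = {}

    def insert(self, ids, vectors):
        # Map each id to its vector, overwriting existing ids.
        self._store.update(zip(ids, vectors))

    def query(self, vector, top_k=5):
        # Rank stored ids by Euclidean distance to the query vector.
        ranked = sorted(self._store, key=lambda i: math.dist(self._store[i], vector))
        return ranked[:top_k]

    def remove(self, ids):
        # Silently skip ids that are not present.
        for i in ids:
            self._store.pop(i, None)
```

Under this assumption, each file in `dbs` would subclass the base class and translate the three calls into the client API of the respective database.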
- docker
- docker-compose
The first step is to create your environment.
conda create --name <env-name> python=3.11
The modules required by this Python project can be installed from the provided requirements.txt file.
conda activate <env-name>
pip install -r requirements.txt
If you followed the previous steps, the dvc command is now available. Run the following command in the git root directory to download the data:
dvc update -R data/
The benchmarking process involves several steps managed by DVC:
- Running the DVC Pipeline:
  - Execute the main DVC pipeline, which orchestrates the data processing and benchmarking tasks, with `dvc exp run`.
- Reviewing Results:
  - Use the `dvc exp show` command to visualize and analyze the results of the experiments conducted as part of the benchmark.
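How the benchmarking tasks measure each operation is not described here. As a rough illustration only, per-operation latency could be measured along the following lines. The helper name `mean_latency_ms` and the stand-in workload are assumptions made for this sketch, not code from `main.py`.

```python
import statistics
import time
from typing import Callable


def mean_latency_ms(operation: Callable[[], object], repeats: int = 100) -> float:
    """Return the mean wall-clock time of `operation` in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        # perf_counter returns seconds; convert to milliseconds.
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)


# Example: time a stand-in workload instead of a real database call.
latency = mean_latency_ms(lambda: sum(range(1000)), repeats=50)
print(f"mean latency: {latency:.4f} ms")
```

In a real run, the lambda would wrap an insert, query, or remove call against one of the databases, and the resulting value would be written out as a metric for DVC to track.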
We found the following results (times in milliseconds):
| database | insert | query | remove |
|---|---|---|---|
| chromadb | 7.3517 | 5.8963 | 6.2306 |
| pgvector | 7.0311 | 2.7662 | 0.30817 |
| vespa | 8.7366 | 11.556 | 5.6302 |
| milvus | 3.8154 | 2.0659 | 2.9517 |
| qdrant | 6.9746 | 2.1554 | 6.8914 |
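To read the table at a glance, the fastest database per operation can be picked out with a few lines of Python. This is an illustrative sketch: the dictionary simply restates the numbers reported above, and the helper name `fastest` is made up for this example.

```python
# Times in milliseconds, copied from the results table.
results = {
    "chromadb": {"insert": 7.3517, "query": 5.8963, "remove": 6.2306},
    "pgvector": {"insert": 7.0311, "query": 2.7662, "remove": 0.30817},
    "vespa": {"insert": 8.7366, "query": 11.556, "remove": 5.6302},
    "milvus": {"insert": 3.8154, "query": 2.0659, "remove": 2.9517},
    "qdrant": {"insert": 6.9746, "query": 2.1554, "remove": 6.8914},
}


def fastest(results: dict[str, dict[str, float]], operation: str) -> str:
    """Return the database with the lowest time for the given operation."""
    return min(results, key=lambda db: results[db][operation])


for operation in ("insert", "query", "remove"):
    print(f"{operation}: {fastest(results, operation)}")
# insert: milvus
# query: milvus
# remove: pgvector
```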