Capacity Planning while using GPU based Indices #37632

rohitreddy1698 · 2024-11-13T02:49:44Z

rohitreddy1698
Nov 13, 2024

Hello,

I am trying to compare the performances of GPU vs CPU based indices in Milvus and have Milvus setup on GKE for the same.
I have deployed Milvus using the Milvus Operator.
I am trying to do the benchmarking using the Zilliz Vector DB Bench tool : https://github.com/zilliztech/VectorDBBench
I have
4 : e2-highmen-16 nodes ( for other components )
30 : n1-highmen-8 nodes ( 25 for queyrnode and 5 for indexnode ) with 2 T4 GPUs ( 16 GiB memory ) per node

The data set I am using is 100M Laion dataset with 768 vector dimension. I am trying to test pure Search performance without any fiters.

indexNode:
      replicas: 5
      resources:
        limits:
          cpu: "6"
          nvidia.com/gpu: "2"
          memory: 48Gi
        requests:
          cpu: "3"
          nvidia.com/gpu: "2"
          memory: 24Gi

queryNode:
      replicas: 25
      resources:
        limits:
          cpu: "6"
          nvidia.com/gpu: "2"
          memory: 48Gi
        requests:
          cpu: "3"
          nvidia.com/gpu: "2"
          memory: 24Gi


➜  VectorDBBench git:(main) ✗ gsutil du -sh gs://isds-genai-milvus-on-gpu-gcp-dev-ad55084/
365.33 GiB   gs://milvus-on-gpu-gcp-bucket

I have a total of 25 * 2 * 16 = 800 GiB of GPU memory , which should be sufficient for GPU_IVF_FLAT and IVF_FLAT , but I am getting the following failed to deserialise error :

Index Type: IndexType.GPU_IVF_FLAT
2024-11-11 23:31:33,519 | WARNING: Failed to run performance case, reason = <MilvusException: (code=2001, message=show collection failed: At LoadSegment:  => failed to Deserialize index: raft inner error at /workspace/source/internal/core/src/index/VectorMemIndex.cpp:192
)> (task_runner.py:194) (81)
Traceback (most recent call last):
  File "/opt/code/vectordb_bench/backend/task_runner.py", line 166, in _run_perf_case
    build_dur = self._optimize()
                ^^^^^^^^^^^^^^^^
  File "/opt/code/vectordb_bench/backend/task_runner.py", line 260, in _optimize
    raise e from None
  File "/opt/code/vectordb_bench/backend/task_runner.py", line 252, in _optimize
    return future.result(timeout=self.ca.optimize_timeout)[1]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
pymilvus.exceptions.MilvusException: <MilvusException: (code=2001, message=show collection failed: At LoadSegment:  => failed to Deserialize index: raft inner error at /workspace/source/internal/core/src/index/VectorMemIndex.cpp:192
)>
2024-11-11 23:31:33,521 | WARNING: [1/1] case {'label': <CaseLabel.Performance: 2>, 'dataset': {'data': {'name': 'LAION', 'size': 100000000, 'dim': 768, 'metric_type': <MetricType.L2: 'L2'>}}, 'db': 'Milvus'} failed to run, reason=<MilvusException: (code=2001, message=show collection failed: At LoadSegment:  => failed to Deserialize index: raft inner error at /workspace/source/internal/core/src/index/VectorMemIndex.cpp:192
)> (interface.py:184) (81)
Traceback (most recent call last):
  File "/opt/code/vectordb_bench/interface.py", line 165, in _async_task_v2
    case_res.metrics = runner.run(drop_old)
                       ^^^^^^^^^^^^^^^^^^^^
  File "/opt/code/vectordb_bench/backend/task_runner.py", line 107, in run
    return self._run_perf_case(drop_old)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/code/vectordb_bench/backend/task_runner.py", line 196, in _run_perf_case
    raise e from None
  File "/opt/code/vectordb_bench/backend/task_runner.py", line 166, in _run_perf_case
    build_dur = self._optimize()
                ^^^^^^^^^^^^^^^^
  File "/opt/code/vectordb_bench/backend/task_runner.py", line 260, in _optimize
    raise e from None
  File "/opt/code/vectordb_bench/backend/task_runner.py", line 252, in _optimize
    return future.result(timeout=self.ca.optimize_timeout)[1]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
pymilvus.exceptions.MilvusException: <MilvusException: (code=2001, message=show collection failed: At LoadSegment:  => failed to Deserialize index: raft inner error at /workspace/source/internal/core/src/index/VectorMemIndex.cpp:192
)>

Please help me understand the sizes taken up by the GPU indices and help me get over this issue.

rohitreddy1698 · 2024-11-13T02:50:15Z

rohitreddy1698
Nov 13, 2024
Author

I am using the milvus-2.4.9-gpu version of the docker image.

5 replies

liliu-z Nov 13, 2024
Maintainer

Did you meet uneven distribution upon nodes? Try to use nvidia-smi to grab more info

rohitreddy1698 Nov 13, 2024
Author

Did you meet uneven distribution upon nodes? - is this a controllable parameter.
From my understanding Milvus tries to eliminate skewness by hosting new segments on query nodes which have more resources ?
Try to use nvidia-smi to grab more info - This might not be efficient because I have 25 query nodes and the nvidia-smi command gives point in time usage as per my understanding - please do correct me if I am wrong here

I rather have dcgm exporter installed and built grafana dashboards from : https://grafana.com/grafana/dashboards/12239-nvidia-dcgm-exporter-dashboard
Do you think these metrics will help ?

Presburger Nov 18, 2024

@rohitreddy1698 Currently, the GPU version of Milvus does not perform memory resource load balancing across multiple Query Node nodes; it only has a load balancing mechanism for multiple GPUs on a single node. I will add this feature within the next couple of days. We sincerely apologize for any inconvenience caused.

rohitreddy1698 Nov 18, 2024
Author

@Presburger , thanks a lot for your response.

Milvus does not perform memory resource load balancing across multiple Query Node nodes - I have made similar observations via the grafana metrics I have setup.
I have 28 query nodes out of which the :

peak utilisation was 14.9 GiB ( 100 % utilisation )
least utilisation was 9.57 GiB ( 78.2 % utilisation )

it only has a load balancing mechanism for multiple GPUs on a single node - This should explain why we were able to successfully benchmarking for 10 M records with 2 GPU per node as opposed to 1 GPU per node failing with similar issues.

I will add this feature within the next couple of days - Please do let me know if I can help there, I would be really interested to do so.

Regards,
Rohit Mothe

rohitreddy1698 Nov 27, 2024
Author

@Presburger

Has there been any development in this area ?
Please do let me know if we can help anything here.

Regards,
Rohit Mothe

Presburger · 2024-12-03T12:03:19Z

Presburger
Dec 3, 2024

@rohitreddy1698 I am currently working on implementing this feature, and if everything goes smoothly, I should be able to finish it by next week. Thank you for your patience.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Capacity Planning while using GPU based Indices #37632

{{title}}

Replies: 2 comments 5 replies

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Capacity Planning while using GPU based Indices #37632

rohitreddy1698 Nov 13, 2024

Replies: 2 comments · 5 replies

rohitreddy1698 Nov 13, 2024 Author

liliu-z Nov 13, 2024 Maintainer

rohitreddy1698 Nov 13, 2024 Author

Presburger Nov 18, 2024

rohitreddy1698 Nov 18, 2024 Author

rohitreddy1698 Nov 27, 2024 Author

Presburger Dec 3, 2024

rohitreddy1698
Nov 13, 2024

Replies: 2 comments 5 replies

rohitreddy1698
Nov 13, 2024
Author

liliu-z Nov 13, 2024
Maintainer

rohitreddy1698 Nov 13, 2024
Author

rohitreddy1698 Nov 18, 2024
Author

rohitreddy1698 Nov 27, 2024
Author

Presburger
Dec 3, 2024