How to increase the Milvus Serving QPS? #38060
-
I guess the bottleneck is the EBS disk: 5,000 IOPS, 250 MB/s throughput?
-
You can try increasing the queryNode.grouping.maxNQ value in milvus.yaml (see the snippet below):
This configuration controls the request-merging behavior of query nodes: small requests are merged into a single request before execution, which improves throughput but also increases per-request latency.
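For reference, a minimal milvus.yaml fragment; the values below are only examples, not recommendations, so check your deployment's defaults and raise maxNQ gradually:

```yaml
queryNode:
  grouping:
    enabled: true   # request merging must stay enabled for maxNQ to take effect
    maxNQ: 2000     # example value; higher merges more requests but adds per-request latency
```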
-
I got something new:
Which is more likely the root cause? @xiaofan-luan
-
So I tried HNSW + MMAP + local NVMe SSD, and IVF_SQ8 + MMAP + local NVMe SSD:
I'm surprised by these numbers. I thought HNSW would be faster? @xiaofan-luan
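For context, this is roughly how I build the two index variants and turn on mmap with pymilvus. It's only a sketch: the collection/field names, metric type, and build parameters are placeholders, not my exact settings.

```python
from pymilvus import Collection, connections

connections.connect(host="localhost", port="19530")  # placeholder endpoint

coll = Collection("my_collection")  # placeholder collection name
coll.release()                      # collection must be released before changing the index
coll.drop_index()

# Variant 1: HNSW (graph index; random-access heavy when mmapped from disk).
coll.create_index(
    field_name="embedding",         # placeholder 1536-d vector field name
    index_params={
        "index_type": "HNSW",
        "metric_type": "L2",                        # or IP/COSINE, depending on the schema
        "params": {"M": 16, "efConstruction": 200}, # example build params, not tuned values
    },
)

# Variant 2: IVF_SQ8 (quantized, smaller footprint).
# coll.create_index(
#     field_name="embedding",
#     index_params={"index_type": "IVF_SQ8", "metric_type": "L2", "params": {"nlist": 4096}},
# )

# Enable mmap at the collection level (collection property in Milvus 2.4).
coll.set_properties({"mmap.enabled": "true"})

coll.load()
```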
-
I have a Milvus distributed cluster (2.4.17). The data uses a partition key and a clustering key, with ~30M 1536-dimensional vectors. The collection uses HNSW + MMAP on an AWS GP3 EBS disk (5,000 IOPS, 250 MB/s throughput).
I launched 16 threads that continuously send top-100 searches to this collection, using the partition key and clustering key as the filtering condition (roughly like the sketch below), and got the following results:
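The load generator looks roughly like this pymilvus sketch; the collection/field names, filter expression, and ef value are placeholders for my real ones:

```python
import random
import threading
import time

from pymilvus import Collection, connections

connections.connect(host="localhost", port="19530")  # placeholder endpoint
coll = Collection("my_collection")                    # placeholder collection name
coll.load()

NUM_THREADS = 16
TOP_K = 100
DIM = 1536
count = 0
lock = threading.Lock()

def worker():
    global count
    # Placeholder filter on the partition-key and clustering-key fields.
    expr = 'tenant_id == "tenant_42" and cluster_id == 7'
    vec = [[random.random() for _ in range(DIM)]]     # one query vector per request
    while True:
        coll.search(
            data=vec,
            anns_field="embedding",                   # placeholder vector field name
            param={"metric_type": "L2", "params": {"ef": 128}},  # ef is an example value
            limit=TOP_K,
            expr=expr,
        )
        with lock:
            count += 1

for _ in range(NUM_THREADS):
    threading.Thread(target=worker, daemon=True).start()

start = time.time()
while True:
    time.sleep(10)
    print(f"observed QPS: {count / (time.time() - start):.1f}")
```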
I'm struggling to increase the query rate further and want to know where the bottleneck is.
The cluster has only 1 query node (15 cores, 120 GB memory + 1 TB GP3 EBS volume).
Per the dashboard, query node CPU utilization is around 20-40% and proxy node CPU utilization is ~15%, so it seems nothing is throttled.
I tried increasing the number of query nodes and proxy nodes, but it didn't help increase the QPS.
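If it matters: my understanding is that search QPS mostly scales with the number of in-memory replicas rather than the raw query-node count, so extra query nodes may not take search traffic unless the collection is loaded with more replicas. Treat that as an assumption on my part; here is a minimal sketch with a placeholder collection name, where replica_number=2 is only an example:

```python
from pymilvus import Collection, connections

connections.connect(host="localhost", port="19530")  # placeholder endpoint
coll = Collection("my_collection")                    # placeholder collection name

# Release, then reload with more in-memory replicas so additional query nodes
# can serve copies of the segments (replica_number defaults to 1).
coll.release()
coll.load(replica_number=2)
```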
May I know what I should do to increase the QPS to >10k?