High load in 1) SELECT queries when we have 7000 rows and 21,000 branches, 2) creating extra branches when we already have 21,000 #5175
Comments
Because the sort is on the company name, which is not very selective, the results were not good (problem 1).
@Hydrocharged might as well dig into this one too, since he fixed your other, lots-of-branches problem.
@nkonev I apologize for the delay, I initially had some issues working with Docker on Windows, but I've finally got some answers! There are two core issues here.

For the first issue (slow SELECT queries), […].

For the second issue (slow branching), this deals with an implementation detail of the branch control table. One feature of the table is that we do not add new entries if an existing entry would already grant the same benefits. For example, if you have an entry that already grants access to every branch, a new per-branch entry would add nothing […].
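For illustration, here is a minimal sketch of what such a blanket entry could look like, assuming Dolt's `dolt_branch_control` system table as described in the Dolt documentation; the user name is a placeholder, not from this issue:

```sql
-- A minimal sketch, assuming the dolt_branch_control system table.
-- 'my_user' is a placeholder. A single wildcard entry like this already
-- grants write on every branch of every database, so per-branch entries
-- added afterwards would grant no new benefits:
INSERT INTO dolt_branch_control (database, branch, user, host, permissions)
VALUES ('%', '%', 'my_user', '%', 'write');
```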
With this change, the cost of adding new branches dropped significantly, with adding branches 21,000->28,000 spending a similar amount of time as adding branches 0->7,000, which essentially fixes the issue completely (at least for this test case). I'm also going to investigate changing how the table search works (moving it to a tree search).
Also, I'll mention that the increasing file size of […].
About the second problem:
@Hydrocharged Thank you a lot for the response. Yes, I'm able to apply your SQL. I've applied it, so I have […].
I have the following results of […]:
The speed was increased, but there is still linear degradation. Maybe I'm doing something wrong. I didn't perform any GC. Did you do something else?
Yes, you'll need to perform GC between the passes. I spoke with @timsehn, and I'm going to look into that aforementioned investigation of moving our table search to a tree search, which I'm assuming should help substantially even in a worst-case scenario.
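A hedged sketch of running garbage collection between passes; it assumes your Dolt version ships the `dolt_gc()` stored procedure (otherwise the `dolt gc` CLI command is the equivalent outside of a SQL session):

```sql
-- A minimal sketch: reclaim storage between test passes.
-- Assumes the dolt_gc() stored procedure is available in your version;
-- if not, run `dolt gc` from the command line instead.
CALL DOLT_GC();
```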
Thank you, @Hydrocharged. With both […] it's a much better result. I'm looking forward to both […].
Fantastic to hear! I'll close this issue for now, as I've created another issue for […].
@bpf120 Hi, […]
@Hydrocharged Thank you! I saw your PR. I will check without inserting into the branch control table.
Sounds good! I do want to mention that it will not outperform inserting that additional row, as that's the intended and designed use of the table. For those cases where users and branches are not known ahead of time and will be seemingly random, this is a significant speed-up.
Test with […]:
Results are far better than before! Wonderful!
Original issue description:
I saw the very similar #5148, but I'm not sure if it is the same issue.
This test may be synthetic, but I think it's worth sharing the results.
Version is 0.52.8.
Problem 1 - high CPU when I use sorting

Reproduction:
Run the web app in the first terminal window.
Then create 7000 companies in another terminal.
Then create a branch for every company and add a commit three times (sequentially).
Then run every-second requests in three terminal windows in parallel.

Those requests turn into SQL like […] (you can uncomment JDBC logging in `application.yaml`). Now we can see 20-23% CPU of dolt in `top` - we have 3 every-second workers. If we open 10 every-second workers, we will have 70-100% CPU. Note that it isn't a special load tool like `wrk`; it's just every-second requests produced by `watch`. I tried to create indexes, but it doesn't change the CPU load.
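Since the actual SQL is elided above, here is a hypothetical sketch of the shape of the per-second query and the index attempt; the table and column names (`company`, `name`) are assumptions for illustration only:

```sql
-- Hypothetical sketch: the real query and schema are not shown in the
-- issue. A paginated sort on a low-selectivity text column:
SELECT * FROM company ORDER BY name LIMIT 20;

-- The kind of index that was tried but did not reduce the CPU load:
CREATE INDEX idx_company_name ON company (name);
```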
Problem 2 - high CPU and linear time when creating many branches

Here I kept the web app working and stopped all watch-based workers.
If we start creating branches again (the watch workers still make requests), the CPU of dolt will be 120%.
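For context, a hedged sketch of what one per-company "create a branch and commit" step could look like using Dolt's stored procedures; the branch name and commit message are made up:

```sql
-- A minimal sketch using Dolt's stored procedures; names are placeholders.
CALL DOLT_BRANCH('company-42-draft');      -- create the branch
CALL DOLT_CHECKOUT('company-42-draft');    -- switch to it
-- ... insert/update the draft rows for this company here ...
CALL DOLT_COMMIT('-am', 'Draft for company 42');  -- commit on the branch
```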
Update: I noticed (with the help of `seven_thousand_companies_by_http`) that adding more commits doesn't degrade as fast as adding more branches.
Update 2: In other words: we have 7000 companies, created by `seven_thousand_companies`. Successive passes of `create_draft_for_each_company` take:

- 1st pass: Measured time PT6M47.349344168S (~6.8 min)
- 2nd pass: Measured time PT14M32.157333959S (~14.5 min)
- 3rd pass: Measured time PT21M55.941600155S (~21.9 min)
- 4th pass: Measured time PT28M17.140720097S (~28.3 min)

Each pass takes roughly 7 minutes longer than the previous one, i.e. the cost of a pass grows linearly with the number of branches that already exist.
I suspect that the culprit is the array of branches, which bumped into a file-size limit here.