Wasm buffer manager support #4523

benjaminwinger · 2024-11-14T00:16:50Z

This adds a simple alternative implementation to the BufferManager that allocates pages using malloc (via unique_ptr) and stores the page pointer in the PageState object.
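A minimal sketch of the idea of keeping each page's storage in its page-state object. `MallocPageState` and `resetPage` are hypothetical names used only for illustration. Only `getPage`/`allocatePage` and the `std::make_unique<uint8_t[]>` call mirror the `page_state.h` hunk quoted later in the review, so treat this as an approximation, not the actual implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical, simplified stand-in for the BM_MALLOC variant of PageState.
// Each page owns a separate heap allocation instead of living at a fixed
// offset inside one large reserved virtual-memory region.
class MallocPageState {
public:
    // Returns nullptr while the page is not resident (never loaded, or evicted).
    uint8_t* getPage() const { return page.get(); }

    // Allocates zero-initialised storage for the page and returns it.
    uint8_t* allocatePage(uint64_t pageSize) {
        assert(pageSize > 0 && "page size must be positive");
        page = std::make_unique<uint8_t[]>(pageSize);
        return page.get();
    }

    // Eviction frees the allocation; later getPage() calls return nullptr.
    void resetPage() { page.reset(); }

private:
    std::unique_ptr<uint8_t[]> page;
};
```

In this layout a non-resident page is simply a null pointer, which is why readers have to handle that case.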

Performance

There is a small performance penalty: pinning pages appears to be ~2-5x slower with this method. However, even in workloads with many new pins (i.e. pins that also have to allocate; there should be little difference for pins of pages that are already in memory), the overall cost is minimal.

E.g. summing the float edge property in the LDBC datagen-9_0-fb dataset took ~700ms with the normal buffer manager and ~900ms with this variant (obviously not a WebAssembly build; I overrode the macro used to enable/disable this feature).

Interestingly, it was actually faster on the following query over the much larger graph500-30 dataset:
kuzu kugraph500-30 -d512 --noprogressbar --read_only <<< 'MATCH (v:N)-[e:E]->(:N) RETURN SUM(v.ID);'
The original runtime was ~4s, and it improved to ~2.8s with this change. This was also on a machine with 128 threads; the difference may be that this approach scales better or has a lower setup cost, even though the allocation cost is higher. I'll look into it and try and see what the difference is.

Outstanding Webassembly bugs

Unfortunately, it appears that this is not the cause of the WebAssembly tests running out of memory. They still run out of memory with this change, so I have left them disabled; as far as I can tell, the huge memory allocations in those tests are unrelated to the BufferManager/MemoryManager.

Other changes of note

I also lowered the default buffer pool size for testing back to 64MB. It should have been lowered earlier, but it seems that was missed after I raised it while adding ColumnChunk memory to the tracked limit. With the higher limit, most tests never trigger buffer manager evictions (including some of the tests that are disabled on WebAssembly due to this memory issue).

benjaminwinger · 2024-11-14T18:14:16Z

Regarding "I'll look into it and try and see what the difference is":

The answer appears to be that MADV_DONTNEED does some internal locking and can be somewhat slower than malloc in a heavily multithreaded environment. However, I also discovered that without the 512MB buffer pool restriction, the malloc version takes 5.8s compared to 2.7s for the regular version. So repeatedly allocating and freeing blocks with malloc seems to be quite efficient (presumably the allocator re-uses the same chunks of memory, given that we always allocate the same size), but it is much less efficient than mmap + madvise when doing a large number of allocations without freeing anything.
So I don't think there's anything that should be changed about either version.
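To make the two eviction strategies in this comparison concrete, here is a minimal POSIX sketch. `reserveRegion`, `evictFrame`, `allocateFrame` and `freeFrame` are hypothetical helpers, not Kùzu's API. The sketch assumes Linux semantics for `MADV_DONTNEED` and one frame per OS page; it illustrates the mechanics only, not the locking behaviour or the timings reported above.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Assumption: one frame per OS page, so every frame offset is page-aligned
// (madvise requires a page-aligned address).
inline size_t frameSize() { return static_cast<size_t>(sysconf(_SC_PAGESIZE)); }

// mmap-based pool: reserve one large region up front.
uint8_t* reserveRegion(size_t numFrames) {
    void* region = mmap(nullptr, numFrames * frameSize(), PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return region == MAP_FAILED ? nullptr : static_cast<uint8_t*>(region);
}

// Evicting a frame hands its physical memory back to the kernel while the
// virtual address stays valid. On Linux, the next access to a private
// anonymous page after MADV_DONTNEED sees zero-filled memory.
bool evictFrame(uint8_t* region, size_t frameIdx) {
    assert(region != nullptr);
    return madvise(region + frameIdx * frameSize(), frameSize(), MADV_DONTNEED) == 0;
}

// malloc-based pool: every new pin allocates and every eviction frees. Since
// all requests are the same size, the allocator can recycle freed chunks, but
// an evicted frame's address is no longer valid memory.
uint8_t* allocateFrame() { return static_cast<uint8_t*>(std::malloc(frameSize())); }
void freeFrame(uint8_t* frame) { std::free(frame); }
```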

github-actions · 2024-11-19T23:28:34Z

Benchmark Result

Master commit hash: de8edf92ecff383c8b889bb671456134cd5c18f0
Branch commit hash: fa3472bed3f3301f13d08a2f12c5857d96f27ceb

Query Group	Query Name	Mean Time - Commit (ms)	Mean Time - Master (ms)	Diff
aggregation	q24	643.44	643.10	0.34 (0.05%)
aggregation	q28	11889.70	12256.73	-367.03 (-2.99%)
filter	q14	126.89	127.65	-0.77 (-0.60%)
filter	q15	122.76	124.99	-2.23 (-1.78%)
filter	q16	303.66	304.07	-0.41 (-0.13%)
filter	q17	443.96	444.69	-0.74 (-0.17%)
filter	q18	1960.57	1962.72	-2.14 (-0.11%)
filter	zonemap-node	86.22	86.16	0.06 (0.07%)
filter	zonemap-node-lhs-cast	89.01	86.32	2.69 (3.12%)
filter	zonemap-rel	5499.22	5517.32	-18.10 (-0.33%)
fixed_size_expr_evaluator	q07	545.62	542.87	2.75 (0.51%)
fixed_size_expr_evaluator	q08	762.69	759.18	3.51 (0.46%)
fixed_size_expr_evaluator	q09	765.44	757.51	7.93 (1.05%)
fixed_size_expr_evaluator	q10	238.45	239.94	-1.48 (-0.62%)
fixed_size_expr_evaluator	q11	232.55	232.28	0.27 (0.12%)
fixed_size_expr_evaluator	q12	231.34	231.34	-0.01 (-0.00%)
fixed_size_expr_evaluator	q13	1462.62	1464.37	-1.76 (-0.12%)
fixed_size_seq_scan	q23	115.51	113.90	1.61 (1.41%)
join	q29	639.77	653.65	-13.88 (-2.12%)
join	q30	1341.20	1463.50	-122.29 (-8.36%)
join	q31	8.06	4.10	3.96 (96.75%)
ldbc_snb_ic	q35	422.87	419.08	3.79 (0.90%)
ldbc_snb_ic	q36	137.80	123.01	14.79 (12.02%)
ldbc_snb_is	q32	5.92	4.50	1.42 (31.49%)
ldbc_snb_is	q33	13.27	12.40	0.88 (7.07%)
ldbc_snb_is	q34	1.53	1.43	0.11 (7.47%)
multi-rel	multi-rel-large-scan	2105.16	1931.29	173.87 (9.00%)
multi-rel	multi-rel-lookup	5.72	6.36	-0.64 (-10.11%)
multi-rel	multi-rel-small-scan	87.91	95.26	-7.35 (-7.72%)
order_by	q25	136.91	129.91	7.00 (5.39%)
order_by	q26	462.19	462.12	0.07 (0.02%)
order_by	q27	1414.10	1414.64	-0.54 (-0.04%)
scan_after_filter	q01	171.22	170.71	0.51 (0.30%)
scan_after_filter	q02	154.40	156.23	-1.84 (-1.18%)
shortest_path_ldbc100	q37	81.46	80.14	1.32 (1.65%)
shortest_path_ldbc100	q38	480.26	444.31	35.95 (8.09%)
shortest_path_ldbc100	q39	66.40	60.48	5.92 (9.79%)
shortest_path_ldbc100	q40	562.47	536.57	25.90 (4.83%)
var_size_expr_evaluator	q03	2078.75	2079.70	-0.96 (-0.05%)
var_size_expr_evaluator	q04	2246.00	2237.54	8.46 (0.38%)
var_size_expr_evaluator	q05	2690.46	2687.41	3.05 (0.11%)
var_size_expr_evaluator	q06	1341.14	1343.17	-2.04 (-0.15%)
var_size_seq_scan	q19	1491.32	1476.17	15.15 (1.03%)
var_size_seq_scan	q20	2520.02	2578.96	-58.95 (-2.29%)
var_size_seq_scan	q21	2316.95	2301.50	15.45 (0.67%)
var_size_seq_scan	q22	132.47	133.03	-0.55 (-0.42%)

codecov · 2024-11-19T23:39:02Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.24%. Comparing base (756486b) to head (6d30abf).

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #4523   +/-   ##
=======================================
  Coverage   87.24%   87.24%           
=======================================
  Files        1356     1356           
  Lines       56755    56754    -1     
  Branches     7078     7078           
=======================================
+ Hits        49515    49517    +2     
+ Misses       7068     7065    -3     
  Partials      172      172


github-actions · 2024-11-20T00:49:21Z

Benchmark Result

Master commit hash: de8edf92ecff383c8b889bb671456134cd5c18f0
Branch commit hash: c27b03ccb3cf3836a7de08a75e7873e8935fcafd

Query Group	Query Name	Mean Time - Commit (ms)	Mean Time - Master (ms)	Diff
aggregation	q24	643.50	643.10	0.39 (0.06%)
aggregation	q28	11191.67	12256.73	-1065.06 (-8.69%)
filter	q14	127.41	127.65	-0.24 (-0.19%)
filter	q15	129.15	124.99	4.16 (3.33%)
filter	q16	302.87	304.07	-1.19 (-0.39%)
filter	q17	446.95	444.69	2.25 (0.51%)
filter	q18	1949.52	1962.72	-13.20 (-0.67%)
filter	zonemap-node	86.39	86.16	0.23 (0.27%)
filter	zonemap-node-lhs-cast	87.17	86.32	0.84 (0.97%)
filter	zonemap-rel	5518.60	5517.32	1.27 (0.02%)
fixed_size_expr_evaluator	q07	547.63	542.87	4.76 (0.88%)
fixed_size_expr_evaluator	q08	758.85	759.18	-0.33 (-0.04%)
fixed_size_expr_evaluator	q09	756.86	757.51	-0.64 (-0.09%)
fixed_size_expr_evaluator	q10	240.78	239.94	0.85 (0.35%)
fixed_size_expr_evaluator	q11	234.21	232.28	1.93 (0.83%)
fixed_size_expr_evaluator	q12	233.87	231.34	2.53 (1.09%)
fixed_size_expr_evaluator	q13	1458.40	1464.37	-5.97 (-0.41%)
fixed_size_seq_scan	q23	120.19	113.90	6.29 (5.52%)
join	q29	630.14	653.65	-23.50 (-3.60%)
join	q30	1366.43	1463.50	-97.07 (-6.63%)
join	q31	8.19	4.10	4.09 (99.79%)
ldbc_snb_ic	q35	444.91	419.08	25.83 (6.16%)
ldbc_snb_ic	q36	131.16	123.01	8.15 (6.63%)
ldbc_snb_is	q32	2.31	4.50	-2.19 (-48.60%)
ldbc_snb_is	q33	13.72	12.40	1.32 (10.68%)
ldbc_snb_is	q34	1.46	1.43	0.03 (2.15%)
multi-rel	multi-rel-large-scan	1908.61	1931.29	-22.68 (-1.17%)
multi-rel	multi-rel-lookup	10.11	6.36	3.75 (59.00%)
multi-rel	multi-rel-small-scan	88.98	95.26	-6.28 (-6.60%)
order_by	q25	129.27	129.91	-0.64 (-0.49%)
order_by	q26	459.39	462.12	-2.73 (-0.59%)
order_by	q27	1412.57	1414.64	-2.07 (-0.15%)
scan_after_filter	q01	168.46	170.71	-2.25 (-1.32%)
scan_after_filter	q02	155.22	156.23	-1.01 (-0.65%)
shortest_path_ldbc100	q37	88.89	80.14	8.75 (10.92%)
shortest_path_ldbc100	q38	439.51	444.31	-4.81 (-1.08%)
shortest_path_ldbc100	q39	55.40	60.48	-5.08 (-8.40%)
shortest_path_ldbc100	q40	520.25	536.57	-16.32 (-3.04%)
var_size_expr_evaluator	q03	2071.28	2079.70	-8.42 (-0.40%)
var_size_expr_evaluator	q04	2262.31	2237.54	24.77 (1.11%)
var_size_expr_evaluator	q05	2679.83	2687.41	-7.58 (-0.28%)
var_size_expr_evaluator	q06	1342.39	1343.17	-0.78 (-0.06%)
var_size_seq_scan	q19	1477.55	1476.17	1.37 (0.09%)
var_size_seq_scan	q20	2517.64	2578.96	-61.33 (-2.38%)
var_size_seq_scan	q21	2305.53	2301.50	4.04 (0.18%)
var_size_seq_scan	q22	131.64	133.03	-1.39 (-1.04%)

ray6080 · 2024-11-21T00:50:02Z

test/test_files/tck/match/match7.test

@@ -1,4 +1,5 @@
 -DATASET CSV tck
+-BUFFER_POOL_SIZE 268435456


Is this due to recursive joins?

ray6080 · 2024-11-21T01:15:55Z

src/storage/buffer_manager/buffer_manager.cpp

+    }
+#endif
+
+#if defined(_WIN32) && !BM_MALLOC


The macros here are becoming a bit hard to follow. I wonder if we should duplicate the code a bit to separate the configurations more clearly.

With the second #if/#else removed, is it sufficiently clear?
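One way to read the "duplicate the code to separate them" suggestion is a single top-level branch per configuration. This is a hypothetical layout sketch: `evictionStrategy` is an invented function, and the default `BM_MALLOC=1` is forced here only so the example is self-contained. In real code each branch would hold that configuration's complete implementation.

```cpp
#include <string_view>

#ifndef BM_MALLOC
#define BM_MALLOC 1 // assumption: force the malloc variant for this sketch
#endif

// Hypothetical layout: exactly one branch per configuration, instead of
// sprinkling `#if defined(_WIN32) && !BM_MALLOC` checks through shared code.
// Here each branch just names the eviction mechanism it would use.
#if BM_MALLOC
constexpr std::string_view evictionStrategy() { return "free the page allocation"; }
#elif defined(_WIN32)
constexpr std::string_view evictionStrategy() { return "decommit the frame"; }
#else
constexpr std::string_view evictionStrategy() { return "madvise(MADV_DONTNEED)"; }
#endif
```

The trade-off is some duplicated code in exchange for never having to evaluate combined conditions like `defined(_WIN32) && !BM_MALLOC` in your head.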

ray6080 · 2024-11-21T01:16:37Z

src/storage/buffer_manager/buffer_manager.cpp

    } catch (AccessViolation& exc) {
        // If we encounter an access violation within the VM region,
        // the page was decommitted by another thread
        // and is no longer valid memory
+#if BM_MALLOC


How can this be true when #if defined(_WIN32) && !BM_MALLOC is true?

I think I added those at two different times.

I'm not sure whether accessing freed memory can cause an access violation on Windows; if it can, optimisticRead won't be safe even when using malloc unless we handle those access violations (at least those that occur within the frame).
CI is currently passing with it disabled, which makes me wonder whether it isn't covered by tests or whether it just doesn't happen in practice (maybe it never happens at all, or freed memory that has been re-used simply doesn't trigger an access violation).
I think I'll just remove this block for now and leave the access violation handling disabled with BM_MALLOC.
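The commit message describes optimistic reads as working as before, except that the page pointer may now be null. Below is a simplified seqlock-style sketch of that control flow. `OptimisticPageState`, the odd-version-means-writer convention and this `optimisticRead` signature are assumptions, not Kùzu's code. It also does not resolve the question above: reading memory that another thread has freed is undefined behaviour in standard C++. The sketch only shows the null check and the retry loop.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical page state: writers make `version` odd while modifying or
// evicting the page and even again when done (assumed convention).
struct OptimisticPageState {
    std::atomic<uint64_t> version{0};
    std::atomic<uint8_t*> page{nullptr};
};

// Returns false if the page is not resident, so the caller must pin it instead.
template <typename Func>
bool optimisticRead(OptimisticPageState& state, Func&& func) {
    while (true) {
        const uint64_t before = state.version.load(std::memory_order_acquire);
        if (before & 1) {
            continue; // a write or eviction is in progress; retry
        }
        uint8_t* frame = state.page.load(std::memory_order_acquire);
        if (frame == nullptr) {
            return false; // malloc variant: evicted pages have no storage at all
        }
        func(frame);
        std::atomic_thread_fence(std::memory_order_acquire);
        if (state.version.load(std::memory_order_relaxed) == before) {
            return true; // nothing changed concurrently; the read is consistent
        }
    }
}
```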

ray6080 · 2024-11-21T15:04:28Z

src/storage/buffer_manager/buffer_manager.cpp

+    }
+#endif
+
+#if defined(_WIN32) && !BM_MALLOC


ray6080 · 2024-11-21T15:56:22Z

src/include/storage/buffer_manager/page_state.h

+#if BM_MALLOC
+    uint8_t* getPage() const { return page.get(); }
+    uint8_t* allocatePage(uint64_t pageSize) {
+        page = std::make_unique<uint8_t[]>(pageSize);


I think we should check for allocation failure here to avoid segfaults.

Interestingly, in WebAssembly the tests are compiled with the default ABORTING_MALLOC=1, which aborts the WebAssembly process when it runs out of memory instead of having malloc return 0. Outside of the tests, however, we compile with ALLOW_MEMORY_GROWTH, which changes that default so malloc returns 0 on failure.
So I think we should set ABORTING_MALLOC=0 for the tests so that both cases are handled the same way (not that we have any tests that expect this yet).
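A minimal sketch of the suggested allocation-failure check, applied to the `allocatePage` snippet quoted above. `CheckedPageState` is a hypothetical name, and `std::runtime_error` is a placeholder for whatever error type the buffer manager actually uses. Using `new (std::nothrow)` is an assumption made so the failure shows up as a null pointer we can test. It only helps if the build's allocator reports failure rather than aborting, i.e. with `ABORTING_MALLOC=0`.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <new>
#include <stdexcept>

// Hypothetical variant of allocatePage with an explicit failure check.
class CheckedPageState {
public:
    uint8_t* getPage() const { return page.get(); }

    uint8_t* allocatePage(uint64_t pageSize) {
        assert(pageSize > 0);
        // nothrow new[] returns nullptr on failure instead of throwing; the
        // trailing () zero-initialises the buffer, as make_unique<T[]> does.
        page.reset(new (std::nothrow) uint8_t[pageSize]());
        if (page == nullptr) {
            // Placeholder: real code would raise the buffer manager's own error.
            throw std::runtime_error("failed to allocate buffer manager page");
        }
        return page.get();
    }

private:
    std::unique_ptr<uint8_t[]> page;
};
```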


github-actions · 2024-11-25T15:27:24Z

Benchmark Result

Master commit hash: 756486b02abe6c283067935b550c369629985efb
Branch commit hash: d0249511e2f92a426ea2a809369029eb69afc8e3

Query Group	Query Name	Mean Time - Commit (ms)	Mean Time - Master (ms)	Diff
aggregation	q24	653.44	652.51	0.93 (0.14%)
aggregation	q28	11564.18	11660.77	-96.59 (-0.83%)
filter	q14	134.48	135.35	-0.87 (-0.64%)
filter	q15	139.73	135.49	4.24 (3.13%)
filter	q16	307.84	306.99	0.85 (0.28%)
filter	q17	452.33	452.63	-0.29 (-0.06%)
filter	q18	1949.33	1935.17	14.16 (0.73%)
filter	zonemap-node	97.00	94.56	2.44 (2.58%)
filter	zonemap-node-lhs-cast	96.15	94.82	1.33 (1.40%)
filter	zonemap-rel	5684.39	5724.56	-40.17 (-0.70%)
fixed_size_expr_evaluator	q07	579.60	578.03	1.57 (0.27%)
fixed_size_expr_evaluator	q08	813.69	811.74	1.95 (0.24%)
fixed_size_expr_evaluator	q09	808.94	809.61	-0.67 (-0.08%)
fixed_size_expr_evaluator	q10	244.02	247.06	-3.04 (-1.23%)
fixed_size_expr_evaluator	q11	240.31	237.13	3.18 (1.34%)
fixed_size_expr_evaluator	q12	234.10	233.69	0.41 (0.18%)
fixed_size_expr_evaluator	q13	1466.73	1467.01	-0.27 (-0.02%)
fixed_size_seq_scan	q23	120.08	122.17	-2.09 (-1.71%)
join	q29	562.70	646.05	-83.35 (-12.90%)
join	q30	1418.91	1442.43	-23.53 (-1.63%)
join	q31	4.92	8.23	-3.31 (-40.19%)
ldbc_snb_ic	q35	417.25	538.41	-121.17 (-22.50%)
ldbc_snb_ic	q36	120.16	108.10	12.06 (11.16%)
ldbc_snb_is	q32	5.84	2.13	3.71 (174.15%)
ldbc_snb_is	q33	14.99	12.40	2.58 (20.84%)
ldbc_snb_is	q34	1.30	1.45	-0.15 (-10.49%)
multi-rel	multi-rel-large-scan	1236.44	1201.99	34.44 (2.87%)
multi-rel	multi-rel-lookup	26.67	5.61	21.06 (375.36%)
multi-rel	multi-rel-small-scan	89.25	78.50	10.74 (13.68%)
order_by	q25	139.35	139.72	-0.37 (-0.26%)
order_by	q26	458.36	459.89	-1.53 (-0.33%)
order_by	q27	1453.84	1464.05	-10.20 (-0.70%)
scan_after_filter	q01	178.82	178.79	0.03 (0.02%)
scan_after_filter	q02	165.22	164.86	0.35 (0.22%)
shortest_path_ldbc100	q37	82.63	85.73	-3.10 (-3.62%)
shortest_path_ldbc100	q38	466.24	455.56	10.68 (2.34%)
shortest_path_ldbc100	q39	61.39	61.46	-0.07 (-0.11%)
shortest_path_ldbc100	q40	540.44	519.42	21.02 (4.05%)
var_size_expr_evaluator	q03	2070.52	2106.47	-35.94 (-1.71%)
var_size_expr_evaluator	q04	2241.20	2243.27	-2.07 (-0.09%)
var_size_expr_evaluator	q05	2735.07	2754.49	-19.42 (-0.71%)
var_size_expr_evaluator	q06	1332.21	1327.50	4.71 (0.35%)
var_size_seq_scan	q19	1447.20	1453.62	-6.42 (-0.44%)
var_size_seq_scan	q20	2696.61	2748.31	-51.69 (-1.88%)
var_size_seq_scan	q21	2269.39	2282.05	-12.66 (-0.55%)
var_size_seq_scan	q22	126.80	127.64	-0.84 (-0.66%)

benjaminwinger force-pushed the wasm_buffer_manager branch from 34bacfa to 39aa152 on November 14, 2024 17:19

benjaminwinger force-pushed the wasm_buffer_manager branch 2 times, most recently from 1160c49 to 3fb91cb on November 19, 2024 23:00

benjaminwinger force-pushed the wasm_buffer_manager branch from 3fb91cb to aaca54a on November 20, 2024 00:13

ray6080 self-requested a review November 21, 2024 00:47

ray6080 reviewed Nov 21, 2024

benjaminwinger force-pushed the wasm_buffer_manager branch from aaca54a to ba93738 on November 21, 2024 14:40

ray6080 approved these changes Nov 21, 2024

benjaminwinger force-pushed the wasm_buffer_manager branch from ba93738 to 41a859b on November 21, 2024 16:23

benjaminwinger added 2 commits November 25, 2024 09:49

Implement malloc-based version of the BufferManager for WASM

545aa44

Pages are stored in the page state structure; optimistic reads work as normal, except we need to handle the page pointer possibly being null

Lower default buffer pool size for tests

6d30abf

Should have been lowered before; with the higher limit most tests never do buffer manager evictions

benjaminwinger force-pushed the wasm_buffer_manager branch from 41a859b to 6d30abf on November 25, 2024 15:01
