perf(NODE-6246): Significantly improve memory usage and performance of ObjectId #703
Conversation
You can create a new static method, for example:
/**
* Generate a 12 byte id in hex string used in ObjectId's
*
* @param time - pass in a second based timestamp.
*/
static generateString(time?: number): string {
if ('number' !== typeof time) {
time = Math.floor(Date.now() / 1000);
}
const inc = ObjectId.getInc();
// set PROCESS_UNIQUE if yet not initialized
if (PROCESS_UNIQUE === null) {
PROCESS_UNIQUE = ByteUtils.randomBytes(5);
}
if (PROCESS_UNIQUE_START_SLICE_HEX === null) {
PROCESS_UNIQUE_START_SLICE_HEX = ByteUtils.toHex(PROCESS_UNIQUE);
}
let finalId = (time % (2 ** 32)).toString(16).padStart(8, '0'); // pad so the timestamp always contributes 8 hex chars
finalId += PROCESS_UNIQUE_START_SLICE_HEX;
finalId += (inc & 0xffffff).toString(16).padStart(6, '0');
return finalId;
}

From my tests, it's slower than using the buffer only, but faster if you need to convert the buffer to hex:
<Buffer 7b 5b 0f 3e b7 ca ef 8a 6b 08 ca 28>
Time generate buffer: 348.585796 ms
7b5b0f3eb7caef8a6ba160a8
Time generate buffer.toString('hex'): 1175.228544 ms
7b5b0f3eb7caef8a6ba160a8
Time generateString: 780.711906 ms

Benchmark:
const { performance } = require('perf_hooks');
const jsBson = require('./lib/bson.cjs');
let start = performance.now();
let r;
const now = Date.now();
for (let i = 0; i < 1e7; i++) {
r = jsBson.ObjectId.generate(now);
}
console.log(r);
console.log(`Time generate buffer: ${performance.now() - start} ms`);
start = performance.now();
for (let i = 0; i < 1e7; i++) {
r = jsBson.ObjectId.generate(now).toString('hex');
}
console.log(r);
console.log(`Time generate buffer.toString('hex'): ${performance.now() - start} ms`);
start = performance.now();
for (let i = 0; i < 1e7; i++) {
r = jsBson.ObjectId.generateString(now);
}
console.log(r);
console.log(`Time generateString: ${performance.now() - start} ms`);
Hey folks, just wanted to drop a heads-up that you won't hear back properly for a bit due to the holiday 🎆 but thanks so much for reaching out about this issue! @SeanReece This looks really thoroughly researched; I am looking forward to getting familiar with the details!
@H4ad this is great! I was hesitant to touch
Yeah, this will work.
As a compatibility layer, I think keeping the buffer is still worth having, in case someone is relying on something that can only be done via the buffer.
@nbbeeken I've marked the PR ready for review again 👍 Looking forward to your input on this. Let me know if you have any questions.
Hey @SeanReece! Thanks again for your effort here; this is an interesting change, to say the least. I am still taking in all the changes, but first and foremost we can get the tests passing again. Not sure if you've given them a whirl: npm run check:node runs the tests on Node, and npm run check:web runs them "in the browser" (it actually also runs on Node but with Node globals removed).
Unfortunately the BSON benchmarks aren't the easiest to run and compare, but I kicked them off on my end and the figures I'm getting aren't as promising:
Values are in MB/s
bench | main | oid_perf | percDiff |
---|---|---|---|
objectid_singleFieldDocument._serialize_bson | 9.0253 | 8.9594 | -0.7328 |
objectid_singleElementArray._serialize_bson | 10.8499 | 10.4864 | -3.4077 |
objectid_array_10._serialize_bson | 35.6105 | 29.7977 | -17.7739 |
objectid_array_100._serialize_bson | 190.8082 | 75.2714 | -86.8438 |
objectid_array_1000._serialize_bson | 257.4163 | 95.0820 | -92.1050 |
objectid_singleFieldDocument._deserialize_bson | 11.6683 | 12.7491 | 8.8524 |
objectid_singleElementArray._deserialize_bson | 10.8365 | 11.4176 | 5.2223 |
objectid_array_10._deserialize_bson | 38.1746 | 34.5442 | -9.9849 |
objectid_array_100._deserialize_bson | 114.9375 | 73.5048 | -43.9739 |
objectid_array_1000._deserialize_bson | 131.2710 | 83.9106 | -44.0190 |
I will try a bit of sleuthing on my end to see if I can understand the scale and source of the regression here.
-      return (
-        this.buffer[11] === otherId.buffer[11] && ByteUtils.equals(this.buffer, otherId.buffer)
-      );
+      return this.__id === otherId.__id;
The tests "should use otherId[kId] Buffer for equality when otherId has _bsontype === ObjectId" and "should not rely on toString for otherIds that are instanceof ObjectId" need updating. Now we want to make sure that if we're checking another ObjectId instance we use the efficient string comparison. Previously the goal was to make sure we were using buffer comparison, with an optimization of checking the LSB.
Should we continue to check the LSB, or maybe just the least significant hex character? Unsure if the same optimization applies to strings.
It looks like performing a character comparison with strings is a little slower, so I think we can update/remove these tests.
const crypto = require("crypto");
const Benchmark = require("benchmark");
const suite = new Benchmark.Suite();

const a = crypto.randomBytes(12).toString("hex");
const b = crypto.randomBytes(12).toString("hex");
console.log(a, b);
suite
  .add("string compare", function () {
    return a === b;
  })
  .add("String char compare", function () {
    return a[0] === b[0] && a === b;
  })
  .on("cycle", (event) => console.log(String(event.target)))
  .run();
-      this.buffer = ObjectId.generate(typeof workingId === 'number' ? workingId : undefined);
-    } else if (ArrayBuffer.isView(workingId) && workingId.byteLength === 12) {
+      this.__id = ObjectId.generate(typeof workingId === 'number' ? workingId : undefined);
+    } else if (ArrayBuffer.isView(workingId)) {
Shouldn't we keep the && workingId.byteLength === 12 validation here? Otherwise buffers could be of any length, right?
This change goes along with the change to the deserialization code here.
Previously we allocated a new Uint8Array(12), copied the 12 bytes over from the parent buffer, and passed in that new buffer, only to convert it to a string and throw the temporary buffer away. Now you can pass a buffer and an offset to new ObjectId(buffer, offset) and it will just grab the next 12 bytes after that offset and encode them to hex.
It's probably worth enforcing buffer size === 12 if no offset was passed in, and also erroring if offset + 12 > byteLength. WDYT?
It's probably worth enforcing the buffer size === 12 if no offset was passed in, also if offset + 12 > byteLength. WDYT?
Yea those sound like the correct assertions. It's important that when inspecting the ObjectId you get the same visual results as you do when serializing. So whether it is a view on a larger arrayBuffer or exactly 12 bytes, we just do not want to permit an OID to carry around a buffer that implies more or less than exactly that amount of data.
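To make those assertions concrete, a rough sketch of the validation being discussed might look like the following (the function name and error wording are illustrative, not the PR's actual code):

// illustrative only: require exactly 12 bytes when no offset is given,
// and require that offset + 12 fits inside the view otherwise
function validateObjectIdBytes(bytes, offset) {
  if (offset == null) {
    if (bytes.byteLength !== 12) {
      throw new Error('Buffer passed to ObjectId must be exactly 12 bytes');
    }
    return 0;
  }
  if (offset < 0 || offset + 12 > bytes.byteLength) {
    throw new Error('ObjectId offset must leave at least 12 readable bytes');
  }
  return offset;
}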
@@ -170,7 +170,7 @@ export const webByteUtils = {
     return Uint8Array.from(buffer);
   },

-  toHex(uint8array: Uint8Array): string {
+  toHex(uint8array: Uint8Array, _start?: number, _end?: number): string {
     return Array.from(uint8array, byte => byte.toString(16).padStart(2, '0')).join('');
Shouldn't this call .subarray on the uint8array with start/end? Otherwise won't this return a different result than the Node.js version of this API?
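For illustration, a version of this web fallback that honors start/end via .subarray could look like the sketch below (an assumption on my part, mirroring the semantics of Node's buffer.toString('hex', start, end)):

function toHex(uint8array, start, end) {
  // slice out the requested window first so the output matches the Node.js byte-range behaviour
  const view = start !== undefined || end !== undefined ? uint8array.subarray(start, end) : uint8array;
  return Array.from(view, byte => byte.toString(16).padStart(2, '0')).join('');
}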
} else if (value._bsontype === 'ObjectId') {
  index = serializeObjectId(buffer, key, value, index);
What is the motivation for moving this step up here? It seems like it may have broken the version check that comes later for this BSONType, unless I'm mistaken?
In my testing I did see a measurable performance gain by moving the ObjectId check to the top if, I believe because some of those checks like isUint8Array() and Symbol.for("@@mdb.bson.version") are moderately expensive for such a hot path. Now that I'm thinking about it, we could instantiate a single Symbol variable and just reference it instead of creating it every time, and maybe move the isUint8Array() check down lower if that doesn't break any tests.
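For instance, a hoisted module-level constant along these lines would keep the hot path from re-entering the global symbol registry on every value (the constant name and check shape are illustrative, not the PR's actual code):

// created once at module load instead of on every serialized value
const BSON_VERSION_SYMBOL = Symbol.for('@@mdb.bson.version');

function hasExpectedBsonVersion(value, expectedMajorVersion) {
  return value[BSON_VERSION_SYMBOL] === expectedMajorVersion;
}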
Update: I narrowed the performance impact to the isUint8Array(), isRegExp(), and isDate() calls, which all invoke Object.prototype.toString.call(value), which is expensive. Did some refactoring and was able to get serialize from 2.4m ops/sec -> 3m ops/sec, which in my testing is only about an -8% decrease in raw serialization perf. AND all the tests are passing.
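As a rough illustration of why those calls matter on such a hot path, here is a small hypothetical micro-benchmark comparing the Object.prototype.toString.call check against a plain instanceof check (cheaper, though it misses cross-realm views):

const { performance } = require('perf_hooks');

const slowIsUint8Array = v => Object.prototype.toString.call(v) === '[object Uint8Array]';
const fastIsUint8Array = v => v instanceof Uint8Array; // cheaper, but misses cross-realm views

const value = new Uint8Array(12);
for (const [name, check] of [['toString.call', slowIsUint8Array], ['instanceof', fastIsUint8Array]]) {
  const start = performance.now();
  let result;
  for (let i = 0; i < 1e7; i++) result = check(value);
  console.log(`${name}: ${result} in ${(performance.now() - start).toFixed(2)} ms`);
}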
A shared copy of the symbol sounds like a good call. As long as the version check is still performed, moving what is probably the most common BSON type into a position where it performs better sounds good to me.
@nbbeeken I really appreciate the thorough review 👍 Sorry about all the test failures. I think I got so caught up with benchmarking that it completely slipped my mind. I'll get the tests running! Got any tips for getting the BSON benchmarks running on my machine?

Even on a clean main, I do expect there to be some regressions with raw serialization/deserialization, since we're now converting/encoding a Buffer to a hex string instead of just copying bytes, so running a benchmark that only serializes/deserializes is going to compound this. I still believe that in real-life use cases (combinations of deserialization + toJSON, etc.) we shouldn't see any performance regressions. AND I'm sincerely hoping that we'll see much better hex encoding performance in V8 once this lands: https://github.com/tc39/proposal-arraybuffer-base64

While I was originally brainstorming this change I also played with the idea of having a large buffer to store a pool of ObjectIds, since the memory inefficiency comes from storing many small buffers. This would mean we wouldn't have any performance regressions from encoding strings on serialization/deserialization, BUT handling garbage collection / trimming the pool is definitely not an easy problem to solve. Another idea is to have the string representation be configurable as a flag (i.e. enableLowMemoryOID) which would trigger the use of string-based ObjectIds, but this means a lot more branches, tests, and code to manage.
There's a small bug relating to path handling, I think, that makes the benchmarks only run on Linux. I am able to get them to run on my Mac with the following command:
env LIBRARY=$(pwd) ITERATIONS=1000 WARMUP=1000 npm run check:granular-bench
Then I wrote a quick script to merge two
That is understandable. Right now our benchmarks are geared towards throughput, hence the MB/s units; we're testing to make sure that we continue to be able to process the same amount of data as we have previously. This focuses solely on the DB interaction side of things and does not measure the throughput of a whole application that may be serializing everything to JSON after it leaves the driver. Possibly that is an area we should look at to see if it is seriously impacted.
Isn't this what
Yes, this is a possibility, but I agree with the sentiment about branching code; I think we should and can find an acceptable middle ground between performance and memory rather than that 🚀
@nbbeeken I'm also on a Mac (M1). Still no luck running the benchmarks, but I'm adding some debug logs to see if I can track it down 🤔
Yes, NodeJS
Here are some details I could find from a V8 developer:
We can easily test this out by allocating some Buffers and checking the memoryUsage before/after:
const SIZE = 1_000_000; // creating 1 million buffers
const tests = {
allocUnsafe: () => {
const arr = [];
for (let i = 0; i < SIZE; i++) {
arr.push(Buffer.allocUnsafe(12));
}
return arr;
},
emulatePoolUint8Array: () => {
const buffer = new ArrayBuffer(SIZE * 12);
const arr = [];
for (let i = 0; i < SIZE; i++) {
arr.push(new Uint8Array(buffer, i * 12, 12));
}
return arr;
},
createSingleBuffer: () => {
return Buffer.allocUnsafe(SIZE * 12);
},
};
Object.keys(tests).forEach((key) => {
const bef = process.memoryUsage();
const res = tests[key]();
const aft = process.memoryUsage();
console.log(`${key} used: ${Math.round((aft.rss + aft.external - (bef.rss + bef.external)) / 1024 / 1024)}MB`);
});
Right, right, that is the center of the issue. Thanks for breaking that down.
Since we have isolated our interactions with buffers to the mini
Updated perf
I've run an autocannon test against a server that calls
autocannon http://localhost:8080 -d 10 -c 30 -w 3
MB/s throughput of a
@nbbeeken You're a genius 😄 Don't know why this never occurred to me. Wow, I feel like a bit of an idiot here 😄. For some reason, I assumed Node was maintaining a single large pool and just compacting/trimming old unused space. But, looking at the implementation, it looks super simple: it just keeps a Uint8Array of size

I think we can replicate this easily, reduce memory usage even more, and keep our serialization/deserialization performance. Even if we completely throw out these string changes, I've learned a lot here and there are lots of non-string-related changes that would improve performance. Let me throw together a test to see if this will work. Here's what I'm thinking.
tyty! But I must give credit to @addaleax for suggesting the idea of a Uint8Array pool. I have read Node's pool implementation before so I figured we could kindly steal that approach.
Yea! And I've learned a lot here too. I think it was not completely out of the question to begin with. I can imagine most use cases probably end up turning OIDs into hex strings anyway (JSON.stringify will do that), so the REST example could have shown a low impact on performance if it meant we just moved the hex-ify work "up" into BSON. Turns out it has too large of an impact, but there was potential there!
Those sound like good jumping-off points. Seems like we intend to make an OID-specific pool; I'm not sure if it could be kept general easily 🤔 Making it specific to OID means we don't have to return a

If you'd like to keep this PR as is for comparison, feel free to start a new one, but continuing to work here is fine as well. This is really great work, thanks again for donating your time and talents! 💚
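As a purely illustrative sketch of the pooling direction discussed above (the block size and names are hypothetical, not the PR's design), the idea is to carve 12-byte views out of one large backing Uint8Array and start a new block once it is exhausted:

const OID_POOL_SIZE = 1024 * 12; // hypothetical: room for 1024 ObjectIds per block

let pool = new Uint8Array(OID_POOL_SIZE);
let poolOffset = 0;

function allocObjectIdBytes() {
  if (poolOffset + 12 > pool.byteLength) {
    // a spent block only becomes collectable once every view into it has been released
    pool = new Uint8Array(OID_POOL_SIZE);
    poolOffset = 0;
  }
  const view = pool.subarray(poolOffset, poolOffset + 12);
  poolOffset += 12;
  return view;
}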
Description
This MR refactors the ObjectId class to persist strings instead of a raw Buffer, which greatly reduces memory usage and improves the performance of ObjectIds.
Why?
Performance Improvements
Real world use cases
In the benchmarks above you can see there are some performance regressions with raw BSON serialization/deserialization. This is because we now need to convert the buffer to/from a string instead of just copying bytes, but this regression seems to be moot in "real world" benchmarks.
I believe the 2 most common use cases are
Using the MFlix Sample Dataset "Comments" documents.
Memory Improvements
With this change, retained memory usage is reduced by ~45% (ObjectId size decreases from 128 bytes to 72 bytes). I've also removed an extra buffer allocation during deserialization, which reduces the strain on the buffer pool, reducing the likelihood of the internal buffer pool having to allocate another block of Buffer.poolSize. By reducing the amount of memory that needs to be allocated we are further improving performance, since garbage collection is quite expensive.
Example
Using the MFlix Sample Dataset "Comments" collection. Grabbing 1 million documents
BEFORE: 654 MB used
AFTER: 343 MB used
-47% reduction in memory usage
cacheBuffer Option
Similar to the previous cacheHexString static variable, this MR adds a cacheBuffer option that also persists the Buffer on the ObjectId to speed up some operations that require the buffer, such as .id.
Previous ObjectId w/ cacheHexString VS New ObjectId w/ cacheBuffer
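A hypothetical usage sketch, assuming cacheBuffer is opted into the same way the old cacheHexString static flag was (the flag placement and behaviour here are my assumption, not confirmed API):

const { ObjectId } = require('bson');

ObjectId.cacheBuffer = true;      // assumption: static opt-in flag, mirroring cacheHexString
const oid = new ObjectId();
console.log(oid.toHexString());   // the hex string is now the primary representation
console.log(oid.id);              // repeated .id access reuses the cached buffer instead of re-encoding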
What is changing?
Is there new documentation needed for these changes?
This MR deprecates cacheHexString, which may need to be documented. This MR adds cacheBuffer, which may need to be documented.
What is the motivation for this change?
We have been running into memory issues while pulling large Mongo result sets into memory, and after lots of memory profiling the issue seems to be related to ObjectId, specifically how memory-inefficient Buffer/ArrayBuffer is when storing lots of small Buffers.
We were expecting an ObjectId to consume ~12 bytes of memory (+ some overhead), but in reality it consumes 128 bytes per ObjectId (96 bytes for just the Buffer).
I opened an issue with the NodeJS Performance team but this appears to be working as designed: nodejs/performance#173
Storing a string in Node/V8 is much more memory efficient since it's a primitive. A 24 character hex string only consumes 40 bytes of memory, AND it's much faster to serialize/deserialize.
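For reference, a quick way to eyeball the per-string cost described above (a rough sketch only; exact numbers vary by V8 version and include array slot overhead):

const crypto = require('crypto');

const SIZE = 1_000_000;
const before = process.memoryUsage().heapUsed;
const strings = new Array(SIZE);
for (let i = 0; i < SIZE; i++) {
  strings[i] = crypto.randomBytes(12).toString('hex'); // 24-character hex string
}
const after = process.memoryUsage().heapUsed;
console.log(`~${((after - before) / SIZE).toFixed(1)} bytes per entry`);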
Release Highlight
Fill in title or leave empty for no highlight
Double check the following
- Ran the npm run check:lint script
- PR title follows the format type(NODE-xxxx)[!]: description, e.g. feat(NODE-1234)!: rewriting everything in coffeescript