-
Notifications
You must be signed in to change notification settings - Fork 455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CDRIVER-4363 add client bulk write #1590
Conversation
This is a drive-by fix to fix the referenced payload numbers.
Otherwise, if the type mismatches, the test runner aborts with an error: "expected int64, int32, or double [...]" that does not identify the field.
To do exact BSON matches
To ease reading error message
include some tests of basic usage and libmongoc-specific behavior. Specification and prose tests test driver-agnostic behavior.
To support `w=majority`
137db61
to
b031041
Compare
src/libmongoc/tests/unified/result.c
Outdated
if (write_errors) { | ||
if (write_errors && !result->write_errors) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the write_errors
in the nested if
is redundant.
bson_oid_init (&oid, NULL); | ||
BSON_ASSERT (BSON_APPEND_OID (&tmp, "_id", &oid)); | ||
BSON_ASSERT (bson_concat (&tmp, document)); | ||
BSON_ASSERT (bson_append_document (&op, "document", 8, &tmp)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than hardcoding a string length, I find it more maintainable to use something like strlen("document")
instead. The compiler should optimize away the call to strlen. See this example on godbolt https://godbolt.org/z/bhzoddYKo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PHP internals often does sizeof(<str literal>) - 1
, but strlen()
is certainly more concise if it gets compiled the same.
There also used to be a PHP macro for expanding a string literal into two arguments (the string and length), but it had a cryptic name and casual readers might not realize it expands and assume a parameter is missing. If that sounds useful, readability could probably be addressed with a more verbose name (e.g. STR_AND_LEN()
). The main benefit is avoiding duplication (be it the original string or its length).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compiler should optimize away the call to strlen.
Neat, thank you for the example. Updated to use the BSON_APPEND_*
macros to remove the hard coded lengths. The macros add a strlen
call.
} | ||
|
||
mongoc_bulkwritereturn_t | ||
mongoc_bulkwrite_execute (mongoc_bulkwrite_t *self, const mongoc_bulkwriteopts_t *opts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is nearly 600 lines long. Is it possible to break this function up into smaller functions? The general rule of thumb that I use is to make functions that fit in a single screen size without needing to scroll (approximately 80 lines).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to split into two new helpers: _bulkwritereturn_apply_result
and _bulkwritereturn_apply_reply
.
An attempt to more aggressively split did not improve readability IMO. I think separate functions are helpful when it is "easy" to reason about what state is changed. Result handling seemed like an improvement, since only the mongoc_bulkwritereturn_t
is updated.
Though this function is still long, block scopes with comments were used to logically group. Code folding can collapse blocks when reading:
tf->maxMessageSizeBytes = sl.maxMessageSizeBytes; | ||
tf->maxBsonObjectSize = sl.maxBsonObjectSize; | ||
|
||
const int32_t opsBytes = tf->maxMessageSizeBytes - 1122; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please comment the meaning of 1122 or create a constant with a meaningful name and assign this value to it. Please do the same for other locations where magic numbers are used, e.g., 57, 217, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The values are copied from the prose test. The prose test describes the calculation.
Added a comment referring to the prose tests.
Replace with existing `mongoc_optional_t`.
Find once and store the offset.
Array filters do not apply to replace. Remove to match spec.
The return was not being set. Callers can determine if the write was acknowledged by checking if `mongoc_bulkwritereturn_t::res` is NULL.
Only select a new stream if another batch is needed.
Reduces repeated assignment
Without the error propagation change, the comparison of int to string aborts (rather than returning an error).
Fix updateResults and deleteResults to return values as int64 to match spec.
Use `lookup_as_int64` for `ok` field. Server's `BulkWriteReplyItem` appends as double. Support other numerics for future proofing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest changes verified with this patch build.
bson_oid_init (&oid, NULL); | ||
BSON_ASSERT (BSON_APPEND_OID (&tmp, "_id", &oid)); | ||
BSON_ASSERT (bson_concat (&tmp, document)); | ||
BSON_ASSERT (bson_append_document (&op, "document", 8, &tmp)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compiler should optimize away the call to strlen.
Neat, thank you for the example. Updated to use the BSON_APPEND_*
macros to remove the hard coded lengths. The macros add a strlen
call.
tf->maxMessageSizeBytes = sl.maxMessageSizeBytes; | ||
tf->maxBsonObjectSize = sl.maxBsonObjectSize; | ||
|
||
const int32_t opsBytes = tf->maxMessageSizeBytes - 1122; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The values are copied from the prose test. The prose test describes the calculation.
Added a comment referring to the prose tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed requested changes from my previous review. Note that I did not diligently examine the test changes (just the public API we'd end up using in PHPC).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
mongoc_bulkwriteopts_set_let (mongoc_bulkwriteopts_t *self, const bson_t *let) | ||
{ | ||
BSON_ASSERT_PARAM (self); | ||
BSON_ASSERT (let || true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not introduce more instances of this funny line. I think we need something like:
#define BSON_OPTIONAL_PARAM (x) (void)(x)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added BSON_OPTIONAL_PARAM
.
Summary
This PR implements the
MongoClient.bulkWrite
API proposed in: mongodb/specifications#1534RST documentation was omitted from this PR in case the API changes during review. I plan to add RST documentation in a future PR.
Commits are separated to ease filtering the JSON test files and minor drive-by improvements. See the bulk-write.md file in mongodb/specifications#1534 for the expected behavior.
Testing
JSON files are copied from the mongodb/specifications#1534. These were reviewed in the spec PR and I do not expect need another review. test-mongoc-crud.c includes the prose tests from the specification. test-mongoc-bulkwrite.c includes libmongoc-specific tests.
Background
Terms
The following terms are used in this PR description:
MongoCollection.bulkWrite
refers to the existing API from the CRUD specification. The C driver implements in terms ofmongoc_bulk_operation_t
.MongoClient.bulkWrite
refers to the new API in the new Bulk Write specification. This PR implements in terms of the newmongoc_bulkwrite_t
handle.MongoCollection.bulkWrite
splits write operations by type, and sends separateinsert
,update
, anddelete
server commands.MongoCollection.bulkWrite
writes to only one collection.MongoClient.bulkWrite
uses thebulkWrite
server command. ThebulkWrite
command supports insert, update, and delete operations for multiple namespaces. ThebulkWrite
command is introduced in server 8.0.MongoClient.bulkWrite
can write to multiple collections.Both bulk write APIs require splitting large writes into separate commands to stay within server size limits. E.g. a call to
MongoClient.bulkWrite
with many writes may result in multiplebulkWrite
commands sent.Rationale
The following describes rationale behind some decisions in this PR:
Models
The specification defines model classes for each write. Here is the
DeleteOneModel
definition (with comments removed for brevity):This PR represents the required fields in arguments, and groups optional fields in an opaque
opts
struct:I thought this was a reasonable compromise between matching the spec and existing C driver patterns. If the spec later extends
DeleteOneModel
,mongoc_bulkwrite_deleteoneopts_t
can be extended.The existing C driver implementation of
MongoCollection.bulkWrite
usesbson_t
for options:However, I think a typed
mongoc_bulkwrite_deleteoneopts_t
is preferable tobson_t
. It may ease development with compile time type checks and auto complete.Naming
This PR uses a slightly different API naming pattern. Parts are logically separated by an underscore. Example:
mongoc_bulkwriteopts_set_bypassdocumentvalidation
.This differs slightly from the established pattern of separating each word. Example in current API:
mongoc_bulk_operation_set_bypass_document_validation
.I find the logical separation easier to read.
Validation
The existing C driver implementation of
MongoCollection.bulkWrite
includes default client-side validation:The default flags for writes are all the same:
UTF-8 validation appears broken. Passing invalid UTF-8 for string values does not result in error. See: CDRIVER-4448. The default validation only enables
BSON_VALIDATE_EMPTY_KEYS
to check there are no empty keys (e.g.{"": "foo"}
).CDRIVER-3731 is a request to configure validation at a broader scope. Quoting jeroen/mongolite#206:
The C driver benchmarks disable the default validation. I re-enabled the validation and reran the benchmark. This resulted in significant performance drops in the base comparison:
Since UTF-8 validation is already broken, there are requests to disable, validation negatively impacts performance, and it is not a spec requirement, this PR excludes default validation in
MongoClient.bulkWrite
.