Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DRIVERS-2926 BSON Binary Vector Subtype Support #1658
DRIVERS-2926 BSON Binary Vector Subtype Support #1658
Changes from 13 commits
ec64aa9
d5ab5f1
91212ca
8757836
830632a
67b410d
07667a1
80f19fa
7255b6c
0ff289b
b3d6ea0
8cfc15a
a6ee71b
0d10725
5935ce0
a8b464e
f50677b
2d4ea72
f50b1cc
d6f160b
60088d9
a1b87f7
d267b2a
2cd0b4a
0b888fb
c823174
fcc1be5
d00541a
b30ed35
0ccc399
ae32422
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"result in up to significant" -> "result in significant"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"\x09"
-> 9.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a can of worms. I'm going to leave it as it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take this back. I will use simple integers when describing subtype.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still needs updating
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence seems to be hiding a lot of complexity. What should the driver APIs look like? What happens if the "padding" field is non-zero but the dtype is a multiple of 8? How does the padding field change the output? Are we planning to add a NONPACKED_BIT type which represents the data as unit1 (or bool) eg the user would actually give [1, 0, 0, 1] for a 4-bit vector?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is my understanding that it is up to the driver to implement the API as they please. Is that not true?
If the padding is non-zero for a dtype where it does not make sense, then the tests will fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous documents I saw had other data types too, why is this spec limited to these 3?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the defined work. The other document is not up-to-date or correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yesterday I gained some clarity on the initial requirements. They were for implementation in Python and MongoT. This specification is under the same time pressure, so perhaps we discuss the full design and whether this should be expanded to include those that are not yet implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This paragraph is out of place. We should move this to a "rationale" or "Q&A" section like we do for other specs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole bit was placed here to make the
*
more easy to understand. I would prefer it to stay as-is. Going into detail about the brand new field of Quantization in LLMs is not something I want to do.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Shane here. We generally avoid editorializing like this in our RFC-like specifications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0x09
-> 9. We usually just refer to the subtype as a regular number. For example in https://github.com/mongodb/specifications/blob/master/source/bson-binary-encrypted/binary-encrypted.mdThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I admit that there is inconsistency. It is coming from the Tech Spec document. For our purposes, the drivers team, I am happy to make it consistent. When we refer to "subtype" we use integers. When we refer to "dtype", we use hex. Cool? @ShaneHarvey
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file path might change, let's just use the full JIRA link to the ticket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file should describe all the assertions that the test runner should make. Many are obvious, but one that might be missed, for example, is that you can round trip from vector to bson and then back to vector.
IIRC the bson_corpus README does this pretty well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is outdated now. If you need more, would you please be specific? I am also happy to revisit once another driver takes a stab at it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this in person, so I think we're in alignment now. The idea is to just look at all the assertions being done in the Python driver test runner, and write them down in prose here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2d4ea72 - Added Required Tests section to README. Removed JSON from tests.