-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
source-mysql: Correctly handle unsigned integer values #1564
Conversation
source-mysql/discovery.go
Outdated
if sval, ok := asInt64(val); ok && t.Unsigned { | ||
return uint64(sval) & 0xFF, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to work, but I don't grok why we need to cast to int64 and then to uint64, instead of directly from int8 to uint8?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does seem like asInt64
could instead be an asUnsigned
function which does a dynamic check of the actual integer type, and then does a re-interpret cast from signed -> unsigned (if you have an int8
value, return uint8(value)
).
Though I suppose mediumint
may require a bit more specialized handling (an isMedium
bool)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, on reflection I think the asInt64
helper is an unnecessary bit of complexity. I'll replace it with type assertions for the appropriate integer sizes in each case (int8 for tinyint
, int16 for smallint
, etc) and only apply a mask to mediumint
values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, one comment for me to understand how this works, and probably a good idea to have that answer as a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM % comments
Also, can you speak to how migration of the state checkpoints work, from "foo": "int" => "foo": {"type": "int"} ?
Has this dynamic "maybe it's a string or object" always been supported, and you're just using it here?
] | ||
}, | ||
"id": { | ||
"type": "integer" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A nice-to-have is that we could add a minimum: 0
constraint here 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
source-mysql/discovery.go
Outdated
if sval, ok := asInt64(val); ok && t.Unsigned { | ||
return uint64(sval) & 0xFF, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does seem like asInt64
could instead be an asUnsigned
function which does a dynamic check of the actual integer type, and then does a re-interpret cast from signed -> unsigned (if you have an int8
value, return uint8(value)
).
Though I suppose mediumint
may require a bit more specialized handling (an isMedium
bool)?
Yes, we previously implemented support for struct-valued type information in MySQL so that we could store the list of enum/set options, and we're just reusing that same capability here to store metadata about integer signedness. Regarding migration, I don't believe that any of the changes I made in this PR would cause older metadata with a bare string |
Currently we capture the values of an unsigned integer column correctly via backfills, but incorrectly cast uintN->intN when processing them as replication events (because the value encoding used in the MySQL binlog doesn't distinguish between signed and unsigned integers of a particular bit-width). The behavior is currently incorrect and the snapshot demonstrates the incorrect negative values obtained via replication.
The bulk of this churn is because of the type metadata changes to support unsigned integer columns better. There's also a handful of snapshot changes from the recent PR which made the default backfill mode be unfiltered when the key of a table is a text column type. Since CI builds have been passing those changes must be from the long tests that we skip during CI.
…GNED` The `TestUnsignedIntegers` test shows that we correctly handle replicated values of unsigned integer columns given correct column metadata from the initial discovery. This test shows that the DDL query parsing logic also produces correct metadata for such column types.
Adds a `minimum: 0` clause to the generated JSON schema for unsigned integer column types.
fa4c14e
to
87c4886
Compare
Description:
Previously there was a bug where values of an unsigned integer column which were received via replication would incorrectly be cast to the bit-equivalent signed integer value. This applied to all widths of integer -- a
TINYINT UNSIGNED
column could demonstrate this with value222
being incorrectly captured as-34
.This applied only to replication, backfilled values were and are still handled correctly.
The root of the issue is that (unlike the value encoding used for backfill query results) the MySQL binlog value representation doesn't distinguish based on the signedness of integer values, only their bit-width. Consequently the client library parsed all such values as signed integers of the appropriate width and we serialized them unmodified.
The fix was to keep track of an additional signed/unsigned flag as part of the column type metadata for integer columns. When the column is unsigned we cast the value to an unsigned integer and where necessary (primarily for the 24-bit
MEDIUMINT
type but we'll handle any weirdness that might theoretically come up by goingintN -> int64 -> uint64 -> mask
) trim off any sign-extended upper bits.Workflow steps:
No user action is required, except in cases where we believe that unsigned integer values might previously have been captured incorrectly. In those cases we should re-backfill the impacted bindings, ideally discarding old data and materializing into a fresh destination table.
This change is