allow typeLength to come from opts.column when decoding FIXED_LEN_BYTE_ARRAY #108
Conversation
Looks good to me. @j4ys0n, if you have a simple parquet file with this, you can add it to test/test-files.js, and that would provide a basic integration-type test for it as well, but I don't see that as blocking.
Please add a unit test for this, thanks
@j4ys0n Can you add a test with a parquet file that this fails on? I tried one and it was already working. A simple test in test/test-files.js would look something like:
```js
it('nested_fixed_length_byte_array.parquet loads', async function () {
  const schema = await readSchema('nested_fixed_length_byte_array.parquet');
  const data = await readData('nested_fixed_length_byte_array.parquet');

  // typeLength should be picked up for the FIXED_LEN_BYTE_ARRAY column
  const typeLength = schema.fields['BLOCK_NUMBER'].typeLength;
  assert.equal(typeLength, 16);

  // FIXED_LEN_BYTE_ARRAY(16): the value is a 16-byte big-endian buffer,
  // so 1000 (0x3e8) occupies the last four bytes
  const expectedBuffer = Buffer.alloc(16);
  expectedBuffer.writeUInt32BE(0x3e8, 12);
  assert.deepEqual(data[0], { BLOCK_NUMBER: expectedBuffer });
});
```
Testing again with float16_nonzeros_and_nans.parquet and float16_zeros_and_nans.parquet, and they do show this fixing things. I'm about to put up a PR that runs all the various reference tests, so we should just be able to uncomment those two, which shows this works. PR coming soon to fix that; then I think we can update and merge this one.
With the tests from the reference test, I think we are good!
allllllrightythen
apologies for the unresponsiveness - i'm away for a bit. thank you for moving this forward 🙏
No problem. Thanks for the PR! This was released with v1.5.0.
Problem
`typeLength` is present in the column options, but decoding throws an error:
thrown: "missing option: typeLength (required for FIXED_LEN_BYTE_ARRAY)"
options object for reference:
Using `parquet-tools schema`, here is the schema for this column:

The parquet file is a direct export from Snowflake, and the data type of the column is `NUMBER(38,0)`.

Solution
I traced through the code to find where the decode was erroring and added the ability to take the `typeLength` from `column` in the column options when it is not present at the top level.

Change summary:
see above
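The fallback described above can be sketched roughly as follows. This is an illustrative sketch only, not the library's actual internals; the function name `resolveTypeLength` and the exact shape of `opts` are assumptions:

```javascript
// Hypothetical sketch of the fix: when decoding a FIXED_LEN_BYTE_ARRAY
// value, fall back to the typeLength stored on opts.column if it is
// missing at the top level of the options object.
function resolveTypeLength(opts) {
  if (opts.typeLength !== undefined) {
    // typeLength provided at the top level: use it directly
    return opts.typeLength;
  }
  if (opts.column !== undefined && opts.column.typeLength !== undefined) {
    // fall back to the column metadata carried in the options
    return opts.column.typeLength;
  }
  // neither source has it: keep the original error behavior
  throw new Error('missing option: typeLength (required for FIXED_LEN_BYTE_ARRAY)');
}
```

With a fallback like this, files whose `typeLength` only appears in the column metadata (such as the Snowflake export above) decode instead of throwing.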
Steps to Verify:
Decode a parquet file with this type of field.