fix: StringT: re-support other string encodings #62

Merged
merged 1 commit into from
Apr 8, 2024
6 changes: 5 additions & 1 deletion src/String.js
@@ -98,12 +98,16 @@ function encodingWidth(encoding) {
return 1;
case 'utf16le':
case 'utf16-le':
case 'utf-16be':
case 'utf-16le':
case 'utf16be':
case 'utf16-be':
case 'ucs2':
return 2;
default:
- throw new Error('Unknown encoding ' + encoding);
+ //TODO: assume all other encodings are 1-byters
+ //throw new Error('Unknown encoding ' + encoding);
+ return 1;
Comment on lines +108 to +110
Contributor

It would be nice if the consumer could be warned about the unknown encoding, but I'm not sure there is a facility for that.
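One way such a warning could work, sketched under assumptions: nothing in this thread shows an existing warning hook in restructure, so `warnedEncodings`, `setUnknownEncodingHandler` and `widthWithWarning` below are hypothetical names. The fallback still returns 1, and the handler fires only once per encoding so that a hot decode loop does not flood the console.

```javascript
// Hypothetical sketch, not restructure code: warn once per unknown encoding,
// then fall back to a width of 1 byte, as this PR does silently.
const warnedEncodings = new Set();

// Default handler; a consumer could replace it to silence or escalate.
let unknownEncodingHandler = (encoding) => {
  console.warn(`Unknown encoding '${encoding}', assuming 1 byte per unit`);
};

function setUnknownEncodingHandler(fn) {
  unknownEncodingHandler = fn;
}

// Two-byte names copied from the diff; the one-byte names are assumed.
const TWO_BYTE = new Set(['utf16le', 'utf16-le', 'utf-16be', 'utf-16le', 'utf16be', 'utf16-be', 'ucs2']);
const ONE_BYTE = new Set(['ascii', 'utf8']);

function widthWithWarning(encoding) {
  if (TWO_BYTE.has(encoding)) return 2;
  if (ONE_BYTE.has(encoding)) return 1;
  // Report each unknown encoding only the first time it is seen.
  if (!warnedEncodings.has(encoding)) {
    warnedEncodings.add(encoding);
    unknownEncodingHandler(encoding);
  }
  return 1;
}

// Usage: collect warnings instead of printing them.
const seen = [];
setUnknownEncodingHandler((encoding) => seen.push(encoding));
widthWithWarning('x-mac-roman'); // returns 1, handler called
widthWithWarning('x-mac-roman'); // returns 1, handler not called again
console.log(seen); // [ 'x-mac-roman' ]
```

Routing the warning through a replaceable handler keeps the permissive default from this PR while letting a consumer turn unknown encodings back into hard errors if they prefer.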

Note that throwing on an unknown encoding is a pattern copied from existing code in this same module, specifically StringT.byteLength():

restructure/src/String.js

Lines 150 to 151 in fcf7d64

default:
throw new Error('Unknown encoding ' + encoding);

So is there a latent issue in byteLength() that may bite future users? I assume it does not need to be addressed as urgently as this one.
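For comparison, a non-throwing byte-length fallback might look like the sketch below. It is not restructure's byteLength(): the name `byteLengthSketch` is hypothetical, and the default branch assumes a single-byte charset in which every character of the string is representable, so one UTF-16 code unit maps to one byte. That assumption does not hold for multi-byte legacy encodings such as Shift_JIS.

```javascript
// Hypothetical sketch, not restructure's byteLength().
// The default branch assumes a single-byte charset (e.g. x-mac-roman)
// in which each UTF-16 code unit of `string` encodes to exactly one byte.
function byteLengthSketch(string, encoding) {
  switch (encoding) {
    case 'utf8':
      // Encode and count: multi-byte characters are handled exactly.
      return new TextEncoder().encode(string).length;
    case 'utf16le':
    case 'utf16-le':
    case 'utf-16be':
    case 'utf-16le':
    case 'utf16be':
    case 'utf16-be':
    case 'ucs2':
      // Two bytes per UTF-16 code unit (surrogate pairs count as two units).
      return string.length * 2;
    default:
      // ascii, x-mac-roman and other single-byte charsets.
      return string.length;
  }
}

console.log(byteLengthSketch('äccented cháracters', 'x-mac-roman')); // 19
console.log(byteLengthSketch('äccented cháracters', 'utf8'));        // 21
```

The 19 matches the length of the x-mac-roman buffer in the test added by this PR; UTF-8 needs two extra bytes because `ä` and `á` each take two bytes there.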

Contributor Author

^ Before, the throw may only have been in encode() or byteLength(), which support a more restricted set of encodings.

Contributor Author

x-mac-roman is actually a known encoding as well; it's handled by the underlying encoder. So one option would be to expand the set of encodings here to be exhaustive.
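If the set were made exhaustive instead, it could be an explicit table that keeps throwing for anything unlisted. The sketch below is hypothetical (`ENCODING_WIDTHS` and `knownEncodingWidth` are not restructure names) and deliberately partial: besides the names from the diff, it lists the WHATWG Encoding Standard labels for Mac Roman (`macintosh`, `mac`, `csmacintosh`, `x-mac-roman`) and a few of the labels for windows-1252. Lookup trims whitespace and ignores case, roughly as WHATWG label matching does.

```javascript
// Hypothetical, partial table of encoding name -> bytes per unit.
const ENCODING_WIDTHS = new Map([
  // Two-byte names from the diff.
  ['utf16le', 2], ['utf16-le', 2], ['utf-16le', 2],
  ['utf16be', 2], ['utf16-be', 2], ['utf-16be', 2], ['ucs2', 2],
  // Byte-oriented names (assumed).
  ['ascii', 1], ['utf8', 1], ['utf-8', 1],
  // WHATWG labels for the "macintosh" (Mac Roman) encoding.
  ['macintosh', 1], ['mac', 1], ['csmacintosh', 1], ['x-mac-roman', 1],
  // A few WHATWG labels for windows-1252.
  ['windows-1252', 1], ['latin1', 1], ['iso-8859-1', 1],
]);

function knownEncodingWidth(label) {
  // Normalize roughly like WHATWG label matching: trim, then lower-case.
  const key = String(label).trim().toLowerCase();
  const width = ENCODING_WIDTHS.get(key);
  if (width === undefined) {
    // Unlisted encodings still fail loudly, as before this PR.
    throw new Error('Unknown encoding ' + label);
  }
  return width;
}

console.log(knownEncodingWidth('X-Mac-Roman')); // 1
try {
  knownEncodingWidth('klingon');
} catch (e) {
  console.log(e.message); // Unknown encoding klingon
}
```

The trade-off against this PR's catch-all `return 1;` is maintenance: every label the underlying decoder accepts has to be listed, or it throws again.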

Member

AFAIK this PR brings back the previous behavior, which is good enough. Let's keep the changes minimal unless there's a reproduction of an actual bug.

}
}
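To see why the width matters, here is a standalone sketch. It mirrors the encodingWidth() switch after this change; the cases ahead of the first `return 1;` are cut off in the diff, so `'ascii'` and `'utf8'` are assumptions. `terminatorOffset` is a hypothetical helper illustrating how a width could drive the search for the zero terminator of a null-length string (`new StringT(null, encoding)`); it is not restructure's decode loop.

```javascript
// Sketch mirroring the switch in the diff; 'ascii' and 'utf8' are assumed.
function encodingWidth(encoding) {
  switch (encoding) {
    case 'ascii':
    case 'utf8':
      return 1;
    case 'utf16le':
    case 'utf16-le':
    case 'utf-16be':
    case 'utf-16le':
    case 'utf16be':
    case 'utf16-be':
    case 'ucs2':
      return 2;
    default:
      // After this PR: unknown encodings such as 'x-mac-roman' are treated as 1-byte.
      return 1;
  }
}

// Hypothetical helper: offset of the first all-zero unit of `width` bytes
// at or after `start`, stepping one unit at a time; bytes.length if none.
function terminatorOffset(bytes, start, width) {
  for (let pos = start; pos + width <= bytes.length; pos += width) {
    let allZero = true;
    for (let i = 0; i < width; i++) {
      if (bytes[pos + i] !== 0) {
        allZero = false;
        break;
      }
    }
    if (allZero) return pos;
  }
  return bytes.length;
}

const macRoman = new Uint8Array([0x8a, 0x63, 0x00, 0x41]);
console.log(terminatorOffset(macRoman, 0, encodingWidth('x-mac-roman'))); // 2

const utf16 = new Uint8Array([0x41, 0x00, 0x42, 0x00, 0x00, 0x00]); // "AB\0" in UTF-16LE
console.log(terminatorOffset(utf16, 0, encodingWidth('utf16le'))); // 4
console.log(terminatorOffset(utf16, 0, 1)); // 1, a 1-byte scan stops inside 'A'
```

The last line shows the failure mode a wrong width would cause for UTF-16. For single-byte charsets like x-mac-roman, a width of 1 is the correct choice, which is why the permissive default is safe for them.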

6 changes: 6 additions & 0 deletions test/String.js
@@ -52,6 +52,12 @@ describe('String', function() {
const string = new StringT(null, 'utf16le');
assert.equal(string.fromBuffer(Buffer.from('🍻', 'utf16le')), '🍻');
});

it('should decode x-mac-roman', function() {
const string = new StringT(null, 'x-mac-roman');
const buf = new Uint8Array([0x8a, 0x63, 0x63, 0x65, 0x6e, 0x74, 0x65, 0x64, 0x20, 0x63, 0x68, 0x87, 0x72, 0x61, 0x63, 0x74, 0x65, 0x72, 0x73]);
assert.equal(string.fromBuffer(buf), 'äccented cháracters');
})
});

describe('size', function() {
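The expected string in the new x-mac-roman test can be checked by hand: in Mac Roman, bytes below 0x80 match ASCII, 0x8A is `ä` and 0x87 is `á`. The sketch below decodes the test buffer with a deliberately partial lookup table (only those two high bytes), so it does not depend on the runtime's decoder or ICU support; `decodeMacRomanSubset` is a hypothetical name.

```javascript
// Partial Mac Roman table: only the two high bytes used by the test.
const MAC_ROMAN_HIGH = new Map([
  [0x87, '\u00e1'], // á
  [0x8a, '\u00e4'], // ä
]);

// Hypothetical helper: decodes ASCII bytes plus the two mapped high bytes.
function decodeMacRomanSubset(bytes) {
  let out = '';
  for (const b of bytes) {
    if (b < 0x80) {
      out += String.fromCharCode(b); // identical to ASCII below 0x80
    } else if (MAC_ROMAN_HIGH.has(b)) {
      out += MAC_ROMAN_HIGH.get(b);
    } else {
      throw new Error('byte 0x' + b.toString(16) + ' is not in this partial table');
    }
  }
  return out;
}

// Same bytes as the test buffer.
const buf = new Uint8Array([0x8a, 0x63, 0x63, 0x65, 0x6e, 0x74, 0x65, 0x64, 0x20, 0x63,
  0x68, 0x87, 0x72, 0x61, 0x63, 0x74, 0x65, 0x72, 0x73]);
console.log(decodeMacRomanSubset(buf)); // äccented cháracters
```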