XALANJ-2725: Fix for when UTF16 surrogate pair crosses buffer boundaries #184

jkesselm · 2024-02-22T20:18:22Z

Fixes the specific buffer-crossing issue tested in the associated xalan-test branch.

As discussed in XALANJ-2725, some edge cases are still possible here. However, this fixes one known bad case and at least partially guards against another.
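
To illustrate the buffer-crossing case, here is a minimal sketch of one way to handle it: hold back a high surrogate that ends one characters() chunk, and decide what it is when the next chunk arrives. This is not the code merged in 77aa724. The names SurrogateCarryEscaper, flushPending and result are hypothetical, and the sketch assumes an ASCII-only output encoding, so every non-ASCII code point is written as a decimal NCR. flushPending() stands in for the characters()/other()/characters() case noted in the commits: if another event arrives while a high surrogate is pending, that surrogate can only be emitted as isolated.

```java
// Hypothetical sketch, not Xalan's actual serializer code. Assumes an
// ASCII-only output encoding, so every non-ASCII code point becomes a decimal
// numeric character reference. Markup escaping of '&' and '<' is omitted.
final class SurrogateCarryEscaper {
    private final StringBuilder out = new StringBuilder();
    // High surrogate that ended the previous chunk, or 0 when nothing is pending.
    private char pendingHigh = 0;

    /** Same shape as SAX ContentHandler.characters(char[], int, int). */
    void characters(char[] ch, int start, int length) {
        if (length == 0) {
            return; // keep any pending high surrogate for the next chunk
        }
        int i = start;
        final int end = start + length;
        if (pendingHigh != 0) {
            if (Character.isLowSurrogate(ch[i])) {
                // The pair straddled the buffer boundary: rejoin it.
                writeCodePoint(Character.toCodePoint(pendingHigh, ch[i]));
                i++;
            } else {
                writeCodePoint(pendingHigh); // genuinely isolated high surrogate
            }
            pendingHigh = 0;
        }
        while (i < end) {
            char c = ch[i];
            if (Character.isHighSurrogate(c) && i + 1 == end) {
                pendingHigh = c; // may pair with the first char of the next chunk
                return;
            }
            if (Character.isHighSurrogate(c) && Character.isLowSurrogate(ch[i + 1])) {
                writeCodePoint(Character.toCodePoint(c, ch[i + 1]));
                i += 2;
            } else {
                writeCodePoint(c); // BMP char or isolated surrogate
                i++;
            }
        }
    }

    /** Call before any non-characters() event, e.g. startElement or endDocument. */
    void flushPending() {
        if (pendingHigh != 0) {
            writeCodePoint(pendingHigh); // emitted as a fake NCR such as &#55308;
            pendingHigh = 0;
        }
    }

    String result() {
        flushPending();
        return out.toString();
    }

    private void writeCodePoint(int cp) {
        if (cp < 0x80) {
            out.append((char) cp);
        } else {
            out.append("&#").append(cp).append(';');
        }
    }
}
```

For example, passing "a" followed by the high surrogate of U+1F600 in one call, and the matching low surrogate followed by "b" in the next, yields a&#128512;b.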

My preferred fix would be to have malformed UTF-16 input throw exceptions, rather than working around it by outputting (unusable) Numeric Character References for isolated surrogates. However, the code is currently inconsistent on this point and seems to suggest that we moved away from throwing for some reason, and I don't recall why we thought the fake NCRs were a good idea.
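
The two options weighed above can be contrasted in a short sketch. It assumes the whole string is available at once, so it does not address the buffer-boundary problem. The names LoneSurrogateHandling and Policy, and the choice of IllegalArgumentException, are hypothetical rather than Xalan API. The NCR branch shows why those references are unusable: the XML 1.0 Legal Character constraint requires a character reference to match the Char production, which excludes the surrogate range #xD800-#xDFFF.

```java
import java.util.Locale;

// Hypothetical sketch contrasting two policies for isolated UTF-16 surrogates.
final class LoneSurrogateHandling {
    enum Policy { THROW, NUMERIC_CHAR_REF }

    static String escape(String s, Policy policy) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            // codePointAt returns the bare surrogate value when it is unpaired.
            int cp = s.codePointAt(i);
            boolean isolatedSurrogate =
                    cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE;
            if (isolatedSurrogate && policy == Policy.THROW) {
                throw new IllegalArgumentException(String.format(Locale.ROOT,
                        "Isolated UTF-16 surrogate 0x%04X at index %d", cp, i));
            }
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                // For an isolated surrogate this produces the fake NCR, e.g. &#55308;,
                // which is not a legal XML character reference.
                out.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```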

If we stay with fake NCRs for isolated surrogates, I'm seriously considering changing them to fake entity references, which will at least not be syntactically incorrect. This could be done by replacing the current output, e.g. &#55308;, with something more like &ERR_INVALID_UTF16_SURROGATE_55308;, using the MsgKey string so that we at least stay in sync with the internationalization layer for clarity.
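
A sketch of the proposed fake entity reference next to the current NCR form might look like this. The class and method names are hypothetical, and ERR_INVALID_UTF16_SURROGATE is taken from the example text rather than confirmed as an existing MsgKey constant. The result matches the XML EntityRef production ('&' Name ';'), since the name starts with a letter and contains only letters, digits and underscores. A parser will still typically report it as an undeclared entity unless a declaration is supplied.

```java
// Hypothetical sketch of the proposed fake-entity-reference output.
final class FakeEntityRef {
    // Prefix from the example in the description, not a confirmed MsgKey constant.
    static final String KEY = "ERR_INVALID_UTF16_SURROGATE";

    /** Builds e.g. "&ERR_INVALID_UTF16_SURROGATE_55308;" for an isolated surrogate. */
    static String forIsolatedSurrogate(String msgKey, char surrogate) {
        if (!Character.isSurrogate(surrogate)) {
            throw new IllegalArgumentException("not a surrogate: " + (int) surrogate);
        }
        // Decimal code unit value, matching the numbering of the current NCR output.
        return "&" + msgKey + "_" + (int) surrogate + ";";
    }

    /** Current behaviour for comparison: a numeric character reference. */
    static String asNumericCharRef(char surrogate) {
        return "&#" + (int) surrogate + ";";
    }
}
```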

kubycsolutions added 4 commits February 2, 2024 14:02

just documentation/parameter names

162e1f0

refactoring

856e896

This one's working for the test added in 2725. May not be cleanest so…

ec7f0e2

…lution, and I'm not sure whether any of the other surrogate handling needs similar fixes -- I don't know whether they ever run into the buffer break problem.

Document the characters()other()characters() issue if first char buff…

dfb7277

…er ended in a high surrogate.

jkesselm self-assigned this Feb 22, 2024

jkesselm requested review from garydgregory and mukulga February 22, 2024 20:28

jkesselm merged commit 77aa724 into master Feb 23, 2024
2 checks passed

jkesselm deleted the XALANJ-2725 branch February 23, 2024 18:13
