JsonLocation consistently off by one character for many invalid JSON parsing cases #1173
Comments
Sounds like a flaw indeed. This is likely due to the code not compensating for the already-read (invalid) character.
One thing that would be useful as a first step would be the addition of (failing) test cases.
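To make the suspected cause concrete, here is a minimal sketch of how a scanner that advances its pointer while reading ends up reporting a location one past the offending character. This is a toy illustration, not Jackson's parser code; the class `OffByOneSketch` and the method `reportedOffset` are hypothetical names invented for this example.

```java
// Toy illustration only: this is NOT Jackson's parser code.
// OffByOneSketch and reportedOffset are hypothetical names.
final class OffByOneSketch {

    /**
     * Scans for the first non-digit character and returns the offset that
     * would be reported for it, or -1 if every character is a digit.
     */
    static int reportedOffset(String input, boolean compensate) {
        int ptr = 0;
        while (ptr < input.length()) {
            char c = input.charAt(ptr++); // reading advances the pointer past c
            if (c < '0' || c > '9') {
                // ptr now points one past the offending character, so the
                // location must be adjusted back by one to point at it.
                return compensate ? ptr - 1 : ptr;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String input = "12x4"; // 'x' sits at offset 2
        System.out.println("uncompensated: " + reportedOffset(input, false));
        System.out.println("compensated:   " + reportedOffset(input, true));
    }
}
```

Without the adjustment the reported offset is the position after the invalid character, which matches the symptom described in this issue.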
I've taken a stab at adding some tests in #1175.
@hal7df Thank you for the tests! One thing I forgot to ask earlier: if you haven't yet done so (I don't think you have, but if you did, just LMK), we'd need a CLA: https://github.com/FasterXML/jackson/blob/master/contributor-agreement.pdf which is needed for the first contribution (but is good for all future contributions, so a one-time chore). If you could send it whenever you have a chance, that'd be great. Apologies for forgetting to mention this earlier; looking forward to all the contributions!
@cowtowncoder I did notice the mention of the CLA in the contributing doc in the main Jackson repo, but it appeared to make an exception for test code changes. If you'd like one nonetheless, I can probably get one to you sometime after the holidays (my employer is involved, so I'd have to run it up the chain when everyone gets back).
@hal7df Yeah, you are right that the CLA is really needed for code that we ship, and test code isn't covered. In the meantime I've been able to fix many cases, and have realized that there's one trickier case (that of unrecognized tokens).
Fixed most cases; 3 fail in a way that is a bit trickier to fix and may need refactoring of decoding -- those I'll tackle for 2.17. But the bulk have been resolved for the 2.16 branch, to be included in the eventual 2.16.2.
@hal7df Ok. So, it turns out that about 2/3 are simple cases where the last character just needs to be "unread" (offset adjusted) and things will work. 3 cases are a bit more involved since they read an invalid token; and while in theory the location could be constructed before the first character is read, that'd be wasteful for the common case of not needing the location. So I will preferably try to adjust the location after the fact. This, however, may be challenging in the case of buffer boundaries etc. But one failing case cannot be solved per se and instead needs to be documented (if not already done): the one showing a non-ASCII Unicode character and the column number for byte-based input.
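As a rough illustration of why the non-ASCII case can only be documented rather than fixed, the sketch below compares the character offset of an invalid character with the number of UTF-8 bytes that precede it. It assumes that a byte-based source counts positions in bytes; that is an assumption made for this example, not a statement about Jackson internals. The class `ByteVsCharColumnSketch` and the method `utf8ByteOffset` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Toy illustration only; ByteVsCharColumnSketch and utf8ByteOffset are
// hypothetical names, and "positions counted in bytes" is an assumption.
final class ByteVsCharColumnSketch {

    /** Number of UTF-8 bytes that precede the character at charIndex. */
    static int utf8ByteOffset(String text, int charIndex) {
        return text.substring(0, charIndex)
                   .getBytes(StandardCharsets.UTF_8)
                   .length;
    }

    public static void main(String[] args) {
        // Invalid JSON: the bare 'x' is not a valid value.
        // \u00e9 (e with acute accent) takes two bytes in UTF-8.
        String json = "{\"\u00e9\":x}";
        int charOffset = json.indexOf('x');
        int byteOffset = utf8ByteOffset(json, charOffset);
        System.out.println("char offset: " + charOffset);
        System.out.println("byte offset: " + byteOffset);
    }
}
```

Once a multi-byte character appears earlier on the line, the two offsets diverge, so a column derived from bytes cannot match one derived from characters without decoding the input.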
Ugggh. The fix here breaks something else -- the ability to continue parsing in some cases with error recovery (single-character syntax problems). Only caught by
Had to revert the initial fix; this also means the eventual fix will only go in 2.17, not earlier. Added a new test against regression wrt error recovery. Hoping to fix in 2.17 very soon now.
Issue
The JsonLocation attached to a JsonProcessingException thrown when parsing an invalid JSON string is consistently one character to the right of the invalid character, except in cases where the error is due to an unexpected EOF. This affects both JsonLocation.getCharOffset() and JsonLocation.getColumnNr() (presumably JsonLocation.getByteOffset() as well, although I haven't tested that explicitly); because this information is used to construct the exception message, that is affected as well.

I first noticed this on Jackson 2.10.0, but the issue persists on Jackson 2.16.0.
Minimally reproducible example
Expected output
Actual output
(The unexpected end-of-input case here is correct IMO, just including it to be thorough)