Incorrect handling of unicode ANSI skipping #30

ztravis · 2024-04-05T18:58:55Z

The current handling of Unicode ANSI replacement (fallback) sequences is slightly wrong. According to the RTF 1.5 spec (https://www.biblioscape.com/rtf15_spec.htm):

A scope delimiter (i.e. "{" or "}") should end the current skippable data
Control words or symbols should be considered a single skipped character (and in my testing with MS Office, they're ignored)
Any binary data is also considered a single skipped character
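
To make the three rules above concrete, here is a minimal sketch in Java. It is not rtfparserkit's actual code: the class `UnicodeSkipSketch` and the method `skipFallback` are hypothetical names, and it scans a `String` on the assumption that each `char` stands for one byte of the RTF stream. It also assumes raw CR/LF are ignored without counting towards the skip.

```java
// Hypothetical sketch, not rtfparserkit's implementation.
final class UnicodeSkipSketch {

    /**
     * Returns the index just past the fallback data that follows a \\uN keyword.
     * pos is the first index after \\uN (and its delimiter); skipCount is the
     * current \\ucN value.
     */
    static int skipFallback(String rtf, int pos, int skipCount) {
        int remaining = skipCount;
        int len = rtf.length();
        while (remaining > 0 && pos < len) {
            char c = rtf.charAt(pos);
            if (c == '{' || c == '}') {
                break; // rule 1: a scope delimiter ends the skippable data
            }
            if (c == '\r' || c == '\n') {
                pos++; // assumption: raw line breaks are not counted
                continue;
            }
            if (c == '\\' && pos + 1 < len) {
                pos = skipControl(rtf, pos); // rules 2 and 3
            } else {
                pos++; // an ordinary character
            }
            remaining--; // each of the above counts as one skipped character
        }
        return pos;
    }

    /** Skips one control word, control symbol, hex escape, or \\binN block. */
    private static int skipControl(String rtf, int pos) {
        int len = rtf.length();
        int i = pos + 1;
        char first = rtf.charAt(i);
        if (first == '\'') {
            return Math.min(pos + 4, len); // \'hh hex escape
        }
        if (!isAsciiLetter(first)) {
            return pos + 2; // control symbol such as \~ or \{
        }
        int start = i;
        while (i < len && isAsciiLetter(rtf.charAt(i))) {
            i++;
        }
        String word = rtf.substring(start, i);
        boolean negative = false;
        if (i + 1 < len && rtf.charAt(i) == '-' && isDigit(rtf.charAt(i + 1))) {
            negative = true;
            i++;
        }
        boolean hasParam = false;
        long param = 0;
        while (i < len && isDigit(rtf.charAt(i))) {
            hasParam = true;
            param = Math.min(param * 10 + (rtf.charAt(i) - '0'), Integer.MAX_VALUE);
            i++;
        }
        if (i < len && rtf.charAt(i) == ' ') {
            i++; // a single space is the keyword delimiter
        }
        if (word.equals("bin") && hasParam && !negative) {
            // rule 3: the binary payload is skipped together with its keyword
            i = (int) Math.min((long) i + param, len);
        }
        return i;
    }

    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }
}
```

For example, with a skip count of 3, `skipFallback("ab}c", 0, 3)` stops at the `}` after skipping only two characters, which is the case the first rule describes. A real parser would work on the byte stream instead of a `String`, so the `\bin` handling here is only illustrative.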

I plan on opening a PR for these, just wanted to open an issue first in case that takes a while.

Gurushesh-Metapercept · 2024-07-29T11:47:32Z

@ztravis, could you please tell me how to run rtfparserkit? I have created a Maven project and added rtfparserkit as a dependency, but I'm still encountering errors. Have you run it successfully? If so, could you share any guidelines or solutions?
