UTF-8 Normalization #351

SamuelYvon · 2024-01-12T15:34:01Z

Hey!

Can we assume the input is UTF-8 normalized? Many SIMD-powered implementations, or implementations that do not rely on building Strings right away, will assume the input is UTF-8 normalized. Is this a valid assumption?

sharpobject · 2024-01-15T00:09:54Z

Yes, you can assume that byte-wise different strings are different.

Because normalization is about equivalence of sequences of code points, and because the reference solution does not contain any code to normalize the city names, I think that the reference solution treats equivalent names represented by different sequences of code points as unequal.
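
To illustrate the point above, here is a minimal Python sketch of how one visually identical name can have two different byte sequences, and how a byte-keyed map counts them separately. The city name "Zürich" and the `counts` dictionary are assumptions chosen for illustration; they are not taken from the reference solution or its input data.

```python
import unicodedata

# Illustrative name only: "Zürich" with a precomposed u-umlaut (U+00FC).
name = "Z\u00fcrich"

# NFC keeps the single precomposed code point; NFD splits it into
# "u" followed by the combining diaeresis (U+0308).
nfc = unicodedata.normalize("NFC", name)
nfd = unicodedata.normalize("NFD", name)

nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")

# A map keyed on raw bytes, with no normalization step, sees two keys.
counts = {}
for key in (nfc_bytes, nfd_bytes):
    counts[key] = counts.get(key, 0) + 1

print(len(nfc_bytes), len(nfd_bytes))  # 7 8
print(len(counts))                     # 2
```

Both forms render the same way, yet they hash and compare as different keys, which matches treating byte-wise different strings as different.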
