Regex does not match isolated combining mark as whitespace if preceded by whitespace #724

Open
digitalheir opened this issue Feb 24, 2024 · 1 comment
Labels
bug Something isn't working

Comments

digitalheir commented Feb 24, 2024

Description

I believe regexes should operate on Unicode scalars, not on Swift Characters (grapheme clusters). This is a failure mode: <space> + <combining mark> (such as " \u{0303}") is treated as a single whitespace character, whereas every other programming language I know of treats it as one whitespace character followed by a separate non-spacing combining character.

Reproduction

let aTilde = "a\u{0303}" // U+0061 'a' + U+0303 combining tilde
let aMatch = try! /\S/.firstMatch(in: aTilde)
print(aMatch?.output) // "ã" hm... I would have expected only the scalar 'a' to match
let combiningTilde = "\u{0303}" // U+0303 on its own
let tildeMatch = try! /\S/.firstMatch(in: combiningTilde)
print(tildeMatch?.output) // "̃" correct to me
let spaceWithTilde = " \u{0303}" // U+0020 space + U+0303 combining tilde
let spaceTildeMatch = try! /\S/.firstMatch(in: spaceWithTilde)
print(spaceTildeMatch?.output) // nil, but I would expect \u{0303} to match

Expected behavior

The combining tilde scalar (U+0303) was expected to match \S, since the Unicode specification classifies it as a non-spacing mark (General_Category Mn), not as a whitespace code point (White_Space).
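
The difference between the two views can be seen without any regex. The sketch below is only an illustration and not part of the original report. It prints how String groups the two scalars and what each scalar's Unicode properties are, then tries `matchingSemantics(.unicodeScalar)` as a possible way to opt into scalar-level matching. Whether that option yields the expected match on a given toolchain is an assumption that has not been verified here, so no output is claimed for the last step. The extended `#/.../#` delimiter is used so the snippet does not depend on bare-slash literals being enabled.

```swift
let spaceWithTilde = " \u{0303}" // U+0020 SPACE + U+0303 COMBINING TILDE

// String's default view merges both scalars into one Character (grapheme cluster).
print(spaceWithTilde.count)                // 1
print(spaceWithTilde.unicodeScalars.count) // 2

// Per-scalar Unicode properties, independent of grapheme clustering.
for scalar in spaceWithTilde.unicodeScalars {
    let isWhitespace = scalar.properties.isWhitespace
    let isNonspacingMark = scalar.properties.generalCategory == .nonspacingMark
    print(scalar.value, isWhitespace, isNonspacingMark)
}
// 32 true false
// 771 false true

// Possible workaround (unverified assumption): ask the regex to match
// Unicode scalars instead of grapheme clusters.
let scalarRegex = #/\S/#.matchingSemantics(.unicodeScalar)
if let match = try? scalarRegex.firstMatch(in: spaceWithTilde) {
    print("matched scalars:", match.output.unicodeScalars.map { $0.value })
} else {
    print("no match")
}
```

If the scalar-level option behaves as hoped, the remaining question in this issue is only whether grapheme-level matching should be the default for \s and \S.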

Environment

Swift 5.9

Additional information

No response

@digitalheir digitalheir added the bug Something isn't working label Feb 24, 2024
@AnthonyLatsis

cc @hamishknight

@hamishknight hamishknight transferred this issue from swiftlang/swift Feb 28, 2024