Non-ASCII space characters are trimmed (IRI) #56
Comments
Looking at https://url.spec.whatwg.org/ it says that a URL parser should "Remove any leading and trailing C0 control or space from input." There's no reference to non-ASCII space characters. The same spec says "Standardize on the term URL. URI and IRI are just confusing." I'm learning a lot today. :) So, I'm not sure what the significance of non-ASCII spaces in an IRI is. Is it correct to say that they're allowed in URLs?
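To make the quoted rule concrete, here is a minimal Python sketch of what "leading and trailing C0 control or space" covers, compared with a Unicode-aware trim. It is only an illustration of the WHATWG wording, not code from this project; the helper name `whatwg_trim` and the example URL are made up for the demonstration.

```python
# WHATWG URL defines "C0 control or space" as U+0000 through U+0020 inclusive.
C0_CONTROL_OR_SPACE = "".join(chr(c) for c in range(0x21))


def whatwg_trim(url_input: str) -> str:
    """Hypothetical helper: strip only C0 controls and U+0020 from both ends."""
    return url_input.strip(C0_CONTROL_OR_SPACE)


# U+00A0 (NO-BREAK SPACE) at both ends, plus an ASCII space and a tab at the end.
padded = "\u00a0http://example.com/\u00a0 \t"

# Only the trailing ASCII space and tab are removed; both U+00A0 survive.
whatwg_result = whatwg_trim(padded)

# Python's default str.strip() uses the Unicode whitespace definition,
# so it removes the U+00A0 characters as well.
unicode_result = padded.strip()

print(repr(whatwg_result))
print(repr(unicode_result))
```

Under this reading of the spec, a non-ASCII space at the edge of the input is left alone by the trimming step, whereas a trim based on Unicode whitespace removes it.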
That requires some investigation and rethinking, as in some places it talks about the 'real space' character, or the single byte representing it. It also references tabs specifically. And I did look at the algorithm describing the state machine... It is a complicated thing. The same applies to #61.
Thanks for your thoughts @vanHoesel. I was hoping you'd contribute to the conversation. :)
Trailing non-ASCII space characters in a URI are trimmed when the URI is converted to IRI form with as_iri() and the result is then passed to new().
I am not sure whether the problem is that new() trims trailing spaces or that as_iri() unescapes non-ASCII space characters. (Maybe the latter?)
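The round trip described here can be modeled with a rough Python sketch. This is not the module's code: `unquote` stands in for as_iri() (it decodes every escape, whereas an IRI conversion would only unescape non-ASCII characters), and `str.strip()` stands in for a constructor that trims Unicode whitespace, which is an assumption about new() rather than a confirmed fact. The example URL is invented.

```python
from urllib.parse import quote, unquote

# A URI whose path ends in an escaped NO-BREAK SPACE (U+00A0 is C2 A0 in UTF-8).
escaped = "http://example.com/path%C2%A0"

# Stand-in for as_iri(): the escape becomes a literal U+00A0 character.
iri_like = unquote(escaped)

# Stand-in for new(), ASSUMING it trims with a Unicode-aware whitespace rule.
trimmed = iri_like.strip()

# Re-escape to compare against the original URI form.
round_tripped = quote(trimmed, safe=":/")

print(repr(iri_like))
print(round_tripped)
```

In this model neither step loses data on its own: the escaped form contains no literal whitespace to trim, and unescaping alone keeps the character. The trailing character disappears only when the unescaped string is trimmed with a rule that treats U+00A0 as whitespace.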