
Non-ASCII space characters are trimmed (IRI) #56

Open
ranvis opened this issue Nov 28, 2018 · 3 comments


@ranvis

ranvis commented Nov 28, 2018

Trailing non-ASCII space characters in a URI are trimmed when the URI is converted to IRI form with as_iri() and the result is passed back to new().

```perl
use strict;
use feature 'say';
use URI;

say length URI->new(URI->new('%20')->as_iri); # 3
say length URI->new(URI->new('%09')->as_iri); # 3
say length URI->new(URI->new('%0B')->as_iri); # 3
say length URI->new(URI->new('%E3%80%80')->as_iri); # 0
say length URI->new(URI->new('%E2%81%9F')->as_iri); # 0
```

I am not sure whether the problem is that new() trims trailing whitespace or that as_iri() unescapes non-ASCII space characters (maybe the latter?).
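One way to tell the two readings apart: on a decoded character string, Perl's `\s` follows Unicode rules, so it also matches U+3000 and U+205F, whereas the `/a` modifier (Perl 5.14+) restricts `\s` to ASCII whitespace. The sketch below assumes, without confirming it against the URI.pm source, that `new()` strips surrounding whitespace with a `\s`-based substitution. `trim_unicode` and `trim_ascii` are hypothetical helpers, not part of the URI module.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper (not part of URI): Unicode-aware trim.
# On a character string, \s also matches non-ASCII spaces
# such as U+3000 and U+205F.
sub trim_unicode {
    my ($s) = @_;
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    return $s;
}

# Hypothetical helper (not part of URI): ASCII-only trim.
# The /a modifier limits \s to ASCII whitespace
# (space, \t, \n, \f, \r and, since Perl 5.18, \x0B).
sub trim_ascii {
    my ($s) = @_;
    $s =~ s/^\s+//a;
    $s =~ s/\s+$//a;
    return $s;
}

# U+3000 IDEOGRAPHIC SPACE, the character encoded by %E3%80%80
my $iri = "\x{3000}";
say length trim_unicode($iri);   # 0: the space is removed
say length trim_ascii($iri);     # 1: the space is kept
```

If the assumption holds, as_iri() turning %E3%80%80 into a literal U+3000 is what exposes the character to the trim, and an ASCII-restricted trim would leave it in place.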

@oalders
Member

oalders commented Feb 5, 2019

Looking at https://url.spec.whatwg.org/, it says that a URL parser should "Remove any leading and trailing C0 control or space from input." There's no reference to non-ASCII space characters.
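In that spec, a C0 control is a code point in the range U+0000 to U+001F, and "C0 control or space" adds U+0020 SPACE, so the trim only ever touches ASCII. A minimal sketch of that single step (not the full parser), using a hypothetical helper name:

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper sketching one WHATWG URL parser step:
# "Remove any leading and trailing C0 control or space from input."
# C0 control or space = U+0000..U+001F plus U+0020 SPACE.
sub strip_c0_control_or_space {
    my ($input) = @_;
    $input =~ s/\A[\x00-\x20]+//;   # leading
    $input =~ s/[\x00-\x20]+\z//;   # trailing (\z, so a final newline is not special-cased)
    return $input;
}

say length strip_c0_control_or_space(" \t\x0Bpath\x00");   # 4: ASCII controls and space stripped
say length strip_c0_control_or_space("path\x{3000}");       # 5: U+3000 is not in the set
say length strip_c0_control_or_space("path\x{205F}");       # 5: nor is U+205F
```

Under this rule, U+3000 and U+205F survive the trim, unlike what the report above observes for URI->new().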

The same spec says "Standardize on the term URL. URI and IRI are just confusing." I'm learning a lot today. :)

So, I'm not sure what the significance of non-ASCII spaces in an IRI is. Is it correct to say that they're allowed in URLs?

@vanHoesel
Member

That requires some investigation and rethinking: in some places the spec talks about the 'real space' character, or the single byte representing it, and it also references tabs specifically. I did look at the algorithm describing the state machine... it is complicated.

The same applies to #61.
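The tab reference most likely corresponds to the WHATWG parser's next preprocessing step, "Remove all ASCII tab or newline from input", where ASCII tab or newline means U+0009, U+000A and U+000D. Unlike the leading/trailing trim, this step applies anywhere in the input. A minimal sketch, with a hypothetical helper name:

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper sketching the WHATWG step
# "Remove all ASCII tab or newline from input."
# ASCII tab or newline = U+0009 TAB, U+000A LF, U+000D CR.
# Deletes them everywhere in the string, not only at the ends.
sub remove_tab_or_newline {
    my ($input) = @_;
    $input =~ tr/\x09\x0A\x0D//d;
    return $input;
}

say remove_tab_or_newline("ht\ttp://exa\nmple.com/\r");   # http://example.com/

# U+000B (vertical tab) is not in this set, so this step keeps it.
say length remove_tab_or_newline("a\x0Bb");               # 3
```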

@oalders
Member

oalders commented Feb 6, 2019

Thanks for your thoughts @vanHoesel. I was hoping you'd contribute to the conversation. :)
