Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URI 5.19 breaks Catalyst-Controller-DBIC-API #133

Open
eserte opened this issue Dec 27, 2023 · 14 comments
Open

URI 5.19 breaks Catalyst-Controller-DBIC-API #133

eserte opened this issue Dec 27, 2023 · 14 comments

Comments

@eserte
Copy link
Contributor

eserte commented Dec 27, 2023

The problem is described in https://rt.cpan.org/Ticket/Display.html?id=150855
It seems that a change in URI.pm is causing the problem. Before (until URI 5.18):

$ perl -MHTTP::Request::Common=POST -e 'warn((POST "http://localhost/something", { foo => undef })->dump)' 
POST http://localhost/something
Content-Length: 4
Content-Type: application/x-www-form-urlencoded

foo=

After (with URI 5.19):

$ perl -MHTTP::Request::Common=POST -e 'warn((POST "http://localhost/something", { foo => undef })->dump)' 
POST http://localhost/something
Content-Length: 3
Content-Type: application/x-www-form-urlencoded

foo

Note the mising equals sign.

@eserte
Copy link
Contributor Author

eserte commented Dec 28, 2023

It also seems that t/13_req.t in https://metacpan.org/release/YTURTLE/Net-Azure-EventHubs-0.09 also started to fail because of this change. See http://matrix.cpantesters.org/?dist=Net-Azure-EventHubs%200.09

@eserte
Copy link
Contributor Author

eserte commented Dec 29, 2023

Web-Request-0.11 seems to fail with URI >= 5.19, so possibly the same problem.
See also doy/web-request#4

@karenetheridge
Copy link
Member

This commit looks likely: 9ee8098

The new code looks more correct to me though, IMHO.

@eserte
Copy link
Contributor Author

eserte commented Dec 30, 2023

rfc3986 does not tell much about the format of query strings. It mentions that it can include key=value pairs, but even does not tell what separator characters (; or & or something else?) should be used. Especially it does not tell anything about value-less keys. Is there any other standard which can be followed?

@karenetheridge
Copy link
Member

I think the closest we have is https://url.spec.whatwg.org/#application/x-www-form-urlencoded, which seeks to obsolete parts of RFC3986. It speaks only of "null byte sequences", which could be interpreted as either undef or '' -- we should probably treat these equivalently when serializing, and when deserializing either could be correct (so it would be incorrect to assert one over the other in a test).

@eserte
Copy link
Contributor Author

eserte commented Dec 31, 2023

+1 for treating "" and undef equivalently. This could mean that the following should print the same:

$ perl5.39.5 -MURI -E '$u=URI->new("http:"); $u->query_form(foo => ""); say $u->query'
foo=
$ perl5.39.5 -MURI -E '$u=URI->new("http:"); $u->query_form(foo => undef); say $u->query'
foo

However I learned now that "foo" should be treated the same as "foo=" when parsing x-www-form-urlencoded content, so there's an additional problem in some parsing code...

@eserte
Copy link
Contributor Author

eserte commented Dec 31, 2023

For example, CGI.pm may do strange things if the equals sign is missing:

$ echo -n "foo=" | env REQUEST_METHOD=POST CONTENT_TYPE=application/x-www-form-urlencoded CONTENT_LENGTH=4 perl5.39.5 -MCGI -MData::Dumper -E 'say Dumper({CGI->new->Vars})'
$VAR1 = {
          'foo' => ''
        };
$ echo -n "foo" | env REQUEST_METHOD=POST CONTENT_TYPE=application/x-www-form-urlencoded CONTENT_LENGTH=3 perl5.39.5 -MCGI -MData::Dumper -E 'say Dumper({CGI->new->Vars})'
$VAR1 = {
          'keywords' => 'foo'
        };

But if there's another parameter, then it looks sane again:

$ echo -n "bar=baz&foo=" | env REQUEST_METHOD=POST CONTENT_TYPE=application/x-www-form-urlencoded CONTENT_LENGTH=12 perl5.39.5 -MCGI -MData::Dumper -E 'say Dumper({CGI->new->Vars})'
$VAR1 = {
          'bar' => 'baz',
          'foo' => ''
        };
$ echo -n "bar=baz&foo" | env REQUEST_METHOD=POST CONTENT_TYPE=application/x-www-form-urlencoded CONTENT_LENGTH=11 perl5.39.5 -MCGI -MData::Dumper -E 'say Dumper({CGI->new->Vars})'
$VAR1 = {
          'bar' => 'baz',
          'foo' => ''
        };

@vanHoesel
Copy link
Member

vanHoesel commented Dec 31, 2023

I have been following this conversation a bit, and too be honest, it gets me slightly worried.

Surely, in the old age off HTTP 1.0, and HTML 3.5 and JavaScript 1.2 when I joined the band, those POST request with x-www-form-urlencoded, it was not possible to make the difference between an empty textbox, or a texttbox that was not filled in, and we only had empty strings. Therefor, it made sense to treat them equally and passing it as foo= was the right thing to do... as an empty string.

But nowadays, we have api-clients that are talking to api-back-ends and potentially need to be able to send undef, or JSON's null, or nil 'values' instead of 'empty string'. As I got accustomed to, is that foo on itself was meant to be handled as undef, and foo= as an empty string.

Whatever the code would do downstream is up to the application developer.

But it does worry, that we might indeed break downstream code. If we do, we might be better off rolling back or prevent breakage... for now. And suggest to see what else is breaking – I think our amazing smoke-testers already do an amazing job – and work with the downstream developers to prepare a fix that will work the moment we say that foo on itself means undef

Oh... and happy new-year :-)

@abraxxa
Copy link

abraxxa commented Feb 21, 2024

How we proceed in fixing this?

@haarg
Copy link
Member

haarg commented Feb 21, 2024

It should be trivial to update Catalyst-Controller-DBIC-API to fix its tests to account for this change.

@abraxxa
Copy link

abraxxa commented Feb 21, 2024

As the change in URI affects other dists too I was hoping it is reverted here.

@abraxxa
Copy link

abraxxa commented Feb 21, 2024

I've changed the test to pass an empty string instead of undef so the same URI as before is generated and tested.

@haarg
Copy link
Member

haarg commented Feb 21, 2024

A few data points:

CGI.pm:

  • foo+bar parses as { keywords => [ 'foo', 'bar' ] }
  • foo&bar=baz parses as { foo => '', bar => 'baz' }
  • { foo => undef, bar => 'baz' } encodes as bar=baz

HTTP::Body (used by Catalyst):

  • foo+bar parses as {}
  • foo&bar=baz parses as { bar => 'baz' }

URL::Encode:

  • foo&bar=baz parses as { foo => undef, bar => "baz" }

WWW::Form::UrlEncoded:

  • foo&bar=baz parses as { foo => '', bar => 'baz' }
  • { foo => undef, bar => 'baz' } encodes as foo=&bar=baz

URI 5.18:

  • foo&bar=baz parses as { foo => '', bar => 'baz' }
  • { foo => undef, bar => 'baz' } encodes as foo=&bar=baz

URI 5.19+:

  • foo&bar=baz parses as { foo => undef, bar => 'baz' }
  • { foo => undef, bar => 'baz' } encodes as foo&bar=baz

Based on the URL spec, decoding a parameter without an equals sign should give a value of an empty string. An "empty byte sequence" is an empty string. It is also used to describe a value decoded from key=. So the new URI behavior for parsing is definitely wrong.

As for encoding, the spec says "Assert: tuple’s name and tuple’s value are scalar value strings". So encoding anything other than a string is undefined behavior. Based on this, I would say that the new URI behavior for encoding is also wrong.

HTTP::Body also apparently has a bug related to this.

@karenetheridge
Copy link
Member

karenetheridge commented Feb 22, 2024

Mojolicious:

  • foo&bar=baz parses as { foo => '', bar => 'baz' }
  • foo=&bar=baz parses as { foo => '', bar => 'baz' }
  • { foo => '', bar => 'baz' } encodes as foo=&bar=baz
  • { foo => undef, bar => 'baz' } encodes as bar=baz

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants