Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #2212 Enhances validation of HTTP header names #2213

Closed
wants to merge 8 commits into from

Conversation

carryel
Copy link

@carryel carryel commented Sep 14, 2024

This is a PR for Issue #2212.

The patch contains two contents.

  1. If the HTTP header name contains \r\n and it is an incomplete packet, ignore it and proceed with parsing. At this time, I was wondering whether to allow not only \r\n but also single \n, so for now, I decided to allow only \r\n.
  2. Since there is an existing validation utility that validates cookie headers(https://github.com/eclipse-ee4j/grizzly/blob/master/modules/http/src/main/java/org/glassfish/grizzly/http/util/CookieHeaderParser.java), I tried using the existing validation instead of creating a new one.

Overall, I patched it so that there would be no major changes while maintaining the existing logic, and added related test cases.

@carryel carryel requested a review from arjantijms September 14, 2024 14:25
@breakponchito
Copy link

breakponchito commented Oct 24, 2024

@carryel Do you know if the validation should need to be applied to the heder value also? I tested to include the invalid characters on the value side and those are processed, example:

  • curl -i -v -H "X-MyHeader: \nhello" myendpoint
  • curl -i -v -H "X-MyHeader: \rhello" myendpoint
  • curl -i -v -H "X-MyHeader: \0hello" myendpoint

cases that currently are prevented:

  • curl -i -v -H "X-MyHeader\n: hello" myendpoint
  • curl -i -v -H "X-MyHeader\r: hello" myendpoint
  • curl -i -v -H "X-MyHeader\0: hello" myendpoint
  • curl -i -v -H "X-MyHeader\x00: hello" myendpoint
  • curl -i -v -H "X-MyHeader\x0A: hello" myendpoint
  • curl -i -v -H "X-MyHeader\x0D: hello" myendpoint

in all cases adding the header with a value containing the invalid chars the request is processed

@carryel
Copy link
Author

carryel commented Oct 25, 2024

@breakponchito As you know, the validation of header name and header value are different. The header value should be slightly less strict than the header name.

It is true that the header name requires token validation, and the header value requires field-content-related validation.

However, I only did the minimum patch because I was worried about backward compatibility and possible side effects.

There are so many related RFCs that there are slight differences, but if I extract a few, it is roughly as follows.

field-name = token

field-value = *field-content
field-content = field-vchar
[ 1*( SP / HTAB / field-vchar ) field-vchar ]
field-vchar = VCHAR / obs-text

+ Added the validation of HTTP header values
+ Name and value's validation are provided as options. (Improved based on #1 @breakponchito)
@breakponchito
Copy link

some test fail, it seems something regarding network connection, @carryel Can you try to run again the tests?

+ trivial) updated license and re-trigger status checks(eclipse-ee4j#2213).
@carryel
Copy link
Author

carryel commented Nov 19, 2024

@breakponchito It passed all checks except for the reviews. :( Thanks for notification. :)

@breakponchito
Copy link

I tried to run a manual test with an application and with the new grizzly property enabled to validate header value but no success, the request resolve the header and continue to get the resource:
request:
curl -i -v -H "X-MyHeader: he\rllo" http://localhost:8080/PayaraServletProject-1.0-SNAPSHOT/PayaraServletProject

response:
`

  • Mark bundle as not supporting multiuse
    < HTTP/1.1 200 OK
    HTTP/1.1 200 OK
    < Server: Payara Server 6.2024.11 #badassfish
    Server: Payara Server 6.2024.11 #badassfish
    < X-Powered-By: Servlet/6.0 JSP/3.1 (Payara Server 6.2024.11 #badassfish Java/Azul Systems, Inc./11)
    X-Powered-By: Servlet/6.0 JSP/3.1 (Payara Server 6.2024.11 #badassfish Java/Azul Systems, Inc./11)
    < Content-Type: text/json;charset=ISO-8859-1
    Content-Type: text/json;charset=ISO-8859-1
    < Content-Length: 233
    Content-Length: 233
    < X-Frame-Options: SAMEORIGIN
    X-Frame-Options: SAMEORIGIN
    `

probably I'm missing something to reject the response

@carryel
Copy link
Author

carryel commented Nov 19, 2024

@breakponchito The examples below may help you analyze your testcases.

> curl -i -v -H "X-MyHeader: he\rllo" http://localhost:8080
...
X-MyHeader: he\rllo
...
> printf 'GET / HTTP/1.1\r\nHost: localhost\r\nX-MyHeader: he\rllo\r\n\r\n' | nc localhost 8080
...
X-MyHeader: he

llo
...

For the first command, it appears that it was treated as a string of '\' char(0x5c) and 'r' char(0x72), not '\r' char(0x0d).

@carryel
Copy link
Author

carryel commented Nov 19, 2024

To explain a little more, the header value can have the following specifications, which are slightly different from the header name.

field-content  = field-vchar [ 1*( SP / HTAB ) field-vchar ]
field-vchar    = VCHAR / obs-text
VCHAR          = %x21-7E ; visible (printing) characters
obs-text       = %x80-FF
SP             = %x20
HTAB           = %x09 ; horizontal tab

See: 
https://datatracker.ietf.org/doc/html/rfc7230#section-3.2
https://datatracker.ietf.org/doc/html/rfc5234#appendix-B.1

I interpreted that '\' char(0x5c) and 'r' char(0x72) belong to the above VCHAR.

@breakponchito
Copy link

breakponchito commented Nov 19, 2024

yep I saw how to test correctly instead of just send each character represented alone. With the following curls each character is scaped and validated against the new grizzly validations:

breakponchito pushed a commit to breakponchito/patched-src-grizzly that referenced this pull request Nov 19, 2024
+ trivial) updated license and re-trigger status checks(eclipse-ee4j#2213).
@breakponchito
Copy link

breakponchito commented Nov 22, 2024

Hey @carryel Based on what I review with my team, What could be the reason to not protect the Header Value content against new line character added alone?,
as we thought about it and following the RFC-9110 definition:

**field-value    = *field-content
  field-content  = field-vchar
                   [ 1*( SP / HTAB / field-vchar ) field-vchar ]
  field-vchar    = VCHAR / obs-text
  obs-text       = %x80-FF**

Field values containing CR, LF, or NUL characters are invalid and dangerous, due to the varying ways that implementations might parse and interpret those characters; a recipient of CR, LF, or NUL within a field value MUST either reject the message or replace each of those characters with SP before further processing or forwarding of that message. Field values containing other CTL characters are also invalid; however, recipients MAY retain such characters for the sake of robustness when they appear within a safe context (e.g., an application-specific quoted string that will not be processed by any downstream HTTP parser).

As we thought about it and following the RFC-9110 definition we should need to protect against that character.
I think be restrict against that character will produce some backward compatibility errors on the HTTP Header Value content side but I think it is better to minimize risk on the implementation and left to the implementor decide to change that behavior
if needed. I mean increase the level of restriction to prevent new line character on the Header Value content and by
default left the properties set as true to protect against those characters, and if for some reason this is not
needed change the default behavior by setting the property. Any thoughts let me know

@carryel
Copy link
Author

carryel commented Nov 23, 2024

@breakponchito Hi,

Actually, LF(x0A, '\n') is a bit special. However, LF itself is not included in the Header Value content, nor is this char used as a header value. The line break(the line terminator) of the HTTP header is CRLF(x0D x0A) according to the official specification. However, I think most HTTP servers usually treat a standalone LF as a line terminator.

The link below may help explain a little.
https://stackoverflow.com/questions/5757290/http-header-line-break-style

19.3 Tolerant Applications
The line terminator for message-header fields is the sequence CRLF.
However, we recommend that applications, when parsing such headers,
recognize a single LF as a line terminator and ignore the leading CR.

3.5. Message Parsing Robustness
Although the line terminator for the start-line and header fields is
the sequence CRLF, a recipient MAY recognize a single LF as a line
terminator and ignore any preceding CR.

When I simply tested LF with NGINX, it was like this.

`curl -i -v -H $'X-MyHeader: \nhello' http://localhost/...` : X-MyHeader's value is empty.
`curl -i -v -H $'X-MyHeader: h\nello' http://localhost/...` : X-MyHeader's value is "h".
`curl -i -v -H $'X-MyHeader: h\r\nello' http://localhost/...` : X-MyHeader's value is "h".

Patched Grizzly may also appear to allow LF, but the Header Value content does not include LF, and is processed as empty and "h" respectively. I think there would be an issue if Grizzly returned or allowed "\nhello" or "h\nello" as the value of "X-MyHeader".

In conclusion, it is not that LF is not prevented, but LF is processed as CRLF during the parsing process, and the final field values ​​to be utilized do not include LF.

@breakponchito
Copy link

@arjantijms @carryel Do you know who can review and approve?

@carryel
Copy link
Author

carryel commented Dec 6, 2024

@breakponchito Unfortunately, I am not sure who among the current Grizzly project members can review. Looking at the latest commits and activities, @arjantijms seems the most appropriate, but I am curious if he can review or what the current grizzly project operation status is.

Related issue: #2211

Pandrex247 pushed a commit to Pandrex247/patched-src-grizzly that referenced this pull request Dec 9, 2024
+ trivial) updated license and re-trigger status checks(eclipse-ee4j#2213).
breakponchito pushed a commit to breakponchito/patched-src-grizzly that referenced this pull request Dec 11, 2024
+ trivial) updated license and re-trigger status checks(eclipse-ee4j#2213).
@breakponchito
Copy link

breakponchito commented Dec 11, 2024

@carryel Hi again, to continue with the discussion for the RFC-9110 header validations
On our side after a second review of the content of the RFC-9110 we saw that
from the link you shared regarding tolerant applications for the LF character
the following RFC's are referred: RFC-2616 and RFC-7230. Checking details of each of
them we saw that the RFC-7230 obsoleted the RFC-2616 and the RFC-9110 obsoleted the
RFC-7230, this indicate that we must follow the RFC-9110 on top of them and
this indicate that LF character should need to be rejected. That is why we are not
agree to permit this character on the Header Value content and we need to
apply the restrict validation on this content also.

for reference:

  • RFC-9110 obsoletes the following list of RFC's: 2818, 7230, 7231, 7232, 7233, 7235, 7538, 7615, 7694
  • you can check that at the top of the RFC 9110 definition here: https://www.rfc-editor.org/rfc/rfc9110.html ,
  • from the link you shared regarding the tolerant applications that can process the LF, that information was added on the RFC-2616 then this was obsoleted by the 7230, and now the 9110 obsoleted the 7230, and
  • the information from the stackoverflow link is from 2011 it seems not uptodate

@carryel @dmatej @arjantijms let me know your thoughts

// Fast for correct values, slower for incorrect ones
try {
return isText[c];
} catch (ArrayIndexOutOfBoundsException ex) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it better to check if you are in the 0 - isText.length range than "exception driven code"?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I agree that checking the range is better than throwing an exception. The reason for doing this is to maintain similarity with other existing code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great opportunity to stop this madness.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But for the comment:

* <p>Implementation note:<br>
* This class has been carefully tuned. </p>

need to dig at origin

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for doing this is to maintain similarity with other existing code.

👍
+ to investigate the change in both places later (perhaps never)

@carryel
Copy link
Author

carryel commented Dec 12, 2024

@breakponchito Thanks for reviewing so many RFCs. I'll try to reflect the correction about restricting single LF.

@dmatej
Copy link
Contributor

dmatej commented Dec 12, 2024

I don't currently have the knowledge and time to learn all those specifications so I would believe you for now, just one idea/note - obsoleted specs are not always completely abandoned by the industry and if we simply forbid some features, we can have some compatibility issues. On the other hand if compatibility means compromising security, we should press in this direction at least.
Currently I have no idea if we need anyhow support any of obsoleted RFCs (just AI warned me that we should be careful).

+ Field values ​​cannot have a single LF as per RFC-9110(https://www.rfc-editor.org/rfc/rfc9110.html#section-5.5).
+ Additionally, this patch automatically removes support for multiple lines of http headers as mentioned in RFC-2616(https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2).
Note) The features related to this issue eclipse-ee4j#2212 only work when the STRICT_HEADER_NAME_VALIDATION_RFC_9110 and STRICT_HEADER_VALUE_VALIDATION_RFC_9110 options are enabled, and the existing code base behavior is maintained when options are not enabled.

+ Instead of throwing ArrayIndexOutOfBoundsException when judging Token and Text char, it ensures checking within the array range.
+ Since several existing http testcases do not respect CRLF in header values, I modified them to comply with the spec where it was not intentional.

This patch passed all grizzly existing testcases locally, depending on the presence of options.
> mvn clean test
All passed.
> mvn clean test -Dorg.glassfish.grizzly.http.STRICT_HEADER_NAME_VALIDATION_RFC_9110=true -Dorg.glassfish.grizzly.http.STRICT_HEADER_VALUE_VALIDATION_RFC_9110=true
All passed.
@carryel
Copy link
Author

carryel commented Dec 14, 2024

@breakponchito The content of header value has been modified to no longer allow LF.
@dmatej @pzygielo Added array range check to CookieHeaderParser utility. No matter how I look at it, it seems like there will be no side-effects.

More details are left in the commit log of d258c8b.

@carryel carryel requested review from dmatej and pzygielo December 23, 2024 01:27
@pzygielo
Copy link
Contributor

Unclear target branch - should it be master (as is) or main?

@carryel
Copy link
Author

carryel commented Dec 24, 2024

@pzygielo When I look at the source tree, it looks like main is correct. I'm wondering if I should clean up the PR based on main again, or if I should merge it to master first and then move it to main.

@carryel
Copy link
Author

carryel commented Dec 24, 2024

Moved to #2219

@carryel carryel closed this Dec 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants