Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Which leading characters should be escaped? #66

Closed
gibson042 opened this issue Nov 27, 2023 · 10 comments · Fixed by #77
Closed

Which leading characters should be escaped? #66

gibson042 opened this issue Nov 27, 2023 · 10 comments · Fixed by #77

Comments

@gibson042
Copy link
Contributor

As noted in #58 (comment) , an unescaped non-digit leading character could still be interpreted as part of an escape sequence spanning concatenated RegExp.escape output.

consider e.g. new RegExp("\\c" + RegExp.escape("J")) in a web browser implementing |ExtendedAtom|, where the result should match "\\cJ" and [unlike /\cJ/] fail to match "\n"

@gibson042 gibson042 mentioned this issue Nov 27, 2023
30 tasks
@bergus
Copy link
Contributor

bergus commented Nov 27, 2023

The explainer argues that the output of RegExp.escape is not put "in a place where it would obviously mean something else". See also #17 and #29.

@gibson042
Copy link
Contributor Author

#29 is indeed analogous, and #29 (comment) is a concise explanation of affected scenarios. However, I don't see a clear decision on this question in either that issue or #37.

@ljharb
Copy link
Member

ljharb commented Nov 27, 2023

cc @erights and @bakkot for thoughts?

@bakkot
Copy link
Collaborator

bakkot commented Nov 27, 2023

I'm inclined to say that the context after \c is similar to the context after \x, in that if you see either \x${v} or \c${v} you should know that the output of v is at least potentially going to be used as part of the escape sequence. There is no other reason to write those strings.

It's not like \1${v} where the \1 is a coherent thing to write on its own, such that you'd be surprised for the ${v} to become part of the escape sequence.

@ljharb
Copy link
Member

ljharb commented Nov 27, 2023

That matches my intuition here.

@gibson042
Copy link
Contributor Author

gibson042 commented Nov 27, 2023

I'm inclined to say that the context after \c is similar to the context after \x, in that if you see either \x${v} or \c${v} you should know that the output of v is at least potentially going to be used as part of the escape sequence. There is no other reason to write those strings.

It's not like \1${v} where the \1 is a coherent thing to write on its own, such that you'd be surprised for the ${v} to become part of the escape sequence.

\x and \c are in fact coherent on their own or whenever not followed by content that completes an escape, although I agree that the intent of either is generally (and pretty much exclusively) to express such an escape. But I would nonetheless be surprised if whether \x${v} or \c${v} is an escape depends upon the value of v (and surprised in the same was as \x41 vs. \x4x and \cJ vs. \c=, plus one level of indirection).

@bakkot
Copy link
Collaborator

bakkot commented Nov 27, 2023

The question is whether that surprise is a problem we should solve. I don't think it is. There is no reasonable expectation for what behavior you're going to get if you write \c${RegExp.escape(v)}.

I think it's fine to say that the output of RegExp.escape is not safe to use in a place where it would obviously mean something else, and that "immediately after \x" and "immediately after \c" are such places.

@ljharb
Copy link
Member

ljharb commented Nov 27, 2023

Sounds like we're in agreement here, so I'll close this, but will reopen if that's inaccurate.

@ljharb ljharb closed this as completed Nov 27, 2023
@ljharb
Copy link
Member

ljharb commented Apr 8, 2024

Reopening pending confirmation from @gibson042 and @erights.

@ljharb ljharb reopened this Apr 8, 2024
@erights
Copy link

erights commented Apr 8, 2024

To clarify for those watching this thread. Template tags would enable safe context sensitive escaping. I originally objected to RegExp.escape because it does not. I knew that there was an unsolvable even-vs-odd backslash problem, and thought there were others. After @bakkot clarified that the only such hazard was even-vs-odd backslash, I felt this was safe enough, because the exceptional unsafe case was narrow, easy to state, easy to remember, and easy to check by eyeball. To get there, as far as I am concerned, we gave up on the goal that the output of RegExp.escape be readable.

Later on in the current tc39 meeting, we'll discuss Array.isTemplateObject. Whatever my opinion on this specifically is (which is complicated), I strongly support the motivation for the proposal: To enable programmers (authors and reviewers) to reason clearly about the distinction between the literal parts that were authored as part of the program, vs attacker-controlled less-trusted data handled by that more-trusted program.

Therefore, I feel strongly that the first character needs to be adequately escaped to restore that simple safety property. My concern does not hinge on whether one would write \c${v} on purpose. It might be written by accident, in which case the RegExp may be buggy, in the sense that it does not mean what its author thought it meant. But even that case should not compromise this safety property. Reviewers who don't otherwise care about what the RegExp means should still be able, by eyeball and without an unrealistic memory burden, to reason about the impact of attacker-controlled less-trusted data, simply.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants