From 9137b6fcd8d9907739aa873d495a58113ef7bd54 Mon Sep 17 00:00:00 2001
From: Michael Ficarra
Date: Thu, 4 Apr 2024 13:50:42 -0600
Subject: [PATCH 01/14] WIP
---
 spec.html | 381 ++++++++------------------------------------------------------
 1 file changed, 52 insertions(+), 329 deletions(-)

diff --git a/spec.html b/spec.html
index 2cc6451913..372e753dd6 100644
--- a/spec.html
+++ b/spec.html
@@ -588,7 +588,6 @@

Terminal Symbols

In contrast, in the syntactic grammar, a contiguous run of fixed-width code points is a single terminal symbol.

Terminal symbols come in two other forms:

@@ -16263,179 +16262,48 @@

Syntax

Unicode Format-Control Characters

The Unicode format-control characters (i.e., the characters in category “Cf” in the Unicode Character Database such as LEFT-TO-RIGHT MARK or RIGHT-TO-LEFT MARK) are control codes used to control the formatting of a range of text in the absence of higher-level protocols for this (such as mark-up languages).

It is useful to allow format-control characters in source text to facilitate editing and display. All format control characters may be used within comments, and within string literals, template literals, and regular expression literals.

-

U+FEFF (ZERO WIDTH NO-BREAK SPACE) is a format-control character used primarily at the start of a text to mark it as Unicode and to allow detection of the text's encoding and byte order. <ZWNBSP> characters intended for this purpose can sometimes also appear after the start of a text, for example as a result of concatenating files. In ECMAScript source text <ZWNBSP> code points are treated as white space characters (see ) outside of comments, string literals, template literals, and regular expression literals.

+

U+FEFF (ZERO WIDTH NO-BREAK SPACE) is a format-control character used primarily at the start of a text to mark it as Unicode and to allow detection of the text's encoding and byte order. These characters can sometimes also appear after the start of a text, for example as a result of concatenating files. In ECMAScript source text, they are treated as white space characters (see ) outside of comments, string literals, template literals, and regular expression literals.

- +

White Space

White space code points are used to improve source text readability and to separate tokens (indivisible lexical units) from each other, but are otherwise insignificant. White space code points may occur between any two tokens and at the start or end of input. White space code points may occur within a |StringLiteral|, a |RegularExpressionLiteral|, a |Template|, or a |TemplateSubstitutionTail| where they are considered significant code points forming part of a literal value. They may also occur within a |Comment|, but cannot appear within any other kind of token.

-

The ECMAScript white space code points are listed in .

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- Code Points - - Name - - Abbreviation -
- `U+0009` - - CHARACTER TABULATION - - <TAB> -
- `U+000B` - - LINE TABULATION - - <VT> -
- `U+000C` - - FORM FEED (FF) - - <FF> -
- `U+FEFF` - - ZERO WIDTH NO-BREAK SPACE - - <ZWNBSP> -
- any code point in general category “Space_Separator” - - - <USP> -
-
- -

U+0020 (SPACE) and U+00A0 (NO-BREAK SPACE) code points are part of <USP>.

-
- -

Other than for the code points listed in , ECMAScript |WhiteSpace| intentionally excludes all code points that have the Unicode “White_Space” property but which are not classified in general category “Space_Separator” (“Zs”).

-

Syntax

WhiteSpace ::
- <TAB>
- <VT>
- <FF>
- <ZWNBSP>
- <USP>
+ <U+0009 (CHARACTER TABULATION)>
+ <U+000B (LINE TABULATION)>
+ <U+000C (FORM FEED)>
+ <U+0020 (SPACE)>
+ <U+00A0 (NO-BREAK SPACE)>
+ <U+FEFF (ZERO WIDTH NO-BREAK SPACE)>
+ > any code point with the Unicode General_Category “Space_Separator”
+
+

Other than for some of the code points listed as explicit alternatives in |WhiteSpace|, |WhiteSpace| intentionally excludes all code points that have the Unicode “White_Space” property but which are not classified in general category “Space_Separator” (“Zs”).

+
- +

Line Terminators

-

Like white space code points, line terminator code points are used to improve source text readability and to separate tokens (indivisible lexical units) from each other. However, unlike white space code points, line terminators have some influence over the behaviour of the syntactic grammar. In general, line terminators may occur between any two tokens, but there are a few places where they are forbidden by the syntactic grammar. Line terminators also affect the process of automatic semicolon insertion (). A line terminator cannot occur within any token except a |StringLiteral|, |Template|, or |TemplateSubstitutionTail|. <LF> and <CR> line terminators cannot occur within a |StringLiteral| token except as part of a |LineContinuation|.

+

Like white space code points, line terminator code points are used to improve source text readability and to separate tokens (indivisible lexical units) from each other. However, unlike white space code points, line terminators have some influence over the behaviour of the syntactic grammar. In general, line terminators may occur between any two tokens, but there are a few places where they are forbidden by the syntactic grammar. Line terminators also affect the process of automatic semicolon insertion (). A line terminator cannot occur within any token except a |StringLiteral|, |Template|, or |TemplateSubstitutionTail|. U+000A (LINE FEED) and U+000D (CARRIAGE RETURN) line terminators cannot occur within a |StringLiteral| token except as part of a |LineContinuation|.

A line terminator can occur within a |MultiLineComment| but cannot occur within a |SingleLineComment|.

Line terminators are included in the set of white space code points that are matched by the `\\s` class in regular expressions.

-

The ECMAScript line terminator code points are listed in .

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
- Code Point - - Unicode Name - - Abbreviation -
- `U+000A` - - LINE FEED (LF) - - <LF> -
- `U+000D` - - CARRIAGE RETURN (CR) - - <CR> -
- `U+2028` - - LINE SEPARATOR - - <LS> -
- `U+2029` - - PARAGRAPH SEPARATOR - - <PS> -
-
-

Only the Unicode code points in are treated as line terminators. Other new line or line breaking Unicode code points are not treated as line terminators but are treated as white space if they meet the requirements listed in . The sequence <CR><LF> is commonly used as a line terminator. It should be considered a single |SourceCharacter| for the purpose of reporting line numbers.

+

Only the Unicode code point sequences matched by |LineTerminatorSequence| are treated as line terminators. Other new line or line breaking Unicode code points are not treated as line terminators but are treated as white space if they are matched by |WhiteSpace|. The sequence « U+000D (CARRIAGE RETURN), U+000A (LINE FEED) » is commonly used as a line terminator. It should be considered a single |SourceCharacter| for the purpose of reporting line numbers.

Syntax

LineTerminator ::
- <LF>
- <CR>
- <LS>
- <PS>
+ <U+000A (LINE FEED)>
+ <U+000D (CARRIAGE RETURN)>
+ <U+2028 (LINE SEPARATOR)>
+ <U+2029 (PARAGRAPH SEPARATOR)>

LineTerminatorSequence ::
- <LF>
- <CR> [lookahead != <LF>]
- <LS>
- <PS>
- <CR> <LF>
+ <U+000A (LINE FEED)>
+ <U+000D (CARRIAGE RETURN)> [lookahead != <U+000A (LINE FEED)>]
+ <U+2028 (LINE SEPARATOR)>
+ <U+2029 (PARAGRAPH SEPARATOR)>
+ <U+000D (CARRIAGE RETURN)> <U+000A (LINE FEED)>
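The note's advice on line numbers can be illustrated with a minimal sketch. It assumes a tool that counts lines in source text; `countLines` is a hypothetical helper and not something the specification defines. The alternation mirrors |LineTerminatorSequence| by trying the two-code-point sequence before a lone U+000D (CARRIAGE RETURN) or U+000A (LINE FEED).

```javascript
// Hypothetical helper (not part of the specification): counts lines while
// treating U+000D U+000A as a single line terminator sequence.
function countLines(source) {
  // Order matters: match CR LF first so it is not counted as two terminators.
  const terminators = source.match(/\r\n|[\n\r\u2028\u2029]/g);
  return (terminators === null ? 0 : terminators.length) + 1;
}

console.log(countLines("a\r\nb"));         // 2
console.log(countLines("a\rb\nc\u2028d")); // 4
```

Other line-breaking code points, such as U+0085, do not advance the count in this sketch because they are not matched by |LineTerminator|.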
@@ -16546,10 +16414,10 @@

Syntax

`A` `B` `C` `D` `E` `F` `G` `H` `I` `J` `K` `L` `M` `N` `O` `P` `Q` `R` `S` `T` `U` `V` `W` `X` `Y` `Z`

UnicodeIDStart ::
- > any Unicode code point with the Unicode property “ID_Start”
+ > any code point with the Unicode property “ID_Start”

UnicodeIDContinue ::
- > any Unicode code point with the Unicode property “ID_Continue”
+ > any code point with the Unicode property “ID_Continue”

The definition of the nonterminal |UnicodeEscapeSequence| is given in .

@@ -17117,7 +16985,7 @@

Syntax

The definition of the nonterminal |HexDigit| is given in . |SourceCharacter| is defined in .

-

<LF> and <CR> cannot appear in a string literal, except as part of a |LineContinuation| to produce the empty code points sequence. The proper way to include either in the String value of a string literal is to use an escape sequence such as `\\n` or `\\u000A`.

+

U+000A (LINE FEED) and U+000D (CARRIAGE RETURN) cannot appear in a string literal, except as part of a |LineContinuation| to produce the empty code points sequence. The proper way to include either in the String value of a string literal is to use an escape sequence such as `\\n`, `\\x0A`, or `\\u{A}`.
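As a rough illustration of this note, the sketch below evaluates small pieces of source text with `eval`. The variable names and the `parses` helper are assumptions made for the example. Each JavaScript string here holds ECMAScript source text, so a doubled backslash inside it stands for a single backslash in that source text.

```javascript
// Source texts under test (illustrative names, not from the specification).
const withRawLineFeed = "'a\nb'";       // a literal U+000A between the quotes
const withContinuation = "'a\\\nb'";    // backslash followed by U+000A
const withEscapes = "'\\n\\x0A\\u{A}'"; // three escape sequences for U+000A

// Hypothetical helper: reports whether the source text parses at all.
function parses(sourceText) {
  try {
    eval(sourceText);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

console.log(parses(withRawLineFeed));         // false
console.log(eval(withContinuation) === "ab"); // true
console.log(eval(withEscapes) === "\n\n\n");  // true
```

The |LineContinuation| case contributes nothing to the String value, while each of the three escape sequences contributes one U+000A code unit.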

@@ -17188,23 +17056,17 @@

Static Semantics: SV ( ): a String

The SV of EscapeSequence :: `0` is the String value consisting of the code unit 0x0000 (NULL).
  • - The SV of CharacterEscapeSequence :: SingleEscapeCharacter is the String value consisting of the code unit whose numeric value is determined by the |SingleEscapeCharacter| according to . + The SV of CharacterEscapeSequence :: SingleEscapeCharacter is the String value consisting of the single code unit associated with |SingleEscapeCharacter| according to .
  • - - @@ -17212,13 +17074,7 @@

    Static Semantics: SV ( ): a String

    `\\b` - - @@ -17226,13 +17082,7 @@

    Static Semantics: SV ( ): a String

    `\\t` - - @@ -17240,13 +17090,7 @@

    Static Semantics: SV ( ): a String

    `\\n` - - @@ -17254,13 +17098,7 @@

    Static Semantics: SV ( ): a String

    `\\v` - - @@ -17268,13 +17106,7 @@

    Static Semantics: SV ( ): a String

    `\\f` - - @@ -17282,13 +17114,7 @@

    Static Semantics: SV ( ): a String

    `\\r` - - @@ -17296,13 +17122,7 @@

    Static Semantics: SV ( ): a String

    `\\"` - - @@ -17310,13 +17130,7 @@

    Static Semantics: SV ( ): a String

    `\\'` - - @@ -17324,13 +17138,7 @@

    Static Semantics: SV ( ): a String

    `\\\\` - -
    - Escape Sequence - - Code Unit Value - - Unicode Character Name + |SingleEscapeCharacter| - Symbol + Code Unit
    - `0x0008` - - BACKSPACE - - <BS> + 0x0008 (BACKSPACE)
    - `0x0009` - - CHARACTER TABULATION - - <HT> + 0x0009 (CHARACTER TABULATION)
    - `0x000A` - - LINE FEED (LF) - - <LF> + 0x000A (LINE FEED)
    - `0x000B` - - LINE TABULATION - - <VT> + 0x000B (LINE TABULATION)
    - `0x000C` - - FORM FEED (FF) - - <FF> + 0x000C (FORM FEED)
    - `0x000D` - - CARRIAGE RETURN (CR) - - <CR> + 0x000D (CARRIAGE RETURN)
    - `0x0022` - - QUOTATION MARK - - `"` + 0x0022 (QUOTATION MARK)
    - `0x0027` - - APOSTROPHE - - `'` + 0x0027 (APOSTROPHE)
    - `0x005C` - - REVERSE SOLIDUS - - `\\` + 0x005C (REVERSE SOLIDUS)
    @@ -17705,7 +17513,7 @@

    Static Semantics: TRV ( ): a String

    -

    TV excludes the code units of |LineContinuation| while TRV includes them. <CR><LF> and <CR> |LineTerminatorSequence|s are normalized to <LF> for both TV and TRV. An explicit |TemplateEscapeSequence| is needed to include a <CR> or <CR><LF> sequence.

    +

    TV excludes the code units of |LineContinuation| while TRV includes them. « U+000D (CARRIAGE RETURN), U+000A (LINE FEED) » and « U+000D (CARRIAGE RETURN) » |LineTerminatorSequence|s are normalized to « U+000A (LINE FEED) » for both TV and TRV. An explicit |TemplateEscapeSequence| is needed to include a « U+000D (CARRIAGE RETURN) » or « U+000D (CARRIAGE RETURN), U+000A (LINE FEED) » sequence.
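The normalization described in this note can be observed with a small sketch, assuming `String.raw` as a convenient way to read the raw (TRV-based) string of a template. The source texts are built as JavaScript strings so that real U+000D and U+000A code points end up inside the template; the variable names are illustrative only.

```javascript
// Templates whose source text contains real CR LF and CR code points.
const cooked = eval("`a\r\nb\rc`");
const raw = eval("String.raw`a\r\nb\rc`");
console.log(cooked === "a\nb\nc"); // true: both sequences became LF
console.log(raw === "a\nb\nc");    // true: normalized in the raw string too

// An explicit escape is needed to keep U+000D in the cooked value.
const escapedCooked = eval("`a\\r\\nb`");
const escapedRaw = eval("String.raw`a\\r\\nb`");
console.log(escapedCooked === "a\r\nb"); // true
console.log(escapedRaw === "a\\r\\nb");  // true: raw keeps the escape text

// A LineContinuation is excluded from TV but included in TRV.
const continuedCooked = eval("`a\\\nb`");
const continuedRaw = eval("String.raw`a\\\nb`");
console.log(continuedCooked === "ab");  // true
console.log(continuedRaw === "a\\\nb"); // true
```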

    @@ -33105,42 +32913,14 @@

    Expanded Years

    - +

    Time Zone Offset String Format

    ECMAScript defines a string interchange format for UTC offsets, derived from ISO 8601. The format is described by the following grammar. - The usage of Unicode code points in this grammar is listed in .

    - - - - - - - - - - - - -
    - Code Point - - Unicode Name - - Abbreviation -
    - `U+2212` - - MINUS SIGN - - <MINUS> -
    -
    -

    Syntax

    UTCOffset ::: @@ -33149,11 +32929,9 @@

    Syntax

    TemporalSign Hour HourSubcomponents[~Extended]

    TemporalSign :::
    - ASCIISign
    - <MINUS>
    -
    - ASCIISign ::: one of
    - `+` `-`
    + `+`
    + `-`
    + <U+2212 (MINUS SIGN)>

    Hour :::
    `0` DecimalDigit
    @@ -35921,42 +35699,24 @@

    Static Semantics: CharacterValue ( ): a non-negative integer

    CharacterEscape :: ControlEscape - 1. Return the numeric value according to . + 1. Return the numeric value of the code point associated with |ControlEscape| in . - - - - - - @@ -35964,16 +35724,7 @@

    Static Semantics: CharacterValue ( ): a non-negative integer

    `n` - - - @@ -35981,16 +35732,7 @@

    Static Semantics: CharacterValue ( ): a non-negative integer

    `v` - - - @@ -35998,16 +35740,7 @@

    Static Semantics: CharacterValue ( ): a non-negative integer

    `f` - - - @@ -36015,16 +35748,7 @@

    Static Semantics: CharacterValue ( ): a non-negative integer

    `r` - - -
    - ControlEscape - - Numeric Value + |ControlEscape| Code Point - Unicode Name - - Symbol -
    `t` - 9 - - `U+0009` - - CHARACTER TABULATION - - <HT> + U+0009 (CHARACTER TABULATION)
    - 10 - - `U+000A` - - LINE FEED (LF) - - <LF> + U+000A (LINE FEED)
    - 11 - - `U+000B` - - LINE TABULATION - - <VT> + U+000B (LINE TABULATION)
    - 12 - - `U+000C` - - FORM FEED (FF) - - <FF> + U+000C (FORM FEED)
    - 13 - - `U+000D` - - CARRIAGE RETURN (CR) - - <CR> + U+000D (CARRIAGE RETURN)
    @@ -36040,7 +35764,7 @@

    Static Semantics: CharacterValue ( ): a non-negative integer

    1. Return the numeric value of U+0000 (NULL). -

    `\\0` represents the <NUL> character and cannot be followed by a decimal digit.

    +

    `\\0` represents U+0000 (NULL) and cannot be followed by a decimal digit.
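A minimal sketch of the |ControlEscape| mapping and the `\0` rule, using Unicode-mode regular expressions. The `controlEscapes` table and the `compiles` helper are names chosen for this example, not anything defined by the specification.

```javascript
// Each pattern should match exactly the code point listed beside it.
const controlEscapes = [
  [/\t/u, 0x0009], // CHARACTER TABULATION
  [/\n/u, 0x000a], // LINE FEED
  [/\v/u, 0x000b], // LINE TABULATION
  [/\f/u, 0x000c], // FORM FEED
  [/\r/u, 0x000d], // CARRIAGE RETURN
  [/\0/u, 0x0000], // NULL
];
const allMatch = controlEscapes.every(([pattern, codePoint]) =>
  pattern.test(String.fromCodePoint(codePoint)));
console.log(allMatch); // true

// Hypothetical helper: reports whether a pattern compiles with the given flags.
function compiles(pattern, flags) {
  try {
    new RegExp(pattern, flags);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

// In Unicode mode, `\0` followed by a decimal digit is rejected.
const nullThenDigit = compiles("\\01", "u");
console.log(nullThenDigit); // false
```

Without the `u` flag, web-compatible implementations typically accept `\01` as a legacy octal escape, so the check above is specific to Unicode mode.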

    CharacterEscape :: HexEscapeSequence @@ -49611,7 +49335,6 @@

    Number Conversions

    Time Zone Offset String Format

    -
From 0660251f9d6a47c6de3947b73b5a146bcb825bf8 Mon Sep 17 00:00:00 2001
From: Michael Ficarra
Date: Fri, 5 Apr 2024 20:29:07 -0600
Subject: [PATCH 02/14] add a note about notation back
---
 spec.html | 1 +
 1 file changed, 1 insertion(+)

diff --git a/spec.html b/spec.html
index 372e753dd6..35b2792b49 100644
--- a/spec.html
+++ b/spec.html
@@ -588,6 +588,7 @@

    Terminal Symbols

    In contrast, in the syntactic grammar, a contiguous run of fixed-width code points is a single terminal symbol.

    Terminal symbols come in two other forms:

      +
    • In the lexical and RegExp grammars, Unicode code points without a conventional printed representation are instead shown in the form "<U+0000 (NULL)>" where `0000` is 4 to 6 hexits representing the code point in hexadecimal notation and `NULL` is the code point name.
    • In the syntactic grammar, certain terminal symbols (e.g. |IdentifierName| and |RegularExpressionLiteral|) are shown in italics, as they refer to the nonterminals of the same name in the lexical grammar.
From 85c7abf95bf178b1c15b06fe4d34741722c8b890 Mon Sep 17 00:00:00 2001
From: Michael Ficarra
Date: Fri, 5 Apr 2024 20:36:12 -0600
Subject: [PATCH 03/14] more consistent notation
---
 spec.html | 40 ++++++++--------------------------------
 1 file changed, 8 insertions(+), 32 deletions(-)

diff --git a/spec.html b/spec.html
index 35b2792b49..fa9faec693 100644
--- a/spec.html
+++ b/spec.html
@@ -15983,7 +15983,7 @@

    Syntax

    The components of a combining character sequence are treated as individual Unicode code points even though a user might think of the whole sequence as a single character.

    In string literals, regular expression literals, template literals and identifiers, any Unicode code point may also be expressed using Unicode escape sequences that explicitly express a code point's numeric value. Within a comment, such an escape sequence is effectively ignored as part of the comment.

    -

    ECMAScript differs from the Java programming language in the behaviour of Unicode escape sequences. In a Java program, if the Unicode escape sequence `\\u000A`, for example, occurs within a single-line comment, it is interpreted as a line terminator (Unicode code point U+000A is LINE FEED (LF)) and therefore the next code point is not part of the comment. Similarly, if the Unicode escape sequence `\\u000A` occurs within a string literal in a Java program, it is likewise interpreted as a line terminator, which is not allowed within a string literal—one must write `\\n` instead of `\\u000A` to cause a LINE FEED (LF) to be part of the String value of a string literal. In an ECMAScript program, a Unicode escape sequence occurring within a comment is never interpreted and therefore cannot contribute to termination of the comment. Similarly, a Unicode escape sequence occurring within a string literal in an ECMAScript program always contributes to the literal and is never interpreted as a line terminator or as a code point that might terminate the string literal.

    +

    ECMAScript differs from the Java programming language in the behaviour of Unicode escape sequences. In a Java program, if the Unicode escape sequence `\\u000A`, for example, occurs within a single-line comment, it is interpreted as a line terminator (Unicode code point U+000A is LINE FEED) and therefore the next code point is not part of the comment. Similarly, if the Unicode escape sequence `\\u000A` occurs within a string literal in a Java program, it is likewise interpreted as a line terminator, which is not allowed within a string literal—one must write `\\n` instead of `\\u000A` to cause a LINE FEED to be part of the String value of a string literal. In an ECMAScript program, a Unicode escape sequence occurring within a comment is never interpreted and therefore cannot contribute to termination of the comment. Similarly, a Unicode escape sequence occurring within a string literal in an ECMAScript program always contributes to the literal and is never interpreted as a line terminator or as a code point that might terminate the string literal.

    @@ -44915,19 +44915,13 @@

    Code Point - - Unicode Character Name - Escape Sequence - U+0008 - - - BACKSPACE + U+0008 (BACKSPACE) `\\b` @@ -44935,10 +44929,7 @@

    - U+0009 - - - CHARACTER TABULATION + U+0009 (CHARACTER TABULATION) `\\t` @@ -44946,10 +44937,7 @@

    - U+000A - - - LINE FEED (LF) + U+000A (LINE FEED) `\\n` @@ -44957,10 +44945,7 @@

    - U+000C - - - FORM FEED (FF) + U+000C (FORM FEED) `\\f` @@ -44968,10 +44953,7 @@

    - U+000D - - - CARRIAGE RETURN (CR) + U+000D (CARRIAGE RETURN) `\\r` @@ -44979,10 +44961,7 @@

    - U+0022 - - - QUOTATION MARK + U+0022 (QUOTATION MARK) `\\"` @@ -44990,10 +44969,7 @@

    - U+005C - - - REVERSE SOLIDUS + U+005C (REVERSE SOLIDUS) `\\\\`
From 80989f2e23c63efb716afa445ef47f7a4b648597 Mon Sep 17 00:00:00 2001
From: Michael Ficarra
Date: Thu, 16 May 2024 16:16:36 -0600
Subject: [PATCH 04/14] hexits
---
 spec.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/spec.html b/spec.html
index fa9faec693..ac3a9d7e8f 100644
--- a/spec.html
+++ b/spec.html
@@ -588,7 +588,7 @@

    Terminal Symbols

    In contrast, in the syntactic grammar, a contiguous run of fixed-width code points is a single terminal symbol.

    Terminal symbols come in two other forms:

      -
    • In the lexical and RegExp grammars, Unicode code points without a conventional printed representation are instead shown in the form "<U+0000 (NULL)>" where `0000` is 4 to 6 hexits representing the code point in hexadecimal notation and `NULL` is the code point name.
    • +
    • In the lexical and RegExp grammars, Unicode code points without a conventional printed representation are instead shown in the form "<U+0000 (NULL)>" where `0000` is 4 to 6 hexadecimal digits representing the code point in hexadecimal notation and `NULL` is the code point name.
    • In the syntactic grammar, certain terminal symbols (e.g. |IdentifierName| and |RegularExpressionLiteral|) are shown in italics, as they refer to the nonterminals of the same name in the lexical grammar.
From b9365c3763132800c58b8985ca2bdeb26308d620 Mon Sep 17 00:00:00 2001
From: Michael Ficarra
Date: Thu, 16 May 2024 16:20:10 -0600
Subject: [PATCH 05/14] revert note change
---
 spec.html | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/spec.html b/spec.html
index ac3a9d7e8f..6a828d3cb0 100644
--- a/spec.html
+++ b/spec.html
@@ -16275,13 +16275,11 @@

    Syntax

    <U+0009 (CHARACTER TABULATION)>
    <U+000B (LINE TABULATION)>
    <U+000C (FORM FEED)>
    - <U+0020 (SPACE)>
    - <U+00A0 (NO-BREAK SPACE)>
    <U+FEFF (ZERO WIDTH NO-BREAK SPACE)>
    > any code point with the Unicode General_Category “Space_Separator”
    -

    Other than for some of the code points listed as explicit alternatives in |WhiteSpace|, |WhiteSpace| intentionally excludes all code points that have the Unicode “White_Space” property but which are not classified in general category “Space_Separator” (“Zs”).

    +

    Other than for the code points listed as explicit alternatives of |WhiteSpace|, |WhiteSpace| intentionally excludes all code points that have the Unicode “White_Space” property but which are not classified in general category “Space_Separator” (“Zs”).

From 5d70c9edfdc1a0a7f922265ad92fe910ac1337d0 Mon Sep 17 00:00:00 2001
From: Michael Ficarra
Date: Thu, 16 May 2024 16:40:41 -0600
Subject: [PATCH 06/14] feedback
---
 spec.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/spec.html b/spec.html
index 6a828d3cb0..bb2b82557c 100644
--- a/spec.html
+++ b/spec.html
@@ -588,7 +588,7 @@

    Terminal Symbols

    In contrast, in the syntactic grammar, a contiguous run of fixed-width code points is a single terminal symbol.

    Terminal symbols come in two other forms:

      -
    • In the lexical and RegExp grammars, Unicode code points without a conventional printed representation are instead shown in the form "<U+0000 (NULL)>" where `0000` is 4 to 6 hexadecimal digits representing the code point in hexadecimal notation and `NULL` is the code point name.
    • +
    • In the lexical and RegExp grammars, Unicode code points without a conventional printed representation are instead shown in the form "<U+0000 (NULL)>" where `0000` is 4 to 6 hexadecimal digits representing the code point in hexadecimal notation and `NULL` is the code point name or alias.
    • In the syntactic grammar, certain terminal symbols (e.g. |IdentifierName| and |RegularExpressionLiteral|) are shown in italics, as they refer to the nonterminals of the same name in the lexical grammar.
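The notation in the first bullet can be sketched as a small formatter. `formatCodePointTerminal` is a hypothetical helper written for illustration; the code point name or alias is passed in by the caller because JavaScript has no built-in way to look up Unicode names.

```javascript
// Hypothetical helper: renders a code point in the "<U+XXXX (NAME)>" form,
// padding the hexadecimal value to at least 4 digits.
function formatCodePointTerminal(codePoint, name) {
  const hex = codePoint.toString(16).toUpperCase().padStart(4, "0");
  return `<U+${hex} (${name})>`;
}

console.log(formatCodePointTerminal(0x0000, "NULL"));
// <U+0000 (NULL)>
console.log(formatCodePointTerminal(0x2028, "LINE SEPARATOR"));
// <U+2028 (LINE SEPARATOR)>
```

Code points above U+FFFF naturally produce 5 or 6 hexadecimal digits, which matches the "4 to 6" range given in the bullet.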
    @@ -15983,7 +15983,7 @@

    Syntax

    The components of a combining character sequence are treated as individual Unicode code points even though a user might think of the whole sequence as a single character.

    In string literals, regular expression literals, template literals and identifiers, any Unicode code point may also be expressed using Unicode escape sequences that explicitly express a code point's numeric value. Within a comment, such an escape sequence is effectively ignored as part of the comment.

    -

    ECMAScript differs from the Java programming language in the behaviour of Unicode escape sequences. In a Java program, if the Unicode escape sequence `\\u000A`, for example, occurs within a single-line comment, it is interpreted as a line terminator (Unicode code point U+000A is LINE FEED) and therefore the next code point is not part of the comment. Similarly, if the Unicode escape sequence `\\u000A` occurs within a string literal in a Java program, it is likewise interpreted as a line terminator, which is not allowed within a string literal—one must write `\\n` instead of `\\u000A` to cause a LINE FEED to be part of the String value of a string literal. In an ECMAScript program, a Unicode escape sequence occurring within a comment is never interpreted and therefore cannot contribute to termination of the comment. Similarly, a Unicode escape sequence occurring within a string literal in an ECMAScript program always contributes to the literal and is never interpreted as a line terminator or as a code point that might terminate the string literal.

    +

    ECMAScript differs from the Java programming language in the behaviour of Unicode escape sequences. In a Java program, if the Unicode escape sequence `\\u000A`, for example, occurs within a single-line comment, it is interpreted as a line terminator (Unicode code point U+000A is LINE FEED) and therefore the next code point is not part of the comment. Similarly, if the Unicode escape sequence `\\u000A` occurs within a string literal in a Java program, it is likewise interpreted as a line terminator, which is not allowed within a string literal—one must write `\\n` instead of `\\u000A` to cause a U+000A LINE FEED to be part of the String value of a string literal. In an ECMAScript program, a Unicode escape sequence occurring within a comment is never interpreted and therefore cannot contribute to termination of the comment. Similarly, a Unicode escape sequence occurring within a string literal in an ECMAScript program always contributes to the literal and is never interpreted as a line terminator or as a code point that might terminate the string literal.
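The ECMAScript side of this comparison can be checked with a short sketch. The source texts are evaluated with `eval`, and the variable names are chosen for the example only.

```javascript
// The escape sits inside a single-line comment, so it is never interpreted:
// the comment runs to the real line terminator and the throw is never executed.
const commentResult = eval("// \\u000A throw new Error('unreachable')\n42");
console.log(commentResult); // 42

// Inside a string literal the escape contributes U+000A to the String value
// instead of terminating the literal.
const stringResult = eval("'a\\u000Ab'");
console.log(stringResult === "a\nb"); // true
```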

From e15a4debd8d00629ac7a147810e9a3887a3827a9 Mon Sep 17 00:00:00 2001
From: Michael Ficarra
Date: Thu, 16 May 2024 16:54:37 -0600
Subject: [PATCH 07/14] un-revert the note
---
 spec.html | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/spec.html b/spec.html
index bb2b82557c..40b16b047c 100644
--- a/spec.html
+++ b/spec.html
@@ -16279,7 +16279,10 @@

    Syntax

    > any code point with the Unicode General_Category “Space_Separator” -

    Other than for the code points listed as explicit alternatives of |WhiteSpace|, |WhiteSpace| intentionally excludes all code points that have the Unicode “White_Space” property but which are not classified in general category “Space_Separator” (“Zs”).

    +

    U+0020 (SPACE) and U+00A0 (NO-BREAK SPACE) are matched by |WhiteSpace| as the are both have a Unicode General_Category of “Space_Separator&rdquo.

    +
    + +

    Other than for the code points listed as explicit alternatives in |WhiteSpace|, |WhiteSpace| intentionally excludes all code points that have the Unicode “White_Space” property but which are not classified in general category “Space_Separator” (“Zs”).

From 4104527695e87e4917a1ba1d305ddd5f024d0783 Mon Sep 17 00:00:00 2001
From: Michael Ficarra
Date: Thu, 16 May 2024 17:03:18 -0600
Subject: [PATCH 08/14] more <(LF|CR|LS|PS)>
---
 spec.html | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/spec.html b/spec.html
index 40b16b047c..344c57eb37 100644
--- a/spec.html
+++ b/spec.html
@@ -16917,15 +16917,15 @@

    Syntax

    DoubleStringCharacter ::
    SourceCharacter but not one of `"` or `\` or LineTerminator
    - <LS>
    - <PS>
    + <U+2028 (LINE SEPARATOR)>
    + <U+2029 (PARAGRAPH SEPARATOR)>
    `\` EscapeSequence
    LineContinuation

    SingleStringCharacter ::
    SourceCharacter but not one of `'` or `\` or LineTerminator
    - <LS>
    - <PS>
    + <U+2028 (LINE SEPARATOR)>
    + <U+2029 (PARAGRAPH SEPARATOR)>
    `\` EscapeSequence
    LineContinuation
    @@ -17034,10 +17034,10 @@

    Static Semantics: SV ( ): a String

    The SV of DoubleStringCharacter :: SourceCharacter but not one of `"` or `\` or LineTerminator is the result of performing UTF16EncodeCodePoint on the code point matched by |SourceCharacter|.
  • - The SV of DoubleStringCharacter :: <LS> is the String value consisting of the code unit 0x2028 (LINE SEPARATOR).
    + The SV of DoubleStringCharacter :: <U+2028 (LINE SEPARATOR)> is the String value consisting of the code unit 0x2028 (LINE SEPARATOR).
  • - The SV of DoubleStringCharacter :: <PS> is the String value consisting of the code unit 0x2029 (PARAGRAPH SEPARATOR).
    + The SV of DoubleStringCharacter :: <U+2029 (PARAGRAPH SEPARATOR)> is the String value consisting of the code unit 0x2029 (PARAGRAPH SEPARATOR).
  • The SV of DoubleStringCharacter :: LineContinuation is the empty String. @@ -17046,10 +17046,10 @@

    Static Semantics: SV ( ): a String

    The SV of SingleStringCharacter :: SourceCharacter but not one of `'` or `\` or LineTerminator is the result of performing UTF16EncodeCodePoint on the code point matched by |SourceCharacter|.
  • - The SV of SingleStringCharacter :: <LS> is the String value consisting of the code unit 0x2028 (LINE SEPARATOR).
    + The SV of SingleStringCharacter :: <U+2028 (LINE SEPARATOR)> is the String value consisting of the code unit 0x2028 (LINE SEPARATOR).
  • - The SV of SingleStringCharacter :: <PS> is the String value consisting of the code unit 0x2029 (PARAGRAPH SEPARATOR).
    + The SV of SingleStringCharacter :: <U+2029 (PARAGRAPH SEPARATOR)> is the String value consisting of the code unit 0x2029 (PARAGRAPH SEPARATOR).
  • The SV of SingleStringCharacter :: LineContinuation is the empty String. @@ -17499,19 +17499,19 @@

    Static Semantics: TRV ( ): a String

    The TRV of LineContinuation :: `\` LineTerminatorSequence is the string-concatenation of the code unit 0x005C (REVERSE SOLIDUS) and the TRV of |LineTerminatorSequence|.
  • - The TRV of LineTerminatorSequence :: <LF> is the String value consisting of the code unit 0x000A (LINE FEED).
    + The TRV of LineTerminatorSequence :: <U+000A (LINE FEED)> is the String value consisting of the code unit 0x000A (LINE FEED).
  • - The TRV of LineTerminatorSequence :: <CR> is the String value consisting of the code unit 0x000A (LINE FEED).
    + The TRV of LineTerminatorSequence :: <U+000D (CARRIAGE RETURN)> is the String value consisting of the code unit 0x000A (LINE FEED).
  • - The TRV of LineTerminatorSequence :: <LS> is the String value consisting of the code unit 0x2028 (LINE SEPARATOR).
    + The TRV of LineTerminatorSequence :: <U+2028 (LINE SEPARATOR)> is the String value consisting of the code unit 0x2028 (LINE SEPARATOR).
  • - The TRV of LineTerminatorSequence :: <PS> is the String value consisting of the code unit 0x2029 (PARAGRAPH SEPARATOR).
    + The TRV of LineTerminatorSequence :: <U+2029 (PARAGRAPH SEPARATOR)> is the String value consisting of the code unit 0x2029 (PARAGRAPH SEPARATOR).
  • - The TRV of LineTerminatorSequence :: <CR> <LF> is the String value consisting of the code unit 0x000A (LINE FEED).
    + The TRV of LineTerminatorSequence :: <U+000D (CARRIAGE RETURN)> <U+000A (LINE FEED)> is the String value consisting of the code unit 0x000A (LINE FEED).
  •
From cb8bd8c9e90621a72baf79be9604cef0e664d250 Mon Sep 17 00:00:00 2001
From: Michael Ficarra
Date: Thu, 16 May 2024 17:04:25 -0600
Subject: [PATCH 09/14] revert ASCIISign change
---
 spec.html | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/spec.html b/spec.html
index 344c57eb37..7bc76a2f11 100644
--- a/spec.html
+++ b/spec.html
@@ -32931,10 +32931,12 @@

    Syntax

    TemporalSign Hour HourSubcomponents[~Extended]

    TemporalSign :::
    - `+`
    - `-`
    + ASCIISign
    <U+2212 (MINUS SIGN)>

    + ASCIISign ::: one of
    + `+` `-`
    +
    Hour :::
    `0` DecimalDigit
    `1` DecimalDigit
From 0a64ef7453608e0b479cdad11a261d77782422c2 Mon Sep 17 00:00:00 2001
From: Michael Ficarra
Date: Thu, 16 May 2024 17:07:12 -0600
Subject: [PATCH 10/14] fix formatting
---
 spec.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/spec.html b/spec.html
index 7bc76a2f11..1e1c66c832 100644
--- a/spec.html
+++ b/spec.html
@@ -16279,7 +16279,7 @@

    Syntax

    > any code point with the Unicode General_Category “Space_Separator” -

    U+0020 (SPACE) and U+00A0 (NO-BREAK SPACE) are matched by |WhiteSpace| as the are both have a Unicode General_Category of “Space_Separator&rdquo.

    +

    U+0020 (SPACE) and U+00A0 (NO-BREAK SPACE) are matched by |WhiteSpace| as the are both have a Unicode General_Category of “Space_Separator”.

    Other than for the code points listed as explicit alternatives in |WhiteSpace|, |WhiteSpace| intentionally excludes all code points that have the Unicode “White_Space” property but which are not classified in general category “Space_Separator” (“Zs”).

From a5c63f07d21bce92d606183c551c26a631b4665d Mon Sep 17 00:00:00 2001
From: Michael Ficarra
Date: Thu, 16 May 2024 19:07:34 -0600
Subject: [PATCH 11/14] everyone always forgets about Annex A
---
 spec.html | 1 +
 1 file changed, 1 insertion(+)

diff --git a/spec.html b/spec.html
index 1e1c66c832..df867e26d0 100644
--- a/spec.html
+++ b/spec.html
@@ -49315,6 +49315,7 @@

    Number Conversions

    Time Zone Offset String Format

    +
From eb07d91a7650b002c8ed8b46b534c58fe44ba64b Mon Sep 17 00:00:00 2001
From: Michael Ficarra
Date: Thu, 16 May 2024 19:56:58 -0600
Subject: [PATCH 12/14] typo
---
 spec.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/spec.html b/spec.html
index df867e26d0..419b987447 100644
--- a/spec.html
+++ b/spec.html
@@ -16279,7 +16279,7 @@

    Syntax

    > any code point with the Unicode General_Category “Space_Separator” -

    U+0020 (SPACE) and U+00A0 (NO-BREAK SPACE) are matched by |WhiteSpace| as the are both have a Unicode General_Category of “Space_Separator”.

    +

    U+0020 (SPACE) and U+00A0 (NO-BREAK SPACE) are matched by |WhiteSpace| as the both have a Unicode General_Category of “Space_Separator”.

    Other than for the code points listed as explicit alternatives in |WhiteSpace|, |WhiteSpace| intentionally excludes all code points that have the Unicode “White_Space” property but which are not classified in general category “Space_Separator” (“Zs”).

From 93c25fc8d42f3d4a79ac6eb728cd3bc311e850b9 Mon Sep 17 00:00:00 2001
From: Michael Ficarra
Date: Thu, 16 May 2024 20:06:55 -0600
Subject: [PATCH 13/14] still typo
---
 spec.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/spec.html b/spec.html
index 419b987447..11dc9f60a3 100644
--- a/spec.html
+++ b/spec.html
@@ -16279,7 +16279,7 @@

    Syntax

    > any code point with the Unicode General_Category “Space_Separator” -

    U+0020 (SPACE) and U+00A0 (NO-BREAK SPACE) are matched by |WhiteSpace| as the both have a Unicode General_Category of “Space_Separator”.

    +

    U+0020 (SPACE) and U+00A0 (NO-BREAK SPACE) are matched by |WhiteSpace| as they both have a Unicode General_Category of “Space_Separator”.

    Other than for the code points listed as explicit alternatives in |WhiteSpace|, |WhiteSpace| intentionally excludes all code points that have the Unicode “White_Space” property but which are not classified in general category “Space_Separator” (“Zs”).
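A minimal sketch of the |WhiteSpace| set described by these notes, written as a Unicode-mode regular expression. The `whiteSpace` name is an assumption made for the example. The `\p{Zs}` property escape stands for the “Space_Separator” general category, so U+0020 and U+00A0 need no explicit alternative.

```javascript
// Explicit alternatives plus every code point in general category Zs.
const whiteSpace = /^[\t\v\f\uFEFF\p{Zs}]$/u;

console.log(whiteSpace.test("\u0020")); // true: SPACE is in Zs
console.log(whiteSpace.test("\u00A0")); // true: NO-BREAK SPACE is in Zs
console.log(whiteSpace.test("\uFEFF")); // true: listed explicitly
console.log(whiteSpace.test("\u3000")); // true: IDEOGRAPHIC SPACE is in Zs

// U+0085 has the White_Space property but is not in Zs, so it is excluded.
console.log(whiteSpace.test("\u0085")); // false
// Line terminators are matched by LineTerminator, not by WhiteSpace.
console.log(whiteSpace.test("\n"));     // false
```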

From 525c12792bb2d72bfdaf22dc61b8296a143848f5 Mon Sep 17 00:00:00 2001
From: Michael Ficarra
Date: Thu, 23 May 2024 10:57:22 -0600
Subject: [PATCH 14/14] fix Single Character Escape Sequences table
---
 spec.html | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/spec.html b/spec.html
index 11dc9f60a3..8e650f34a2 100644
--- a/spec.html
+++ b/spec.html
@@ -17073,7 +17073,7 @@

    Static Semantics: SV ( ): a String

    - `\\b` + U+0062 (LATIN SMALL LETTER B) 0x0008 (BACKSPACE) @@ -17081,7 +17081,7 @@

    Static Semantics: SV ( ): a String

    - `\\t` + U+0074 (LATIN SMALL LETTER T) 0x0009 (CHARACTER TABULATION) @@ -17089,7 +17089,7 @@

    Static Semantics: SV ( ): a String

    - `\\n` + U+006E (LATIN SMALL LETTER N) 0x000A (LINE FEED) @@ -17097,7 +17097,7 @@

    Static Semantics: SV ( ): a String

    - `\\v` + U+0076 (LATIN SMALL LETTER V) 0x000B (LINE TABULATION) @@ -17105,7 +17105,7 @@

    Static Semantics: SV ( ): a String

    - `\\f` + U+0066 (LATIN SMALL LETTER F) 0x000C (FORM FEED) @@ -17113,7 +17113,7 @@

    Static Semantics: SV ( ): a String

    - `\\r` + U+0072 (LATIN SMALL LETTER R) 0x000D (CARRIAGE RETURN) @@ -17121,7 +17121,7 @@

    Static Semantics: SV ( ): a String

    - `\\"` + U+0022 (QUOTATION MARK) 0x0022 (QUOTATION MARK) @@ -17129,7 +17129,7 @@

    Static Semantics: SV ( ): a String

    - `\\'` + U+0027 (APOSTROPHE) 0x0027 (APOSTROPHE) @@ -17137,7 +17137,7 @@

    Static Semantics: SV ( ): a String

    - `\\\\` + U+005C (REVERSE SOLIDUS) 0x005C (REVERSE SOLIDUS)