tc39 · ben-allen · May 10, 2023 · May 12, 2023 · Jun 9, 2023 · FrankYFTang
diff --git a/spec/annexes.html b/spec/annexes.html
@@ -164,6 +164,9 @@ <h1>Implementation Dependent Behaviour</h1>
  <li>
  In Segmenter:
  <ul>
+ <li>
+ The set of data used for standard sentence break suppressions (<emu-xref href="#sec-intl-segmenter-constructor"></emu-xref>)
+ </li>
  <li>
  Boundary determination algorithms (<emu-xref href="#sec-findboundary"></emu-xref>)
  </li>

diff --git a/spec/locale.html b/spec/locale.html
@@ -19,7 +19,7 @@ <h1>Intl.Locale ( _tag_ [ , _options_ ] )</h1>
  <emu-alg>
  1. If NewTarget is *undefined*, throw a *TypeError* exception.
  1. Let _relevantExtensionKeys_ be %Locale%.[[RelevantExtensionKeys]].
- 1. Let _internalSlotsList_ be &laquo; [[InitializedLocale]], [[Locale]], [[Calendar]], [[Collation]], [[HourCycle]], [[NumberingSystem]] &raquo;.
+ 1. Let _internalSlotsList_ be &laquo; [[InitializedLocale]], [[Locale]], [[Calendar]], [[Collation]], [[HourCycle]], [[NumberingSystem]], [[SentenceBreakSuppressions]] &raquo;.
  1. If _relevantExtensionKeys_ contains *"kf"*, then
  1. Append [[CaseFirst]] as the last element of _internalSlotsList_.
  1. If _relevantExtensionKeys_ contains *"kn"*, then
@@ -52,6 +52,8 @@ <h1>Intl.Locale ( _tag_ [ , _options_ ] )</h1>
  1. If _numberingSystem_ is not *undefined*, then
  1. If _numberingSystem_ does not match the Unicode Locale Identifier `type` nonterminal, throw a *RangeError* exception.
  1. Set _opt_.[[nu]] to _numberingSystem_.
+ 1. Let _sentenceBreakSuppressions_ be ? GetOption(_options_, *"sentenceBreakSuppressions"*, ~string~, &laquo; *"none"*, *"standard"* &raquo;, *"none"*).
+ 1. Set _opt_.[[ss]] to _sentenceBreakSuppressions_.
  1. Let _r_ be ! ApplyUnicodeExtensionToTag(_tag_, _opt_, _relevantExtensionKeys_).
  1. Set _locale_.[[Locale]] to _r_.[[locale]].
  1. Set _locale_.[[Calendar]] to _r_.[[ca]].
@@ -65,6 +67,7 @@ <h1>Intl.Locale ( _tag_ [ , _options_ ] )</h1>
  1. Else,
  1. Set _locale_.[[Numeric]] to *false*.
  1. Set _locale_.[[NumberingSystem]] to _r_.[[nu]].
+ 1. Set _locale_.[[SentenceBreakSuppressions]] to _r_.[[ss]].
  1. Return _locale_.
  </emu-alg>
  </emu-clause>
@@ -175,7 +178,7 @@ <h1>Intl.Locale.prototype</h1>
  <h1>Internal slots</h1>
 
  <p>
- The value of the [[RelevantExtensionKeys]] internal slot is &laquo; *"ca"*, *"co"*, *"hc"*, *"kf"*, *"kn"*, *"nu"* &raquo;. If %Collator%.[[RelevantExtensionKeys]] does not contain *"kf"*, then remove *"kf"* from %Locale%.[[RelevantExtensionKeys]]. If %Collator%.[[RelevantExtensionKeys]] does not contain *"kn"*, then remove *"kn"* from %Locale%.[[RelevantExtensionKeys]].
+ The value of the [[RelevantExtensionKeys]] internal slot is &laquo; *"ca"*, *"co"*, *"hc"*, *"kf"*, *"kn"*, *"nu"*, *"ss"* &raquo;. If %Collator%.[[RelevantExtensionKeys]] does not contain *"kf"*, then remove *"kf"* from %Locale%.[[RelevantExtensionKeys]]. If %Collator%.[[RelevantExtensionKeys]] does not contain *"kn"*, then remove *"kn"* from %Locale%.[[RelevantExtensionKeys]].
  </p>
  </emu-clause>
  </emu-clause>
@@ -312,6 +315,16 @@ <h1>get Intl.Locale.prototype.numberingSystem</h1>
  </emu-alg>
  </emu-clause>
 
+ <emu-clause id="sec-Intl.Locale.prototype.sentenceBreakSuppressions">
+ <h1>get Intl.Locale.prototype.sentenceBreakSuppressions</h1>
+ <p>`Intl.Locale.prototype.sentenceBreakSuppressions` is an accessor property whose set accessor function is *undefined*. Its get accessor function performs the following steps:</p>
+ <emu-alg>
+ 1. Let _loc_ be the *this* value.
+ 1. Perform ? RequireInternalSlot(_loc_, [[InitializedLocale]]).
+ 1. Return _loc_.[[SentenceBreakSuppressions]].
+ </emu-alg>
+ </emu-clause>
+
  <emu-clause id="sec-Intl.Locale.prototype.language">
  <h1>get Intl.Locale.prototype.language</h1>
  <p>`Intl.Locale.prototype.language` is an accessor property whose set accessor function is *undefined*. The following algorithm refers to <a href="https://www.unicode.org/reports/tr35/#Identifiers">UTS 35's Unicode Language and Locale Identifiers grammar</a>. Its get accessor function performs the following steps:</p>
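
Reviewer sketch (not part of the patch): the GetOption step added to the Intl.Locale constructor above can be read roughly as the following plain JavaScript. The helper name `getSentenceBreakSuppressions` is hypothetical and exists only for illustration; it mirrors the allowed values and the `"none"` default from the diff and does not describe how any engine implements the option.

```javascript
// Allowed values and default, copied from the GetOption call in the diff.
const ALLOWED_SUPPRESSIONS = ["none", "standard"];
const DEFAULT_SUPPRESSIONS = "none";

// Hypothetical helper mirroring GetOption(options, "sentenceBreakSuppressions", string, ...).
function getSentenceBreakSuppressions(options = {}) {
  const value = options.sentenceBreakSuppressions;
  // A missing option falls back to the default.
  if (value === undefined) return DEFAULT_SUPPRESSIONS;
  // Coerce to a string, as the ~string~ type in GetOption requires.
  const str = `${value}`;
  // Anything outside the allowed list is a RangeError.
  if (!ALLOWED_SUPPRESSIONS.includes(str)) {
    throw new RangeError(`Invalid sentenceBreakSuppressions value: ${str}`);
  }
  return str;
}
```

Under this reading, the validated string is what gets stored in _opt_.[[ss]] and later surfaces through the new `sentenceBreakSuppressions` getter.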

diff --git a/spec/segmenter.html b/spec/segmenter.html
@@ -17,16 +17,19 @@ <h1>Intl.Segmenter ( [ _locales_ [ , _options_ ] ] )</h1>
 
  <emu-alg>
  1. If NewTarget is *undefined*, throw a *TypeError* exception.
- 1. Let _internalSlotsList_ be &laquo; [[InitializedSegmenter]], [[Locale]], [[SegmenterGranularity]] &raquo;.
+ 1. Let _internalSlotsList_ be &laquo; [[InitializedSegmenter]], [[Locale]], [[SentenceBreakSuppressions]], [[SegmenterGranularity]] &raquo;.
  1. Let _segmenter_ be ? OrdinaryCreateFromConstructor(NewTarget, *"%Segmenter.prototype%"*, _internalSlotsList_).
  1. Let _requestedLocales_ be ? CanonicalizeLocaleList(_locales_).
  1. Set _options_ to ? GetOptionsObject(_options_).
  1. Let _opt_ be a new Record.
  1. Let _matcher_ be ? GetOption(_options_, *"localeMatcher"*, ~string~, &laquo; *"lookup"*, *"best fit"* &raquo;, *"best fit"*).
  1. Set _opt_.[[localeMatcher]] to _matcher_.
+ 1. Let _sentenceBreakSuppressions_ be ? GetOption(_options_, *"sentenceBreakSuppressions"*, ~string~, &laquo; *"none"*, *"standard"* &raquo;, *"none"*).
+ 1. Set _opt_.[[ss]] to _sentenceBreakSuppressions_.
  1. Let _localeData_ be %Segmenter%.[[LocaleData]].
  1. Let _r_ be ResolveLocale(%Segmenter%.[[AvailableLocales]], _requestedLocales_, _opt_, %Segmenter%.[[RelevantExtensionKeys]], _localeData_).
  1. Set _segmenter_.[[Locale]] to _r_.[[locale]].
+ 1. Set _segmenter_.[[SentenceBreakSuppressions]] to _r_.[[ss]].
  1. Let _granularity_ be ? GetOption(_options_, *"granularity"*, ~string~, &laquo; *"grapheme"*, *"word"*, *"sentence"* &raquo;, *"grapheme"*).
  1. Set _segmenter_.[[SegmenterGranularity]] to _granularity_.
  1. Return _segmenter_.
@@ -74,11 +77,8 @@ <h1>Internal slots</h1>
  </p>
 
  <p>
- The value of the [[RelevantExtensionKeys]] internal slot is &laquo; &raquo;.
+ The value of the [[RelevantExtensionKeys]] internal slot is &laquo; *"ss"* &raquo;.
  </p>
- <emu-note>
- Intl.Segmenter does not have any relevant extension keys.
- </emu-note>
 
  <p>
  The value of the [[LocaleData]] internal slot is implementation-defined within the constraints described in <emu-xref href="#sec-internal-slots"></emu-xref>.
@@ -160,6 +160,9 @@ <h1>Intl.Segmenter.prototype.resolvedOptions ( )</h1>
  <td>*"locale"*</td>
  </tr>
  <tr>
+ <td>[[SentenceBreakSuppressions]]</td>
+ <td>*"sentenceBreakSuppressions"*</td>
+ </tr>
+ <tr>
  <td>[[SegmenterGranularity]]</td>
  <td>*"granularity"*</td>
  </tr>
@@ -185,6 +188,7 @@ <h1>Properties of Intl.Segmenter Instances</h1>
 
  <ul>
  <li>[[Locale]] is a String value with the language tag of the locale whose localization is used for segmentation.</li>
+ <li>[[SentenceBreakSuppressions]] is one of the String values *"none"* or *"standard"*, identifying whether to suppress certain sentence breaks that would otherwise be found by <a href="https://unicode.org/reports/tr29/">Unicode Standard Annex #29</a> rules.</li>
  <li>[[SegmenterGranularity]] is one of the String values *"grapheme"*, *"word"*, or *"sentence"*, identifying the kind of text element to segment.</li>
  </ul>
  </emu-clause>
@@ -394,17 +398,18 @@ <h1>FindBoundary ( _segmenter_, _string_, _startIndex_, _direction_ )</h1>
  <emu-note>Boundary determination is implementation-dependent, but general default algorithms are specified in <a href="https://unicode.org/reports/tr29/">Unicode Standard Annex #29</a>. It is recommended that implementations use locale-sensitive tailorings such as those provided by the Common Locale Data Repository (available at <a href="https://cldr.unicode.org">https://cldr.unicode.org</a>).</emu-note>
  <emu-alg>
  1. Let _locale_ be _segmenter_.[[Locale]].
+ 1. Let _sentenceBreakSuppressions_ be _segmenter_.[[SentenceBreakSuppressions]].
  1. Let _granularity_ be _segmenter_.[[SegmenterGranularity]].
  1. Let _len_ be the length of _string_.
  1. If _direction_ is ~before~, then
  1. Assert: _startIndex_ &ge; 0.
  1. Assert: _startIndex_ &lt; _len_.
- 1. Search _string_ for the last segmentation boundary that is preceded by at most _startIndex_ code units from the beginning, using locale _locale_ and text element granularity _granularity_.
+ 1. Search _string_ for the last segmentation boundary that is preceded by at most _startIndex_ code units from the beginning, using locale _locale_, sentence break suppressions _sentenceBreakSuppressions_, and text element granularity _granularity_.
  1. If a boundary is found, return the count of code units in _string_ preceding it.
  1. Return 0.
  1. Assert: _direction_ is ~after~.
  1. If _len_ is 0 or _startIndex_ &ge; _len_, return +&infin;.
- 1. Search _string_ for the first segmentation boundary that follows the code unit at index _startIndex_, using locale _locale_ and text element granularity _granularity_.
+ 1. Search _string_ for the first segmentation boundary that follows the code unit at index _startIndex_, using locale _locale_, sentence break suppressions _sentenceBreakSuppressions_, and text element granularity _granularity_.
  1. If a boundary is found, return the count of code units in _string_ preceding it.
  1. Return _len_.
  </emu-alg>
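
Reviewer sketch (not part of the patch): to make the effect of the new FindBoundary parameter concrete, here is a toy version of the ~after~ branch for sentence granularity. It is an illustration under loud assumptions: the boundary rule (a terminator followed by whitespace) is far simpler than UAX #29, the abbreviation list `TOY_SUPPRESSIONS` is invented for the example (real suppression data is implementation-dependent, for example from CLDR), and the function names are hypothetical.

```javascript
// Hypothetical suppression data; real data is implementation-dependent.
const TOY_SUPPRESSIONS = ["Mr.", "Mrs.", "Dr.", "etc."];

// True if `prefix` ends with a listed abbreviation that starts a word.
function endsWithSuppression(prefix) {
  return TOY_SUPPRESSIONS.some((abbr) => {
    if (!prefix.endsWith(abbr)) return false;
    const before = prefix.length - abbr.length;
    return before === 0 || /\s/.test(prefix[before - 1]);
  });
}

// Toy counterpart of FindBoundary(segmenter, string, startIndex, ~after~).
// Returns the number of code units preceding the first boundary after startIndex.
function findBoundaryAfter(string, startIndex, sentenceBreakSuppressions) {
  const len = string.length;
  if (len === 0 || startIndex >= len) return Infinity;
  // Toy rule: a boundary falls after ".", "!" or "?" plus the whitespace run that follows.
  for (const m of string.matchAll(/[.!?]\s+/g)) {
    const boundary = m.index + m[0].length;
    if (boundary <= startIndex) continue;
    // With "standard", skip breaks that directly follow a listed abbreviation.
    if (
      sentenceBreakSuppressions === "standard" &&
      endsWithSuppression(string.slice(0, m.index + 1))
    ) {
      continue;
    }
    return boundary;
  }
  // No boundary found: the end of the string is the boundary.
  return len;
}

const text = "Dr. Smith arrived. He sat down.";
console.log(findBoundaryAfter(text, 0, "none"));     // 4  (break after "Dr. ")
console.log(findBoundaryAfter(text, 0, "standard")); // 19 (break after "arrived. ")
```

With `"none"` the toy segmenter splits after the abbreviation; with `"standard"` that break is suppressed and the first sentence runs through "arrived.", which is the user-visible difference the option is meant to provide.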