Test InCB=Extend for Gujarati Shadda #957

Merged
merged 2 commits into from
Oct 31, 2024
10 changes: 9 additions & 1 deletion unicodetools/data/ucd/dev/auxiliary/GraphemeBreakTest.html
@@ -7,7 +7,7 @@
<body bgcolor='#FFFFFF'>
<h2>Grapheme_Cluster_Break Chart</h2>
<p><b>Unicode Version:</b> 17.0.0</p>
-<p><b>Date:</b> 2024-10-14, 12:06:04 GMT</p>
+<p><b>Date:</b> 2024-10-30, 21:25:11 GMT</p>
<p>This page illustrates the application of the Grapheme_Cluster_Break specification. The material here is informative, not normative.</p> <p>The first chart shows where breaks would appear between different sample characters or strings. The sample characters are chosen mechanically to represent the different properties used by the specification.</p><p>Each cell shows the break-status for the position between the character(s) in its row header and the character(s) in its column header. The × symbol indicates no break, while the ÷ symbol indicated a break. The cells with × are also shaded to make it easier to scan the table. For example, in the cell at the intersection of the row headed by “CR” and the column headed by “LF”, there is a × symbol, indicating that there is no break between CR and LF.</p>
<p>After the heavy blue line in the table are additional rows, either with different sample characters or for sequences. Some column headers may be composed, reflecting “treat as” or “ignore” rules.</p>
<p>If your browser handles titles (tooltips), then hovering the mouse over the row header will show a sample character of that type. Hovering over a column header will show the sample character, plus its abbreviated general category and script. Hovering over the intersected cells shows the rule number that produces the break-status. For example, hovering over the cell at the intersection of LVT and T shows ×, with the rule 8.0. Checking below the table, rule 8.0 is “( LVT | T) × T”, which is the one that applies to that case. Note that a rule is invoked only when no lower-numbered rules have applied.</p>
@@ -294,6 +294,14 @@ <h3><a href='#samples' name='samples'>Sample Strings</a></h3>
<span title='U+094D DEVANAGARI SIGN VIRAMA (Extend_ConjunctLinkingScripts_ConjunctLinker_ExtCccZwj)'>&#x25CC;&#x94D;</span><span title='9.3'><span>&nbsp;</span>&nbsp;</span>
<span title='U+0924 DEVANAGARI LETTER TA (ConjunctLinkingScripts_LinkingConsonant)'>&#x924;</span><span title='0.3'><span style='border-right: 1px solid blue'>&nbsp;</span>&nbsp;</span>

</font></td></tr>
+<tr><th style='text-align:right'><a href='#s36' name='s36'>36</a></th><td><font size='5'>
+<span title='0.2'><span style='border-right: 1px solid blue'>&nbsp;</span>&nbsp;</span><span title='U+0AB8 GUJARATI LETTER SA (ConjunctLinkingScripts_LinkingConsonant)'>&#xAB8;</span><span title='9.0'><span>&nbsp;</span>&nbsp;</span>
+<span title='U+0AFB GUJARATI SIGN SHADDA (Extend_ConjunctLinkingScripts_ExtCccZwj)'>&#x25CC;&#xAFB;</span><span title='9.0'><span>&nbsp;</span>&nbsp;</span>
+<span title='U+0ACD GUJARATI SIGN VIRAMA (Extend_ConjunctLinkingScripts_ConjunctLinker_ExtCccZwj)'>&#x25CC;&#xACD;</span><span title='9.3'><span>&nbsp;</span>&nbsp;</span>
+<span title='U+0AB8 GUJARATI LETTER SA (ConjunctLinkingScripts_LinkingConsonant)'>&#xAB8;</span><span title='9.0'><span>&nbsp;</span>&nbsp;</span>
+<span title='U+0AFB GUJARATI SIGN SHADDA (Extend_ConjunctLinkingScripts_ExtCccZwj)'>&#x25CC;&#xAFB;</span><span title='0.3'><span style='border-right: 1px solid blue'>&nbsp;</span>&nbsp;</span>
+
+</font></td></tr>
</table>
<hr width='50%'>
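Both chart pages note that a rule is invoked only when no lower-numbered rules have applied. A minimal Java sketch of that first-match evaluation follows: rules are tried in ascending order and the first whose condition holds decides the boundary, so CR followed by LF is settled by rule 3.0 before rule 4.0 is consulted. The `Rule` and `Decision` records and the four sample rules are hypothetical simplifications written for illustration, not the unicodetools implementation; the rule numbers follow the numbering used in the test files (for example, [9.0] for `× (Extend | ZWJ)` and [999.0] for the default break).

```java
import java.util.List;
import java.util.function.BiPredicate;

public class FirstMatchRules {
    // Hypothetical rule: its number, a condition on the classes around the boundary,
    // and whether the rule produces a break (÷) or no break (×).
    record Rule(double number, BiPredicate<String, String> applies, boolean breaks) {}

    // Outcome for one boundary: the rule that fired and its break-status.
    record Decision(double rule, boolean breaks) {}

    // A tiny illustrative subset of grapheme rules over Grapheme_Cluster_Break values.
    static final List<Rule> SAMPLE_RULES = List.of(
            new Rule(3.0, (b, a) -> b.equals("CR") && a.equals("LF"), false),
            new Rule(4.0, (b, a) -> b.equals("Control") || b.equals("CR") || b.equals("LF"), true),
            new Rule(8.0, (b, a) -> (b.equals("LVT") || b.equals("T")) && a.equals("T"), false),
            new Rule(9.0, (b, a) -> a.equals("Extend") || a.equals("ZWJ"), false));

    // Tries rules in list (ascending-number) order; the first matching rule wins.
    static Decision decide(List<Rule> rules, String before, String after) {
        for (Rule r : rules) {
            if (r.applies().test(before, after)) {
                return new Decision(r.number(), r.breaks());
            }
        }
        // No rule matched: default break, labeled 999.0 in the test files.
        return new Decision(999.0, true);
    }

    public static void main(String[] args) {
        System.out.println(decide(SAMPLE_RULES, "CR", "LF"));  // prints Decision[rule=3.0, breaks=false]
        System.out.println(decide(SAMPLE_RULES, "LVT", "T"));  // prints Decision[rule=8.0, breaks=false]
    }
}
```

Keeping the rules in an ordered list makes the priority explicit: moving a rule earlier in the list is exactly what "lower-numbered" means in the chart text.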
7 changes: 4 additions & 3 deletions unicodetools/data/ucd/dev/auxiliary/GraphemeBreakTest.txt
@@ -1,5 +1,5 @@
-# GraphemeBreakTest-16.0.0.txt
-# Date: 2024-05-02, 15:02:48 GMT
+# GraphemeBreakTest-17.0.0.txt
+# Date: 2024-10-30, 21:25:11 GMT
# © 2024 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use and license, see https://www.unicode.org/terms_of_use.html
@@ -1115,7 +1115,8 @@
÷ 0061 × 094D ÷ 0924 ÷ # ÷ [0.2] LATIN SMALL LETTER A (Other) × [9.0] DEVANAGARI SIGN VIRAMA (Extend_ConjunctLinkingScripts_ConjunctLinker_ExtCccZwj) ÷ [999.0] DEVANAGARI LETTER TA (ConjunctLinkingScripts_LinkingConsonant) ÷ [0.3]
÷ 003F × 094D ÷ 0924 ÷ # ÷ [0.2] QUESTION MARK (Other) × [9.0] DEVANAGARI SIGN VIRAMA (Extend_ConjunctLinkingScripts_ConjunctLinker_ExtCccZwj) ÷ [999.0] DEVANAGARI LETTER TA (ConjunctLinkingScripts_LinkingConsonant) ÷ [0.3]
÷ 0915 × 094D × 094D × 0924 ÷ # ÷ [0.2] DEVANAGARI LETTER KA (ConjunctLinkingScripts_LinkingConsonant) × [9.0] DEVANAGARI SIGN VIRAMA (Extend_ConjunctLinkingScripts_ConjunctLinker_ExtCccZwj) × [9.0] DEVANAGARI SIGN VIRAMA (Extend_ConjunctLinkingScripts_ConjunctLinker_ExtCccZwj) × [9.3] DEVANAGARI LETTER TA (ConjunctLinkingScripts_LinkingConsonant) ÷ [0.3]
+÷ 0AB8 × 0AFB × 0ACD × 0AB8 × 0AFB ÷ # ÷ [0.2] GUJARATI LETTER SA (ConjunctLinkingScripts_LinkingConsonant) × [9.0] GUJARATI SIGN SHADDA (Extend_ConjunctLinkingScripts_ExtCccZwj) × [9.0] GUJARATI SIGN VIRAMA (Extend_ConjunctLinkingScripts_ConjunctLinker_ExtCccZwj) × [9.3] GUJARATI LETTER SA (ConjunctLinkingScripts_LinkingConsonant) × [9.0] GUJARATI SIGN SHADDA (Extend_ConjunctLinkingScripts_ExtCccZwj) ÷ [0.3]
#
-# Lines: 1093
+# Lines: 1094
#
# EOF
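Each data line of GraphemeBreakTest.txt lists hex code points separated by ÷ (break) or × (no break), followed by a `#` comment naming the rule that applies at each position. A minimal Java sketch of reading one such line into its expected clusters follows; `BreakTestLine.parse` is a hypothetical helper, not part of unicodetools. For the new Gujarati line it yields a single cluster of five code points. It ignores the comment entirely and does no validation of malformed input.

```java
import java.util.ArrayList;
import java.util.List;

public class BreakTestLine {
    static final String BREAK = "\u00F7";    // DIVISION SIGN: a boundary
    static final String NO_BREAK = "\u00D7"; // MULTIPLICATION SIGN: no boundary

    // Splits the data part of one test line (the text before '#') into expected clusters.
    static List<String> parse(String line) {
        int hash = line.indexOf('#');
        String data = (hash >= 0 ? line.substring(0, hash) : line).trim();
        List<String> clusters = new ArrayList<>();
        if (data.isEmpty()) {
            return clusters; // comment-only line such as "#" or "# EOF"
        }
        StringBuilder current = new StringBuilder();
        for (String token : data.split("\\s+")) {
            if (token.equals(BREAK)) {
                if (current.length() > 0) { // close the cluster built so far
                    clusters.add(current.toString());
                    current.setLength(0);
                }
            } else if (!token.equals(NO_BREAK)) {
                current.appendCodePoint(Integer.parseInt(token, 16)); // hex code point
            }
        }
        if (current.length() > 0) {
            clusters.add(current.toString()); // lines normally end with a break mark anyway
        }
        return clusters;
    }

    public static void main(String[] args) {
        String line = BREAK + " 0AB8 " + NO_BREAK + " 0AFB " + NO_BREAK + " 0ACD "
                + NO_BREAK + " 0AB8 " + NO_BREAK + " 0AFB " + BREAK + " # Gujarati sample";
        System.out.println(parse(line).size()); // prints 1
    }
}
```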
4 changes: 2 additions & 2 deletions unicodetools/data/ucd/dev/auxiliary/LineBreakTest.html
@@ -7,7 +7,7 @@
<body bgcolor='#FFFFFF'>
<h2>Line_Break Chart</h2>
<p><b>Unicode Version:</b> 17.0.0</p>
-<p><b>Date:</b> 2024-10-14, 12:25:22 GMT</p>
+<p><b>Date:</b> 2024-10-30, 21:25:12 GMT</p>
<p>This page illustrates the application of the Line_Break specification. The material here is informative, not normative.</p> <p>The first chart shows where breaks would appear between different sample characters or strings. The sample characters are chosen mechanically to represent the different properties used by the specification.</p><p>Each cell shows the break-status for the position between the character(s) in its row header and the character(s) in its column header. The × symbol indicates no break, while the ÷ symbol indicated a break. The cells with × are also shaded to make it easier to scan the table. For example, in the cell at the intersection of the row headed by “CR” and the column headed by “LF”, there is a × symbol, indicating that there is no break between CR and LF.</p>
<p>Some column headers may be composed, reflecting “treat as” or “ignore” rules.</p>
<p>If your browser handles titles (tooltips), then hovering the mouse over the row header will show a sample character of that type. Hovering over a column header will show the sample character, plus its abbreviated general category and script. Hovering over the intersected cells shows the rule number that produces the break-status. For example, hovering over the cell at the intersection of H3 and JT shows ×, with the rule 26.03. Checking below the table, rule 26.03 is “JT | H3 × JT”, which is the one that applies to that case. Note that a rule is invoked only when no lower-numbered rules have applied.</p>
@@ -93,7 +93,7 @@ <h3><a href='#rules' name='rules'>Rules</a></h3>
<tr><th style='text-align:right'><a href='#r7.02' name='r7.02'>7.02</a></th><td style='text-align:right'></td><td>×</td><td> ZW</td></tr>
<tr><th style='text-align:right'><a href='#r8.0' name='r8.0'>8.0</a></th><td style='text-align:right'>ZW SP* </td><td>÷</td><td></td></tr>
<tr><th style='text-align:right'><a href='#r8.1' name='r8.1'>8.1</a></th><td style='text-align:right'>ZWJ_O </td><td>×</td><td></td></tr>
-<tr><th style='text-align:right'><a href='#r9.0' name='r9.0'>9.0</a></th><td style='text-align:right'>(?&lt;X&gt;[^SP BK CR LF NL ZW]) ( CM | ZWJ )* </td><td>→</td><td> {X}</td></tr>
+<tr><th style='text-align:right'><a href='#r9.0' name='r9.0'>9.0</a></th><td style='text-align:right'>(?&lt;X&gt;[^BK CR LF NL SP ZW]) ( CM | ZWJ )* </td><td>→</td><td> {X}</td></tr>
<tr><th style='text-align:right'><a href='#r10.0' name='r10.0'>10.0</a></th><td style='text-align:right'>( CM | ZWJ ) </td><td>→</td><td> A</td></tr>
<tr><th style='text-align:right'><a href='#r11.01' name='r11.01'>11.01</a></th><td style='text-align:right'></td><td>×</td><td> WJ</td></tr>
<tr><th style='text-align:right'><a href='#r11.02' name='r11.02'>11.02</a></th><td style='text-align:right'>WJ </td><td>×</td><td></td></tr>
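The Line_Break hunk only reorders the classes excluded from X in rule 9.0 (SP moves after NL); the excluded set is unchanged. As an illustration of what rules 9.0 and 10.0 do, here is a minimal Java sketch over a sequence of Line_Break class names: a CM or ZWJ following a base X that is not BK, CR, LF, NL, SP, or ZW is absorbed into that base, and any mark left over is treated as the class the chart writes as `A` (UAX #14 LB10 names it AL). The `CombiningMarkReduction` class is hypothetical and tracks only class names, not original positions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class CombiningMarkReduction {
    // Classes that cannot act as the base X in rule 9.0, as listed in the chart.
    static final Set<String> NOT_A_BASE = Set.of("BK", "CR", "LF", "NL", "SP", "ZW");

    // Applies rule 9.0 (X (CM | ZWJ)* -> X), then rule 10.0 (remaining CM | ZWJ -> A).
    // Returns one class name per unit left after the reduction.
    static List<String> reduce(List<String> classes) {
        List<String> out = new ArrayList<>();
        for (String cls : classes) {
            boolean isMark = cls.equals("CM") || cls.equals("ZWJ");
            if (isMark && !out.isEmpty() && !NOT_A_BASE.contains(out.get(out.size() - 1))) {
                continue; // absorbed into the preceding base X
            }
            // A mark with no usable base becomes A, which can itself absorb later marks.
            out.add(isMark ? "A" : cls);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(reduce(List.of("AL", "CM", "ZWJ", "NU"))); // prints [AL, NU]
        System.out.println(reduce(List.of("SP", "CM", "CM")));        // prints [SP, A]
    }
}
```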
@@ -1393,7 +1393,10 @@ public GenerateGraphemeBreakTest(UCD ucd, Segmenter.Target target) {
"क" + "\u094D" + "a",
"a" + "\u094D" + "त",
"?" + "\u094D" + "त",
-"क" + "\u094D\u094D" + "त"));
+"क" + "\u094D\u094D" + "त",
+// From L2/14-131, §3.2; made into a single EGC by 179-C31.
+// This test would have caught ICU-22956.
+"સૻ્સૻ"));
}
}
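The new sample string is SA, SHADDA, VIRAMA, SA, SHADDA, and the test data expects it to stay one cluster, with [9.3] (the test files' label for the conjunct rule GB9c) applying before the second SA. That works only because SHADDA is InCB=Extend, so the backward scan from the second SA may skip it on the way to the first SA. A minimal Java sketch of that GB9c check follows. The `ConjunctCheck` class and its hard-coded InCB table, covering just the characters in these tests, are hypothetical; this is not the unicodetools or ICU implementation.

```java
import java.util.Map;

public class ConjunctCheck {
    enum InCB { CONSONANT, EXTEND, LINKER, NONE }

    // Hypothetical InCB values for only the characters used in these samples.
    static final Map<Integer, InCB> INCB = Map.of(
            0x0915, InCB.CONSONANT, // DEVANAGARI LETTER KA
            0x0924, InCB.CONSONANT, // DEVANAGARI LETTER TA
            0x094D, InCB.LINKER,    // DEVANAGARI SIGN VIRAMA
            0x0AB8, InCB.CONSONANT, // GUJARATI LETTER SA
            0x0ACD, InCB.LINKER,    // GUJARATI SIGN VIRAMA
            0x0AFB, InCB.EXTEND);   // GUJARATI SIGN SHADDA

    static InCB incb(int cp) {
        return INCB.getOrDefault(cp, InCB.NONE);
    }

    // GB9c: Consonant [Extend Linker]* Linker [Extend Linker]* (no break) Consonant.
    // Returns true if GB9c forbids a break before cps[i].
    static boolean gb9cNoBreakBefore(int[] cps, int i) {
        if (i <= 0 || incb(cps[i]) != InCB.CONSONANT) {
            return false;
        }
        boolean sawLinker = false;
        int j = i - 1;
        // Scan back over Extend and Linker characters, remembering whether a Linker occurred.
        while (j >= 0 && (incb(cps[j]) == InCB.EXTEND || incb(cps[j]) == InCB.LINKER)) {
            if (incb(cps[j]) == InCB.LINKER) {
                sawLinker = true;
            }
            j--;
        }
        // The run must contain a Linker and start right after a Consonant.
        return sawLinker && j >= 0 && incb(cps[j]) == InCB.CONSONANT;
    }

    public static void main(String[] args) {
        int[] gujarati = {0x0AB8, 0x0AFB, 0x0ACD, 0x0AB8, 0x0AFB};
        System.out.println(gb9cNoBreakBefore(gujarati, 3)); // prints true
    }
}
```

If SHADDA were instead treated as InCB=None, the scan would stop at it and the function would return false, splitting the string into two clusters.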
