Exclude C1 control characters from nonascii #630

Merged (2 commits) on Oct 9, 2024
86 changes: 43 additions & 43 deletions docs/source/02_Terminology.md
@@ -142,46 +142,46 @@ is fixed or noted.
Starting with HED standard schema versions 8.3.0 and above, HED will allow UTF-8 characters in various settings.
The types of characters referred to in this specification are:

| Name | Description |
|-----------------| ----------- |
| `alphanumeric` | `letters` and/or `digits` |
| `ampersand` | ASCII code 38 |
| `ascii` | utf-8 codes 0 to 127 (single byte) |
| `asterisk` | ASCII code 42 |
| `at-sign` | ASCII code 64 |
| `backslash` | ASCII code 92 |
| `blank` | ASCII code 32 |
| `caret` | ASCII code 94 |
| `colon` | ASCII code 58 |
| `comma` | ASCII code 44 |
| `dollar` | ASCII code 36 |
| `digits` | 0-9 |
| `double-quote` | ASCII code 34 |
| `equals` | ASCII code 61 |
| `exclamation` | ASCII code 33 |
| `forward-slash` | ASCII code 47 |
| `greater-than` | ASCII code 62 |
| `hyphen` | ASCII code 45 |
| `left-paren` | ASCII code 40 |
| `less-than` | ASCII code 60 |
| `letters` | `lowercase` and/or `uppercase` |
| `lowercase` | ASCII characters a-z |
| `name` | `alphanumeric`, `hyphen`, `period`, `underscore`, `nonascii` |
| `newline` | ASCII code 10 (linefeed) |
| `nonascii` | utf-8 codes greater than 128 (multi-byte) |
| `number-sign` | ASCII code 35 |
| `numeric` | digits, period, hyphen, plus, caret, E, e |
| `percent-sign` | ASCII code 37 |
| `period` | ASCII code 46 |
| `plus` | ASCII code 43 |
| `printable` | ASCII 32 <= code < 127 |
| `question-mark` | ASCII code 63 |
| `right-paren` | ASCII code 41 |
| `semicolon` | ASCII code 59 |
| `single-quote` | ASCII code 39 |
| `tab` | ASCII code 09 |
| `text` | `printable` and/or `nonascii` excluding comma and curly braces.|
| `tilde` | ASCII code 126 |
| `underscore` | ASCII code 95 |
| `uppercase` | ASCII characters A-Z |
| `vertical-bar` | ASCII code 124 |
| Name | Description |
|-----------------|-----------------------------------------------------------------|
| `alphanumeric` | `letters` and/or `digits` |
| `ampersand` | ASCII code 38 |
| `ascii` | utf-8 codes 0 to 127 (single byte) |
| `asterisk` | ASCII code 42 |
| `at-sign` | ASCII code 64 |
| `backslash` | ASCII code 92 |
| `blank` | ASCII code 32 |
| `caret` | ASCII code 94 |
| `colon` | ASCII code 58 |
| `comma` | ASCII code 44 |
| `dollar` | ASCII code 36 |
| `digits` | 0-9 |
| `double-quote` | ASCII code 34 |
| `equals` | ASCII code 61 |
| `exclamation` | ASCII code 33 |
| `forward-slash` | ASCII code 47 |
| `greater-than` | ASCII code 62 |
| `hyphen` | ASCII code 45 |
| `left-paren` | ASCII code 40 |
| `less-than` | ASCII code 60 |
| `letters` | `lowercase` and/or `uppercase` |
| `lowercase` | ASCII characters a-z |
| `name` | `alphanumeric`, `hyphen`, `period`, `underscore`, `nonascii` |
| `newline` | ASCII code 10 (linefeed) |
| `nonascii` | utf-8 codes >= 160 (multi-byte) |
| `number-sign` | ASCII code 35 |
| `numeric` | digits, period, hyphen, plus, caret, E, e |
| `percent-sign` | ASCII code 37 |
| `period` | ASCII code 46 |
| `plus` | ASCII code 43 |
| `printable` | ASCII 32 <= code < 127 |
| `question-mark` | ASCII code 63 |
| `right-paren` | ASCII code 41 |
| `semicolon` | ASCII code 59 |
| `single-quote` | ASCII code 39 |
| `tab` | ASCII code 09 |
| `text` | `printable` and/or `nonascii` excluding comma and curly braces. |
| `tilde` | ASCII code 126 |
| `underscore` | ASCII code 95 |
| `uppercase` | ASCII characters A-Z |
| `vertical-bar` | ASCII code 124 |
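
The revised `nonascii` row can be read as a set of simple predicates. The following is a minimal sketch, assuming that "utf-8 codes" in the table refers to Unicode code points as returned by Python's `ord`. The function names are hypothetical and are not part of any HED tool.

```python
import string

# Hypothetical helpers that mirror rows of the character class table.
# Assumption: "codes" are Unicode code points of single characters.

def is_printable(ch: str) -> bool:
    """`printable`: ASCII 32 <= code < 127."""
    return 32 <= ord(ch) < 127


def is_nonascii(ch: str) -> bool:
    """`nonascii` after this change: code >= 160.

    Codes 128 to 159 (the C1 control characters) are no longer included.
    """
    return ord(ch) >= 160


def is_name_char(ch: str) -> bool:
    """`name`: alphanumeric, hyphen, period, underscore, or nonascii."""
    is_alphanumeric = ch in string.ascii_letters + string.digits
    return is_alphanumeric or ch in "-._" or is_nonascii(ch)
```

Under these assumptions, `is_nonascii("\u009e")` is false while `is_nonascii("\u00e9")` is true, which matches the intent of excluding C1 control characters.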
3 changes: 3 additions & 0 deletions docs/source/Appendix_A.md
@@ -199,6 +199,9 @@ behavior of certain value classes (for example the `numericClass` value class).
- Valid International Resource Identifier as standardized by [rfc3987](https://datatracker.ietf.org/doc/html/rfc3987).
``````

See [**2.2 Character sets and restrictions**](./02_Terminology.md#22-character-sets_and_restrictions) for
definitions of the various character classes.

````{admonition} Notes on rules for allowed characters in the HED schema.
:class: tip

4 changes: 2 additions & 2 deletions docs/source/Appendix_B.md
@@ -25,15 +25,15 @@ of errors keyed to the HED specification.

A HED string contains an invalid character.

**a.** A non-printable character (ASCII code < 32 or == 127) appears in a HED string.
**a.** An invalid character (character code < 32 or 127 <= character code < 160) appears in a HED string.
**b.** Curly braces appear in a HED string not in a sidecar.


**Notes:**
- Starting with HED 8.3.0, HED supports UTF-8 encoding.
- Different parts of a HED string have different rules for acceptable characters.

See
See also:
[**3.2.4 Tags that take values**](03_HED_formats.md#324-tags-that-take-values) and
[**3.2.5: Tag extensions**](03_HED_formats.md#325-tag-extensions) for
an explanation of the rules for tag values and extensions.
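
The two conditions of this error can be sketched as a single scan over the string. This is only an illustration of the general rule as worded above, assuming character codes are Unicode code points; the function name and the `in_sidecar` flag are hypothetical, and the sketch does not apply the stricter rules that hold for particular parts of a HED string.

```python
from typing import List


def invalid_character_positions(hed_string: str, in_sidecar: bool = False) -> List[int]:
    """Return the indices of characters that would trigger CHARACTER_INVALID.

    Hypothetical helper, not part of the HED validators.
    """
    positions = []
    for index, ch in enumerate(hed_string):
        code = ord(ch)
        if code < 32 or 127 <= code < 160:
            # Condition a: C0 controls, DEL, and C1 controls.
            positions.append(index)
        elif ch in "{}" and not in_sidecar:
            # Condition b: curly braces outside of a sidecar.
            positions.append(index)
    return positions


# Strings taken from the test files in this pull request.
print(invalid_character_positions("Item/Bl\b"))
print(invalid_character_positions("Item/ABC\u009e"))
print(invalid_character_positions("Item/{abc}"))
print(invalid_character_positions("Red, Blue, Description/Red"))
```

The first three strings each yield at least one position, while the last yields an empty list, consistent with their placement under `fails` and `passes` in the test files.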
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -95,4 +95,4 @@
html_static_path = ['_static']
html_css_files = [
'custom.css',
]
]
15 changes: 14 additions & 1 deletion tests/javascript_tests.json
@@ -17,7 +17,8 @@
"tests": {
"string_tests": {
"fails": [
"Item/Bl\b"
"Item/Bl\b",
"Item/ABC\u009e"
],
"passes": [
"Red, Blue, Description/Red",
@@ -57,6 +58,18 @@
0,
"Item/Bl\b"
]
],
[
[
"onset",
"duration",
"HED"
],
[
4.5,
0,
"Item/{abc}"
]
]
],
"passes": [
7 changes: 6 additions & 1 deletion tests/json_tests/CHARACTER_INVALID.json
@@ -10,7 +10,8 @@
"tests": {
"string_tests": {
"fails": [
"Item/Bl\b"
"Item/Bl\b",
"Item/ABC\u009E"
],
"passes": [
"Red, Blue, Description/Red",
@@ -42,6 +43,10 @@
[
["onset", "duration", "HED"],
[ 4.5, 0, "Item/Bl\b"]
],
[
["onset", "duration", "HED"],
[ 4.5, 0, "Item/{abc}"]
]
],
"passes": [
15 changes: 14 additions & 1 deletion tests/python_tests.json
@@ -17,7 +17,8 @@
"tests": {
"string_tests": {
"fails": [
"Item/Bl\b"
"Item/Bl\b",
"Item/ABC\u009e"
],
"passes": [
"Red, Blue, Description/Red",
@@ -57,6 +58,18 @@
0,
"Item/Bl\b"
]
],
[
[
"onset",
"duration",
"HED"
],
[
4.5,
0,
"Item/{abc}"
]
]
],
"passes": [
6 changes: 3 additions & 3 deletions tests/run_consolidate_tests.py
@@ -20,8 +20,8 @@ def combine_tests(test_names, test_dir, output_path):
def main(exclude_names=[], out_name='temp.json'):
relative_dir = "json_tests" # relative directory to read

script_dir = os.path.dirname(os.path.abspath(__file__)) # directory of this script
target_dir = os.path.join(script_dir, relative_dir) # full path of the
script_dir = os.path.dirname(os.path.abspath(__file__)) # directory of this script
target_dir = os.path.join(script_dir, relative_dir) # full path of the

# Write the indicated files
file_names = [f for f in os.listdir(target_dir) if os.path.isfile(os.path.join(target_dir, f))]
@@ -30,7 +30,7 @@ def main(exclude_names=[], out_name='temp.json'):


if __name__ == '__main__':
exclude_names =['SCHEMA', 'TAG_NAMESPACE', 'VERSION_DEPRECATED']
exclude_names = ['SCHEMA', 'TAG_NAMESPACE', 'VERSION_DEPRECATED']

javascript_name = "javascript_tests.json"
main(exclude_names, javascript_name)
8 changes: 0 additions & 8 deletions tests/test_summarize_testdata.py
@@ -11,7 +11,6 @@ def setUpClass(cls):
cls.test_files = [os.path.join(test_dir, f) for f in os.listdir(test_dir)
if os.path.isfile(os.path.join(test_dir, f))]


@staticmethod
def get_test_info(test_file, details=True):
indent = " "
@@ -55,13 +54,6 @@ def test_summary(self):
print(out_str)
self.assertEqual(True, True) # add assertion here

# def test_summary_full(self):
# for test_file in self.test_files:
# print(test_file)
# out_str = self.get_test_info(test_file, details=True)
# print(out_str + '\n')
#
# self.assertEqual(True, True) # add assertion here


if __name__ == '__main__':