tilgovi added a commit to apache/incubator-annotator that referenced this issue on Apr 3, 2020:
Due to unresolved questions about the fragment identifier format and
difficulties around parsing the fragment identifier correctly, remove
the fragment-identifier package entirely for the time being.
See also [w3c/web-annotation#443].
Close #66.
[w3c/web-annotation#443]: w3c/web-annotation#443
In the note about selectors and states, section 5:
The referenced RFC 3986 defines the following grammar for a fragment identifier:
Some of the note’s examples leave characters unencoded that are not actually valid in a fragment identifier, and so would need to be percent-encoded.
Square brackets `[` `]` are found in e.g. example 18:
Angular brackets `<` `>` (which delimit URIs, so cannot be used inside them) are found in e.g. example 17:
May this be worth a thorough review?
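As a rough illustration of the point about invalid characters, here is a minimal TypeScript sketch that checks a string against the RFC 3986 fragment production (`fragment = *( pchar / "/" / "?" )`, with `pchar = unreserved / pct-encoded / sub-delims / ":" / "@"`). The function names `isValidFragment` and `invalidFragmentChars` and the sample strings are made up for this sketch; they are not taken from the note or from any tool.

```typescript
// Characters that RFC 3986 allows unencoded in a fragment:
//   unreserved: A-Z a-z 0-9 - . _ ~
//   sub-delims: ! $ & ' ( ) * + , ; =
//   plus ":" "@" "/" "?"
// Anything else has to appear as a percent-encoded triplet (%XX).
const FRAGMENT_RE =
  /^(?:[A-Za-z0-9\-._~!$&'()*+,;=:@\/?]|%[0-9A-Fa-f]{2})*$/;

// True if the whole string matches the fragment production.
function isValidFragment(fragment: string): boolean {
  return FRAGMENT_RE.test(fragment);
}

// List each distinct character that would need percent-encoding,
// in order of first appearance.
function invalidFragmentChars(fragment: string): string[] {
  // Drop well-formed %XX triplets first so their "%" is not reported.
  const withoutEscapes = fragment.replace(/%[0-9A-Fa-f]{2}/g, "");
  const allowed = /[A-Za-z0-9\-._~!$&'()*+,;=:@\/?]/;
  const found: string[] = [];
  for (let i = 0; i < withoutEscapes.length; i++) {
    const ch = withoutEscapes.charAt(i);
    if (!allowed.test(ch) && found.indexOf(ch) === -1) {
      found.push(ch);
    }
  }
  return found;
}

// Hypothetical inputs, chosen only to exercise the brackets discussed above.
const bracketProblems = invalidFragmentChars("value=/div[1]/p<2>");
const encodedIsFine = isValidFragment("value=/div%5B1%5D/p");
```

Note that `(` and `)` are sub-delims, so this check accepts them; under RFC 3986 alone they are valid in a fragment.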
Parentheses
Moreover (feel free to factor out into a separate issue): regarding the list of characters that must be percent-encoded, I wonder if it has been considered to include `(` and `)` here. Their presence may not strictly make the output ambiguous, but does make it more complex to parse.
For example, I might quote `(but not crazy)` in the text `this is an artificial (but not crazy) example` with a RangeSelector: (line breaks inserted for readability)
Note the total of three closing parentheses after `crazy`. I think one cannot decide how many of these are part of the cited string (one), and how many are part of the `selector(…)` syntax (two), without the parser either backtracking or keeping track of its recursion depth.
The proof of concept converter tool is based on PEG.js, which does not support backtracking, so if I am not mistaken it cannot parse this. In fact, that tool does not allow parentheses in the values at all: see the last line of the source, where it defines `validchar` as any of `a-zA-Z0-9<>/[]:%+@.-!$&;*_` (is this list based on a particular spec?).
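To make the parsing concern concrete, here is a small TypeScript sketch of the two options: tracking nesting depth to find the closing parenthesis of a `selector(…)`, or percent-encoding `(` and `)` in values so that no depth tracking is needed. The fragment string is a hypothetical one written for this sketch, not the example from this issue, and `findMatchingClose` and `encodeValue` are made-up helper names.

```typescript
// Hypothetical fragment (an assumption, not the original example): the quoted
// value "(but not crazy)" keeps its parentheses unencoded, so the string ends
// in three closing parentheses.
const fragment =
  "selector(type=RangeSelector," +
  "startSelector=selector(type=TextQuoteSelector,exact=artificial)," +
  "endSelector=selector(type=TextQuoteSelector,exact=(but%20not%20crazy)))";

// Return the index of the ")" that closes the "(" at openIndex, found by
// counting nesting depth; returns -1 if the parentheses never balance.
function findMatchingClose(text: string, openIndex: number): number {
  let depth = 0;
  for (let i = openIndex; i < text.length; i++) {
    const ch = text.charAt(i);
    if (ch === "(") {
      depth++;
    } else if (ch === ")") {
      depth--;
      if (depth === 0) return i;
    }
  }
  return -1;
}

// Percent-encode a value including "(" and ")", which the built-in
// encodeURIComponent leaves untouched.
function encodeValue(value: string): string {
  return encodeURIComponent(value).replace(
    /[()]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

const outerOpen = fragment.indexOf("(");
const outerClose = findMatchingClose(fragment, outerOpen);
const encoded = encodeValue("(but not crazy)");
```

Depth counting only gives the right answer while the parentheses inside each quoted value are themselves balanced; a quote containing a lone `)` would close a selector early. Encoding the parentheses in values avoids that case entirely, which is one argument for adding them to the must-encode list.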