-
I personally like this very much! As a user, I was also surprised that djot
parses markup in links, but then does nothing with it (see the third bullet
from https://github.com/jgm/djot/issues/232). I wasn’t able to pin down
exactly what’s wrong, but I think you nailed it perfectly — indeed, the
problem seems to be that the link text should be treated as raw, verbatim
text, rather than as something which can have markup inside. This makes
sense because, like raw inlines, links are leaf elements and can’t be nested
further.
…On Saturday, 30 September 2023, Natasha Kerensikova < ***@***.***> wrote:
Hello, I'm here to question the parsing of *[link](url*), and not only
because my WIP parser has some issues parsing it.
First let's consider a regular emphasis instead of a strong one, because
these are much more common in URLs. Then let's consider a slightly more
contrived example:
_``foo_bar``{foo_bar=baz} [link](foo_bar) [link][foo_bar] end._
Can you spot where emphases start and end?
In more abstract (but slightly biased) terms, the rationale mentions three
types of containers: block-level, inline-level, and raw text. In my (young)
mental model, a _ in raw text is just a _ and has nothing to do with emphasis,
so I don't expect ``foo_bar`` to open or close an emphasis. I can easily
classify attributes in the same raw text type, because of the "low-level"
and "ast-leaf" feel of attributes.
And since references and direct links also don't contain anything other
than a string and are not "real" text, I would be tempted to classify them
as raw text as well.
It turns out that currently, they are not raw text, but they are
(obviously) not inline-level either. They are in a weird fourth type, where
emphasis can be closed but not opened (see #88). This fourth type is a
significant burden on my mental model, and I think it would be a good thing
to see it gone.
I found and understand the "infinite look-ahead" issue of ](...),
and yet AFAICT we already have the same issue with attributes, looking all
the way to } or the end of the current block before deciding whether
foo_bar=baz contains an emphasis delimiter.
So at this point, as a user wanting a lightweight cognitive load from
her lightweight markup language, my backwards-incompatible proposition is
to treat ](, ][, and attribute-opening { the same way we treat inline
code spans: they start a URL/reference/attribute span, without any emphasis
or any other inline-element delimiter, all the way to their corresponding
closing marker or to the end of the block. Maybe ]( should be implicitly
closed by the next ASCII space or tab instead of the end of the block.
I think there could be a case to expand this implicitly-closing scheme to
all inline elements, so that there is no spooky interaction at a distance
which makes _foo open an emphasis or not depending on whether a match can
be found before the end of the block (but if looking all the way to the end
of the block is too much cognitive overload, your block is too long, so
it's more a parser-writer matter than a user matter). I guess it would be a
matter of the trade-off between consistency and false-positive rate. And we
can't avoid spooky interaction in foo_bar anyway.
-
#232 shows a blind spot in my proposal, in that some way of escaping a closing parenthesis would be needed:
There is no such issue with references, because references already appear only within brackets, so the syntax naturally prevents any embedded
There is a somewhat related issue with attributes: if we don't backtrack, what should we do when encountering a parse error? I would be tempted to extend attribute syntax to make any text parsable, but including all punctuation feels messy. Maybe using for
-
There's an important disanalogy: anything can count as verbatim text (inline code), but there are restrictions on what can be a link destination, reference, or attribute. Suppose someone writes
How, exactly, do we treat this as an attribute?
-
That is indeed what is left to debate to turn this proposition into a specification. I wouldn't mind treating it as Unspecified Behavior, since the construct is not valid, and letting whatever results be implementation-defined. However, I agree there is value (for users) in consistency across implementations; I just think it places an upper bound on the amount of effort spent specifying what happens in anomalous cases.
What I had in mind was to parse the URL/reference/attribute all the way to its corresponding closing marker, or to the end of the block, or to where it can't be parsed anymore. In that case it depends on how much attribute syntax is extended. With current attribute restrictions, this would backtrack (only) from the space after
If we extended attribute syntax to allow null values, that would make the following (which I think is still valid) HTML:
If we extended attribute syntax to allow the same character set for keys and bare values as we already do for names, it would be a span with 5 null-valued attributes, but I don't know how an HTML renderer could make something useful with them.
Similarly, for URLs an implicit closing would be added before the first whitespace, while for references AFAICT there is no way to include something invalid.
I agree that none of these ideas are really useful outcomes for the user, at least directly. However, having an invisible part of the construct eat everything until the end of the block makes debugging much easier by pointing exactly where the syntax error is. As a user, this is something I prefer by far compared to trying to second-guess what I could have meant and getting it almost-but-not-quite right (to the point that I have removed all brackets from my website so I can use a text-search to find markdown links I mistyped).
-
This seems reasonable. What would be really nice (in terms of avoiding implementation complexity) would be to eliminate the need for any backtracking in the attribute parser. For example, we could treat an unclosed quoted value as implicitly closed.
Another alternative would be to throw an error in any of these conditions. That's not something that markdown parsers have ever traditionally done -- every document is valid markdown, it's just a question of what it means. And it can be nice in many contexts not to have to worry about the possibility of an error. But arguably this would be more helpful to the user.
-
Assuming we allow null-valued attributes (promoting them to empty-string-valued attributes if needed), we can end attribute parsing at the first unrecognized character: if that happens in a value, it ends the key-value pair, and if it happens in a key, it ends a null-valued key. Either way, the unrecognized character can be handed over to the inline parser. Depending on the parser architecture, that might count as a one-character backtrack or a natural data flow. That is the simplest I can imagine, and it would lead to a slightly worse rendering for your example:
Being a
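To make the "stop at the first unrecognized character" idea concrete, here is a minimal Python sketch. It is not taken from any existing Djot implementation: `parse_attributes` and `NAME_CHARS` are hypothetical names, the accepted character set is an assumption, and `.class`, `#id`, comments, and backslash escapes are left out. It also assumes an unclosed quoted value is implicitly closed at the end of the input, as suggested earlier in the thread.

```python
import string

# Assumed character set for keys and bare values (illustrative only).
NAME_CHARS = set(string.ascii_letters + string.digits + "_-:")


def parse_attributes(text: str, start: int) -> tuple[dict[str, str], int]:
    """Parse attributes after an opening `{` located at text[start].

    Returns the attributes found and the index at which the inline parser
    should resume. Parsing never backtracks: it stops at `}`, at the end of
    the input, or at the first unrecognized character.
    """
    attrs: dict[str, str] = {}
    i = start + 1
    n = len(text)
    while i < n:
        ch = text[i]
        if ch == "}":
            return attrs, i + 1  # regular closing marker
        if ch in " \t\n":
            i += 1
            continue
        if ch not in NAME_CHARS:
            return attrs, i  # hand this character over to the inline parser
        key_start = i
        while i < n and text[i] in NAME_CHARS:
            i += 1
        key = text[key_start:i]
        value = ""  # null-valued key promoted to an empty string
        if i < n and text[i] == "=":
            i += 1
            if i < n and text[i] == '"':
                i += 1
                value_start = i
                while i < n and text[i] != '"':
                    i += 1
                value = text[value_start:i]
                if i < n:
                    i += 1  # skip the closing quote when there is one
            else:
                value_start = i
                while i < n and text[i] in NAME_CHARS:
                    i += 1
                value = text[value_start:i]
        attrs[key] = value
    return attrs, n  # implicitly closed by the end of the input
```

With this sketch, `parse_attributes("{a b=c !rest", 0)` yields two attributes (`a` with an empty value, `b` with value `c`) and resumes inline parsing at the `!`.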
-
We're considering Djot syntax for a documentation tool, and at least having a version of the spec in which these cases are simply parse errors would really increase its appeal. I hear feedback from users today, who are mostly writing in Markdown and reST using other tools, that they have to spend a bunch of time double-checking output because typos in their markup silently lead to unintended HTML rather than to red squiggle underlines.
-
Thinking about it a bit more, I would like to add two more points.
First, inline code spans are not the only precedent where Djot has opening markers which open unconditionally, and where a lack of closing marker automatically closes the element when its parent is closed. This is also the case at block level with fenced divs and fenced code blocks:
So I'm tempted to argue that the logic can be extended not only to attribute spans and inline URLs, but also to all span elements. Let
I have no idea what impact it would have on parsers (mine or any other); I came up with this idea considering only the parser in my brain when I look at some Djot source (to be fair, that brain usually relies on syntax coloration, so that would mean Djot source in a terminal or a textarea).
The second point is an answer to the comment raised by @david-christiansen. I think the official parsers as well as mine already have some kind of warning mechanism (at least for unmatched link references), which could be leveraged by having the parser in the documentation tool have a kind of
What I mean is that the specific parser could help with that situation, while still allowing other parsers to be more lax and output something they guess the user might have meant for any malformed input.
So at the Djot specification level, the question becomes whether to standardize only correct input and let parsers do whatever they want in other cases, or to go further and specify a set of errors and/or warnings and fallbacks for some or all incorrect input. I don't have a strong opinion either way; I'd rather leave that to the vision of those who were here before me. A tighter specification is more work for better interoperability, at the risk of mandating a certain parser architecture (I saw that in CommonMark), while a looser specification allows more parser diversity at the risk of surprising users when going from one parser to another (which is one of the main issues with Markdown).
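As a rough illustration of the lax-versus-strict split described above, here is a small Python sketch of a checker that reports verbatim spans left open at the end of a block. In lax mode it returns warnings and accepts the implicit closing; in strict mode it raises instead. Everything here is an assumption for illustration: `check_block`, `StrictModeError`, and the `strict` flag are hypothetical names, not part of any existing Djot parser, and backslash-escaped backticks are ignored.

```python
import re


class StrictModeError(Exception):
    """Hypothetical error type, raised only when strict=True."""


def check_block(block: str, strict: bool = False) -> list[str]:
    """Report a verbatim span that is closed only by the end of the block."""
    opener = None
    for run in re.finditer(r"`+", block):
        if opener is None:
            opener = run  # a run of backticks opens a span unconditionally
        elif len(run.group()) == len(opener.group()):
            opener = None  # only a run of the same length closes it
    warnings: list[str] = []
    if opener is not None:
        message = (
            f"verbatim span opened at offset {opener.start()} "
            "is implicitly closed at end of block"
        )
        if strict:
            raise StrictModeError(message)
        warnings.append(message)
    return warnings
```

A documentation tool could call `check_block(text, strict=True)` to turn such typos into hard errors, while a more permissive parser could log the returned warnings and render the implicitly closed span anyway.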
-
Currently, any non-closing marker can open an emphasis/strong span, and any non-opening marker can close it. Perhaps it would be more logical if only an opening marker could open an inline element, and only a closing marker could close it. This would also look more natural, since in modern texts using the underscore to connect words is standard practice. The only “inconvenient” case would then look like:
But for such a rare situation, an escape comes in handy.
-
Hello, I'm here to question the parsing of `*[link](url*)`, and not only because my WIP parser has some issues parsing it.
First let's consider a regular emphasis instead of a strong one, because these are much more common in URLs. Then let's consider a slightly more contrived example:
    _``foo_bar``{foo_bar=baz} [link](foo_bar) [link][foo_bar] end._
Can you spot where emphases start and end?
In more abstract (but slightly biased) terms, the rationale mentions three types of containers: block-level, inline-level, and raw text. In my (young) mental model, a `_` in raw text is just a `_` and has nothing to do with emphasis, so I don't expect ``foo_bar`` to open or close an emphasis. I can easily classify attributes in the same raw text type, because of the "low-level" and "ast-leaf" feel of attributes.
And since references and direct links also don't contain anything other than a string and are not "real" text, I would be tempted to classify them as raw text as well.
It turns out that currently, they are not raw text, but they are (obviously) not inline-level either. They are in a weird fourth type, where emphasis can be closed but not opened (see #88). This fourth type is a significant burden on my mental model, and I think it would be a good thing to see it gone.
I found and understand the "infinite look-ahead" issue of `](...)`, and yet AFAICT we already have the same issue with attributes, looking all the way to `}` or the end of the current block before deciding whether `foo_bar=baz` contains an emphasis delimiter.
So at this point, as a user wanting a lightweight cognitive load from her lightweight markup language, my backwards-incompatible proposition is to treat `](`, `][`, and attribute-opening `{` the same way we treat inline code spans: they start a URL/reference/attribute span, without any emphasis or any other inline-element delimiter, all the way to their corresponding closing marker or to the end of the block. Maybe `](` should be implicitly closed by the next ASCII space or tab instead of the end of the block.
I think there could be a case to expand this implicitly-closing scheme to all inline elements, so that there is no spooky interaction at a distance which makes `_foo` open an emphasis or not depending on whether a match can be found before the end of the block (but if looking all the way to the end of the block is too much cognitive overload, your block is too long, so it's more a parser-writer matter than a user matter). I guess it would be a matter of the trade-off between consistency and false-positive rate. And we can't avoid spooky interaction in `foo_bar` anyway.
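To illustrate what the proposal would mean for the contrived example, here is a minimal Python sketch, not taken from any Djot implementation. It scans one block and returns the offsets of the `_` characters that would still be emphasis candidates once verbatim spans, `](...)`, `][...]`, and `{...}` are all treated as raw spans that run to their closing marker or to the end of the block. The function names are hypothetical, backslash escapes are ignored, and every `{` is assumed to open attributes (so the explicit `{_` form is not handled).

```python
def skip_verbatim(block: str, i: int) -> int:
    """Given a backtick run starting at block[i], return the index just past
    the closing run of equal length, or len(block) if the span is unclosed."""
    n = len(block)
    j = i
    while j < n and block[j] == "`":
        j += 1
    width = j - i
    while j < n:
        if block[j] != "`":
            j += 1
            continue
        k = j
        while k < n and block[k] == "`":
            k += 1
        if k - j == width:
            return k
        j = k
    return n


def skip_to(block: str, start: int, closer: str) -> int:
    """Return the index just past `closer`, or len(block) if it is missing."""
    close = block.find(closer, start)
    return len(block) if close == -1 else close + len(closer)


def emphasis_candidates(block: str) -> list[int]:
    """Offsets of `_` characters that lie outside every raw span."""
    candidates = []
    i = 0
    while i < len(block):
        if block[i] == "`":
            i = skip_verbatim(block, i)
        elif block.startswith("](", i):
            i = skip_to(block, i + 2, ")")  # inline link destination
        elif block.startswith("][", i):
            i = skip_to(block, i + 2, "]")  # reference label
        elif block[i] == "{":
            i = skip_to(block, i + 1, "}")  # attribute span
        else:
            if block[i] == "_":
                candidates.append(i)
            i += 1
    return candidates
```

Under these assumptions, only the first and last `_` of the contrived example remain candidates, so the whole line would be a single emphasis.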