-
@joshbruce Did you use a script to generate this table? We are failing 157 of 624 CommonMark tests, which means we're passing 467 of 624. That's about 75% spec-compliant in v0.5.0 👍
-
@styfle: Nice! No, I did it by hand back in April. I think we've had two releases since then. Not sure I have time right now to keep maintaining the ticket; maybe we should close it?
-
Scratch that last bit about spec compliance - just remembered it's an epic related to all the 0.x stuff. Maybe I can get to it this weekend. (Trying to find a new day job is full-time work, as has been said. 😀)
-
I counted the remaining So we're passing 502/624, which means v0.6.0 is 80% spec-compliant 👍
-
v0.7.0 is > 82%
Keep up the good work, we are getting there slowly but surely.
-
It appears that a major roadblock to CommonMark compliance is the block-level tokenizing strategy used here. marked uses a recursive block tokenizing strategy, but CommonMark was very clearly written with a line-by-line tokenizing strategy in mind (see the block structure parsing strategy recommended by CommonMark). Many of the failed test cases in the container blocks sections (blockquotes, list items, and lists) stem from this mismatch.

marked's recursive tokenizing strategy works as follows: Whenever it detects the start of a new container block it

The problem is that with CommonMark it is impossible to determine where a container block ends until you've tokenized its contents, because of CommonMark's lazy continuation. It is, however, very difficult to tokenize the contents of a container block without knowing where the container block ends.

The obvious solution is to rewrite the block-level tokenizer to operate line-by-line instead of recursively. It would help achieve CommonMark/GFM compliance, and it would remove some of the "hacky" code that already exists in the tokenizer.

Is this something that maintainers of this project would find acceptable if done well? Or is this such a drastic change to the project that a pull request like this would be immediately denied?
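To illustrate the lazy continuation point, here is a minimal sketch of a line-by-line scan that finds where a blockquote ends. It is not marked's code and not the full CommonMark algorithm: `blockquoteExtent` is a hypothetical helper, and it assumes only blockquotes and plain paragraphs exist (no lists, headings, thematic breaks, or code fences, all of which would interrupt a lazy paragraph in the real spec).

```javascript
// Hypothetical helper, for illustration only (not part of marked).
// Assumption: the only block types are blockquotes and plain paragraphs.

const BLOCKQUOTE_MARKER = /^ {0,3}>/; // up to 3 spaces of indent, then ">"
const BLANK_LINE = /^\s*$/;

// Returns the index one past the last line that belongs to the blockquote
// starting at lines[start].
function blockquoteExtent(lines, start) {
  let i = start;
  // Tracks whether the innermost open block inside the quote is a paragraph.
  // Only an open paragraph can absorb a line that lacks the ">" marker.
  let paragraphOpen = false;

  while (i < lines.length) {
    const line = lines[i];
    if (BLOCKQUOTE_MARKER.test(line)) {
      // Strip the marker and one optional space, then look at the content.
      const content = line.replace(/^ {0,3}> ?/, '');
      paragraphOpen = !BLANK_LINE.test(content);
      i += 1;
    } else if (paragraphOpen && !BLANK_LINE.test(line)) {
      // Lazy continuation: no ">" marker, but the line still extends the
      // paragraph that is open inside the blockquote.
      i += 1;
    } else {
      break; // a blank line, or no open paragraph to continue
    }
  }
  return i;
}

// "bar" is pulled into the quote because a paragraph is open on line 0.
console.log(blockquoteExtent(['> foo', 'bar', '', 'baz'], 0));
// Here the quote is empty, so "bar" starts a new paragraph outside it.
console.log(blockquoteExtent(['>', 'bar'], 0));
```

The end of the container depends on `paragraphOpen`, which is state from inside the container. That is the circular dependency described above: a recursive tokenizer wants the extent first, but the extent is only known after looking at the contents.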
-
As long as marked is still easy to extend, fast, and secure, and all of the current tests pass, I don't think it matters how it happens. Full CommonMark compliance would be amazing, but I don't know that it is the golden trophy. There are some specs that just don't make actual sense in the real world. For example, #416:
I don't think sacrificing speed is worth making sure a
-
Here are the latest values for reference. I copied these from the latest CI run.
GFM
CommonMark
-
V4.2.2
Backslash escapes is 100% 🎉
GFM
CommonMark
-
V4.2.3
GFM
CommonMark
-
Marked version: 0.3.19
Markdown flavor: CommonMark
Proposal type: other
What pain point are you perceiving?
We are not compliant with the CommonMark specification. The CM spec is the foundation for the GitHub Flavored Markdown specification. (GFM only adds some extensions for things like tables.)
Part of how we got here was that historically we did not strictly stick to the specification. In some cases we implemented custom features not discussed in the specifications (header id, for example).
What solution are you suggesting?
PR #1160 looks to run Marked against the CommonMark spec test cases: http://spec.commonmark.org/0.28/spec.json
So far the results show that we are roughly 60% spec-compliant.
The spec is divided into sections and each section has a number of test cases. (Counts may be off and are meant to represent estimated percent compliance with each section.)
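As a rough sketch of how per-section numbers like these could be computed rather than counted by hand: each entry in the CommonMark `spec.json` carries `section`, `markdown`, and `html` fields, so the cases can be grouped and tallied. `complianceBySection` and its `render` parameter are hypothetical names, not part of marked or of PR #1160; `render` stands in for whatever function turns Markdown into HTML. The exact string comparison is also an assumption, and a real harness would likely normalize the HTML before comparing.

```javascript
// Hypothetical helper, for illustration only.
// `tests`: array shaped like spec.json entries ({ section, markdown, html }).
// `render`: assumed stand-in for a Markdown-to-HTML function.
function complianceBySection(tests, render) {
  const sections = new Map(); // section name -> { pass, total }

  for (const { section, markdown, html } of tests) {
    const entry = sections.get(section) || { pass: 0, total: 0 };
    entry.total += 1;

    let output = null;
    try {
      output = render(markdown);
    } catch (err) {
      // A renderer that throws counts as a failed case.
    }
    // Assumption: exact match, with no HTML normalization.
    if (output === html) entry.pass += 1;

    sections.set(section, entry);
  }

  // One row per section, in first-seen order.
  return [...sections].map(([section, { pass, total }]) => ({
    section,
    pass,
    total,
    percent: Math.round((pass / total) * 100),
  }));
}
```

Summing `pass` and `total` across the rows gives the overall figure for a release.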