Skip to content
This repository has been archived by the owner on Nov 5, 2024. It is now read-only.

Add TextMate grammar support back to VSCode extension #410

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Myriad-Dreamin
Copy link
Collaborator

@Myriad-Dreamin Myriad-Dreamin commented Jan 4, 2024

This PR adds TextMate grammar support back to VSCode extension, using grammar from michidk, @michidk.

There is a problem drop at the begin of PR: what's the relationship between tm grammars and semantic tokens providers? The short answer is we need both of them.

From some comment and commit, a language support would like to provide both of them to gain a best syntax and semantic language support.

In #252, @nvarner commented:

If TextMate and semantic token highlighting are both enabled at once, the semantic tokens appear on top of TextMate (this is the intention behind semantic tokens; the stated purpose is to highlight just a few things based on semantics that TextMate can't do by syntax, like based on if a variable is mutable or not). However, the TextMate grammar is out of date; I don't think it's been updated since it was pulled from an early version of Typst (but see #256 for a possible small update). It's hard to maintain the TextMate grammar because Typst's syntax is still in flux, whereas semantic token highlighting "just works" because it's based on Typst's own parser. If the TextMate grammar makes up the background of the semantic tokens, errors in it could highlight the code wrong, and it would have some cost for VS Code to perform TextMate highlighting.

It is correct that "the TextMate grammar is out of date", but there are two direction rather than avoid using TextMate grammar.

  1. simplify TextMate grammar to avoid conflict with semantic token highlighting.
  2. fix them if users report them.

I prefer to point 2, since we community also needs a nice TextMate grammar for syntax highlighting on GitHub. It would be nice if we can contribute one.

What does it fix?


Results are following:

image

@nvarner
Copy link
Owner

nvarner commented Jan 4, 2024

I've done similar research and come to similar conclusions, and I have started work on this in the readd-textmate branch.

In particular, I've been writing a new textmate grammar based on the Typst lexer/parser. This is because any incompatibility between the grammar and Typst could cause worse bugs than we have without textmate. For example, if a region of markup was incorrectly highlighted by TM as code, it would be hard for users to see why only certain brackets are wrongly highlighted (or not highlighted). Moreover, because users can't opt out of TM highlighting, a bug like this would affect all users.

While writing this grammar, I've come to believe that it's likely impossible, or at least very difficult, for TM to correctly give a complete highlighting of Typst code (in summary, it has very poor support for syntactic structures which run across multiple lines, and Typst has these). This has led me to a few other ideas:

  1. Try again with a much simpler grammar. It would mostly split a Typst file into markup/code/math regions, then enforce correct bracket matching within each region. We could also provide injection grammars if desired to provide better highlighting. I plan to try this soon.
  2. Change the VS Code config so it doesn't even attempt bracket highlighting, and don't provide a TM grammar. This is certain to resolve the bracket highlighting issue, but comes with a number of downsides, so I'd rather not take this approach unless no other will work.

I also have a few relevant thoughts on the scope of this project. In particular, I think #252, #260 and #375 are strictly out of scope. I view this project as primarily a language server with associated extensions for language clients when needed, so VS Code's behavior when it does not use the LSP, and the reliance of other extensions on TM scopes, isn't a core part of our requirements. So, even if it would resolve these issues, if TM causes problems, I would rather not add it. That said, I'd like to resolve those issues if possible, since it would improve the user experience with Typst.

Similarly, I don't think of it as the responsibility of this project to provide a TextMate grammar for Typst. It's also possible for other VS Code extensions to provide a TM grammar for Typst in case users would like non-LSP highlighting. If we provide a very simple grammar, other extensions (or even a second extension sourced from this repo) could provide injection grammars, and could easily be disabled if they cause problems for users. Again, I'd like to provide a grammar because it would be good to, but not if it comes at a high cost.

(On the topic of linked issues, #264 and #401 are the result of incorrect handling of multiline semantic tokens. It may be superficially hidden by TM, but the fundamental issue will remain. I'm aware of this and understand the underlying problem, but haven't gotten to fixing it yet.)

Ultimately, I think we shouldn't, and that we don't need to, ship an incorrect TM grammar. If we do include one, I'd prefer it be simple so that we have a lowered maintenance overhead.

  • As I recall, the original TM grammar from Typst was subtly wrong, and these bugs were the original reason I sought to replace TM highlighting by semantic highlighting. The Typst language has also changed over time, but the grammar doesn't seem to have been regularly updated. Do we know that this grammar is correct?
  • Do your views or priorities differ from mine on the scope of the project, expected maintenance requirements from TM grammars, or similar? Under different assumptions, we certainly would want to maintain a high-quality TM grammar.

@Myriad-Dreamin
Copy link
Collaborator Author

Myriad-Dreamin commented Jan 5, 2024

@nvarner, my consideration is following:

You position is totally correct and your comment brings me many new points. This repository is all about typst LSP, rather than typst language support. If we maintainers find a bug in TextMate grammar (doesn't ship in LSP extension), we have no responsibility to resolve them, but close them as unplanned.

However, as a part of community, if we don't have a way to redirecting them, it would be frustrated. for regular document writers, who don't quite know coding, it is hard to let them distinguish a LSP and a fully language support. When they find a syntax highlighting error, they consider a bug in LSP, come here, and open an issue, for example, Brackets are colorized inside strings and comments #314.

Considering that, imo the most correct decision is removing language-configuration.json and create a new vscode extension repository named "Typst Language Support" containing the language-configuration file and the tmLanguage file.

  • We focus on providing semantic highlighting, and don't get bother by textmate syntax highlighting or bracket colorizing. When some user report a related issue, we tell that we have no responsibility to resolve them, but close them as unplanned or (more smartly) transfer the issue to the language support repository. To me, I feel more comfortable on transferring issues than closing them as unplanned.

  • When users activate the LSP extension, we recommend them installing "Typst Language Support", so to teach them install some extra extension to gain a best experience. To me, I feel more comfortable on whatever contributing to user experience.

  • Considering that TM grammar is error prone, if we would like to we add it, we have better to isolate it in another extension, so that we can disable them separately.

  • As TM grammar becoming more mature, We most users may not meet a TM grammar error, so we can get it better in most cases. And once there are errors, they can manually disable syntax highlighting by "Typst Language Support" in some workspaces, but still get correct semantic highlighting.

It brings more complexity to users but more correct to me. As for writing a new tm grammar, I'm happy that you have restarted it, we can move discussion then.


The create a new "Typst Language Support", there are two options:

  • Keep them in current repository and put them in somewhere.

    Pros: our community is still small, we can aggregates issues and discussions.

    Cons: It is not a LSP thing.

  • Create a new repository and move syntax files to the new repo.

    Pros and Cons: conversely.


Experiment, we extension can contribute a language without https://github.com/nvarner/typst-lsp/blob/0d5b9330a1d515fa47456e755f751ea4056282b8/editors/vscode/language-configuration.json

image

Result with only LSP extension:

image

Result with only Language Support extension (TextMate grammar to add):

image

Result with only Both extension:

image

@Myriad-Dreamin
Copy link
Collaborator Author

If we consider removing the language-configuration file, we can change this PR and still close/transfer these 7 issues.

@nvarner
Copy link
Owner

nvarner commented Jan 7, 2024

I like your suggested approach. I think it's best to keep the Typst Language Support extension within this repository, since it will likely be closely related to the LSP extension. I'll do some more experimentation and find the best way to split the extension and make users aware that they'll need to install two things.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.