string_literals should include opening and closing quotes as child nodes #126

Open
dhanak opened this issue Jan 11, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

dhanak commented Jan 11, 2024

Initial problem: I don't want TAB to mess with the insides of my strings, especially my multiline docstrings. I don't want it to remove tabs/spaces from or insert them into the start of lines (or anywhere, for that matter) inside docstrings. I do, however, want to indent a line that has a string on it, assuming the line starts with the opening quote (e.g., passed as an argument to a function).

"""
Don't indent this line.
    Don't unindent this line either.
"""
print("foo",
      "bar", # indent this line when I press TAB
      "baz\nbazinga$(x)") # also indent this line

Solution attempt: Add an indentation rule that keeps the indentation as is when the parent is a string_literal. In Emacs terms, add this to treesit-simple-indent-rules:

((parent-is "string_literal") no-indent 0)
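
For context, here is a minimal sketch of how such a rule might be installed. It assumes the buffer uses a tree-sitter major mode whose hook is julia-ts-mode-hook and whose rules are registered under the language symbol julia; the function name is made up for illustration.

```elisp
;; Config sketch under stated assumptions: `julia-ts-mode-hook' and the
;; `julia' language symbol depend on the major mode in use, and
;; `my/julia-keep-string-indentation' is a hypothetical name.
(defun my/julia-keep-string-indentation ()
  "Put the string_literal rule in front of the mode's own Julia rules."
  (setq-local treesit-simple-indent-rules
              `((julia
                 ;; Rules are tried in order, so this one wins when it matches.
                 ((parent-is "string_literal") no-indent 0)
                 ;; Keep whatever Julia rules the mode already defined.
                 ,@(alist-get 'julia treesit-simple-indent-rules)))))

(add-hook 'julia-ts-mode-hook #'my/julia-keep-string-indentation)
```

The rule is prepended because the first matching rule decides the indentation. Rules for any other language in the list are dropped by this sketch.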

Complication: string_literals can be leaf nodes, when they are just plain strings, but can also be branch nodes, when they contain interpolation or escape sequences. I argue that they should always be branch nodes, and should always include at least the opening and closing quotes as leaf nodes. Without these, all their non-special contents effectively appear as whitespace to tree-sitter clients.

This is a problem because the tree-sitter indentation implementation in Emacs, treesit--indent-1 (which I assume is conceptually correct), first identifies the nearest leaf node at or after the beginning of the line. It then finds the matching indentation rule and indents based on that. That is, it essentially ignores all whitespace, which is good. But here it also ignores the contents of the string as well as the opening quote, which is bad.

With the current node tree, there is no way to tell whether the start of the line is inside the string (that includes escape sequences or interpolations), or at the opening quote. At least, I couldn't figure out how that can be done.

When it's just a plain string, without child nodes, it works as expected, because the string_literal is the leaf node and its start coincides with the beginning of the line, so the rule doesn't apply.
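
To see which leaf node is found at the start of a given line, a small diagnostic command along these lines may help. It is only a sketch: it mirrors the lookup as described in this issue using the public treesit-node-at function, not the exact internal logic of treesit--indent-1, and the command name is hypothetical.

```elisp
;; Diagnostic sketch; `my/treesit-describe-node-at-bol' is a hypothetical name.
;; Needs Emacs 29 or later and an active tree-sitter parser in the buffer.
(require 'treesit)

(defun my/treesit-describe-node-at-bol ()
  "Report the leaf node at or after the first non-blank character of this line."
  (interactive)
  (save-excursion
    (back-to-indentation)
    (let* ((bol (point))
           ;; `treesit-node-at' returns the leaf covering BOL or, if BOL is
           ;; not inside any leaf, the first leaf after it.
           (node (treesit-node-at bol))
           (parent (and node (treesit-node-parent node))))
      (message "bol=%d leaf=%S leaf-start=%S parent=%S"
               bol
               (and node (treesit-node-type node))
               (and node (treesit-node-start node))
               (and parent (treesit-node-type parent))))))
```

Run it with M-x on the "baz\nbazinga$(x)" line and on a line inside the docstring, then compare leaf-start with bol. If the reported leaf starts after bol on the line that begins with the opening quote, the quote itself is not visible as a node.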

I'm using Emacs v29.1 and a Julia tree-sitter grammar from Dec 18, 2023 (I don't know how to determine its exact version).

Collaborator

savq commented Jan 27, 2024

I think I understand the issue, but it's not super-clear why having the quotes as child nodes solves it.

Anyways, we already have tokens for string start/end quotes, they're just not visible. Fixing this might just require making them visible and updating the tests.

savq added the enhancement label on Jan 27, 2024
Author

dhanak commented Jan 29, 2024

When the opening quote is a child node, I can add a rule that matches it and indents it, and add other rules that keep the indentation of any other child node of a string_literal node. When the quote is not a child node, I simply cannot match the beginning of the string.
