Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible improvements for tool builders #29

Open
neon12345 opened this issue Feb 9, 2024 · 10 comments
Open

Possible improvements for tool builders #29

neon12345 opened this issue Feb 9, 2024 · 10 comments

Comments

@neon12345
Copy link

I am the author of a tool for converting the antlr parse tree into a denser AST, which can output a standalone JavaScript version to query/transform/print/visualize using CSS selectors. (The end goal is to use machine learning with rule induction for AST transformation/printing.) Demo

Since there is the possibility of changing things with a new version, it would be nice to have a unique antlr state for as many positions in the grammar as possible. This does not currently apply to recursive rules.

@ftomassetti
Copy link
Collaborator

Hi @neon12345 I trying the demo but I get an error. Can you help me figuring out how to use the tool?
I also created several libraries to generate an AST on top of ANTLR's parse tree. One we created for TypeScript is called Tylasu https://github.com/Strumenta/tylasu . Maybe there are some overlaps with you work.

I think getting suggestions for ANTLR 5 is very useful, especially when coming from tool builders like you.

Regarding your question, I am not sure I get what you mean with "unique antlr state for as many positions in the grammar as possible". Could you give me an example?

@neon12345
Copy link
Author

@ftomassetti

Your ISP needs to support IPv6. Other than that, when you press the Run button, the source code on the top left is sent to the server and the returned JavaScript AST is transformed with the script at the bottom into the top right result editor. There is also some help accessible with the bottom tab.

@neon12345
Copy link
Author

@ftomassetti

For recursive rules, the parentState is set for the context. What we need from the context and currently have to patch in, is the state before the precpred.

        _localctx = _tracker.createInstance<PatternContext>(parentContext, parentState);
        pushNewRecursionContext(_localctx, startState, RulePattern);
        setState(1913);

        if (!(precpred(_ctx, 2))) throw FailedPredicateException(this, "precpred(_ctx, 2)");
        setState(1914);
        match(Swift5Parser::AS);

@ericvergnaud ericvergnaud pinned this issue Feb 9, 2024
@ericvergnaud ericvergnaud unpinned this issue Feb 9, 2024
@ftomassetti
Copy link
Collaborator

ftomassetti commented Feb 12, 2024

@neon12345 this is how I see the demo: https://www.loom.com/share/1523249bd9f2458785db29ef2c4ea421
I guess my ISP should support IPv6

@neon12345
Copy link
Author

neon12345 commented Feb 12, 2024

@ftomassetti
Can you test IPv6 compatibility with for example https://ipv6-test.com or https://test-ipv6.com/ to be sure? Also try ping antlr.syncsocial.de

@kaby76
Copy link
Collaborator

kaby76 commented Feb 12, 2024

For recursive rules, the parentState is set for the context. What we need from the context and currently have to patch in, is the state before the precpred.

Can't this be compute using "parentState" and the ATN containing that NFA state number? Given a parse tree, get "parentState" and "see if it is one of your recursive productions". Then, examine the ATN: follow the edges in the ATN back until you find a transition that involves "_p". Presumably, this would be a PrecedencePredicateTransition edge.

@neon12345
Copy link
Author

Can't this be compute using "parentState" and the ATN containing that NFA state number?

The way how we work with antlr is a visitor that gives the information about the position in the grammar with the antlr states. From a parsing perspective, it is best to get the information from the current context without many calculations. Thus from our perspective, it would be more valuable to give another state to the context. But I don't know other use cases and why the state had to be stored this way.

@ftomassetti
Copy link
Collaborator

@ftomassetti Can you test IPv6 compatibility with for example https://ipv6-test.com or https://test-ipv6.com/ to be sure? Also try ping antlr.syncsocial.de

Oh... surprisingly ipv6 does not work for me, so I am afraid I cannot watch the demo

@kaby76
Copy link
Collaborator

kaby76 commented Feb 12, 2024

it would be more valuable to give another state to the context. But I don't know other use cases and why the state had to be stored this way.

Considering the result of a parse is a parse tree, it seems fine to "hang" the "calling state of the ATN to another ATN" in the parent parse tree node. (It really should have been an edge ID because, possibly, one could have multiple edges with the same non-terminal symbol from a state.)

To avoid computing the precedence predicate state in a parse tree traversal, you could pre-compute prior to parse the precedence predicate state from the ATNs, and place them in a O(1) map.

So, for the java grammar, state "1372" would be the "parentState" for an ExpressionContext, but the computed precedence predicate state be "1370".

graphviz (12)

@neon12345
Copy link
Author

@kaby76
Thanks for the suggestions, I don't have a problem getting the values I need and it is fine for me to keep things like they are. I just wonder if the stored information is used as intended or could be improved, considering the state is also not consistently stored in the parse tree.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants