Add source position information to span nodes #202
Prior to this change, variables used a mix of _index and _pos suffixes (or no suffix at all) to refer to stream indexes. This change switches all of them to use _index, which will make future refactors clearer.
Edit: On further checking there doesn't seem to be an issue, but I have updated the tests to make things clearer.
Prior to this change, the only positional information tracked when moving through the stream was the current index in the stream. We would like to start reporting richer positional information from the parser, specifically the position (in terms of row/column) in the source object. This change allows this by updating the Stream class to additionally track its current (zero-indexed) row and column while moving through the stream. We will persist this information into the syntax tree in the following commits.
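As a rough illustration of the idea in this commit (not the actual fluent.syntax code), a stream that keeps a zero-indexed row and column alongside its index could look like the sketch below. The class name PositionTrackingStream and its attribute names are hypothetical, and the sketch makes no attempt at special handling of "\r\n" line endings.

```python
class PositionTrackingStream:
    """Hypothetical stream that tracks index, row and column (all zero-indexed)."""

    def __init__(self, source: str) -> None:
        self.source = source
        self.index = 0
        self.row = 0
        self.column = 0

    def next(self) -> None:
        """Advance by one character, updating row and column as we go."""
        if self.index >= len(self.source):
            return  # Already at the end of the source; nothing to consume.
        if self.source[self.index] == "\n":
            # Consuming a newline moves us to the start of the next row.
            self.row += 1
            self.column = 0
        else:
            self.column += 1
        self.index += 1


stream = PositionTrackingStream("ab\ncd")
for _ in range(4):
    stream.next()
print(stream.index, stream.row, stream.column)  # prints "4 1 1"
```

The bookkeeping is a constant amount of work per character consumed, which is the cost the parser would pay on every parse under this approach.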
Prior to this change, the location in the parser stream was represented by the current index. We would like to update the parser stream to also return the source position (row/column) as part of the location information. This change adds a type alias to represent a location in the parser stream, and a property to return it. Currently, this is just the index, but we will update it in future commits. This allows us to break up the change to make it easier to review, as we can switch code paths which create span information to use locations before making any functional changes.
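A minimal sketch of what this intermediate step might look like, assuming a simplified stream: the alias Location, the property name current_location and the helper read_word_span are all hypothetical names chosen for illustration, not the names used in the PR.

```python
from typing import Tuple

# Assumed alias: at this stage a location is still just the stream index.
# A later commit could widen it without touching the callers.
Location = int


class Stream:
    """Simplified stand-in for the parser stream."""

    def __init__(self, source: str) -> None:
        self.source = source
        self.index = 0

    @property
    def current_location(self) -> Location:
        # Callers ask for a location rather than reading self.index directly.
        return self.index

    def next(self) -> None:
        if self.index < len(self.source):
            self.index += 1


def read_word_span(stream: Stream) -> Tuple[Location, Location]:
    """Consume characters up to the next space and return (start, end) locations."""
    start = stream.current_location
    while stream.index < len(stream.source) and stream.source[stream.index] != " ":
        stream.next()
    return start, stream.current_location


print(read_word_span(Stream("hello world")))  # prints "(0, 5)"
```

Because span-creating code only ever sees Location values, changing what a location contains later is a local change to the alias and the property.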
Prior to this change, we were passing around integers when creating span information. This change uses locations instead. This has no functional change, but will make it easier to start passing through source position information in the following commit.
Prior to this change, we were tracking source position information in the stream, but were not storing it in the resulting spans. This change adds start_position and end_position SourcePosition attributes to Span nodes. To make this work we end up complicating the constructor for Span nodes, so that it can either take a location or an index/position combo (for JSON deserialization).
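To make the two construction paths concrete, here is a hedged sketch of a Span that accepts either a location or an index/position combo. Only the start_position and end_position attribute names and the SourcePosition type name come from the description above; the field names row_index and column_index, the tuple shape of Location, and the constructor signature are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class SourcePosition:
    # Hypothetical field names; both values are zero-indexed.
    row_index: int
    column_index: int


# Assumed shape of a location once it carries a source position.
Location = Tuple[int, SourcePosition]


class Span:
    def __init__(
        self,
        start: Union[int, Location],
        end: Union[int, Location],
        start_position: Optional[SourcePosition] = None,
        end_position: Optional[SourcePosition] = None,
    ) -> None:
        # Parser path: a location bundles the index with its position.
        if isinstance(start, tuple):
            start, start_position = start
        if isinstance(end, tuple):
            end, end_position = end
        # Deserialization path: index and position arrive as separate values.
        self.start = start
        self.end = end
        self.start_position = start_position
        self.end_position = end_position


from_parser = Span((5, SourcePosition(1, 2)), (7, SourcePosition(1, 4)))
from_json = Span(5, 7, SourcePosition(1, 2), SourcePosition(1, 4))
print(from_parser.start_position == from_json.start_position)  # prints "True"
```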
00ec1ca to 314ca90
The size of the diff here held me off from looking too closely for a while... Have you considered other approaches to speeding up For one possible approach, for the JS Could that sort of an approach work for you as well?
No problem about the delay, thanks for getting back to me. I think there are definitely other approaches which could speed up My original thinking was that by providing tools to enable that inside
I would prefer an approach that added the least cost to users who do not need the line/column position for all nodes. Always calculating the positions during the parse seems like mostly unnecessary work, while providing some kind of side channel for getting the newline indices during the parse seems like a much lighter addition to
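One way to picture the lighter alternative discussed here is the sketch below: collect the newline indices once, then convert an index to a (row, column) pair on demand with a binary search. The function names are hypothetical and this is not an existing fluent.syntax API; here the indices are gathered in a separate pass, whereas the suggestion above is for the parser to expose them as a by-product of parsing.

```python
from bisect import bisect_left
from typing import List, Tuple


def newline_indices(source: str) -> List[int]:
    """Return the sorted indices of every newline character in the source."""
    return [i for i, ch in enumerate(source) if ch == "\n"]


def index_to_position(index: int, newlines: List[int]) -> Tuple[int, int]:
    """Convert a source index to a zero-indexed (row, column) pair."""
    # The row is the number of newlines strictly before the index.
    row = bisect_left(newlines, index)
    # The column is the offset from the character after the previous newline.
    line_start = newlines[row - 1] + 1 if row > 0 else 0
    return row, index - line_start


source = "ab\ncd\ne"
newlines = newline_indices(source)
print(index_to_position(4, newlines))  # prints "(1, 1)"
```

With this design, users who never ask for a position pay nothing beyond storing the list, and each lookup is logarithmic in the number of lines.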
(Note: The original context for this PR is covered in django-ftl/fluent-compiler#32, but the key parts are copied in here)
Hi 👋. Firstly, thanks for this library, and for the project in general!
I'm currently working with python-fluent (via fluent-compiler and django-ftl) as part of a large Django project. It's generally been great, but unfortunately parsing and compiling our fluent files contributes quite a lot (~10s) to our app's startup time.
With that in mind, I was hoping to contribute a couple of optimisations to fluent-compiler and project-fluent to speed up the handling of large fluent files.
One of the biggest contributors to compile times in fluent-compiler is the span_to_position function. This function takes the start of a Span node and converts it into a (row, column) tuple. However, in order to do this, it needs to scan through the source for all newlines to work out how many rows there have been.
This PR is an attempt to solve that problem by returning more information from the parsing code in fluent.syntax. Specifically, we add a new type of AST node to represent a row/column position in the source. We then track the current row/column while scanning through the source stream, and store that in Span nodes as they are created.
I've tried to make this PR as easy to review as possible, but unfortunately there is still quite a large diff, so if there is anything which would be helpful in terms of breaking commits up/squashing them together, please let me know.
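To illustrate why a per-span conversion gets expensive on large files, here is a hedged sketch of a scan-based conversion. It is not the fluent-compiler implementation of span_to_position; the function name naive_index_to_position and the zero-indexed output are assumptions made for this example.

```python
from typing import Tuple


def naive_index_to_position(source: str, start: int) -> Tuple[int, int]:
    """Convert an index to a zero-indexed (row, column) by scanning the source."""
    # Counting newlines before the index touches every preceding character.
    row = source.count("\n", 0, start)
    # rfind returns -1 when there is no earlier newline, so the column is then
    # simply the index itself.
    last_newline = source.rfind("\n", 0, start)
    column = start - (last_newline + 1)
    return row, column


source = "ab\ncd\ne"
print(naive_index_to_position(source, 4))  # prints "(1, 1)"
```

Each call does work proportional to the distance from the start of the file, so converting every span in a long file this way grows roughly quadratically with the file size.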
I'm not entirely clear on whether the structure of the AST nodes is part of the fluent spec, or if it is something which python-fluent can decide unilaterally. If this turns out to be an insurmountable problem, I can switch to trying to solve this inside fluent-compiler instead.
(Note: This PR will result in a one-line merge conflict with #201, so I will rebase as needed if one is merged first)