Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add some helpers to deal with streams of bytes #81

Merged
merged 3 commits into from
Apr 30, 2024

Conversation

jap
Copy link
Contributor

@jap jap commented Feb 26, 2024

This PR adds two little helpers that make it easier to parse streams of bytes.

If this is something that stands a chance of being merged, I can also spend some time on adding docs.

(It also contains an unrelated update to the pre-commit config to make the flake8 plugin work again.)

@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.57%. Comparing base (5be0cd3) to head (6ae512f).

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #81      +/-   ##
==========================================
+ Coverage   94.44%   94.57%   +0.12%     
==========================================
  Files           8        8              
  Lines        1026     1050      +24     
==========================================
+ Hits          969      993      +24     
  Misses         57       57              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@spookylukey
Copy link
Member

This looks like a useful addition. Some questions (to which I don't know the answer, just thinking out loud):

  • should it be bconcat or concatb etc? Is there prior art in stdlib or somewhere?
    • An argument for concatb- in terms of discoverability in IDEs that give completion, it's more obvious that you have two variants if you see concat and concatb next to each other. Similarly in docs, if they are next to each other you could avoid repeating everything. Maybe there are other arguments though.
  • are there other utilities that need the same treatment?

src/parsy/__init__.py Outdated Show resolved Hide resolved
Unfortunately this requires manually writing out the mapped function
into the resulting parser, as the `join` method needs to be bound to
the same type as the elements to be joined. Because these elements can
be an empty list, this type needs to be inferred from the type of the
input stream.
Note that this already worked, except the type annotation was wrong
and there were no tests.
@jap
Copy link
Contributor Author

jap commented Feb 29, 2024

@wbolster 's comment makes sense! So I've just amended this commit to remove bstring again, but kept the tests and modified the type annotation of string to also accept bytes. (The function name is a bit confusing now but I can live with that.)

In a similar vein, I changed .concat to work on both lists of str and bytes, but unfortunately that was a bit more involved, and required writing out the bind/map operation because the mapped function needs access to the type of the input stream :(
The tests from the bconcat version still work though.

return self.map("".join)

@Parser
def parser(stream: bytes | str, index: int) -> Result:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be typing.AnyStr btw. (that would retain the type if Result would become a generic type)

@wbolster
Copy link

wbolster commented Mar 4, 2024

@spookylukey any chance you can have another look? in its current form this PR does not change any API, and just makes more stuff transparently work with both unicode strings (str) and binary strings (bytes)

@spookylukey
Copy link
Member

This looks good, thanks!

@spookylukey spookylukey merged commit fbaff6b into python-parsy:master Apr 30, 2024
spookylukey added a commit that referenced this pull request May 2, 2024
This illustrates a flaw with the reverted commits from PR #81

Refs #81
@spookylukey
Copy link
Member

I'm afraid I had to revert this.

Somehow in the middle of the night I realised the approach in this PR can't work. It makes the assumption that the type of the current parser's return values will be the type of the input stream, but there is no reason why this should be the case. Looking again, I see that this is documented in the new docstring, but it's a backwards incompatible breaking change.

I've added a test which demonstrates this - on reverting the change in this PR, the test passes, but not otherwise. We can't assume that no-one is writing code like in the test.

I can't see there is any way to get this to work. You can't test the type of the current return value either, as you might have zero items - an empty list of str is the same as an empty list of byte.

A backwards compatible alternative would be to do:

    def concat(self, joiner="") -> Parser:
        return self.map(joiner.join)

Would that work for you? Otherwise it's back to thinking about concatb etc.

@wbolster
Copy link

wbolster commented May 2, 2024

hmmm 🤔

adding an argument to .concat() to indicate how the joining should happen takes away most of the ‘it just works’ convenience, so i don't think it's really worth it in that case, as one could as well use

parser.map(b"".join)

… instead, which is more explicit, more flexible, and also shorter than the suggested

parser.concat(self, joiner=b"") 

@spookylukey
Copy link
Member

So we could go with concatb(), or we could also put this in the docstring of concat():

Synonym for .map("".join)

Which would give people a clue about how to easily implement concatb themselves

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants