add some helpers to deal with streams of bytes #81

jap · 2024-02-26T22:55:10Z

This PR adds two little helpers that make it easier to parse streams of bytes.

If this is something that stands a chance of being merged, I can also spend some time on adding docs.

(It also contains an unrelated update to the pre-commit config to make the flake8 plugin work again.)

codecov-commenter · 2024-02-27T12:33:16Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.57%. Comparing base (5be0cd3) to head (6ae512f).

Additional details and impacted files

@@            Coverage Diff             @@
##           master      #81      +/-   ##
==========================================
+ Coverage   94.44%   94.57%   +0.12%     
==========================================
  Files           8        8              
  Lines        1026     1050      +24     
==========================================
+ Hits          969      993      +24     
  Misses         57       57

spookylukey · 2024-02-27T12:37:56Z

This looks like a useful addition. Some questions (to which I don't know the answer, just thinking out loud):

- Should it be bconcat or concatb etc.? Is there prior art in stdlib or somewhere?
  - An argument for concatb: in terms of discoverability in IDEs that give completion, it's more obvious that you have two variants if you see concat and concatb next to each other. Similarly in docs, if they are next to each other you could avoid repeating everything. Maybe there are other arguments though.
- Are there other utilities that need the same treatment?

src/parsy/__init__.py

Unfortunately this requires manually writing out the mapped function into the resulting parser, as the `join` method needs to be bound to the same type as the elements to be joined. Because these elements can be an empty list, this type needs to be inferred from the type of the input stream.

Note that this already worked, except the type annotation was wrong and there were no tests.

jap · 2024-02-29T13:13:35Z

@wbolster's comment makes sense! So I've just amended this commit to remove bstring again, but kept the tests and modified the type annotation of string to also accept bytes. (The function name is a bit confusing now but I can live with that.)

In a similar vein, I changed .concat to work on both lists of str and bytes, but unfortunately that was a bit more involved, and required writing out the bind/map operation because the mapped function needs access to the type of the input stream :(
The tests from the bconcat version still work though.
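
To make the constraint concrete, here is a minimal standalone sketch of choosing the joiner from the type of the input stream. It is plain Python rather than parsy's actual code, and the helper name `join_like_stream` is hypothetical.

```python
def join_like_stream(stream, items):
    """Join items using an empty separator of the same type as stream.

    Hypothetical helper for illustration only; not part of parsy.
    """
    # type(stream)() is "" for str input and b"" for bytes input.
    empty = type(stream)()
    return empty.join(items)


print(join_like_stream("abc", ["a", "b"]))       # ab
print(join_like_stream(b"abc", [b"a", b"b"]))    # b'ab'
# With no items, only the stream says which kind of empty value to return.
print(repr(join_like_stream(b"abc", [])))        # b''
```

The last call is the reason a plain `.map("".join)` is not enough for bytes input: with an empty list there is nothing else to infer the result type from.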

wbolster · 2024-02-29T13:15:21Z

src/parsy/__init__.py

    - return self.map("".join)
    +
    + @Parser
    + def parser(stream: bytes | str, index: int) -> Result:

this could be typing.AnyStr btw. (that would retain the type if Result would become a generic type)
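
As a rough illustration of what `typing.AnyStr` buys here, the sketch below ties the type of the arguments to the type of the return value. The function `join_parts` is hypothetical and says nothing about how parsy's `Result` is or would be typed.

```python
from typing import AnyStr, Sequence


def join_parts(empty: AnyStr, parts: Sequence[AnyStr]) -> AnyStr:
    # AnyStr is a constrained type variable: within one call it is either
    # str everywhere or bytes everywhere, never a mix of the two.
    return empty.join(parts)


text = join_parts("", ["a", "b"])       # a type checker infers str
data = join_parts(b"", [b"a", b"b"])    # a type checker infers bytes
print(text, data)
```

With a `bytes | str` annotation, by contrast, a type checker cannot tell which of the two comes back for a given call.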

wbolster · 2024-03-04T12:47:36Z

@spookylukey any chance you can have another look? in its current form this PR does not change any API, and just makes more stuff transparently work with both unicode strings (str) and binary strings (bytes)

spookylukey · 2024-04-30T20:04:32Z

This looks good, thanks!

spookylukey · 2024-05-02T13:30:57Z

I'm afraid I had to revert this.

Somehow in the middle of the night I realised the approach in this PR can't work. It makes the assumption that the type of the current parser's return values will be the type of the input stream, but there is no reason why this should be the case. Looking again, I see that this is documented in the new docstring, but it's a backwards incompatible breaking change.

I've added a test which demonstrates this - on reverting the change in this PR, the test passes, but not otherwise. We can't assume that no-one is writing code like in the test.

I can't see any way to get this to work. You can't test the type of the current return value either, as you might have zero items: an empty list of str is the same as an empty list of bytes.
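
Both problems can be reproduced in plain Python. The sketch below is a standalone illustration, not the test that was added to the repository; the values are made up for the example.

```python
stream = b"\x01\x02"    # the parser's input is bytes

# A parser is free to map its results to another type, for example
# turning each parsed byte into a str label.
items = [f"<{byte}>" for byte in stream]    # ['<1>', '<2>']

# Joining with the str joiner (the pre-existing behaviour) works.
print("".join(items))

# Picking the joiner from the stream type breaks this case.
try:
    type(stream)().join(items)
except TypeError as exc:
    print("stream-typed join failed:", exc)

# Inspecting the items does not help when there are none:
# an empty list carries no element type at runtime.
empty_str_items = []
empty_bytes_items = []
print(empty_str_items == empty_bytes_items)    # True
```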

A backwards compatible alternative would be to do:

    def concat(self, joiner="") -> Parser:
        return self.map(joiner.join)

Would that work for you? Otherwise it's back to thinking about concatb etc.
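
For illustration, here is how the `joiner` argument would behave on a toy stand-in for a parser. `MiniParser` is hypothetical and heavily simplified (real parsy parsers also track positions and failures); only the `concat` body mirrors the suggestion above.

```python
class MiniParser:
    """Toy stand-in for a parser: wraps a function from stream to value."""

    def __init__(self, fn):
        self.fn = fn

    def parse(self, stream):
        return self.fn(stream)

    def map(self, map_fn):
        return MiniParser(lambda stream: map_fn(self.fn(stream)))

    def concat(self, joiner=""):
        # The default keeps the existing str behaviour; callers opt in to bytes.
        return self.map(joiner.join)


# Splits the stream into one-element slices, so str input yields str pieces
# and bytes input yields bytes pieces.
pieces = MiniParser(lambda s: [s[i:i + 1] for i in range(len(s))])

print(pieces.concat().parse("abc"))               # abc
print(pieces.concat(joiner=b"").parse(b"abc"))    # b'abc'
print(pieces.map(b"".join).parse(b"abc"))         # b'abc'
```

The last two lines produce the same result, which is the trade-off under discussion: the explicit `joiner` is backwards compatible but offers little over calling `.map` directly.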

wbolster · 2024-05-02T15:30:30Z

hmmm 🤔

adding an argument to .concat() to indicate how the joining should happen takes away most of the ‘it just works’ convenience, so i don't think it's really worth it in that case, as one could as well use

    parser.map(b"".join)

… instead, which is more explicit, more flexible, and also shorter than the suggested

    parser.concat(joiner=b"")

spookylukey · 2024-05-03T19:16:02Z

So we could go with concatb(), or we could also put this in the docstring of concat():

    Synonym for .map("".join)

This would give people a clue about how to easily implement concatb themselves.

unbreak pre-commit by pointing it to github for flake8

294590f

wbolster reviewed Feb 28, 2024

jap added 2 commits February 29, 2024 13:52

Advertise support for the string parser to also work on bytes.

be1cbc0

Note that this already worked, except the type annotation was wrong and there were no tests.

jap force-pushed the bytes-helpers branch from 6ae512f to be1cbc0 Compare February 29, 2024 13:04

wbolster reviewed Feb 29, 2024

spookylukey merged commit fbaff6b into python-parsy:master Apr 30, 2024

spookylukey added a commit that referenced this pull request May 2, 2024

Added test for concat where input stream is bytes

96e316b

This illustrates a flaw with the reverted commits from PR #81 Refs #81
