Fix Bech32 decoder + encoder #312

jonathanknowles · 2019-05-23T13:22:07Z

Issue Number

This PR fixes the following issues:

#311: Bech32 decoder fails for certain known-to-be-valid Bech32 strings
#314: Bech32 encoder can be coerced into producing a string that can't be decoded

Overview

For #311:

Adds a new type DataPart, which wraps [Word5].
Changes the types of decode and encode so that they operate on DataPart instead of ByteString.
Ensures that the round-trip relationship between encode and decode is preserved.
Provides auxiliary functions dataPartFromBytes and dataPartToBytes to convert between ByteString and DataPart:
- dataPartFromBytes pads with trailing zeros where appropriate.
- dataPartToBytes trims trailing zeros where appropriate.
Provides unit tests to ensure that all reference Bech32 strings mentioned in the Bech32 standard can be decoded successfully.
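
The padding and trimming behaviour described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: it models a `Word5` as an `Int` in the range 0..31, works on lists rather than `ByteString`, and the `regroup` helper is a hypothetical name introduced here.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- Hypothetical helper: regroup a stream of fromBits-wide values into
-- toBits-wide values. With pad = True, leftover bits are padded with
-- trailing zeros; with pad = False, leftover bits are dropped.
regroup :: Int -> Int -> Bool -> [Int] -> [Int]
regroup fromBits toBits pad = go 0 0
  where
    mask = (1 `shiftL` toBits) - 1
    -- acc holds the n low-order bits not yet emitted.
    go acc n []
      | pad && n > 0 = [(acc `shiftL` (toBits - n)) .&. mask]
      | otherwise = []
    go acc n (x : xs) = emit ((acc `shiftL` fromBits) .|. x) (n + fromBits) xs
    emit acc n xs
      | n >= toBits =
          let rest = n - toBits
          in ((acc `shiftR` rest) .&. mask)
               : emit (acc .&. ((1 `shiftL` rest) - 1)) rest xs
      | otherwise = go acc n xs

-- Bytes to 5-bit words, padding the final word with trailing zeros.
dataPartFromBytes :: [Word8] -> [Int]
dataPartFromBytes = regroup 8 5 True . map fromIntegral

-- 5-bit words back to bytes, dropping the trailing padding bits.
dataPartToBytes :: [Int] -> [Word8]
dataPartToBytes = map fromIntegral . regroup 5 8 False
```

Under these assumptions, the padding added when going from bytes to words is always shorter than one byte, so converting bytes to words and back returns the original bytes.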

For #314:

Reworks smart constructors for HumanReadablePart and DataPart so that inputs are always converted to lower case.
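
As a rough sketch of the idea (not the constructor in this PR): the type below wraps a `String` rather than the representation used in the library, `hrpFromString` is a hypothetical name, and the length and character-range checks are the ones given in BIP-173, which may differ from the validation the library performs.

```haskell
import Data.Char (ord, toLower)

-- Simplified stand-in for the library type; the constructor would not
-- be exported, so values can only be built via the smart constructor.
newtype HumanReadablePart = HumanReadablePart String
  deriving (Eq, Show)

-- Hypothetical smart constructor: validates the input, then normalises
-- it to lower case so that every stored value is already canonical.
hrpFromString :: String -> Maybe HumanReadablePart
hrpFromString s
  | null s || length s > 83 = Nothing
  | any outOfRange s = Nothing
  | otherwise = Just (HumanReadablePart (map toLower s))
  where
    -- BIP-173 restricts the human-readable part to ASCII 33..126.
    outOfRange c = ord c < 33 || ord c > 126
```

Because normalisation happens once, at construction time, `encode` never sees an upper-case human-readable part and so cannot compute a checksum over one.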

KtorZ

The change itself looks good, a few remarks about style & infinite shrinkers 😅

lib/bech32/test/Codec/Binary/Bech32Spec.hs

lib/bech32/src/Codec/Binary/Bech32/Internal.hs

…h32 alphabet. Also add a `Read` instance for `DataPart` to accompany the existing `Show` instance. This change also adds a QuickCheck property for the following relationship:

>>> read (show (dp :: DataPart)) == dp

…rnal`.

Use `Text` as the internal representation for `HumanReadablePart` and `DataPart`. As a side-effect, we can discard the `Read` and `Show` instances for `DataPart`.

The `encode` function was mistakenly expected (by the test specification) to produce an invalid Bech32 string when given an upper-case human-readable part. If this invalid string is passed to the `decode` function, it fails (unsurprisingly) to decode. This change updates the test specification so that the `encode` function is now expected to produce a valid Bech32 string.

In the `Arbitrary` instance for `HumanReadablePart`, we don't need to (and shouldn't) convert the generated string to lower-case before calling `humanReadablePartFromText`. The `humanReadablePartFromText` function itself already converts to lower case.

KtorZ

Looks good. Are you planning on making a PR upstream to fix the reference implementation?

jonathanknowles · 2019-05-24T09:30:18Z

> Looks good. Are you planning on making a PR upstream to fix the reference implementation?

I'm happy to do that, as it's relatively simple to fix. But perhaps we should wait for upstream to confirm first? I've raised an issue here: sipa/bech32#49

KtorZ · 2019-05-24T09:35:11Z

Well, looking at the JavaScript implementation in bitcoinjs/bech32, we have the following behavior:

> bech32.encode('test', [])
'test12hrzfj'
> bech32.encode('Test', [])
'test12hrzfj'
> bech32.encode('TEST', [])
'test12hrzfj'
> bech32.encode('TEsT', [])
'test12hrzfj'
> bech32.encode('TesT', [])
'test12hrzfj'

> bech32.decode("test12hrzfj")
{ prefix: 'test', words: [] }
> bech32.decode("test13jgcyw")
Error: Invalid checksum for test13jgcyw

which is well aligned with what we now expect. This seems like a fair indication that there is indeed a bug in the Haskell reference implementation.
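
For illustration, here is a minimal sketch of why lower-casing before computing the checksum gives this behaviour. It follows the checksum algorithm described in BIP-173, but it is not the code from this PR or from either reference implementation: `encodeSketch` and the other names are hypothetical, data words are modelled as `Int` values in 0..31, and no input validation is done.

```haskell
import Data.Bits (shiftL, shiftR, testBit, xor, (.&.))
import Data.Char (ord, toLower)
import Data.List (foldl')

-- The Bech32 alphabet, indexed by 5-bit value.
charset :: String
charset = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

-- BCH checksum over a list of 5-bit values (BIP-173 "polymod").
polymod :: [Int] -> Int
polymod = foldl' step 1
  where
    generators = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]
    step chk v =
      let b = chk `shiftR` 25
          chk' = ((chk .&. 0x1ffffff) `shiftL` 5) `xor` v
      in foldl' xor chk' [g | (i, g) <- zip [0 ..] generators, testBit b i]

-- Expand the human-readable part into the values fed to the checksum.
hrpExpand :: String -> [Int]
hrpExpand hrp =
  map ((`shiftR` 5) . ord) hrp ++ [0] ++ map ((.&. 31) . ord) hrp

-- Six 5-bit checksum values for a given human-readable part and data.
checksum :: String -> [Int] -> [Int]
checksum hrp dat =
  let pm = polymod (hrpExpand hrp ++ dat ++ replicate 6 0) `xor` 1
  in [(pm `shiftR` (5 * (5 - i))) .&. 31 | i <- [0 .. 5]]

-- Hypothetical encoder: the human-readable part is lower-cased BEFORE
-- the checksum is computed, so the checksum matches the emitted string.
encodeSketch :: String -> [Int] -> String
encodeSketch hrp dat =
  let hrp' = map toLower hrp
  in hrp' ++ "1" ++ map (charset !!) (dat ++ checksum hrp' dat)
```

The checksum covers the character codes of the human-readable part, so computing it over an upper-case part while emitting (or later decoding) a lower-case one yields a mismatch. Normalising first makes the result independent of the input's case.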

jonathanknowles self-assigned this May 23, 2019

jonathanknowles requested review from KtorZ and piotr-iohk May 23, 2019 13:25

jonathanknowles force-pushed the jonathanknowles/bech32-fix branch from a6f3751 to 9eea6d3 Compare May 23, 2019 13:31

KtorZ suggested changes May 23, 2019

Rewrite Bech32 encoder and decoder in terms of base-32 (Word5) strings.

8f6e00a

jonathanknowles force-pushed the jonathanknowles/bech32-fix branch from 9eea6d3 to 2639338 Compare May 24, 2019 01:59

jonathanknowles added 4 commits May 24, 2019 02:33

Rename mkHumanReadablePart to humanReadablePartFromBytes.

0a6957a

Remove unnecessary bech32 prefix from functions within `Bech32.Inte…

ce734e0

…rnal`.

Wrap code to meet coding standards.

5b4d574

jonathanknowles force-pushed the jonathanknowles/bech32-fix branch 2 times, most recently from 1f6b14c to 3e0ab16 Compare May 24, 2019 05:18

jonathanknowles added 2 commits May 24, 2019 05:31

Encode to and decode from Text instead of ByteString.

ec0b86f

Use `Text` as the internal representation for `HumanReadablePart` and `DataPart`. As a side-effect, we can discard the `Read` and `Show` instances for `DataPart`.

jonathanknowles force-pushed the jonathanknowles/bech32-fix branch from 3e0ab16 to d7e9762 Compare May 24, 2019 05:36

jonathanknowles changed the title ~~Fix Bech32 decoder~~ Fix Bech32 decoder + encoder May 24, 2019

jonathanknowles requested a review from KtorZ May 24, 2019 08:19

KtorZ approved these changes May 24, 2019

jonathanknowles mentioned this pull request May 24, 2019

Bech32 encoder can be coerced into producing a string that can't be decoded #314

Closed

KtorZ merged commit 055c68e into master May 24, 2019

KtorZ deleted the jonathanknowles/bech32-fix branch May 24, 2019 09:39
