Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bech32 encoder can be coerced into producing a string that can't be decoded #314

Closed
jonathanknowles opened this issue May 24, 2019 · 3 comments
Assignees

Comments

@jonathanknowles
Copy link
Member

jonathanknowles commented May 24, 2019

Context

When encode is applied to a HumanReadablePart containing one or more upper-case characters, it results in an invalid Bech32 string that cannot be decoded with the decode function.

Steps to Reproduce

$ stack ghci bech32
> lower = fromRight undefined $ mkHumanReadablePart "test"
> upper = fromRight undefined $ mkHumanReadablePart "TEST"

Expected behaviour

> encode lower mempty
Right "test12hrzfj"
> encode upper mempty
Right "test12hrzfj" -- SAME

# Should give the same output:
> decode $ fromRight undefined $ encode lower mempty
Right (HumanReadablePart "test","")
> decode $ fromRight undefined $ encode upper mempty
Right (HumanReadablePart "test","") -- SUCCESS

Actual behaviour

# Should give the same output:
> encode lower mempty
Right "test12hrzfj"
> encode upper mempty
Right "test13jgcyw" -- DIFFERENT

# Should give the same output:
> decode $ fromRight undefined $ encode lower mempty
Right (HumanReadablePart "test","")
> decode $ fromRight undefined $ encode upper mempty
Left (StringToDecodeContainsInvalidChars []) -- FAILURE

Cause

This appears to be caused by the encode function. Internally, it calculates a checksum based on the supplied HumanReadablePart, which can be upper or lower case (or mixed case). Unfortunately, the checksum calculation is case-sensitive. Later on, the human-readable part is converted to lower case before inclusion in the final output. If the original HumanReadablePart contained one or more upper case characters, the checksum will be inconsistent with the human readable part of the final output.

-- | Encode a human-readable string and data payload into a Bech32 string.      
encode :: HumanReadablePart -> ByteString -> Either EncodingError ByteString    
encode hrp@(HumanReadablePart hrpBytes) payload = do    
    let payload5 = toBase32 (BS.unpack payload)
    let payload' = payload5 ++ bech32CreateChecksum hrp payload5    
    let rest = map (word5ToChar Arr.!) payload'
    let output = B8.map toLower hrpBytes <> B8.pack "1" <> B8.pack rest   
    guardE (BS.length output <= encodedStringMaxLength) EncodedStringTooLong    
    return output

Why this is definitely a bug

The Bech32 specification states:

"The lowercase form is used when determining a character's value for checksum purposes."

Also, comparing with another official (JavaScript) implementation bitcoinjs/bech32

We have the following behavior:

> bech32.encode('test', [])
'test12hrzfj'
> bech32.encode('Test', [])
'test12hrzfj'
> bech32.encode('TEST', [])
'test12hrzfj'
> bech32.encode('TEsT', [])
'test12hrzfj'
> bech32.encode('TesT', [])
'test12hrzfj'

> bech32.decode("test12hrzfj")
{ prefix: 'test', words: [] }
> bech32.decode("test13jgcyw")
Error: Invalid checksum for test13jgcyw

which is quite aligned with what we now expect and suggest the existence of a bug in the haskell implementation.

Addendum

This bug appears to also affect the Haskell reference implementation for Bech32.

QA

  • Added an extra regression test to highlight the bug described above: Codec/Binary/Bech32Spec

  • Also adjusted our arbitrary generator to produce uppercased and lowercased human-readable-part and have roundtrip properties check for both (was only producing lowercased HRP before).

  • Behavior is now aligned with another (more maintained) reference implementation a.k.a bitcoinjs/bech32

@jonathanknowles jonathanknowles changed the title Bech32 encode function can be coerced into producing a string that can't be decoded with decode. Bech32 encoder can be coerced into producing a string that can't be decoded May 24, 2019
@jonathanknowles jonathanknowles self-assigned this May 24, 2019
@jonathanknowles
Copy link
Member Author

Raised this issue as a bug against the Haskell Bech32 reference implementation here: sipa/bech32#49

@jonathanknowles
Copy link
Member Author

jonathanknowles commented May 24, 2019

Fix available in #312

@KtorZ KtorZ added this to the Bugs - Sprint 19-20 milestone May 24, 2019
@piotr-iohk
Copy link
Contributor

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants