Unable to decode non-ISO 646 values sent in IA5Strings #191

Open
mikcox opened this issue Mar 13, 2020 · 1 comment


mikcox commented Mar 13, 2020

Hello!

First off, thank you SO MUCH for this package. It's been immensely useful!

I happened to notice an oddity when trying to decode messages from a device that's sending me payloads that include non-ISO 646 characters in fields that are supposed to be IA5Strings. Namely, I'm trying to decode the following payload:

b'0U\x7fNE0C\x02\x01{\x16\x0476PK\x04\x06(\x11\xa5\xdb\xc4\xee\x01\x01\x00\x02\x01\xff\n\x01\x01\x04\x03\x00\x00\x00\xa0\x1d\x16\x1bLE-Mark\xe2\x80\x99s Bose Headphones\t\x03\xc0\x02\x110\x0b\x02\x04^k\xec\xc4\x02\x03\x0b1#'

with an ASN.1 spec that's something like:

from pyasn1.type import char, namedtype, namedval, tag, univ


class MacAddress(univ.OctetString):
    pass


class BluetoothDetect(univ.Sequence):
    pass


BluetoothDetect.tagSet = univ.Sequence.tagSet.tagExplicitly(tag.Tag(tag.tagClassApplication, tag.tagFormatConstructed, 78))
BluetoothDetect.componentType = namedtype.NamedTypes(
    namedtype.NamedType('scanID', univ.Integer()),
    namedtype.NamedType('sensorID', char.IA5String()),
    namedtype.NamedType('macAddress', MacAddress()),
    namedtype.NamedType('reservedLAP', univ.Boolean()),
    namedtype.OptionalNamedType('btClassicChannel', univ.Integer()),
    namedtype.NamedType('detectType', univ.Enumerated(namedValues=namedval.NamedValues(('btClassic', 0), ('ble', 1), ('btClassicPassive', 2)))),
    namedtype.NamedType('btClassicHeader', univ.OctetString()),
    namedtype.OptionalNamedType('deviceName', char.IA5String().subtype(explicitTag=tag.Tag(tag.tagClassContext, tag.tagFormatSimple, 0))),
    namedtype.OptionalNamedType('manufacturerName', char.IA5String().subtype(explicitTag=tag.Tag(tag.tagClassContext, tag.tagFormatSimple, 1))),
    namedtype.NamedType('rssi', univ.Real())
)

After investigating the payload, I have a hunch that the issue is related to that pesky non-ISO 646 "fancy single quote" / backtick-like character in the deviceName field (represented in the payload as \xe2\x80\x99). When I remove that character, it decodes using the above ASN.1 spec without a problem.

Is there any way we can get all decoders in this package to include some sort of character replacement, like the errors option in Python's built-in bytes.decode('utf-8', errors='replace')? It's annoying to lose an entire payload when we hit a character like this.

Alternatively, do you have any suggestions for how to do my own replacement on any non-IA5String characters in the payload before I send it to the decoder?
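For reference, here is a minimal sketch of the standard-library behaviour being asked for, applied to the device name bytes from the payload above. It uses only Python's built-in bytes.decode and does not touch pyasn1; the variable names (raw_name, strict_ok, lenient) are made up for illustration.

```python
# Device name bytes copied from the payload in this issue.
raw_name = b'LE-Mark\xe2\x80\x99s Bose Headphones'

# Strict decoding raises on the first byte outside 7-bit ASCII.
try:
    raw_name.decode('ascii')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# errors='replace' substitutes U+FFFD for each undecodable byte
# instead of raising, so the rest of the string survives.
lenient = raw_name.decode('ascii', errors='replace')

print(strict_ok)
print(lenient)
```

With errors='replace', each of the three offending bytes becomes its own replacement character, so the rest of the name is kept.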

Thanks in advance!


mikcox commented Mar 25, 2020

I circled back to this problem recently and wanted to post an update on what I've learned:

The character in question is a "Right Single Quotation Mark" (U+2019; decimal 146 in Windows-1252, and sent in the payload as the UTF-8 bytes \xe2\x80\x99). I've confirmed that the device I'm pulling from is NOT actually to specification, since it's sending byte values of 128 and above in IA5Strings, which only allow 7-bit characters.
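As a quick check on the character's identity, the sketch below decodes the three bytes from the payload as UTF-8 and looks the result up with the standard unicodedata module. The value 146 matches the character's single-byte Windows-1252 encoding, not an ASCII code. The variable names are illustrative only.

```python
import unicodedata

# The three suspicious bytes from the deviceName field, read as UTF-8.
quote = b'\xe2\x80\x99'.decode('utf-8')

name = unicodedata.name(quote)         # official Unicode character name
code_point = ord(quote)                # Unicode code point as an integer
cp1252_byte = quote.encode('cp1252')   # single-byte Windows-1252 form

print(name, hex(code_point), cp1252_byte[0])
```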

That said, I don't have control over that device, and it'd be awesome if we could add some simple error catching like this in the IA5String parser.

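In the meantime, I'm pre-sanitizing my payloads by replacing a few of the common non-ASCII characters (UTF-8 sequences made of bytes 128 and above) that people might use:

# Given a string of bytes, replace a handful of common non-ASCII UTF-8 sequences with similar
# characters from the 7-bit ASCII range. Each replacement is NUL-padded to the length of the
# sequence it replaces, so the encoded lengths in the payload are unchanged.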
def replace_non_iso(data: bytes) -> bytes:
    patterns = [
        [b'\xe2\x80\x99', b"'\x00\x00"],
        [b'\xe2\x80\x9c', b'"\x00\x00'],
        [b'\xe2\x80\x9d', b'"\x00\x00'],
        [b'\xe2\x80\x9e', b'"\x00\x00'],
        [b'\xe2\x80\x9f', b'"\x00\x00'],
        [b'\xc3\xa9', b'e\x00'],
        [b'\xe2\x80\x93', b'-\x00\x00'],
        [b'\xe2\x80\x92', b'-\x00\x00'],
        [b'\xe2\x80\x94', b'-\x00\x00'],
        [b'\xe2\x80\x98', b"'\x00\x00"],
        [b'\xe2\x80\x9b', b"'\x00\x00"],
        [b'\xe2\x80\x90', b'-\x00\x00'],
        [b'\xe2\x80\x91', b'-\x00\x00'],
        [b'\xe2\x80\xb2', b"'\x00\x00"],
        [b'\xe2\x80\xb3', b"'\x00\x00"],
        [b'\xe2\x80\xb4', b"'\x00\x00"],
        [b'\xe2\x80\xb5', b"'\x00\x00"],
        [b'\xe2\x80\xb6', b"'\x00\x00"],
        [b'\xe2\x80\xb7', b"'\x00\x00"],
        [b'\xe2\x81\xba', b"+\x00\x00"],
        [b'\xe2\x81\xbb', b"-\x00\x00"],
        [b'\xe2\x81\xbc', b"=\x00\x00"],
        [b'\xe2\x81\xbd', b"(\x00\x00"],
        [b'\xe2\x81\xbe', b")\x00\x00"]
    ]
    for pattern in patterns:
        data = data.replace(pattern[0], pattern[1])

    return data
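To illustrate why the replacements above keep the same byte count, here is a small sketch run against just the deviceName IA5String from the payload (tag 0x16, length 0x1b, then 27 value bytes). The helper name replace_padded and the one-entry pattern list are made up for this example; it is a cut-down stand-in for the function above, not a pyasn1 API.

```python
def replace_padded(data: bytes, patterns) -> bytes:
    """Apply each (old, new) byte substitution in order."""
    for old, new in patterns:
        data = data.replace(old, new)
    return data


# IA5String TLV copied from the payload: tag, length, value.
tlv = b'\x16\x1bLE-Mark\xe2\x80\x99s Bose Headphones'

# Replacement is NUL-padded to three bytes, matching the UTF-8 sequence.
patterns = [(b'\xe2\x80\x99', b"'\x00\x00")]
cleaned = replace_padded(tlv, patterns)

# The length octet still matches the number of value bytes.
length_ok = cleaned[1] == len(cleaned) - 2
# Every value byte is now within the 7-bit range.
seven_bit = all(b < 0x80 for b in cleaned[2:])

print(cleaned, length_ok, seven_bit)
```

Because the value is still 27 bytes long, the length octets of this field and of the enclosing structures remain correct. The trade-off is that the decoded string will contain two NUL characters after the apostrophe, which may need stripping afterwards.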
