Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Operator parseJSONMap fails on JSON objects containing long integer values #2310

Open
guidiaz opened this issue Nov 17, 2022 · 10 comments
Open
Labels
breaking change ⚠️ Introduces breaking changes bug 🐜 Something isn't working

Comments

@guidiaz
Copy link
Contributor

guidiaz commented Nov 17, 2022

For instance: {"data": 122637770461000000000}

@guidiaz guidiaz added the bug 🐜 Something isn't working label Nov 17, 2022
@tmpolaczyk
Copy link
Contributor

Alright, so this issue seems to be because of CBOR limitations. When executing an operator like StringParseJsonMap, the data is converted first from JSON to CBOR, and then from CBOR to RadonType:

JSON -> CBOR -> RadonType

A RadonInteger has a limit of -2^127 to 2^127 - 1, but CBOR has a lower limit for the integer size, from -2^64 to 2^64 - 1. So the conversion results in an error:

Encode { from: "cbor::value::Value", to: "RadonMap" }

That issue can be fixed by skipping the intermediate CBOR conversion, but then a bigger problem arises: RadonTypes are serialized as CBOR when used in the network protocol. Therefore, as soon as a RadonInteger reaches 2^64, it will be an error to serialize it. For example, this is an error that is seen in the node when a data request uses the IntegerMultiply operator to convert an input of [1] into an integer greater than 2^64 (1 * 1000000000000000 * 100000):

Couldn't decode tally value from bytes: Failed to encode RadonInteger into Vec<u8>

(the error says tally value but this is the result of the aggregation phase)

In that case the data request resolves with 0 commits, because it is not possible to serialize the result. Sheikah doesn't detect that error when using "try data request", so I guess the toolkit will not notice this error as well.

So possible solutions for this general integer serialization issue:

  • reduce the range of valid RadonIntegers to the same as valid CBOR integers
  • serialize big integers as CBOR floats
  • serialize big integers using some CBOR extension

I believe all of them are breaking changes.

@guidiaz
Copy link
Contributor Author

guidiaz commented Nov 18, 2022

On StringParseJsonArray, StringParseJsonMap and StringParseXmlMap, I would suggest to:

  • serialize big integers (>64 bits) to RadonFloats, so "absolute", "greaterThan", "lessThan", "modulo", "multiply", "negate" and "power" math operators could still be used.

@guidiaz
Copy link
Contributor Author

guidiaz commented Nov 18, 2022

In that case the data request resolves with 0 commits, because it is not possible to serialize the result. Sheikah doesn't detect that error when using "try data request", so I guess the toolkit will not notice this error as well.

@tmpolaczyk Shouldn't the node commit something like an "aggregation overflow error" instead?

As a matter of fact, there's a specific enum value for this situation defined in the Witnet.sol library.

@tmpolaczyk
Copy link
Contributor

@tmpolaczyk Shouldn't the node commit something like an "aggregation overflow error" instead?

As a matter of fact, there's a specific enum value for this situation defined in the Witnet.sol library.

Yeah, it should, but this error has never happened before so the current behavior is to not commit anything. Basically here instead of returning an err we should return an ok with serialization of a dedicated radon error like RadError::Encode:

log::error!("Couldn't decode tally value from bytes: {}", e);

@guidiaz
Copy link
Contributor Author

guidiaz commented Nov 18, 2022

Would it make sense to rephrase the log error message to "Couldn't encode value from < whatever > into < RadonType >: {}" ?

@tmpolaczyk
Copy link
Contributor

Would it make sense to rephrase the log error message to "Couldn't encode value from < whatever > to < RadonType >: {}" ?

Yes, that log is wrong.

@tmpolaczyk
Copy link
Contributor

Opened #2312 to track the issue of nodes not commiting anything in case of encode error.

@tmpolaczyk
Copy link
Contributor

The CBOR RFC seems to indicate that big integers can be supported by representing them as bytes or strings:

https://www.rfc-editor.org/rfc/rfc8949.html#name-bignums

But not sure if this solution works well for our case, because:

  • Depending on the context, the bytes should be interpreted as ether bytes or as a big integer. So radon operators such as ArrayReduce(AverageMean) would need to implicitly convert any bytes to integer, and maybe in some cases that is not desired.
  • Negative integers are supposed to be represented by strings, but not all integers are valid utf8 strings. The two CBOR libraries that we use (cbor and serde_cbor) do not allow non-utf8 strings. So we would need to use a different CBOR library that allows using non-utf8 strings.

@aesedepece
Copy link
Member

* Negative integers are supposed to be represented by strings, but not all integers are valid utf8 strings. The two CBOR libraries that we use (cbor and serde_cbor) do not allow non-utf8 strings. So we would need to use a different CBOR library that allows using non-utf8 strings.

Now I'm curious of which integer numbers cannot be formatted as strings 🤔

@tmpolaczyk
Copy link
Contributor

Now I'm curious of which integer numbers cannot be formatted as strings thinking

I guess most of them, for example -100000000000000000000 is encoded as

C3                       # tag(3)
   49                    # bytes(9)
      056BC75E2D630FFFFF # "\u0005k\xC7^-c\u000F\xFF\xFF"

And "\u0005k\xC7^-c\u000F\xFF\xFF" is not valid UTF8, so we wouldn't be able to parse it with the current libraries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change ⚠️ Introduces breaking changes bug 🐜 Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants