Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cast(json_parse(x) as map(varchar, json)) doesnt canonicalize the map json. #24207

Open
kgpai opened this issue Dec 5, 2024 · 3 comments
Open
Assignees
Labels

Comments

@kgpai
Copy link
Contributor

kgpai commented Dec 5, 2024

Description

Consider the below query :

SELECT TRY(CAST(json_parse(c0) AS map(varchar, json))) from (values ('{"m": {"rn": "w", "pl": "4"}}')) t(c0);

Returns:  {m={"rn":"w","pl":"4"}}

Whereas:

SELECT TRY(json_parse(c0) ) from (values ('{"m": {"rn": "w", "pl": "4"}}')) t(c0);

Returns: 
{"m":{"pl":"4","rn":"w"}}

The problem here is that the inner json string {"rn"...} is not sorted. This causes inconsistencies with Prestissimo where it is sorted. Secondly json_parse without the cast canonicalizes the json and has the inner json sorted; so CAST not sorting the inner json seems like a miss. This can cause inconsistencies if we group by json and the json is taken from a cast.

Environment

  • Presto version used: latest

Expected Behavior

Cast should also canonicalize the inner json.

SELECT TRY(CAST(json_parse(c0) AS map(varchar, json))) from (values ('{"m": {"rn": "w", "pl": "4"}}')) t(c0);

Returns:   {m={"pl":"4","rn":"w"}} 

Current Behavior

See description.

Possible Solution

Sort/Canonicalize the inner json.

Context

This causes inconsistencies with Prestissimo/Velox and results in verification noise.

@kgpai
Copy link
Contributor Author

kgpai commented Dec 5, 2024

@tdcmeehan
Copy link
Contributor

Although JSON objects inherently have no concept of sort order, within the Java eval ensuring the keys are sorted is important because they're stored simply as strings, so the sorting is necessary for ordering, hashing, etc.

  1. I feel like, if possible, comparisons between Prestissimo and Presto should try not to factor in the sorted order of the keys of the objects. Velox should not mimic this behavior unless it simplifies the code or is more performant.
  2. However, this is definitely a bug in the Java eval. The first example should definitely ensure that the keys are sorted.

@tdcmeehan tdcmeehan moved this from 🆕 Unprioritized to 📋 Prioritized Backlog in Bugs and support requests Dec 5, 2024
@kgpai
Copy link
Contributor Author

kgpai commented Dec 5, 2024

I feel like, if possible, comparisons between Prestissimo and Presto should try not to factor in the sorted order of the keys of the objects. Velox should not mimic this behavior unless it simplifies the code or is more performant.

It does simplify the code and makes it more performant ; canonicalization makes it possible for us to treat the underlying json as a varchar and thus simplifies group by's and comparisons.

@infvg infvg self-assigned this Dec 10, 2024
infvg added a commit to infvg/presto that referenced this issue Dec 10, 2024
The map function will not sort a json object by its keys, despite the
json_parse function sorting the same input.
If implemented, this will sort json objects.

Resolves prestodb#24207
infvg added a commit to infvg/presto that referenced this issue Dec 10, 2024
The map function will not sort a json object by its keys, despite the
json_parse function sorting the same input.
If implemented, this will sort json objects.

Resolves prestodb#24207
infvg added a commit to infvg/presto that referenced this issue Dec 10, 2024
The map function will not sort a json object by its keys, despite the
json_parse function sorting the same input.
If implemented, this will sort json objects.

Resolves prestodb#24207
infvg added a commit to infvg/presto that referenced this issue Dec 10, 2024
The map function will not sort a json object by its keys, despite the
json_parse function sorting the same input.
If implemented, this will sort json objects.

Resolves prestodb#24207
infvg added a commit to infvg/presto that referenced this issue Dec 10, 2024
The map function will not sort a json object by its keys, despite the
json_parse function sorting the same input.
If implemented, this will sort json objects.

Resolves prestodb#24207
infvg added a commit to infvg/presto that referenced this issue Dec 11, 2024
The map function will not sort a json object by its keys, despite the
json_parse function sorting the same input.
If implemented, this will sort json objects.

Resolves prestodb#24207
infvg added a commit to infvg/presto that referenced this issue Dec 12, 2024
The map function will not sort a json object by its keys, despite the
json_parse function sorting the same input.
If implemented, this will sort json objects.

Resolves prestodb#24207
infvg added a commit to infvg/presto that referenced this issue Dec 12, 2024
The map function will not sort a json object by its keys, despite the
json_parse function sorting the same input.
If implemented, this will sort json objects.

Resolves prestodb#24207
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: 📋 Prioritized Backlog
Development

No branches or pull requests

3 participants