Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider using IndexMap: json order is random after a 33 keys in map #378

Open
ritchie46 opened this issue Jul 7, 2024 · 5 comments
Open

Comments

@ritchie46
Copy link
Contributor

This python snippet wouldn't round trip if simd-json was used:

import json

obj = {c: None for c in "abcdefghijklmnopqrstuvwxyz1234567"}
s = json.dumps(obj)
assert json.loads(s) == obj

This is because Object uses a hashmap which doesn't preserve the order of the json string keys. This can be resolved by using
I would like to consider using indexmap, which has excellent performance, AND guarantees index order.

This issue was discovered in: pola-rs/polars#17439

If this is accepted, I am willing to make a PR.

@Licenser
Copy link
Member

Licenser commented Jul 7, 2024

Hi ritchie,

by specification json objects are unordered sets but that also doesn't mean it has to be unordered. the order for the first 32 keys is an implementation detail. We use halfbrown that uses a simple vector for up to 32 elements which makes small objects, which are frequent with json, extremely fast and small (memory wise).

That's just for background :). Generally if we can make it optional and preserve the current behaviour I'm in, giving people more options is nice!

A feature to watch out for is known-key, it is quite powerful when using lookups of the same key repeatedly and it'd be good to preserve it

I could see a few ways to make this work:

  1. a feature flag, probably the easiest but there is some chance of conflicts with other flags
  2. a feature flag in halfbrown, same as above but it limits the interface change to the library for hashmaps that simd-json already uses
  3. make the map type a generic - this will require more work and generally I like it but OTOH I'm not sure how well that would play out or how much work it would be
  4. same as abovbe but inside halfbrown

@ritchie46
Copy link
Contributor Author

Ah, given that you also maintain halfbrown, maybe doing that upstream is nicest. Would a halfbrown based on IndexMap be worth it?

For me a feature flag would be fine. So that would be option 2. 👀

@Licenser
Copy link
Member

Licenser commented Jul 7, 2024

perfect, sounds like we got a plan then :)!

Also #377 might be of interest for you, it keeps order as long as no mutations are done already (and it should "just work" with index map as well)

@ritchie46
Copy link
Contributor Author

Great! Thanks for being receptive to this. :)

For #377, I need a little bit more context. 😅 What does it do? What is the semantic difference between a borrowed value and a tape?

@Licenser
Copy link
Member

Licenser commented Jul 7, 2024

the borrowed value is a fully mutable DOM equivalent to serde_json::Value; the tape value is a flat structire (a single vec) that represents the JSON document, since it has no nesting and doesn't require extra allocations, inserts, etc it is significantly faster.

the downside is that the tape is not easily mutable; the lazy value bridges the gab between the two :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants