Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evaluating dropping rapidjson #1107

Open
jcoupey opened this issue Apr 29, 2024 · 7 comments
Open

Evaluating dropping rapidjson #1107

jcoupey opened this issue Apr 29, 2024 · 7 comments

Comments

@jcoupey
Copy link
Collaborator

jcoupey commented Apr 29, 2024

Boost has its own JSON library, and I've heard that it's easier and more intuitive to use than rapidjson. Switching may not harm performance, provided we trust their benchmarks.

Add to this the fact that rapidjson is painfully unmaintained unversioned, maybe we should consider switching. Of course we'd have to evaluate that in the light of our use-case. The most demanding part for us is probably the parsing of huge matrices spitted out by routing engines.

@SebMilardo
Copy link
Contributor

Hi! Could you assign this issue to me?

@SebMilardo
Copy link
Contributor

SebMilardo commented Jun 17, 2024

The new library seems to be working. However, there are some changes that I'd like to discuss. According to https://stackoverflow.com/questions/68105736/boost-json-serialization-format-boost-1-76-0 the Boost Json library outputs doubles in scientific notation. E.g.:

./vroom -i ../docs/example_1.json -r osrm -a car:routing.openstreetmap.de/routed-car/ -p car:443 
{"code":0,"summary":{"cost":21703,"routes":2,"unassigned":1,"delivery":[3],"amount":[3],"pickup":[2],"setup":0,"service":1800,"duration":21703,"waiting_time":0,"priority":0,"violations":[],"computing_times":{"loading":431,"solving":19,"routing":0}},"unassigned":[{"id":6,"location":[2.89357E0,4.890736E1],"type":"job"}],"routes":[{"vehicle":1,"cost":9357,"delivery":[2],"amount":[2],"pickup":[1],"setup":0,"service":900,"duration":9357,"waiting_time":0,"priority":0,"steps":[{"type":"start","location":[2.35044E0,4.871764E1],"setup":0,"service":0,"waiting_time":0,"load":[2],"arrival":1600416020,"duration":0,"violations":[]},{"type":"job","location":[1.98935E0,4.8701E1],"id":1,"setup":0,"service":300,"waiting_time":0,"job":1,"load":[1],"arrival":1600419600,"duration":3580,"violations":[]},{"type":"job","location":[2.03655E0,4.861128E1],"id":2,"setup":0,"service":300,"waiting_time":0,"job":2,"load":[2],"arrival":1600421179,"duration":4859,"violations":[]},{"type":"job","location":[2.28325E0,4.85958E1],"id":5,"setup":0,"service":300,"waiting_time":0,"job":5,"load":[1],"arrival":1600423393,"duration":6773,"violations":[]},{"type":"end","location":[2.35044E0,4.871764E1],"setup":0,"service":0,"waiting_time":0,"load":[1],"arrival":1600426277,"duration":9357,"violations":[]}],"violations":[]},{"vehicle":2,"cost":12346,"delivery":[1],"amount":[1],"pickup":[1],"setup":0,"service":900,"duration":12346,"waiting_time":0,"priority":0,"steps":[{"type":"start","location":[2.35044E0,4.871764E1],"setup":0,"service":0,"waiting_time":0,"load":[0],"arrival":1600416000,"duration":0,"violations":[]},{"type":"pickup","location":[2.41808E0,4.922619E1],"id":4,"setup":0,"service":300,"waiting_time":0,"job":4,"load":[1],"arrival":1600421846,"duration":5846,"violations":[]},{"type":"delivery","location":[2.39719E0,4.907611E1],"id":3,"setup":0,"service":300,"waiting_time":0,"job":3,"load":[0],"arrival":1600424119,"duration":7819,"violations":[]},{"type":"break","id":2,"setup":0,"service":300,"waiting_time":0,"load":[0],"arrival":1600424419,"duration":7819,"violations":[]},{"type":"end","location":[2.35044E0,4.871764E1],"setup":0,"service":0,"waiting_time":0,"load":[0],"arrival":1600429246,"duration":12346,"violations":[]}],"violations":[]}]} 

This should "not" be a problem as the JSON standard allows scientific notation, but I'm not sure if this can cause problems. Otherwise,I have to add a new function similarly to the one in https://www.boost.org/doc/libs/1_85_0/libs/json/doc/html/json/examples.html and manage doubles as before.

@jcoupey
Copy link
Collaborator Author

jcoupey commented Jun 19, 2024

Thanks for your work on the PR!

Even if using scientific notation is valid in term of standard, I would not be surprised if quite a few users would choke on the output change. So yes I think we should aim at keeping the same "usual" notation as before. Especially if this is about coordinates, I don't think I've ever seen any service/file format using scientific notation for lon, lat.

@jcoupey
Copy link
Collaborator Author

jcoupey commented Jun 19, 2024

One nice thing about rapidjson is that it provides in output the exact same coordinates than what users provided in input. It may sound trivial but it's not obvious to me because the input coordinates are parsed, stored with floating point precision, then serialized again to output. I wonder how Boost.JSON behaves in that regard.

@jcoupey jcoupey changed the title Evaluating using Boost.JSON Evaluating dropping rapidjson Jun 26, 2024
@jcoupey
Copy link
Collaborator Author

jcoupey commented Jun 26, 2024

Just updated the title to broaden the scope here as trying simdjson has been mentioned in #1129.

@SebMilardo
Copy link
Contributor

Results using a large (124MB) input file and the input_parser parse function (2 warmup runs + 10 runs using hyperfine "./vroom -i ../docs/example_7.json -l 1" -i --warmup 2):

rapidjson:

  Time (mean ± σ):      1.730 s ±  0.009 s    [User: 0.977 s, System: 0.743 s]
  Range (min … max):    1.717 s …  1.746 s    10 runs

boost json: parsing times are similar to rapidjson but reading/contains/get fields seem to be slower. I could try to directly access the fields and in case catch the exceptions to see if results are faster.

  Time (mean ± σ):      3.198 s ±  0.034 s    [User: 1.401 s, System: 1.782 s]
  Range (min … max):    3.162 s …  3.263 s    10 runs

simdjson: it is actually faster but it lacks a function to build JSON strings (which should be under development. See: simdjson/simdjson#1386). BTW I am not a fan of how simdjson parses files. Because it heavily uses iterators and - as the docs suggests: "doing a single pass over the input" - I ended up writing code that look like this which is not immediately readable.

  Time (mean ± σ):      1.354 s ±  0.016 s    [User: 0.698 s, System: 0.647 s]
  Range (min … max):    1.340 s …  1.392 s    10 runs

I will test a little bit more boost json and if it does not improve I would suggest to pause/close this issue and reopen it when the simdjson JSON builder is complete as I don't think it is worth to add an additional dependency just for reading Json. @jcoupey what do you think?

@jcoupey
Copy link
Collaborator Author

jcoupey commented Jul 17, 2024

@SebMilardo thanks again for the ongoing work on that! This last piece of feedback seems to confirm that 1. boost json is probably not what we want to switch to 2. simdjson is an interesting option in the long run but not a drop-in replacement for rapidjson currently.

As you suggest, I would keep this ticket open for reference, maybe closing #1129 if your further tests do no improve results. Additionally we could maybe keep an open draft PR with the simdjson work for reference?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants