Evaluating dropping rapidjson #1107

jcoupey · 2024-04-29T14:24:12Z

Boost has its own JSON library, and I've heard that it's easier and more intuitive to use than rapidjson. Switching may not harm performance, provided we trust their benchmarks.

Add to this the fact that rapidjson is painfully ~~unmaintained~~ unversioned, maybe we should consider switching. Of course we'd have to evaluate that in the light of our use-case. The most demanding part for us is probably the parsing of huge matrices spitted out by routing engines.

SebMilardo · 2024-05-24T08:48:00Z

Hi! Could you assign this issue to me?

SebMilardo · 2024-06-17T14:41:27Z

The new library seems to be working. However, there are some changes that I'd like to discuss. According to https://stackoverflow.com/questions/68105736/boost-json-serialization-format-boost-1-76-0 the Boost Json library outputs doubles in scientific notation. E.g.:

./vroom -i ../docs/example_1.json -r osrm -a car:routing.openstreetmap.de/routed-car/ -p car:443 
{"code":0,"summary":{"cost":21703,"routes":2,"unassigned":1,"delivery":[3],"amount":[3],"pickup":[2],"setup":0,"service":1800,"duration":21703,"waiting_time":0,"priority":0,"violations":[],"computing_times":{"loading":431,"solving":19,"routing":0}},"unassigned":[{"id":6,"location":[2.89357E0,4.890736E1],"type":"job"}],"routes":[{"vehicle":1,"cost":9357,"delivery":[2],"amount":[2],"pickup":[1],"setup":0,"service":900,"duration":9357,"waiting_time":0,"priority":0,"steps":[{"type":"start","location":[2.35044E0,4.871764E1],"setup":0,"service":0,"waiting_time":0,"load":[2],"arrival":1600416020,"duration":0,"violations":[]},{"type":"job","location":[1.98935E0,4.8701E1],"id":1,"setup":0,"service":300,"waiting_time":0,"job":1,"load":[1],"arrival":1600419600,"duration":3580,"violations":[]},{"type":"job","location":[2.03655E0,4.861128E1],"id":2,"setup":0,"service":300,"waiting_time":0,"job":2,"load":[2],"arrival":1600421179,"duration":4859,"violations":[]},{"type":"job","location":[2.28325E0,4.85958E1],"id":5,"setup":0,"service":300,"waiting_time":0,"job":5,"load":[1],"arrival":1600423393,"duration":6773,"violations":[]},{"type":"end","location":[2.35044E0,4.871764E1],"setup":0,"service":0,"waiting_time":0,"load":[1],"arrival":1600426277,"duration":9357,"violations":[]}],"violations":[]},{"vehicle":2,"cost":12346,"delivery":[1],"amount":[1],"pickup":[1],"setup":0,"service":900,"duration":12346,"waiting_time":0,"priority":0,"steps":[{"type":"start","location":[2.35044E0,4.871764E1],"setup":0,"service":0,"waiting_time":0,"load":[0],"arrival":1600416000,"duration":0,"violations":[]},{"type":"pickup","location":[2.41808E0,4.922619E1],"id":4,"setup":0,"service":300,"waiting_time":0,"job":4,"load":[1],"arrival":1600421846,"duration":5846,"violations":[]},{"type":"delivery","location":[2.39719E0,4.907611E1],"id":3,"setup":0,"service":300,"waiting_time":0,"job":3,"load":[0],"arrival":1600424119,"duration":7819,"violations":[]},{"type":"break","id":2,"setup":0,"service":300,"waiting_time":0,"load":[0],"arrival":1600424419,"duration":7819,"violations":[]},{"type":"end","location":[2.35044E0,4.871764E1],"setup":0,"service":0,"waiting_time":0,"load":[0],"arrival":1600429246,"duration":12346,"violations":[]}],"violations":[]}]}

This should "not" be a problem as the JSON standard allows scientific notation, but I'm not sure if this can cause problems. Otherwise,I have to add a new function similarly to the one in https://www.boost.org/doc/libs/1_85_0/libs/json/doc/html/json/examples.html and manage doubles as before.

jcoupey · 2024-06-19T09:32:42Z

Thanks for your work on the PR!

Even if using scientific notation is valid in term of standard, I would not be surprised if quite a few users would choke on the output change. So yes I think we should aim at keeping the same "usual" notation as before. Especially if this is about coordinates, I don't think I've ever seen any service/file format using scientific notation for lon, lat.

jcoupey · 2024-06-19T09:34:59Z

One nice thing about rapidjson is that it provides in output the exact same coordinates than what users provided in input. It may sound trivial but it's not obvious to me because the input coordinates are parsed, stored with floating point precision, then serialized again to output. I wonder how Boost.JSON behaves in that regard.

jcoupey · 2024-06-26T10:05:33Z

Just updated the title to broaden the scope here as trying simdjson has been mentioned in #1129.

SebMilardo · 2024-07-17T11:17:53Z

Results using a large (124MB) input file and the input_parser parse function (2 warmup runs + 10 runs using hyperfine "./vroom -i ../docs/example_7.json -l 1" -i --warmup 2):

rapidjson:

  Time (mean ± σ):      1.730 s ±  0.009 s    [User: 0.977 s, System: 0.743 s]
  Range (min … max):    1.717 s …  1.746 s    10 runs

boost json: parsing times are similar to rapidjson but reading/contains/get fields seem to be slower. I could try to directly access the fields and in case catch the exceptions to see if results are faster.

  Time (mean ± σ):      3.198 s ±  0.034 s    [User: 1.401 s, System: 1.782 s]
  Range (min … max):    3.162 s …  3.263 s    10 runs

simdjson: it is actually faster but it lacks a function to build JSON strings (which should be under development. See: simdjson/simdjson#1386). BTW I am not a fan of how simdjson parses files. Because it heavily uses iterators and - as the docs suggests: "doing a single pass over the input" - I ended up writing code that look like this which is not immediately readable.

  Time (mean ± σ):      1.354 s ±  0.016 s    [User: 0.698 s, System: 0.647 s]
  Range (min … max):    1.340 s …  1.392 s    10 runs

I will test a little bit more boost json and if it does not improve I would suggest to pause/close this issue and reopen it when the simdjson JSON builder is complete as I don't think it is worth to add an additional dependency just for reading Json. @jcoupey what do you think?

jcoupey · 2024-07-17T14:27:40Z

@SebMilardo thanks again for the ongoing work on that! This last piece of feedback seems to confirm that 1. boost json is probably not what we want to switch to 2. simdjson is an interesting option in the long run but not a drop-in replacement for rapidjson currently.

As you suggest, I would keep this ticket open for reference, maybe closing #1129 if your further tests do no improve results. Additionally we could maybe keep an open draft PR with the simdjson work for reference?

jcoupey added refactor dependency labels Apr 29, 2024

jcoupey assigned SebMilardo May 24, 2024

SebMilardo mentioned this issue Jun 17, 2024

Use Boost Json #1129

Draft

2 tasks

jcoupey changed the title ~~Evaluating using Boost.JSON~~ Evaluating dropping rapidjson Jun 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Evaluating dropping rapidjson #1107

Evaluating dropping rapidjson #1107

jcoupey commented Apr 29, 2024

SebMilardo commented May 24, 2024

SebMilardo commented Jun 17, 2024 •

edited

Loading

jcoupey commented Jun 19, 2024

jcoupey commented Jun 19, 2024

jcoupey commented Jun 26, 2024

SebMilardo commented Jul 17, 2024

jcoupey commented Jul 17, 2024

Evaluating dropping rapidjson #1107

Evaluating dropping rapidjson #1107

Comments

jcoupey commented Apr 29, 2024

SebMilardo commented May 24, 2024

SebMilardo commented Jun 17, 2024 • edited Loading

jcoupey commented Jun 19, 2024

jcoupey commented Jun 19, 2024

jcoupey commented Jun 26, 2024

SebMilardo commented Jul 17, 2024

jcoupey commented Jul 17, 2024

SebMilardo commented Jun 17, 2024 •

edited

Loading