
Use Boost Json #1129

Draft · SebMilardo wants to merge 19 commits into master

Conversation

SebMilardo
Contributor

Issue

#1107

Tasks

  • test performance
  • review

@jcoupey
Collaborator

jcoupey commented Jun 19, 2024

I'm really curious to evaluate this change for our use-case. In my experience, the main bottleneck for us when it comes to JSON is parsing huge matrices, either from a file (in the case of a custom matrix) or from the network (e.g. osrm-routed output). In that situation, parsing matrices dominates the problem loading time reported in summary.computing_times.loading.
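To put rough numbers on that, a throwaway sketch along the following lines (not VROOM code; the "durations" key and the matrix size are just made up to mimic a matrix-heavy input) times rapidjson on a synthetic n x n durations matrix:

```cpp
// Standalone sketch: build a synthetic n x n durations matrix as a JSON
// string, then time how long rapidjson takes to parse it.
#include <chrono>
#include <iostream>
#include <random>
#include <string>

#include "rapidjson/document.h"

int main() {
  const std::size_t n = 3000; // number of locations, bump to stress the parser
  std::mt19937 rng(42);
  std::uniform_int_distribution<int> duration(0, 3600);

  std::string json = "{\"durations\":[";
  for (std::size_t i = 0; i < n; ++i) {
    json += (i == 0) ? "[" : ",[";
    for (std::size_t j = 0; j < n; ++j) {
      if (j > 0) {
        json += ',';
      }
      json += std::to_string(duration(rng));
    }
    json += ']';
  }
  json += "]}";

  const auto start = std::chrono::steady_clock::now();
  rapidjson::Document d;
  d.Parse(json.c_str());
  const auto stop = std::chrono::steady_clock::now();

  std::cout << n << "x" << n << " matrix (" << json.size() / (1024 * 1024)
            << " MB) parsed in "
            << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count()
            << " ms, parse error: " << std::boolalpha << d.HasParseError() << "\n";
}
```

At n = 3000 the generated string is already on the order of 40 MB, i.e. comparable to a big custom-matrix input.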

@SebMilardo did you run any tests so far?

@SebMilardo
Contributor Author

Not yet, but I'm planning to run some tests this weekend.

@jcoupey
Collaborator

jcoupey commented Jun 19, 2024

Great! You probably want to test various problem sizes, including instances with several thousand points, to really notice differences. If you're only interested in the loading time and the solving time becomes a pain, you can speed things up by (1) only using TSP instances (the dedicated code scales better) and (2) passing -l 0 to stop the search prior to any local search.

@SebMilardo
Contributor Author

Bad news: I've started testing the parse function in input_parser, and rapidjson is consistently faster than boost::json. I'm using a hand-crafted input file with 100000 vehicles and 100000 jobs (a ~20MB .json file). Basically, boost::json is faster at parsing the string but slower at accessing the parsed objects, which makes the parse function ~25% slower on average.

I'm playing around with options, allocators, error checks, etc. to make boost::json faster, but I also found this FAQ (https://www.boost.org/doc/libs/1_85_0/libs/json/doc/html/json/frequently_asked_questions.html) and this library (https://github.com/simdjson/simdjson). Simdjson seems to be way faster than both rapidjson and boost::json, at the cost of creating read-only objects. As VROOM uses the JSON objects to build its own objects, it might be worth a try.
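For illustration, a micro-benchmark along these lines (a rough sketch, not the actual test code; the input path and the "jobs"/"id" fields are placeholders) separates parse time from access time for both libraries:

```cpp
// Rough sketch: parse the same input with boost::json and rapidjson and walk
// a "jobs" array, timing parsing and access separately.
// Link with Boost.JSON (or include <boost/json/src.hpp> in one TU for
// header-only use).
#include <chrono>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

#include <boost/json.hpp>
#include "rapidjson/document.h"

static std::string read_file(const char* path) {
  std::ifstream in(path);
  std::ostringstream ss;
  ss << in.rdbuf();
  return ss.str();
}

int main(int argc, char** argv) {
  if (argc < 2) {
    return 1;
  }
  const std::string input = read_file(argv[1]);
  using clock = std::chrono::steady_clock;
  const auto ms = [](auto a, auto b) {
    return std::chrono::duration_cast<std::chrono::milliseconds>(b - a).count();
  };

  // boost::json: parse, then access every job id.
  auto t0 = clock::now();
  const boost::json::value v = boost::json::parse(input);
  auto t1 = clock::now();
  std::int64_t sum_boost = 0;
  for (const auto& job : v.as_object().at("jobs").as_array()) {
    sum_boost += job.as_object().at("id").as_int64();
  }
  auto t2 = clock::now();

  // rapidjson: same steps.
  auto t3 = clock::now();
  rapidjson::Document d;
  d.Parse(input.c_str());
  auto t4 = clock::now();
  std::int64_t sum_rapid = 0;
  for (const auto& job : d["jobs"].GetArray()) {
    sum_rapid += job["id"].GetInt64();
  }
  auto t5 = clock::now();

  std::cout << "boost::json parse " << ms(t0, t1) << " ms, access "
            << ms(t1, t2) << " ms (checksum " << sum_boost << ")\n"
            << "rapidjson   parse " << ms(t3, t4) << " ms, access "
            << ms(t4, t5) << " ms (checksum " << sum_rapid << ")\n";
}
```

The checksums are only there to keep the compiler from optimizing the access loops away.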

@jcoupey
Collaborator

jcoupey commented Jun 25, 2024

Thanks for testing and reporting. So we'd basically be trading a "better maintained" project and somewhat simpler user code for a ~25% slowdown on parsing. That may indeed be too high a price, especially since rapidjson (despite its dev state) has been "just working" the whole time. Happy to get other views on that.

@jcoupey
Collaborator

jcoupey commented Jun 25, 2024

> Simdjson seems to be way faster than both rapidjson and boost::json, at the cost of creating read-only objects.

Might be a good option. As you point out, we never modify parsed objects, we just read parts of them to populate our own C++ objects. Also, development around simdjson seems quite active. My concern here would be the time spent, as you've already invested quite some time adjusting the whole codebase for boost::json. Do you think it would be easier/faster/possible to start with a quick benchmark outside VROOM?
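For such a standalone comparison, a minimal sketch using simdjson's On-Demand API could look like the following (file path and the "jobs"/"id" fields are placeholders). Since On-Demand parses lazily, the access loop has to sit inside the timed section to get a number comparable to the DOM-based libraries:

```cpp
// Minimal standalone sketch using simdjson's On-Demand API: load a file,
// then time walking a "jobs" array and reading each id. On-Demand parses
// lazily, so parsing cost is only paid while the document is traversed.
#include <chrono>
#include <cstdint>
#include <iostream>

#include "simdjson.h"

int main(int argc, char** argv) {
  if (argc < 2) {
    return 1;
  }
  simdjson::ondemand::parser parser;
  simdjson::padded_string json = simdjson::padded_string::load(argv[1]);

  const auto start = std::chrono::steady_clock::now();

  simdjson::ondemand::document doc = parser.iterate(json);
  simdjson::ondemand::array jobs = doc["jobs"].get_array();
  std::uint64_t checksum = 0;
  for (auto job : jobs) {
    checksum += std::uint64_t(job["id"]);
  }

  const auto stop = std::chrono::steady_clock::now();

  std::cout << "checksum " << checksum << ", parse + access in "
            << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count()
            << " ms\n";
}
```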

@SebMilardo
Contributor Author

No problem! I think I can integrate simdjson in the parse function and compare the results. Now that I have a general understanding of where JSON data is used in the codebase, it's just a matter of learning how the new library works.
