Node: Processor multithreading (simple) #3166
Conversation
Force-pushed from 3bb80c8 to b1acab8.
Force-pushed from 85a818f to b734f68.
The following graphs were generated by listening to mainnet gossip and mainnet pythnet, using 24 cores. These metrics were generated using PR #3210. The charts on the left measure the delay from when p2p puts the observation on the channel until the processor pulls it off. The charts on the right measure the time from when p2p enqueues the observation until the processor finishes processing it. The X-axis is delay in microseconds, and the Y-axis is the percentage of the time. Remember that Prometheus histograms are cumulative. The three pairs of charts are:
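As an aside, here is a minimal sketch of how a dequeue-delay histogram like the ones charted above could be recorded. The metric name, bucket boundaries, and the `queuedObservation` type are hypothetical and are not the actual code from PR #3210:

```go
// Hypothetical sketch: record the delay between p2p enqueueing an observation
// and the processor pulling it off the channel, as a Prometheus histogram.
package processor

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// queuedObservation pairs an observation with the time p2p enqueued it.
type queuedObservation struct {
	enqueuedAt time.Time
	payload    []byte
}

var channelDelay = promauto.NewHistogram(prometheus.HistogramOpts{
	Name: "processor_observation_channel_delay_us",
	Help: "Microseconds between p2p enqueueing an observation and the processor dequeuing it.",
	// Bucket boundaries in microseconds; Prometheus histograms are cumulative.
	Buckets: []float64{50, 100, 250, 500, 750, 1000, 5000, 10000},
})

func drain(obsC <-chan queuedObservation) {
	for obs := range obsC {
		// Time spent sitting on the channel between p2p and the processor.
		channelDelay.Observe(float64(time.Since(obs.enqueuedAt).Microseconds()))
		handle(obs)
	}
}

func handle(obs queuedObservation) {
	// Stand-in for the real observation processing.
	_ = obs
}
```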
Force-pushed from 6cea7bb to 4131970.
Here are some charts comparing the original single-threaded processor with three versions of multithreading. The data was generated by listening to mainnet pythnet and mainnet gossip.
It is interesting that the three versions of multithreading look very similar. However, even the simple multithreading handled almost all observations in under 750 microseconds, whereas the existing single-threaded approach took between five and ten milliseconds to reach that point. So it seems like multithreading is helpful, but even the simple approach may be "good enough".
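For context, the "simple" approach amounts to a fixed pool of worker goroutines all reading from the same observation channel. A minimal sketch of that shape (the names here are illustrative, not the PR's actual code):

```go
// Hypothetical sketch of the "simple" multithreading shape: N workers share
// one observation channel and each processes whatever it pulls off next.
package processor

import (
	"context"
	"sync"
)

type observation struct {
	payload []byte
}

func runWorkers(ctx context.Context, obsC <-chan observation, numWorkers int) {
	var wg sync.WaitGroup
	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				select {
				case <-ctx.Done():
					return
				case obs := <-obsC:
					handleObservation(obs)
				}
			}
		}()
	}
	wg.Wait()
}

func handleObservation(obs observation) {
	// Stand-in for verifying the observation and aggregating signatures.
	_ = obs
}
```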
Some further data. I ran the benchmarks for the various configurations. For each config, I ran it three times with the total time for each run posted below.
So it seems like multithreading doesn't have much impact on the benchmarks.
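For reference, a comparison like this could be driven by a benchmark of roughly the following shape, run with `go test -bench=Processor -count=3`. The package, types, and worker counts here are illustrative and are not the repository's actual benchmark code:

```go
// Hypothetical benchmark shape for comparing worker-count configurations.
package processor

import (
	"context"
	"testing"
)

type observation struct{ id int }

// startWorkers launches a stand-in worker pool and returns its input channel.
func startWorkers(ctx context.Context, numWorkers int) chan<- observation {
	obsC := make(chan observation, 1024)
	for i := 0; i < numWorkers; i++ {
		go func() {
			for {
				select {
				case <-ctx.Done():
					return
				case obs := <-obsC:
					_ = obs // stand-in for real processing work
				}
			}
		}()
	}
	return obsC
}

func benchmarkProcessor(b *testing.B, numWorkers int) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	obsC := startWorkers(ctx, numWorkers)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		obsC <- observation{id: i}
	}
}

func BenchmarkProcessor1Worker(b *testing.B)   { benchmarkProcessor(b, 1) }
func BenchmarkProcessor24Workers(b *testing.B) { benchmarkProcessor(b, 24) }
```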
Force-pushed from 1406630 to f3a7b24.
Although this PR does not have as big an impact as we had hoped, it is an improvement, and it positions us to make additional improvements going forward. Therefore I am marking it ready for review.
Force-pushed from 116dca1 to a9b1e20.
Catching panics or not: we discussed this a bit in the code comments, but I'm starting this higher-level comment so the discussion doesn't get buried there. First of all, errors should be caught and handled appropriately; nobody would argue otherwise. When it comes to panics, I think we have to differentiate between components.
In the case of watchers, it was agreed (#2187) that panics should be caught generically and the watchers simply restarted. Because watchers are fairly self-contained, restarting them isn't very dangerous, and because they're the most panic-prone components, it improves reliability a lot. But when it comes to the processor, I don't think we should attempt to recover from unexpected panics, because the processor has more potential to end up in an inconsistent state. Here is what I think this could look like: #3267. Workers don't return until their context is canceled. Errors are simply logged. If an error occurs in a worker that cannot be handled, it should be a panic instead. What do you think?
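To make the proposal concrete, here is a rough sketch of the worker shape described above (illustrative only, not the code in #3267): the worker returns only on context cancellation, recoverable errors are logged, and anything unrecoverable is left to panic so the guardian dies loudly rather than limping along in an inconsistent state.

```go
// Hypothetical sketch of the proposed worker shape: return only on context
// cancellation, log recoverable errors, and never recover from panics.
package processor

import (
	"context"
	"errors"
	"log"
)

type observation struct {
	payload []byte
}

var errInvalidObservation = errors.New("invalid observation")

func worker(ctx context.Context, id int, obsC <-chan observation) {
	for {
		select {
		case <-ctx.Done():
			// Context cancellation is the only way a worker returns.
			return
		case obs := <-obsC:
			if err := process(obs); err != nil {
				// Recoverable problem: log it and move on to the next observation.
				log.Printf("worker %d: failed to process observation: %v", id, err)
			}
		}
	}
}

func process(obs observation) error {
	if len(obs.payload) == 0 {
		return errInvalidObservation
	}
	// If the processor's state were found to be inconsistent here, the proposal
	// is to panic (and not recover) rather than try to carry on.
	return nil
}
```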
Force-pushed from 981da16 to 9d3eb96.
It is desirable that the multithreaded processor handle errors and panics in the same way as the current processor, meaning that panics kill the guardian and errors restart the entire processor. It is also desirable to use the
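One way that desired behavior could look, sketched here with errgroup rather than the node's actual supervisor plumbing: a worker error tears down and restarts the whole processor, while a panic is deliberately not recovered and therefore kills the guardian. All names below are illustrative.

```go
// Hypothetical sketch: worker errors restart the entire processor, panics are
// never recovered and so take down the guardian process.
package processor

import (
	"context"
	"log"
	"time"

	"golang.org/x/sync/errgroup"
)

type observation struct {
	payload []byte
}

func runProcessor(ctx context.Context, obsC <-chan observation, numWorkers int) error {
	g, ctx := errgroup.WithContext(ctx)
	for i := 0; i < numWorkers; i++ {
		g.Go(func() error {
			for {
				select {
				case <-ctx.Done():
					return ctx.Err()
				case obs := <-obsC:
					if err := handle(obs); err != nil {
						// Returning the error cancels the group, stopping every
						// worker and surfacing the error to the caller.
						return err
					}
				}
			}
		})
	}
	return g.Wait()
}

// superviseProcessor restarts the processor whenever it exits with an error.
func superviseProcessor(ctx context.Context, obsC <-chan observation, numWorkers int) {
	for ctx.Err() == nil {
		if err := runProcessor(ctx, obsC, numWorkers); err != nil && ctx.Err() == nil {
			log.Printf("processor died, restarting: %v", err)
			time.Sleep(time.Second)
		}
	}
}

func handle(obs observation) error {
	// Stand-in for real observation processing.
	_ = obs
	return nil
}
```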
Force-pushed from 82ff5d3 to a06d2af.
Force-pushed from a06d2af to 5df2ef7.
Hey, what's the state of this PR? It has conflicts.
Force-pushed from c23c6c3 to 996b2a0.
Change-Id: I72e56c106c8a275c54af6cb073aa16a5c7d75fbe
Force-pushed from 996b2a0 to a281c2e.