Parallelize Execution #8134
Replies: 17 comments 41 replies
-
Great use of the new discussions feature and thanks for whipping this up to get the ball rolling! As we all know, ABCI and mempool changes to Tendermint and thus the Cosmos SDK are coming right around the corner. Specifically, we need to make sure that write operations to the underlying merkle-ized tree are not order-dependent (I don't think they are with IAVL?). Once we can safely make this assumption, we should ensure any changes we make to message handlers are done generically enough such that they still make sense and are easy to understand even if they're not executed in parallel.
-
I raised this with Jae back in 2017. The IAVL hash is dependent on insert order. There is no way to change that with a btree-like algorithm. The only efficient merklized data structure I know of is a trie, as in Ethereum. That hash is stable with respect to insert order. However, it doesn't support range queries unless you store the keys unhashed, allowing arbitrary depth. And the proofs are notably bigger. If this is important, I would look at the Eth 2 Sparse Merkle Trie to start.
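To make the order-dependence point concrete, here is a toy sketch (not IAVL's actual hashing; IAVL's order dependence comes from AVL rotations changing tree shape). The `naiveRoot`/`sortedRoot` functions are hypothetical: one commits to key/value pairs in insertion order, the other canonicalizes by sorting keys first, which is essentially the property a key-addressed trie gives you for free.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// naiveRoot hashes leaves in insertion order, so the commitment
// depends on the order in which pairs were inserted.
func naiveRoot(pairs [][2]string) [32]byte {
	h := sha256.New()
	for _, p := range pairs {
		h.Write([]byte(p[0]))
		h.Write([]byte(p[1]))
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// sortedRoot canonicalizes by key before hashing, so the commitment
// is insert-order independent (the trie-like property).
func sortedRoot(pairs [][2]string) [32]byte {
	sorted := append([][2]string(nil), pairs...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i][0] < sorted[j][0] })
	return naiveRoot(sorted)
}

func main() {
	a := [][2]string{{"k1", "v1"}, {"k2", "v2"}}
	b := [][2]string{{"k2", "v2"}, {"k1", "v1"}}
	fmt.Println("naive equal: ", naiveRoot(a) == naiveRoot(b))   // false
	fmt.Println("sorted equal:", sortedRoot(a) == sortedRoot(b)) // true
}
```

Order-independent commitments are what make parallel writes safe: any interleaving of disjoint writes yields the same root.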
-
I think pre-processing of the tx shell could be parallelized somehow, but note that the sig verifier does modify the state for that account, and you need to ensure that ante handling of txs is sequential for the same account (this could be done with a bit of careful logic). Following that is fee collection, which always increases the same account balance (the fee collector), and I see no way of parallelizing that without a major architecture change.
-
That said, I think supporting some level of optimistic parallel execution would be useful; it just must be done very carefully to work at all, and by people with deep DB knowledge to be bulletproof.
-
One way to approach Parallel Execution is to use something like Solana Sealevel. In a nutshell:
-
I'm not clear on the goal of this issue.
I personally view (1) as the best thing to focus on for now, and within it, parallelizing auth and bank. I don't think we yet know how best to abstract in the style of (3). (Or at least I definitely don't.) (Also, it's very cool that threads got added to GitHub issues.)
-
BTW: Cardano has great research on how to parallelize smart-contract execution. In a nutshell, they extend the UTXO model to provide more smart-contract capabilities.
-
The first step should be a profiling strategy and a set of core benchmarks.
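A minimal sketch of what one such core benchmark could look like, using Go's standard `testing` package. `deliverTx` here is a hypothetical stand-in for whatever hot path is being measured; `testing.Benchmark` lets you run a benchmark outside `go test`, which is handy for quick experiments.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"testing"
)

// deliverTx is a hypothetical stand-in for the hot path under measurement.
func deliverTx(payload []byte) [32]byte {
	return sha256.Sum256(payload)
}

func main() {
	// testing.Benchmark runs the function with increasing b.N until the
	// timing is statistically stable, just like `go test -bench`.
	res := testing.Benchmark(func(b *testing.B) {
		payload := make([]byte, 256)
		b.SetBytes(int64(len(payload)))
		for i := 0; i < b.N; i++ {
			deliverTx(payload)
		}
	})
	fmt.Println(res.String())
}
```

With a stable baseline like this in place, any parallelization work can be justified (or rejected) with numbers rather than intuition.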
-
I agree with some hard-coded fixes for auth/fees so the ante handlers can prep the next tx while the previous one is running. Ideally, any changes we can make that are opt-in and non-invasive are nice to add to the current SDK. For a design like Solana's, one should build a new SDK. It is a radically different design, and you cannot force everyone using the current SDK to change to such an architecture.
-
Separate logical stores is actually how the SDK is currently designed, and I'm actually not in favor of it (see #6370). I believe there should be a single logical store with proper namespaces per module/domain. If you have a single logical store that is insertion-order independent, then you still get the ability to execute writes in parallel.
-
Logical store = IAVL, data structures, etc.
Why? Can you elaborate on the last part? Please provide examples or references. In typical applications, you have a single logical store separated by some domain namespace. This also allows you to perform atomic operations. Not to say this can't be done with multiple logical stores, but it's more difficult.
-
A lot of really interesting ideas here. I just want to first get a sense of how important it is for us to start designing this now. Are we seeing current scaling issues? Or are we looking at this from a competitive lens, seeing performant solutions like Solana and wanting to stay in the race? If addressing this sooner rather than later is important to secure Cosmos's relevance in the space, I'm happy to start considering approaches. I wonder if someone like @zmanian or @jackzampolin might be able to comment on this a bit...

I tend to agree with @ValarDragon that the first thing we should focus on is likely parallelizing computationally intensive parts of the SDK rather than a full-on parallel execution model. On the other hand, getting a parallel execution model like Sealevel right is going to require lots of design and planning, and if we're going in that direction, we should consider it while we're thinking about other things like #7539 and #7459. So, please let me know your thoughts regarding priority and roadmap.
-
Regarding a parallel execution model, I want to share my current thoughts on how a module API for that might look. There are obviously lots of things we'd need to think of in terms of the storage layer and scheduler so that everything gets run deterministically, but I think it's useful to start by thinking about what would be needed at the module level.

**Synchronous R/W Scheduling**

Methods would communicate a list of store keys or key prefixes to which they would need synchronous R/W access before the method is invoked. This list of keys would be the input for a batching algorithm that parallelizes methods based on non-overlapping synchronous store access.

Let's say we add a method

So the

To make the dev UX a bit better, the actual

We could add support for parallelism incrementally, in that if a method doesn't conform to the parallel API, it would get run sequentially in a separate phase from parallelizable methods.

**Full Read Access for Last Block's State**

Parallelizable methods would have full read access to their own state and any other module's state as of the end of the last block. In some cases this is good enough, and we could define it as the expected behavior. For instance, I think it's fine to say for something like voting that the weight is based on the stake as of the end of the last block.

**Two-Phase Inter-Module Calls**

In this model, calls to other modules would need to be initialized in
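The batching algorithm based on non-overlapping key prefixes could be sketched roughly like this. `Method`, `overlaps`, and `batch` are hypothetical names for illustration, and the greedy grouping is just one possible (deterministic) strategy, not a proposal for the actual scheduler.

```go
package main

import (
	"fmt"
	"strings"
)

// Method is a hypothetical parallelizable method that declares up front
// which store key prefixes it needs synchronous R/W access to.
type Method struct {
	Name     string
	Prefixes []string
}

// overlaps reports whether any prefix in a covers or is covered by a prefix in b.
func overlaps(a, b []string) bool {
	for _, p := range a {
		for _, q := range b {
			if strings.HasPrefix(p, q) || strings.HasPrefix(q, p) {
				return true
			}
		}
	}
	return false
}

// batch greedily groups methods whose declared prefixes don't overlap.
// Each batch could then run in parallel; batches run sequentially.
func batch(methods []Method) [][]Method {
	var batches [][]Method
next:
	for _, m := range methods {
		for i, b := range batches {
			conflict := false
			for _, other := range b {
				if overlaps(m.Prefixes, other.Prefixes) {
					conflict = true
					break
				}
			}
			if !conflict {
				batches[i] = append(batches[i], m)
				continue next
			}
		}
		batches = append(batches, []Method{m})
	}
	return batches
}

func main() {
	ms := []Method{
		{"send-ab", []string{"bank/alice", "bank/bob"}},
		{"send-cd", []string{"bank/carol", "bank/dave"}},
		{"send-ad", []string{"bank/alice", "bank/dave"}},
	}
	for i, b := range batch(ms) {
		var names []string
		for _, m := range b {
			names = append(names, m.Name)
		}
		fmt.Println("batch", i, names)
	}
}
```

Determinism matters here: every validator must compute the same batches from the same tx list, so the grouping must depend only on declared prefixes and tx order, never on runtime timing.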
-
We can't parallelize transactions without any a priori knowledge. One of the basic problems is state versioning.
I think the only reasonable way to solve it is to only parallelize processing of messages which don't share any resource (state).
-
STM might be a helpful concept here. In DeliverTx, all the transactions are executed in parallel on cache-wrapped state, with the optimistic assumption that they will not cause a race condition. After all of the message execution is done, we commit the state changes sequentially, and if any execution turns out to have accessed state that was previously accessed, its state change does not get committed (i.e., it is rolled back). Those "failed" messages then get batched and run again (maybe sequentially).
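The commit phase of that STM-style scheme could be sketched as follows. `Result` and `commitOptimistic` are hypothetical: each tx is assumed to have already executed optimistically against a snapshot, producing a read set and a write set; commits then happen in tx order, rolling back any tx whose reads were invalidated by an earlier commit.

```go
package main

import "fmt"

// Result of optimistically executing one message against a cache-wrapped
// snapshot: the keys it read, and the writes it wants to apply.
type Result struct {
	Reads  map[string]bool
	Writes map[string]string
}

// commitOptimistic commits results in tx order. A result is rolled back if
// any key it read was written by an earlier-committed tx this round; the
// indices of rolled-back txs are returned for a sequential retry pass.
func commitOptimistic(state map[string]string, results []Result) (failed []int) {
	written := map[string]bool{}
	for i, r := range results {
		conflict := false
		for k := range r.Reads {
			if written[k] {
				conflict = true
				break
			}
		}
		if conflict {
			failed = append(failed, i)
			continue
		}
		for k, v := range r.Writes {
			state[k] = v
			written[k] = true
		}
	}
	return failed
}

func main() {
	state := map[string]string{"a": "1", "b": "1"}
	results := []Result{
		{Reads: map[string]bool{"a": true}, Writes: map[string]string{"a": "2"}},
		{Reads: map[string]bool{"a": true}, Writes: map[string]string{"b": "2"}}, // read "a", now stale
		{Reads: map[string]bool{"c": true}, Writes: map[string]string{"c": "1"}}, // independent
	}
	fmt.Println("retry:", commitOptimistic(state, results))
}
```

Committing in tx order keeps the conflict resolution deterministic across validators, even though the optimistic execution itself can happen in any interleaving.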
-
The more I think about this, the more I think that framework-level parallelism is something we should be designing towards even if we don't fully implement it for a while. Parallelism in begin and end blockers would be great of course, and can be done now in specific cases if needed, but I think there is only so much parallelism we can introduce within individual transactions. For instance, when sending coins, sure, someone could send multiple coins in a single transaction and we could parallelize that state access, but I imagine this will be very rare. Users generally send one coin at a time, not a bundle.

So I think the big gains are in executing multiple transactions in parallel, which requires some grouping algorithm based on a priori knowledge of state access (a synchronization or preparation phase). It would be good to understand where the parallel execution model I described will not be feasible, but in most of the cases I've examined it seems feasible and likely desirable. And we can still add a "synchronized" phase for txs that can't be parallelized yet, or even at all.

Adding this level of parallelism at the framework level would likely be both a big competitive edge for Cosmos chains and the Hub, and provide the community with greater assurance of future scalability that doesn't depend on sharding, as CPUs already have 64 cores on a chip and will continue to evolve. From my perspective, I would like to move our 1.0 roadmap and ADR 033 (#7459) to include this.
-
Also, it seems like with SMTs hopefully landing as a feature soon, some of the hard groundwork is already laid. This would be amazing!
-
Starting a discussion on parallelization of execution.
The Cosmos SDK is synchronous in most if not all cases. While this simplifies implementation details, there are areas that could be parallelized to increase throughput.
The first idea that comes to mind is to split bank state by denomination, then by account address. This would allow one account sending multiple transfers of different denominations to update state at the same time.
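A sketch of what that key layout could look like. The `balances/{denom}/{addr}` scheme and `balanceKey` helper are illustrative, not the SDK's actual layout; the point is that transfers in different denoms from the same account touch disjoint key ranges and so can be applied concurrently.

```go
package main

import (
	"fmt"
	"sync"
)

// balanceKey namespaces balances by denom first, then address, so writes to
// different denoms for the same account land in disjoint key ranges.
// This layout is hypothetical, for illustration only.
func balanceKey(denom, addr string) string {
	return "balances/" + denom + "/" + addr
}

func main() {
	store := map[string]int{}
	var mu sync.Mutex // protects the Go map itself; the keys never collide
	var wg sync.WaitGroup
	// Two transfers from the same account in different denoms can update
	// state concurrently because their keys are guaranteed disjoint.
	for _, denom := range []string{"uatom", "uosmo"} {
		wg.Add(1)
		go func(d string) {
			defer wg.Done()
			k := balanceKey(d, "alice")
			mu.Lock()
			store[k] += 100
			mu.Unlock()
		}(denom)
	}
	wg.Wait()
	fmt.Println(store[balanceKey("uatom", "alice")], store[balanceKey("uosmo", "alice")])
}
```

With a real store, the mutex would be replaced by per-key-range locks or independent sub-stores; the denom-first layout is what makes the disjointness statically knowable.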
Much of the performance improvement to be had is currently not here. Tendermint could introduce `DeliverBatchTX` or `DeliverBlock` instead of delivering each transaction individually. Also, we can't forget about IAVL and the performance increases that could be made there.