How to sign the stateroot? #1526
Comments
For me, I think option A is the best way at present, even though I don't like it. Consider that if the stateroot should be provided to the outside, e.g. for SPV or cross-chain, it should be verifiable, requiring the signatures of all the consensus nodes. |
They can send the current state root in the prepare request, and it will be signed automatically. But then there is no difference from including it in the header. |
They can sign header and stateroot separately. |
That's exactly why it should be in the header, because that's what CNs naturally agree upon and sign. Also, speaking of outside users, any P2P-distributed solution would require those users to implement P2P protocol support, which may not be something they want to do. And to me that's also very important from the other perspective: ensuring users' trust in the network. This whole state is something that matters to users, and making the header contain the state hash gives a clear guarantee that whatever this state is, it won't suddenly change. If there is a need to change the state because of some bug or whatever, then it should be done in a clear and explicit manner in subsequent blocks (as described in #1287), but not by rewriting the history. So, Option D, definitely. |
My favorite version is to send the StateRoot of the previous block in the proposal, and include it in the header. |
Store it in a block footer?

    header: version, hash, ...
    body:   txs
    footer: stateRoot

All consensus nodes sign the whole block, and set |
We should consider VM upgrade compatibility first, before deciding where to put the root.
I've been thinking about it for a while, and although it has the obvious downside of the state lagging one block behind, it still gives us clear, predictable behaviour: for a transaction in block N you get its state at block N + 1. And it fits our current transaction handling model nicely; it's very easy to implement. So it probably really is the best option we have at the moment.
Wouldn't that complicate life for light nodes/wallets? I think it would be nice for them to be able to operate just using headers. Also,
This versioning is essential IMO, but not just for the VM; we should also consider the whole contract execution environment, like native contracts or syscalls (which might also be changed in some incompatible way for whatever reason). |
Description
For the above problem, there are 4 ways to deal with it:

Solution 1: Ignore it. For the future, we will correct it.
Affects:
Applicable scope:

Solution 2:
Affects:
Applicable scope:

Solution 3: Roll back the block and re-execute the old block with the new version of the VM.
Affects:
Applicable scope:

Solution 4:
Affects:
Applicable scope:
Summary
I think that putting the state root in the block header will be better than a P2P message, which will benefit upgrades. What do you think? |
I vote for Solution 3 |
We discussed this before, @igormcoelho emphasized that failed should not be converted. Perhaps we need a Solution 4, which is 3 with restrictions. |
Oh, it'd be so easy to discuss this in one room with a whiteboard, but anyway. My take on solutions 3 and 4 is that they're dangerous, as they would change the state of old blocks, which I think is not acceptable for several reasons:
Solution 2 basically has the same issues, but it also directly contradicts one of the core Neo features, which is one-block finality. And I don't think that we're only left with solution №1; we should take it and amend it with one important addition from #1287, which is corrective transactions, to get a solution number 5.

Solution 5

It basically follows solution number one in that we're releasing VM version 2 and all blocks starting with some number N get processed using this new VM. But at the same time, if we know that the bug fixed in VM version 2 has caused some wrong states, we explicitly correct them with a corrective transaction in one of the subsequent blocks. So if accounts A1 and A2 have wrong balances, we adjust them with this transaction. Obviously it's a very powerful (and dangerous) mechanism, so this transaction would be signed by CNs or even the whole governance committee (now that we have it). And it would only be applied when there is a need to; if some wrong cryptocuties were generated because of this bug, we may as well decide that they're way too cute to change (and thus they just won't be corrected). But if there is a change in the state, the user would know why it has happened, and at the same time it won't break any references to the old state. Affects
Applicable scope: just about anything. |
Nice discussion @Tommo-L, as always. I know many options are already on the table, but like @roman-khimov, I still feel like some extra important information/options are missing for decision-making. Let me try to contribute. There are two things still left behind in the reasoning: the "in-time" consensus perspective and the nature of contract programming itself.

Issues with Contract Programming

What happens is that, if users "abuse" this and, say, assert() every operation in their entry script for transactions, paying this tiny extra GAS cost, they could ensure that "known bad operations" will be kept forever (for example, a user knows that the DIV operation does not fail on zero, so they quickly submit a tx with an assert that x/0 equals 5, and it passes, locking our ability to fix that forever). From my perspective, which I don't think was discussed here before, here is what I think we should do:

Solution 6: We should allow users to mark deployed contracts as BUGFIX or NOBUGFIX.
Now we come to the point: which contracts should be BUGFIX and which shouldn't? Now, we let users choose, and we let the Neo Foundation also choose.

Ecosystem adjustments and Two NEOs
This also means that we can create an alternative NEP-5-NEO-TOKEN-NOBUGFIX, which is collateralized by native BUGFIX Neo.

Hybrid BUGFIX/NOBUGFIX

My opinion is that minting when raising funds in a NOBUGFIX contract could be done this way, accepting "volatile" BUGFIX Neo, even if the contract token itself is NOBUGFIX. The only risk is such a state-changing unexpected event happening on the NEO token during crowdfunding, so losses may affect only the fundraising itself, at that particular time, but without any risk of "contaminating" third parties (or a DEX), since states wouldn't change for that specific token.

My vote

For the original question, we would need A (distribute global state with consensus P2P) and D (store in the block header the last state of NOBUGFIX contracts). Affects
|
I agree with these points, Igor. I vote for Solution 6. |
@erikzhang Can you please have a look at these solutions, which one do you prefer? |
Option B is good to me. |
This one seems to be cross-chain compatible, but it still has the problem of the state changing suddenly. It's like you have $1000 in your account at block N and now you have $100 at block N+1. You may wonder why that happened, but the only answer you get is "we've had a software upgrade". I think that for every state change there has to be a clear, traceable answer to that question of why.
How are we going to solve cross-chain issues and the problem of synchronization between two chains in the same network (like #1702 (comment))? And what's the advantage of it? If we're to return to the two main points reminded to us by @Tommo-L:
CNs have to have an up-to-date state to participate in consensus; they can't do anything useful until they have this state (as can be seen even in the current neox-2.x implementation), so I don't see how a detached state makes CNs more performant. In fact, it may even slow things down because of networking issues (or even completely break consensus for the same reason).
And this one now has like 7 different solutions right in this thread. I want to make sure we're doing the best we can for Neo 3. |
@KickSeason if I understood correctly, the issue I see with Solution 7 is precisely that we may not want to keep some states from vm-1.x, for example, if they resulted in asset losses that were fixable in vm-2.x. We already have 7 solutions, maybe we can agree on the basics first:
@Tommo-L @KickSeason @roman-khimov @shargon and especially @erikzhang, is there any chance you agree with some idea of having bugfixing limited to some contracts (like native Neo) and some states stored in the header for other contracts (like some token that wants immutability)? (presented as Solution 6). We can find some hybrid final solution, as long as we agree on the fundamentals that we want the network to provide. |
@KickSeason,
That is a good point! Nice insight. |
Because there is some intention that we have when writing software, and the source code is just an attempt at formalizing this intention. This can be a nice attempt, but it still (quite often) happens that the formalization is not exactly what we intended. A bit shorter version: there are bugs.
Because "we" are the ones exploiting this bug? Sorry, but that's probably the only possibility I can think of. In general, I think that once a bug is known, most people would want to fix it, because their intentions didn't include this bug. And to be fair, that's actually why I think marking contracts as bugfix-impossible (part of solution number 6) won't be used a lot, and thus one generic bugfixing path is enough. But at the same time, IMO it's not correct to make a direct relationship between allowing/disallowing bug fixes and allowing/disallowing state root changes. It is possible to be able to fix bugs and not change old state roots at the same time, and to make any state changes resulting from bug fixes traceable.
We should, and therefore I should also note that I'm basing this on the following set of expected characteristics:
Basically, it's all about making the behavior of the system predictable. |
@roman-khimov When node-2.x persists the first block after the upgrade, it uses the new execution logic, but the storages are from node-1.x. Why is there a sudden change? If there is a change, it must happen in some tx in block N + 1. |
Ah, maybe I've misunderstood this one a little. So basically it's the same as solution number 1, it's just that the new node doesn't contain the logic for VM 1.x and the only way to get the state for old blocks is via P2P from old nodes? |
Yea! |
OK, thanks for clarifying that. For some reason my first impression was that new (VM 2.x) nodes bring the new state as if they were running from the genesis block. But then solution number 7 (S7) has the same basic characteristics (limited scope) as solution number 1 (S1) and is mostly concerned with questions of compatibility and maintenance: where S1 is about keeping the VM 1.x code in the node, S7 makes the node cleaner by removing it and relying on the network to get proper states for the VM 1.x epoch.

It's a nice hack, but at the same time I think that shifting this maintenance burden from the node code to node instances is a bit problematic, as nothing guarantees long-term node-1.x existence in the network. And we could have like 10 VM versions, so there would have to exist nodes for each VM version, and someone would have to maintain them. And what if some non-VM bug needed to be fixed? We'd have to update all 10 versions of the node, and it would be hard to ensure we're not breaking anything.

I think in practice this would outweigh the effort required to maintain compatibility in one code base, and a single code base is more reliable; it's trivial to test the code against known behavior (the state root should match). This mechanism of P2P state sharing may be useful for some other purposes, though. |
I think we're one step back now as both points had been discussed previously. |
@Tommo-L It would be nice to have some observing list of "strange states" out there, including poor implementations and bad versioning. Especially for tokens, some random Solid-state calls stored on chain may prevent these nodes from syncing. |
Some other options we can consider in the future:
|
These are different approaches to how to calculate state, and I'm open to any better suggestions wrt this topic. But this is a separate topic from where to store state data and it's the most important one. I think the advantages of header-included state root are well covered here, so let's concentrate on possible negative effects. Performance and size concerns are now ruled out by #1526 (comment) (and #1526 (comment)), so we have just two problems left:
Repairing the state
It essentially is a fork, albeit in a state chain, and forks are better avoided (at least in Neo's case). It is also changing the rules of the game while playing, which is not good practice. Instead, if we're in a situation where we need to repair the state, we need to actually repair it. Not replace it, not rerun old transactions under new rules, not fork the state (or block) chain, but repair the state we've got. This can be done following the scheme from #1526 (comment), with explicit committee-signed corrective transactions that will fix the state. It of course also needs execution environment versioning, but that's not a big problem either.

Potential consensus failures

First of all, can state difference cause consensus to fail now? Yes, it can. There is a potential for this even now, without any state roots. If CNs disagree on GAS contract state, they can fail primary node proposal validation, and depending on mempools this situation can repeat again and again after view changes. But good luck debugging that without clear state data.

Second, how likely in practice are we to see a complete consensus failure (network stop)? Not very likely. For it to happen we need at least

Third, even if we imagine a CN state split, the question is: is it better to continue running the network with unknown state, or to fix the problem and continue running a properly synchronized network? This is exactly the case where stopping the network is appropriate, because doing so can prevent bigger problems (like the ones requiring state repair). Remember #1989: it's not a critical bug (luckily), but it is a perfect demonstration of a problem that can be detected and prevented by (different implementations of) CNs exchanging their state during the consensus process. Continuing to run the network unsynchronized can potentially lead even to accepting wrong blocks. And that is a catastrophic scenario for the network, especially one declaring one-block finality.
This must be prevented even if it costs a network outage. So I'm absolutely sure that exchanging state data during the consensus process (and including it in the header) actually improves network reliability. We're likely to see some CNs go wild occasionally, but this will be a signal: either the CN needs to be fixed, or there is an implementation problem, or there is some protocol problem that resulted in this. Without this signal, we're just blind to these problems. We're not likely to see

On the importance of this issue for Neo 3 |
Regarding this and #2905/#2974. The way to do this is via block version one with an additional field that works the same way NeoGo |
We will need to resync in 3.8 or we will delay 3.7 forever... |
A non-resynchronizing update can be performed with a trick:
|
Prototyped in nspcc-dev/neo-go#3566, works as expected. Block version switches at the given hardfork and then version 1 blocks have one field more. The problem for us though is that we can't have #3175/#3178 in the same hardfork with version 1 blocks without breaking NeoFS network compatibility for C# nodes (the trick from #3175 (comment) won't work with stateroot-enabled blocks, it must happen somewhere before). So we're blocked by those PRs currently. |
Q1: Should stateroot be part of core or plugin?
As we discussed at last night's meeting, most people agreed that it should be part of the core to ensure data consistency across all CN nodes.
Q2: Which method should we use to sign state root?
This is still under discussion, there are some options:

- MinerTransaction in neo2.x

What do you think?