EIP-37: Tweaking Difficulty Adjustment Algorithm #79

Open: kushti wants to merge 8 commits into master

Conversation

@kushti (Member) commented Sep 23, 2022

This PR proposes an alternative difficulty adjustment algorithm and an activation method for it.

@Vertergo commented Sep 23, 2022

Hi Kushti,

I'm worried the third proposition (limiting change to 50%) may exacerbate the situation under certain circumstances. Specifically, under the same conditions that occurred just after the ETH merge (all else equal), the same problem would have persisted for an even longer period.

I would like to suggest a failsafe mechanism or trigger in order to keep variations closer to the mean (120 seconds). For instance (a rough code sketch follows the list):

If last epoch average block time is less than or equal to 90 seconds, then readjust difficulty based on last epoch only;
If last epoch average block time is greater than or equal to 150 seconds, then readjust difficulty based on last epoch only;
If last 8 epoch average block time is less than or equal to 90 seconds, then readjust difficulty based on last epoch only;
If last 8 epoch average block time is greater than or equal to 150 seconds, then readjust difficulty based on last epoch only;
Return to the standard formula if average block time for the last epoch is within 90-150 seconds AND average block time for last 8 epochs is within 90-150 seconds;
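
A rough sketch of this trigger logic (illustrative only; epochAvg holds the average block time of each of the last 8 epochs, oldest first):

// Rough sketch of the proposed failsafe trigger (illustrative only).
#include <numeric>
#include <vector>

// epochAvg: average block time (seconds) of each of the last 8 epochs, oldest first.
// Returns true if difficulty should be readjusted from the last epoch only,
// false if the standard multi-epoch formula should be used.
bool useLastEpochOnly(const std::vector<double>& epochAvg) {
    const double lastEpoch = epochAvg.back();
    const double eightEpoch =
        std::accumulate(epochAvg.begin(), epochAvg.end(), 0.0) / epochAvg.size();
    const bool lastOutOfBand  = lastEpoch <= 90.0 || lastEpoch >= 150.0;
    const bool eightOutOfBand = eightEpoch <= 90.0 || eightEpoch >= 150.0;
    return lastOutOfBand || eightOutOfBand;  // otherwise fall back to the standard formula
}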

Example 1 (8 epochs of varying average block times): 110, 120, 130, 140, 140, 145, 130, 180
8th epoch is greater than 150 seconds, therefore use it solely to adjust next epoch's difficulty
120, 130, 140, 140, 145, 130, 180, (120) == total average over 8 epochs = 138.125 seconds
Next epoch difficulty based on 138.125 seconds (return to standard difficulty-calculating formula)

Example 2 (8 epochs of varying average block times): 110, 120, 130, 140, 140, 145, 130, 500
8th epoch is greater than 150 seconds, therefore use it solely to adjust next epoch's difficulty
120, 130, 140, 140, 145, 130, 500, (120) == total average over 8 epochs = 178.125 seconds
Adjustment brought last epoch block times back to near 120 seconds, good!
8 epoch average is greater than 150 seconds, therefore use last epoch solely to adjust next epoch's difficulty
Next epoch difficulty based on (120) seconds average block time

Example 3 (8 epochs of varying average block times): 110, 120, 130, 140, 140, 145, 130, 500
8th epoch is greater than 150 seconds, therefore use it solely to adjust next epoch's difficulty.
120, 130, 140, 140, 145, 130, 500, (500) == total average over 8 epochs = 225.625 seconds
Adjustment did not bring last epoch block times back to near 120 seconds, bad!
Next epoch difficulty based on (500) second average block time
130, 140, 140, 145, 130, 500, 500, (120) == total average over 8 epochs = 225.625 seconds
Adjustment brought last epoch block times back to near 120 seconds, good!
Next epoch difficulty based on (120) second average block time
140, 140, 145, 130, 500, 500, 120, (120) == 224.375 seconds
...
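
For reference, the 8-epoch window averages used above can be checked with a few lines of code (illustrative only):

// Checks the 8-epoch window averages used in Examples 1 and 2 (illustrative).
#include <cstdio>

int main() {
    const double example1[8] = {120, 130, 140, 140, 145, 130, 180, 120};
    const double example2[8] = {120, 130, 140, 140, 145, 130, 500, 120};
    double s1 = 0, s2 = 0;
    for (int i = 0; i < 8; ++i) { s1 += example1[i]; s2 += example2[i]; }
    std::printf("Example 1: %.3f s, Example 2: %.3f s\n", s1 / 8, s2 / 8); // 138.125, 178.125
}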

Respectfully,

Vertergo

@renatomello commented Sep 24, 2022

(quoting @Vertergo's proposal and examples above)

@Vertergo

I agree with this proposition in principle. However, there's something wrong with your Example 3.

@TypoDaPsycho

(quoting @Vertergo: "I'm worried the third proposition (limiting change to 50%) may exacerbate the situation under certain circumstances. [...]")

Can you share any additional details that support the statement above? For example, calculations you have that show a negative impact on the current situation.

And as someone else already mentioned, Example 3 appears to have flaws when compared to the previous two examples.

@kushti (Member, Author) commented Sep 26, 2022

(quoting @Vertergo: "I'm worried the third proposition (limiting change to 50%) may exacerbate the situation under certain circumstances. [...]")

With a significantly shorter epoch length, the difficulty should catch up quickly anyway.

@ovsiannikov

If it's a hard fork anyway, let's use Dark Gravity Wave 3. It is widely used and well tested.

@girtsjj left a comment

Do it. Tests look ok.

(Review thread on eip-0037.md; outdated, resolved)

@zawy12 commented Oct 5, 2022

  1. Do not use least squares.
  2. Do not limit how much the difficulty changes.
  3. Do not use any epoch like BTC uses.
  4. Enforce monotonic timestamps.
  5. Dark gravity wave is just a really complicated simple moving average that shouldn't be used.

Violating 2 and 4 allows a >50% hashrate attacker to get an unlimited number of blocks in only a few epochs, even in the current implementations of BTC, LTC, etc. No small coin has survived violating 3 (Ergo is the first I've seen that seems to have survived using some form of it).

I tested least squares several different ways with no improvement over a simple moving average (someone can probably mathematically prove that should be expected). If you try to predict the hashrate ahead of time (the future) by any method, you're making an assumption an attacker can exploit. For example, a big miner can come in on an epoch (which I assume means it is not changing every block and is thereby "like BTC"), get all that epoch's blocks, then leave. Now it's a bad situation because all your preferred dedicated miners will have a high difficulty and are strongly motivated to leave, causing the next epoch to have an even easier difficulty than the initial one, causing the big miner to want to come back, so a terrible amplifying oscillation develops.

Repeat this logic with least squares trying to get a slope from 2 epochs and you can see it makes things worse. Going to 3 or more epochs for a slope doesn't fundamentally change the situation; it only makes the averaging period longer. If a big miner stays for a longer averaging period, the recovery with slow block times will be even longer. The best you can do is make an honest estimate of what the hashrate is. Using least squares seems to be just approximating a simple moving average (SMA) algorithm if you make the epochs shorter and shorter to solve the problem.

SMA can also have an oscillation problem if you have 10x or 20x your normal hashrate jumping on and off the chain. This is why BCH had to upgrade to ASERT. The oscillations in an SMA can be stopped by giving more weight to more recent blocks, which is what all good algorithms do. LWMA is SMA with linear weightings and approximates ASERT, EMA, and WTEMA.
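
For reference, a simplified sketch of the linear-weighting idea behind LWMA (a paraphrase for illustration only, not the reference implementation, which adds clamps and other guards):

// Simplified sketch of the linear-weighting idea behind LWMA (illustrative).
#include <cstdint>
#include <vector>

// targets, solvetimes: the last N blocks, oldest first; T: target block time (seconds).
double lwmaNextTarget(const std::vector<double>& targets,
                      const std::vector<int64_t>& solvetimes,
                      double T) {
    const std::size_t N = solvetimes.size();
    double weightedSolvetimes = 0.0, avgTarget = 0.0;
    for (std::size_t i = 0; i < N; ++i) {
        weightedSolvetimes += (i + 1) * double(solvetimes[i]); // newest block gets weight N
        avgTarget += targets[i] / N;
    }
    const double k = N * (N + 1) / 2.0;               // sum of the weights 1..N
    return avgTarget * weightedSolvetimes / (k * T);  // slow blocks -> larger (easier) target
}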

If hashrate is changing by >10x every day, you can solve this coin-hopping and stuck-chain problem with "real time targeting" (RTT) and tight timestamp limits. Tight timestamp limits means honest miners ignore a block for, say, 1 block time after its timestamp becomes valid if its timestamp is not within, say, +/- 10 seconds of local time when the honest miner receives it. Many people don't like RTT because it allows a miner to set his own difficulty by changing his timestamp, but tighter timestamps should fix that. Tighter timestamps in turn allow a subsequent attack where the attacker chooses a timestamp on the cusp of the ignore decision, but I have a solution that involves subsequent miners' hashes helping determine whether a borderline timestamp should get a reward, reminiscent of ColorDAG's solution to selfish mining (which this also solves, but without a DAG).

For RTT stuff I have a long TSA article, but just see Tom Harding's RTT paper and his chain2 public implementation of it. He focused on getting precise block times, which I disagreed with: that's supposed to cause more orphans; he says it doesn't, and I say if it doesn't, then make block times faster instead of using RTT. So he used k=5 (or maybe 6?) in chain2, whereas for the purpose of dealing with large hashrate jumping on and off your chain I would use k=2 or maybe 3. Also, higher k means the +/- 10 second timestamp limits need to be tighter, so again I think smaller is better. k=1 is just a normal PoW, as his paper shows.

If BCH's ASERT or LWMA (mine, which about 60 coins have used with good success, but not as good as the following) are too complicated, you might get by with BCH's old SMA (simple moving average, which I think violated 2 and 4 above).

A simple alternative is the following, which is almost exactly like ASERT. We historically called it WTEMA (original by Jacob Eliosoff with further work from Tom Harding and myself, then independently discovered by ETH devs in two inferior forms that they used for their PoW). I later found out from Jonathan Toomim that it's a "1st order IIR filter", which is an approximation of an exponential moving average. ASERT's discoverer (technically he was the 2nd of 3 parties to think of it) and I prefer the following over ASERT, but Jonathan decided on ASERT for BCH because of 1% to 3% better precision with large N, and because it can handle out-of-sequence timestamps without the extra code I have below.

"WTEMA"
aka "Improved version of ETH's"
aka "1st order IIR filter"
aka "EMA aka ASERT aka NEFDA approximation"

Drum roll please:
target = T + T*t/B/N - T/N

B = target block time like 120 seconds
t = solvetime of prior block
N = filter = 100 to 1000. This is called the "mean lifetime" in terms of blocks of the exponential moving average.
T = prior_target; it appears in all 3 terms above so that the integer divisions have hardly any error or overflow.
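
A minimal sketch of this update in C-like code (my own illustration; a production node would use the chain's big-integer target type so that T*t cannot overflow):

// Minimal sketch of the WTEMA update above (illustrative only; real targets
// are big integers, so use the chain's own target type in production).
#include <cstdint>

uint64_t wtemaNextTarget(uint64_t T,  // prior target
                         uint64_t t,  // solvetime of the prior block, seconds
                         uint64_t B,  // target block time, e.g. 120
                         uint64_t N)  // filter, e.g. 100 to 1000
{
    // Same term ordering as above so integer-division error stays small:
    // target = T + T*t/B/N - T/N
    return T + T * t / B / N - T / N;
}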

It's easy to see how it works: if t/B = 1 there is no change, which is when the prior solvetime equaled the block time. If t > B, you can see the target gets easier (larger). With a little math, and assuming N > 10, you can see it approximates the following, which is called the "relative" form of ASERT:

target = T * ( e^(t/B-1) )^(1/N)
This is interesting because t/B - 1 is an error signal, e^x makes it an EMA as is standard in stuff like stocks, and the root 1/N means we don't apply the full adjustment in a single block; notice that since each adjustment is a multiplication, the root eventually cancels itself out. EMA's e^x is not magic but a math trick that prevents a slight error in WTEMA. That is, a pair of equal and opposite solvetimes, which could have been noise, accumulate in WTEMA like (1 - 1%) * (1 + 1%) = 0.9999, but in ASERT are more accurately handled as e^(1%) * e^(-1%) = 1.000.
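
In floating point, the relative form above is just one line (a sketch; for N well above 10 it tracks the integer WTEMA update closely):

// Floating-point sketch of the "relative ASERT" formula above (illustrative).
#include <cmath>

double relativeAsertNextTarget(double T, double t, double B, double N) {
    // target = T * ( e^(t/B - 1) )^(1/N)
    return T * std::exp((t / B - 1.0) / N);
}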

BCH's ASERT is called "absolute ASERT", which is mathematically the same as the above but functionally different, with good benefits such as cancelling an error seen in the nBits approximation. It's an interesting topic in itself because it doesn't look at the prior block's target or solvetime, but at the genesis target and timestamp and the prior height and timestamp, and yet is mathematically the same. See Mark Lundeberg's paper somewhere. Imperial College researchers also discovered it and wrote a paper with similar conclusions about the absolute form.

If "t" goes negative (non-monotonic timestamps) there are potential problems other than the unlimited blocks in finite time such as overflow or underflow. Coins don't generally want to change this consensus rule that was one of Satoshi's biggest mistakes, but you can enforce it in the difficulty algorithm by going back to the 12th block in the past (assuming you use BTC's MTP=11 rule to typically get the 6th block in past).

As far as I can tell, if you change the MTP=11 to MTP = 1 in BTC, it enforces monotonic timestamps, but I have no idea what dependencies that might break.

Here is the code to enforce monotonicity inside the difficulty algorithm without changing the MTP = 11 setting.

// H is the current block height.
// Assumes MTP=11 is used, which guarantees in BTC that the 11th timestamp in the
// past is not before the 12th. This is more easily recognized as the "MTP", aka the
// 6th block in the past, if the timestamps were in sequence with height.
// Idea originally from Kyuupichan as implemented in Harding's WT-144.
uint64_t previous_timestamp = timestamps[H - 12];
uint64_t this_timestamp = previous_timestamp;
for (uint64_t i = H - 11; i <= H; i++) {
    if (timestamps[i] > previous_timestamp) {
        this_timestamp = timestamps[i];
    } else {
        // Out-of-order timestamp: clamp to 1 second after the previous one.
        this_timestamp = previous_timestamp + 1;
    }
    solvetimes[i] = this_timestamp - previous_timestamp;
    previous_timestamp = this_timestamp;
}
t = solvetimes[H];

Also, you need to change the future time limit (FTL) in BTC, which is 7200 seconds, to something less than N/20 (a small fraction of the difficulty averaging window), and it should be coded that way so there isn't a surprise for future devs like the first Verge attack. Peer time should also be removed, or at least kept to < 1/2 the FTL. In BTC I believe it's 70 minutes, which is about 1/2 the 7200 seconds. The 70-minute peer time rule is that if local time differs from the median of peer time by more than 70 minutes, it reverts to local time and throws a warning to let the node operator know.

This is another Satoshi mistake in not knowing basic consensus requirements. There should not be any peer time, just local time. PoW is only as secure as the clock, which means you shouldn't use any consensus to get the time. Every node needs to decide on its own, without NTP, GPS, peer time, or its cell tower, what current UTC is. They could use all of these, one of these, or none, but they shouldn't tell anyone or agree with anyone on what they use, or it's subject to a sybil or eclipse attack to violate the PoW.

//  FTL in BTC clones is MAX_FUTURE_BLOCK_TIME in chain.h.
//  FTL in Ignition, Numus, and others can be found in main.h as DRIFT.
//  FTL in Zcash & Dash clones need to change the 2*60*60 here:
//  if (block.GetBlockTime() > nAdjustedTime + 2 * 60 * 60)
//  which is around line 3700 in main.cpp in ZEC and validation.cpp in Dash

FTL:
MAX_FUTURE_BLOCK_TIME is set in chain.h and validated here:
https://github.com/bitcoin/bitcoin/blob/6ab84709fc1dca62a7db4693e2ff72f40a7eb650/src/validation.cpp#L3488

Peer time:
DEFAULT_MAX_TIME_ADJUSTMENT is set in timedata.h and validated here:
https://github.com/bitcoin/bitcoin/blob/3c5fb9691b7b49042160cb161daa07ab2827c064/src/timedata.cpp#L80

@zawy12 commented Oct 5, 2022

My comment above is probably overwhelming (and I keep updating it), but there are security issues and I don't know what devs will want to do.

In short, Ergo's current plan may be sufficient despite my complaints, if monotonicity is enforced for security. The "double weight" on the most recent epoch can stop most oscillations, but the epoch method isn't optimal, and delays in adjusting difficulty cause oscillations. I see 8 epochs of 128 blocks, which is a history of 1024 blocks. Instead of N=1024, try WTEMA with N=512 and I think you will get the same level of stability. This results in 2.7% StdDev changes in difficulty under constant hashrate for WTEMA. You can check your data to see if that's what you're getting with 1024 blocks. StdDev changes linearly with N in all algorithms, so once you know your StdDev under constant hashrate, you can adjust WTEMA's N to get the same stability. Then try different hashrate changes and you will see WTEMA responds faster, with better confirmation times. This is how I determine the best difficulty algorithms. They always have a "filtering" or "stability" constant "N", which is a reference to the number of blocks they're "averaging" over. In your method N is explicitly a number of blocks, but it's an indirect reference in WTEMA and ASERT, which I described in the previous comment as a root because it's a sequence of per-block multiplications.
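
One way to check that kind of number against your own data is an idealized simulation (a sketch only: constant hashrate, exponentially distributed solvetimes, and the WTEMA update from my previous comment):

// Idealized constant-hashrate simulation: measures the relative StdDev of the
// target for WTEMA with a given N (illustrative sketch only).
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    const double B = 120.0, N = 512.0;
    std::mt19937_64 rng(1);
    std::exponential_distribution<double> unitExp(1.0);
    double target = 1.0;
    std::vector<double> samples;
    for (int i = 0; i < 200000; ++i) {
        // Constant hashrate: expected solvetime shrinks as the target gets easier (larger).
        const double t = unitExp(rng) * B / target;
        target = target + target * t / B / N - target / N;  // WTEMA update
        if (i > 20000) samples.push_back(target);            // drop warm-up
    }
    double mean = 0.0, var = 0.0;
    for (double x : samples) mean += x / samples.size();
    for (double x : samples) var += (x - mean) * (x - mean) / samples.size();
    std::printf("relative StdDev of target: %.2f%%\n", 100.0 * std::sqrt(var) / mean);
}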

It takes WTEMA N/2 blocks to move difficulty halfway ("50%") to the needed difficulty in response to any change in hashrate, and 2x N to fully get to the correct difficulty. Your epoch method with N=1024 will fully respond in 1024 blocks, which is the same as 2x WTEMA's N=512 that I'm predicting is the equivalent. Your method is predictive and gives double weight to the most recent epoch, so you may get close to WTEMA's 50% response in 1/4 of your N=1024, which is the same as 1/2 of WTEMA's N=512. Your last epoch before adjusting causes an N/8 delay in responding (if I understand "epoch" correctly), and that also invites oscillations, in addition to the 1-epoch delay in changing difficulty.

If your current method were perfected to change every block, being predictive and giving more weight to recent blocks, I think it would be mathematically very close to LWMA, which is very close to EMA and ASERT in overall capability. EMA and other rolling changes are used in stock trends (but not least squares) apparently because least squares is meant for static data instead of rolling data.

I strongly suspect WTEMA N=512 has the same stability as yours with N=1024 blocks, because LWMA is so similar to yours, and LWMA with 1024 is equal in stability to WTEMA with N=512 and has a very similar speed of response. Most of the coins using LWMA have N=60 with block times of 2 minutes, and this was due to my recommendation (N=60). They respond really fast but have 2.7% * 1024 / 60 ≈ 46% StdDev in difficulty changes. In hindsight, N=200 would have been a lot better, and your epochs with N=1024 might be better thanks to the stability, despite not changing every block or being as precise as LWMA. But LWMA at N=1024 is a lot of computation that epochs don't require.

N=512 with WTEMA is a little larger than BCH's ASERT "288" which is really N = 288/ln(2) = 415, so you're already really close to BCH ASERT but you have 2 minute block times instead of 10 minutes, so you're about 4x faster in time with the same level of stability (which I think is the same kind of metric in both blocks and time).

I disagreed with Jonathan on selecting N (although he's a super expert on mining) and preferred a faster response. He was like "OK, maybe for smaller coins a smaller N is needed". So that's what I'll say: if you don't need your current level of stability, you can reduce N, especially if you switch to WTEMA. Bitcoin Gold was the largest coin using my LWMA and they still have N=45 (600 second blocks), which is N=22.5 for WTEMA. Click on their 1-week chart to see what that's like. They were really happy with it, despite this hindsight of knowing larger is better, because they saw what was happening to other coins not responding fast enough. But the core of bad oscillations is not slow speed but delays like your epochs cause.

My bias towards low N in the early days was based on Monero clones suffering BAD oscillation with N=576 in SMAs. What I didn't realize was that SMA was a smallish part of the problem. Monero code also ignores the most recent solvetimes with "cut" and "lag", causing a delay in response of around 10% of the averaging window. So your epoch delay at 1/8 may cause a big problem... if not the sole source of the current problem. BTW, WTEMA and relative ASERT are disasters if you do not use the previous target and the previous solvetime; Monero clones have timestamps shifted by 1 block, which caused a problem.

@aslesarenko (Member)

@zawy12, thanks for the suggestions and for pointing out the alternatives.

@kushti (Member, Author) commented Oct 6, 2022

(quoting @zawy12's five recommendations above)

The Ergo protocol has enforced monotonic timestamps since day 1.

Using the timestamp of the previous block only introduces a strong trust assumption in the time provider. I do not know at the moment how many machines are using NTP (vs hardware clocks). And for a server on NTP which is hosting the node, it is possible for anyone who can control network time (in one way or another, e.g. by hacking the NTP protocol or servers) to feed the node a low-difficulty chain continuation, especially if the difficulty change is not limited.

@zawy12 commented Oct 7, 2022

TL;DR: If you develop oscillations, just change difficulty every block with the current math. If you start suffering attacks with timestamp manipulation due to timestamps being allowed too far into the future or due to the 50% cap, read the following.

Your monotonic timestamps prevent a lot of problems.

Nodes that mine should know that their clock is approximately correct. Other nodes don't matter. Their maximum error plus the allowed future time limit on timestamps should be a small fraction of the difficulty averaging window so that the most recent timestamp should not be capable of having a large effect on difficulty if honest miners are >50% hashrate.

By suggesting N = 512 for WTEMA to mimic your current "averaging window" of 1024 blocks, I'm estimating a similar response speed and stability but with higher precision. The manipulation of difficulty that's possible from the most recent timestamp being far in the future should be about the same, or a lot less, with WTEMA. For example, if the future time limit and local time error are not correctly limited, and the previous timestamp were allowed to be 1024 block times into the future, a simple moving average would cut the difficulty in half; for ASERT and WTEMA, it would be about a 65% reduction. With epochs, the cheat isn't fixed for 128 blocks, whereas rolling methods can mostly correct it at the next honest timestamp. The attacker can't reliably expect to get the next block to benefit from the manipulation on rolling methods, but with 128-block epochs he can expect to get some. This shouldn't be important because the future time limit and local time error should be smallish compared to 1024 block times. But with double weight on the most recent 128 blocks and prediction, a timestamp only 128 blocks into the future might have a large effect.

This isn't a reason to implement a 50% cap, but to make the difficulty adjustment slower, or to reduce the future time limit, i.e. to implement the 50% cap in a different way. There's a crucial difference in using the clock instead of code to implement a 50% limit. PoW is only as secure as the clock. The clock, and therefore allowed timestamps, must be more secure than the consensus mechanism (PoW). You can find this statement in the original Byzantine paper and even earlier in the late 1970's. The only way for the clock to be more secure than PoW is for node operators to know what time it is without asking peers. When you enforce a 50% cap on difficulty in code instead of with the clock, you're overriding the objective measurement of current hashrate (consensus) with a prior belief about what "should be". Code going against the objective measurement of consensus always allows attacks on the consensus.
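
(For concreteness, the rough arithmetic behind those two figures, using the formulas from my earlier comment: with an SMA over 1024 blocks, a timestamp about 1024 block times in the future roughly doubles the measured window time and so roughly halves the difficulty; with WTEMA at N = 512, a single step with t ≈ 1024*B multiplies the target by about 1 + 1024/512 = 3, i.e. roughly a two-thirds cut in difficulty.)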

I saw your difficulty chart, and it looks like the new method was implemented 4 days ago and looks a lot better. It's possibly very close to being like ASERT, WTEMA, and LWMA, except for the epochs causing a 128-block delay. I believe you could change difficulty every block and the average block time would still be accurate. The reason to make that change is that delaying adjustment is what usually ends up in oscillations. Since it appears you didn't have catastrophic oscillations with the older method of 1024 blocks per change (as I would have expected even before the merge), I think you'll be OK with the current method. But if oscillations return, adjusting every block is the fix.

@arobsn mentioned this pull request on Dec 7, 2022