What Happens If Beacon Chain Consensus Fails?
By Adrian Sutton
Over on Twitter, there was an interesting discussion about the importance of client diversity for the beacon chain. As part of that, cyber_hokie asks:
cyber_hokie – @cyber_hokie: As the majority client, isn’t it more likely that issues with Prysm would cause more severe financial penalties for minority clients voting on an alternative chain under accountable safety given Prysm validators likely hold the 2/3 threshold?
It’s important to note up front that this isn’t a Prysm-specific issue, and it isn’t a criticism of Prysm at all. All the major beacon chain clients are great. The issue here is any one client getting too large a share of validators. How much is too large? In an ideal world every client would have less than 1/3rd of validators. Then a bug in a single client doesn’t prevent the chain from continuing to finalise, because the remaining validators still make up the 2/3rds needed for finality. A share of more than 2/3rds is a really big problem because that client’s validators could finalise an incorrect fork on their own.
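The two thresholds can be sketched in a few lines of Python. This is a simplified illustration rather than client code: the function names are made up for this example, shares are treated as plain fractions of validators rather than effective balances, and the inclusive `>=` comparison is an assumption modelled on how the consensus spec checks for a 2/3rds supermajority.

```python
from fractions import Fraction

# Hypothetical helpers for illustration only; shares are fractions of all validators.
SUPERMAJORITY = Fraction(2, 3)

def others_can_finalise(buggy_share: Fraction) -> bool:
    """True if the validators NOT running the buggy client still reach 2/3rds."""
    return (1 - buggy_share) >= SUPERMAJORITY

def can_finalise_alone(buggy_share: Fraction) -> bool:
    """True if the buggy client's validators reach 2/3rds by themselves."""
    return buggy_share >= SUPERMAJORITY

for share in (Fraction(3, 10), Fraction(1, 2), Fraction(7, 10)):
    print(share, others_can_finalise(share), can_finalise_alone(share))
```

With a 3/10 share the rest of the network keeps finalising. At 1/2 nobody can finalise until the bug is fixed. At 7/10 the buggy client can finalise its own fork.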
As Dankrad explains, if a client with 2/3rds of validators has a consensus-affecting bug, it will fork off to its own chain and finalise that chain. That becomes an unrecoverable situation for those validators. If they fix the bug and switch back to the correct chain they will be slashed for creating surround votes, because the correct chain’s justified checkpoint is earlier than the incorrect chain’s. The only option for them is to send a voluntary exit and watch their balance drain until it takes effect. Given 2/3rds of validators are trying to exit, the exit queue is going to be extremely long and thus costly for majority client validators.
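To make the surround vote problem concrete, here is a minimal sketch of the check, assuming each attestation is reduced to just its source (justified) epoch and target epoch. The `Vote` type, the function names and the epoch numbers are all hypothetical, and the real slashing conditions also cover double votes, which are left out here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Vote:
    """Hypothetical, stripped-down attestation: only the two checkpoint epochs."""
    source_epoch: int  # justified checkpoint the vote builds on
    target_epoch: int  # checkpoint the vote is trying to justify

def surrounds(outer: Vote, inner: Vote) -> bool:
    """True if `outer` starts strictly earlier and ends strictly later than `inner`."""
    return (outer.source_epoch < inner.source_epoch
            and inner.target_epoch < outer.target_epoch)

def is_surround_slashable(a: Vote, b: Vote) -> bool:
    """Two votes from one validator are slashable if either surrounds the other."""
    return surrounds(a, b) or surrounds(b, a)

# Made-up scenario: on the incorrect chain the validator attested with the
# justified checkpoint already advanced to epoch 101.
vote_on_incorrect_chain = Vote(source_epoch=101, target_epoch=102)
# After the fix, the correct chain is still justified at epoch 100.
vote_on_correct_chain = Vote(source_epoch=100, target_epoch=105)

print(is_surround_slashable(vote_on_correct_chain, vote_on_incorrect_chain))
```

The later vote has an earlier source and a later target than the earlier one, so it surrounds it. That is why simply switching back to the correct chain isn’t an option.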
Which brings us to cyber_hokie’s question. If 2/3rds of validators are on the incorrect chain and they can’t switch, why wouldn’t we just accept that chain as the canonical one? Minority client users would then take the hit for inactivity on the majority chain but could switch over. Sounds simple and obvious, right? Sadly, it’s not.
Firstly, again as Dankrad points out, we’ve been very clear that it’s bad for the network for clients to have more than 1/3rd share and dangerous for validators to centralise on a client or provider that has such a large share. Most validator penalties ramp up as more validators have the same issue, so if you use a majority client or a majority cloud provider or a majority staking service you run the risk of incurring large penalties if they have an issue, due to it affecting so many validators at once. It would be extremely unfair to validators who heeded these warnings to then be penalised for doing the right thing.
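One example of a penalty that ramps up is the extra slashing penalty, which scales with how much of the total stake was slashed around the same time. The sketch below is a simplified model under stated assumptions: the function name is made up, balances are in whole ETH rather than Gwei, and the multiplier of 3 is an illustrative default (the protocol’s actual constant has differed between upgrades).

```python
from fractions import Fraction

def correlation_penalty(effective_balance_eth: int,
                        slashed_fraction: Fraction,
                        multiplier: int = 3) -> Fraction:
    """Simplified model: penalty grows linearly with the slashed share, capped at 100%.

    `multiplier` is an assumed illustrative value, not a statement of the
    protocol's current constant.
    """
    scaled = min(slashed_fraction * multiplier, Fraction(1))
    return effective_balance_eth * scaled

# An isolated incident (1% of stake slashed) versus a majority-client incident.
print(correlation_penalty(32, Fraction(1, 100)))
print(correlation_penalty(32, Fraction(1, 3)))
```

In this model a validator slashed alongside 1% of the stake loses under 1 ETH to this penalty, while one slashed alongside a third or more of the stake loses its whole effective balance.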
Secondly, if you remember back to hard truths for ETH stakers, Ethereum doesn’t exist to make stakers rich. Stakers are service providers to Ethereum, highly interchangeable and largely quite replaceable. Stakers are not the target market for Ethereum or its reason to exist. The service they provide is necessary and they are paid for that service but they shouldn’t expect to be treated any differently to a miner and definitely shouldn’t expect the Ethereum community to bail them out when they mess up.
Which leads to the third and most important point. A client with a majority of validators doesn’t necessarily have a majority of users or “value”. Even on the beacon chain today there are already some applications following the chain and making decisions based on that state. Those users don’t necessarily show up in validator numbers and so could all follow the chain with a minority of validators. So while a chain may be a majority from a consensus point of view it may simultaneously be the minority chain from a network value point of view.
Once we get to the merge, it becomes much more clear that the weight of DApps, exchanges and other users is going to be far more important than the number of validators. Unless you’re willing to argue that we should accept whatever chain Infura follows as canonical today (and you shouldn’t be), you shouldn’t argue that we should accept whatever chain a majority consensus client follows.
Finally, accepting the incorrect chain as canonical would mean embedding whatever bug happened to occur as the expected behaviour. That creates significant technical debt and may even introduce security flaws into the specification. It may not even be fully deterministic. Many consensus bugs have been caused by incorrect caching behaviour, so nodes could follow different chains depending on whether they were online or offline at a particular point in time, or on when they received a particular network message.
And that’s before we start on the political/governance challenges of a bail out. The DAO fork didn’t cost normal users anything and it still caused a chain split.
The days of Ethereum bailing people out are long gone.