09.08.2021: Post Mortem

TL;DR: The ChainX blockchain got stuck. The previous state was exported and used as the genesis for a new blockchain. Penalized PCX has been returned and the blockchain is back to normal. Please resume your usual activities.

Now that the blockchain is back to normal operation, we can look back at the events of last week. On the (European) evening of 9 August 2021, ChainX validators were unable to finalize blocks, which ultimately led to the blockchain getting stuck. In this article, we walk through what happened and explain a few of the concepts involved. If anything in this article is unclear, please join us in Telegram and ask us your questions.

It begins

Substrate’s consensus mechanism is split into two pieces: BABE and GRANDPA. BABE creates the blocks and GRANDPA finalizes them. This keeps both block production and finalization fast, with GRANDPA normally lagging two or three blocks behind. On the fateful evening of 09.08.2021, however, a large number of nodes were unable to perform their GRANDPA responsibilities, leaving the network unable to finalize blocks.

GRANDPA requires two-thirds (2/3) of all validators to perform their GRANDPA responsibilities and vote on which blocks to finalize. This keeps everyone on the same blockchain and prevents multiple forks. If fewer than two-thirds of validators vote on a block, it can’t be finalized and the network waits until they have. For every block that passes in the meantime, all nodes get penalized (slashed), which reduces their interest pool (also called the jackpot or pot balance) by a certain amount. We’re leaving the exact calculation of that amount out of this article for simplicity’s sake. When an interest pool has been penalized so far that it is empty, the node is withdrawn from service and becomes a ‘drop out’. This frees up a validator spot for the next biggest candidate, which then assumes the responsibilities of the validator that was withdrawn. Provided this new validator can perform its GRANDPA responsibilities correctly, the validator set meets the two-thirds requirement again and blocks are finalized once more. But that is not what happened…
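The two-thirds rule and the drop-out mechanism described above can be sketched as a toy simulation. This is only an illustration under stated assumptions, not ChainX’s actual implementation: the names `Node`, `can_finalize` and `slash_round`, the flat `SLASH_PER_BLOCK` penalty and all pool balances are hypothetical, and the real penalty calculation is deliberately left out of this article.

```python
from dataclasses import dataclass

SLASH_PER_BLOCK = 10  # hypothetical flat penalty; the real formula is not covered here


@dataclass
class Node:
    name: str
    pool: int         # interest pool ("pot") balance, in arbitrary units
    responsive: bool  # True if the node can cast GRANDPA votes


def can_finalize(validators):
    """True if at least two-thirds of the validators vote (the article's wording)."""
    voting = sum(1 for v in validators if v.responsive)
    return 3 * voting >= 2 * len(validators)


def slash_round(validators, candidates):
    """Apply one block's worth of penalties while finality is stalled.

    Following the article, every validator is penalized. A validator whose
    pool is emptied drops out and the next candidate in line takes its spot.
    If no candidate is left, the emptied validator simply stays in place.
    """
    next_set = []
    for v in validators:
        v.pool = max(0, v.pool - SLASH_PER_BLOCK)
        if v.pool == 0 and candidates:
            next_set.append(candidates.pop(0))  # candidates are pre-sorted, biggest first
        else:
            next_set.append(v)
    return next_set


# Three responsive validators, three stuck ones with small pools (made-up numbers).
validators = [Node(f"v{i}", 100, True) for i in range(3)]
validators += [Node(f"s{i}", 20, False) for i in range(3)]
candidates = [Node("c0", 100, True), Node("c1", 100, True)]

blocks_stalled = 0
while not can_finalize(validators) and blocks_stalled < 1000:  # bounded for safety
    validators = slash_round(validators, candidates)
    blocks_stalled += 1

print(blocks_stalled, [v.name for v in validators])
```

In this made-up scenario the network recovers on its own because responsive candidates are waiting in line. During the incident that recovery path did not work out, as described next.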

A lot of nodes (the physical servers that run the validator or candidate software) had gotten stuck at the last finalized block, 3,696,666, whilst the rest of the nodes had moved on to 3,696,667 without any problems.

The network was now in a state where the ‘responsive validator nodes’ were waiting for the ‘stuck validator nodes’ either to catch up and resume their responsibilities, or to be withdrawn from the network once their interest pools had been penalized to zero and replaced with ‘responsive candidate nodes’.

It gets worse

The community and PolkaX were scrambling to get the contact details of all the validators and candidates, to instruct and guide them on fixing their nodes. It was a race against the clock because, in the background, things were getting worse. BABE by design keeps producing blocks so that transactions are still recorded. But if it gets too far ahead of GRANDPA, it will slow down. Normally, blocks are created every 6 seconds, but with GRANDPA stuck, block creation was slowed down to 10 seconds, 30 seconds, 2 minutes, 10 minutes, and would inevitably… stop. The community was unable to reach all the validators in time, and those that were reached were unable to repair their nodes in time, leaving the blockchain to slowly come to a halt.
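The slowdown described above can be modelled roughly as block authors skipping more and more slots as the gap between the best block and the last finalized block grows. The sketch below is a simplified illustration of that idea only. The function names and the `slack`, `bias` and `max_skip` values are assumptions chosen for the example, not ChainX’s actual settings; only the 6-second slot time comes from the article.

```python
SLOT_SECONDS = 6  # normal block time quoted in the article


def slots_to_skip(unfinalized_blocks, slack=50, bias=2, max_skip=100):
    """Number of slots a block author sits out, given the finality lag.

    slack, bias and max_skip are illustrative assumptions, not ChainX settings.
    """
    if unfinalized_blocks <= slack:
        return 0  # small lags are tolerated without any slowdown
    return min((unfinalized_blocks - slack) // bias, max_skip)


def expected_block_interval(unfinalized_blocks):
    """Approximate seconds between blocks for a given finality lag."""
    return SLOT_SECONDS * (1 + slots_to_skip(unfinalized_blocks))


for lag in (3, 52, 60, 90, 250):
    print(f"lag={lag:>3} blocks -> ~{expected_block_interval(lag)} s per block")
```

Note that this toy model caps the interval instead of letting it grow without bound, so it illustrates the progressive slowdown but not the eventual halt.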

Time to rebuild

Nodes were being repaired and blocks were being finalized again, but the damage was done: the blockchain had come to a complete halt at block 3,696,822. GRANDPA resumed finalizing and eventually reached block 3,696,796, but to no avail.
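The block numbers quoted in this article allow a quick back-of-the-envelope check of how far finality recovered and how many blocks were left unfinalized. The snippet below is plain arithmetic on those three numbers; the variable names are ours.

```python
# Block heights quoted in this article.
LAST_FINALIZED_BEFORE_STALL = 3_696_666
LAST_FINALIZED_OVERALL = 3_696_796
LAST_PRODUCED = 3_696_822

# Blocks GRANDPA still managed to finalize once nodes were being repaired.
finalized_during_recovery = LAST_FINALIZED_OVERALL - LAST_FINALIZED_BEFORE_STALL

# Blocks BABE produced that were never finalized before the halt.
never_finalized = LAST_PRODUCED - LAST_FINALIZED_OVERALL

print(f"finalized during recovery: {finalized_during_recovery}")
print(f"produced but never finalized: {never_finalized}")
```

In other words, finality advanced by 130 blocks after the stall began, and 26 produced blocks remained unfinalized when the chain stopped.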

The PolkaX team exported the chain state at block 3,696,796 and used it as the genesis for a fresh blockchain: ChainX 3.0. On 10 and 11.08.2021, after communication with the developers at ParityTech (the ones who built Substrate and Polkadot), PolkaX modified the ChainX binary to decouple BABE from GRANDPA. Both are still used, but BABE simply no longer waits for GRANDPA. After a quick check that everything was in order, PolkaX started the new network on the morning of 12.08.2021, using the previously exported chain state as the genesis. Validators were gradually added and the network rebuilt itself. Transactions were possible again and users could resume their normal activities.
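Conceptually, using an exported state as a genesis means taking the key-value storage as it stood at one block and declaring it the starting state of a new chain. The sketch below illustrates only that idea. The function `build_genesis_spec`, the storage keys and values, and the chain id are all made up for the example, and the nested layout should be read as an assumption rather than as the exact file PolkaX produced.

```python
import json


def build_genesis_spec(exported_state, name, chain_id):
    """Wrap a state snapshot so it can serve as the genesis of a new chain.

    exported_state maps storage keys to values (hex strings in this example).
    """
    return {
        "name": name,
        "id": chain_id,
        # The whole snapshot becomes the new chain's state at block 0.
        "genesis": {"raw": {"top": dict(exported_state)}},
    }


# Made-up snapshot standing in for the state at block 3,696,796.
old_state = {
    "0xaa01": "0x64",  # e.g. an account balance (hypothetical key and value)
    "0xaa02": "0x2a",  # e.g. an interest pool balance (hypothetical key and value)
}

spec = build_genesis_spec(old_state, name="ChainX 3.0", chain_id="chainx_example")
print(json.dumps(spec, indent=2))
```

The new chain starts counting blocks from zero again, but every balance and storage entry carries over from the snapshot.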

The ChainX network was deemed stable on 14.08.2021, so the PolkaX team set about calculating the penalties incurred during the event. Later that day, almost all validators and candidates had the previously penalized PCX refunded to their interest pools. Three nodes, in agreement with their operators, were refunded a few days later.

Crisis overcome

At the time of writing, the ChainX blockchain is back to normal operation. The new binary has BABE and GRANDPA decoupled so that, if this ever happens again, the blockchain won’t stop generating blocks, giving the network more time to self-heal. We will also be working more closely with our validators, creating better communication channels and setting up more stringent hardware requirements.

We’d like to take this opportunity to thank our community for their patience and support. We’d like to thank our validators for being readily available and helping each other with rebuilding the network. You are all amazing!

We hope that our recent predicament serves as a lesson in proper node operation and maintenance for other Substrate projects, and that they may learn from our humbling experience.

Extra information:

  • We have retained the old blockchain’s transaction history on scan-old.chainx.org. The new blockchain will be available on scan.chainx.org.
  • There’s currently an imbalance in staking rewards. This is due to vote age and will even out naturally over time.

For further reading on the above-mentioned concepts, please see:

About ChainX

ChainX will become the largest Bitcoin Layer 2 Network in the world. The first Substrate blockchain to go live will provide Polkadot with the most valuable digital assets on the market. Committed to realizing trustless and decentralized bridges for Bitcoin and other assets, it forms an inter-blockchain asset gateway, paving the way for a truly interoperable network of blockchains.

Website | Github | Wallet | Twitter | Medium | Telegram | White paper

ChainX is the largest Layer-2 network of Bitcoin, based on Substrate, and will evolve into the Polkadot Secondary Relay Chain.