Ethereum Unchained 2
How to Scale Ethereum
This blog takes a deep look at the existing proposals for scaling Ethereum transactions and how they work. In the previous blog we went through the current architecture and explained how it works today.
Merkle proofs have a disadvantage: you have to send the sibling nodes at every level to recompute the root hash, which leads to network congestion and slow transactions.
Comparison: ~3.5 MB for 1K leaves with Merkle proofs versus ~1.5 KB for 1K leaves with Verkle proofs.
In a Verkle tree, you do not need to provide sister nodes; instead, you just provide the path, with a little bit extra as a proof. This is why Verkle trees benefit from greater width and Merkle Patricia trees do not: a tree with greater width leads to shorter paths in both cases, but in a Merkle Patricia tree this effect is overwhelmed by the higher cost of needing to provide all the width − 1 sister nodes per level in a proof. In a Verkle tree, that cost does not exist.
Both editing the tree and computing a proof are cheap in a Verkle tree.
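To make the sibling cost concrete, here is a rough sketch of a Merkle proof (binary tree, SHA-256, all names hypothetical): the proof must carry one sibling hash per level, which is exactly what a Verkle proof avoids.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    """Build a binary Merkle tree; returns a list of levels, leaves first."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def prove(levels, index):
    """Collect the sibling hash at every level -- the part Verkle trees drop."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])  # flip the last bit to find the sibling
        index //= 2
    return proof

def verify(leaf, index, proof, root):
    node = leaf
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

leaves = [h(bytes([i])) for i in range(8)]
levels = build_tree(leaves)
root = levels[-1][0]
proof = prove(levels, 3)
assert verify(leaves[3], 3, proof, root)
len(proof)  # 3 siblings for 8 leaves: one per level
```

With a branching factor of w instead of 2, each level would need w − 1 siblings in the proof, which is why wide Merkle Patricia trees get expensive while wide Verkle trees do not.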
Zero Knowledge Proofs (ZK)
For example, you can make a proof for the statement "I know a secret number such that if you take the word ‘cow', add the number to the end, and SHA256 hash it 100 million times, the output starts with
0x57d00485aa". The verifier can verify the proof far more quickly than it would take for them to run 100 million hashes themselves, and the proof would also not reveal what the secret number is.
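Without any ZK machinery, the only way a verifier can check such a claim is to re-run the whole computation. A toy sketch of that naive check (the iteration count and target prefix are shrunk drastically for the demo; names are hypothetical):

```python
import hashlib

def statement_holds(secret: int, rounds: int = 100, prefix: bytes = b"\x00") -> bool:
    """Directly re-run the claimed computation: 'cow' + secret, hashed
    `rounds` times. A succinct ZK proof lets a verifier skip this loop."""
    data = b"cow" + str(secret).encode()
    for _ in range(rounds):
        data = hashlib.sha256(data).digest()
    return data.startswith(prefix)

# Brute-force a secret that makes the (shrunken) statement true, so the
# prover has something to prove.
secret = next(s for s in range(100000) if statement_holds(s))
```

The point of a succinct proof is that the verifier's work stays tiny even when `rounds` is 100 million, while this naive check scales linearly with it.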
In the context of blockchains, this has two very powerful applications:
Scalability: if a block takes a long time to verify, one person can verify it and generate a proof, and everyone else can just quickly verify the proof instead
Privacy: you can prove that you have the right to transfer some asset (you received it, and you didn't already transfer it) without revealing the link to which asset you received. This ensures security without unduly leaking information about who is transacting with whom to the public.
A "succinct" proof is one where both the size of the proof and the time required to verify it grow much more slowly than the computation to be verified.
The basic idea of a zero-knowledge proof is to reveal no information beyond the fact that the statement is true; weaker variants reveal only very little information.
Polynomial commitments. A polynomial commitment is best viewed as a special way to "hash" a polynomial, where the hash has the additional property that you can check equations between polynomials by checking equations between their hashes. Different polynomial commitment schemes have different properties in terms of exactly what kinds of equations you can check.
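The "check equations between hashes" idea can be approximated with a toy that checks a polynomial equation at a single random point (the Schwartz–Zippel idea). This is not a real commitment scheme — there is no hiding or binding here — just the underlying algebra that makes such checks sound:

```python
import random

P = 2**31 - 1  # a prime field modulus for the toy example

def evaluate(coeffs, x):
    """Horner evaluation of a polynomial mod P (coeffs in ascending order)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists, mod P."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % P
    return out

# Claim: C = A * B. Checking it at one random point catches a false claim
# with overwhelming probability; real schemes run this check against
# commitments to the polynomials instead of the raw coefficients.
A = [3, 1, 4]   # 3 + x + 4x^2
B = [1, 5]      # 1 + 5x
C = poly_mul(A, B)
r = random.randrange(P)
assert evaluate(A, r) * evaluate(B, r) % P == evaluate(C, r)
```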
ZK-SNARK vs ZK-STARK
zk-SNARK stands for Zero-Knowledge Succinct Non-Interactive Argument of Knowledge, while zk-STARK stands for Zero-Knowledge Scalable Transparent Argument of Knowledge
zk-SNARKs require elliptic curve cryptography, while zk-STARKs do not.
zk-SNARKs require a trusted setup while zk-STARKs do not.
STARKs also have a larger proof size, meaning verifying a proof takes longer and uses more gas.
Layer-2 Scaling
Getting more transactions through without changing the Ethereum protocol itself is called layer-2 scaling.
A sidechain is a separate blockchain that runs independent of Ethereum and is connected to Ethereum Mainnet by a two-way bridge. Sidechains can have separate block parameters and consensus algorithms, which are often designed for efficient processing of transactions. Using a sidechain involves trade-offs, though, as they do not inherit Ethereum's security properties. Unlike layer 2 scaling solutions, sidechains do not post state changes and transaction data back to Ethereum Mainnet.
Sidechains are independent blockchains, with different histories, development roadmaps, and design considerations. While a sidechain may share some surface-level similarities with Ethereum, it has several distinctive features.
In order for a separate blockchain to become a sidechain to Ethereum Mainnet it needs the ability to facilitate the transfer of assets from and to Ethereum Mainnet. This interoperability with Ethereum is achieved using a blockchain bridge. Bridges use smart contracts deployed on Ethereum Mainnet and a sidechain to control the bridging of funds between them.
While bridges help users move funds between Ethereum and the sidechain, the assets are not physically moved across the two chains. Instead, mechanisms that typically involve minting and burning are used for transferring value across chains.
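A minimal sketch of that lock-and-mint / burn-and-release flow (class and method names are hypothetical; no real bridge contract API is implied):

```python
class TwoWayBridge:
    """Toy lock-and-mint bridge: assets never physically move; value is
    locked on one chain and a wrapped representation is minted on the other."""

    def __init__(self):
        self.locked_on_mainnet = {}    # user -> amount held by the L1 contract
        self.minted_on_sidechain = {}  # user -> wrapped balance on the sidechain

    def deposit(self, user, amount):
        # The L1 contract locks the funds ...
        self.locked_on_mainnet[user] = self.locked_on_mainnet.get(user, 0) + amount
        # ... and the sidechain contract mints an equal wrapped amount.
        self.minted_on_sidechain[user] = self.minted_on_sidechain.get(user, 0) + amount

    def withdraw(self, user, amount):
        if self.minted_on_sidechain.get(user, 0) < amount:
            raise ValueError("insufficient wrapped balance")
        # Burn on the sidechain, release the locked funds on mainnet.
        self.minted_on_sidechain[user] -= amount
        self.locked_on_mainnet[user] -= amount
        return amount

bridge = TwoWayBridge()
bridge.deposit("alice", 10)
bridge.withdraw("alice", 4)
```

The invariant a real bridge must enforce is that the total wrapped supply never exceeds the total locked on mainnet.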
Plasma is a way of building scalable decentralized applications that don’t sacrifice security for speed. Plasma chains derive their security from Mainnet. This makes them measurably more secure than sidechains. Both sidechains and plasma chains can have different consensus protocols, but the difference is that plasma chains publish Merkle roots for each block on Ethereum Mainnet.
Off Chain Executions
Plasma applications do a majority of their work outside of the “root chain” (e.g. Ethereum). Root chains tend to be slow and costly because they need to be very secure. If an application can do any work outside of the root blockchain, it should.
For example, in Plasma MVP, almost every transaction occurs outside of Ethereum. Only deposits and withdrawals, the points of entry and exit, are ever handled on the smart contract. This is a standard workflow for plasma applications. Anything that doesn’t require assets/data moving in and out of your smart contract can probably be handled off-chain.
When we’re doing so much off-chain, we need some way to make sure that our changes are final. This is why we make use of something called a “state commitment.” A state commitment is a cryptographic way to store a compressed version of the state of your application.
However, storing everything about your application would defeat the point of plasma entirely. We typically make use of Merkle trees instead. These commitments become kind of like save points for your application.
Plasma applications make use of these commitments whenever a user wants to leave the plasma chain. We usually refer to this as “exiting” the application.
Let’s illustrate this by imagining a payment network application! The state commitments for this application will probably contain some information about how much money each user currently has. If a user wants to withdraw (“exit”) their money from this application, they need to prove to the smart contract that they have money to withdraw. To do this, the user can use something called a Merkle proof.
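A sketch of that exit check, assuming the state commitment is a tiny binary Merkle tree over hashed (user, balance) pairs (layout and leaf encoding are hypothetical):

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

# Hypothetical state commitment: leaves are hashes of "user:balance" strings.
balances = {"alice": 50, "bob": 20, "carol": 30, "dave": 10}
leaves = [h(f"{u}:{b}".encode()) for u, b in sorted(balances.items())]

# 4 leaves -> a 2-level binary tree.
level1 = [h(leaves[0] + leaves[1]), h(leaves[2] + leaves[3])]
root = h(level1[0] + level1[1])  # this root is what gets published on-chain

# Alice exits: she presents her leaf plus two sibling hashes; the contract
# only needs `root` to check the claim.
alice_leaf = h(b"alice:50")
recomputed = h(h(alice_leaf + leaves[1]) + level1[1])
assert recomputed == root
```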
The benefits of plasma over a sidechain are the ability to exit and the availability of fraud proofs. There is also the concept of a settlement agent; for more details read here
The downsides are data unavailability and delayed finality: you can't exit immediately, and it may take days to settle.
Rollups are a relatively new L2 solution being implemented on Ethereum that enable exponential scalability gains without sacrificing security. The primary innovation of rollups is that they move computation off-chain, while storing only the bare minimum of transaction data on-chain. The rollup chain handles all of the expensive and computationally dense data processing, enabling exponential growth in Ethereum’s ability to execute transactions. In its simplest form, the rollup chain executes the Ethereum transactions on its own chain, “rolls” them up into one batch, and Ethereum receives and stores the results.
However, in order to do so, the Ethereum mainnet needs some way to verify that the transactions that happen off-chain are valid. So, how does Ethereum determine that data submitted from a rollup is valid and not submitted by a bad actor? The answer is cryptographic proofs: validity proofs for ZK-rollups (ZKRU) and fraud proofs for Optimistic rollups (OR). Each rollup deploys a set of smart contracts on L1 Ethereum that are responsible for processing deposits/withdrawals and verifying the submitted proofs.
There is another kind of rollup, called an optimistic rollup.
Optimistic rollups don't compute the transactions by default, so there needs to be a mechanism to ensure transactions are legitimate and not fraudulent. This is where fraud proofs come in. If someone notices a fraudulent transaction, the rollup will execute a fraud proof and run the transaction's computation, using the available state data. This means you may have longer wait times for transaction confirmation than with a ZK-rollup, because the transaction could get challenged. Optimistic rollups also suffer from similar problems as plasma (delayed exit).
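A toy fraud-proof flow under these assumptions (a simple balance-transfer state-transition function; all names hypothetical):

```python
def apply_batch(state, batch):
    """Hypothetical state-transition function for a batch of
    (sender, receiver, amount) transfers; returns the new state."""
    state = dict(state)
    for sender, receiver, amount in batch:
        state[sender] -= amount
        state[receiver] = state.get(receiver, 0) + amount
    return state

def challenge(prev_state, batch, claimed_state):
    """Fraud proof: re-run the disputed batch on-chain; the challenge
    succeeds (True) exactly when the sequencer's claim was wrong."""
    return apply_batch(prev_state, batch) != claimed_state

prev = {"alice": 100, "bob": 0}
batch = [("alice", "bob", 30)]
honest_claim = {"alice": 70, "bob": 30}
fraudulent_claim = {"alice": 70, "bob": 1000}

challenge(prev, batch, honest_claim)      # False -- the claim stands
challenge(prev, batch, fraudulent_claim)  # True  -- the sequencer gets slashed
```

The challenge window is why confirmations are delayed: the chain must leave enough time for someone to run this check before a batch is final.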
What rollups do is this: block production is centralized, block validation is trustless and highly decentralized, and censorship is still prevented.
A very dreamy blog on rollups written by Vitalik → [Ref]
The Ethereum state size is growing quickly. It is currently around 35 GB for just the state and over 100 GB including all Merkle proofs, and is increasing by roughly half that amount per year. State storage is additionally a weak point in Ethereum’s economics: it is the only mechanism by which a participant can pay a cost once to burden consensus nodes forever. In order to maintain the scalability and sustainability of Ethereum, we need some solution.
Two paths to a solution exist, and have existed for a long time: state expiry and weak statelessness:
State expiry: remove state that has not been recently accessed from the state (think: accessed in the last year), and require witnesses to revive expired state. This would reduce the state that everyone needs to store to a flat ~20-50 GB.
Weak statelessness: only require block proposers to store state, and allow all other nodes to verify blocks statelessly. Implementing this in practice requires a switch to Verkle trees to reduce witness sizes.
How it works
The core idea is that there would be a state tree per period (think: 1 period ~= 1 year), and when a new period begins, an empty state tree is initialized for that period and any state updates go into that tree. All writes that happen during a period go into the latest tree (so new trees and old trees may store the same information or even conflict with each other; newer trees always take precedence).
Two key principles are maintained:
Only the most recent tree (ie. the tree corresponding to the current period) can be modified. All older trees are no longer modifiable; objects in older trees can only be modified by creating copies of them in newer trees, and these copies supersede the older copies.
Full nodes (including block proposers) are expected to only hold the most recent two trees, so only objects in the most recent two trees can be read without a witness. Reading older objects requires providing witnesses.
State expiry institutes a hybrid state regime: consensus nodes need to store state that was accessed or modified recently, but can use the witness-based stateless client approach to verify state that is older. That said, it is possible to maintain an “archive node” that stores even historical state trees, or a fully stateless node that uses witnesses to verify even recent state. However, the gas cost structure and the default network formats are built around the assumption that nodes are storing the most recent two state trees.
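The two-tree rule can be sketched like this, with a dict per period standing in for a state tree (all names hypothetical):

```python
class ExpiringState:
    """Toy state-expiry model: one dict per period stands in for a state
    tree. Writes go only to the newest tree; full nodes keep the most
    recent two, and anything older must be revived with a witness."""

    def __init__(self):
        self.trees = [{}]            # trees[-1] is the current period's tree

    def new_period(self):
        self.trees.append({})        # a fresh empty tree starts each period

    def write(self, key, value):
        self.trees[-1][key] = value  # only the current tree is modifiable

    def read(self, key, witness=None):
        # The two most recent trees are readable directly; newer wins.
        for tree in reversed(self.trees[-2:]):
            if key in tree:
                return tree[key]
        # Anything older has expired and needs a witness to revive it.
        if witness is not None:
            return witness
        raise KeyError(f"{key} expired; a witness is required to read it")

state = ExpiringState()
state.write("acct", 5)
state.new_period()
state.new_period()   # "acct" now lives two periods back -> expired
```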
Layer 1 Scaling - Sharding [Ref]
One of the most ambitious projects in Ethereum is sharding the whole data layer (and computation).
The easiest version of sharding to understand is sharding through random sampling. Sharding through random sampling has weaker trust properties than the forms of sharding that we are building towards in the Ethereum ecosystem, but it uses simpler technology.
The core idea is as follows. Suppose that you have a proof-of-stake chain with a large number (eg. 10000) of validators, and you have a large number (eg. 100) of blocks that need to be verified. No single computer is powerful enough to validate all of these blocks before the next set of blocks comes in.
Hence, what we do is we randomly split up the work of doing the verification. We randomly shuffle the validator list, and we assign the first 100 validators in the shuffled list to verify the first block, the second 100 validators in the shuffled list to verify the second block, etc. A randomly selected group of validators that gets assigned to verify a block (or perform some other task) is called a committee.
When a validator verifies a block, they publish a signature attesting to the fact that they did so. Everyone else, instead of verifying 100 entire blocks, now only verifies 10000 signatures - a much smaller amount of work, especially with BLS signature aggregation. Instead of every block being broadcasted through the same P2P network, each block is broadcasted on a different sub-network, and nodes need only join the subnets corresponding to the blocks that they are responsible for (or are interested in for other reasons).
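The shuffle-and-slice committee assignment can be sketched as follows (in practice the seed comes from a randomness beacon, not a fixed constant):

```python
import random

def assign_committees(validators, num_blocks, seed):
    """Shuffle the validator set, then slice it into one committee per block."""
    rng = random.Random(seed)   # stand-in for the protocol's randomness beacon
    shuffled = validators[:]
    rng.shuffle(shuffled)
    size = len(shuffled) // num_blocks
    return [shuffled[i * size:(i + 1) * size] for i in range(num_blocks)]

validators = list(range(10000))
committees = assign_committees(validators, 100, seed=42)
len(committees), len(committees[0])   # 100 committees of 100 validators each
```

The shuffle is what gives the security: an attacker cannot choose which block their validators end up checking.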
Scalable verification of computation
We can break up the 51%-attack-proof scalable validation problem into two cases:
Validating computation: checking that some computation was done correctly, assuming you have possession of all the inputs to the computation
Validating data availability: checking that the inputs to the computation themselves are stored in some form where you can download them if you really need to; this checking should be performed without actually downloading the entire inputs themselves (because the data could be too large to download for every block)
Validating a block in a blockchain involves both computation and data availability checking: you need to be convinced that the transactions in the block are valid and that the new state root hash claimed in the block is the correct result of executing those transactions, but you also need to be convinced that enough data from the block was actually published so that users who download that data can compute the state and continue processing the blockchain. This second part is a very subtle but important concept called the data availability problem; more on this later.
Scalably validating computation is relatively easy; there are two families of techniques: fraud proofs and ZK-SNARKs.
Fraud proofs are a system where, to accept the result of a computation, you require someone with a staked deposit to sign a message of the form "I certify that if you make computation C with input X, you get output Y". You trust these messages by default, but you leave open the opportunity for someone else with a staked deposit to make a challenge (a signed message saying "I disagree, the output is Z"). Only when there is a challenge do all nodes run the computation. Whichever of the two parties was wrong loses their deposit, and all computations that depend on the result of that computation are recomputed.
Scalable verification of data availability is harder
And mere validity is not sufficient to ensure a correctly running blockchain. This is because if the blockchain is valid but all the data is not available, then users have no way of updating the data that they need to generate proofs that any future block is valid. An attacker that generates a valid-but-unavailable block but then disappears can effectively stall the chain. Someone could also withhold a specific user's account data until the user pays a ransom, so the problem is not purely a liveness issue.
Guess the solution: random sampling again, with some clever logic.
If less than 50% of the data is available, then at least one of the checks will almost certainly fail, and if at least 50% of the data is available then, while some nodes may fail to recognize a block as available, it takes only one honest node to run the erasure code reconstruction procedure to bring back the remaining 50% of the block. A ZK-SNARK can be used to verify that the erasure coding on a piece of data was done correctly.
What is an erasure code → An erasure code is a forward error correction (FEC) code under the assumption of bit erasures (rather than bit errors); it transforms a message of k symbols into a longer message (code word) of n symbols such that the original message can be recovered from a subset of the n symbols.
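A minimal k-of-n erasure code via polynomial interpolation over a prime field — a toy Reed–Solomon-style construction, not a production codec:

```python
P = 2**31 - 1  # prime field modulus for the toy example

def _interpolate(points, x):
    """Lagrange interpolation mod P through `points`, evaluated at x."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse
    return total

def encode(message, n):
    """Treat the k message symbols as evaluations of a degree-(k-1)
    polynomial at x = 0..k-1 and extend to n evaluations."""
    points = list(enumerate(message))
    return [(x, _interpolate(points, x)) for x in range(n)]

def decode(shares, k):
    """Recover the original k symbols from any k of the n shares."""
    return [_interpolate(shares[:k], x) for x in range(k)]

msg = [10, 20, 30, 40]   # k = 4 symbols
shares = encode(msg, 8)  # n = 8: any 4 shares can be lost
assert decode([shares[1], shares[4], shares[6], shares[7]], 4) == msg
```

Because any k of the n points pin down the unique degree-(k−1) polynomial, recovering "the other 50%" of a block is exactly this interpolation step.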
Security in a sharded Blockchain
Invalid blocks cannot get through because either:
A fraud proof quickly catches them and informs the entire network of the block's incorrectness, and heavily penalizes the creator, or
A ZK-SNARK proves correctness, and you cannot make a valid ZK-SNARK for an invalid block.
Unavailable blocks cannot get through because:
If less than 50% of a block's data is available, at least one data availability sample check will almost certainly fail for each client, causing the client to reject the block,
If at least 50% of a block's data is available, then actually the entire block is available, because it takes only a single honest node to reconstruct the rest of the block.
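The probability argument behind the sampling check can be sketched numerically (chunk count and sample count here are illustrative, not protocol parameters):

```python
import random

def sampling_detects(available_fraction, num_samples, chunks=1024, rng=None):
    """One light client samples random chunk indices; returns True if any
    sample hits a withheld chunk (i.e. the client detects unavailability)."""
    rng = rng or random.Random(0)
    missing = set(range(int(chunks * available_fraction), chunks))
    return any(rng.randrange(chunks) in missing for _ in range(num_samples))

# With half the data withheld, a client misses detection only if every one
# of its k samples lands in the available half: probability (1/2)**k.
miss_probability = 0.5 ** 30   # ~9.3e-10 for 30 samples per client
```

So each client does a tiny constant amount of work, yet an unavailable block is rejected by essentially every honest client.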
That’s all there is to the major ideas for scaling Ethereum, and blockchains in general.
Thanks for reading. I am glad you took out time from your busy life to read this.