The Burden of Proof(s): Code Merkleization

A note about the Stateless Ethereum Initiative:

Research activity (understandably) slowed down in late 2020 as all contributors adapted to life in a strange timeline. However, as the ecosystem gradually approaches Serenity and his Eth1/Eth2 integration, stateless Ethereum efforts will become increasingly relevant and impactful. Expect a more robust year-end stateless Ethereum retrospective in the coming weeks.

Let’s look at the overview again: The ultimate goal of stateless Ethereum is requirements Always maintain a complete copy of an Ethereum node’s updated state trie, allowing state changes to instead rely on (much smaller) data to prove that a particular transaction is making a valid change. I’ll make it. Doing this will solve a big problem with Ethereum. This issue has so far only been exacerbated by improvements in client software. state growth.

The Merkle proofs required for stateless Ethereum are called “witnesses” and prove a change of state by providing all the information. No change Intermediate hashes needed to arrive at a new valid state root.In theory, the witness is much smaller than the full state of Ethereum (synchronization takes at most 6 hours), but still much bigger It takes less time than blocks (blocks must propagate throughout the network in just a few seconds). Therefore, reducing the size of the witness is paramount to minimizing the practicality of stateless Ethereum.

Much like the Ethereum state itself, much of the additional (digital) weight in the witness comes from smart contract code. If a transaction calls a specific contract, the witness must include the contract’s bytecode by default. as a whole together with the witness. Code merkelization is a common technique to reduce the burden of smart contract code on witnesses. This ensures that contract calls only need to include the bits of code that have been “touched” to prove validity. While this technique alone can significantly reduce witnesses, there are many details to consider when dividing smart contract code into byte-sized chunks.

What is bytecode?

There are some trade-offs to consider when splitting contract bytecode. The final question you need to ask is, “How big will the chunks of code be?” – But now let’s take a look at the actual bytecode of a very simple smart contract to understand what it is all about.

pragma solidity >=0.4.22 <0.7.0;

contract Storage {

    uint256 number;

    function store(uint256 num) public {
        number = num;
    }

    function retrieve() public view returns (uint256){
        return number;
    }
}

Once this simple storage contract is compiled, it turns into machine code that runs “inside” the EVM. Here we see the same simple storage contract as above, but with separate EVM instructions (opcodes).

PUSH1 0x80 PUSH1 0x40 MSTORE CALLVALUE DUP1 ISZERO PUSH1 0xF JUMPI PUSH1 0x0 DUP1 REVERT JUMPDEST POP PUSH1 0x4 CALLDATASIZE LT PUSH1 0x32 JUMPI PUSH1 0x0 CALLDATALOAD PUSH1 0xE0 SHR DUP1 PUSH4 0x2E64CEC1 EQ PUSH1 0x37 JUMPI DUP1 PUSH4 0x6057361D EQ PUSH1 0x53 JUMPI JUMPDEST PUSH1 0x0 DUP1 REVERT JUMPDEST PUSH1 0x3D PUSH1 0x7E JUMP JUMPDEST PUSH1 0x40 MLOAD DUP1 DUP3 DUP2 MSTORE PUSH1 0x20 ADD SWAP2 POP POP PUSH1 0x40 MLOAD DUP1 SWAP2 SUB SWAP1 RETURN JUMPDEST PUSH1 0x7C PUSH1 0x4 DUP1 CALLDATASIZE SUB PUSH1 0x20 DUP2 LT ISZERO PUSH1 0x67 JUMPI PUSH1 0x0 DUP1 REVERT JUMPDEST DUP2 ADD SWAP1 DUP1 DUP1 CALLDATALOAD SWAP1 PUSH1 0x20 ADD SWAP1 SWAP3 SWAP2 SWAP1 POP POP POP PUSH1 0x87 JUMP JUMPDEST STOP JUMPDEST PUSH1 0x0 DUP1 SLOAD SWAP1 POP SWAP1 JUMP JUMPDEST DUP1 PUSH1 0x0 DUP2 SWAP1 SSTORE POP POP JUMP INVALID LOG2 PUSH5 0x6970667358 0x22 SLT KECCAK256 DUP13 PUSH7 0x1368BFFE1FF61A 0x29 0x4C CALLER 0x1F 0x5C DUP8 PUSH18 0xA3F10C9539C716CF2DF6E04FC192E3906473 PUSH16 0x6C634300060600330000000000000000

As explained in previous post,These opcode instructions are the basic operations of EVM’s,stack architecture. These define a simple storage contract and all the functionality it contains. This contract is an example of a Solidity contract. Remix IDE (Note that the machine code above is an example of storage.sol. after it has already been deployed(not the output of the Solidity compiler, which includes additional “bootstrap” opcodes). If you unfocus your eyes and imagine a physics stack machine jerking around with step-by-step calculations on opcode cards, you’ll see that in the blur of the moving stack you’ll see most of the outlines of the functions placed in the Solidity contract. You can see it.

Whenever a contract receives a message call, this code runs within every Ethereum node to validate a new block on the network. Currently, sending a valid transaction on Ethereum requires a complete copy of the contract’s bytecode. Because running that code from start to finish is the only way to get the (deterministic) output state and associated hash.

Recall that stateless Ethereum aims to change this requirement.Let’s say all you want to do is call a function get() Nothing more. The logic that describes that function is only a subset of the overall contract, and in this case, EVM only actually needs two of them. basic block Number of opcode instructions to return the required value:

PUSH1 0x0 DUP1 SLOAD SWAP1 POP SWAP1 JUMP,

JUMPDEST PUSH1 0x40 MLOAD DUP1 DUP3 DUP2 MSTORE PUSH1 0x20 ADD SWAP2 POP POP PUSH1 0x40 MLOAD DUP1 SWAP2 SUB SWAP1 RETURN

In the stateless paradigm, just as the witness provides the missing hash of the untouched state, the witness must also provide the missing hash of the unexecuted portion of the machine code. This allows stateless clients to only need the part of the contract that is running. .

witness to the norm

Ethereum smart contracts exist co-located with externally owned accounts, i.e. as leaf nodes of a giant single-rooted state trie. In many ways, contracts are no different than externally owned accounts used by humans. They have addresses, can send transactions, and can hold balances in Ether and other tokens. However, contract accounts are special because they must contain your own program logic (code) or a hash of it.Another related one called the Markle-Patricia try. storage try Maintains variables or persistent state that active contracts use to perform their duties during execution.

This witness visualization clearly shows how important code markization is in reducing witness size. See that huge blob of colored squares? How much bigger is it than all the other elements in the try? This corresponds to his one full serving of smart contract bytecode.

Next to it and a little below it is the permanent state part. storage try, ERC20 balance mapping and ERC721 digital item ownership manifest. Since this is an example of a witness and not a complete state snapshot, these too are mostly created with intermediate hashes, containing only the changes necessary for a stateless client to prove the next block.

Merkleization of code is aimed at breaking up large chunks of code and replacing fields. code hash Within an Ethereum account with a separate Merkle Try root, the aptly named code try.

worth the hash weight

Let’s take an example from This Ethereum Engineering Group videoAnalyze several methods of code chunking using . ERC20 token contract. Many of the tokens you’ve heard of are made according to the ERC-20 standard, so this is a great real-world context for understanding code Merkleization.

Bytecodes are long and unwieldy, so let’s use a simple shorthand that replaces the 4-byte code (8 hexadecimal characters) with one of the following: . or X letter. The latter represents the bytecode required to execute a particular function (in this example, ERC20.transfer() functions are used throughout).

For the ERC20 example, transfer() This function uses just under half of the entire smart contract.

XXX.XXXXXXXXXXXXXXXXXX..........................................
.....................XXXXXX.....................................
............XXXXXXXXXXXX........................................
........................XXX.................................XX..
......................................................XXXXXXXXXX
XXXXXXXXXXXXXXXXXX...............XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX..................................
.......................................................XXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXX..................................X
XXXXXXXX........................................................
....

If you want to break that code into 64-byte chunks, you only need 19 of the 41 chunks to run your stateless code. transfer() The remaining necessary data will be obtained from the witness.

|XXX.XXXXXXXXXXXX|XXXXXX..........|................|................
|................|.....XXXXXX.....|................|................
|............XXXX|XXXXXXXX........|................|................
|................|........XXX.....|................|............XX..
|................|................|................|......XXXXXXXXXX
|XXXXXXXXXXXXXXXX|XX..............|.XXXXXXXXXXXXXXX|XXXXXXXXXXXXXXXX
|XXXXXXXXXXXXXXXX|XXXXXXXXXXXXXX..|................|................
|................|................|................|.......XXXXXXXXX
|XXXXXXXXXXXXXXXX|XXXXXXXXXXXXX...|................|...............X
|XXXXXXXX........|................|................|................
|....

Compare this to 31 of 81 chunks in the 32-byte chunking scheme.

|XXX.XXXX|XXXXXXXX|XXXXXX..|........|........|........|........|........
|........|........|.....XXX|XXX.....|........|........|........|........
|........|....XXXX|XXXXXXXX|........|........|........|........|........
|........|........|........|XXX.....|........|........|........|....XX..
|........|........|........|........|........|........|......XX|XXXXXXXX
|XXXXXXXX|XXXXXXXX|XX......|........|.XXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX
|XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXX..|........|........|........|........
|........|........|........|........|........|........|.......X|XXXXXXXX
|XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXX...|........|........|........|.......X
|XXXXXXXX|........|........|........|........|........|........|........
|....

On the surface, it appears that small chunks are more efficient than large chunks. almost empty Chunking is less frequent. However, you must remember here that unused code also has a cost. Each unexecuted chunk of code is replaced with the next hash. fixed size. The smaller the code chunk, the more hashes of unused code there are, and each of those hashes can be up to 32 bytes (or down to 8 bytes). At this point you might be crying, “Wait a minute! If the hash of a code chunk is a standard size 32 bytes, what happens if I replace the 32 byte code with a 32 byte hash?!”

The contract code is Markledmeans all hashes are linked by . code try — The root hash that the block needs to be verified against. In that structure, series Unexecuted chunks require only one hash, regardless of the number. This means that you can use one hash instead of a potentially large limb full of sequential chunk hashes on a Merkleized code try, unless you have the hashes needed to run your code.

Additional data needs to be collected

The conclusion we arrive at is a bit anticlimactic. There is no theoretically “best” scheme for Merkleizing code.Design choices such as fixed size of code chunks and hashes Relies on data collected about the “real world”. Since the Merkleization of all smart contracts is different, it is the researcher’s burden to choose the format that provides the greatest efficiency gains in observed mainnet activity. What exactly does that mean?

overhead

One thing that may indicate how efficient the code Merkleization scheme is Marklization overheadThis answers the question, “How much additional information does this witness contain beyond the executed code?”

already have some promising resultscollected using dedicated tools Developed by Horacio Mijail from the TeamX research team at Consensys, it has been shown to have a low overhead of 25%, which isn’t bad at all.

In other words, this data shows that overall smaller chunk sizes are more efficient than larger chunk sizes, especially when small hashes (8 bytes) are used. However, these early numbers are by no means comprehensive, as they represent only about 100 recent blocks. If you are reading this and are interested in collecting more substantive code merkleization data and contributing to the Stateless Ethereum initiative, please contact us on the ethresear.ch forum or in the #code-merkleization channel on the Eth1x/2 Research Discord. Please introduce.

As always, if you have any questions, feedback, or requests related to “The 1.X Files” and Stateless Ethereum, please contact us via DM or @gichiba on Twitter.

Related Article

0 Comments

Leave a Comment