The EVM From Scratch for Beginners
A vending machine, a thousand identical vending machines, and you
Imagine you walk up to a vending machine, put in a coin, press B4, and a bag of chips drops. The result will be a bag of chips because the machine is mechanical and deterministic. The same coin plus the same button always gives the same snack.
Now imagine a thousand copies of that vending machine, spread across the country, and you want to be sure that pressing B4 on every one of them at exactly the same moment gives exactly the same result. That part is easy. The hard part is the upgrade.
What if a programmer, somewhere in the world, can drop a new program into the machine, and that program takes your coin, runs some logic, and decides what to spit out? The new program might be a token swap, a vote tally, a betting market, or a tiny auction house. Whatever the program does, you need every copy of the machine to give the same answer. If even one disagrees, the system is broken. There is no cashier to call. There is no manager. The whole system stands or falls on agreement.
That, in one sentence, is the problem the EVM was built to solve.
The EVM, the Ethereum Virtual Machine, is the engine that lets thousands of independent computers all run the same program with the same inputs and all agree on the result, even though no one trusts anyone. This guide will walk you, from absolute first principles, through what the EVM is, how it works, why it works that way, and what to learn next.
We are not going to start with code. We are going to start with a problem, then peel the EVM open layer by layer.
The problem: how do strangers agree on a computation?
Before Ethereum, the most famous blockchain was Bitcoin. Bitcoin does one thing well. It tracks who owns which coins, using a clever data structure called a UTXO set. It has a small scripting language for spending conditions, but it cannot run arbitrary programs.
Ethereum, launched on July 30, 2015, took the next step. It made the blockchain a general-purpose computer. Anyone can publish a program, anyone can call it, and the network will run it for you.
But here is the catch. If you ask a thousand strangers' computers to run your program, you have to make sure:
- They all run the exact same code. Not "roughly the same". Exactly the same.
- They all start from the exact same state. Otherwise, even identical code can give different outputs.
- The program is bounded. If a program runs forever, it could lock up the network. Nobody wants that.
- The system is fair. No one can cheat by lying about what time it is, by using random numbers only they can see, or by reading from a server only they own.
The EVM is a narrow, opinionated solution to all four of those constraints at once.
It is a virtual machine in the same sense as the Java Virtual Machine or a CPU emulator. It is a small, simulated computer with its own instruction set, its own memory model, and its own definition of what counts as a "step". Ethereum invented its own computer from scratch because the alternative, letting nodes run native code on their own CPUs, would have made the "exactly the same code" constraint nearly impossible to enforce. Native code can do anything. It can open a file, read a sensor, contact a server, leak data. The EVM forbids all of that by design.
The result is a computer that is, frankly, kind of dumb. It cannot talk to the internet. It cannot draw a picture. It cannot make a sound. It can only shuffle 256-bit numbers around, read and write to its own private memory, and update a small key-value database. That, it turns out, is exactly enough.
The core idea: the EVM is a function
Now that you have a feel for the problem, here is the entire EVM in one picture. If you remember nothing else from this guide, remember this:
The EVM is a function. You hand it the current world state and a signed transaction. It hands you back a new world state. Every node on Earth runs the exact same function with the exact same inputs. Every node gets the exact same output. That is the whole trick.

Let's unpack the picture.
- World State (left, orange): Imagine a giant spreadsheet that lists every account in the world. Each account's balance, its code if any, and the contents of its storage. That is the world state.
- Transaction (left, orange): A signed message from a user. It says, in effect, "I, account 0xABC, want to do something. I authorize spending up to X amount of fees. Here is the data."
- EVM (center, blue): The function itself. It takes the world state and the transaction, runs a deterministic computation, and produces a result. It is also gas-metered, which we will get to in a moment.
- New World State (right, green): The output. A new version of the spreadsheet, with some accounts' balances changed, some storage slots updated, maybe a new contract created, plus any logs emitted along the way. Every honest node computes the same new world state, and a block whose claimed result does not match is rejected.
- The dashed feedback loop (bottom): The output becomes the next input. The new world state is the world state for the next transaction. That is what makes it a machine, not a one-shot computation.
The word deterministic is doing a lot of work in that sentence. It means: given the same inputs, the EVM always produces the same outputs. There is no randomness, no clock, no environment variable, no network call. If you ran the EVM a million times with the same inputs, you would get a million identical results.
This is also why every node can verify everything. They do not have to trust each other. They just run the function themselves and check whether their answer matches.
If you take nothing else away from this section, take the loop. Inputs on the left, function in the middle, output on the right, and the output becoming the next input. That is the heartbeat of the entire Ethereum network.
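To make the loop concrete, here is a minimal Python sketch of a state transition function. It assumes a toy world state that is just a map from hypothetical account names to balances, and a transaction that only moves value. Real Ethereum state and transactions carry far more. What matters is the shape: a pure function with no hidden inputs, where each output is fed back in as the next input.

```python
from dataclasses import dataclass

# Toy world state (an assumption for illustration): account name -> balance in wei.
State = dict[str, int]


@dataclass(frozen=True)
class Tx:
    sender: str
    to: str
    value: int


def apply_tx(state: State, tx: Tx) -> State:
    """Pure state transition: reads only its arguments and never mutates them."""
    if state.get(tx.sender, 0) < tx.value:
        return dict(state)  # in this toy, an unaffordable transfer changes nothing
    new_state = dict(state)
    new_state[tx.sender] = new_state.get(tx.sender, 0) - tx.value
    new_state[tx.to] = new_state.get(tx.to, 0) + tx.value
    return new_state


def run_block(state: State, txs: list[Tx]) -> State:
    for tx in txs:
        state = apply_tx(state, tx)  # each output becomes the next input
    return state


genesis: State = {"alice": 100, "bob": 5}
txs = [Tx("alice", "bob", 30), Tx("bob", "carol", 50), Tx("bob", "carol", 10)]

# Three independent "nodes" run the same function on the same inputs.
results = [run_block(genesis, txs) for _ in range(3)]
print(results[0])  # {'alice': 70, 'bob': 25, 'carol': 10}
```

Every "node" gets the same dictionary, and nobody has to trust anybody's result, because anyone can rerun `run_block` and compare.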
But what kind of function?
The EVM is a particular kind of function, and that particularity is what makes it both powerful and a little weird to work with.
The EVM is a stack machine
A computer has to keep track of numbers it is working with. There are two common ways to do this.
- Register machines: the kind of CPU inside your laptop. The CPU has a small number of named "registers", like eax, rbx, and so on, and operations move values between them.
- Stack machines: the EVM's choice. Instead of named registers, the EVM has a stack. A stack is a tall, narrow column of values. You push a value onto the top, an opcode (short for "operation code") pops a few values off the top, does something, and pushes the result back on top.
This is a bit like working with a stack of plates. You can only see and touch the top plate. To get to a plate lower in the stack, you have to remove the ones above it. Most programs end up pushing and popping dozens of values in a row.
Why does this matter to you as a beginner? Two reasons. First, the EVM's stack tops out at 1024 items. If a program tries to push a 1025th item, execution halts with a "stack overflow" error. Second, Solidity, the most popular smart contract language, has a quirk called "stack too deep". The EVM's DUP and SWAP instructions can only reach about 16 items down the stack. So when a function needs more than about 16 local variables, arguments, and return values in scope at once, some of them end up out of reach, and the compiler reports an error. This is a real limitation, and it is a direct consequence of the stack design.
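Here is a minimal Python sketch of the stack rules just described: 256-bit items, a hard 1,024-item limit, and an ADD that pops two operands and pushes one result. The class and exception names are made up for illustration. A real client also charges gas, and it aborts the whole call frame on these errors.

```python
class StackOverflow(Exception):
    pass


class StackUnderflow(Exception):
    pass


class EvmStack:
    LIMIT = 1024   # maximum number of items on the EVM stack
    WORD = 2**256  # every item is a 256-bit word

    def __init__(self) -> None:
        self.items: list[int] = []  # the end of the list is the top of the stack

    def push(self, value: int) -> None:
        if len(self.items) >= self.LIMIT:
            raise StackOverflow("push onto a full 1024-item stack")
        self.items.append(value % self.WORD)

    def pop(self) -> int:
        if not self.items:
            raise StackUnderflow("pop from an empty stack")
        return self.items.pop()

    def add(self) -> None:
        # ADD: pop two operands, push their sum (wrapping at 2**256)
        if len(self.items) < 2:
            raise StackUnderflow("ADD needs two items")
        a, b = self.pop(), self.pop()
        self.push(a + b)


s = EvmStack()
s.push(2)
s.push(1)
s.add()
print(s.items)  # [3]
```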
The EVM is a 256-bit machine
Every value on the stack, every value in storage, every word in memory is 256 bits, or 32 bytes. That is huge. Most programming languages use 32 or 64 bits per number.
Why 256 bits? Two reasons. First, Ethereum's cryptography works in 256-bit quantities (Keccak-256 hashes, secp256k1 private keys and curve coordinates), and its 160-bit addresses fit comfortably inside a word, so the EVM is sized to match. Second, when you are running financial code, having lots of headroom in your integers makes overflow much less likely. If a 64-bit integer is enough to count the grains of sand on Earth, a 256-bit integer is enough to count the atoms in a galaxy.
The cost is that even simple arithmetic works on full 256-bit words, which is more than strictly necessary. In a system already paying for cryptographic operations, the overhead is small.
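Headroom is not the same as safety. At the EVM level, ADD and SUB wrap around silently modulo $2^{256}$. Signed opcodes such as SLT and SDIV read the same bits as two's-complement numbers. Solidity 0.8 and later adds its own overflow checks on top, which revert instead of wrapping. Here is a small Python sketch of the raw EVM behavior. The function names are illustrative.

```python
WORD = 2**256
MAX_UINT256 = WORD - 1


def evm_add(a: int, b: int) -> int:
    return (a + b) % WORD  # ADD wraps instead of failing


def evm_sub(a: int, b: int) -> int:
    return (a - b) % WORD  # SUB wraps too: 0 - 1 becomes MAX_UINT256


def as_signed(x: int) -> int:
    """Two's-complement reading of a word, as used by SLT, SGT, SDIV, and SMOD."""
    return x - WORD if x >= 2**255 else x


print(evm_add(MAX_UINT256, 1))   # 0
print(as_signed(evm_sub(0, 1)))  # -1
```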
The EVM has four places for data, plus one new one
During a single call, the EVM has four kinds of data location, plus a fifth that arrived in the Cancun upgrade in March 2024. The first three are short-lived. They reset when the call ends. The fourth is permanent.
- Stack: the working scratchpad. We already met this. 256-bit words, max 1024 items.
- Memory: a flat, byte-addressable scratch space. Think of it as a giant array of bytes that starts empty at the beginning of every call. You read it one 32-byte word at a time (MLOAD) and write either a 32-byte word (MSTORE) or a single byte (MSTORE8). It is volatile. When the call ends, it is gone.
- Calldata: the read-only buffer that holds the call's arguments. When a user calls a contract, the bytes they sent become the contract's calldata. Reading large inputs directly from calldata is cheaper than copying them into memory first.
- Storage: the contract's persistent key-value database. A contract can have up to $2^{256}$ slots, each 256 bits wide. Once a transaction succeeds, whatever it wrote to storage stays there until a later transaction overwrites it (or, in rare cases, the contract is deleted). If the call reverts or runs out of gas, its writes are undone. This is the only data location that survives after the transaction ends.
- Transient storage (since the Cancun hard fork in March 2024): a cheap, in-transaction alternative to storage. It behaves almost like regular storage, but it is wiped at the end of the transaction. It is perfect for reentrancy locks, intermediate values, and "save and restore" patterns that used to require expensive storage writes.
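These lifetimes are easiest to see side by side. This hypothetical Python model runs a transaction as a series of calls into one contract. Memory is created fresh for each call, transient storage lives until the transaction ends, and storage outlives the transaction. The model deliberately ignores reverts and gas.

```python
class ToyContract:
    def __init__(self) -> None:
        self.storage: dict[str, int] = {}    # survives across transactions
        self.transient: dict[str, int] = {}  # survives across calls, wiped per transaction

    def call(self, value: int) -> None:
        memory = bytearray(32)               # fresh, zeroed memory for every call
        memory[:] = value.to_bytes(32, "big")
        self.transient["calls"] = self.transient.get("calls", 0) + 1
        self.storage["last"] = int.from_bytes(memory, "big")
        # memory is discarded when this call returns

    def run_transaction(self, values: list[int]) -> int:
        for v in values:
            self.call(v)
        calls_this_tx = self.transient.get("calls", 0)
        self.transient.clear()               # end of transaction: transient storage is wiped
        return calls_this_tx


c = ToyContract()
print(c.run_transaction([1, 2, 3]), c.storage)  # 3 {'last': 3}
print(c.run_transaction([9]), c.storage)        # 1 {'last': 9}
```

The second transaction counts one call, not four, because the counter lived in transient storage. The value in storage, however, carried over.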
The EVM is "quasi-Turing-complete"
In computer science, Turing-complete is a name for a very specific idea. Back in 1936, a mathematician named Alan Turing described a hypothetical machine, now called a Turing machine, that could read symbols off an infinitely long tape, follow a small list of rules, and write new symbols back. It was deliberately simple. The remarkable claim was this. Given enough time and tape, that machine could compute anything any computer ever built can compute. There is no problem a modern laptop can solve that a Turing machine cannot also solve.
A programming language or virtual machine is called Turing-complete when it is powerful enough to simulate a Turing machine. Almost every general-purpose language you have ever heard of is Turing-complete. Python, JavaScript, C, Rust, Solidity, all of them. So is the EVM, on paper. You can write loops, conditionals, recursion, function calls. Everything you need to do general computation.
So why "quasi"? Because Turing's original model assumed an infinite amount of time and an infinite amount of memory. Real computers do not have those things, and the EVM is even more constrained. Every single instruction the EVM executes costs gas, and every transaction starts with a finite gas budget. If a program runs out of gas mid-execution, the EVM halts with an "out of gas" error and rolls back the transaction. So while the EVM is theoretically capable of computing anything, in practice every program is bounded by how much gas the user was willing to pay for.
That is the entire meaning of "quasi". A language that is theoretically as powerful as any computer, but with a hard, paid-for stop button attached. This is the most important design choice in the entire EVM. It is the reason the network cannot be killed by an infinite loop. It is also why you, as a developer, will spend so much time thinking about gas costs.
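Here is a small Python sketch of that stop button. It models the shortest EVM infinite loop, the bytecode JUMPDEST, PUSH1 0x00, JUMP (5B 60 00 56), which costs 1 + 3 + 8 = 12 gas per lap. The function and exception names are illustrative. The sketch also simplifies by charging per lap rather than per instruction.

```python
from typing import NoReturn

GAS_PER_ITERATION = 1 + 3 + 8  # JUMPDEST + PUSH1 + JUMP


class OutOfGas(Exception):
    pass


def run_infinite_loop(gas_limit: int) -> NoReturn:
    gas, iterations = gas_limit, 0
    while True:                   # the program never decides to stop...
        if gas < GAS_PER_ITERATION:
            raise OutOfGas(f"halted after {iterations} iterations")
        gas -= GAS_PER_ITERATION  # ...but every lap is paid for
        iterations += 1


try:
    run_infinite_loop(1_200)
except OutOfGas as err:
    print(err)  # halted after 100 iterations
```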
How it all fits together: the EVM as a small interpreter
Now let's tie the pieces together. Imagine you are writing a simple smart contract in Solidity that adds two numbers. The Solidity compiler turns that into EVM bytecode, which is a list of single-byte instructions. The interpreter inside every Ethereum node walks that list, one instruction at a time.
The basic loop looks like this:
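while True:
    op = code[pc] if pc < len(code) else STOP    # read the next instruction; running off the end acts like STOP
    cost = gas_cost(op, stack, memory, state)    # how much gas does this op charge?
    if gas_remaining < cost:
        raise OutOfGas                           # bail out; this frame's changes are discarded
    gas_remaining -= cost
    pc = execute(op, pc, stack, memory, state)   # run it; returns the next pc (jumps set it directly)
    refund_counter += gas_refund(op, state)      # e.g. clearing a slot with SSTORE; settled at the end of the transaction
    if halts(op):                                # STOP, RETURN, REVERT, SELFDESTRUCT
        break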
That is it. Every node on Earth runs this loop. Same code, same inputs, same gas rules, same result.
A few things to notice:
- The program counter (PC) is just an integer that points at the current instruction.
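- Jumps are how the EVM does loops and conditionals. JUMP sets the PC to a new value, and JUMPI does so only if a condition is non-zero. Both may only land on a JUMPDEST byte, which marks a valid jump target.
- Halting can happen because of STOP (success, no data), RETURN (success, returns a slice of memory), REVERT (failure, returns data, undoes the call frame's changes, and hands back the unused gas), SELFDESTRUCT (sends the contract's balance away; since Cancun it deletes the contract only if it was created in the same transaction), INVALID (an explicit "this should never run" opcode), or running out of gas.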
Let's look at what a tiny piece of Solidity compiles to. Suppose you have:
uint x = 1 + 2;
This compiles to something like:
PUSH1 0x02 // push 2 onto the stack
PUSH1 0x01 // push 1 onto the stack
ADD // pop 1 and 2, push 1+2 = 3
POP // discard the result (we don't use it)
Each line is one instruction, and each instruction starts with a one-byte opcode. PUSH1 is followed by one byte of data, so PUSH1 0x02 takes up two bytes of code. The EVM's interpreter sees PUSH1, takes the next byte (0x02), and pushes the number 2 onto the stack. Then it sees another PUSH1, pushes 1. Then it sees ADD, pops 1 and 2 off the top, and pushes 3. Then it sees POP, discards the 3, and moves on.
To put a value into storage, you do:
PUSH1 0x2A // 42
PUSH1 0x00 // slot 0
SSTORE // write 42 to slot 0
To read it back:
PUSH1 0x00 // slot 0
SLOAD // push the value at slot 0 onto the stack
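To see the loop and the bytecode together, here is a toy Python interpreter that runs exactly the snippets above. It is a sketch, not a faithful EVM. It supports only STOP, ADD, POP, SLOAD, SSTORE, and PUSH1. It uses flat gas costs, whereas real SLOAD and SSTORE prices depend on warm/cold access and on the slot's values. Storage is a plain dict.

```python
STOP, ADD, POP, SLOAD, SSTORE, PUSH1 = 0x00, 0x01, 0x50, 0x54, 0x55, 0x60
WORD = 2**256

# Simplified flat costs: SLOAD as a cold read, SSTORE as a zero-to-non-zero write.
GAS = {STOP: 0, ADD: 3, POP: 2, SLOAD: 2_100, SSTORE: 20_000, PUSH1: 3}
POPS = {STOP: 0, ADD: 2, POP: 1, SLOAD: 1, SSTORE: 2, PUSH1: 0}  # items each op consumes


class OutOfGas(Exception):
    pass


class StackUnderflow(Exception):
    pass


def run(code: bytes, storage: dict[int, int], gas: int) -> tuple[list[int], int]:
    """Execute code; return (final stack, gas left). Storage is updated in place."""
    stack: list[int] = []
    pc = 0
    while True:
        op = code[pc] if pc < len(code) else STOP  # running off the end acts like STOP
        if op not in GAS:
            raise NotImplementedError(f"opcode 0x{op:02x} is not in this toy")
        if gas < GAS[op]:
            raise OutOfGas()
        gas -= GAS[op]
        if len(stack) < POPS[op]:
            raise StackUnderflow()
        if op == STOP:
            return stack, gas
        if op == PUSH1:
            # The immediate byte follows the opcode; a missing byte reads as zero.
            stack.append(code[pc + 1] if pc + 1 < len(code) else 0)
            pc += 2
            continue
        if op == ADD:
            a, b = stack.pop(), stack.pop()
            stack.append((a + b) % WORD)
        elif op == POP:
            stack.pop()
        elif op == SLOAD:
            key = stack.pop()
            stack.append(storage.get(key, 0))  # unset slots read as zero
        elif op == SSTORE:
            key, value = stack.pop(), stack.pop()  # the key is on top
            storage[key] = value
        pc += 1


print(run(bytes([PUSH1, 0x02, PUSH1, 0x01, ADD, POP]), {}, gas=100))  # ([], 89)

slots: dict[int, int] = {}
run(bytes([PUSH1, 0x2A, PUSH1, 0x00, SSTORE]), slots, gas=30_000)
print(slots)                                               # {0: 42}
print(run(bytes([PUSH1, 0x00, SLOAD]), slots, gas=3_000))  # ([42], 897)
```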
The EVM is a small stack machine. It walks a list of single-byte instructions. There are five places to put data, and a gas meter that ticks down with every step. That is the whole mental model.
The state: where the EVM's memory lives
If the EVM is the function, the state is its input and output. The state is what the function operates on. So we should know what it looks like.
Ethereum uses an account model, not the UTXO model that Bitcoin uses. In a UTXO model, your balance is the sum of unspent transaction outputs scattered across the ledger. In an account model, your account is a single object with a balance. Much simpler.
There are two kinds of accounts.
- Externally Owned Accounts (EOAs): controlled by a private key. They have a balance, a nonce (a counter that prevents replay attacks), and no code, no storage. EOAs are the only accounts that can start a transaction. Your MetaMask wallet is an EOA.
- Contract accounts: controlled by code. They have a balance, a nonce, code (their bytecode), and storage (a key-value database). They cannot start a transaction on their own. They only react when something else calls them.
Every account lives as a leaf in a giant, deterministic data structure called a Merkle Patricia Trie, the "world state trie". The root hash of this trie is committed into every block header. That single 32-byte root hash is a cryptographic fingerprint of the entire state. If even one account's balance changes by one wei, the root hash changes, and everyone notices.
A friendlier way to think about the Merkle Patricia Trie
If the name sounds intimidating, here is the small version of the idea.
Imagine a folder on your computer that holds 1,000 documents, and you want to prove to someone that a specific document in that folder has specific content, without showing them the other 999. The naive answer is to send them all 1,000 documents. That is wasteful. The clever answer is to put a fingerprint on every document, then a fingerprint on every pair of fingerprints, then a fingerprint on every pair of those, and so on, until you end up with a single fingerprint at the top. That single fingerprint is the Merkle root. If even one byte of one document changes, every fingerprint above it changes, and the root changes.
That is the essence of a Merkle tree. A pyramid of hashes that lets you prove a leaf's contents using only the few sibling hashes on the path up to the root, instead of the whole tree.
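Here is a minimal Python sketch of that idea: a binary Merkle tree, a proof for one leaf, and the check that verifies it. It uses SHA-256 from the standard library as a stand-in for Keccak-256, and it pads odd levels by duplicating the last hash. Both are simplifications. Ethereum's actual state trie is the hexary Merkle Patricia Trie described next, with its own node encoding.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def next_level(level: list[bytes]) -> list[bytes]:
    if len(level) % 2:
        level = level + [level[-1]]  # odd count: pair the last hash with itself
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves: list[bytes]) -> bytes:
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        level = next_level(level)
    return level[0]


def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the bool says whether the sibling is on the left."""
    proof = []
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 else level
        sibling = index ^ 1
        proof.append((padded[sibling], sibling < index))
        level = next_level(level)
        index //= 2
    return proof


def verify(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


docs = [f"document {i}".encode() for i in range(1000)]
root = merkle_root(docs)
proof = merkle_proof(docs, 123)
print(len(proof), verify(docs[123], proof, root))    # 10 True
print(verify(b"document 123, edited", proof, root))  # False
```

Ten sibling hashes are enough to prove one document out of a thousand. Changing the document breaks the proof.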
A Patricia Trie is a cleverer version of a regular trie, the data structure behind word-completion in your phone's keyboard. Instead of one node per character, the trie squashes long runs of single-child paths into a single edge. That keeps the trie compact, so lookups and proofs pass through fewer nodes. Only the 32-byte root hash, not the trie itself, goes into the block header.
A Merkle Patricia Trie is just those two ideas combined. A space-efficient trie where every node is also part of a Merkle tree, so the whole thing has a single 32-byte root hash that uniquely identifies the entire state. To prove "account 0xABC has balance 42", you only need to show the sibling hashes along the path from 0xABC to the root, plus a tiny bit of structure. The rest of the state is hidden.
That is the magic of Merkle trees. The state root is like a checksum of the entire world's bank ledger, but it is so good that you can also use it to prove that a specific account has a specific balance, with a short list of sibling hashes, without revealing any of the other accounts.
Each account in the world state has these fields:
- nonce: the transaction counter for EOAs, or the contracts-created counter for contract accounts.
- balance: how many wei the account holds. (Wei is the smallest unit of ETH. One ETH equals $10^{18}$ wei.)
- storageRoot: the root hash of the account's own storage trie.
- codeHash: the hash of the account's bytecode, or the hash of empty code for EOAs.
For our purposes right now, the big takeaway is this. The EVM does not own its own state. The state is a giant data structure that the function reads and writes. The EVM's job is to take that data structure plus a transaction, and produce a new version of that data structure.
Transactions: the inputs to the function
A transaction is the message you send to the network. It is a small bundle of fields signed with your private key. Once signed, anyone in the world can verify that you wrote it.
There are several transaction formats floating around Ethereum today, but the most important to know is EIP-1559, the type-2 transaction that has been the default since the London hard fork went live on August 5, 2021.
A 1559 transaction has:
- nonce: your EOA's current counter. It prevents the same signed transaction from being executed twice.
- to: the destination address, or null if you are deploying a new contract.
- value: how many wei to send.
- data (or input): opaque bytes that become the contract's calldata.
- gasLimit: the maximum gas you authorize for this call.
- maxFeePerGas and maxPriorityFeePerGas: the upper bounds for what you are willing to pay per unit of gas.
- The signature itself.
Before any EVM code runs, the transaction itself consumes a baseline amount of gas. This is the intrinsic gas:
- 21,000 gas base cost for every transaction. A simple ETH transfer between EOAs pays exactly this.
- 4 gas per zero byte and 16 gas per non-zero byte of calldata.
- 32,000 gas extra if it is a contract creation, plus 2 gas per 32-byte chunk of initcode (EIP-3860, since Shanghai).
These charges exist so that sending an empty transaction is always cheap but never free, and so that bandwidth costs are paid by the sender.
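The three rules above can be added up with a few lines of Python. This is a sketch: it ignores access-list charges and any calldata repricing introduced by later forks, and the function name is made up.

```python
TX_BASE = 21_000
ZERO_BYTE, NONZERO_BYTE = 4, 16
CREATE_EXTRA = 32_000
INITCODE_WORD = 2  # EIP-3860, per 32-byte chunk of initcode


def intrinsic_gas(data: bytes, is_create: bool) -> int:
    zeros = data.count(0)
    gas = TX_BASE + ZERO_BYTE * zeros + NONZERO_BYTE * (len(data) - zeros)
    if is_create:
        chunks = (len(data) + 31) // 32  # round up to whole 32-byte chunks
        gas += CREATE_EXTRA + INITCODE_WORD * chunks
    return gas


print(intrinsic_gas(b"", is_create=False))                  # 21000
print(intrinsic_gas(bytes([0, 0, 1, 2]), is_create=False))  # 21040
print(intrinsic_gas(bytes([1]) * 64, is_create=True))       # 54028
```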
The flow looks like this:
- The transaction arrives at a node.
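- The node verifies the signature, recovers the sender's address, and checks that the transaction's nonce matches the sender's current nonce.
- The node checks that the sender can afford the worst case (gasLimit × maxFeePerGas, plus value), deducts gasLimit × the effective gas price up front, and increments the sender's nonce.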
- The EVM runs the transaction.
- The node refunds any unused gas, pays the priority fee to the block proposer, and burns the base fee.
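If execution reverts or runs out of gas, its state changes are rolled back, but the transaction still lands in the block. The sender's nonce still increases, and the gas consumed is still paid. Running out of gas (or any other exceptional halt) consumes the whole gas limit, while a REVERT hands back the gas it did not use. This is the deterrent against spam. Even a failed transaction has to pay.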
Gas: the heartbeat of the EVM
We have been mentioning gas for a while. Let us give it the attention it deserves.
Gas is the unit that prices every operation the EVM performs. It is the reason the network cannot be killed by an infinite loop. Every step costs something, and the user has to put a finite amount of money in the gas tank.
Gas does several jobs at once. It pays for CPU time. It pays for memory and storage. It pays for bandwidth. It caps the maximum amount of work a transaction can do. It makes denial-of-service attacks expensive.
Every opcode has a gas cost. A few representative ones:
| Opcode | What it does | Gas cost |
|---|---|---|
| ADD | add two numbers | 3 |
| MUL | multiply two numbers | 5 |
| KECCAK256 | hash a region of memory | 30 plus 6 per 32-byte word of input |
| SLOAD | read a storage slot, warm | 100 |
| SLOAD | read a storage slot, cold | 2,100 |
| SSTORE | write a storage slot, zero to non-zero | 20,000 (plus 2,100 if the slot is cold) |
| LOG0 | emit a log with no topics | 375 plus 8 per byte of data |
| CALL | call another contract | 100 if the target is warm or 2,600 if cold, plus 9,000 when sending ETH and 25,000 more if that creates a new account |
Notice how storage is way more expensive than arithmetic. That is by design. Storage is the only place data can outlive a transaction, so it is the bottleneck for the entire network's growth.
The EIP-1559 fee market
Since the London hard fork on August 5, 2021, Ethereum uses the EIP-1559 fee market. Every block has a base fee, set by the protocol. If the previous block was more than 50% full, the base fee goes up. If it was less than 50% full, the base fee goes down.
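The update rule is simple enough to write out. The sketch below follows the arithmetic in the EIP-1559 specification: the gas target is half the gas limit, and the base fee moves by at most 1/8, or 12.5%, per block. The 30-million gas limit in the example is just an illustrative number.

```python
GWEI = 10**9
BASE_FEE_MAX_CHANGE_DENOMINATOR = 8  # the base fee moves by at most 1/8 (12.5%) per block
ELASTICITY_MULTIPLIER = 2            # the gas target is half the gas limit


def next_base_fee(parent_base_fee: int, parent_gas_used: int, parent_gas_limit: int) -> int:
    target = parent_gas_limit // ELASTICITY_MULTIPLIER
    if parent_gas_used == target:
        return parent_base_fee
    if parent_gas_used > target:
        delta = parent_base_fee * (parent_gas_used - target) // target // BASE_FEE_MAX_CHANGE_DENOMINATOR
        return parent_base_fee + max(delta, 1)  # a busy block always raises the fee a little
    delta = parent_base_fee * (target - parent_gas_used) // target // BASE_FEE_MAX_CHANGE_DENOMINATOR
    return parent_base_fee - delta


LIMIT = 30_000_000  # illustrative gas limit
print(next_base_fee(100 * GWEI, LIMIT, LIMIT) / GWEI)       # 112.5 (full block)
print(next_base_fee(100 * GWEI, LIMIT // 2, LIMIT) / GWEI)  # 100.0 (exactly on target)
print(next_base_fee(100 * GWEI, 0, LIMIT) / GWEI)           # 87.5 (empty block)
```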
You, as the user, set:
- maxFeePerGas: the absolute ceiling you will pay per unit of gas.
- maxPriorityFeePerGas: the most you will tip the block proposer per unit of gas.
The actual gas price you pay is min(maxFeePerGas, baseFee + maxPriorityFeePerGas). The base fee portion is burned, removing ETH from circulation. The priority fee goes to the block proposer.
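Here is the same arithmetic for a single transaction, written as a hypothetical helper that splits the fee into the burned part and the proposer's tip. Amounts are in wei. A transaction whose maxFeePerGas is below the current base fee cannot be included at all.

```python
GWEI = 10**9


def settle(gas_used: int, base_fee: int, max_fee: int, max_priority_fee: int) -> tuple[int, int, int]:
    """Return (effective gas price, wei burned, wei paid to the proposer)."""
    if max_fee < base_fee:
        raise ValueError("maxFeePerGas is below the base fee")
    price = min(max_fee, base_fee + max_priority_fee)
    burned = gas_used * base_fee         # the base fee portion is destroyed
    tip = gas_used * (price - base_fee)  # whatever is left over goes to the proposer
    return price, burned, tip


# A 21,000-gas transfer with a 20 gwei base fee and a 2 gwei tip: the tip is paid in full.
print(settle(21_000, 20 * GWEI, 50 * GWEI, 2 * GWEI))
# With a 49 gwei base fee, the 50 gwei ceiling binds and the tip shrinks to 1 gwei per gas.
print(settle(21_000, 49 * GWEI, 50 * GWEI, 2 * GWEI))
```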
This is why Ethereum transactions have a "suggested" gas price that fluctuates every block. You are not bidding for block space the old-fashioned way, like in a sealed-bid auction. You are paying the current base fee plus a small tip to the proposer.
Why burn the base fee at all?
This is one of the most-asked beginner questions, and the answer has three pieces.
The first is alignment. If block proposers, the validators who assemble each new block, kept the base fee, they could manipulate it cheaply. They could stuff their own blocks with their own transactions to push the base fee up, or strike side deals with users to be paid outside the protocol while the on-chain fee stays low. Burning the base fee removes that temptation. The proposer's only direct reward is the priority tip, which the proposer cannot inflate by playing games with the base fee.
The second is supply. Every time ETH is burned, a small amount of ETH permanently leaves circulation. If demand for blockspace is high, more ETH is burned than is issued to validators as a reward, and the net result is a deflationary effect on total supply. In a system that hands every fee to validators, fees merely move existing ETH from users to validators, so nothing offsets the new ETH issued as rewards.
The third is stability of the price signal. Because the base fee is set by a deterministic rule (it moves by at most 12.5% per block, depending on how far the previous block was from its target fullness), the fee that a transaction ends up paying is predictable in the long run. Users do not have to guess what the next block's proposers will accept. They look at the current base fee, set a max ceiling they are comfortable with, and the protocol does the rest. Burning is what makes that deterministic rule possible. It ties the price of blockspace to a network-wide supply rule rather than to whatever a single proposer is willing to charge.
So when you see "base fee burned" in a block explorer, that is the protocol enforcing three ideas at once. It aligns proposers with users, leans against inflation, and makes the price of blockspace predictable.
Access lists and warm/cold storage
After the Berlin hard fork, Ethereum tracks which addresses and storage slots have already been touched in a transaction. The first access is "cold" and costs more. Subsequent accesses are "warm" and cost less. This is EIP-2929.
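EIP-2930 lets you pre-declare a list of addresses and slots that you will touch. You pay a fixed up-front charge for each entry (2,400 gas per address, 1,900 gas per storage slot), and those entries start out warm, so each first access during execution is charged the warm price. Because 1,900 + 100 is less than the 2,100 cold price, this is a small but useful optimization.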
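Here is a minimal Python sketch of the warm/cold bookkeeping for storage reads. The class and the address "0xabc" are hypothetical. The constants are the EIP-2929 and EIP-2930 prices for storage slots. The sketch leaves out the equivalent rules for addresses.

```python
COLD_SLOAD_COST = 2_100
WARM_STORAGE_READ_COST = 100
ACCESS_LIST_STORAGE_KEY_COST = 1_900  # EIP-2930 up-front charge per listed slot


class AccessTracker:
    """Hypothetical per-transaction record of which (address, slot) pairs are warm."""

    def __init__(self, access_list: tuple[tuple[str, int], ...] = ()) -> None:
        self.upfront = ACCESS_LIST_STORAGE_KEY_COST * len(access_list)
        self.warm: set[tuple[str, int]] = set(access_list)  # listed slots start warm

    def sload_cost(self, address: str, slot: int) -> int:
        key = (address, slot)
        if key in self.warm:
            return WARM_STORAGE_READ_COST
        self.warm.add(key)  # the first touch warms the slot for the rest of the transaction
        return COLD_SLOAD_COST


plain = AccessTracker()
print(plain.sload_cost("0xabc", 0), plain.sload_cost("0xabc", 0))  # 2100 100

listed = AccessTracker((("0xabc", 0),))
print(listed.upfront + listed.sload_cost("0xabc", 0))  # 2000, versus 2100 cold
```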
Gas refunds
Some operations earn a gas refund, which is settled at the end of the transaction:
- Setting a storage slot back to zero (a "clear") earns a refund.
- SELFDESTRUCT used to earn a refund.
After EIP-3529, which shipped in the London hard fork, SELFDESTRUCT no longer refunds anything, the refund for clearing a slot shrank to 4,800 gas, and the total refund is capped at one fifth of the gas the transaction used (down from one half). Refunds still exist, but they are much harder to game. The reason for the change was simple. People were abusing the old refund rules to spam the network and bloat the state.
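The cap is one line of arithmetic. In this Python sketch (the names are hypothetical), the refund counter accumulated during execution is applied at the end. Under the EIP-3529 rule, it is limited to one fifth of the gas used.

```python
MAX_REFUND_QUOTIENT = 5  # EIP-3529; it was 2 before the London hard fork


def gas_after_refund(gas_used: int, refund_counter: int) -> int:
    refund = min(refund_counter, gas_used // MAX_REFUND_QUOTIENT)
    return gas_used - refund


print(gas_after_refund(100_000, 4_800))   # 95200: a small refund is paid in full
print(gas_after_refund(100_000, 48_000))  # 80000: a large refund hits the cap
```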
Opcodes: a tour of the EVM's vocabulary
The EVM has about 150 opcodes, which sounds like a lot but is small by computer standards. They fall into a few families.
Stop and arithmetic. STOP, ADD, SUB, MUL, DIV, SDIV, MOD, SMOD, ADDMOD, MULMOD, EXP, SIGNEXTEND.
Comparison and bitwise. LT, GT, SLT, SGT, EQ, ISZERO, AND, OR, XOR, NOT, BYTE, SHL, SHR, SAR.
Hashing. KECCAK256. That is the only hash opcode. Other primitives, such as SHA-256, RIPEMD-160, and ECDSA public-key recovery (ecrecover), are provided as precompiled contracts at fixed addresses rather than as opcodes.
Environmental information. ADDRESS, BALANCE, ORIGIN, CALLER, CALLVALUE, CALLDATALOAD, CALLDATASIZE, CALLDATACOPY, CODESIZE, CODECOPY, GASPRICE, EXTCODESIZE, EXTCODEHASH, EXTCODECOPY, RETURNDATASIZE, RETURNDATACOPY, SELFBALANCE.
Block information. BLOCKHASH, COINBASE, TIMESTAMP, NUMBER, PREVRANDAO (the opcode formerly called DIFFICULTY, repurposed at The Merge), GASLIMIT, CHAINID, BASEFEE, and BLOBHASH and BLOBBASEFEE (both Cancun).
Stack, memory, storage, flow. POP, MLOAD, MSTORE, MSTORE8, MCOPY (Cancun), SLOAD, SSTORE, TLOAD and TSTORE (Cancun), MSIZE, JUMP, JUMPI, JUMPDEST, PC, GAS.
System calls. CREATE, CALL, CALLCODE (deprecated), DELEGATECALL, CREATE2, STATICCALL, SELFDESTRUCT, RETURN, REVERT, INVALID.
Push. PUSH1 through PUSH32 push the next 1 to 32 immediate bytes onto the stack as a 256-bit word, and PUSH0 (Shanghai) pushes a zero while saving a byte of code. This is the only way data enters the stack directly from code.
Duplication and exchange. DUP1..DUP16 copy the n-th item from the top (the top itself is item 1) onto the top. SWAP1..SWAP16 exchange the top item with the (n+1)-th item, so SWAP1 swaps the top two.
Logging. LOG0 through LOG4 emit an event log. They take a memory offset, a length, and 0 to 4 topics from the stack. Topics are 32-byte words used as cheap indexing keys. The rest of the data goes into the log's data field.
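Here is a Python sketch of the duplication and exchange instructions, with the top of the stack at the end of the list. The function names are illustrative. The sketch also shows why depth matters. Nothing below the 16th item can be copied, and nothing below the 17th can be swapped to the top.

```python
def dup(stack: list[int], n: int) -> None:
    """DUPn: copy the n-th item from the top (the top is item 1) onto the top."""
    if not 1 <= n <= 16:
        raise ValueError("only DUP1..DUP16 exist")
    if len(stack) < n:
        raise IndexError("stack underflow")
    stack.append(stack[-n])


def swap(stack: list[int], n: int) -> None:
    """SWAPn: exchange the top item with the (n+1)-th item from the top."""
    if not 1 <= n <= 16:
        raise ValueError("only SWAP1..SWAP16 exist")
    if len(stack) < n + 1:
        raise IndexError("stack underflow")
    stack[-1], stack[-1 - n] = stack[-1 - n], stack[-1]


s = [10, 20, 30]  # 30 is on top
swap(s, 1)
print(s)  # [10, 30, 20]
dup(s, 3)
print(s)  # [10, 30, 20, 10]
```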
A note on INVALID. This is one of the EVM's unsung safety features. Any time the interpreter hits the INVALID opcode (0xFE), or any byte that is not a defined opcode, it stops immediately, discards all state changes made in the current call frame, and consumes all of that frame's remaining gas. The same exceptional halt happens on a JUMP to anything other than a valid JUMPDEST, on stack underflow, and on stack overflow. Division by zero, by contrast, is not an error: DIV and MOD simply return 0. These "exceptional halts" are what keep the EVM from being abused by stuffing nonsense into bytecode.
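One detail behind the JUMP rule is worth seeing in code. A 0x5B byte counts as a JUMPDEST only if it is a real instruction, not a data byte inside a PUSH. Clients find the valid targets by scanning the code once, roughly as in this Python sketch. The function name is illustrative.

```python
JUMPDEST = 0x5B
PUSH1, PUSH32 = 0x60, 0x7F


def valid_jump_destinations(code: bytes) -> set[int]:
    """Offsets a JUMP or JUMPI may land on: JUMPDEST bytes that are real opcodes."""
    dests: set[int] = set()
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == JUMPDEST:
            dests.add(pc)
        if PUSH1 <= op <= PUSH32:
            pc += op - PUSH1 + 1  # skip the 1..32 immediate bytes; they are data, not code
        pc += 1
    return dests


# PUSH1 0x5B, JUMPDEST: the first 0x5B is data, so only offset 2 is a valid target.
print(valid_jump_destinations(bytes([0x60, 0x5B, 0x5B])))  # {2}
```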
A note on Pectra. The Pectra hard fork activated on May 7, 2025, and it added type-4 transactions (EIP-7702). These let an EOA delegate to an existing contract. The EOA signs an authorization, and from then on, calls to the EOA run that contract's code in the EOA's own context. From the EVM's point of view, the account's code becomes a short delegation designator (0xef0100 followed by the delegate's 20-byte address), and it stays in place until the owner changes or clears it. This is a step toward full account abstraction, and we will come back to it in the "What's Next" section.
Calls: how contracts talk to each other
The EVM has four opcodes for inter-contract communication, plus CREATE and CREATE2 for deployment.
- CALL: the standard call. It runs the target contract's code in a fresh frame, with the target's storage and balance. The calling contract becomes msg.sender, the value transferred becomes msg.value, and the bytes the caller sent become the calldata.
- DELEGATECALL: runs the target's code in the caller's context. Storage, balance, msg.sender, and msg.value all stay the caller's. This is the engine behind the entire proxy pattern, which lets you "upgrade" a contract by pointing it at new code while keeping the same storage.
- STATICCALL: like CALL but read-only. If the called code, or anything it calls, tries to modify state, the call fails. Solidity uses STATICCALL when a contract calls another contract's view or pure function.
- CALLCODE: a deprecated precursor to DELEGATECALL. It still exists in the EVM, but Solidity no longer exposes it.
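The differences between the call opcodes come down to which values the new frame inherits. This Python sketch models only that bookkeeping. The Frame type, the function names, and the account names are hypothetical, and gas, return data, and failures are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    code_of: str  # whose bytecode is running
    this: str     # address(this): whose storage and balance are used
    sender: str   # msg.sender
    value: int    # msg.value
    static: bool  # True inside a STATICCALL: state changes are forbidden


def call(parent: Frame, target: str, value: int) -> Frame:
    if parent.static and value:
        raise PermissionError("cannot send ETH from a static context")
    return Frame(code_of=target, this=target, sender=parent.this, value=value, static=parent.static)


def delegatecall(parent: Frame, target: str) -> Frame:
    # Borrow the target's code; keep the caller's storage, sender, and value.
    return Frame(code_of=target, this=parent.this, sender=parent.sender, value=parent.value, static=parent.static)


def staticcall(parent: Frame, target: str) -> Frame:
    return Frame(code_of=target, this=target, sender=parent.this, value=0, static=True)


# A user's transaction enters the proxy's code.
top = Frame(code_of="proxy", this="proxy", sender="user", value=0, static=False)
print(delegatecall(top, "logicV1"))  # logicV1's code on the proxy's storage; sender is still "user"
print(call(top, "token", 5))         # the token's own context; sender is now "proxy"
```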
A call pushes 1 (success) or 0 (failure) onto the caller's stack. Either way, whatever the callee returned, including the error data from a REVERT, can then be read with RETURNDATASIZE and RETURNDATACOPY. If the call failed, its state changes are rolled back, but the parent keeps going (as long as it still has gas) and can decide what to do next.
There is a hard limit. The EVM allows at most 1024 nested call frames, and any call that would go deeper fails. This is a soft safety net against unbounded recursion and reentrancy, though real protection against reentrancy bugs is the contract author's responsibility.
Why DELEGATECALL exists, in plain English
DELEGATECALL is the trick that makes "upgradable" smart contracts possible, and it is worth a moment of extra attention because it is one of those things that sounds exotic until you see what problem it solves.
A smart contract's code is immutable once it is deployed. Whatever bytes the EVM stored at that address are what will run, forever. That is a feature, not a bug, because it is what makes the code auditable. But it is also a real headache. What happens when you find a bug, or when you need to add a feature? On a regular server, you would deploy a new version and tell everyone to use the new URL. On a blockchain, you cannot "tell" existing users anything, because every user has the old address stored somewhere: their wallet, another contract, a subgraph, a forum post, you name it.
The trick is to deploy your contract in two pieces:
- A small proxy contract at the address your users know. It holds all of the storage, and it has almost no logic of its own. All it does is forward every call to a second contract using DELEGATECALL.
- A logic contract somewhere else, with the actual code. The proxy's address is fixed. The logic contract's address can be changed by an admin.
When a user calls the proxy, here is what happens:
- The user calls proxy.increment(). The EVM sets msg.sender = user and msg.value = 0, and starts running the proxy's bytecode.
- The proxy's bytecode immediately does a DELEGATECALL to the logic contract at address 0xLogicV1. The EVM loads the bytecode from 0xLogicV1 but runs it against the proxy's storage, with the same msg.sender (the user) that the proxy saw.
- The logic contract's increment() function runs, reads slot 0 from the proxy's storage, adds 1, and writes the new value back to the proxy's storage. From the user's point of view, they called the proxy, and the proxy remembered the new value.
- When you need an upgrade, you deploy a new logic contract at 0xLogicV2 and tell the proxy to start delegating to it instead. The proxy's address does not change. The proxy's storage does not change. The user's saved address still works. The only thing that changed is which piece of code runs.
That is the proxy pattern. It is everywhere in DeFi, NFTs, and DAOs. It is also one of the most error-prone patterns in the entire EVM ecosystem, because the storage layout of the proxy has to match what the logic contract expects, and any mismatch can corrupt the data. When you see headlines about "a smart contract was hacked because of a storage collision", this is almost always what happened.
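A Python sketch makes both the upgrade and the failure mode concrete. Everything here is hypothetical. A dict plays the proxy's storage, a dict of functions plays "code deployed at an address", and a numeric ID plays the logic contract's address. The proxy keeps that ID in its own storage. Real proxies that follow EIP-1967 keep it in a hash-derived slot precisely so that it cannot collide with the logic contract's variables. The last lines show what happens when it sits in slot 0 instead.

```python
from typing import Callable

Storage = dict[int, int]
Logic = Callable[[Storage], None]


def counter_v1(storage: Storage) -> None:
    storage[0] = storage.get(0, 0) + 1   # V1 keeps its counter in slot 0


def counter_v2(storage: Storage) -> None:
    storage[0] = storage.get(0, 0) + 10  # hypothetical V2: same slot, new behavior


class Proxy:
    """Hypothetical proxy: its own storage, someone else's code."""

    def __init__(self, code_at: dict[int, Logic], impl_slot: int, impl_id: int) -> None:
        self.code_at = code_at  # stands in for "code deployed at an address"
        self.impl_slot = impl_slot
        self.storage: Storage = {impl_slot: impl_id}  # the pointer lives in proxy storage

    def call(self) -> None:
        logic = self.code_at[self.storage[self.impl_slot]]  # look up the current logic
        logic(self.storage)  # DELEGATECALL: the logic's code runs against the proxy's storage

    def upgrade(self, new_impl_id: int) -> None:
        self.storage[self.impl_slot] = new_impl_id


code_at: dict[int, Logic] = {1: counter_v1, 2: counter_v2}
SAFE_SLOT = 2**255  # hypothetical slot far away from slot 0

good = Proxy(code_at, SAFE_SLOT, impl_id=1)
good.call()
good.call()
good.upgrade(2)
good.call()
print(good.storage[0])  # 12: two V1 calls (+1 each), then one V2 call (+10)

bad = Proxy(code_at, 0, impl_id=1)  # the pointer shares slot 0 with the counter
bad.call()
print(bad.storage[0])  # 2: V1 bumped the "counter", which silently repointed the proxy to V2
```

Nobody called upgrade on the bad proxy, yet its next call runs V2. That is a storage collision in miniature.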
The reason DELEGATECALL is so dangerous is that it deliberately breaks the rule "each contract has its own storage". The EVM allows it because it is too useful to forbid, but it puts a serious responsibility on the contract author to keep two unrelated code paths in sync. Keep this in mind any time you see the words "proxy" or "upgrade" in a smart contract.
Memory, storage, and transient storage in practice
Let's go a little deeper into the data locations, because the gas costs and lifetimes have real consequences for how you write code.
Memory: volatile, byte-addressable, with a quadratic cost
Memory starts empty at the beginning of every call. It is byte-addressable, but MLOAD and MSTORE work in 32-byte words. MSTORE8 writes a single byte.
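Using memory is not free. Every time you touch a memory location at byte offset $o$, the EVM may have to expand memory. When memory spans $a$ 32-byte words, its total cost is:

$$C_{\text{mem}}(a) = 3a + \left\lfloor \frac{a^2}{512} \right\rfloor$$

A 32-byte access at offset $o$ grows memory to $a = \lceil (o + 32) / 32 \rceil$ words if it is not already that large, and each expansion charges the difference between the new and the old $C_{\text{mem}}$. The quadratic term is zero until memory reaches about 724 bytes, and it overtakes the linear term at 1,536 words (48 KiB). The quadratic part is intentional. It makes very large memory allocations expensive, so that programs do not casually use memory the way they would on a regular computer.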
MSIZE returns the current memory size in bytes. There is no MFREE. Memory can only grow, and it disappears when the call ends.
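You can check the cost function by hand. Below is a minimal Python sketch. It assumes the formula 3 gas per word plus the square of the word count divided by 512, and it ignores each opcode's own base cost. The function names are illustrative only.

```python
def words(num_bytes: int) -> int:
    """Round a byte count up to whole 32-byte words."""
    return (num_bytes + 31) // 32


def memory_cost(a: int) -> int:
    """Total cost of a memory that is a words long: 3a + floor(a^2 / 512)."""
    return 3 * a + a * a // 512


def expansion_cost(current_size: int, offset: int, length: int) -> int:
    """Extra gas to access [offset, offset + length) when memory is current_size bytes."""
    if length == 0:
        return 0                      # zero-length accesses never expand memory
    new_words = words(offset + length)
    old_words = words(current_size)
    if new_words <= old_words:
        return 0                      # this region has already been paid for
    return memory_cost(new_words) - memory_cost(old_words)


print(expansion_cost(0, 0, 32))       # 3: the first word of memory
print(expansion_cost(64, 0, 32))      # 0: already paid for
print(expansion_cost(0, 0, 1 << 20))  # 2195456: a single MiB is very expensive
```

The last line shows why the quadratic term matters. The linear part of 1 MiB is only 98,304 gas. The quadratic part adds more than 2 million.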
Storage: persistent, key-value, expensive
Storage is the only data location that survives between transactions. It is a key-value store with $2^{256}$ possible keys and $2^{256}$ possible values.
Reading a slot costs 100 gas if the slot is "warm" (touched earlier in this transaction) and 2,100 gas if it is "cold" (first access). Writing a slot ranges from 100 gas (for example, a no-op write of the same value) to 20,000 gas (going from zero to a non-zero value), plus a 2,100-gas surcharge if the slot is still cold. That 20,000 is a real chunk of change, and it is why you should pack multiple small variables into a single slot when you can.
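The warm/cold and write-cost rules fit in a few lines. This simplified Python sketch layers the EIP-2929 cold surcharge on top of the EIP-2200 net-metered SSTORE cases. It deliberately ignores refunds (EIP-3529) and the 2,300-gas stipend check, and the class name StorageGasModel is hypothetical.

```python
from typing import Dict, Set

COLD_SLOAD_COST = 2_100
WARM_STORAGE_READ_COST = 100
SSTORE_SET_GAS = 20_000                      # zero -> non-zero
SSTORE_RESET_GAS = 5_000 - COLD_SLOAD_COST   # 2,900: non-zero -> different value


class StorageGasModel:
    """Simplified per-transaction SLOAD/SSTORE pricing (no refunds, no stipend check)."""

    def __init__(self, original: Dict[int, int]) -> None:
        self.original = dict(original)   # slot values at the start of the transaction
        self.current = dict(original)
        self.warm: Set[int] = set()

    def _is_cold(self, slot: int) -> bool:
        cold = slot not in self.warm
        self.warm.add(slot)              # any access warms the slot for the rest of the tx
        return cold

    def sload(self, slot: int) -> int:
        return COLD_SLOAD_COST if self._is_cold(slot) else WARM_STORAGE_READ_COST

    def sstore(self, slot: int, new: int) -> int:
        surcharge = COLD_SLOAD_COST if self._is_cold(slot) else 0
        orig = self.original.get(slot, 0)
        cur = self.current.get(slot, 0)
        if new == cur:
            gas = WARM_STORAGE_READ_COST                              # no-op write
        elif orig == cur:
            gas = SSTORE_SET_GAS if orig == 0 else SSTORE_RESET_GAS   # first change this tx
        else:
            gas = WARM_STORAGE_READ_COST                              # slot already dirty
        self.current[slot] = new
        return gas + surcharge


m = StorageGasModel(original={0: 7})
print(m.sload(0))       # 2100: cold read
print(m.sload(0))       # 100: warm read
print(m.sstore(0, 8))   # 2900: warm, non-zero to a different non-zero value
print(m.sstore(0, 9))   # 100: slot already changed earlier in this transaction
print(m.sstore(1, 5))   # 22100: cold, zero to non-zero
```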
How slot packing works
Solidity's compiler, when it lays out your contract's storage variables, walks through them in the order they are declared and places each one in the current 32-byte slot if it fits. Types smaller than 32 bytes share a slot with whatever else fits, and the compiler moves on to the next slot only when the next variable does not fit in the space that is left.
A few concrete examples make this clearer than any rule of thumb.
- Two uint128 values fit perfectly into one slot. They take up 16 bytes each, filling a single 32-byte slot. You have used one slot, and that slot costs 20,000 gas to write the first time, not 40,000. A third uint128 would not fit and would start slot 1.
- A uint128 followed by a uint64 followed by a uint32 also fits in one slot. They occupy 16, 8, and 4 bytes, leaving 4 bytes unused.
- A uint256 followed by a bool: the uint256 already takes the full 32 bytes of slot 0, so the bool cannot share that slot. Solidity moves on to slot 1 and puts the bool there.
- Two uint256 values: one in slot 0, one in slot 1, end of story. No packing possible.
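The packing rule can be sketched in a few lines of Python. This sketch covers value types only. It ignores structs and arrays, which always start a new slot. Sizes are in bytes, and offsets count from the low-order end of the slot. The function names pack and slots_used are illustrative.

```python
from typing import List, Tuple


def pack(sizes: List[int]) -> List[Tuple[int, int]]:
    """Return (slot, byte_offset) for each variable, in declaration order."""
    slot, offset = 0, 0
    placements = []
    for size in sizes:
        if offset + size > 32:        # does not fit in what is left of this slot
            slot, offset = slot + 1, 0
        placements.append((slot, offset))
        offset += size
    return placements


def slots_used(sizes: List[int]) -> int:
    return pack(sizes)[-1][0] + 1 if sizes else 0


print(pack([16, 16]))        # [(0, 0), (0, 16)]: two uint128 share slot 0
print(pack([16, 16, 16]))    # [(0, 0), (0, 16), (1, 0)]: a third spills over
print(pack([16, 8, 4]))      # [(0, 0), (0, 16), (0, 24)]
print(pack([32, 1]))         # [(0, 0), (1, 0)]: uint256 then bool
# Declaration order matters: bool, uint256, bool vs. bool, bool, uint256
print(slots_used([1, 32, 1]), slots_used([1, 1, 32]))  # 3 2
```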
The thing that trips up beginners is that Solidity does not let you control this layout directly. You cannot tell the compiler to put the bool into slot 0.5. The compiler decides, and the rule it follows is the one above. The order in which you declare your state variables therefore has real, money-visible consequences: a contract that declares its variables in the right order uses fewer slots and can be noticeably cheaper to write to.
Storage packing is a Solidity convention, not an EVM rule. The EVM itself only knows about 256-bit keys and 256-bit values. It is the Solidity compiler that decides, for example, that a bool lives in the low-order byte of slot 1. If you change your contract's variable declarations, or upgrade the contract using a proxy, you can easily end up with two different versions of your code that disagree about which bytes of which slot belong to which variable. When that happens, reading the bool gives you nonsense. This is the same kind of upgrade pitfall as the DELEGATECALL storage-collision issue we discussed earlier, and it is one of the main reasons that "audited" smart contracts are so hard to evolve safely.
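Mappings and dynamic arrays are stored using a hash function. The value for key $k$ in a mapping lives at:

$$\text{slot}(k) = \text{keccak256}(h(k) \,\|\, p)$$

Here $p$ is the slot where the mapping is declared, $\|$ is concatenation, and $h(k)$ pads a value-type key to 32 bytes (string and bytes keys are hashed unpadded). Fixed-size arrays occupy consecutive slots starting at their own declared slot. Dynamic arrays store their length at slot $p$ and their data starting at $\text{keccak256}(p)$.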
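Python's standard library has no Keccak-256. The function hashlib.sha3_256 implements the NIST variant, which pads differently and gives different hashes. This self-contained sketch therefore includes a compact, unoptimized Keccak-256 and uses it to compute mapping and dynamic-array slots. It assumes a uint or address key and 32-byte array elements. The helper names are hypothetical. It is for illustration only; real code should use a vetted library.

```python
from typing import List

MASK64 = (1 << 64) - 1


def _rotl(v: int, n: int) -> int:
    """Rotate a 64-bit lane left by n bits."""
    return ((v << n) | (v >> (64 - n))) & MASK64


def _rho_offsets() -> List[int]:
    """Per-lane rotation amounts, indexed by x + 5*y (the rho step)."""
    rot = [0] * 25
    x, y = 1, 0
    for t in range(24):
        rot[x + 5 * y] = ((t + 1) * (t + 2) // 2) % 64
        x, y = y, (2 * x + 3 * y) % 5
    return rot


def _round_constants() -> List[int]:
    """The 24 iota constants, generated by Keccak's 8-bit LFSR."""
    rcs, lfsr = [], 1
    for _ in range(24):
        rc = 0
        for j in range(7):
            if lfsr & 1:
                rc |= 1 << ((1 << j) - 1)
            lfsr = ((lfsr << 1) ^ 0x71) & 0xFF if lfsr & 0x80 else lfsr << 1
        rcs.append(rc)
    return rcs


ROT, RC = _rho_offsets(), _round_constants()


def _keccak_f(a: List[int]) -> List[int]:
    """Keccak-f[1600] permutation over 25 lanes, lane index x + 5*y."""
    for rc in RC:
        c = [a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20] for x in range(5)]
        d = [c[(x - 1) % 5] ^ _rotl(c[(x + 1) % 5], 1) for x in range(5)]
        a = [a[i] ^ d[i % 5] for i in range(25)]                       # theta
        b = [0] * 25
        for x in range(5):
            for y in range(5):                                         # rho + pi
                b[y + 5 * ((2 * x + 3 * y) % 5)] = _rotl(a[x + 5 * y], ROT[x + 5 * y])
        a = [b[i] ^ (~b[(i + 1) % 5 + 5 * (i // 5)] & b[(i + 2) % 5 + 5 * (i // 5)])
             for i in range(25)]                                       # chi
        a[0] ^= rc                                                     # iota
    return a


def keccak256(data: bytes) -> bytes:
    rate = 136                                # bytes absorbed per permutation
    msg = bytearray(data) + b"\x01"           # original Keccak padding (SHA3 uses 0x06)
    msg += b"\x00" * (-len(msg) % rate)
    msg[-1] |= 0x80
    state = [0] * 25
    for off in range(0, len(msg), rate):
        for i in range(rate // 8):
            state[i] ^= int.from_bytes(msg[off + 8 * i:off + 8 * i + 8], "little")
        state = _keccak_f(state)
    return b"".join(lane.to_bytes(8, "little") for lane in state[:4])


def word(n: int) -> bytes:
    return n.to_bytes(32, "big")              # left-pad an unsigned value to 32 bytes


def mapping_slot(key: int, p: int) -> int:
    """Slot of m[key] for a mapping declared at slot p (uint or address key)."""
    return int.from_bytes(keccak256(word(key) + word(p)), "big")


def array_slot(p: int, index: int) -> int:
    """Slot of a[index] for a uint256[] declared at slot p (its length lives at p)."""
    base = int.from_bytes(keccak256(word(p)), "big")
    return (base + index) % (1 << 256)


print(keccak256(b"").hex())   # c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470
print(hex(array_slot(0, 0)))  # 0x290decd9548b62a8d60345a988386fc84ba6bc95484008f6362f93160ef3e563
print(hex(mapping_slot(0xBEEF, 1)))
```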
Transient storage: cheap, in-transaction
Since the Cancun hard fork on March 13, 2024, TSTORE and TLOAD (EIP-1153) give contracts a per-transaction, per-account key-value store. It looks like regular storage, but it is wiped at the end of the transaction. The gas costs are dramatically lower: TLOAD and TSTORE each cost a flat 100 gas. There is no cold-access surcharge and no 20,000-gas first write. So transient storage is perfect for:
- Reentrancy locks.
- Intermediate values in a multi-step computation.
- "Save and restore" patterns that previously needed two SSTOREs.
The Cancun upgrade also added MCOPY (a memory copy opcode), BLOBHASH and BLOBBASEFEE (for blob transactions), and a new precompile for verifying blob commitments. Solidity 0.8.24, released on January 26, 2024, added support for transient storage through the tload and tstore inline assembly instructions. Transient state variables came later, in Solidity 0.8.28.
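Here is a toy Python model of the reentrancy-lock use case. The class ToyWorld, the helper try_enter, and the choice of LOCK_SLOT are all hypothetical. The model also ignores the fact that real transient writes are rolled back when a frame reverts. The demo never releases the lock, on purpose, to show that the transaction boundary clears it anyway.

```python
from typing import Dict, Tuple

Key = Tuple[str, int]  # (contract address, slot)


class ToyWorld:
    """Hypothetical model contrasting persistent and transient storage."""

    def __init__(self) -> None:
        self.storage: Dict[Key, int] = {}    # survives across transactions
        self.transient: Dict[Key, int] = {}  # wiped when the transaction ends

    def sstore(self, addr: str, slot: int, value: int) -> None:
        self.storage[(addr, slot)] = value

    def sload(self, addr: str, slot: int) -> int:
        return self.storage.get((addr, slot), 0)

    def tstore(self, addr: str, slot: int, value: int) -> None:
        self.transient[(addr, slot)] = value

    def tload(self, addr: str, slot: int) -> int:
        return self.transient.get((addr, slot), 0)

    def end_transaction(self) -> None:
        self.transient.clear()


LOCK_SLOT = 0  # hypothetical slot reserved for the lock


def try_enter(world: ToyWorld, addr: str) -> bool:
    """Acquire a transient reentrancy lock; False means a reentrant call."""
    if world.tload(addr, LOCK_SLOT):
        return False
    world.tstore(addr, LOCK_SLOT, 1)
    return True


world = ToyWorld()
world.sstore("0xVault", 1, 42)
print(try_enter(world, "0xVault"))  # True: first entry in this transaction
print(try_enter(world, "0xVault"))  # False: the reentrant call is rejected
world.end_transaction()
print(try_enter(world, "0xVault"))  # True: the lock vanished with the transaction
print(world.sload("0xVault", 1))    # 42: regular storage persisted
```

With SSTORE, the same lock would cost thousands of gas per transaction. It would also stay set forever if a code path forgot to clear it.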
Contract creation: how code gets on-chain
When you deploy a smart contract, the network runs a one-off computation that produces the contract's runtime code and stores it at a new address.
The deployment transaction has its to field set to null (empty) and its data field set to the contract's init code. The init code runs in a special creation frame. It typically runs the Solidity constructor (which sets initial state variables and emits events), then ends with a RETURN that points at the memory region holding the runtime code. The runtime code is the bytecode that the contract will execute for the rest of its life.
The EVM takes the returned bytes and validates them. They must be no longer than 24,576 bytes and, since EIP-3541 in the London fork, must not begin with the byte 0xEF. If they pass, the EVM stores them as the contract's code.
There are two opcodes for creating contracts:
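- CREATE: the new contract's address is $\text{keccak256}(\text{rlp}([\text{sender}, \text{nonce}]))[12{:}]$, the last 20 bytes of the hash. The address depends on the deployer's current nonce, so it changes with every transaction or contract creation the deployer makes, which makes it hard to reserve in advance.
- CREATE2: the new contract's address is $\text{keccak256}(\texttt{0xff} \,\|\, \text{sender} \,\|\, \text{salt} \,\|\, \text{keccak256}(\text{init\_code}))[12{:}]$. With the same deployer, salt, and init code, you get the same address. This is the basis for channel factories, deterministic L2 bridges, and counterfactual instantiation.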
There are a few important limits. The init code can be at most 49,152 bytes (EIP-3860, since Shanghai). The runtime code can be at most 24,576 bytes (EIP-170, since Spurious Dragon). Creating a contract costs a base of 32,000 gas, plus 2 gas per 32-byte word of init code, plus 200 gas per byte of runtime code stored (the code deposit cost). All of this comes on top of the gas the constructor itself uses.
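A minimal Python sketch of the creation overhead appears below. It uses the limits and per-unit costs just described and treats the 200-gas-per-byte code deposit as the other main term. It deliberately leaves out the 21,000 intrinsic gas, calldata costs, and the constructor's own execution. The function name is illustrative.

```python
CREATE_GAS = 32_000          # base cost of a contract creation
INITCODE_WORD_GAS = 2        # EIP-3860: per 32-byte word of init code
CODE_DEPOSIT_GAS = 200       # per byte of runtime code stored
MAX_INITCODE_SIZE = 49_152   # EIP-3860
MAX_CODE_SIZE = 24_576       # EIP-170


def creation_gas(initcode_len: int, runtime_len: int) -> int:
    """Creation overhead only (no intrinsic gas, calldata, or constructor opcodes)."""
    if initcode_len > MAX_INITCODE_SIZE:
        raise ValueError("init code exceeds the EIP-3860 limit")
    if runtime_len > MAX_CODE_SIZE:
        raise ValueError("runtime code exceeds the EIP-170 limit")
    words = (initcode_len + 31) // 32
    return CREATE_GAS + INITCODE_WORD_GAS * words + CODE_DEPOSIT_GAS * runtime_len


print(creation_gas(1_000, 800))       # 192064
print(creation_gas(49_152, 24_576))   # 4950272: the largest contract allowed
```

Even in this rough model, the code deposit dominates. Every byte of runtime code costs 100 times more than a word of init code.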
Constructor arguments are appended to the init code in the deployment transaction. This is why the same compiled contract deployed with different constructor arguments produces different on-chain bytecodes. The constructor arguments are baked into the data the EVM sees at creation time.
Precompiles: native code for hard problems
Most of the EVM is pure bytecode, executed by the interpreter. A handful of cryptographic primitives are too expensive to do that way, so Ethereum provides them as precompiles. Precompiles are native functions that look like contract calls. You call them at a fixed low address, but they are actually implemented in C++/Go/Rust in each client.
Why precompiles exist at all
If you have ever written a small program and realized that a particular operation is far too slow in your chosen language, the natural fix is to drop down to a faster language for that one operation and call it. The EVM ran into the same problem with cryptographic primitives. Hashing a large blob of data, recovering a public key from a signature, and pairing-based elliptic curve arithmetic are all slow enough in EVM bytecode that a single call could dominate the gas cost of an entire transaction.
Rather than ask every client developer to optimize the EVM interpreter to do these primitives in pure bytecode, the Ethereum designers took a different approach. They wrote the primitives once, in fast native code, and exposed them as if they were contracts at fixed, reserved addresses (0x01, 0x02, 0x03, and so on). When your contract does call(0x01, data), the EVM does not bother running any bytecode. It hands the data to the native function, the native function does the work, and the EVM hands the result back to you.
This is a kind of compromise. The EVM is supposed to be platform-independent bytecode so that any client can run it, but some operations are so expensive in bytecode that the network would be impractical without native help. Precompiles are the practical answer: well-defined, deterministic, and identical on every client, but written in the host language for speed.
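As of the Cancun hard fork, there are ten precompiles. (The Pectra upgrade in May 2025 later added seven more at 0x0B-0x11 for BLS12-381 curve operations, EIP-2537.)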
| Address | Name | What it does |
|---|---|---|
| 0x01 | ECRECOVER | Recovers the signer of a hash from a signature |
| 0x02 | SHA256 | SHA-256 hash (not Keccak) |
| 0x03 | RIPEMD160 | RIPEMD-160 hash |
| 0x04 | IDENTITY | Memory copy |
| 0x05 | MODEXP | Modular exponentiation (RSA, SNARKs) |
| 0x06 | ECADD | BN254 elliptic curve point addition |
| 0x07 | ECMUL | BN254 elliptic curve point multiplication |
| 0x08 | ECPAIRING | BN254 pairing check (zkSNARK verification) |
| 0x09 | BLAKE2F | BLAKE2 compression (cross-chain bridges) |
| 0x0A | POINT_EVAL | KZG commitment evaluation (Cancun) |
The first four were there from the start. MODEXP (0x05) and the BN254 precompiles (0x06-0x08) were added in Byzantium, and the BN254 ones were repriced in Istanbul (EIP-1108). BLAKE2F was also added in Istanbul (EIP-152). POINT_EVAL was added in Cancun for blob verification (EIP-4844).
Precompile gas costs are calibrated to the actual CPU work, with input-dependent components. For example, MODEXP's charge grows with the lengths of the base and modulus and with the size of the exponent (its iteration count). ECPAIRING charges 45,000 gas plus 34,000 gas per pair of points, since the Istanbul repricing. If you use these carelessly, you can burn a lot of gas.
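The following Python sketch computes costs for the precompiles whose price depends only on input length. We believe these are the post-Istanbul values still in force at Cancun. Treat the constants as a sketch and check them against evm.codes before relying on them. MODEXP and BLAKE2F are omitted because their cost depends on the input contents. The function name is illustrative.

```python
def precompile_gas(address: int, input_len: int) -> int:
    """Gas for precompiles whose cost depends only on input length."""
    words = (input_len + 31) // 32
    if address == 0x01:                      # ECRECOVER: flat
        return 3_000
    if address == 0x02:                      # SHA256
        return 60 + 12 * words
    if address == 0x03:                      # RIPEMD160
        return 600 + 120 * words
    if address == 0x04:                      # IDENTITY
        return 15 + 3 * words
    if address == 0x06:                      # ECADD (EIP-1108 price)
        return 150
    if address == 0x07:                      # ECMUL (EIP-1108 price)
        return 6_000
    if address == 0x08:                      # ECPAIRING: 192 bytes per (G1, G2) pair
        if input_len % 192:
            raise ValueError("ECPAIRING input must be a multiple of 192 bytes")
        return 45_000 + 34_000 * (input_len // 192)
    if address == 0x0A:                      # POINT_EVAL (EIP-4844): flat
        return 50_000
    raise ValueError(f"no length-only gas rule for precompile {address:#04x}")


print(precompile_gas(0x08, 2 * 192))  # 113000: a two-pair pairing check
print(precompile_gas(0x02, 64))       # 84: SHA-256 of two words
```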
Events, logs, and reverts: talking to the outside world
A smart contract often needs to publish data to the outside world. Storage writes are expensive, and they are not optimized for off-chain discovery. The EVM has a separate, append-only data structure for this: the log.
There are five opcodes for emitting logs:
- LOG0: no topics
- LOG1: one topic
- LOG2: two topics
- LOG3: three topics
- LOG4: four topics
Each log has a memory offset and length, plus 0 to 4 topics. Topics are 32-byte words that off-chain indexers can use as cheap filters. The first topic is typically the event signature hash (e.g., keccak256("Transfer(address,address,uint256)")), and the rest are the values of indexed parameters in your Solidity event.
The cost is 375 gas, plus 375 gas per topic, plus 8 gas per byte of data (plus any memory expansion). Logs are stored in transaction receipts, which each block commits to through its receipts trie. They are not stored in the world state trie. That is why they are not accessible from inside the EVM after the transaction ends. They are written, then they are gone from the EVM's perspective, but they are forever available to off-chain clients (via eth_getLogs, Etherscan, subgraphs, and so on).
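The log pricing rule takes one line of arithmetic, as this minimal Python sketch shows. The function name is illustrative, and memory expansion is ignored.

```python
LOG_GAS, LOG_TOPIC_GAS, LOG_DATA_GAS = 375, 375, 8


def log_gas(num_topics: int, data_len: int) -> int:
    """Gas for LOG0..LOG4, excluding memory expansion."""
    if not 0 <= num_topics <= 4:
        raise ValueError("LOG0..LOG4 take between 0 and 4 topics")
    return LOG_GAS + LOG_TOPIC_GAS * num_topics + LOG_DATA_GAS * data_len


# An ERC-20 Transfer event: topic 0 is the event signature hash, topics 1-2 are
# the indexed from/to addresses, and the uint256 amount is 32 bytes of data.
print(log_gas(3, 32))  # 1756
```

By comparison, writing those same 32 bytes into a fresh storage slot would cost at least 20,000 gas. This gap is why contracts publish history through events rather than storage.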
A call frame can halt in several ways, some successful and some not.
- STOP or RETURN: successful, with optional return data.
- REVERT: failure, with return data. The current call frame's state changes are rolled back, but unused gas is returned to the parent. The parent can catch the revert and continue.
- INVALID: an explicit "this should never run" opcode. The EVM treats it as an exceptional halt, consumes all gas forwarded to the frame, and rolls back the frame's state changes.
- SELFDESTRUCT: marks the contract for deletion at the end of the transaction and sends the balance to a target address. After EIP-6780 in Cancun, SELFDESTRUCT only deletes the account if it is called in the same transaction as the contract's creation. In every other case, it only sends the balance. This was a security hardening, not a removal.
- Out of gas: the frame runs out of gas mid-execution. Like INVALID, this consumes all of the frame's gas and rolls back that frame's state changes.
A REVERT from a sub-call does not revert the whole transaction. It rolls back the sub-call's changes, and the parent can decide what to do. An out-of-gas in the topmost call frame rolls back everything.
This distinction is at the heart of a lot of security advice. A function that calls another contract and does not check the return value is a common vulnerability. The call might have failed, but if you do not check, you will keep going as if nothing happened.
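Below is a toy Python model of nested frames. A Python exception stands in for REVERT or any exceptional halt. A dict snapshot stands in for the change journal that real clients keep. All names are hypothetical. The model shows a sub-call rollback that the parent handles, the unchecked-return-value bug, and a top-level failure that erases everything.

```python
from typing import Callable, Dict

State = Dict[str, int]


class Revert(Exception):
    """Stands in for REVERT, INVALID, or out of gas inside a frame."""


def run_frame(state: State, body: Callable[[State], None]) -> bool:
    """Run body; on failure, restore this frame's snapshot and return False."""
    snapshot = dict(state)       # real clients journal changes instead of copying
    try:
        body(state)
        return True
    except Revert:
        state.clear()
        state.update(snapshot)   # roll back only this frame's writes
        return False


def child(state: State) -> None:
    state["child_wrote"] = 1
    raise Revert("insufficient balance")


def careful_parent(state: State) -> None:
    state["parent_wrote"] = 1
    if not run_frame(state, child):    # like checking CALL's success flag
        state["handled_failure"] = 1


def careless_parent(state: State) -> None:
    state["parent_wrote"] = 1
    run_frame(state, child)            # result ignored: the classic bug
    state["paid_out"] = 1              # runs even though the child failed


def failing_parent(state: State) -> None:
    state["parent_wrote"] = 1
    raise Revert("out of gas")         # failure in the topmost frame


s1, s2, s3 = {}, {}, {}
run_frame(s1, careful_parent)
run_frame(s2, careless_parent)
print(s1)                              # {'parent_wrote': 1, 'handled_failure': 1}
print(s2)                              # {'parent_wrote': 1, 'paid_out': 1}
print(run_frame(s3, failing_parent), s3)  # False {}
```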
Putting it together: a tiny end-to-end example
Let us walk through what happens when a user calls a simple smart contract. We will keep it short, but it touches almost every concept we have discussed.
```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract Counter {
    uint256 public count;

    function increment() external {
        count += 1;
    }
}
```
A user calls increment(). Here is the journey.
- The user's wallet builds an EIP-1559 transaction: to = 0xCounterAddress, data = 0xd09de08a (the 4-byte selector for increment(), which is the first four bytes of keccak256("increment()")), value = 0, gasLimit = 50000, maxFeePerGas = 30 gwei, maxPriorityFeePerGas = 2 gwei.
- The user's wallet signs the transaction with their private key.
- The user broadcasts the transaction. Every node that receives it verifies the signature, recovers the sender, and checks the sender's nonce. It also checks that the sender's balance can cover gasLimit × maxFeePerGas. When the transaction executes, the fee for the full gasLimit is deducted up front, and the fee for any unused gas is refunded at the end.
- The EVM starts a new call frame at the contract's address. The contract's bytecode begins executing. The first thing the bytecode does is decode the 4-byte selector and dispatch to the right function.
- The compiler-generated increment() function compiles to roughly: SLOAD slot 0 (read the current count), PUSH1 0x01, ADD, SSTORE slot 0 (write the new count). Because Solidity 0.8 arithmetic is checked, the compiler also inserts an overflow check before the SSTORE.
- The EVM charges gas for each opcode. Assume count is already non-zero. SLOAD on a cold slot costs 2,100, PUSH1 costs 3, and ADD costs 3. SSTORE of the now-warm slot from one non-zero value to another costs 2,900. Adding the intrinsic 21,000 and 64 gas for the calldata (four non-zero bytes at 16 gas each) brings the total to about 26,070 gas, plus a little more for the selector dispatch and the overflow check.
- The EVM emits no logs, calls no other contracts, and runs to completion. The state trie is updated. The count slot in the contract's storage trie has a new value, the storage root changes, and the world state root changes.
- The EVM returns successfully. The block proposer includes the transaction, the new state root is committed to the block, and every other node verifies the work by running the same function and getting the same result.
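You can reproduce the walkthrough's gas arithmetic directly. This Python sketch makes three assumptions. The calldata is the selector 0xd09de08a. The count slot is already non-zero, so the SSTORE is a non-zero to non-zero write. Only the four opcodes listed above are counted. It ignores the dispatcher, the overflow check, and any other compiler-generated code, so a real trace will show slightly more.

```python
G_TRANSACTION = 21_000                 # intrinsic cost of any transaction
G_TXDATA_ZERO, G_TXDATA_NONZERO = 4, 16


def calldata_gas(data: bytes) -> int:
    """Intrinsic calldata cost: 4 gas per zero byte, 16 per non-zero byte."""
    return sum(G_TXDATA_NONZERO if b else G_TXDATA_ZERO for b in data)


opcode_gas = {
    "SLOAD (cold)": 2_100,
    "PUSH1": 3,
    "ADD": 3,
    "SSTORE (warm, non-zero to non-zero)": 2_900,
}

data = bytes.fromhex("d09de08a")       # assumed selector for increment()
total = G_TRANSACTION + calldata_gas(data) + sum(opcode_gas.values())
print(calldata_gas(data), total)       # 64 26070
```

If count started at zero, the SSTORE line would be 20,000 instead of 2,900. That is why the very first increment costs so much more than later ones.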
That, in essence, is the entire Ethereum network in one tiny example. A signed message, a deterministic function, a gas-metered execution, a state change, and a cryptographic commitment to the new state.
What's next: the open questions
The EVM is a stable piece of engineering. Most of what we have discussed has been true since 2015 or shortly after, with incremental changes. A few things are being worked on right now that are worth knowing about. They are active research and engineering problems, not solved history.
- EOF (EVM Object Format) is a planned upgrade. It would give bytecode a real container format with a clean separation between code and data, validated at deploy time, plus a few new structured control-flow opcodes (RJUMP, RJUMPI, RJUMPV). EOF was originally planned for Shanghai, then Cancun, but the community has not settled on a final shape. As of June 2025, it is still being designed and has not shipped to mainnet.
- State growth is a long-standing problem. Every storage slot costs the network forever. Researchers are exploring Verkle trees (which would make stateless clients practical), state expiry, and "rent" proposals that would charge contracts over time for the state they keep around.
- Single-threaded execution limits throughput. The EVM runs one transaction at a time, sequentially. There is active research into parallel EVM designs that could process independent transactions concurrently, but consensus on the right design is still forming.
- Account abstraction is the idea that user accounts should be smart contracts. They should be able to pay gas in tokens, batch transactions, recover from key loss, rotate keys, and so on. ERC-4337, an approach that needs no protocol changes, is live. Pectra's EIP-7702 is a partial in-protocol step that lets an EOA delegate to a smart contract's code and run it as its own. Full native account abstraction is still being designed.
- The 24,576-byte code size limit (set by EIP-170 in 2016) is too small for some applications. Layer-2 chains often raise it on their own. Mainnet has not changed it.
- Memory gas costs are still quadratic. Several EIPs have proposed switching to a linear model, but consensus is hard. As of June 2025, the model is still quadratic.
If you want to follow this work, the best places to look are the EIPs repository, the Ethereum Magicians forum, and the all-core-developers calls. They are public, and the discussions are unusually readable for an open-source project of this size.
What to learn next
If you have made it this far, you have a better mental model of the EVM than most people who write about it. Here is a short, practical reading order for the curious.
- The ethereum.org developer documentation for the EVM, opcodes, and gas. The reference pages are concise and well-maintained.
- The Ethereum Yellow Paper. Yes, the formal specification. The "Shanghai version" is the current one. It is dense, but you can skim the first 20 pages and get a lot out of it.
- evm.codes, an interactive reference for every opcode, with gas costs and stack effects. Bookmark it.
- The Mastering Ethereum book (the open-source edition on GitBook) has a long, readable chapter on the EVM.
- The EVM From Scratch tutorial at evm-from-scratch.app, a hands-on course that walks you through implementing a tiny EVM yourself.
- Foundry's cast and forge tools (for example, forge test -vvvv for full execution traces, plus Foundry's interactive debugger), the most powerful practical tools for inspecting what the EVM is doing when you run a transaction.
- RareSkills' articles on storage layout, precompiles, delegatecall, and access lists. They are concise and to the point.
The very best way to internalize all of this is to write a tiny contract, deploy it on a testnet, step through it in a debugger, and watch the stack, memory, and storage change opcode by opcode. Once you have done that once, the rest of the EVM is more of the same.
Welcome to the rabbit hole. The function is simple. The implications are not.
Sources
EVM fundamentals
- ethereum.org: "Ethereum Virtual Machine (EVM)" and "Opcodes for the EVM" reference pages.
- Ethereum Yellow Paper (Shanghai version, eefc5f9a, 2025-02-04).
- "Chapter 13: The Ethereum Virtual Machine" from Mastering Ethereum (cypherpunks-core edition).
- evm.codes: interactive opcode reference.
- crytic/evm-opcodes and wolflo/evm-opcodes: curated opcode lists.
- "Ethereum Virtual Machine Internals, Parts 1 and 2" by NetSPI (2024).
- leftasexercise.com: multi-part EVM walkthroughs by Petter Tornberg.
- EVM From Scratch: interactive tutorial at evm-from-scratch.app.
- Quicknode "Deep Dive" EVM guides.
- RareSkills articles on storage layout, precompiles, delegatecall, and access lists.
Gas, fees, and refunds
- EIP-1559, EIP-2929, EIP-2930, EIP-3529, EIP-3198, EIP-3651.
- Ethereum Yellow Paper Appendix G (gas costs).
- ethereum.org "Gas and fees" page.
- "Gas Refunds and Memory Expansion Cost" (PraneshASP blog).
- EIP-7686 (linear memory costs): proposal status, not yet adopted.
Hard forks and opcode additions
- EIPs/eip-3540, 3670, 3860, 3855, 4895, 1153, 4844, 5656, 7251, 7002, 7685, 7702.
- Beosin "Things to Know About the Ethereum Shanghai Upgrade" (2023).
- "Ethereum Evolved: Dencun Upgrade" (ConsenSys).
- "Dencun Upgrade: Transient Storage Opcodes in Solidity 0.8.24" (Solidity blog, 2024-01-26).
- "MCOPY, TLOAD and TSTORE in Cancun" (Peter McQuaid, 2024).
- "Ethereum Pectra Upgrade" (ConsenSys, 2025-05-06).
- "Prepare for EIP-7702 and the Ethereum Pectra Upgrade" (Alchemy, 2025-05-19).
Precompiles
- EIP-198 (BIGINT modular exponentiation), EIP-1108 (alt_bn128 precompile repricing), EIP-196, EIP-197, EIP-152 (BLAKE2), EIP-4844 (point evaluation).
- "Ethereum Mainnet Precompiled Contracts" (Moonbeam docs).
- "Ethereum Precompiled Contracts" (RareSkills, 2025).
- evm.codes precompiles page.
EOF
- EIP-3540, EIP-3670, EIP-4200, EIP-4750, EIP-5450.
- "The EVM Object Format (EOF) Upgrade Explained" (Ethereum Classic blog, 2023).
- "Features & Timeline Sketched for Shanghai Upgrade" (EtherWorld, 2022).
Events, logs, storage layout
- "Understanding event logs on the Ethereum blockchain" (MyCrypto, 2020).
- "Demystifying EVM Logs and Events" (Antematter, 2023).
- Solidity docs "Layout of State Variables in Storage" (current and historical versions).
- "Ethereum Virtual Machine: Storage Layout" (Steve Ng, 2022).
- RareSkills "Storage Slots in Solidity" (2025).
Contract creation
- "How do contract creation and constructors work in the EVM?" (LiamZ.co, 2023).
- "Ethereum Contract Creation Explained from Bytecode" (monokh.com).
- "Ethereum smart contract creation code" (RareSkills, 2025).
- "Demystifying CREATE2 and Permit2" (Medium, 2025).
EIP-7702
- EIP-7702 specification (ethereum/EIPs repository, 2024).
- "EIP-7702: Set Code for EOAs" (ethresear.ch and EIPs site).
Calls and execution context
- "Delegatecall: The Detailed and Animated Guide" (RareSkills, 2025).
- "Learn Solidity lesson 34. Call, staticcall and delegatecall" (Coinmonks, 2023).
- Stack Exchange threads on CALL vs. DELEGATECALL vs. STATICCALL vs. CALLCODE.