Solady's ERC1967Factory - A Deep Dive

The Solady repo has a super duper optimized ERC1967 factory contract created by jtriley. Let's take a deep dive into it!

gmhacker

24 Jul 2023 • 23 min read

sky diving to the center of the Earth — We go deep

Custom Errors

/*´:°•.°+.*•´.*:˚.°*.˚•´.°:°•.°•.*•´.*:˚.°*.˚•´.°:°•.°+.*•´.*:*/
/*                       CUSTOM ERRORS                        */
/*.•°:°.´+˚.*°.˚:*.´•*.+°.•°:´*.´•*.•°.•°:°.´:•˚°.*°.˚:*.´+°.•*/

/// @dev The caller is not authorized to call the function.
error Unauthorized();

/// @dev The proxy deployment failed.
error DeploymentFailed();

/// @dev The upgrade failed.
error UpgradeFailed();

/// @dev The salt does not start with the caller.
error SaltDoesNotStartWithCaller();

There ain't much to say about this part; we will see the custom errors in their places. But I did want to say a word about custom errors in Solidity. Optimized smart contracts tend to use custom errors because they are much cheaper than the alternative of passing a string to a require or revert statement.
To better illustrate this, here's an example of a Yul function for reverting with a string:

function revertError(errLength, errData) {
    // Assumes the error string fits in a single 32-byte word.
    mstore(0x00, 0x08c379a0)  // function selector for Error(string)
    mstore(0x20, 0x20)  // string offset
    mstore(0x40, errLength)  // string length
    mstore(0x60, errData)  // string data (left-aligned)
    revert(0x1c, sub(0x80, 0x1c))  // start at the selector, skipping 28 bytes of padding
}

A string is certainly a painful type in memory. Notice that this gets encoded as if we were calling a function with signature Error(string). So we do have a ton of mstore opcodes going on, as well as a significant amount of bytes getting passed to the revert opcode.
When you use a custom error, the signature itself tells us the error information, so there's no need for an error string. If we were to, for example, revert with ERC1967Factory's Unauthorized error, it would look something like this:

// bytes4(keccak256(bytes("Unauthorized()")))
mstore(0x00, 0x82b42900)
revert(0x1c, 0x04)

As you can see, 1 mstore, and only 4 bytes getting passed to revert. Optimizoooooors!
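To put the two revert shapes side by side, here is a minimal Solidity sketch. The contract name RevertShapes and its function names are made up for illustration; only the Unauthorized error and its selector come from the factory. For a short message like "Unauthorized", the string version reverts with 100 bytes of data (a 4-byte selector plus three 32-byte words), while both custom error versions revert with just 4 bytes.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.4;

/// Illustrative only: `RevertShapes` is a hypothetical contract, not part of Solady.
contract RevertShapes {
    error Unauthorized();

    /// Reverts with `Error(string)`: selector + offset + length + padded data.
    function withString() external pure {
        revert("Unauthorized");
    }

    /// Reverts with the 4-byte selector of `Unauthorized()` and nothing else.
    function withCustomError() external pure {
        revert Unauthorized();
    }

    /// Same revert data as `withCustomError`, written the way the factory does it.
    function withCustomErrorYul() external pure {
        assembly {
            // The selector lands in the last 4 bytes of the word at 0x00,
            // i.e. at memory offsets 0x1c..0x20.
            mstore(0x00, 0x82b42900)
            revert(0x1c, 0x04)
        }
    }
}
```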

avengers assemble meme — I bet Vectorized could lift Mjolnir

Then we have the constants pertaining to the custom errors. These are just the corresponding selectors.

/// @dev `bytes4(keccak256(bytes("Unauthorized()")))`.
uint256 internal constant _UNAUTHORIZED_ERROR_SELECTOR = 0x82b42900;

/// @dev `bytes4(keccak256(bytes("DeploymentFailed()")))`.
uint256 internal constant _DEPLOYMENT_FAILED_ERROR_SELECTOR = 0x30116425;

/// @dev `bytes4(keccak256(bytes("UpgradeFailed()")))`.
uint256 internal constant _UPGRADE_FAILED_ERROR_SELECTOR = 0x55299b49;

/// @dev `bytes4(keccak256(bytes("SaltDoesNotStartWithCaller()")))`.
uint256 internal constant _SALT_DOES_NOT_START_WITH_CALLER_ERROR_SELECTOR = 0x2f634836;

This is just another typical easy optimization.

Constants get hardcoded into the bytecode, so here we are not taking up any storage.
We could compute the selector values instead of hardcoding them. But since those values will never change, we might as well just hardcode them, which in turn can save gas at contract deployment.

By the way, when devs/auditors look at a specific function, it's more readable to see something like _UNAUTHORIZED_ERROR_SELECTOR, which tells you exactly what it is, than to see 0x82b42900 just dropped out of nowhere. It's also way easier to change the value in a single place in case of an error.
Anyways come on this is just beginner level dev skills, you should know this! If you don't, I mean, what the hell are you doing??
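If you want to double-check a hardcoded selector against its signature, something like the sketch below should do. SelectorCheck and its functions are hypothetical names introduced here; the constant and the signature string are the ones from the factory's doc comment.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.4;

/// Illustrative only: `SelectorCheck` is a hypothetical helper, not part of Solady.
contract SelectorCheck {
    /// Same constant as in the factory.
    uint256 internal constant _UNAUTHORIZED_ERROR_SELECTOR = 0x82b42900;

    /// Derives the selector from the error signature instead of hardcoding it.
    function computed() public pure returns (uint256) {
        return uint256(uint32(bytes4(keccak256("Unauthorized()"))));
    }

    /// Expected to hold if the hardcoded constant matches its doc comment.
    function matches() external pure returns (bool) {
        return computed() == _UNAUTHORIZED_ERROR_SELECTOR;
    }
}
```

The same pattern works for the other three selectors by swapping the signature string and the constant.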

Idiot sandwich Gordon Ramsey meme — Repeat after me: "I'm not an idiot sandwich, I read Uncle Bob's *Clean Code*"

Events

/*´:°•.°+.*•´.*:˚.°*.˚•´.°:°•.°•.*•´.*:˚.°*.˚•´.°:°•.°+.*•´.*:*/
/*                           EVENTS                           */
/*.•°:°.´+˚.*°.˚:*.´•*.+°.•°:´*.´•*.•°.•°:°.´:•˚°.*°.˚:*.´+°.•*/

/// @dev The admin of a proxy contract has been changed.
event AdminChanged(address indexed proxy, address indexed admin);

/// @dev The implementation for a proxy has been upgraded.
event Upgraded(address indexed proxy, address indexed implementation);

/// @dev A proxy has been deployed.
event Deployed(address indexed proxy, address indexed implementation, address indexed admin);

/// @dev `keccak256(bytes("AdminChanged(address,address)"))`.
uint256 internal constant _ADMIN_CHANGED_EVENT_SIGNATURE =
    0x7e644d79422f17c01e4894b5f4f588d331ebfa28653d42ae832dc59e38c9798f;

/// @dev `keccak256(bytes("Upgraded(address,address)"))`.
uint256 internal constant _UPGRADED_EVENT_SIGNATURE =
    0x5d611f318680d00598bb735d61bacf0c514c6b50e1e5ad30040a4df2b12791c7;

/// @dev `keccak256(bytes("Deployed(address,address,address)"))`.
uint256 internal constant _DEPLOYED_EVENT_SIGNATURE =
    0xc95935a66d15e0da5e412aca0ad27ae891d20b2fb91cf3994b6a3bf2b8178082;

The event declaration is pretty standard here. All events use indexed parameters, which allows better event search filtering for off-chain applications. The event signatures are hardcoded into constants as well, because the events will be emitted using inline assembly (Yul), and the log opcodes take the event signature hash as the first topic.

OK, last part before dope functions.

Storage

/*´:°•.°+.*•´.*:˚.°*.˚•´.°:°•.°•.*•´.*:˚.°*.˚•´.°:°•.°+.*•´.*:*/
/*                          STORAGE                           */
/*.•°:°.´+˚.*°.˚:*.´•*.+°.•°:´*.´•*.•°.•°:°.´:•˚°.*°.˚:*.´+°.•*/

// The admin slot for a `proxy` is given by:
// ```
//     mstore(0x0c, address())
//     mstore(0x00, proxy)
//     let adminSlot := keccak256(0x0c, 0x20)
// ```

/// @dev The ERC-1967 storage slot for the implementation in the proxy.
/// `uint256(keccak256("eip1967.proxy.implementation")) - 1`.
uint256 internal constant _IMPLEMENTATION_SLOT =
    0x360894a13ba1a3210667c828492db98dca3e2076cc3735a920a3ca505d382bbc;

The implementation slot is the same as the one specified in EIP-1967, and it is the keccak256 of the string "eip1967.proxy.implementation"... MINUS 1! As the EIP says, this "offset is added so the preimage of the hash cannot be known, further reducing the chances of a possible attack." There was actually a pretty interesting Twitter discussion started by devtooligan on the actual relevance of this "minus 1" for storage slots. It's a good read.
I'll link the answer to the discussion, which also links to the original discussion thread.

ANSWERED

tldr;

1 preimage makes it slightly easier to find a 2nd preimage.

with a 2nd preimage, a malicious dev could design an innocent looking logic contract that overwrites an existing storage slot

pretty far-fetched but its free to subtract 1 so why not

thx @0xfoobar https://t.co/Os7uSv93I4
— devtooligan (@devtooligan) February 6, 2023
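For reference, the "minus 1" derivation can be written out in a few lines of Solidity. This is only a sketch: ImplementationSlotCheck and its function names are invented for this post, while the constant and the preimage string are the ones quoted in the factory's comment.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.4;

/// Illustrative only: `ImplementationSlotCheck` is a hypothetical helper.
contract ImplementationSlotCheck {
    /// Same value as the factory's `_IMPLEMENTATION_SLOT`.
    bytes32 internal constant _IMPLEMENTATION_SLOT =
        0x360894a13ba1a3210667c828492db98dca3e2076cc3735a920a3ca505d382bbc;

    /// Hash the string, then subtract 1 so the slot has no known preimage.
    function derived() public pure returns (bytes32) {
        return bytes32(uint256(keccak256("eip1967.proxy.implementation")) - 1);
    }

    /// Expected to hold if the constant follows the EIP-1967 formula.
    function matches() external pure returns (bool) {
        return derived() == _IMPLEMENTATION_SLOT;
    }
}
```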

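Going back to the ERC1967Factory, we see that jtriley explains that the admin address of a proxy also gets stored. The storage slot number is the hash of 32 bytes of memory that combine the proxy address with the executing contract's address (opcode address). Because the second mstore overwrites part of the first, what actually gets hashed is the 20-byte proxy address followed by the lower 12 bytes of the factory address. Cool stuff.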
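To see which bytes end up in the hash, here is a small sketch that computes the admin slot twice: once with the two overlapping mstore calls from the comment, and once in plain Solidity. AdminSlotSketch is a hypothetical contract, and it takes the factory address as a parameter (an assumption made so the function can be pure) where the real code uses the address opcode.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.4;

/// Illustrative only: `AdminSlotSketch` is a hypothetical helper, not part of Solady.
contract AdminSlotSketch {
    /// Mirrors the layout from the factory's comment.
    function adminSlotYul(address factory, address proxy) public pure returns (bytes32 slot) {
        assembly {
            // Factory address occupies memory 0x18..0x2c.
            mstore(0x0c, factory)
            // Proxy address occupies 0x0c..0x20 and overwrites the
            // first 8 bytes of the factory address (0x18..0x20).
            mstore(0x00, proxy)
            // Hash 0x0c..0x2c: 20 bytes of proxy + last 12 bytes of factory.
            slot := keccak256(0x0c, 0x20)
        }
    }

    /// The same preimage, spelled out without assembly.
    function adminSlotSolidity(address factory, address proxy) public pure returns (bytes32) {
        bytes12 factoryLow = bytes12(uint96(uint160(factory)));
        return keccak256(abi.encodePacked(proxy, factoryLow));
    }
}
```

Both functions should return the same slot for any pair of addresses, since they hash the same 32 bytes.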

Proxy Admin functions

Ah yes. On to the functions, finally. We start by looking at the ones dealing with the proxy admin addresses.

/// @dev Returns the admin of the proxy.
function adminOf(address proxy) public view returns (address admin) {
    /// @solidity memory-safe-assembly
    assembly {
        mstore(0x0c, address())
        mstore(0x00, proxy)
        admin := sload(keccak256(0x0c, 0x20))
    }
}

Function adminOf

This is a view function responsible for fetching the right admin address when specifying a certain proxy address. The concatenation hash logic was explained above, and the assembly code is almost identical to the comment example provided by the author in the STORAGE section. Here we use the sload opcode to fetch the value we want.

What happens if we pass an incorrect proxy address? The computed storage slot will for sure have nothing stored (no hash collision), so the zero address will be returned. This is better than reverting with wrong inputs inside a view function, which is considered an anti-pattern.
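From a caller's point of view, the zero-address return can be used as a cheap "is this proxy known to the factory?" check. The interface and library names below are assumptions made for this sketch; only the adminOf signature is taken from the factory.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.4;

/// Minimal interface with only the function used below.
interface IAdminOf {
    function adminOf(address proxy) external view returns (address admin);
}

/// Illustrative only: `FactoryQueries` is a hypothetical helper library.
library FactoryQueries {
    /// True if the factory has a non-zero admin recorded for `proxy`.
    /// Note: a proxy whose admin was set to the zero address also reads as false.
    function hasAdmin(IAdminOf factory, address proxy) internal view returns (bool) {
        return factory.adminOf(proxy) != address(0);
    }
}
```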

A word should be said about the comment /// @solidity memory-safe-assembly. It tells the compiler that the inline assembly block respects Solidity's memory model, which prevents the compiler from turning off certain optimizations in the presence of that block, as explained by the Solidity docs. Here's the gist of it:

While we recommend to always respect Solidity’s memory model, inline assembly allows you to use memory in an incompatible way. Therefore, moving stack variables to memory and additional memory optimizations are, by default, globally disabled in the presence of any inline assembly block that contains a memory operation or assigns to Solidity variables in memory.
However, you can specifically annotate an assembly block to indicate that it in fact respects Solidity’s memory model (...).

If you wanna better understand when you should use this memory-safe annotation, check this helpful discussion on the Solidity forum prompted by a question from Paul Berg.
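As a small illustration, here is an assembly block that only touches the scratch space (memory 0x00 to 0x3f), which the Solidity docs list as memory-safe usage. The contract and function names are made up. From Solidity 0.8.13 onwards the annotation can also be written as assembly ("memory-safe"), which is what this sketch uses; the comment form seen in the factory keeps working on older compilers.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.13;

/// Illustrative only: `MemorySafeSketch` is a hypothetical contract.
contract MemorySafeSketch {
    /// Hashes two words using only the scratch space at 0x00..0x3f.
    function hashPair(bytes32 a, bytes32 b) external pure returns (bytes32 h) {
        assembly ("memory-safe") {
            mstore(0x00, a)
            mstore(0x20, b)
            h := keccak256(0x00, 0x40)
        }
    }
}
```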

Meryl Streep let's move on meme — Thanks, Meryl. Let's go to the next function.

/// @dev Sets the admin of the proxy.
/// The caller of this function must be the admin of the proxy on this factory.
function changeAdmin(address proxy, address admin) public {
    /// @solidity memory-safe-assembly
    assembly {
        // Check if the caller is the admin of the proxy.
        mstore(0x0c, address())
        mstore(0x00, proxy)
        let adminSlot := keccak256(0x0c, 0x20)
        if iszero(eq(sload(adminSlot), caller())) {
            mstore(0x00, _UNAUTHORIZED_ERROR_SELECTOR)
            revert(0x1c, 0x04)
        }
        // Store the admin for the proxy.
        sstore(adminSlot, admin)
        // Emit the {AdminChanged} event.
        log3(0, 0, _ADMIN_CHANGED_EVENT_SIGNATURE, proxy, admin)
    }
}

Function changeAdmin

The purpose of this function is, well, to change the admin of a given proxy. And yes, all functions are essentially in assembly. Here's what happens here:

We concatenate the contract address and the proxy address in question, hash it and read from storage the current admin value.
If the current admin address is equal to the caller address (aka msg.sender in Solidity, aka caller() in Yul), the iszero opcode will return 0 and the if statement will be skipped. When this does not happen, it means someone other than the current admin is trying to change this value. We cannot allow this, so the execution gets reverted with an Unauthorized error (remember the way we revert using custom error signatures in assembly).
At this point, we are certain the caller is the current admin. So we allow the current slot value to be changed to a new admin address. Notice there are no checks against this input, which means that it can also be the zero address. This might be a design choice (gas savings), though it would probably be flagged in an audit.
Finally, we emit the AdminChanged event. The log3 opcode gets 5 parameters:
- the memory offset - this is where the event data (i.e. not indexed event arguments) is supposed to be in memory
- byte size to copy from memory - it would be the length of the event data to be copied from memory. Here we only have indexed arguments, so no data.
- topic 1 - it's the first topic of the event. Events can only have up to 4 topics, and the first is the event signature (except for anonymous events).
- topic 2 - this is the first indexed argument of the event, which is the proxy address
- topic 3 - the second indexed argument, i.e. the new admin address.
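For comparison, the steps above could be written in plain Solidity roughly like this. It is a sketch under a big assumption: a regular mapping stands in for the factory's hand-computed admin slots, so the storage layout is not the same, and nothing here ever sets an initial admin. ChangeAdminSketch is a hypothetical name.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.4;

/// Illustrative only: a high-level rendition of `changeAdmin`, not Solady's code.
contract ChangeAdminSketch {
    error Unauthorized();

    event AdminChanged(address indexed proxy, address indexed admin);

    /// Assumption: a mapping replaces the factory's custom slot derivation.
    mapping(address => address) internal _admins;

    function changeAdmin(address proxy, address admin) public {
        // Only the current admin of `proxy` may change it.
        if (_admins[proxy] != msg.sender) revert Unauthorized();
        // No zero-address check, same as the original.
        _admins[proxy] = admin;
        // Compiles down to a LOG3: empty data, signature hash + 2 indexed topics.
        emit AdminChanged(proxy, admin);
    }
}
```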

This completes the admin functions. Now we're done with baby soup, let's eat some hard-to-chew meat.

salt lake city guy slaps meat — I'm not an expert, but that meat don't care about those slaps

ERC1967Proxy bytecode

Ok now it's the real deal. The next functions in the factory contract are the upgrade functions used for upgrading the implementation address of a given proxy. However, they have very specific logic suited for the ERC1967Proxy bytecode that gets deployed by this factory. And because of that we will be looking at that code first. So here it goes.

0x3d3d336d6396ff2a80c067f99b3d2ab4df2414605157363d3d37363d7f360894a13ba1a3210667c828492db98dca3e2076cc3735a920a3ca505d382bbc545af43d6000803e604c573d6000fd5b3d6000f35b3d3560203555604080361115604c5736038060403d373d3d355af43d6000803e604c573d6000fd

As you can see, this is pretty straightforward. You do the work.

it's fine fire meme — It's fine... It's all fine.

No, let's actually jump into it. Also, luckily for us, jtriley left a very helpful table explaining the proxy bytecode. So let's go through it.

* -------------------------------------------------------------------------------------+
* CREATION (9 bytes)                                                                   |
* -------------------------------------------------------------------------------------|
* Opcode     | Mnemonic        | Stack               | Memory                          |
* -------------------------------------------------------------------------------------|
* 60 runSize | PUSH1 runSize   | r                   |                                 |
* 3d         | RETURNDATASIZE  | 0 r                 |                                 |
* 81         | DUP2            | r 0 r               |                                 |
* 60 offset  | PUSH1 offset    | o r 0 r             |                                 |
* 3d         | RETURNDATASIZE  | 0 o r 0 r           |                                 |
* 39         | CODECOPY        | 0 r                 | [0..runSize): runtime code      |
* f3         | RETURN          |                     | [0..runSize): runtime code      |

I know, even the comments from the wizard look beautiful. Anyways, we start off with the creation bytecode, a part I left out from the example bytecode above, since I just copied that bytecode from one of the deployed proxies and the creation part of it does not get stored on the blockchain.

This is essentially as simple as the creation bytecode can be. The purpose of it is just to return the entire contract bytecode from memory.

The runSize value is the length in bytes of the contract's runtime bytecode - what will be deployed on the blockchain - and the offset is where the memory copy of the bytecode should start, i.e. after the creation bytecode.
The codecopy opcode copies the code that is currently running into memory, starting at the offset value (i.e. only the runtime bytecode, if we pass the right value).
Finally, all those bytes get returned to be stored on the blockchain.
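Here is a rough sketch of how such a 9-byte creation prefix can be glued in front of an arbitrary runtime bytecode. The library name and its functions are hypothetical, and the offset of 9 is an assumption that follows from the runtime code sitting directly after the 9 creation bytes. The factory itself builds its initcode in assembly, so take this as an illustration of the table, not as the actual implementation.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.4;

/// Illustrative only: `CreationCodeSketch` is a hypothetical helper library.
library CreationCodeSketch {
    /// Prepends the minimal creation code from the table to `runtime`.
    function wrap(bytes memory runtime) internal pure returns (bytes memory initcode) {
        // PUSH1 only carries one byte, so runSize must fit in 8 bits.
        require(runtime.length <= 0xff, "runtime too long for PUSH1");
        initcode = abi.encodePacked(
            hex"60", // PUSH1 runSize
            uint8(runtime.length),
            // RETURNDATASIZE, DUP2, PUSH1 0x09 (offset), RETURNDATASIZE, CODECOPY, RETURN
            hex"3d8160093d39f3",
            runtime
        );
    }

    /// Deploys `runtime` through the wrapped initcode.
    function deploy(bytes memory runtime) internal returns (address deployed) {
        bytes memory initcode = wrap(runtime);
        assembly {
            // Skip the 32-byte length prefix of the `bytes` value.
            deployed := create(0, add(initcode, 0x20), mload(initcode))
        }
        require(deployed != address(0), "create failed");
    }
}
```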

Now let's start with the runtime bytecode analysis.

* -------------------------------------------------------------------------------------|
* RUNTIME (127 bytes)                                                                  |
* -------------------------------------------------------------------------------------|
* Opcode      | Mnemonic       | Stack               | Memory                          |
* -------------------------------------------------------------------------------------|
*                                                                                      |
* ::: keep some values in stack :::::::::::::::::::::::::::::::::::::::::::::::::::::: |
* 3d          | RETURNDATASIZE | 0                   |                                 |
* 3d          | RETURNDATASIZE | 0 0                 |                                 |
*                                                                                      |
* ::: check if caller is factory ::::::::::::::::::::::::::::::::::::::::::::::::::::: |
* 33          | CALLER         | c 0 0               |                                 |
* 73 factory  | PUSH20 factory | f c 0 0             |                                 |
* 14          | EQ             | isf 0 0             |                                 |
* 60 0x57     | PUSH1 0x57     | dest isf 0 0        |                                 |
* 57          | JUMPI          | 0 0                 |                                 |

We start by adding 2 0x00 values to the stack which are going to be used throughout the rest of the bytecode logic. This is cheaper (and smaller in the bytecode) than doing PUSH1 0x00, hence why it's being done here. We now have push0 on Ethereum, but this was developed before that change (besides, other EVM chains don't have it).
We fetch the caller address and we push the factory address into the stack as well. Notice that the factory address will be hardcoded into the proxy bytecode. Another interesting aspect, as we will later see in the actual assembly that builds this bytecode, is that we might actually have a push14 instead and thus decrease the proxy bytecode size. We will see why later on (I'm trying to increase retention here, continue reading!!!).
The jumpi opcode is a conditional jump. If the caller is the factory, then we set the program counter to 0x57, an offset in the bytecode where we should find a jumpdest instruction (otherwise it will fail). If the caller is NOT the factory, then this jump does not occur, in which case we just continue to the next section of the bytecode.

* ::: copy calldata to memory :::::::::::::::::::::::::::::::::::::::::::::::::::::::: |
* 36          | CALLDATASIZE   | cds 0 0             |                                 |
* 3d          | RETURNDATASIZE | 0 cds 0 0           |                                 |
* 3d          | RETURNDATASIZE | 0 0 cds 0 0         |                                 |
* 37          | CALLDATACOPY   | 0 0                 | [0..calldatasize): calldata     |
*                                                                                      |
* ::: delegatecall to implementation ::::::::::::::::::::::::::::::::::::::::::::::::: |
* 36          | CALLDATASIZE   | cds 0 0             | [0..calldatasize): calldata     |
* 3d          | RETURNDATASIZE | 0 cds 0 0           | [0..calldatasize): calldata     |
* 7f slot     | PUSH32 slot    | s 0 cds 0 0         | [0..calldatasize): calldata     |
* 54          | SLOAD          | i 0 cds 0 0         | [0..calldatasize): calldata     |
* 5a          | GAS            | g i 0 cds 0 0       | [0..calldatasize): calldata     |
* f4          | DELEGATECALL   | succ                | [0..calldatasize): calldata     |

The calldatasize opcode pushes onto the stack, well, the calldata size. The calldatacopy opcode will then be used to copy the entire calldata bytes into memory offset 0x00.
We now see an sload being done to read something from storage. Noteworthy, the slot value is hardcoded, which will actually be the standard implementation slot for ERC1967Proxy.
Together with all the zeros put into the stack and the existing gas to forward (shut up, I know about EIP-150), we do a delegatecall with the exact same calldata. This is a proxy, after all. Note, though, that we only reach this delegatecall part of the code IF the caller is NOT the factory. This follows the pattern of not allowing calls from the address with upgrade rights to get forwarded to the implementation.

* ::: copy returndata to memory :::::::::::::::::::::::::::::::::::::::::::::::::::::: |
* 3d          | RETURNDATASIZE | rds succ            | [0..calldatasize): calldata     |
* 60 0x00     | PUSH1 0x00     | 0 rds succ          | [0..calldatasize): calldata     |
* 80          | DUP1           | 0 0 rds succ        | [0..calldatasize): calldata     |
* 3e          | RETURNDATACOPY | succ                | [0..returndatasize): returndata |
*                                                                                      |
* ::: branch on delegatecall status :::::::::::::::::::::::::::::::::::::::::::::::::: |
* 60 0x52     | PUSH1 0x52     | dest succ           | [0..returndatasize): returndata |
* 57          | JUMPI          |                     | [0..returndatasize): returndata |
*                                                                                      |
* ::: delegatecall failed, revert :::::::::::::::::::::::::::::::::::::::::::::::::::: |
* 3d          | RETURNDATASIZE | rds                 | [0..returndatasize): returndata |
* 60 0x00     | PUSH1 0x00     | 0 rds               | [0..returndatasize): returndata |
* fd          | REVERT         |                     | [0..returndatasize): returndata |

We use returndatacopy to pass the delegatecall's return data to memory. This can either be a normal data response or a returned error data. If you are wondering why the optimizooor man is using push1 instead of returndatasize to have a 0x00 on the stack, that's because it's no longer certain returndatasize returns us 0, since now it has the size of the delegatecall's return data. Yeah, push0 would have been dope here...
If the delegatecall was successful, we jump to the 0x52 bytecode offset, where hopefully we will find another jumpdest. Yes, I too miss Huff's jump labels.
If we don't jump, it means the delegatecall was not successful, so we revert the execution by passing the data we got returned previously. This bubbles up the error that was raised/thrown/whatever during the delegatecall.

* ::: delegatecall succeeded, return ::::::::::::::::::::::::::::::::::::::::::::::::: |
* 5b          | JUMPDEST       |                     | [0..returndatasize): returndata |
* 3d          | RETURNDATASIZE | rds                 | [0..returndatasize): returndata |
* 60 0x00     | PUSH1 0x00     | 0 rds               | [0..returndatasize): returndata |
* f3          | RETURN         |                     | [0..returndatasize): returndata |
*                                                                                      |
* ::: set new implementation (caller is factory) ::::::::::::::::::::::::::::::::::::: |
* 5b          | JUMPDEST       | 0 0                 |                                 |
* 3d          | RETURNDATASIZE | 0 0 0               |                                 |
* 35          | CALLDATALOAD   | impl 0 0            |                                 |
* 60 0x20     | PUSH1 0x20     | w impl 0 0          |                                 |
* 35          | CALLDATALOAD   | slot impl 0 0       |                                 |
* 55          | SSTORE         | 0 0                 |                                 |

We reached a jumpdest, which will hopefully be 0x52, the bytecode offset where we will have the logic in case of a successful delegatecall. If you are wondering what happens to these jump offsets when we have the push14 instead of the push20 in the beginning of the runtime bytecode, then you are one smart cookie. When we deploy with push14 these offsets need to be changed as well. Long story short, we subtract 6 from them.
The successful delegatecall will just finish up with returning the data that came from it, as it is normal in a proxy contract.
5 bytes later, we have the jumpdest for when the caller IS in fact the factory (in fact... the factory...). Here returndatasize gives us 0x00 again (hurray!). And now here comes a curious part:
- We load the first 32 bytes of calldata onto the stack.
- We load the second 32 bytes of calldata onto the stack.
- And we do sstore. Boom!

Remember we are on the logic part only accessible by the factory. The factory will be the one address able to do this, which is writing a value into an arbitrary slot in the proxy contract. If we pass here the implementation slot, it means the factory can rewrite the implementation address. So TLDR: this is the code for upgrading the proxy to a new implementation address. No function sigs, it's just like this.
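Coming back to the jump-offset bookkeeping for a second, the push14 adjustment can be captured in a tiny helper. ProxyJumpOffsets is a hypothetical library written for this post. The base values 0x52 and 0x57 come from the table, and the shortened results 0x4c and 0x51 match the PUSH1 operands you can spot in the deployed bytecode shown earlier (604c and 6051).

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.4;

/// Illustrative only: `ProxyJumpOffsets` is a hypothetical helper library.
library ProxyJumpOffsets {
    /// Jump destination for "delegatecall succeeded, return" with PUSH20.
    uint8 internal constant SUCCESS_DEST = 0x52;
    /// Jump destination for "caller is factory" with PUSH20.
    uint8 internal constant UPGRADE_DEST = 0x57;

    /// With PUSH14 the factory address takes 6 bytes less,
    /// so every later offset moves down by 6.
    function offsets(bool usesPush14)
        internal
        pure
        returns (uint8 successDest, uint8 upgradeDest)
    {
        uint8 saved = usesPush14 ? 6 : 0;
        successDest = SUCCESS_DEST - saved; // 0x4c when usesPush14 is true
        upgradeDest = UPGRADE_DEST - saved; // 0x51 when usesPush14 is true
    }
}
```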

Let me go on a small tangent. I once saw a beautiful tennis drop shot by the legend Carlos Alcaraz. It was so well achieved, so exquisite, that the commentator said "That is FILTHY from Carlos Alcaraz!" Jtriley, my brother, this is just filthy.
Let's proceed.

Carlos Alcaraz drop shot — It wasn't this drop shot, but this one was good too

* ::: no extra calldata, return :::::::::::::::::::::::::::::::::::::::::::::::::::::: |
* 60 0x40     | PUSH1 0x40     | 2w 0 0              |                                 |
* 80          | DUP1           | 2w 2w 0 0           |                                 |
* 36          | CALLDATASIZE   | cds 2w 2w 0 0       |                                 |
* 11          | GT             | gt 2w 0 0           |                                 |
* 15          | ISZERO         | lte 2w 0 0          |                                 |
* 60 0x52     | PUSH1 0x52     | dest lte 2w 0 0     |                                 |
* 57          | JUMPI          | 2w 0 0              |                                 |

We push 0x40 into the stack. If you're wondering what 2w means, it's 2 words, i.e. 2 words of 32 bytes, i.e. 64 bytes, i.e. 0x40.
We check the calldata's size. If it is greater than 2 words, then it means there's more calldata after the new implementation address and the slot value. If not, then it is assumed this is just an upgrade with no call, and we jump once again to the 0x52 bytecode offset, the place where there's the logic for handling a successful delegatecall. This will just translate into returning empty data.

* ::: copy extra calldata to memory :::::::::::::::::::::::::::::::::::::::::::::::::: |
* 36          | CALLDATASIZE   | cds 2w 0 0          |                                 |
* 03          | SUB            | t 0 0               |                                 |
* 80          | DUP1           | t t 0 0             |                                 |
* 60 0x40     | PUSH1 0x40     | 2w t t 0 0          |                                 |
* 3d          | RETURNDATASIZE | 0 2w t t 0 0        |                                 |
* 37          | CALLDATACOPY   | t 0 0               | [0..t): extra calldata          |
*                                                                                      |
* ::: delegatecall to implementation ::::::::::::::::::::::::::::::::::::::::::::::::: |
* 3d          | RETURNDATASIZE | 0 t 0 0             | [0..t): extra calldata          |
* 3d          | RETURNDATASIZE | 0 0 t 0 0           | [0..t): extra calldata          |
* 35          | CALLDATALOAD   | i 0 t 0 0           | [0..t): extra calldata          |
* 5a          | GAS            | g i 0 t 0 0         | [0..t): extra calldata          |
* f4          | DELEGATECALL   | succ                | [0..t): extra calldata          |

The extra calldata will be copied to memory (I know, that's written in the comment). The subtraction is responsible for checking the size of the remaining calldata - the full size minus the 2 first words, which we know are not part of it.
We do the same steps as before to fetch once again the first word of the calldata - the new implementation address, and we delegatecall to it with the extra calldata as, well, the calldata. Notice that we are using the address value from calldata because it is way cheaper than reading the value from storage. We know the storage value got updated with this calldata value, so saving an sload is smart.

So to recap, if the execution reached this point, the caller is the factory, the implementation got changed, and now we are doing a delegatecall to that new implementation. This is the equivalent of doing the typical upgradeToAndCall.

* ::: copy returndata to memory :::::::::::::::::::::::::::::::::::::::::::::::::::::: |
* 3d          | RETURNDATASIZE | rds succ            | [0..t): extra calldata          |
* 60 0x00     | PUSH1 0x00     | 0 rds succ          | [0..t): extra calldata          |
* 80          | DUP1           | 0 0 rds succ        | [0..t): extra calldata          |
* 3e          | RETURNDATACOPY | succ                | [0..returndatasize): returndata |
*                                                                                      |
* ::: branch on delegatecall status :::::::::::::::::::::::::::::::::::::::::::::::::: |
* 60 0x52     | PUSH1 0x52     | dest succ           | [0..returndatasize): returndata |
* 57          | JUMPI          |                     | [0..returndatasize): returndata |
*                                                                                      |
* ::: delegatecall failed, revert :::::::::::::::::::::::::::::::::::::::::::::::::::: |
* 3d          | RETURNDATASIZE | rds                 | [0..returndatasize): returndata |
* 60 0x00     | PUSH1 0x00     | 0 rds               | [0..returndatasize): returndata |
* fd          | REVERT         |                     | [0..returndatasize): returndata |
* -------------------------------------------------------------------------------------+

We do the same as we did in the normal delegatecall logic. We load the return data. If it's successful, we go back to 0x52. If not, we revert with the returned error data. Look at you, you're starting to recognize bytecode patterns.
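If the raw opcodes are still a bit much, here is the whole control flow again as a Solidity/Yul sketch. To be clear about the assumptions: the real proxy is hand-written bytecode with the factory address pushed as a constant, while this rendition uses an immutable and a fallback function, and ProxyFlowSketch is a name invented here. It is meant to mirror the branches of the table, not to match the bytecode or its gas costs.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.4;

/// Illustrative only: a readable rendition of the proxy's two branches.
contract ProxyFlowSketch {
    /// Stands in for the factory address hardcoded into the real bytecode.
    address internal immutable _factory;

    constructor(address factory) {
        _factory = factory;
    }

    fallback() external payable {
        if (msg.sender == _factory) {
            assembly {
                // Word 0 is the value, word 1 is the slot.
                sstore(calldataload(0x20), calldataload(0x00))
                // No extra calldata: plain upgrade, return empty data.
                if iszero(gt(calldatasize(), 0x40)) { return(0x00, 0x00) }
                // Extra calldata: delegatecall the new implementation with it.
                let t := sub(calldatasize(), 0x40)
                calldatacopy(0x00, 0x40, t)
                let succ := delegatecall(gas(), calldataload(0x00), 0x00, t, 0x00, 0x00)
                returndatacopy(0x00, 0x00, returndatasize())
                if iszero(succ) { revert(0x00, returndatasize()) }
                return(0x00, returndatasize())
            }
        }
        assembly {
            // Anyone else: forward the full calldata to the stored implementation.
            calldatacopy(0x00, 0x00, calldatasize())
            let impl := sload(0x360894a13ba1a3210667c828492db98dca3e2076cc3735a920a3ca505d382bbc)
            let succ := delegatecall(gas(), impl, 0x00, calldatasize(), 0x00, 0x00)
            returndatacopy(0x00, 0x00, returndatasize())
            if iszero(succ) { revert(0x00, returndatasize()) }
            return(0x00, returndatasize())
        }
    }
}
```

Both assembly blocks end in return or revert, so writing over memory from offset 0x00 is harmless here: control never goes back to Solidity code.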

ross friends so damn proud meme — Freakin' chad.

And that concludes the analysis of the proxy bytecode that gets deployed by this contract factory! Finally, let us now jump into the upgrade functions.

Upgrade functions

/// @dev Upgrades the proxy to point to `implementation`.
/// The caller of this function must be the admin of the proxy on this factory.
function upgrade(address proxy, address implementation) public payable {
    upgradeAndCall(proxy, implementation, _emptyData());
}

Function upgrade

This function is used to upgrade the implementation address of a given proxy, and it just uses the logic of upgradeAndCall but with empty data on the third argument. The _emptyData internal function is a neat way of passing an empty bytes array as calldata, which is cheaper than if that parameter were located in memory.

/// @dev Helper function to return an empty bytes calldata.
function _emptyData() internal pure returns (bytes calldata data) {
    /// @solidity memory-safe-assembly
    assembly {
        data.length := 0
    }
}

Nice!

Let's look at the upgradeAndCall function.

/// @dev Upgrades the proxy to point to `implementation`.
/// Then, calls the proxy with abi encoded `data`.
/// The caller of this function must be the admin of the proxy on this factory.
function upgradeAndCall(address proxy, address implementation, bytes calldata data)
    public
    payable
{
    /// @solidity memory-safe-assembly
    assembly {
        // Check if the caller is the admin of the proxy.
        mstore(0x0c, address())
        mstore(0x00, proxy)
        if iszero(eq(sload(keccak256(0x0c, 0x20)), caller())) {
            mstore(0x00, _UNAUTHORIZED_ERROR_SELECTOR)
            revert(0x1c, 0x04)
        }
        // Set up the calldata to upgrade the proxy.
        let m := mload(0x40)
        mstore(m, implementation)
        mstore(add(m, 0x20), _IMPLEMENTATION_SLOT)
        calldatacopy(add(m, 0x40), data.offset, data.length)
        // Try upgrading the proxy and revert upon failure.
        if iszero(call(gas(), proxy, callvalue(), m, add(0x40, data.length), 0x00, 0x00)) {
            // Revert with the `UpgradeFailed` selector if there is no error returndata.
            if iszero(returndatasize()) {
                mstore(0x00, _UPGRADE_FAILED_ERROR_SELECTOR)
                revert(0x1c, 0x04)
            }
            // Otherwise, bubble up the returned error.
            returndatacopy(0x00, 0x00, returndatasize())
            revert(0x00, returndatasize())
        }
        // Emit the {Upgraded} event.
        log3(0, 0, _UPGRADED_EVENT_SIGNATURE, proxy, implementation)
    }
}

Function upgradeAndCall

Let's go through the assembly:

The first lines are the same as in function changeAdmin - we check if the caller is the admin of the proxy in question. Notice that this is the admin address stored in the factory contract, the one address capable of calling the upgrade functions. And the factory address itself is the admin address inside the ERC1967Proxy contract, as we have seen in the previous bytecode analysis. In other words, proxy upgrades can only be done through the factory contract.
We load the free memory pointer from the right place in memory - 0x40 - and starting at that offset we save in memory the new implementation address, the implementation slot, and the extra data. Because we've analysed the proxy bytecode, we know this parameter concatenation will be the calldata passed to the proxy call. The proxy will recognize the caller to be the factory and will write the first calldata word - in this case, the implementation address - into the storage slot equal to the second calldata word - the implementation slot. If the extra data is not empty, it will also delegatecall to the new implementation, as we've seen.
If the upgrade call fails we will revert the execution. If there's returned error data, we bubble that error, but if not we raise the UpgradeFailed error.
If the call was successful, we emit the Upgraded event using a log3 opcode (it takes the event signature as topic 1, the proxy address as topic 2 and the new implementation address as topic 3).
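
If the assembly is hard to picture, here is roughly what the call step above would look like in high-level Solidity. Treat it as a sketch for intuition only, not the factory's code: the names UpgradeCallSketch, buildCalldata and upgradeCall are made up for this example, and the admin check and the event are left out. The slot constant is the standard ERC-1967 implementation slot.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.4;

// Illustrative only: a high-level approximation of the call made in `upgradeAndCall`.
library UpgradeCallSketch {
    // Mirrors the factory's custom error of the same name.
    error UpgradeFailed();

    // Standard ERC-1967 implementation slot.
    bytes32 internal constant IMPLEMENTATION_SLOT =
        0x360894a13ba1a3210667c828492db98dca3e2076cc3735a920a3ca505d382bbc;

    // Builds `implementation (32 bytes) || slot (32 bytes) || data`,
    // which is what the assembly lays out starting at the free memory pointer.
    function buildCalldata(address implementation, bytes memory data)
        internal
        pure
        returns (bytes memory)
    {
        return abi.encodePacked(uint256(uint160(implementation)), IMPLEMENTATION_SLOT, data);
    }

    // Calls the proxy and either bubbles up its revert data or reverts with `UpgradeFailed`.
    function upgradeCall(address proxy, address implementation, bytes memory data, uint256 value)
        internal
    {
        (bool ok, bytes memory ret) = proxy.call{value: value}(buildCalldata(implementation, data));
        if (!ok) {
            // No returndata: fall back to the custom error.
            if (ret.length == 0) revert UpgradeFailed();
            // Otherwise re-throw the proxy's own error bytes untouched.
            assembly {
                revert(add(ret, 0x20), mload(ret))
            }
        }
    }
}
```

The high-level version copies the returndata into a fresh bytes array and pays for ABI bookkeeping, which is exactly the overhead the hand-written assembly avoids.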

This concludes the upgrade functions, and we now move into the deployment functions.

big enter button punch — That's how I deploy proxies

Deploy functions

/// @dev Deploys a proxy for `implementation`, with `admin`,
/// and returns its address.
/// The value passed into this function will be forwarded to the proxy.
function deploy(address implementation, address admin) public payable returns (address proxy) {
    proxy = deployAndCall(implementation, admin, _emptyData());
}

Function deploy

The first one is deploy, which just calls deployAndCall, using the same _emptyData trick as in the upgrade function. We pass here the proxy implementation address and the address that will be registered as the proxy admin. The function will return the deployed proxy address.

/// @dev Deploys a proxy for `implementation`, with `admin`,
/// and returns its address.
/// The value passed into this function will be forwarded to the proxy.
/// Then, calls the proxy with abi encoded `data`.
function deployAndCall(address implementation, address admin, bytes calldata data)
    public
    payable
    returns (address proxy)
{
    proxy = _deploy(implementation, admin, bytes32(0), false, data);
}

Function deployAndCall

The deployAndCall function just makes use of the internal _deploy function, which can optionally perform a deterministic deploy. Here, bytes32(0) is passed as the salt parameter and the useSalt parameter is set to false, so the salt is simply ignored.
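
To make this concrete, here is a small sketch of how a contract might call deployAndCall. The interface only repeats the function signature shown above; everything else is an assumption for the sake of the example: the names IFactoryLike and DeployAndCallSketch are invented, and initialize(address) is a hypothetical initializer on the implementation.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.4;

// Minimal interface with the one function used here (signature copied from the factory).
interface IFactoryLike {
    function deployAndCall(address implementation, address admin, bytes calldata data)
        external
        payable
        returns (address proxy);
}

// Illustrative only: deploys a proxy and initializes it in the same transaction.
contract DeployAndCallSketch {
    function deployInitialized(
        IFactoryLike factory,
        address implementation,
        address admin,
        address owner
    ) external payable returns (address proxy) {
        // `initialize(address)` is a hypothetical function on the implementation.
        bytes memory initData = abi.encodeWithSignature("initialize(address)", owner);
        // Any ETH sent along is forwarded by the factory to the proxy call.
        proxy = factory.deployAndCall{value: msg.value}(implementation, admin, initData);
    }
}
```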

/// @dev Deploys a proxy for `implementation`, with `admin`, `salt`,
/// and returns its deterministic address.
/// The value passed into this function will be forwarded to the proxy.
function deployDeterministic(address implementation, address admin, bytes32 salt)
    public
    payable
    returns (address proxy)
{
    proxy = deployDeterministicAndCall(implementation, admin, salt, _emptyData());
}

/// @dev Deploys a proxy for `implementation`, with `admin`, `salt`,
/// and returns its deterministic address.
/// The value passed into this function will be forwarded to the proxy.
/// Then, calls the proxy with abi encoded `data`.
function deployDeterministicAndCall(
    address implementation,
    address admin,
    bytes32 salt,
    bytes calldata data
) public payable returns (address proxy) {
    /// @solidity memory-safe-assembly
    assembly {
        // If the salt does not start with the zero address or the caller.
        if iszero(or(iszero(shr(96, salt)), eq(caller(), shr(96, salt)))) {
            mstore(0x00, _SALT_DOES_NOT_START_WITH_CALLER_ERROR_SELECTOR)
            revert(0x1c, 0x04)
        }
    }
    proxy = _deploy(implementation, admin, salt, true, data);
}

Functions deployDeterministic and deployDeterministicAndCall

Function deployDeterministic uses deployDeterministicAndCall like deploy uses deployAndCall. Here we see that there's an additional salt parameter, which will be used for the create2 deterministic deploy in _deploy. Interestingly, there's a check on this new parameter, which can potentially revert the execution with a SaltDoesNotStartWithCaller error. Let's break it down:

There is an or opcode inside the first iszero opcode. This means we will skip the if block if one of the or sides returns a non-zero value.
The first condition inside the or is iszero(shr(96, salt)). The shr opcode performs a shift right operation, in this case of 96 bits, or 12 bytes. Checking if this result is zero is the equivalent of checking if the left-most 20 bytes of salt are all zero. If this is true, we are good.
The second condition is eq(caller(), shr(96, salt)). Doing the same shift right operation, this time we're checking if the left-most 20 bytes of salt are equal to the caller address (20 bytes as well), and we also skip the if block if this condition is met.

So the TL;DR here is that the salt parameter's left-most 20 bytes need to either be zero or equal to the function caller. This is a design choice, since create2 itself places no such constraint on the salt. Here is the reasoning behind it:

Adding the caller address into the salt parameter and enforcing this condition is a neat way of providing optional front-running protection. If someone sees this deploy transaction and wants to snipe the sweet deterministic address where the contract was getting deployed to, executing a transaction with the same salt and higher gas fee will not work, for the caller will no longer match the first 20 bytes of the salt parameter.
In case one does not need front-running protection - a public goods deploy, for example, where it's irrelevant who the deployer is - one can let those first 20 bytes be zero, and the check is essentially bypassed.
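
The same rule can be written in plain Solidity, which also makes it easy to build a valid salt off-chain or in a deploy script. This is just a sketch: SaltSketch, isAllowed and withCaller are hypothetical helper names, not part of Solady.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.4;

// Illustrative only: the salt rule from `deployDeterministicAndCall`, without assembly.
library SaltSketch {
    // Same predicate as the assembly check:
    // the top 20 bytes of `salt` must be zero or equal to `caller_`.
    function isAllowed(bytes32 salt, address caller_) internal pure returns (bool) {
        // Converting bytes32 to bytes20 keeps the left-most 20 bytes.
        address prefix = address(bytes20(salt));
        return prefix == address(0) || prefix == caller_;
    }

    // Packs `caller_` into the top 20 bytes and a free-form 12-byte suffix into the rest.
    function withCaller(address caller_, uint96 suffix) internal pure returns (bytes32) {
        return bytes32((uint256(uint160(caller_)) << 96) | uint256(suffix));
    }
}
```

A salt built with withCaller(msg.sender, n) passes the check only for that sender, while any salt whose top 20 bytes are zero passes for everyone.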

Run Forrest run — Frontrun, Forrest, frontrun!

Now we want to move on to the internal _deploy function, but because it uses another internal function - _initCode - we will check this one out first.

function _initCode() internal view returns (bytes memory m) {
    /// @solidity memory-safe-assembly
    assembly {
        /**
         * Bytecode comment big behemoth
         */

        m := mload(0x40)
        // forgefmt: disable-start
        switch shr(112, address())
        case 0 {
            // If the factory's address has six or more leading zero bytes.
            mstore(add(m, 0x75), 0x604c573d6000fd) // 7
            mstore(add(m, 0x6e), 0x3d3560203555604080361115604c5736038060403d373d3d355af43d6000803e) // 32
            mstore(add(m, 0x4e), 0x3735a920a3ca505d382bbc545af43d6000803e604c573d6000fd5b3d6000f35b) // 32
            mstore(add(m, 0x2e), 0x14605157363d3d37363d7f360894a13ba1a3210667c828492db98dca3e2076cc) // 32
            mstore(add(m, 0x0e), address()) // 14
            mstore(m, 0x60793d8160093d39f33d3d336d) // 9 + 4
        }
        default {
            mstore(add(m, 0x7b), 0x6052573d6000fd) // 7
            mstore(add(m, 0x74), 0x3d356020355560408036111560525736038060403d373d3d355af43d6000803e) // 32
            mstore(add(m, 0x54), 0x3735a920a3ca505d382bbc545af43d6000803e6052573d6000fd5b3d6000f35b) // 32
            mstore(add(m, 0x34), 0x14605757363d3d37363d7f360894a13ba1a3210667c828492db98dca3e2076cc) // 32
            mstore(add(m, 0x14), address()) // 20
            mstore(m, 0x607f3d8160093d39f33d3d3373) // 9 + 4
        }
        // forgefmt: disable-end
    }
}

I removed the huge proxy bytecode breakdown comment because we went through it already in the ERC1967 bytecode explanation section. As we can see there are 2 branches of bytecode compilation: one when the first 6 bytes of the factory address are zero (damn!), the other when this condition isn't met. Indeed, the canonical ERC1967Factory address is 0x0000000000006396FF2a80c067f99B3d2Ab4Df24, which has 6 leading zero bytes. Using this contract will allow the deployed proxy contract bytecode to be smaller, thus cheaper.

The bytecode shown above was already explained previously. Just be mindful that the bytecode writing into memory is being done from the end to the beginning. In other words, you read the beginning of the bytecode in the last mstore of each switch branch. Now on to the _deploy function.
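
If the back-to-front writes are confusing, it may help to see the default branch in reading order. The sketch below concatenates the same chunks from first byte to last; the library and function names are invented for this example. It is meant to show the byte order only, not to reproduce the output of initCodeHash exactly.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.4;

// Illustrative only: the chunks of the default `_initCode` branch, in reading order.
library InitCodeSketch {
    function defaultBranch(address factory) internal pure returns (bytes memory) {
        return abi.encodePacked(
            hex"607f3d8160093d39f33d3d3373", // 9 + 4 bytes, written LAST by `mstore(m, ...)`
            factory, // 20 bytes, the `address()` word
            hex"14605757363d3d37363d7f360894a13ba1a3210667c828492db98dca3e2076cc", // 32
            hex"3735a920a3ca505d382bbc545af43d6000803e6052573d6000fd5b3d6000f35b", // 32
            hex"3d356020355560408036111560525736038060403d373d3d355af43d6000803e", // 32
            hex"6052573d6000fd" // 7 bytes, written FIRST by `mstore(add(m, 0x7b), ...)`
        );
    }
}
```

Each mstore writes a full 32-byte word, so the zero padding of a later write lands on top of memory that an earlier, higher write did not need. That is why the order of the stores matters.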

/// @dev Deploys the proxy, with optionality to deploy deterministically with a `salt`.
function _deploy(
    address implementation,
    address admin,
    bytes32 salt,
    bool useSalt,
    bytes calldata data
) internal returns (address proxy) {
    bytes memory m = _initCode();
    /// @solidity memory-safe-assembly
    assembly {
        // Create the proxy.
        switch useSalt
        case 0 { proxy := create(0, add(m, 0x13), 0x89) }
        default { proxy := create2(0, add(m, 0x13), 0x89, salt) }
        // Revert if the creation fails.
        if iszero(proxy) {
            mstore(0x00, _DEPLOYMENT_FAILED_ERROR_SELECTOR)
            revert(0x1c, 0x04)
        }

        // Set up the calldata to set the implementation of the proxy.
        mstore(m, implementation)
        mstore(add(m, 0x20), _IMPLEMENTATION_SLOT)
        calldatacopy(add(m, 0x40), data.offset, data.length)
        // Try setting the implementation on the proxy and revert upon failure.
        if iszero(call(gas(), proxy, callvalue(), m, add(0x40, data.length), 0x00, 0x00)) {
            // Revert with the `DeploymentFailed` selector if there is no error returndata.
            if iszero(returndatasize()) {
                mstore(0x00, _DEPLOYMENT_FAILED_ERROR_SELECTOR)
                revert(0x1c, 0x04)
            }
            // Otherwise, bubble up the returned error.
            returndatacopy(0x00, 0x00, returndatasize())
            revert(0x00, returndatasize())
        }

        // Store the admin for the proxy.
        mstore(0x0c, address())
        mstore(0x00, proxy)
        sstore(keccak256(0x0c, 0x20), admin)

        // Emit the {Deployed} event.
        log4(0, 0, _DEPLOYED_EVENT_SIGNATURE, proxy, implementation, admin)
    }
}

Function _deploy

Let's break it down.

We start by fetching the proxy's bytecode using the internal _initCode function. Wow, non-assembly code!
If useSalt is true, it means we want a deterministic deployment, so we deploy with create2 using the provided salt. If not, we use create. If you're wondering why we're passing the memory offset as add(m, 0x13), that's because the _initCode function returns some leading zeros due to the last mstore done, which is only writing 13 bytes. So we remove the other 19 bytes (and 19 = 0x13 in hexadecimal). 0x89 is the largest possible size of the proxy bytecode.
If the returned address is zero, it means the deployment failed for some reason (e.g. the deterministic address has already been taken). So execution reverts with a DeploymentFailed error.
The next step is to store the implementation address on the deployed proxy contract (remember, the factory is the one able to do this), and we also forward the data parameter in the end of the calldata for the proxy call. If this data is not empty, the proxy will do an additional delegatecall to the implementation contract, as we've seen previously.
If the call was not successful, the execution gets reverted, either with a DeploymentFailed error or with error data coming from the failed call.
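The proxy is deployed and set. Now we hash the proxy address together with the factory's own address (address(), not the caller) to get the slot where we should store the admin address. Because the two mstore words overlap, the 32 bytes that actually get hashed are the proxy address followed by the last 12 bytes of the factory address.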
Finally, we use log4 (3 indexed parameters plus the signature) to emit a Deployed event.
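
The admin slot computation deserves a second look, because the two overlapping mstore calls hide what is really being hashed. mstore(0x0c, address()) puts the factory address at bytes 0x18 to 0x2b, then mstore(0x00, proxy) puts the proxy address at bytes 0x0c to 0x1f, overwriting the first 8 bytes of the factory address. Here is my reading of it in plain Solidity; AdminSlotSketch and adminSlot are made-up names for this sketch.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.4;

// Illustrative only: the storage slot where the factory keeps a proxy's admin.
library AdminSlotSketch {
    function adminSlot(address factory, address proxy) internal pure returns (bytes32) {
        // Memory 0x0c..0x2b after both stores:
        // proxy (20 bytes) followed by the low 12 bytes of the factory address.
        return keccak256(abi.encodePacked(proxy, bytes12(uint96(uint160(factory)))));
    }
}
```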

We're almost done! There are 2 functions left.

office meme is it over yet — Almost! Thank you for reaching this far!

/// @dev Returns the address of the proxy deployed with `salt`.
function predictDeterministicAddress(bytes32 salt) public view returns (address predicted) {
    bytes32 hash = initCodeHash();
    /// @solidity memory-safe-assembly
    assembly {
        // Compute and store the bytecode hash.
        mstore8(0x00, 0xff) // Write the prefix.
        mstore(0x35, hash)
        mstore(0x01, shl(96, address()))
        mstore(0x15, salt)
        predicted := keccak256(0x00, 0x55)
        // Restore the part of the free memory pointer that has been overwritten.
        mstore(0x35, 0)
    }
}

/// @dev Returns the initialization code hash of the proxy.
/// Used for mining vanity addresses with create2crunch.
function initCodeHash() public view returns (bytes32 result) {
    bytes memory m = _initCode();
    /// @solidity memory-safe-assembly
    assembly {
        result := keccak256(add(m, 0x13), 0x89)
    }
}

Functions predictDeterministicAddress and initCodeHash

The initCodeHash function does one clear job: hash the proxy bytecode and return its result. This is pretty straightforward. The function's output will actually be used in the first function, predictDeterministicAddress.

The idea of this function is to allow someone to calculate the deployment address in advance, something possible thanks to create2. The computed address will be the last 20 bytes of the hash of an interesting concatenation:

0xff
The address of the deployer. In our case, it will always be the factory address.
A salt parameter, used to mine a specific address.
The deploy bytecode hash.
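
Written without assembly, the concatenation above is the usual create2 address formula. A minimal sketch, with the hypothetical names Create2Sketch and predict:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.4;

// Illustrative only: the create2 address formula in high-level Solidity.
library Create2Sketch {
    function predict(address deployer, bytes32 salt, bytes32 initCodeHash_)
        internal
        pure
        returns (address)
    {
        // 1 + 20 + 32 + 32 = 85 = 0x55 bytes, the same length the assembly hashes.
        bytes32 digest = keccak256(abi.encodePacked(bytes1(0xff), deployer, salt, initCodeHash_));
        // The address is the low 20 bytes of the digest.
        return address(uint160(uint256(digest)));
    }
}
```

For the factory, deployer would be the factory's own address and initCodeHash_ the value returned by initCodeHash.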

If you're wondering why the final mstore is needed, that's because mstore(0x35, hash) will overwrite a portion of the memory location where the free memory pointer is stored - 0x40. Because the free memory pointer can never be large enough to have data on those left-most bytes - due to the upper bound on the EVM's memory - they can safely be zeroed out at the end. And we care about this because this function is public, so it can be used internally (or by a contract inheriting it), and we don't want to mess up any function calling this one internally. Clever!

I'm impressed you reached this far, thanks for reading! Check out jtriley on the bird I mean X app and Vectorized's beautiful Solady repo.

If you liked it, please share! It pleases the gods of assembly. See ya!