
The Coin That Forgets: How Zerocash Turned Bitcoin's Public Ledger Into a Privacy Engine

A Bitcoin transaction is a postcard. Anyone running a block explorer can read the sender, the recipient, and the amount, and a small academic industry has spent the last several years showing how to use those postcards to reconstruct who paid whom, when, and for what. A first round of fixes (most prominently Zerocoin in 2013) broke the link between old and new coins, but still left destinations and amounts on the table. The 2014 paper Zerocash: Decentralized Anonymous Payments from Bitcoin, by Ben-Sasson, Chiesa, Garman, Green, Miers, Tromer, and Virza, made a louder claim: hide the origin, hide the destination, hide the amount, and do it on a public ledger that anyone can audit.

A ledger you can audit but cannot read. The authors pull this off with a small piece of cryptographic machinery called a zk-SNARK, attached to a payment primitive they call a "pour." The idea is no longer a thought experiment. A full currency called Zcash launches on October 28, three weeks from today, on the same architecture, and every Zerocash pour produces a proof that is roughly the size of a tweet and checks in a few milliseconds. This briefing walks through what the paper says, why the math works, and where the rough edges still are.

The paper at a glance

  • Title: Zerocash: Decentralized Anonymous Payments from Bitcoin
  • Authors: Eli Ben-Sasson, Alessandro Chiesa, Christina Garman, Matthew Green, Ian Miers, Eran Tromer, Madars Virza
  • Venue: 2014 IEEE Symposium on Security and Privacy (Oakland), May 18 to 21, 2014, San Jose, California
  • DOI: 10.1109/SP.2014.36
  • IACR ePrint: 2014/349 (extended technical report, roughly 56 pages, May 18, 2014)
  • Project page: zerocash-project.org
  • Performance headline: transactions under 1 kB, verification under 6 ms, one to two orders of magnitude smaller and faster than Zerocoin and competitive with plain Bitcoin.

A ledger that remembers too much

Bitcoin's transparency is the source of its auditability and the source of its privacy problem. Every transaction is a tuple of "this address sent this many coins to that address," copied onto thousands of nodes and stamped forever. Researchers have shown that even a small amount of side information (a forum username, a merchant's public payout address, a Tor exit node) can map those pseudonymous addresses to real people and rebuild a detailed picture of their financial lives. A growing body of work, including de-anonymization studies by Ron and Shamir and by Meiklejohn and others, treats the Bitcoin ledger less as an anonymous cash system and more as a permanent, indexed, financial gossip column.

The 2013 Zerocoin system proposed a partial fix. Old coins went into a cryptographic pool and new coins came out, breaking the direct chain of ownership. The approach worked for unlinking origins, but it had three uncomfortable limits. First, it only hid who paid whom, not what was paid or to which new address. Second, the protocol forced fixed denominations, so splitting a 1 BTC coin into a 0.4 and a 0.6 payment required multiple on-chain transactions and a small explosion in proof data. Third, the proofs themselves were large. The Zerocash authors report that Zerocoin's double-discrete-logarithm proofs exceeded 45 kB per spend and required about 450 ms to verify on their reference machine, awkward to stuff into every block.

Zerocash attacks all three problems at once, and that is the reason it matters even if you never spend a single shielded coin.

The core idea: prove validity, leak nothing

The heart of the paper is a construction called a Decentralized Anonymous Payment scheme, or DAP. A DAP is an extension you bolt onto any append-only ledger. It adds two new transaction types: mint, which converts public "basecoin" into private "zerocoin," and pour, which consumes old zerocoins and produces new ones. The pour is where the privacy happens, and it is where the cryptography gets interesting.

A pour consumes up to two old coins and produces up to two new coins, plus an optional public output (used to redeem back to basecoin or to pay transaction fees). A naive version of this would simply publish the old coins' identifiers, but that would link the old and new coins together and undo the whole exercise. The authors' move is to make the pour cryptographically self-certifying: instead of revealing anything about the inputs or the new owners, the sender attaches a short zero-knowledge proof that all of the following are true, without disclosing any of the underlying data:

  • The sender knows the secret material behind two existing coin commitments that already live in the global ledger.
  • Those commitments are correctly placed in a Merkle tree whose root is the one currently agreed upon by the network.
  • The total value of the input coins equals the total value of the new coins plus the public output, so no money is created or destroyed.
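  • The serial numbers being revealed are correctly derived from the input coins' secrets. Nodes separately reject any serial number that already appears on the ledger, so the same coin cannot be spent twice.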

The proof is built with a zk-SNARK, a zero-knowledge Succinct Non-interactive ARGument of Knowledge. "Succinct" means the proof is a few hundred bytes and the verifier can check it in milliseconds, which is what makes it cheap enough to embed in every transaction. "Non-interactive" means the sender can post the proof as a single blob. There is no back-and-forth with the verifier. "Zero knowledge" means the verifier learns only that the statement is true, nothing else.

To get a feel for the move, imagine publishing a sealed envelope that contains a proof that "the lottery ticket inside has a winning number," without opening the envelope. The reader can check the seal, the auditor can check the proof, and nobody learns the number itself. zk-SNARKs generalize that trick to arbitrary mathematical statements, and the authors use them to prove a fairly long checklist of "this pour was constructed honestly" facts in one go.

The mechanics of a pour, end to end

The diagram below traces the flow from a sender's two private input coins, through the cryptographic machinery, to the public transaction that hits the ledger. The key components are the user's witness, the formal statement being proved, the SNARK proof itself, and the structure of the resulting pour transaction.

[Figure: Core Architecture/Flow]

Concretely, every coin in Zerocash is a small data object with four fields: a value in basecoin units, an address public key identifying the owner, a serial number used to prevent double-spending, and a commitment (a SHA-256-based hash of those three fields plus a secret randomizer). The commitment is what gets put on the ledger at mint time, and a global Merkle tree of all coin commitments is maintained by every full node. A proof that a commitment belongs to the tree grows only logarithmically with the number of commitments, so spending a coin does not require a linear scan of history.
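The coin commitment and the logarithmic membership check can be sketched in a few lines of Python. This is a toy illustration, not the paper's encoding: the `Coin` class, the helper `H`, the length-prefixed hashing, and the field name `rho` are all assumptions made for the example. In particular, the sketch commits to a secret seed `rho` from which a serial number would later be derived, so that nothing spend-related is visible at mint time.

```python
import hashlib
from dataclasses import dataclass


def H(*parts: bytes) -> bytes:
    """SHA-256 over length-prefixed parts (toy encoding, an assumption)."""
    h = hashlib.sha256()
    for p in parts:
        h.update(len(p).to_bytes(4, "big") + p)
    return h.digest()


@dataclass(frozen=True)
class Coin:
    value: int         # amount in basecoin units
    owner_pk: bytes    # address public key of the owner
    rho: bytes         # secret seed for the serial number (hypothetical name)
    randomizer: bytes  # secret randomness that hides the other fields

    def commitment(self) -> bytes:
        return H(b"cm", self.value.to_bytes(8, "big"),
                 self.owner_pk, self.rho, self.randomizer)


def merkle_root_and_path(leaves, index):
    """Return (root, sibling path) for leaves[index].

    Assumes len(leaves) is a power of two to keep the sketch short.
    """
    path, level, i = [], list(leaves), index
    while len(level) > 1:
        path.append(level[i ^ 1])  # sibling at this level
        level = [H(b"node", level[j], level[j + 1])
                 for j in range(0, len(level), 2)]
        i //= 2
    return level[0], path


def verify_path(leaf, index, path, root) -> bool:
    """Recompute the root from a leaf and its sibling path."""
    node = leaf
    for sibling in path:
        if index & 1:   # current node is a right child
            node = H(b"node", sibling, node)
        else:           # current node is a left child
            node = H(b"node", node, sibling)
        index >>= 1
    return node == root
```

With 8 commitments the path holds 3 sibling hashes, and with about a million it would hold 20, which is the sense in which membership stays cheap as the ledger grows.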

A pour transaction, as it appears on the ledger, has five public parts. First, the serial numbers of the consumed coins, which are revealed to mark those coins as spent. Second, the commitments of the two new coins, which are added to the Merkle tree. Third, two ciphertexts, encrypted with the recipients' public keys using a key-private encryption scheme called ECIES, so that only the intended recipients can open them and recover the new coins' secrets. Fourth, the zk-SNARK proof itself, a blob of a few hundred bytes that ties it all together. Fifth, an optional public output containing a value, a basecoin-style destination address, and an ECDSA signature so anyone can verify that the sender authorized the redemption.
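A minimal sketch of how a node might process those public parts follows. The names `PourTx`, `Ledger`, and `accept_pour` are hypothetical, the signature check on the public output is omitted, and the SNARK verifier is passed in as an opaque callable because no real proof system is modeled here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class PourTx:
    serial_numbers: Tuple[bytes, ...]   # revealed to mark inputs as spent
    new_commitments: Tuple[bytes, ...]  # appended to the Merkle tree
    ciphertexts: Tuple[bytes, ...]      # coin secrets for the recipients
    proof: bytes                        # the zk-SNARK proof blob
    public_value: int = 0               # optional public output
    public_address: Optional[bytes] = None


@dataclass
class Ledger:
    spent_serials: Set[bytes] = field(default_factory=set)
    commitments: List[bytes] = field(default_factory=list)


def accept_pour(ledger: Ledger, tx: PourTx,
                verify_proof: Callable[[PourTx], bool]) -> bool:
    """Apply a pour to the ledger if it passes the public checks."""
    # A serial number may appear only once, within the tx and across history.
    if len(set(tx.serial_numbers)) != len(tx.serial_numbers):
        return False
    if any(sn in ledger.spent_serials for sn in tx.serial_numbers):
        return False
    if tx.public_value < 0:
        return False
    # Everything else is vouched for by the proof (stubbed in this sketch).
    if not verify_proof(tx):
        return False
    ledger.spent_serials.update(tx.serial_numbers)
    ledger.commitments.extend(tx.new_commitments)
    return True
```

Note that the node never learns which commitments were consumed. It only records the serial numbers, which is enough to reject a replay of the same pour.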

Underneath the proof sits the witness, a private set of inputs that only the sender knows: the input coin values, the secret address keys, the randomizers that went into the old commitments, and the Merkle-tree paths that show those commitments are inside the current tree. The statement being proved in zero knowledge is a long checklist written in the language of arithmetic circuits, which is what the SNARK machinery is built to evaluate efficiently. The proof that comes out is what gets attached to the pour.
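To make the checklist concrete, here is the statement written as an ordinary Python predicate that sees both the public inputs and the private witness. In the real system this logic is compiled to an arithmetic circuit and only the prover ever evaluates it on the witness. The function names, the hash-based stand-ins for address derivation and the pseudorandom function, and the field layout are assumptions of this sketch rather than the paper's exact definitions.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple


def H(tag: bytes, *parts: bytes) -> bytes:
    """Tagged SHA-256 over length-prefixed parts (toy encoding)."""
    h = hashlib.sha256(tag)
    for p in parts:
        h.update(len(p).to_bytes(4, "big") + p)
    return h.digest()


def addr_pk(a_sk: bytes) -> bytes:
    return H(b"addr", a_sk)            # stand-in for address derivation


def commit(value: int, a_pk: bytes, rho: bytes, r: bytes) -> bytes:
    return H(b"cm", value.to_bytes(8, "big"), a_pk, rho, r)


def serial(a_sk: bytes, rho: bytes) -> bytes:
    return H(b"sn", a_sk, rho)         # stand-in for the serial-number PRF


def root_from_path(leaf: bytes, index: int, path: Tuple[bytes, ...]) -> bytes:
    node = leaf
    for sib in path:
        node = H(b"node", sib, node) if index & 1 else H(b"node", node, sib)
        index >>= 1
    return node


@dataclass(frozen=True)
class InputWitness:
    value: int
    a_sk: bytes               # secret address key
    rho: bytes                # serial-number seed
    r: bytes                  # commitment randomizer
    index: int                # leaf position in the tree
    path: Tuple[bytes, ...]   # sibling hashes, leaf to root


def pour_statement_holds(root, serials, new_cms, v_pub, inputs, outputs) -> bool:
    """Public: root, serials, new_cms, v_pub. Private: inputs, outputs.

    Each output is a tuple (value, a_pk, rho, r).
    """
    if len(inputs) != len(serials) or len(outputs) != len(new_cms):
        return False
    for w, sn in zip(inputs, serials):
        cm = commit(w.value, addr_pk(w.a_sk), w.rho, w.r)
        if root_from_path(cm, w.index, w.path) != root:
            return False              # commitment is not in the tree
        if serial(w.a_sk, w.rho) != sn:
            return False              # serial number not derived honestly
    for (v, pk, rho, r), cm in zip(outputs, new_cms):
        if v < 0 or commit(v, pk, rho, r) != cm:
            return False              # new commitment is malformed
    total_in = sum(w.value for w in inputs)
    total_out = sum(o[0] for o in outputs) + v_pub
    return v_pub >= 0 and total_in == total_out
```

A verifier in the real protocol never runs this function. It checks a short proof that some witness makes it return true for the given public inputs.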

A useful way to think about why the proof is so short: the verifier is not being asked to redo the computation. They are being asked to confirm that a particular precomputed cryptographic object (think of it as a signature over a long algebraic identity) is consistent with the public inputs. The mathematics that makes this possible comes from a family of constructions called quadratic arithmetic programs. The specific instantiation here is the BCTV14 zk-SNARK from the SCIPR Lab, the same construction Zcash adopted for its first protocol version, and the performance numbers in the paper (under 1 kB per transaction, under 6 ms to verify) are measured against it.

What the paper actually claims

The contribution splits into two pieces, and each matters to a reader for a different reason.

First, the paper formalizes the DAP abstraction. A DAP is defined by six operations: a one-time Setup that publishes public parameters, CreateAddress for generating key pairs, Mint for converting basecoin into zerocoin, Pour for the private transfer, VerifyTransaction for checking either kind of transaction, and Receive for scanning the ledger for incoming coins. Against that abstraction, the authors define three security properties. Ledger indistinguishability says the public view of the ledger reveals no more than the trivial fact that some valid transactions occurred. Transaction non-malleability says an adversary cannot tamper with pending transactions before they are confirmed. Balance says no adversary can spend more than they minted or were paid.
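One way to picture the abstraction is as an interface with six methods. The Python rendering below is only a sketch: the class name `DAPScheme`, the method names, the argument lists, and the use of `Any` for every type are assumptions chosen for readability and are not the paper's formal syntax.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Sequence, Tuple


class DAPScheme(ABC):
    """Hypothetical Python view of the six DAP operations."""

    @abstractmethod
    def setup(self, security_parameter: int) -> Any:
        """One-time generation of the public parameters."""

    @abstractmethod
    def create_address(self, pp: Any) -> Tuple[Any, Any]:
        """Return an (address public key, address secret key) pair."""

    @abstractmethod
    def mint(self, pp: Any, value: int, addr_pk: Any) -> Tuple[Any, Any]:
        """Turn basecoin into a coin; return (coin, mint transaction)."""

    @abstractmethod
    def pour(self, pp: Any, root: Any, old_coins: Sequence[Any],
             old_secret_keys: Sequence[Any], paths: Sequence[Any],
             new_values: Sequence[int], new_addr_pks: Sequence[Any],
             public_value: int, info: bytes) -> Tuple[List[Any], Any]:
        """Spend old coins privately; return (new coins, pour transaction)."""

    @abstractmethod
    def verify_transaction(self, pp: Any, tx: Any, ledger: Any) -> bool:
        """Check a mint or pour transaction against the ledger."""

    @abstractmethod
    def receive(self, pp: Any, addr_pk: Any, addr_sk: Any,
                ledger: Any) -> List[Any]:
        """Scan the ledger for unspent coins paid to this address."""
```

The three security properties are then statements about what an adversary can and cannot achieve when interacting with any implementation of this interface.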

Second, the paper shows that a DAP can be instantiated concretely and efficiently, with the instantiation called Zerocash. The cryptographic ingredients are SHA-256 (for commitments and Merkle hashing), SHA-256-based pseudorandom functions (for deriving serial numbers from secret keys), ECDSA (for signing the public output), ECIES (for sending new-coin secrets to recipients), and the BCTV14 zk-SNARK for the main validity proof. The performance numbers are part of the claim. In the paper's experiments, each Zerocash transaction is under 1 kB and each proof verifies in under 6 ms, with the prover running in a few minutes on commodity hardware. Compared with Zerocoin's 45+ kB proofs and roughly 450 ms verification, this is a one- to two-order-of-magnitude improvement, and it is competitive with the cost of plain Bitcoin transaction handling.
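The size of that improvement follows directly from the figures quoted above. The short calculation below uses only those four numbers. Because the Zerocoin figure is a floor (45+ kB) and the Zerocash figures are ceilings (under 1 kB, under 6 ms), the ratios should be read as rough lower bounds, and the size comparison sets a Zerocoin proof against a whole Zerocash transaction.

```python
import math

# Figures quoted in the text.
zerocoin_proof_kb, zerocoin_verify_ms = 45, 450
zerocash_tx_kb, zerocash_verify_ms = 1, 6

size_ratio = zerocoin_proof_kb / zerocash_tx_kb       # 45.0
time_ratio = zerocoin_verify_ms / zerocash_verify_ms  # 75.0

# Orders of magnitude are the base-10 logarithms of the ratios.
size_orders = math.log10(size_ratio)
time_orders = math.log10(time_ratio)

print(f"size: {size_ratio:.0f}x smaller ({size_orders:.2f} orders of magnitude)")
print(f"time: {time_ratio:.0f}x faster ({time_orders:.2f} orders of magnitude)")
```

Both ratios fall between 10x and 100x, which is where the "one to two orders of magnitude" summary comes from.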

A subtler claim is the fungibility argument. If a coin's history is publicly traceable, the coin carries a kind of social stigma, and a coin that once passed through a ransomware address is worth less on the open market than an identical coin that did not. By making every zerocoin equally opaque, Zerocash restores the property that any two units of the currency are interchangeable. That property is one of the reasons the paper has been cited well outside the cryptocurrency community.

Why this is a bigger deal than it looks

Zerocash lands at a moment when the cryptographic community has spent the better part of two decades refining zero-knowledge proofs to the point where they are usable. The 2013 Zerocoin was a proof of concept. Zerocash is a proof of concept with real performance numbers and a clean abstraction that can be plugged into other systems. Two years on, Zcash's launch is the most visible downstream consumer, but the DAP abstraction has shown up in the design of other privacy-preserving ledgers, in confidential asset systems, and in academic work on anonymous credentials.

The architectural lesson matters on its own. Until Zerocash, "auditability" and "privacy" in a public ledger were usually treated as a tradeoff. You could have one, or you could have the other, or you could split the difference with a trusted party. The paper demonstrates a third option: a fully public, append-only log of validity proofs, with the actual transactional details sealed inside those proofs. The verifier checks the seal, the auditor sees the public commitments, the user keeps the secrets. It is a template that the next decade of decentralized systems is likely to keep borrowing from.

There is also a social consequence the paper is careful to surface. Once a cryptocurrency has shielded transactions at the protocol layer, the question of who is allowed to see what is no longer a property of the chain. It is a property of whoever holds the relevant keys. That shifts the privacy conversation from "audit the blockchain" to "audit the people you transact with," which is where most readers probably prefer the conversation to be.

What is still unsettled

Zerocash is not the last word on blockchain privacy, and the paper is honest about where the rough edges are.

The trusted setup is the most-discussed issue. The Setup algorithm produces a Common Reference String that is needed both to generate and to verify proofs. If the secret randomness used to create that string (often called "toxic waste") is ever leaked, the holder can forge proofs and mint coins out of thin air. The paper discusses ways to mitigate this, most importantly by running a multi-party computation ceremony in which several independent parties contribute to the parameters, so that the toxic waste cannot be recovered unless every participant is compromised. The need for some form of trusted generation still remains part of the design. Zcash is currently running such a ceremony for its own parameters. Eliminating the setup entirely is an active research direction, with newer SNARK families aiming for "transparent" parameter generation.
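The "secure unless every participant is compromised" property can be illustrated with a deliberately simplified model. In this toy, each participant raises the running parameter to a private exponent, so the final value hides the product of all the exponents. The prime `P`, the base `G`, and the function names are assumptions of the sketch. Real ceremonies work over elliptic-curve groups, produce many structured parameters, and include checks that each contribution is well formed.

```python
import secrets

# Toy group: integers modulo a Mersenne prime. Illustrative only.
P = 2**127 - 1
G = 3


def contribute(current: int, share: int) -> int:
    """One participant mixes a private share into the running parameter."""
    return pow(current, share, P)


def run_ceremony(shares) -> int:
    """Chain every participant's contribution, starting from G."""
    param = G
    for s in shares:
        param = contribute(param, s)
    return param


def fresh_share() -> int:
    """A random exponent that the participant should delete afterwards."""
    return secrets.randbelow(P - 3) + 2


if __name__ == "__main__":
    shares = [fresh_share() for _ in range(3)]
    public_param = run_ceremony(shares)
    # The hidden exponent is the product of all shares. Anyone missing even
    # one share would have to solve a discrete logarithm to recover it.
    print(hex(public_param))
```

In this model the toxic waste is the product of the shares. If a single participant deletes their share, nobody can reconstruct that product from the published parameter without breaking the underlying hardness assumption.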

Performance caveats matter too. Generating a pour's SNARK proof takes minutes on a laptop, which is fine for a user sending a payment but not for anything high-frequency. The proving key itself is large, on the order of a gigabyte, which raises the bar for running a prover. The paper notes both numbers and treats them as engineering challenges rather than as fundamental limits. The cryptographic assumptions underlying BCTV14 are also still relatively young by the standards of the field, and the authors explicitly flag that the soundness of the construction rests on knowledge-of-exponent-style assumptions that have not been weathered by decades of cryptanalysis the way RSA or discrete-log have.

The protocol also has a known, by-design limitation: it does not hide everything. Network-level metadata (the IP address that broadcasts a pour, the timing of that broadcast, the fee patterns that the sender chooses) is not addressed by the cryptographic construction. Side-channel privacy has to be solved elsewhere, typically with network-level anonymity tools layered on top of the protocol.

The regulatory conversation is unresolved. The authors acknowledge that strong protocol-level anonymity is in tension with anti-money-laundering and counter-terrorist-financing regimes, and the paper does not try to resolve that tension. Zcash and similar systems have been working with regulators on optional disclosure mechanisms, but the broader question of how a privacy-preserving currency interacts with state-level financial oversight is genuinely open.

Sources