Case Study: Formalizing Grug’s Jellyfish Merkle Tree with Quint

Name: Quint
Author: Informal Systems

March 5, 2025 • Written by: Aleksandar Ignjatijevic, Gabriela Moreira and Josef Widder

Jellyfish Merkle Tree formal specification cover image

In Q4 2024, Informal Systems partnered with Left Curve to formally specify their Rust implementation of the Jellyfish Merkle Tree (JMT) using Quint , our in-house specification language. The JMT is a core component of Grug , Left Curve’s novel blockchain execution environment.

Verifying the correctness of such a complex algorithm is no small feat. Bugs can lurk undetected even after extensive testing, potentially leading to catastrophic failures. That’s where Quint, our in-house specification language, came in. By enabling automated simulations across a vast range of scenarios, we not only specified the algorithm but also uncovered a subtle bug in the tree formation. Left Curve fixed the issue, and we validated the fix using Quint.

You can explore the complete Quint specification and documentation in Left Curve’s GitHub repository .

Why Formal Specification Matters

In blockchain systems, trust is everything. But how can you trust an algorithm you don’t fully understand? And how do you begin to understand distributed systems, protocols and data structures?

Formal specification bridges this gap. It makes complex protocols easier to grasp and provides a foundation for rigorous verification. An English specification alone isn’t enough—it explains what the protocol does, but typically doesn’t show how reliably it does it. That’s where executable specifications shine. They allow for automated checks at scale, uncovering hidden flaws that regular testing might miss.

For Left Curve, this was critical. The JMT is central to Grug’s ability to reach consensus over a distributed state machine. Inspired by Diem’s Jellyfish Merkle Tree , adapted to Left Curve’s unique use cases. However, given the algorithm’s critical role in ensuring data integrity, even a small bug could have significant consequences for the system’s reliability and security.

The Quint Approach: Specification, Simulation, and Verification

Our collaboration with Left Curve followed a three-step process.

1. Specification

We set out to create a Quint model faithful to the Rust implementation, including all its algorithmic optimizations. The JMT applies batches of operations (inserts and deletes) on a binary tree, recursing from root to leaves while updating the tree in place.

The Algorithm’s Complexity: The JMT algorithm is designed to apply a batch of operations on a tree in an optimal way. It does so by recursing over the tree from root to leaves, splitting the batch into operations relevant to each subtree. Between recursive calls, the algorithm updates the mutable tree, introducing two computation flows in the same functions:

Recursion Flow: Resolves operations from leaves to the root.
Mutation Flow: Updates the tree from root to leaves.

This dual flow is the main source of complexity. Modeling it in Quint was challenging because Quint doesn’t natively support recursion. However, this exercise forced us to deeply understand the implementation’s behavior, including edge cases that might have been overlooked - including one that caused a bug.

We also specified proof generation (Grug’s own algorithm) and proof verification (from ICS23 ). This allowed us to check properties like completeness and soundness for both membership and non-membership proofs.

2. Twin-Testing

To ensure our model was correct, we took a twin-testing approach. We created two versions of the specification:

Simple: A straightforward algorithm that rebuilds the tree from scratch after each operation.
Fancy: A faithful reproduction of the optimized Rust implementation.

The “simple” version served as a reference model, while the “fancy” version included all the optimizations. We ran tens of thousands of simulations, comparing the results of consecutive applications using both versions to ensure they behaved identically.

Here’s how the twin-testing worked:


val reference = simple::apply(tree, version - 1, version, ops)
val result = fancy::apply(tree, version - 1, version, ops)
assert(reference == result)

Where ops is a list of randomly generated operations, and tree and version are cumulative results from applying these operations.

This approach gave us confidence that the optimizations weren’t introducing errors, and that our modeling of the algorithm was correct.

3. Simulations and Verification

With the model validated, we turned to simulations and verification to obtain confidence about the algorithm’s correctness. We defined 17 invariants—properties that must hold true for the JMT to function correctly.

Here are some of the most interesting invariants we defined:

Version Consistency: Versions of predecessors in the path should be >= node version. When nodes are updated (inserted or deleted), they are assigned a new version, as are all their predecessors in the tree. Subtrees untouched by an update retain their version. This means that, in a properly maintained tree, a parent node’s version should always be greater than or equal to its child’s version.
Internal Node Validity: Internal nodes have at least one child. If an internal node has no children, it’s effectively a leaf with no value, which violates the tree’s structure.
Orphaned Nodes: Orphans should not appear in trees after they are orphaned. Orphaned nodes are used for state pruning. The intuition is that orphaned nodes can be safely pruned because they’re no longer needed for tree operations (including proof construction or verification) after they’re orphaned. This invariant makes it explicit why it is safe to prune orphans.
Completeness and Soundness of Proofs: We also defined invariants for the completeness and soundness of both membership and non-membership proofs. A deeper dive into completeness and soundness invariants can be found on Left Curve’s GitHub repository .

Simulation Methodology: To validate these invariants, we used two tools:

Quint Simulator: Runs a Depth-First Search (DFS) on the state space, exploring many samples up to a specified depth (--max-steps).
TLC Model Checker: Runs a Breadth-First Search (BFS) on the complete state space, with a refined model to keep the state space manageable.

We ran millions of simulations, iterating over the model and invariants to spot issues as soon as they appeared. After iteration, final checks were run for a total of 66 hours using 6 parallel instances. Checking a total of samples of:

1.8 million samples for invariants related to tree manipulation
2.3 million samples for proof-related invariants

On top of that, we managed to formally verify those properties in two different scenarios of reduced scope, by leveraging the Quint to TLA+ transpilation and running the TLC model checker.

This massively increased our confidence in the model’s correctness. Quint’s simulations were a game-changer, as we would never have been able to write so many interesting scenarios by ourselves, and we could have missed the one that resulted in a bug.

Uncovered Bug

During simulations, Quint found a scenario that violated several invariants, including the mentioned version consistency and internal node validity. When we looked at the scenario produced by Quint, we observed that in this run a leaf node was saved in two locations (bit 0 and bit 010).

To confirm the bug was not a modeling mistake, we reproduced the test in Rust—and the bug persisted. Left Curve promptly fixed the issue, and we updated the Quint specification to reflect the corrected behavior. After the fix, no further invariant violations were found.

Lessons Learned

This project highlighted great benefits of Quint modeling and simulations for complex systems (and auditing them):

Deep Understanding: Writing a formal specification forced us to deeply understand the algorithm’s behavior, including edge cases that might have been overlooked.
Scalable Testing: Quint’s simulations allowed us to test millions of scenarios, far beyond what regular testing could achieve.
Properties Find Bugs: Defining and checking properties is an effective way to find bugs, as we don’t have to come up with corner cases ourselves.

Conclusion

Formal methods like Quint are transforming how we build and verify blockchain systems. By specifying and simulating the Jellyfish Merkle Tree, we checked its correctness and robustness, helping Left Curve push the boundaries of blockchain scalability.

If you’re working on a project that demands the highest level of confidence, let’s talk. Reach out to us at audits@informal.systems, or find us on our website .

If you would like to try out Quint yourself, check out the rich set of examples and tutorials on the Quint website . Come to us with any questions, we are happy to help.

Now, it’s time to ship. 🚀