Skip to Content
Quint

Espresso HotShot Epoch Changes in Quint

Written by: Yassine Boukhari and Gabriela Moreira
Espresso HotShot epoch changes cover image

That’s that Quint Espresso ☕🎶

Introduction

One of the last security audits we performed was on the epoch change protocol for the HotShot consensus algorithm developed and used by Espresso. We formed a team to work on this audit, but also an additional team to work on a side quest: Formalizing that protocol in Quint. This blog post will cover how Quint helped the audit process and how the process of formal specification is irreplaceable.

The Quint specification is available on GitHub.

Using Quint for Espresso

HotShot is the consensus protocol powering the Espresso Network, alongside a data-availability layer to order transactions for an L2 rollup. It builds on a chained variant of HotStuff-2 and operates over a dynamically changing set of proof-of-stake validators: every K blocks, a new epoch begins, with a new validator set and stake table. HotShot runs in views driven by a pacemaker that handles timeouts, view synchronization, and leader election. The epoch-change protocol extends the consensus to seamlessly transition between epochs.

The main challenge in formally specifying real world systems is to figure out a good scope (and stick to it). In this project, we knew the scope: epoch changes. Now, it was a matter of sticking to this scope and keeping everything else as abstracted as possible, so the Quint model didn’t get any more complex than it needed to be. Additionally, the main concern with this protocol was about liveness, which also became part of our scoping strategy and we’ll cover later.

Scope Preservation

To preserve scope, we abstract everything that is not part of the scope. For starters, the block. The only information this protocol needs from the block is its height, so that’s the only thing we model. Certificates need signatures, but we assume that some reliable signing algorithm is used, and only store the ids of nodes who signed that certificate in the model, avoiding any encryption/decryption complexity and sticking to what matters, the protocol logic.

In leader-based consensus under partial synchrony, the participating nodes may temporarily have different views of the correct leader, preventing progress. HotShot addresses this issue with the Naor-Keidar view-synchronization protocol, a multi-phase exchange involving designated relay nodes, relay-to-relay gossip, and the formation of view-sync certificates to guarantee that all honest nodes converge on the same view number and leader despite timeouts or message delays.

Since modeling each protocol round and message exchange would bloat our Quint specifications and significantly expand the scope, we deliberately treat this view synchronization mechanism as a black box: once it completes, all honest replicas share a synchronized state and can proceed directly to the proposing and voting steps.

On top of that, we are also careful to structure the Quint code in such a way that network details are isolated from the protocol functionality. This helps a lot with reading and understanding the specification, and also changing the network easily and trying different set ups. This is something we always do when writing Quint specs, and you will hear more about our framework in the near future.

To sum it up: we abstract the things like the blocks and certificates, the view synchronization process and messaging/network details.

Quint vs. English Specification

A general problem with non-formal specifications is the lack of precision. English, markdown and even pseudocode do not force us to be precise, and being fully precise in those languages is almost impossible, even for the best of us - as Leslie Lamport recognized, by creating a new language to formally and precisely define his algorithms in papers. Imprecise descriptions have ambiguities that can lead to wrong interpretations and faulty code.

Thankfully, all imprecisions we found in the HotShot epoch change specifications (paper and pseudocode) were not translated into faulty code. Every time we found ambiguity in the specification, our audit team was alerted and they investigated the code carefully to confirm there were no issues, consulting the HotShot team when clarifications were needed. Having a pseudocode to work with is great, as it gives us an extensive amount of detail. However, it is often imprecise, as it is easy to forget some definitions or mess up the types when there’s no compiler in play. Executable specifications solve this problem.

Some examples of inconsistencies in the pseudocode are the fields justify_cert and justify_qc that seemed to be referring to the same value in the proposal data structure, and the function fetch_proposal that was used but never defined.

Focus on Liveness

The main security concern with the epoch-change protocol was regarding its liveness - that is, whether it always can make progress and doesn’t get “stuck” in any ways. Even with our scoped model, epoch-change still requires a lot of states, as multiple nodes have to go through multiple blocks in order for an epoch to change. We didn’t have plans to try and constrain into a state space small enough for the model checkers to verify liveness. That didn’t seem feasible under our short time available, but also not really necessary. We managed to get really good confidence about liveness through other means: witnesses and deadlock detection.

Witnesses

Even in the first stages of modeling, the very first question we wanted to answer was: can this go into another epoch? This was not really a question about the protocol itself, but about our model. “A state where epoch is 2” is what we call a witness property - something that we expect to hold in at least one state in one execution. We can then ask the Quint simulator whether it can find a trace that leads to a state that is described by the witness. If so, Quint produces as output a trace that describes step-by-step how the system goes from an initial state to “a state where epoch is 2”.

Simple witnesses like this are often the best way to find small issues when modeling, and using the Quint simulator to check that we can indeed reach the basic scenarios is a great way to debug our version of the spec. For example, even the smallest issue on the function that determines whether a proposal is valid can cause the whole model to never accept any proposals, and therefore be “stuck”. Basic witnesses help us be sure we got rid of basic liveness issues.

Cross-Epoch Extended Vote-Propagation Witness

We defined a witness that checks that whenever an honest node forms the extended quorum certificate for block K and enters epoch e+1, every honest node still in epoch e should have extended-vote messages pending delivery. We reasoned that without those votes in transit, some nodes would never observe the extended quorum certificate and a malicious leader could exploit this by simply withholding the certificate, halting progress. Quint immediately exposed a trace where the spec only sent the extended votes to the next-view leader. So there was a liveness issue on the specification (not present on the code).

We fixed this by making a change to the protocol so nodes broadcast those extended votes to all nodes in e+1.

Deadlock Detection

A deadlock is one way to break liveness, so we want to make sure deadlocks cannot occur. As all actions in this protocol are reactions to messages (events), a deadlock is simply a state where there are no messages in flight. We send a message with the genesis block and certificate on the initial state, and from there on, there should always be messages in flight - otherwise, the system would be stuck. We defined this as an invariant and tested it on many different samples using the Quint simulator, convincing ourselves that the system cannot hit deadlocks, and getting more confidence that liveness holds.

While checking this deadlock invariant, Quint uncovered a self-block on extended proposals:

At view v + 1, the leader gathers quorum votes for block K and forms QC₁. The leader immediately updates its highQC ← QC₁.

It then constructs and broadcasts its extended proposal for view v + 2, justified by QC₁. On receiving its own proposal, the validity check enforces:

  • proposal.view – 1 == proposal.certificate.view, and
  • proposal.certificate.view – 1 == highQC.view

But since highQC.view was already set to v + 1, the second check fails (v + 1 – 1 ≠ v + 1), so the leader rejects its own proposal and the quorum can’t be formed. No new messages get emitted, and the protocol deadlocks. This problem was also only present in the specification and not reflected in the code.

Final Thoughts

In conclusion, our experience working on the formal specification of the epoch-change protocol alongside the audit proved to be highly productive. Quint played a key role in uncovering interesting scenarios that were then shared with the audit team, facilitating a deeper understanding of the expected behavior as defined by the specification. This collaborative approach not only raised important questions but also highlighted the value of the specification itself as a valuable artifact with enforced precision and no ambiguities.

To learn more about Quint and how it can help you define precise and well-scoped parts of your system, visit quint.sh. Our specialized team can help you leverage the power of Quint to ensure the security and reliability of your systems, reach out to us at audits@informal.systems. Learn more at informal.systems/security-and-engineering.

Last updated on