Verification Up and Down the Stack
Advisor
Gurfinkel, Arie
Publisher
University of Waterloo
Abstract
As our world depends increasingly on digital automation and complex systems, it is important that we understand them and retain the ability to diagnose them. Assuring the correctness of modern systems requires reasoning across multiple abstraction layers, from hardware descriptions and low-level execution models to high-level software and business logic. Maintaining the correctness of each layer is a responsibility that often falls on teams of developers with varying expertise, few of whom reason explicitly in formal logic. While many methods can be employed to produce reliable systems, this thesis focuses on automated reasoning techniques and on making their outputs useful to their users.
Automated reasoning has contributed to assuring the correctness, security, and reliability of complex systems, with remarkable success. When it succeeds, practitioners gain confidence in the correctness of the claim they have proven. When it fails, however, automated reasoning tools often provide little insight into why a system is incorrect, leaving developers with opaque outcomes that are difficult to relate to system behavior or intent.
The central thesis of this work is that automated reasoning is most effective when its results are expressed as concrete, diagnosable artifacts that mediate between formal models and human understanding. Rather than treating verification outcomes as abstract solver judgments, this thesis argues that concrete counterexamples should be first-class results. The verification frameworks we present thus give developers feedback from automated reasoning at the level of abstraction at which they work. Such artifacts enable reuse of reasoning across abstraction layers, support debugging and explanation, and make formal verification applicable in settings where human judgment and intent remain essential.
We support this thesis through a series of contributions spanning hardware, low-level systems software, and business logic. At the hardware level, we develop an MLIR-based representation for Btor2 circuits that supports model checking with concrete counterexample generation as well as hardware simulation. At the software level, we develop a verification pipeline for eBPF, a class of low-level, security-critical programs for which verification is mandatory but explanations are scarce. Our approach generates executable counterexamples that concretely witness verification failures and can be explored with standard debugging tools, transforming verification outcomes from abstract judgments into actionable artifacts. We further illustrate the value of executable counterexamples through two case studies with the Stellar Development Foundation.
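To make this artifact-centered view concrete, the Python sketch below illustrates what it means for a counterexample to be executable: a model checker's trace of concrete inputs is replayed against the system's transition function until the bad state is reached. The toy counter, its bug, and the trace are all hypothetical; the thesis applies this idea to Btor2/MLIR circuits and compiled eBPF programs, where the replay can additionally be stepped through with a standard debugger.

    # Toy transition system: a saturating counter meant to stop at 7,
    # but a buggy guard lets it overshoot. Hypothetical example; the
    # thesis works on Btor2/MLIR circuits and eBPF, not this toy model.

    def step(state: int, inc: int) -> int:
        # One transition: add `inc`; the guard is off by one
        # (it should check `state + inc <= 7`, not `state < 8`).
        return state + inc if state < 8 else state

    def bad(state: int) -> bool:
        # Safety property: the counter must never exceed 7.
        return state > 7

    # A counterexample as a model checker might emit it: concrete
    # inputs for each step, starting from the initial state 0.
    trace = [3, 3, 3]  # 0 -> 3 -> 6 -> 9, violating the property

    state = 0
    for t, inc in enumerate(trace):
        state = step(state, inc)
        print(f"step {t}: inc={inc} -> state={state}")

    assert bad(state), "trace failed to reproduce the violation"
    print("violation reproduced; the trace is a concrete, replayable artifact")

Because the trace is an ordinary program input, reproducing the failure requires no knowledge of the solver that found it.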
We extend our artifact-centered view beyond program semantics to business logic. We show how combining automated reasoning with Large Language Models enables the generation of verifiable and auditable artifacts that relate high-level intent to implementation, even as systems evolve. Through a case study with Amazon, we further show how counterexamples communicated in natural language enable systematic understanding of business logic implementations.
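As a rough illustration of a natural-language counterexample, the minimal sketch below uses the z3 SMT solver's Python bindings to refute a stated intent and then narrate the resulting model as a sentence. The discount rule, variable names, and wording are hypothetical and not drawn from the Amazon case study.

    # Minimal sketch: check a (hypothetical) business rule with the z3
    # SMT solver and narrate the counterexample in natural language.
    # Requires the `z3-solver` package.
    from z3 import And, Bool, If, Int, Not, Solver, sat

    total = Int("order_total")   # order total in dollars
    member = Bool("is_member")   # loyalty-program membership

    # The discount logic as implemented: members get 10% on orders of
    # $50 or more; otherwise large orders get 15%.
    discount = If(And(member, total >= 50), 10, If(total >= 100, 15, 0))

    s = Solver()
    s.add(total >= 0)                    # domain constraint
    s.add(member, Not(discount >= 10))   # negated intent: members always get >= 10%

    if s.check() == sat:
        m = s.model()
        t = m.eval(total, model_completion=True).as_long()
        d = m.eval(discount, model_completion=True).as_long()
        print(f"Counterexample: a member placing a ${t} order receives a "
              f"{d}% discount, but the intent promises members at least 10%.")

The same solver model that witnesses the bug supplies the concrete values in the sentence, which is what keeps the explanation auditable rather than merely plausible.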
Across these domains, the common theme is not a particular solver or infrastructure, but the systematic exposure of automated reasoning results in forms that humans can inspect, execute, and audit. We show that advancing automated reasoning in practice requires not only stronger automation, but also careful attention to keeping verification results intelligible and usable across abstraction layers. With these contributions, we hope to keep the systems we depend on within our grasp, so that we can ensure their correctness and maintainability.