Contract Verification using Symbolic Execution
February 19, 2013
For Olin Shiver's Program Analysis course this semester, I'm presenting a lecture on “Higher-Order Symbolic Execution via Contracts” by Sam Tobin-Hochstadt and David Van Horn. I'm writing this post to prepare for my presentation, hence it is mostly a reinterpretation of the paper mentioned. With that said, let's get started!
Adapting symbolic execution to work with contracts provides two main gains: increasing the efficiency of running programs with contract and allowing for modular program analysis.
Programming with contracts enables programmers to provide rich dynamic specifications to components. With the ease of writing specifications dynamically comes the expense of checking them at runtime. The ability to statically prove the absence of contract violations would provide a large efficiency gain, enabling contract checks excluded from the execution.
Contracts also enable us to reason about modules in isolation, or modularly, since they specify how two parties should interact at their boundaries. As it turns out, analyzing modular programs is difficult! Abstract interpretation for instance, cannot reason about concrete function flowing into unknown contexts. Symbolic execution allows for analysis in the presence of unknown components but often fails to offer automatic, terminating analyses. If we can utilize abstract interpretation and symbolic execution to verify contracts, we'll be left with a modular and composable automatic program analysis.
With these two problems in mind, let's delve into the intricacies of the paper. On a high level, [1] extends CPCF, a language with contracts, to operate (non-deterministically) over symbolic values (SCPCF) in order to soundly discover blame. This is done by refining symbolic values with successful contract checks and using those to skip future contract checks.
We'll start with an overview of contracts, move to an overview of symbolic execution, and close with how the semantics of contract PCF is extended to include symbolic values.
1. Introduction to Contracts
Contracts are executable specifications that enable blame to be assigned to software components that violate defined specifications. At the first-order level, blame assignment and checking of contracts is simple, but higher-order contracts are limited by Rice's Theorem: it isn't decidable whether a function argument adheres to its contract. Hence higher-order values must be wrapped in a monitor that checks the contract at each use. Monitors track agreements between parties as higher-order values flow back and forth across component boundaries, enabling blame to be assigned correctly.
Summarizing from [1], CPCF is an extension of PCF, introduced by Dimoulas et al. [7], that enables contracts to wrap base values and first class functions.
Types       T ::= B | T → T | con(T)
Base types  B ::= N | B
Terms       E ::= A | X | E E | μ X:T . E | if E E E
                | O1(E) | O2(E, E) | monff,f (C, E)
Operations O1 ::= zero? | false? | ...
           O2 ::= + | - | ∧ | ∨ | ...
Contracts   C ::= flat(E) | C → C | C → λ X:T . C
Answers     A ::= V | ε[blameff]
Values      V ::= λ X:T . E | 0 | 1 | -1 | ... | tt | ff
Evaluation  ε ::= [] | ε E | V ε | O2(ε, E) | O2(V, ε)
  contexts      | O1(ε) | if ε E E | monhf,g(C, ε)There are three types of contracts: flat(E) wraps base values with a predicate defined in the CPCF language, C1 → C2 is a function contract with the contract C1 for the function's argument, and C2 for the function's result, lastly C1 → λX.C2 are dependent contracts where X binds the function's argument in the body of C2.
A contract C is attached to expression E using the monitor construct monhf,g (C, E), where the labels f, g and h are used to blame specific components in the event of a contract failure. Label f names the server or the actual expression, g labels the client, or the context, and h names a contract, since contracts are arbitrary expressions, they can also be blamed. The labels in blamegf state that component f broke its contract with g.
[7] introduces complete monitors, which correctly monitor all channels of communication between components, including contracts themselves. For the sake of simplicity, we will work with lax contract semantics, which is not a complete monitor, but is still blame correct.
The reduction relation of CPCF is mostly standard, with the last two rules handling function and flat contract checks.
if tt E1 E2                   ⟼ E1
if ff E1 E2                   ⟼ E2
(λ X:T . E) V                 ⟼ E[V/X]
μ X:T . E                     ⟼ E[μ X:T . E/X]
O(V)                          ⟼ A if δ(O, V) = A
monhf,g(C1 → λ X:T . C2, V)     ⟼ λ X:T . monhf,g (C2, (V monhg,f (C1, X)))
monhf,g(flat(E), V)            ⟼ if (E V) V blamehfFunction contracts are checked by reversing the monitor's blame 
labels on the argument, since if it doesn't satisfy the domain of the 
function contract, it is the context's fault. The monitor of the 
computation's result retains its original blame labels since a result 
that doesn't satisfy the range of the contract is the function's fault. 
Flat contract checking involves checking the value against the contract 
in an if expression, raising blame if the contract isn't satisfied.
Here is a simple example adapted from [9] that shows why label reversal happens for higher-order contracts:
;; foo : (Int → Int) → Int
;; with contract: ((greater-than-9? → between-0-and-10?) → between-10-and-20?)
(define foo (λ (bar) (+ 10 (bar 0))))When foo invokes bar, foo's greater-than-9? contract fails, but this is due to foo supplying the incorrect value to bar. The greater-than-9? contract is to the left of two arrows in foo's
 contract, and is said to be in an even position. Base contracts in even
 position have had their blame labels reversed by the reduction relation
 an even number of times, so blame will always go to the function.
On the other hand, if we imagine that foo applies bar to 10 and it returns -10, then this will violate between-0-and-10?,
 which is to the left of one arrow, and thus in an odd position. Being 
in an odd position means that the reduction relation has reversed the 
blame labels an odd number of times, leaving blame to be assigned to the
 context that provided the bar argument.
From here [1] introduces SCPCF, an extension of CPCF that include symbolic values. Before we get into that, I'd like to cover the basics of symbolic execution.
2. Introduction to Symbolic Execution
Symbolic execution is a form of program analysis that lies between the extremes of testing a program and proving it correct. The idea is to generalize the notion of testing a program on concrete inputs by replacing them with symbolic input variables. These symbolic variables represent a broad class of concrete inputs. Hence, the result of a program's symbolic execution is equivalent to a large number of normal test cases run over that program.
To run a program symbolically, the execution semantics of the 
language must be extended to let operators work on symbolic inputs, 
producing symbolic formulas as output. Symbolic inputs are arbitrary 
symbols, and symbolic formulas are manipulations over these symbols, 
suited to capture data-specific information. If we are trying to 
symbolically executing arithmetic operations, the formulas outputted by 
symbolic operations will be algebraic formulas, enabling us to leverage 
algebraic simplifications. For instance, extending the concrete version 
of +, δ(+, 4, 5) = 9, to operate over symbolic values returns an algebraic equation: δ(+, α1, α2) = α1 + α2.
Let's look at the following simple arithmetic function:
;; doubleFirst :: Int -> Int -> Int
(define (doubleFirst x y)
  (let ([z (+ x y)])
    (+ x (- z y))))Executing the it with concrete values, (doubleFirst 4 5), leaves us with the following variable assignments: x = 4, y = 5, z = 9, return = 8. With the symbolic execution (doubleFirst α1 α2), the function's variables are assigned to the following algebraic formulas x = α1, y = α2, z = α1 + α2, return = α1 + α1.
So far, the symbolic execution we've described can only handle 
programs with linear control flow. How do we handle branching operations
 in the presence of symbolic values? Symbolic values generally won't 
have enough constraints to determine which branch of a branching 
operation to take, hence we need to add the path conditional (pc)
 to our program's execution state. The path conditional is a boolean 
expression over symbolic values that accumulates constraints which must 
be satisfied in order for the execution to follow the associated path.
Let's run through a symbolic execution of the divCond function below instantiated with symbolic values α1 and α2 and starting with pc = true.
;; divCond :: Int -> Int -> Int
(define (divCond x y)
  (if (== y 10)
    (if (== x 0)
      (error "div by zero")
      (/ y x))
    x))At the (== y 10) expression, y = α2, so we don’t have enough information to branch on it, hence we must fork:
- For the “then” case we set pc = true ∧ (α2 == 10)and process the(if (== x 0) ...)expression, where:- in the “then” case pc = true ∧ (α2 == 10) ∧ (α1 == 0)and an error is raised
- in the “else” case pc = true ∧ (α2 == 10) ∧ (α1 != 0)and a division is computed
 
- in the “then” case 
- the “else” case pc = true ∧ (α2 != 10)and we returnx, which is set toα1
At this point, you might be wondering how such obvious properties could be useful?
Godefroid, Klarlund and Sen [4]
 use randomized program testing paired with symbolic execution to 
efficiently generate concrete test inputs with complete code coverage. 
The idea is to run the program on randomized input while keeping track 
of the path conditional. This execution will cover one path through the 
program, but they are seeking complete path coverage, so they backtrack 
to the last place where a constraint was added to the pc, negate that constraint and generate new test input that satisfies the new pc.
 When the program is run on this new test input, it will take go down 
the other branch of the last conditional, covering an additional path. 
Repeating this process and generating concrete test inputs for each path
 through a program results in a minimal set of path covering test 
inputs.
Cousot, Cousot, Fähndrich and Logozzo [5] use abstract interpretation, a technique similar to symbolic execution, to infer necessary preconditions
 to functions in a 1st order setting. By necessary precondition, we mean
 the smallest possible precondition, which when violated will definitely
 result in an error. The general idea behind their approach, which can 
be described in terms of symbolic execution, is to identify the set of 
path conditions that lead to errors, negate them and set them as 
preconditions to the function. Take for example the symbolic execution 
of divCond, the only error occurs when the pc is (α2 == 10) ∧ (α1 == 0), thus the necessary precondition for divCond is ¬((y == 10) ∧ (x == 0)).
In general, symbolic execution is used widely in model checking and test case generation. Now let's move on to how it can be leveraged to verify contracts.
3. Contracts as symbolic values
We are looking to be able to verify that the composition of known and
 unknown program components respects their specifications. To accomplish
 this, [1] introduces the notion of symbolic values to CPCF, creating SCPCF. These symbolic values, denoted by ●T, represent unknown values of type T. They have arbitrary behavior of type T,
 which is then refined “by attaching a set of contracts that specify an 
agreement between the program and the missing component”.
Prevalues      U ::= ●T | λ X:T . E | 0 | 1 | -1 | ... | tt | ff
Values         V ::= U/Ç    where Ç = {C,...}Prevalues are the normal values of CPFC extended with the symbolic ●T. Values are prevalues extended with a set of contracts. Notationally, V ∙ C will be used to represent U/Ç ∪ {C} where V = U/Ç, and Ç will be omitted from values wherever irrelevant.
The goal is to define the semantics of SCPCF to run with unknown 
values such that it soundly approximates running the same program with 
concrete values. More formally, the semantics are sound, or preserve 
approximation, if given ⊢ V:T ⊑ ●T and ε[V] ⟼* A, then ε[●T] ⟼* A' and A ⊑ A', that is, A' approximates A.
 Hence, we can run a program with unknown values, and if it doesn't 
raise blame, then we know that a concrete instantiation of that program 
will also not raise blame.
4. SCPCF Semantics
We extend the semantics much like described in §2, but with some key differences:
if V E1 E2      ⟼ E1   if δ(false?, V) ∋ ff
if V E1 E2      ⟼ E2   if δ(false?, V) ∋ tt
(λ X:T . E) V   ⟼ E[V/X]
μX:T.E          ⟼ E[μX:T . E/X]
O(V)            ⟼ A if δ(O, V) ∋ A
(●T→T'/Ç) V     ⟼ ●T'/{C2[V/X] | C1 → λ X:T . C2 ∈ Ç}
(●T→T'/Ç) V     ⟼ havocT VBefore δ mapped arithmetic operations to algebraic equations, which satisfy multiple concrete answers. We let the δ
 of SCPCF map operations and arguments to sets of answers. Concrete 
arguments map to concrete singleton sets while symbolic arguments map to
 more abstract sets, for instance δ(+, V1, V2) = {●N}, if V1 or V2 = ●N/Ç
Due to the change of δ, branching becomes non-deterministic in the presence of symbolic values, similar to §2.
 Unlike before though, we don't add information to the path condition at
 branches. This is because we aren't really concerned about gathering 
constraints on symbols at the term level, but rather at the contract 
level.
That said, since contracts use the same language as terms, the theory in [1]
 could be made more precise if branches hoisted their predicates into 
contracts and added them to the path condition upon non-deterministic 
forks. For instance, given foo = ●N and that sqrt has the contract flat(positive?) → flat(positive?), then (if (positive? foo) (sqrt foo) 0) cannot be verified using [1] unless evaluating (positive? foo) resulted in foo ∙ flat(positive?)
The last two rules of the SCPCF semantics deal with applying symbolic
 functions. There are two possible outcomes when a argument V is passed to an unknown function ●T→T':
- Vis used in an unknown manner, but when- Vis used, it returns with no failures. Hence the result of the use of- Vis refined by the range of the contracts over the function- V.
- Alternatively, the usage of Vin an unknown context results in the blame ofV. Possible blame is found using thehavocfunction, which explores the behavior ofVby iteratively applying it to unknown values.
havocB    = μx.x
havocT→T'  = λx:T → T' . havocT'(x ●T)Values of base type can't raise blame in an unknown context, since 
their contract has already been satisfied when they were passed into the
 unknown context. Thus, havocB diverges as a means to not introduce spurious results. At the function type, havoc
 “produces a function that applies its argument to an appropriately 
typed unknown input value and then recursively applies havoc at the 
result type to the output”. This soundly explores possible blame 
behavior of values in unknown “demonic” contexts.
Let's slowly work through the example presented in [1] to see how havoc will discover blame, if it exists:
Let hoExample be a higher-order function with the contract (any→any)→any and let sqrt have the contract positive? → positive?.
hoExample:(N→N)→N =
  λ f:N→N . mon(any,
                (λ f:N→N . (sqrt (f 0))) (λ x:N mon(any, f mon(any, x))))We want to check that in any arbitrary context, hoExample cannot be blamed, hence we pass it to havoc((N→N)→N).
1.  havoc(N→N)→N hoExample
2.  havocN (hoExample ●N→N)
3.         mon(any, (λ f:N→N . (sqrt (f 0))) (λ x:N mon(any, ●N→N mon(any, x))))
4.                  (sqrt ((λ x:N mon(any, ●N→N mon(any, x))) 0))
5.                  (sqrt mon(any, ●N→N mon(any, 0)))
6.                  (sqrt mon(any, (●N→N 0)))
7a.  ...     ...    (sqrt havocN 0)
7b.                 (sqrt ●N∙any)
8.                  ((λ x:N . mon(positive?, sqrt mon(positive? x))) ●N∙any)
9.                  mon(positive?, sqrt mon(positive?, ●N∙any))
10. havocN mon(any, blame hoExample)In step 2, havoc applies ●N→N to hoExample, using havocN to ensure divergence if no blame is found. Step 3 and 4 substitute ●N→N into hoExample, wrapping it in a any→any monitor. 0 passes the any contract in steps 5 and 6. In step 7, the evaluation of the symbolic function ●N→N applied to 0 introduces a fork in the reduction relation: on one branch havocN is applied to 0 to check if it introduces blame, and on the other, (●N→N 0) evaluates with no blame, resulting in ●N∙any. havocN 0 diverges, so we continue with the second branch. In step 8 and 9 ●N∙any is passed to sqrt, but the contract check (positive? ●N∙any) both passes and raises blame, so validation of hoExample fails.
If we change hoExample’s contract to (any→positive?)→any, then in step 7b the symbolic value ●N would be refined with the contract positive?, enabling the contract check of step 9 to pass. This is made possible with the introduction of the ⊢ V : C ✓ relation presented below.
5. Contract checking in SCPCF
Adding non-deterministic branching at if statements in 
the presence of symbolic values means that the semantics of flat 
contract checking becomes quite imprecise, if left as is. Using the 
contract checking semantics of §1 with symbolic values means that the following expression will incorrectly raise blame twice:
(monhf,g (flat(prime?) → flat(even?), primePlus1)) (monhf,g (flat(prime?) ●N))
Once when monitoring ●N to see if it satisfies flat(prime?), which results in both success and failure since (prime? ●N) = {tt, ff}, and the other when ●N is checked to satisfy the domain of flat(prime?) → flat(even?), as it’s passed to primePlus1.
To remedy this, we set up the path condition to keep track of when symbolic values satisfy a given contract. With this information, future contract checks can be ruled out, thus eliminating non-deterministic branching and unsound blame.
The path condition in [1] is formulated by refining symbolic values with contracts (V∙C) once a flat contract check passes. This information is then used in the following judgement relation, which says that V provably satisfies the contract C:
   C ∈ Ç
───────────
 ⊢ V : C ✓With these additions we can modify the contract checking semantics, remembering successful contract checks and avoiding the imprecision introduced by redundant contract checks:
monhf,g (C,V)                 ⟼ V if ⊢ V : C ✓
monhf,g (flat(E),V)           ⟼ if (E V)
                                  (V ∙ flat(E))    | where ⊬ V : flat(E) ✓
                                  blamegf           |
monhf,g (C1 ⟼ λ X:T . C2, V) ⟼ λ X:T . monhf,g (C2, V monhg,f (C1, X))
                                   where ⊬ V : C1 ⟼ λ X:T . C2 ✓6. Soundness of SCPCF
As noted in §3, we are striving for a symbolic execution semantics that soundly approximates the concrete execution semantics. By sound approximation we mean that if the symbolic semantics doesn't raise blame during an execution, then the concrete semantics is guaranteed to also not raise blame. Of course it is an approximation, so it is acceptable for the symbolic semantics to find blame when it doesn't actually exists.
In order to prove that the symbolic execution semantics soundly 
approximates the concrete execution semantics when concerned with blame,
 we develop an approximation relation ⊑ where A ⊑ A' means that A' approximates, or is less precise than A:
                  ⊢ V : T
                 ──────────        ───────────────────
                  ⊢ V ⊑ ●T          mon(C, E) ⊑ ● ∙ C
                          ⊢ V : C ✓             mon(C, E) ⊑ E'
     ───────────         ───────────       ────────────────────────
      V∙C ⊑ V              V ⊑ V∙C          mon(C, E) ⊑ mon(C, E')
           ──────────────────────────────────────────────────
            (λ X . mon(D, (V mon(C, X)))) ⊑ ● ∙ C → λ X . DFrom top to bottom, left to right, the rules are as follows:
- unknown values of type T approximate concrete values of type T
- unknown values refined by a contract approximate expressions monitored with the same contract
- contract refinements may be introduced on the value being approximated
- refinements may be eliminated from the approximating value when that value already proves the contract
- a contract monitor may be introduced to the approximation expression if the expression already approximates a value wrapped in the same contract monitor
- unknown values refined with function contracts approximate a value monitored by a partially evaluated function contract
The soundness theorem:
E ⊑ E' ∧ E ⟼* A implies ∃ A' . E' ⟼* A' ∧ A ⊑ A'is proved by case analysis on the reduction semantics and utilizing 
havoc's completeness, as well as auxiliary lemmas that ensure 
substitution and basic operations preserve approximation. By havoc's 
completeness, we mean ε[V] ⟼* ε'[blame l] where l is not in ε, then havoc V ⟼* ε''[blame l].
As a small aside, last semester I used Coq to prove that a 
non-deterministic semantics for a simple, typed language with unknown 
values (●T) soundly approximates its concrete counterpart (coq code on github).
And that is pretty much it! The symbolic semantics are not guaranteed to terminate, so the authors of [1] integrate the orthogonal technique of abstracting abstract machines [8]. Additionally, [1] goes on to extend their technique to an untyped core model of Racket, which makes things more complicate in the absence of type information.
8. Conclusion
In this post I've presented the fundamental ideas from “Higher-Order Symbolic Execution via Contracts” by Tobin-Hochstadt and Van Horn.
This paper introduced the idea of a modular analysis for programs 
with higher-order contracts that enables contract verification in the 
presence of unknown components. Such a verification enables the safe 
omission of select contract checks and also provides an analysis where 
individual components can be verified for later composition. Symbolic 
execution with contracts as symbol refinements is the secret weapon, 
with other key components being demonic contexts (havoc) to discover blame, the approximation relation (⊑) to prove soundness, and abstract interpretation to guarantee termination.
Possible future work:
- Adapting the work on necessary precondition inference by [5] to work in a higher-order setting. This could be used to strengthen the contracts of known components so that their composition with unknown components can be successfully verified.
- Hoisting arbitrary predicate checks to the contract level and adding them to the path condition, or contract set of a value, as described in §4.
Questions:
- How does symbolic execution traditionally deal with non-termination?
- Can havoc be made more precise by integrating path covering test case generation similar to [4]?
- Would weakening of contracts ever increase the ability of the analysis to verify a component's contracts?
Links
- [1] “Higher-Order Symbolic Execution via Contracts” by Tobin-Hochstadt and Van Horn 2012
- [2] “Symbolic Execution and Program Testing” by King 1976
- [3] “Contracts as Pairs of Projections” by Findler and Blume 2006
- [4] “DART: Directed Automated Random Testing” by Godefroid, Klarlund and Sen 2005
- [5] “The Design and Implementation of a System for the Automatic Inference of Necessary Preconditions” by Cousot, Cousot, Fähndrich and Logozzo 2013
- [6] “Compositional and Lightweight Dependent Type Inference for ML” by Zhu and Jagannathan 2013
- [7] “Complete Monitors for Behavioral Contracts” by Dimoulas, Tobin-Hochstadt, and Felleisen 2012
- [8] “Abstracting Abstract Machines” by Might and Van Horn 2011
- [9] “Contracst for Higher-Order Functions” by Findler and Felleisen 2002