Modeling "The Essence of Compiling with Continuations"
December 6, 2012
Understanding technical papers is difficult. It is easy to convince yourself that you understand a paper, but even easier to shatter this illusion by attempting to run through the paper’s figures by hand. Figures often contain subtleties, omissions, and, unfortunately, typos that need to be digested and worked through.
This became apparent to me while working through the classic paper “The Essence of Compiling with Continuations” for Matthias Felleisen's Principles of Programming Languages course. As a means to better comprehend this paper, my project partner Erik Silkensen and I mechanically modeled the paper's figures in Redex.
In this post I will present the paper and link to the corresponding executable models that can be found on GitHub as I go.
Continuation passing style (CPS)
Compilers transform high-level source languages, such as Java or
Haskell, into lower-level target languages, such as JVM bytecode or LLVM IR.
Intermediate language forms are often used to enable the application of
generalized optimizations. A popular intermediate form is continuation
passing style (CPS), in which procedures don't return but pass their
result to a caller-provided continuation or callback function. Take,
for instance, the function add1 = λ x . (+ x 1); in CPS it would look
like add1' = λ x k . (k (+ x 1)), where k is the provided
continuation. Aside from enabling many optimizations,
CPS makes control flow of programs explicit and is easy to translate
into assembly. For more intuition on CPS, Matt Might has several
excellent articles on CPS by example, compiling with CPS and implementing language features using CPS.
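To make this concrete, here is the add1/add1' pair above sketched in Python (illustrative only; Python has no tail-call optimization, so real CPS code would eventually overflow the stack on deep recursion):

```python
def add1(x):
    # direct style: the result is returned to the caller
    return x + 1

def add1_cps(x, k):
    # CPS: nothing is returned "up"; the result is handed to the
    # caller-provided continuation k, which represents the rest of
    # the computation
    return k(x + 1)

print(add1(41))                    # 42
print(add1_cps(41, lambda r: r))   # 42: the continuation "just returns it"
```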
The Essence of Compiling with Continuations
An optimizing CPS transformation usually takes multiple passes. Flanagan et al. show in “The Essence of Compiling with Continuations” that one can forgo a standard 3-pass CPS transformation, while still capturing the essence of compiling with continuations, by doing a single source-level transformation to A-Normal Form (ANF).
To show this, the authors present a Scheme-like language CS and CPS-convert it into the CPS(CS) language. They then incrementally optimize an abstract machine that operates over the CPS(CS) language. To close, they prove the optimized machine's equivalence to the CaEK machine, which operates over ANF programs in the A(CS) language.
Below is the initial Scheme-like CS language, which has expressions M, values V, and operations O.
M ::= V | (let (x M) M) | (if0 M M M) | (M M ...) | (O M ...)
V ::= number | x | (λ (x ...) M)
O ::= + | - | * | /
CPS transformations
To convert the CS language into CPS, we begin with the naive CPS translation F, which adds continuations to terms in the language.
F[V] = λ k . (k Φ[V])
F[(let (x M1) M2)] = λ k . (F[M1] (λ t . (let (x t) (F[M2] k))))
F[(if0 M1 M2 M3)] = λ k . (F[M1] (λ t . (if0 t (F[M2] k) (F[M3] k))))
F[(M M1)] = λ k . (F[M] (λ t . (F[M1] (λ t1 . (t k t1)))))
F[(O M)] = λ k . (F[M] (λ t . (O' k t)))
Φ[c] = c
Φ[x] = x
Φ[λ x . M] = λ k x . (F[M] k)
It is naive because it introduces many administrative λ terms that can be simplified away by applying β-reductions. For instance, F[λ x . (x)] will result in:
(λ k3 . (k3 (λ k2 x . ((λ k1 . ((λ k . (k x)) (λ t . (t k1)))) k2))))
which can then be simplified to:
(λ k3 . (k3 (λ k2 x . (x k2))))
by applying β-reductions to the λs introduced by the F function wherever possible.
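The F and Φ equations can be sketched in Python over a list-based encoding of CS terms (the encoding, the gensym helper, and the restriction to single-argument applications are our own assumptions for the sketch; primitive operations are omitted for brevity, and the paper's Figure 3 remains the authority):

```python
import itertools

_counter = itertools.count()

def gensym(prefix):
    # fresh variable names for continuations and temporaries
    return f"{prefix}{next(_counter)}"

def is_value(m):
    # V ::= number | x | (λ (x ...) M)
    return isinstance(m, int) or isinstance(m, str) or \
        (isinstance(m, list) and m[0] == "lambda")

def phi(v):
    # Φ: constants and variables are unchanged; a λ gains a continuation
    # parameter and its body is F-transformed
    if isinstance(v, list) and v[0] == "lambda":
        _, params, body = v
        k = gensym("k")
        return ["lambda", [k] + params, [F(body), k]]
    return v

def F(m):
    # F[M] = λ k . ... : every term becomes a function over its continuation
    k = gensym("k")
    if is_value(m):
        return ["lambda", [k], [k, phi(m)]]
    if m[0] == "let":
        _, x, m1, m2 = m
        t = gensym("t")
        return ["lambda", [k],
                [F(m1), ["lambda", [t], ["let", x, t, [F(m2), k]]]]]
    if m[0] == "if0":
        _, m1, m2, m3 = m
        t = gensym("t")
        return ["lambda", [k],
                [F(m1), ["lambda", [t], ["if0", t, [F(m2), k], [F(m3), k]]]]]
    # single-argument application (M M1); n-ary apps follow the same pattern
    f, arg = m
    t, t1 = gensym("t"), gensym("t")
    return ["lambda", [k],
            [F(f), ["lambda", [t],
                    [F(arg), ["lambda", [t1], [t, k, t1]]]]]]
```

Running F on a value such as 42 produces the administrative wrapper ["lambda", ["k0"], ["k0", 42]], exactly the λ k . (k 42) shape that β-reduction later removes.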
The resulting language after applying the F function to CS terms is CPS(CS), which has expressions P, values W, and the CPS equivalents O' of the operators found in CS:
P ::= (k W) | (let (x W) P) | (if0 W P P)
| (W k W ...) | (W (λ (x) P) W ...)
| (O' k W ...) | (O' (λ (x) P) W ...)
W ::= number | x | (λ (k x ...) P)
O' ::= +' | -' | *' | /'
Optimizing abstract machines for CPS(CS)
Continuations are explicitly present in terms of the CPS(CS)
language, hence control and environment are the only components needed
to create an abstract machine capable of evaluating CPS(CS) programs.
The authors begin with such a machine, but quickly notice that continuation variables are often
redundantly bound in the environment. There is no need when evaluating ((λ k1 x . P) k2 y)
to bind k1 := k2
in the environment since they point to the same continuation. Addressing this, a control stack is added, resulting in a
machine in which continuations are no longer stored in the state's
environment. From there they notice that since the machine is keeping
track of the program's continuations, continuation variables found in
CPS(CS) terms are no longer used to look up continuations in the environment.
Hence an unCPS transformation is designed to remove explicit
continuations from terms and a subsequent machine is formulated.
UnCPS
In the first machine, the continuation variables found in terms are used to look up continuations in the environment. Migrating to the stack-based machine means that these continuations are already managed by the state's control stack. This leaves continuation variables in terms useless, so let's drop them. What about the remaining terms with continuations of the form λ x . P? These can be transformed into let expressions by binding the current computation to the continuation's parameter x and putting the continuation's body in the body of the let:
(W (λ x . P) W ...) → (let (x (W W ...)) P)
Doing so means that continuations are no longer passed around but become the surrounding context, syntactically linearizing the program. This continuation-removal transformation can be formalized as the U metafunction found in our Redex models. For instance, applying U to the following CPS expression:
(+' (λ t1 . (let (x 1) (f (λ t2 . (+' k t1 t2)) x))) 2 2)
results in the following CS expression:
(let (t1 (+ 2 2))
  (let (x 1)
    (let (t2 (f x))
      (+ t1 t2))))
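This transformation can be sketched in Python over the same list encoding of terms (the functions U and psi and the primop handling are our own encoding of the paper's unCPS idea, not its exact metafunction):

```python
def psi(w):
    # Ψ on values: λs drop their continuation parameter and their body is
    # unCPSed; numbers and variables pass through
    if isinstance(w, list) and w[0] == "lambda":
        _, params, body = w
        return ["lambda", params[1:], U(body, params[0])]
    return w

def head_of(w):
    # CPS primops like +' become their direct-style counterparts
    if isinstance(w, str) and w.endswith("'"):
        return w[:-1]
    return psi(w)

def U(p, kvar):
    # (k W): return the value to the surrounding context
    if isinstance(p, list) and len(p) == 2 and p[0] == kvar:
        return psi(p[1])
    if p[0] == "let":
        _, x, w, body = p
        return ["let", x, psi(w), U(body, kvar)]
    if p[0] == "if0":
        _, w, p1, p2 = p
        return ["if0", psi(w), U(p1, kvar), U(p2, kvar)]
    # application: (W k W ...) is a tail call;
    # (W (λ (x) P) W ...) becomes (let (x (W W ...)) P)
    call = [head_of(p[0])] + [psi(w) for w in p[2:]]
    cont = p[1]
    if cont == kvar:
        return call
    x, body = cont[1][0], cont[2]
    return ["let", x, call, U(body, kvar)]

# the example from the text:
expr = ["+'", ["lambda", ["t1"],
               ["let", "x", 1,
                ["f", ["lambda", ["t2"], ["+'", "k", "t1", "t2"]], "x"]]],
        2, 2]
# U(expr, "k") yields the nested lets shown above:
# ["let","t1",["+",2,2],["let","x",1,["let","t2",["f","x"],["+","t1","t2"]]]]
```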
The range of this function is the A(CS) language, a restricted subset of the CS language.
A-Normal Form (ANF) language
M ::= V | (let (x V) M) | (if0 V M M)
| (V V ...) | (let (x (V V ...)) M)
| (O V ...) | (let (x (O V ...)) M)
V ::= number | x | (λ (x ...) M)
The A(CS) language
that results from unCPSing the CPS(CS) language holds some interesting
properties. It maintains the sequential structure of CPS code
implicitly, without continuations. Programs in the A(CS) language are
constrained such that non-value M
terms (computations) must
be let-bound or appear in tail position. These properties enable
certain optimizations, such as being able to write abstract machines
that take larger steps.
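To make the "larger steps" point concrete, here is a minimal big-step evaluator for A(CS)-shaped terms under the same kind of list encoding (our own sketch, not the paper's machine): because every intermediate computation is already let-bound or in tail position, the evaluator never searches inside a term for the next redex.

```python
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
       "*": lambda a, b: a * b, "/": lambda a, b: a // b}

def is_value(m):
    # V ::= number | x | (λ (x ...) M)
    return isinstance(m, int) or isinstance(m, str) or \
        (isinstance(m, list) and m[0] == "lambda")

def eval_value(v, env):
    if isinstance(v, int):
        return v
    if isinstance(v, str):
        return env[v]
    _, params, body = v                      # ["lambda", [x ...], M]
    return ("closure", params, body, env)

def eval_comp(c, env):
    # (O V ...) or (V V ...): all arguments are values already
    if isinstance(c[0], str) and c[0] in OPS:
        return OPS[c[0]](*[eval_value(v, env) for v in c[1:]])
    _, params, body, cenv = eval_value(c[0], env)
    args = [eval_value(v, env) for v in c[1:]]
    return eval_anf(body, {**cenv, **dict(zip(params, args))})

def eval_anf(m, env):
    if isinstance(m, list) and m[0] == "let":
        _, x, rhs, body = m                  # rhs is a value or one computation
        bound = eval_value(rhs, env) if is_value(rhs) else eval_comp(rhs, env)
        return eval_anf(body, {**env, x: bound})
    if isinstance(m, list) and m[0] == "if0":
        _, v, m1, m2 = m
        return eval_anf(m1 if eval_value(v, env) == 0 else m2, env)
    if is_value(m):
        return eval_value(m, env)
    return eval_comp(m, env)                 # tail call

# ((λ (x) (let (y (+ x 1)) y)) 41) in the list encoding:
prog = [["lambda", ["x"], ["let", "y", ["+", "x", 1], "y"]], 41]
print(eval_anf(prog, {}))  # 42
```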
Direct ANF translation
So far, in order to translate into the A(CS) language we've had to convert to CPS, simplify administrative λs, and translate out of CPS via the unCPS function U.
One must wonder: can we forgo CPS completely and translate CS to A(CS)
in one source-level pass? Flanagan et al. show this is possible by
presenting a strongly normalizing A-reduction relation that takes CS
terms to A(CS) terms using the evaluation context ε. The context ε is
defined such that its hole marks the next reducible expression
according to the original CS language's CEK machine. For instance, when
the machine sees the term (if0 (- 1 2) 2 3), it will reduce (- 1 2)
first.
ε ::= [] | (let (x ε) M) | (if0 ε M M) | (F V ... ε M ...)
where F = V or F = O
The idea is to create a reduction relation, A-Reduction, that binds the intermediate computations found in context holes to variables and replaces those computations with their binding variables. This results in programs with linearized control flow that require only one continuation structure in the corresponding abstract machine implementation (the CaEK machine).
A-Reduction:
(A1) ε[(let (x M) N)] →a (let (x M) ε[N])
where ε ≠ [] and x ∉ FreeVariables(ε)
(A2) ε[(if0 V M1 M2)] →a (if0 V ε[M1] ε[M2])
where ε ≠ []
(A3) ε[(F V ...)] →a (let (t (F V ...)) ε[t])
where F = V or F = O, ε ≠ [], t ∉ FreeVariables(ε),
∀ ε' . ε ≠ ε'[(let (z []) M)]
The A-Reduction is comprised of 3 rules:
- (A1) moves let-bindings out of the current context, as long as we make sure to pick a fresh x and substitute it in N. This rule helps us lift let terms out of computational positions so that we no longer need the lt continuation found in the CEK machine.
- (A2) moves if0 terms out of the current context, duplicating the context for each branch. This rule enables us to get rid of the if continuation found in the CEK machine.
- (A3) binds the results of intermediate computations to variables and lifts them out of the current context. We need to make sure that the hole in ε isn't of the shape ε'[(let (z []) M)] for some arbitrary ε', or else we would unnecessarily introduce let-bindings. This rule ensures that applications appear only as let-bound expressions or in tail position.
All the rules enforce the side condition that ε isn't empty; if it were, the reduction relation would never terminate.
The A-Reduction is a reduction relation over the entire program term, and it turns out that once we apply any of the →a
rules, we are left with a term that can't be matched by the ε
context. This is due to the fact that the ε
context only matches terms not yet in ANF and the →a
rules introduce terms in ANF. In order to model this in Redex, we were
forced to add a ψ context to let us match over terms already in ANF:
ψ ::= [] | (let (x V) ψ) | (let (x (F V ...)) ψ) | (if0 V ψ M) | (if0 V M ψ)
where F = V or F = O
The A-Reduction rules can then be wrapped in a ψ context, allowing terms already in ANF to be put in the ψ context so that reductions on non-ANF subterms can be made. This results in the Adapted A-Reduction, which we were able to model in a straightforward manner in Redex:
ψ[M1] → ψ[M2] when M1 →a M2
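As a complementary view, the same translation can be written as a one-pass, continuation-style normalizer, in the style of Matt Might's A-Normalization post rather than the paper's reduction relation (the Python list encoding and helper names are ours). Rule (A2)'s context duplication shows up as pushing the pending context into both if0 branches, and rule (A3) as the let-introduction in normalize_name:

```python
import itertools

_counter = itertools.count()

def fresh():
    return f"t{next(_counter)}"

OPS = {"+", "-", "*", "/"}

def is_value(m):
    # V ::= number | x | (λ (x ...) M)
    return isinstance(m, int) or \
        (isinstance(m, str) and m not in OPS) or \
        (isinstance(m, list) and m[0] == "lambda")

def normalize(m, k):
    # k is the pending syntactic context, represented as a function
    if isinstance(m, list) and m[0] == "lambda":
        return k(["lambda", m[1], anf(m[2])])
    if is_value(m):
        return k(m)
    if m[0] == "let":                      # cf. rule (A1): lets float outward
        _, x, m1, m2 = m
        return normalize(m1, lambda n1: ["let", x, n1, normalize(m2, k)])
    if m[0] == "if0":                      # cf. rule (A2): duplicate the context
        _, m1, m2, m3 = m
        return normalize_name(m1,
            lambda t: ["if0", t, normalize(m2, k), normalize(m3, k)])
    return normalize_names(list(m), k)     # (M M ...) or (O M ...)

def normalize_name(m, k):
    # cf. rule (A3): name intermediate computations with a fresh let
    def named(n):
        if is_value(n):
            return k(n)
        t = fresh()
        return ["let", t, n, k(t)]
    return normalize(m, named)

def normalize_names(ms, k):
    if not ms:
        return k([])
    if isinstance(ms[0], str) and ms[0] in OPS:
        return normalize_names(ms[1:], lambda ts: k([ms[0]] + ts))
    return normalize_name(ms[0],
        lambda t: normalize_names(ms[1:], lambda ts: k([t] + ts)))

def anf(m):
    return normalize(m, lambda x: x)

# context duplication for if0, as in rule (A2):
# anf(["+", ["if0", 0, 1, 2], 3]) → ["if0", 0, ["+", 1, 3], ["+", 2, 3]]
```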
Typos
The typos we found in “The Essence of Compiling with Continuations” are quite minor; the only reason I mention them is to motivate Redex's LaTeX exporting features.
In the CPS transformation figure of the paper (Figure 3), the auxiliary Φ
function reads
Φ[λ x . M] = λ k x . F[M]
failing to use the k variable bound by the λ, and should hence read
Φ[λ x . M] = λ k x . (F[M] k)
Additionally, the unCPS function U calls an auxiliary Ψ function that unCPSs values. The λ case of Ψ reads Ψ[λ k x . P] = λ x . U[M], but clearly should be Ψ[λ k x . P] = λ x . U[P].
If Redex had been available in 1993, such typos could have been avoided by creating Redex models and then exporting them as LaTeX figures, in the flavor of Run Your Research.
Resources and Further Reading
- Our Redex Models on GitHub
- Matt Might’s blog post on A Normalization
- Matt Might’s blog posts on CPS: CPS by example, compiling with CPS and implementing language features using CPS.
- The paper: “The Essence of Compiling with Continuations”
- A preceding paper that introduces unCPS: “Reasoning about programs in continuation-passing style”
- Semantics Engineering With PLT Redex textbook