Previous: Introduction Up: Deferred Compilation: The Automation of Run-Time Code Generation Next: Strategies for Deferred Compilation

An Example

A simple example illustrates some of the techniques employed by deferred compilation. Consider a program that contains a (tail-recursive) definition of the exponentiation function:

A conventional compilation of power might yield the following machine code:


     power:   beq     r1, r0, L1         ; if exp = 0 goto L1
              sub     r1, r1, 1          ; exp = exp - 1
              mul     r3, r3, r2         ; accum = accum * base
              jmp     power              ; goto power
     L1:      move    r1, r3             ; result = accum
              ret                        ; return
Suppose the program calls power repeatedly, but with the first argument changing more slowly than the second argument. This would arise, for example, in a loop where each iteration computes a new base and calls power without varying the exponent. One can also imagine a curried version of power which is applied to an exponent value and then passed to a mapping function. In such situations, we say that the first argument is computed in an early stage and the second argument is computed in a late stage.

A staging analysis can be used to identify such computation stages and label those subexpressions in the program that depend only on the early arguments, as opposed to those that require late argument values. In the case of just two stages, this labeling of early and late computations corresponds precisely to a binding-time analysis and annotation [NN92][Con93], and in fact our prototype compiler incorporates a binding-time analyzer.

Early computations are compiled in the normal way, but late computations are translated into code that emits the corresponding instructions at run time. In this example, since the exponent is available at an early stage, the conditional test and subtraction expressions are compiled normally, but the compilation of the multiplication expression is deferred to run time. In the simplest form of deferred compilation, we might obtain the following code:

 
     powgen:  beq     r1, r0, L1 
              sub     r1, r1, 1 
              emit    mul    r3, r3, r2 
              jmp     powgen 
     L1:      emit    move   r1, r3 
              ret
Note that the only difference between power and powgen is that the multiplication instruction is emitted (perhaps many times) instead of being executed, as is the instruction that moves the accumulator to the result register. When called with exp = 5, powgen completely unrolls the loop and generates code with all ``constants'' folded and all ``dead code'' eliminated:

              mul     r3, r3, r2
              mul     r3, r3, r2
              mul     r3, r3, r2
              mul     r3, r3, r2
              mul     r3, r3, r2
              move    r1, r3

Deferred compilation can be fast enough to pay off quickly. On a typical RISC architecture a fixed-argument instruction can be emitted in as few as four cycles (see Appendix A). Under this assumption the costs incurred by powgen are recovered after only three iterations of the run-time-generated code when exp = 5.

Making deferred compilation practical for a wide variety of programs is more of a challenge than this simple example might imply. Here we see that run-time loop unrolling can be highly profitable, but clearly there are limits; if pursued too aggressively, the run-time overhead may exceed the performance gain of the dynamically generated code. Another complication stems from the fact that real-world programs often contain many more than the two stages of computation exhibited by this example, a large number of which may benefit from run-time optimization. Thus, a conventional binding-time analysis is not, in general, powerful enough for our needs.

The next section discusses these issues in more detail and proposes several strategies for addressing them. We also examine how a wider range of optimizations and code generation techniques can be adapted to deferred compilation. The effectiveness of some of these techniques is examined in the context of a more realistic example in Section 4.

mleone@cs.cmu.edu