Previous: Introduction Up: Deferred Compilation:
The Automation of Run-Time Code Generation Next: Strategies for Deferred Compilation
A conventional compilation of power might yield the following machine
code:
power: beq r1, r0, L1 ; if exp = 0 goto L1
sub r1, r1, 1 ; exp = exp - 1
mul r3, r3, r2 ; accum = accum * base
jmp power ; goto power
L1: move r1, r3 ; result = accum
ret ; return
Suppose the program calls power repeatedly, but with the first
argument changing more slowly than the second argument. This would arise,
for example, in a loop where each iteration computes a new base and calls
power without varying the exponent. One can also imagine a curried
version of power which is applied to an exponent value and then
passed to a mapping function. In such situations, we say that the
first argument is computed in an early stage and the second
argument is computed in a late stage.
A staging analysis can be used to identify such computation stages and label those subexpressions in the program that depend only on the early arguments, as opposed to those that require late argument values. In the case of just two stages, this labeling of early and late computations corresponds precisely to a binding-time analysis and annotation [NN92][Con93], and in fact our prototype compiler incorporates a binding-time analyzer.
Early computations are compiled in the normal way, but late computations
are translated into code that emits the corresponding instructions at
run time. In this example, since the exponent is available at an early
stage, the conditional test and subtraction expressions are compiled
normally, but the compilation of the multiplication expression is deferred
to run time. In the simplest form of deferred compilation, we might obtain
the following code:
powgen: beq r1, r0, L1
sub r1, r1, 1
emit mul r3, r3, r2
jmp powgen
L1: emit move r1, r3
ret
Note that the only difference between power and powgen is that
the multiplication instruction is emitted (perhaps many times) instead of
being executed, as is the instruction that moves the accumulator to the
result register. When called with exp = 5, powgen
completely unrolls the loop and generates code with all ``constants''
folded and all ``dead code'' eliminated:
mul r3, r3, r2
mul r3, r3, r2
mul r3, r3, r2
mul r3, r3, r2
mul r3, r3, r2
move r1, r3
Deferred compilation can be fast enough to pay off quickly. On a typical RISC architecture a fixed-argument instruction can be emitted in as few as four cycles (see Appendix A). Under this assumption the costs incurred by powgen are recovered after only three iterations of the run-time-generated code when exp = 5.
Making deferred compilation practical for a wide variety of programs is more of a challenge than this simple example might imply. Here we see that run-time loop unrolling can be highly profitable, but clearly there are limits; if pursued too aggressively, the run-time overhead may exceed the performance gain of the dynamically generated code. Another complication stems from the fact that real-world programs often contain many more than the two stages of computation exhibited by this example, a large number of which may benefit from run-time optimization. Thus, a conventional binding-time analysis is not, in general, powerful enough for our needs.
The next section discusses these issues in more detail and proposes several strategies for addressing them. We also examine how a wider range of optimizations and code generation techniques can be adapted to deferred compilation. The effectiveness of some of these techniques is examined in the context of a more realistic example in Section 4.
