Previous: Strategies for Deferred Compilation Up: Deferred Compilation:
The Automation of Run-Time Code Generation Next: Costs of Deferred Compilation
The FABIUS source language is a rudimentary, strict, first-order functional language. Integers and pointers to heap-allocated structures are the only run-time values; FABIUS does not support arrays or assignment. We have currently limited our attention to two-stage programs, so that the problem of staging analysis becomes one of binding-time analysis [NN92]. The staging analysis also determines how function calls should be treated by the code generator. An aggressive heuristic is used to determine which function applications should be inlined: function calls in the branches of ``late'' conditionals are specialized, but all other calls are inlined [BD91]. All analysis is automatic, requiring no programmer intervention.
All register allocation and assignment occurs at compile-time; registers are assigned independently to variables in early and late computations. In keeping with the focus on fast run-time code generation, very few optimizations are applied at run time. The primary optimizations are ``constant'' propagation, ``constant'' folding, dead-code elimination, and function inlining. Loops are expressed as tail-recursive functions, so inlining effectively yields loop unrolling. We have ignored the issue of instruction scheduling for the moment; we assume an idealized RISC machine with no delay slots (see Appendix A). Run-time code generation occurs in a single pass; no intermediate representation is constructed and no analysis or optimization is performed on code after it is generated.
Our preliminary results are encouraging. As an example we consider
vector-matrix multiplication, which is often used to implement
matrix-matrix multiplication and is common in scientific computing
applications. Berlin and Weise have investigated improvements to
similar scientific code through compile-time application of partial
evaluation [BW90]. Vector-matrix multiplication is a prime
candidate for run-time code generation because the vector is fixed
throughout the computation, and the loop that computes the inner
product of the vector with a row or column from the matrix can be
completely unrolled. Such optimizations cannot usually be
performed at compile time, however, because the sizes and contents of
the vector and matrix are generally not statically apparent. The
source code for the example is given in Appendix B, along with one of
the run-time code-generators produced by FABIUS.
The above figure compares the total execution time of vector-matrix
multiplication for varying input sizes under conventional and deferred
compilation. The inputs were vectors of length and square
matrices of dimension
containing pseudo-random integers, and the
execution times are given in machine cycles (see Appendix A). The
``conventionally compiled'' code was produced by disabling the
FABIUS staging analysis and is of high quality. The dotted line
represents the portion of time spent generating code at run time; this
time is included in the total execution time of the code produced by
deferred compilation. As the figure demonstrates, deferred
compilation can yield significant improvement in overall execution
time even for small problem sizes. In this case the cost of run-time
code generation was recouped when multiplying a 16 element vector with
a 16
16 matrix. The speedup increases linearly with larger
input sizes (ignoring the secondary costs detailed in
Section 5), yielding a speedup of greater than 20% when
.
The amount of dynamically allocated data space was roughly the same
under conventional and deferred compilation. However, as expected, we
observed a significant increase in code size. The conventionally
compiled code occupied just over 50 words; under deferred compilation
the size of the static code rose to nearly 275 words and the size of
the run-time-generated code rose linearly from 250 words to
approximately 800 words as ranged from 4 to 32. Increases of this
magnitude are to be expected when aggressively inlining,
since we are trading space for time, but it remains to be seen whether
such increases are manageable in larger applications. More
extensive experimentation is currently underway.
