Previous: Strategies for Deferred Compilation Up: Deferred Compilation: The Automation of Run-Time Code Generation Next: Costs of Deferred Compilation

Implementation

We have implemented a prototype compiler called FABIUS that incorporates many of the deferred compilation strategies described in the previous section, as described below. The primary goal of FABIUS is to reduce the run-time cost of code generation to a minimum, at the cost of some degradation in the quality of the generated code and an increase in the size of both the generating and the generated code. This provides a baseline for the evaluation of compilers that perform more aggressive run-time optimizations.

The FABIUS source language is a rudimentary, strict, first-order functional language. Integers and pointers to heap-allocated structures are the only run-time values; FABIUS does not support arrays or assignment. We have currently limited our attention to two-stage programs, so that the problem of staging analysis becomes one of binding-time analysis [NN92]. The staging analysis also determines how function calls should be treated by the code generator. An aggressive heuristic is used to determine which function applications should be inlined: function calls in the branches of ``late'' conditionals are specialized, but all other calls are inlined [BD91]. All analysis is automatic, requiring no programmer intervention.

All register allocation and assignment occurs at compile-time; registers are assigned independently to variables in early and late computations. In keeping with the focus on fast run-time code generation, very few optimizations are applied at run time. The primary optimizations are ``constant'' propagation, ``constant'' folding, dead-code elimination, and function inlining. Loops are expressed as tail-recursive functions, so inlining effectively yields loop unrolling. We have ignored the issue of instruction scheduling for the moment; we assume an idealized RISC machine with no delay slots (see Appendix A). Run-time code generation occurs in a single pass; no intermediate representation is constructed and no analysis or optimization is performed on code after it is generated.

Our preliminary results are encouraging. As an example we consider vector-matrix multiplication, which is often used to implement matrix-matrix multiplication and is common in scientific computing applications. Berlin and Weise have investigated improvements to similar scientific code through compile-time application of partial evaluation [BW90]. Vector-matrix multiplication is a prime candidate for run-time code generation because the vector is fixed throughout the computation, and the loop that computes the inner product of the vector with a row or column from the matrix can be completely unrolled. Such optimizations cannot usually be performed at compile time, however, because the sizes and contents of the vector and matrix are generally not statically apparent. The source code for the example is given in Appendix B, along with one of the run-time code-generators produced by FABIUS.

The above figure compares the total execution time of vector-matrix multiplication for varying input sizes under conventional and deferred compilation. The inputs were vectors of length and square matrices of dimension containing pseudo-random integers, and the execution times are given in machine cycles (see Appendix A). The ``conventionally compiled'' code was produced by disabling the FABIUS staging analysis and is of high quality. The dotted line represents the portion of time spent generating code at run time; this time is included in the total execution time of the code produced by deferred compilation. As the figure demonstrates, deferred compilation can yield significant improvement in overall execution time even for small problem sizes. In this case the cost of run-time code generation was recouped when multiplying a 16 element vector with a 1616 matrix. The speedup increases linearly with larger input sizes (ignoring the secondary costs detailed in Section 5), yielding a speedup of greater than 20% when .

The amount of dynamically allocated data space was roughly the same under conventional and deferred compilation. However, as expected, we observed a significant increase in code size. The conventionally compiled code occupied just over 50 words; under deferred compilation the size of the static code rose to nearly 275 words and the size of the run-time-generated code rose linearly from 250 words to approximately 800 words as ranged from 4 to 32. Increases of this magnitude are to be expected when aggressively inlining, since we are trading space for time, but it remains to be seen whether such increases are manageable in larger applications. More extensive experimentation is currently underway.

mleone@cs.cmu.edu