Previous: Register Allocation for Deferred Compilation Up: Strategies for Deferred Compilation
We have generalized destination-driven code generation [DHB90] to produce specialized run-time code generators (henceforth simply called generators) that do not manipulate any representation of the source program at run time. The algorithm is surprisingly straightforward because it obeys staging annotations rather blindly. As an expression is traversed, ``early'' operations are converted to machine code that performs the appropriate computation, while ``late'' operations are compiled into code that emits the machine instructions that will eventually perform the computation.
As the example in
Section 2 demonstrates, this simple
technique produces highly effective run-time optimizations. These
optimizations are more powerful that those found in many template
compilers [KEH91], and eliminating the need for
run-time processing of an intermediate representation or template can
yield much faster code generation. Many conventional peephole
optimizations, such as strength reduction and instruction selection,
can easily be incorporated. For example, a generator can avoid
emitting a multiplication involving a value from an earlier stage
if it takes the time to determine whether
. The increased cost
of such run-time optimizations must be weighed against their benefit;
a staging analysis that determines where to aggressively optimize
would facilitate such decisions.
A generator that emits native machine code in a single pass will be faster than one that builds an intermediate representation, performs analysis and optimization, and then generates machine instructions. However, it can be difficult to produce good quality native code in a single pass. Branches and procedure calls are problematic because the destination may be code that has not yet been compiled. Due to run-time inlining and loop unrolling the generator may not be able to predict where the target code will eventually be located, so run-time backpatching is necessary. Span dependent instructions are challenging for similar reasons. Good instruction scheduling is also difficult to achieve in a single pass. Although a schedule can be ``hard-wired'' into generators for straight-line blocks, scheduling across basic-block boundaries requires more general techniques.
We are also investigating the adaptation of inlining and loop unrolling algorithms to deferred compilation. In conventional compilers such techniques yield increased opportunities for optimization and improve the amortization of various computations, such as range checks. Our preliminary work suggests that similar benefits can be obtained by run-time inlining and loop unrolling. It can be difficult to statically determine where to inline or how far to unroll a loop. The use of run-time information to guide such decisions may prove to be of significant benefit. We have augmented the compile-time code generator described above with the partial evaluation technique of unfolding [JGS93][BD91], a form of inlining.
In some contexts it is impractical to inline a function yet still desirable to optimize it based upon the results of earlier computations. For example if a function is called at many different program points with the same value from an early computation, it may be preferable to generate a single optimized version of the function rather than inlining its body at each call site. This strategy is commonly called specialization [JSS89]. Specialization also permits run-time-optimized code to be reused rather than regenerated, which saves both space and time. The memoization of run-time code generators is a simple way to achieve such reuse. Run-time memoization can be quite expensive, particularly when memoizing on structured data [Mal93]. Nevertheless, preliminary experiments indicate that it is worthwhile in some applications. The development of static and dynamic strategies for controlling memoization is an interesting open problem.
