
Figure 10. Speedup vs. a baseline JavaScript interpreter (SpiderMonkey) for our trace-based JIT compiler, Apple’s SquirrelFish Extreme inline threading interpreter, and Google’s V8 JS compiler. Our system generates particularly efficient code for programs that benefit most from type specialization, which includes SunSpider benchmark programs that perform bit manipulation. We type-specialize the code in question to use integer arithmetic, which substantially improves performance. For one of the benchmark programs we execute 25 times faster than the SpiderMonkey interpreter, and almost 5 times faster than V8 and SFX. For a large number of benchmarks all three VMs produce similar results. We perform worst on benchmark programs that we do not trace and instead fall back onto the interpreter. This includes the recursive benchmarks access-binary-trees and controlflow-recursive, for which we currently do not generate any native code.

In particular, the bitops benchmarks are short programs that perform many bitwise operations, so TraceMonkey can cover the entire program with 1 or 2 traces that operate on integers. TraceMonkey runs all the other programs in this set almost entirely as native code.
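To illustrate the kind of code this describes, here is a minimal sketch of a bitops-style kernel. The function names `bitsInByte` and `totalBits` are hypothetical and the code is not taken from the SunSpider sources. JavaScript numbers are nominally doubles, but every value here is a small integer and the bitwise operators work on 32-bit integers, so this is the sort of loop nest that type specialization could plausibly run with integer arithmetic throughout.

```javascript
// Hypothetical bitops-style kernel; not taken from the SunSpider sources.

// Count the set bits in the low 8 bits of b using only shifts and masks.
function bitsInByte(b) {
  let count = 0;
  for (let mask = 1; mask < 0x100; mask <<= 1) {
    if (b & mask) count++;
  }
  return count;
}

// Sum the bit counts over every possible byte value.
function totalBits() {
  let sum = 0;
  for (let b = 0; b < 256; b++) {
    sum += bitsInByte(b);
  }
  return sum;
}
```

The whole program is two small loops whose variables never leave the integer range, which matches the description of a program covered by one or two integer-only traces.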

regexp-dna is dominated by regular expression matching, which is implemented in all 3 VMs by a special regular expression compiler. Thus, performance on this benchmark has little relation to the trace compilation approach discussed in this paper.

TraceMonkey’s smaller speedups on the other benchmarks can be attributed to a few specific causes:

• The implementation does not currently trace recursion, so TraceMonkey achieves a small speedup or no speedup on benchmarks that use recursion extensively: 3d-cube, 3d-raytrace, access-binary-trees, string-tagcloud, and controlflow-recursive.

• The implementation does not currently trace eval and some other functions implemented in C. Because date-format-tofte and date-format-xparb use such functions in their main loops, we do not trace them.

• The implementation does not currently trace through regular expression replace operations. The replace function can be passed a function object used to compute the replacement text. Our implementation currently does not trace functions called as replace functions. The run time of string-unpack-code is dominated by such a replace call.

• Two programs trace well, but have a long compilation time. access-nbody forms a large number of traces (81). crypto-md5 forms one very long trace. We expect to improve performance on these programs by improving the compilation speed of nanojit.

• Some programs trace very well, and speed up compared to the interpreter, but are not as fast as SFX and/or V8, namely bitops-bits-in-byte, bitops-nsieve-bits, access-fannkuch, access-nsieve, and crypto-aes. The reason is not clear, but all of these programs have nested loops with small bodies, so we suspect that the implementation has a relatively high cost for calling nested traces. string-fasta traces well, but its run time is dominated by string processing builtins, which are unaffected by tracing and seem to be less efficient in SpiderMonkey than in the two other VMs.
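The constructs named in the list above can be sketched as follows. All function names are hypothetical and none of the code is taken from the SunSpider sources. The sketch only shows the shape of each pattern (recursion, `eval` in a loop, `replace` with a function argument, and nested loops with a small inner body) and makes no claim about how any particular engine handles it today.

```javascript
// Hypothetical illustrations of the patterns listed above;
// not taken from the SunSpider sources.

// 1. Recursion: the hot path is a self-call rather than a loop.
function treeDepth(node) {
  if (node === null) return 0;
  return 1 + Math.max(treeDepth(node.left), treeDepth(node.right));
}

// 2. eval called from inside a loop body.
function sumWithEval(n) {
  let total = 0;
  for (let i = 0; i < n; i++) {
    total += eval("i * 2"); // direct eval, reads the loop variable i
  }
  return total;
}

// 3. replace with a function object that computes the replacement text.
function doubleDigits(s) {
  return s.replace(/\d/g, function (d) {
    return String(Number(d) * 2);
  });
}

// 4. Nested loops with a small inner body (a simple sieve).
function countPrimes(limit) {
  const isComposite = new Array(limit + 1).fill(false);
  let count = 0;
  for (let i = 2; i <= limit; i++) {
    if (isComposite[i]) continue;
    count++;
    for (let k = i * 2; k <= limit; k += i) {
      isComposite[k] = true;
    }
  }
  return count;
}
```

In the fourth pattern the inner loop does very little work per entry, so any fixed cost paid on each transition between the outer and inner loop is a larger share of the total, which is consistent with the suspicion stated above about the cost of calling nested traces.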

Detailed performance metrics. In Figure 11 we show the fraction of instructions interpreted and the fraction of instructions executed as native code. This figure shows that for many programs, we are able to execute almost all the code natively.

Figure 12 breaks down the total execution time into four activities: interpreting bytecodes while not recording, recording traces (including time taken to interpret the recorded trace), compiling traces to native code, and executing native code traces.
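As a small worked example of such a breakdown, the sketch below converts four measured times into fractions of the total. The helper `timeBreakdown`, its field names, and the sample timings are all assumptions made for illustration; they are not measurements from the paper.

```javascript
// Hypothetical helper: express the four activities as fractions of total time.
// Field names are illustrative, not taken from the paper.
function timeBreakdown(t) {
  const total = t.interpret + t.record + t.compile + t.native;
  if (!(total > 0)) {
    throw new RangeError("total time must be positive");
  }
  return {
    interpret: t.interpret / total, // interpreting bytecodes, not recording
    record: t.record / total,       // recording traces
    compile: t.compile / total,     // compiling traces to native code
    native: t.native / total,       // executing native code traces
  };
}

// Made-up timings in milliseconds, for illustration only.
const exampleBreakdown = timeBreakdown({
  interpret: 10,
  record: 5,
  compile: 5,
  native: 80,
});
```

With these made-up inputs the native share is 80 of 100 ms, so a program shaped like this would spend most of its time in compiled traces.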

These detailed metrics allow us to estimate parameters for a

simple model of tracing performance. These estimates should be

considered very rough, as the values observed on the individual

benchmarks have large standard deviations (on the order of the