Two-Level Just-in-Time Compilation With One Interpreter and One Engine
is shown in Listing 1. When developers write the generic interpreter, they first declare an instance of the JitTierDriver class that has a field pc. It tells adaptive RPython fundamental information such as the variable name of the program counter. Furthermore, transforming hints should be defined in specific handlers: can_enter_tier1_branch, can_enter_tier1_jump, and can_enter_tier1_ret tell adaptive RPython's transformer the necessary information to generate the method-traversal interpreter. The method-traversal interpreter requires a particular kind of code in the JUMP_IF, JUMP, and RET bytecode handlers so that the hints can be called in those handlers. The requisite in-

[Figure: a control-flow graph with nodes A–F (start to end) shown alongside the traverse stack and the resulting trace; branch targets such as pc (B → D) and pc (C → F) are pushed and popped, and emit_X / emit_Y appear in the resulting trace.]
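The hint placement described above can be sketched in plain Python. This is only an illustrative model: the JitTierDriver and can_enter_tier1_branch names follow the paper, but their bodies, the opcode encoding, and the interp function are hypothetical stand-ins for what adaptive RPython's transformer would consume.

```python
# Sketch only: stand-ins for adaptive RPython's driver and hints.
# In real adaptive RPython these are provided by the framework; here
# the hint is a no-op so the interpreter runs as plain Python.
class JitTierDriver:
    def __init__(self, pc):
        self.pc = pc  # tells the transformer the program counter's variable name

tla_driver = JitTierDriver(pc='pc')

def can_enter_tier1_branch(pc, bytecode):
    pass  # placeholder: the transformer rewrites this call site

# Hypothetical opcode encoding for the sketch.
CONST_INT, JUMP_IF, JUMP, EXIT = range(4)

def interp(bytecode):
    pc = 0
    stack = []
    while True:
        op = bytecode[pc]
        if op == CONST_INT:
            stack.append(bytecode[pc + 1])
            pc += 2
        elif op == JUMP_IF:
            target = bytecode[pc + 1]
            # The hint is called inside the branch handler, as required
            # for generating the method-traversal interpreter.
            can_enter_tier1_branch(pc, bytecode)
            pc = target if stack.pop() else pc + 2
        elif op == JUMP:
            pc = bytecode[pc + 1]
        elif op == EXIT:
            return stack.pop() if stack else None
```

The JUMP and RET handlers would call can_enter_tier1_jump and can_enter_tier1_ret at the analogous points.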
and forth between native code and interpreter execution leads to run-time overhead. Meanwhile, the combination of baseline JIT and tracing JIT compilation (callabit_baseline_tracing) is as fast as the tracing-JIT-only strategy (callabit_tracing_only). Additionally, as Figure 7 shows, the baseline-tracing JIT strategy's trace size is about 40 % smaller than that of the tracing-only JIT strategy. In contrast, the trace sizes are the same for the baseline-tracing and tracing-baseline JIT strategies, but the tracing-baseline JIT strategy is about 45 % slower than the baseline-tracing JIT strategy, and the baseline-only strategy is about 5 % faster than the tracing-baseline JIT strategy. From these results, we can deduce that there is a ceiling to using only a single JIT strategy. Furthermore, to leverage different levels of JIT compilation, we have to apply an appropriate compilation according to the structure or nature of the target program.

In summary, our baseline JIT compilation is about 1.77x faster than interpreter-only execution in both stable and startup speeds.6 Moreover, our baseline JIT compilation is only about 43 % slower than the tracing JIT compilation, even though it has very few optimizations, such as inlining and type specialization. This means that our approach of enabling baseline JIT compilation alongside tracing JIT compilation has enough potential to work as a startup compilation if we carefully adjust the threshold to enter baseline JIT compilation. This is left as future work.

6 We calculated the geometric mean of loop, loopabit, and callabit_baseline_only in both stable and startup speeds.

4 Related Work

Both well-developed VMs, such as the Java VM and JavaScript VMs, and research-oriented VMs of a certain size support multitier JIT compilation to balance startup speed, compilation time, and memory footprint. As far as the authors know, such VMs build at least two different compilers to realize multitier optimization. In contrast, our approach realizes it in one engine with a language implementation framework.

The Java HotSpot™ VM has two different compilers, namely C1 [16] and C2 [22], and four optimization levels. The typical path moves through levels 0, 3, and 4. Level 0 means interpreting. At level 3, the C1 compiler compiles a target with profiling information gathered by the interpreter. If C2's compilation queue is not full and the target turns out to be hot, C2 starts to optimize the method aggressively (level 4). Levels 1 and 2 are used when C2's compilation queue is full or level 3 optimization cannot work.

The Firefox JavaScript VM, called SpiderMonkey [17], has several interpreters and compilers to enable multitier optimization. As interpreters, it has a normal and a baseline interpreter [18]. The baseline interpreter supports inline caches [7, 12] to improve its performance. The baseline JIT compiler uses the same inline caching mechanism, but it translates the entire bytecode into machine code. In addition, a full-fledged compiler, WarpMonkey [19], compiles a hot spot into fast machine code. Besides the JavaScript engine, the SpiderMonkey VM has a WebAssembly interpreter and compiler called WASM-Baseline and WASM-Ion.

Google's JavaScript engine V8, which is included in the Chrome browser, also supports a multitier compilation mechanism [10]. V8 sees it as a problem that JIT-compiled code can consume a large amount of memory even when it runs only once. Its baseline interpreter/compiler is called Ignition, and it is highly optimized to collaborate with V8's JIT compiler engine Turbofan; it can reduce the code size by up to 50 % compared to the original.

Google's V8 has another optimizing compiler called Liftoff [11]. The Liftoff compiler is designed as a startup compiler for WebAssembly and works alongside Turbofan. Turbofan is based on its own intermediate representation (IR), so it needs to translate WebAssembly code into that IR, which reduces the startup performance of the Chrome browser. Liftoff, in contrast, directly compiles WebAssembly code into machine code; it is tuned to quickly generate memory-efficient code to reduce the memory footprint at startup time.

The Jikes Java Research VM (originally called Jalapeño) [1], which was developed by IBM Research, is a research-oriented VM written in Java. It has baseline and optimizing JIT compilers and supports an optimization strategy with three tiers.

5 Conclusion and Future Work

In the current paper, we proposed the concept and an initial-stage implementation of adaptive RPython, which can generate a VM that supports two-tier compilation. In realizing adaptive RPython, we did not implement another compiler from scratch but drove the existing meta-tracing JIT compilation engine with a specially instrumented interpreter called the generic interpreter. The generic interpreter supports a fluent API that can be easily integrated with RPython's original hint functions. The adaptive RPython compiler generates different interpreters, each supporting a different compilation tier. JIT trace stitching reconstructs the initial control flow of a trace generated by the baseline JIT interpreter in order to emit executable native code. In our preliminary evaluation, when we manually applied a suitable compilation depending on the control flow of a target method, we confirmed that the baseline-tracing JIT compilation runs as fast as the tracing-JIT-only compilation and reduces the trace size by 50 %. From this result, selecting an appropriate compilation strategy according to a target program's control flow or nature is essential in multitier compilation.
Two-level Just-in-Time Compilation with One Interpreter and One Engine PEPM ’22, January 17–18, 2022, Philadelphia, Pennsylvania, United States
To implement an internal graph-to-graph conversion of the generic interpreter in RPython is something we plan to work on next. We currently implement the generic interpreter transformer as a source-to-source translator because it is a proof of concept. For a smoother integration with RPython, we need to switch implementation strategies in the future.

To realize a technique that automatically shifts JIT compilation tiers in Adaptive RPython, we also need to investigate a compilation scheme that includes suitable heuristics regarding when to go from one tier to the next.

Finally, we would like to implement our Adaptive RPython techniques in the PyPy programming language because doing so brings many benefits. For example, we can obtain a lot of data by running our adaptive RPython on existing polished benchmark programs to determine a certain threshold for switching JIT compilation. Furthermore, we could potentially bring our research results to many Python programmers.

Acknowledgments

We would like to thank the reviewers of the PEPM 2022 workshop for their valuable comments. This work was supported by JSPS KAKENHI grant number 21J10682 and JST ACT-X grant number JPMJAX2003.

References

[1] Bowen Alpern, C. R. Attanasio, Anthony Cocchi, Derek Lieber, Stephen Smith, Ton Ngo, John J. Barton, Susan Flynn Hummel, Janice C. Sheperd, and Mark Mergen. 1999. Implementing Jalapeño in Java. In Proceedings of the 14th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (Denver, Colorado, USA) (OOPSLA '99). Association for Computing Machinery, New York, NY, USA, 314–324. https://doi.org/10.1145/320384.320418
[2] Spenser Bauman, Carl Friedrich Bolz, Robert Hirschfeld, Vasily Kirilichev, Tobias Pape, Jeremy G. Siek, and Sam Tobin-Hochstadt. 2015. Pycket: A Tracing JIT for a Functional Language. In Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming (Vancouver, BC, Canada) (ICFP 2015). ACM, New York, NY, USA, 22–34. https://doi.org/10.1145/2784731.2784740
[3] James R. Bell. 1973. Threaded Code. Commun. ACM 16, 6 (June 1973), 370–372. https://doi.org/10.1145/362248.362270
[4] Carl Friedrich Bolz, Antonio Cuni, Maciej Fijałkowski, Michael Leuschel, Samuele Pedroni, and Armin Rigo. 2011. Runtime Feedback in a Meta-tracing JIT for Efficient Dynamic Languages. In Proceedings of the 6th Workshop on Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems (Lancaster, United Kingdom) (ICOOOLPS '11). ACM, New York, NY, USA, Article 9, 8 pages. https://doi.org/10.1145/2069172.2069181
[5] Carl Friedrich Bolz, Antonio Cuni, Maciej Fijałkowski, and Armin Rigo. 2009. Tracing the Meta-level: PyPy's Tracing JIT Compiler. In Proceedings of the 4th Workshop on the Implementation, Compilation, Optimization of Object-Oriented Languages and Programming Systems (Genova, Italy). ACM, New York, NY, USA, 18–25. https://doi.org/10.1145/1565824.1565827
[6] Carl Friedrich Bolz and Laurence Tratt. 2015. The Impact of Meta-tracing on VM Design and Implementation. Science of Computer Programming 98 (2015), 408–421. https://doi.org/10.1016/j.scico.2013.02.001 Special Issue on Advances in Dynamic Languages.
[7] L. Peter Deutsch and Allan M. Schiffman. 1984. Efficient Implementation of the Smalltalk-80 System. In Proceedings of the 11th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (Salt Lake City, Utah, USA) (POPL '84). Association for Computing Machinery, New York, NY, USA, 297–302. https://doi.org/10.1145/800017.800542
[8] Maciej Fijałkowski, Armin Rigo, Rafał Gałczyński, Ronan Lamy, Sebastian Pawluś, Ashwini Oruganti, and Edd Barrett. 2014. HippyVM - an implementation of the PHP language in RPython. Retrieved 2021-10-07 from http://hippyvm.baroquesoftware.com
[9] Alex Gaynor, Tim Felgentreff, Charles Nutter, Evan Phoenix, Brian Ford, and PyPy development team. 2013. A high performance ruby, written in RPython. Retrieved 2021-10-07 from http://docs.topazruby.com/en/latest/
[10] Google. 2016. Firing up the Ignition interpreter. Retrieved 2021-10-07 from https://v8.dev/blog/ignition-interpreter
[11] Google. 2018. Liftoff: a new baseline compiler for WebAssembly in V8. https://v8.dev/blog/liftoff
[12] Urs Hölzle, Craig Chambers, and David Ungar. 1991. Optimizing Dynamically-typed Object-oriented Languages with Polymorphic Inline Caches. In ECOOP '91 European Conference on Object-Oriented Programming, Pierre America (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 21–38.
[13] P. Joseph Hong. 1992. Threaded Code Designs for Forth Interpreters. SIGFORTH Newsl. 4, 2 (Oct. 1992), 11–16. https://doi.org/10.1145/146559.146561
[14] Ruochen Huang, Hidehiko Masuhara, and Tomoyuki Aotani. 2016. Improving Sequential Performance of Erlang Based on a Meta-tracing Just-In-Time Compiler. In International Symposium on Trends in Functional Programming. Springer, 44–58.
[15] Yusuke Izawa, Hidehiko Masuhara, Carl Friedrich Bolz-Tereick, and Youyou Cong. 2021. Threaded Code Generation with a Meta-tracing JIT Compiler. (Sept. 2021). arXiv:2106.12496 Submitted for publication.
[16] Thomas Kotzmann, Christian Wimmer, Hanspeter Mössenböck, Thomas Rodriguez, Kenneth Russell, and David Cox. 2008. Design of the Java HotSpot™ Client Compiler for Java 6. ACM Trans. Archit. Code Optim. 5, 1, Article 7 (May 2008), 32 pages. https://doi.org/10.1145/1369396.1370017
[17] Mozilla. 2019. SpiderMonkey: Mozilla's JavaScript and WebAssembly Engine. Retrieved 2021-10-07 from https://spidermonkey.dev
[18] Mozilla. 2019. SpiderMonkey's JavaScript Interpreter and Compiler. Retrieved 2021-09-27 from https://firefox-source-docs.mozilla.org/js
[19] Mozilla. 2020. Warp: Improved JS performance in Firefox 83. Retrieved 2021-10-07 from https://hacks.mozilla.org/2020/11/warp-improved-js-performance-in-firefox-83/
[20] Oracle Labs. 2013. A high performance implementation of the Ruby programming language. https://github.com/oracle/truffleruby
[21] Oracle Labs. 2018. Graal/Truffle-based implementation of Python. Retrieved 2021-10-07 from https://github.com/graalvm/graalpython
[22] Michael Paleczny, Christopher Vick, and Cliff Click. 2001. The Java HotSpot™ Server Compiler. In Proceedings of the 2001 Symposium on JavaTM Virtual Machine Research and Technology Symposium - Volume 1 (Monterey, California) (JVM '01). USENIX Association, USA, 1.
[23] PyPy development team. 2009. PyPy Speed Center. Retrieved 2021-09-27 from https://speed.pypy.org
[24] Thomas Würthinger, Christian Wimmer, Christian Humer, Andreas Wöß, Lukas Stadler, Chris Seaton, Gilles Duboscq, Doug Simon, and Matthias Grimmer. 2017. Practical Partial Evaluation for High-performance Dynamic Language Runtimes. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (Barcelona, Spain) (PLDI 2017). ACM, New York, NY, USA, 662–676. https://doi.org/10.1145/3062341.3062381
PEPM ’22, January 17–18, 2022, Philadelphia, Pennsylvania, United States Izawa, Masuhara, and Bolz-Tereick.
    break;
else if op is RetOp then
    append op to trace;
    guard_failure ← PopGuardFailure();
    append (trace, guard_failure) to result;
    break;
else
    append op to trace;
return result;

Function PopGuardFailure():
    if first pop? then
        return None;
    else
        failure ← pop the element from guard_failure_stack;
        return failure;

Function HandleEmitJump(op, inputargs):
    target ← GetProgramCounter(op);
    token ← token_map[target];
    guard_failure ← PopGuardFailure();
    append JumpOp(args, token) to trace;
    return trace, guard_failure;

Function HandleEmitRet(op):
    retval ← GetRetVal(op);
    guard_failure ← PopGuardFailure();
    append RetOp(retval) to trace;
    return trace, guard_failure;

[Figure: (a) the stable speeds of loop and loopabit, with the speed-up ratio normalized to the interpreter-only execution; (b) the startup speeds of TLA w/ Adaptive RPython, executing with baseline JIT and with tracing JIT, for loop, loopabit, and their geometric means.]

Figure 7. The trace sizes and compilation times in callabit programs (callabit_baseline_interp, callabit_baseline_only, callabit_baseline_tracing, callabit_tracing_baseline, callabit_tracing_only). The program is so small that the compilation time is at most 3 % of the total.
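The trace-stitching functions above (PopGuardFailure, HandleEmitJump, HandleEmitRet) can be modeled in plain Python. This is an illustrative sketch, not adaptive RPython's implementation: the op classes, token_map, and the guard-failure stack are simplified stand-ins.

```python
# Illustrative model of the trace-stitching handlers; the op classes
# and token_map are simplified stand-ins, not RPython's internals.
class JumpOp:
    def __init__(self, args, token):
        self.args, self.token = args, token

class RetOp:
    def __init__(self, retval):
        self.retval = retval

class Stitcher:
    def __init__(self, token_map):
        self.token_map = token_map          # pc -> compiled-loop token
        self.guard_failure_stack = []
        self.first_pop = True

    def pop_guard_failure(self):
        # The first pop returns None: the head of the stitched trace
        # is not attached to any failing guard.
        if self.first_pop:
            self.first_pop = False
            return None
        return self.guard_failure_stack.pop()

    def handle_emit_jump(self, op, inputargs, trace):
        target = op.target                  # program counter the jump goes to
        token = self.token_map[target]
        guard_failure = self.pop_guard_failure()
        trace.append(JumpOp(inputargs, token))
        return trace, guard_failure

    def handle_emit_ret(self, op, trace):
        guard_failure = self.pop_guard_failure()
        trace.append(RetOp(op.retval))
        return trace, guard_failure
```

Each handler closes the current trace with a jump or return operation and reports which guard failure (if any) the next trace fragment should be stitched to.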
[Figure: (a) the stable speeds of the callabit programs, with the speed-up ratio normalized to the interpreter-only execution; (b) the startup speeds of TLA w/ Adaptive RPython for callabit_baseline_only, callabit_baseline_tracing, callabit_tracing_baseline, and callabit_tracing_only.]

C Programs

C.1 The Definition of Traverse Stack

    class TraverseStack:
        _immutable_fields_ = ['pc', 'next']

        def __init__(self, pc, next):
            self.pc = pc
            self.next = next

        def t_pop(self):
            return self.pc, self.next

        @elidable
        def t_is_empty(self):
            return self is _T_EMPTY

    _T_EMPTY = None

    @elidable
    def t_empty():
        return _T_EMPTY

    memoization = {}

    @elidable
    def t_push(pc, next):
        key = pc, next
        if key in memoization:
            return memoization[key]
        result = TraverseStack(pc, next)
        memoization[key] = result
        return result

    # loopabit.tla
    tla.DUP,
    tla.CONST_INT, 1,
    tla.SUB,
    tla.DUP,
    tla.CONST_INT, 1,
    tla.LT,
    tla.JUMP_IF, 12,
    tla.JUMP, 1,
    tla.POP,
    tla.CONST_INT, 1,
    tla.SUB,
    tla.DUP,
    tla.DUP,
    tla.CONST_INT, 1,
    tla.LT,
    tla.JUMP_IF, 25,
    tla.JUMP, 1,
    tla.EXIT

Listing 4. Definition of loopabit
PEPM ’22, January 17–18, 2022, Philadelphia, Pennsylvania, United States Izawa, Masuhara, and Bolz-Tereick.