Tutorial: Creating an LLVM Backend for the Cpu0 Architecture
Chen Chung-Shu
1 About
3 Backend structure
11 Assembler
14 Appendix A: Getting Started: Installing LLVM and the Cpu0 example code
CHAPTER
ONE
ABOUT
• Authors
• Contributors
• Acknowledgments
• Support
• Revision history
• Licensing
• Motivation
• Preface
• Prerequisites
• Outline of Chapters
1.1 Authors
1.2 Contributors
Anoushe Jamshidi, [email protected], rewrote Chapters 1, 2, and 3 in English and set up the Sphinx tooling and formatting.
Chen Wei-Ren, [email protected], assisted with text and code formatting.
Chen Zhong-Cheng wrote the original Cpu0 Verilog code.
1.3 Acknowledgments
We would like to thank Sean Silva, [email protected], for his help, encouragement, and assistance with the Sphinx document generator. Without his help, this book would not have been finished and published online. We also thank the readers whose corrections have made the book more accurate.
1.4 Support
We received kind help from the LLVM development mailing list, [email protected], from people we had never met. In our experience you are not alone: the members of the development list will help you when you work with the LLVM project. Some of those who helped us are:
Akira Hatanaka <[email protected]>, who answered our va_arg question.
Ulrich Weigand <[email protected]>, who answered our AsmParser question.
Tutorial: Creating an LLVM Backend for the Cpu0 Architecture, Release 3.9.1
1.5 Revision history
Version 3.5.9, Released February 2, 2015 Fix the 64-bit shift bug. Fix global address error by replacing addiu with ori. Change the encoding of “cmp $sw, $3, $2” from 0x10320000 to 0x10f32000.
Version 3.5.8, Released December 27, 2014 Correct typos. Fix the error in install.rst about updating lbdex/src/modify/src/. Add libsoftfloat/compiler-rt and libc/avr-libc-1.8.1. Add LLVM-VPO in chapter Optimization.
Version 3.5.7, Released December 1, 2014 Fix the prologue/epilogue error for frames over 16 bits, introduced in 3.5.3. Enable calling convention ABI S32 by option. Change from ADD to ADDu in copyPhysReg() of Cpu0SEInstrInfo.cpp. Add back the asm directive .weak, which existed in 3.5.3.
Version 3.5.6, Released November 18, 2014 Remove SWI and IRET instructions. Add Cpu0SetChapter.h for ex-build-test.sh. Correct typos. Fix the thread variable error in static mode, introduced in version 3.5.3. Add sub-section “Cpu0 backend machine ID and relocation records” to Chapter 2.
Version 3.5.5, Released November 11, 2014 Rename SPR to C0R. Add ISR simulation.
Version 3.5.4, Released November 6, 2014 Adjust chapter 9 sections. Fix .cprestore bug. Re-organize
sections. Add sub-section “Why not using ADD instead of SUB?” in chapter 2. Add overflow
control option to use ADD and SUB instructions.
Version 3.5.3, Released October 29, 2014 Merge the Cpu0 example code into a single copy that can be configured through Cpu0Config.h.
Version 3.5.2, Released October 3, 2014 Move R_CPU0_32 from the non-relocation record types to the relocation record types. Correct the logic error for setgt in BrcondPatsSlt of Cpu0InstrInfo.td.
Version 3.5.1, Released October 1, 2014 Add the move alias instruction for addu $reg, $zero. Add CPU cycle count in Verilog. Fix the ISD::SIGN_EXTEND_INREG error for types other than i1. Support DAG op br_jt and DAG node JumpTable.
Version 3.5.0, Released September 05, 2014 Issue NOP in delay slot.
Version 3.4.8, Released August 29, 2014 Add the reason for the endian swap in the memory module. Add presentation files.
Version 3.4.7, Released August 22, 2014 Fix wrapper_pic for cmov.ll. Add 64-bit shift operation support. Fix wrapper_pic for ch8_5.cpp. Add section thread of chapter 14. Add section Motivation to chapter About. Support little endian for cpu0 Verilog. Move the ch8_5.cpp test from Chapter Run backend to Chapter lld since it needs the lld linker. Support both big endian and little endian in cpu0 Verilog, elf2hex and lld. Make branch release_34_7.
Version 3.4.6, Released July 26, 2014 Add Chapter 15, optimization. Correct typos. Add Chapter 14, C++. Fix the bug that generated cpu032II instructions in dynamic_linker.cpp.
Version 3.4.5, Released June 30, 2014 Correct typos.
Version 3.4.4, Released June 24, 2014 Correct typos. Add the reason for using SSA form. Move sections LLVM Code Generation Sequence, DAG and Instruction Selection from Chapter 3 to Chapter 2.
Version 3.4.3, Released March 31, 2014 Fix the disassembly bug for the GPROut register class. Adjust chapters. Remove the hand-copied tblgen table in AsmParser.
Version 3.4.2, Released February 9, 2014 Add ch12_2.cpp for slt instruction explanation and fix bug in Cpu0InstrInfo.cpp. Correct typos. Move Cpu0 Status Register from Number 20 to Number 10. Fix llc -mcpu option problem. Update the example code build shell script. Add conditional move instruction. Fix the branch pattern match bug in Cpu0InstrInfo.td.
Version 3.4.1, Released January 18, 2014 Add ch9_4.cpp to lld test. Fix the wrong reference in lbd/lib/Target/Cpu0 code. inlineasm. First instruction jmp X, where X changed from _Z5startv to start. Correct typos.
Version 3.2.13, Released May 23, 2013 Add sub-section “Setup llvm-lit on iMac” of Appendix A. Re-
place some code-block with literalinclude in *.rst. Add Fig 9 of chapter Backend structure. Add sec-
tion Dynamic stack allocation support of chapter Function call. Fix bug of Cpu0DelUselessJMP.cpp.
Fix cpu0 instruction table errors.
Version 3.2.12, Released March 9, 2013 Add section “Type of char and short int” of chapter “Global
variables, structs and arrays, other type”.
Version 3.2.11, Released March 8, 2013 Fix bug in generate elf of chapter “Backend Optimization”.
Version 3.2.10, Released February 23, 2013 Add chapter “Backend Optimization”.
Version 3.2.9, Released February 20, 2013 Correct the “Variable number of arguments” such as
sum_i(int amount, ...) errors.
Version 3.2.8, Released February 20, 2013 Add section llvm-objdump -t -r.
Version 3.2.7, Released February 14, 2013 Add chapter Run backend. Add Icarus Verilog tool installa-
tion in Appendix A.
Version 3.2.6, Released February 4, 2013 Update CMP instruction implementation. Add llvm-objdump
section.
Version 3.2.5, Released January 27, 2013 Add “LLVMBackendTutorialExampleCode/llvm3.1”. Add
section “Structure type support”. Change reference from Figure title to Figure number.
Version 3.2.4, Released January 17, 2013 Update for LLVM 3.2. Change title (book name) from “Write
An LLVM Backend Tutorial For Cpu0” to “Tutorial: Creating an LLVM Backend for the Cpu0
Architecture”.
Version 3.2.3, Released January 12, 2013 Add chapter “Porting to LLVM 3.2”.
Version 3.2.2, Released January 10, 2013 Add section “Full support %” and section “Verify DIV for
operator %”.
Version 3.2.1, Released January 7, 2013 Add footnotes for references. Reorganize chapters (move the bottom part of chapter “Global variable” to chapter “Other instruction”; move section “Translate into obj file” to new chapter “Generate obj file”). Fix errors in Fig/otherinst/2.png and Fig/otherinst/3.png.
Version 3.2.0, Released January 1, 2013 Add chapter Function. Move Chapter “Installing LLVM and the Cpu0 example code” from the beginning to Appendix A. Add subsection “Install other tools on Linux”. Add chapter ELF.
Version 3.1.2, Released December 15, 2012 Fix section 6.1 error by adding “def : Pat<(brcond RC:$cond, bb:$dst), (JNEOp (CMPOp RC:$cond, ZEROReg), bb:$dst)>;” as the last pattern. Modify section 5.5. Fix bug in Cpu0InstrInfo.cpp: SW to ST. Correct LW to LD; LB to LDB; SB to STB.
Version 3.1.1, Released November 28, 2012 Add Revision history. Correct the ldi instruction error (replace the ldi instruction with addiu from the beginning and in all example code). Move the ldi instruction change from section “Adjust cpu0 instruction and support type of local variable pointer” to section ”CPU0 processor architecture”. Correct some English and typing errors.
1.6 Licensing
http://llvm.org/docs/DeveloperPolicy.html#license
1.7 Motivation
We all learned computer science in school through the concepts presented in books. Concepts are an effective way to see the big picture. But once we start developing a real, complicated system, we often find that the concepts from school or books are insufficient or not detailed enough. A compiler is a very complicated system, so students traditionally learn it conceptually and do homework with yacc/lex tools, translating part of C or another high-level language into an intermediate representation (IR) or assembly to get a feel for parsing and for applying the tools.
On the other hand, compiler engineers who have graduated from school often face real, complicated commercial CPUs and specifications. For market reasons, there is usually a series of CPUs and ABIs (Application Binary Interfaces) to deal with. Moreover, for performance reasons, a real compiler backend is too complicated to serve as learning material for compiler backend design, even when it targets only one CPU and one ABI.
This book develops a compiler backend for a simple CPU designed for teaching, called Cpu0. It includes the implementation of a compiler backend, linker, llvm-objdump, and elf2hex, as well as the Verilog source code of the Cpu0 instruction set. We provide readers with the full source code to compile C/C++ programs and see how they run on the Cpu0 machine implemented in Verilog. Through this CPU designed for learning, you get the chance to understand the whole picture: compiler backend, linker, system tools, and CPU design. This is usually hard to gain from working on a real CPU and compiler, since the real job is too complicated for a single person to finish.
In my observation, some software engineers advocate LLVM over GCC for two reasons. One is political: the BSD license 1 2 . The other is technical: LLVM follows the three-tier compiler structure and uses C++ object-oriented technology. GCC started in C and adopted C++ nearly 20 years later 3 . Maybe GCC adopted C++ just because LLVM did. I learned C++ object-oriented programming while studying in school. After studying books on “Design Pattern”, “C++/STL” and “object oriented design”, I understood that C is easy to trace while C++ makes it easy to create reusable software units known as objects. If a programmer knows “Design Pattern” well, C++ offers more reusability and rewritability. A book on “system language” and software quality that I once read lists these items to define software quality: readability, rewritability, reusability, and performance. Object-oriented programming exists to handle the development of big and complex software. Compilers and operating systems are without question complex software, so why do GCC and Linux not use C++ 4 ? This is the reason I chose to create a backend under LLVM rather than GCC.
1.8 Preface
The LLVM Compiler Infrastructure provides a versatile structure for creating new backends. Cre-
ating a new backend should not be too difficult once you familiarize yourself with this structure.
However, the available backend documentation is fairly high level and leaves out many details. This
tutorial will provide step-by-step instructions to write a new backend for a new target architecture from scratch.
We will use the Cpu0 architecture as an example to build our new backend. Cpu0 is a simple RISC
architecture that has been designed for educational purposes. More information about Cpu0, includ-
ing its instruction set, is available here. The Cpu0 example code referenced in this book can be found
here. As you progress from one chapter to the next, you will incrementally build the backend’s functionality.
Since Cpu0 is a simple RISC CPU for educational purposes, the LLVM backend code for it is also simple and easy to learn. In addition, Cpu0 provides Verilog source code that you can run on your PC or an FPGA platform when you reach the chapter “Verify backend on Verilog simulator”. To explain the backend design, we carefully designed a C/C++ program for the functions newly added in each chapter. Through this example code, readers can understand which IRs (LLVM intermediate representation) the backend translates from, and the C/C++ code that corresponds to these IRs.
1 http://llvm.org/docs/DeveloperPolicy.html#license
2 http://www.phoronix.com/scan.php?page=news_item&px=MTU4MjA
3 http://en.wikipedia.org/wiki/GNU_Compiler_Collection
4 http://en.wikipedia.org/wiki/C%2B%2B
This tutorial started with the LLVM 3.1 Mips backend as a reference and was synced to the LLVM 3.5 Mips backend at version 3.5.3. In our experience, referencing and syncing with a released backend helps you upgrade your backend’s features and fix bugs. You can compare the differences from version to version and benefit from the LLVM development team’s effort. Because Cpu0 is an educational architecture, it lacks some key pieces of documentation needed when developing a compiler, such as an Application Binary Interface (ABI). We implement our backend using the Mips ABI as a guide. You may want to familiarize yourself with the relevant parts of the Mips ABI as you progress through this tutorial.
This document can serve as a tutorial on toolchain development for a new CPU architecture. Many programmers graduate from school with knowledge of compilers and computer architecture but are not professional engineers in compiler or CPU design. This document introduces these engineers to programming a toolchain and designing a CPU based on the LLVM infrastructure, without paying for any software or hardware. A computer is the only device needed.
Finally, this book is not a book on compiler concepts. It is for readers who are interested in extending a compiler toolchain to support a new CPU based on the LLVM structure. When you program on Linux, you can write a driver without knowing every detail of the OS. For example, someone writing a driver for a specific USB device on the Linux platform will try to understand the USB specification, the Linux USB subsystem, and the common device driver model and API. In the same way, to extend a large piece of software like the LLVM umbrella project, you should find a path to your goal and ignore the details that are not on your way. Trying to understand every line of source code in detail is not realistic if your project extends a well-defined software structure; it only makes sense when rewriting the whole structure. Of course, if more LLVM backend books or documents become available, readers will have the chance to learn more about LLVM from them.
1.9 Prerequisites
Readers should be comfortable with the C++ language and Object-Oriented Programming concepts.
LLVM has been developed and implemented in C++, and it is written in a modular way so that
various classes can be adapted and reused as often as possible.
Already having conceptual knowledge of how compilers work is a plus, and if you already have im-
plemented compilers in the past you will likely have no trouble following this tutorial. As this tutorial
will build up an LLVM backend step-by-step, we will introduce important concepts as necessary.
This tutorial references the following materials. We highly recommend you read these documents to
get a deeper understanding of what the tutorial is teaching:
The Architecture of Open Source Applications Chapter on LLVM
LLVM’s Target-Independent Code Generation documentation
LLVM’s TableGen Fundamentals documentation
LLVM’s Writing an LLVM Compiler Backend documentation
Description of the Tricore LLVM Backend
Mips ABI document
1.10 Outline of Chapters
The upper half of Fig. 1.1 shows the work flow and software packages through which a computer program is generated and executed. IR stands for Intermediate Representation. The lower half shows this book’s work flow and software packages for the toolchain extension based on LLVM. Except for clang, the other blocks need to be extended for a new backend (many backends extend clang too, but the Cpu0 backend does not need to at this point). This book implements the yellow boxes. The green parts of the figure, lld and elf2hex for the Cpu0 backend, can be found at http://jonathan2251.github.io/lbt/index.html. The hex file is an ASCII file that uses ‘0’ to ‘9’ and ‘a’ to ‘f’ to represent hexadecimal values, since the Verilog machine takes it as its input file.
This book includes 10,000 lines of source code for
1. Step-by-step creation of an LLVM backend for Cpu0. Chapters 2 to 11.
2. Cpu0 Verilog source code. Chapter 12.
With this code, readers can generate Cpu0 machine code through the Cpu0 LLVM backend compiler, then see how it runs on their computer, provided the code has no global variables or relocation records for the linker to handle. The pdf and epub are also available on the web. This is a tutorial for LLVM backend developers, not for experts. It can also serve those who have textbook knowledge of compilers and computer architecture and would like to know how to extend the LLVM toolchain to support a new CPU.
Cpu0 architecture and LLVM structure:
This chapter introduces the Cpu0 architecture, a high-level view of LLVM, and how Cpu0 will be targeted in an LLVM backend. This chapter will run you through the initial steps of building the backend, including initial work on the target description (td), setting up cmake and LLVMBuild files, and target registration. Around 750 lines of source code are added by the end of this chapter.
Backend structure:
This chapter highlights the structure of an LLVM backend using UML graphs, and we continue to build the Cpu0 backend. Thousands of lines of source code are added, most of which are common from one LLVM backend to another, regardless of the target architecture. By the end of this chapter, the Cpu0 LLVM backend will support fewer than ten instructions, enough to generate some initial assembly output.
Arithmetic and logic instructions:
Over ten C operators and their corresponding LLVM IR instructions are introduced in this chapter. A few hundred lines of source code, mostly in .td Target Description files, are added. With these few hundred lines, the backend can translate the +, -, *, /, &, |, ^, <<, >>, ! and % C operators into the appropriate Cpu0 assembly code. Use of the llc debug option and of Graphviz as a debug tool is also introduced in this chapter.
Generating object files:
Object file generation support for the Cpu0 backend is added in this chapter, as the Target Registra-
tion structure is introduced. Based on llvm structure, the Cpu0 backend can generate big and little
endian ELF object files without much effort.
Global variables:
Global variable handling is added in this chapter. Cpu0 supports PIC and static addressing modes; both are explained as their functionality is implemented.
Other data type:
In addition to type int, other data types such as pointer, char, bool, long long, structure and array are added in this chapter.
Control flow statements:
Support for control flow statements such as if, else, while, for, goto, switch and case is added, and both a simple optimization software pass and hardware instructions for control statement optimization are discussed in this chapter.
Function call:
This chapter details the implementation of function calls in the Cpu0 backend. The stack frame,
handling incoming & outgoing arguments, and their corresponding standard LLVM functions are
introduced.
ELF Support:
This chapter details Cpu0 support for the well-known ELF object file format. The ELF format and
binutils tools are not a part of LLVM, but are introduced. This chapter details how to use the ELF
tools to verify and analyze the object files created by the Cpu0 backend. The disassemble command
llvm-objdump -d support for Cpu0 is added in the last section of this chapter.
Assembler:
Support translating hand-written assembly language into object files under the LLVM infrastructure.
C++ support:
Support C++ language features. This chapter is a work in progress.
Verify backend on Verilog simulator:
First, create the Cpu0 virtual machine in the Verilog language with the Icarus tool. With this tool, you can feed the hex file generated by llvm-objdump to the Cpu0 virtual machine and see the Cpu0 run result on a PC.
Appendix A: Getting Started: Installing LLVM and the Cpu0 example code:
Details how to set up the LLVM source code, development tools, and environment setting for Mac
OS X and Linux platforms.
Appendix B: Cpu0 document and test:
This book uses Sphinx to generate the pdf and epub formats of the document. Details about how to install the tools to generate these documents, and about the regression tests for the Cpu0 backend, are included.
CHAPTER
TWO
Before you begin this tutorial, you should know that you can always try to develop your own backend by porting code
from existing backends. The majority of the code you will want to investigate can be found in the /lib/Target directory
of your root LLVM installation. As most major RISC instruction sets have some similarities, this may be the avenue
you might try if you are an experienced programmer and knowledgeable about compiler backends.
On the other hand, there is a steep learning curve and you may easily get stuck debugging your new backend. You can
easily spend a lot of time tracing which methods are callbacks of some function, or which are calling some overridden
method deep in the LLVM codebase - and with a codebase as large as LLVM, all of this can easily become difficult
to keep track of. This tutorial will help you work through this process while learning the fundamentals of LLVM
backend design. It will show you what is necessary to get your first backend functional and complete, and it should
help you understand how to debug your backend when it produces incorrect machine code using output provided by
the compiler.
This chapter details the Cpu0 instruction set and the structure of LLVM. The LLVM structure information is adapted
from Chris Lattner’s LLVM chapter of the Architecture of Open Source Applications book 10 . You can read the
original article from the AOSA website if you prefer.
At the end of this chapter, you will begin to create a new LLVM backend by writing register and instruction definitions in the Target Description files, which will be used in the next chapter.
Finally, compiler concepts needed in LLVM backend design, such as the DAG (Directed Acyclic Graph) and instruction selection, are explained here.
This section is based on materials available here 1 (Chinese) and here 2 (English).
Cpu0 is a 32-bit architecture. It has 16 general purpose registers (R0, ..., R15), co-processor registers (like Mips), and
other special registers. Its structure is illustrated in Fig. 2.1 below.
The registers are used for the following purposes:
1&sl=zh-CN&tl=en&u=http://ccckmit.wikidot.com/ocs:cpu0
The Cpu0 instruction set can be divided into three types: L-type instructions, which are generally associated with
memory operations, A-type instructions for arithmetic operations, and J-type instructions that are typically used when
altering control flow (i.e. jumps). Fig. 2.2 illustrates how the bitfields are broken down for each type of instruction.
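Since Fig. 2.2 is not reproduced here, the following is only a minimal sketch of how the three formats might be split into fields. The field widths (an 8-bit opcode, 4-bit register numbers, and a 12-, 16-, or 24-bit constant) and the names `Cpu0Fields` and `cpu0_split` are assumptions made for illustration, not definitions taken from the book's source code. The layout is at least consistent with the encoding 0x10f32000 that the revision history quotes for “cmp $sw, $3, $2”.

```c
#include <stdint.h>

/* Hypothetical field layout (an assumption, not taken from Fig. 2.2):
 * 8-bit opcode, 4-bit register numbers, then a constant whose width
 * depends on the instruction type. */
typedef struct {
    uint8_t  op;    /* bits 31..24, all types          */
    uint8_t  ra;    /* bits 23..20, A-type and L-type  */
    uint8_t  rb;    /* bits 19..16, A-type and L-type  */
    uint8_t  rc;    /* bits 15..12, A-type only        */
    uint32_t cx12;  /* bits 11..0,  A-type constant    */
    uint32_t cx16;  /* bits 15..0,  L-type constant    */
    uint32_t cx24;  /* bits 23..0,  J-type constant    */
} Cpu0Fields;

/* Split a 32-bit instruction word into every candidate field.
 * The caller picks the fields that apply to the instruction's type. */
Cpu0Fields cpu0_split(uint32_t word) {
    Cpu0Fields f;
    f.op   = (uint8_t)(word >> 24);
    f.ra   = (uint8_t)((word >> 20) & 0xFu);
    f.rb   = (uint8_t)((word >> 16) & 0xFu);
    f.rc   = (uint8_t)((word >> 12) & 0xFu);
    f.cx12 = word & 0xFFFu;
    f.cx16 = word & 0xFFFFu;
    f.cx24 = word & 0xFFFFFFu;
    return f;
}
```

Under these assumed widths, splitting 0x10f32000 gives opcode 0x10 with register fields 0xf, 3 and 2, which matches the operands of the quoted cmp instruction.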
Cpu0 has two ISAs. The first, ISA-I, is cpu032I, which borrows the CMP instruction from ARM; the second, ISA-II, is cpu032II, which borrows the SLT instruction from Mips. cpu032II includes the whole cpu032I instruction set and adds SLT, BEQ, ..., instructions. The main purpose of adding cpu032II is to explain instruction set design. As you will see in a later chapter (chapter Control flow statements), the SLT instruction performs better than the older-style CMP instruction. The following table details the cpu032I instruction set:
• First column F.: meaning Format.
0xffffff80(= -128) if byte [Rb+Cx] is 0x80; Ra is 0x0000007f(= 127) if byte [Rb+Cx] is 0x7f. After LBu Ra, [Rb+Cx], Ra is 0x00000080(= 128)
if byte [Rb+Cx] is 0x80; Ra is 0x0000007f(= 127) if byte [Rb+Cx] is 0x7f. Difference between LH and LHu is similar.
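The difference between the signed and unsigned byte loads described in this note can be sketched in C. This is an illustrative model of the extension step only, not code from the Cpu0 backend or its Verilog; the function names are hypothetical.

```c
#include <stdint.h>

/* Model of the LB extension step: sign-extend a loaded byte to 32 bits.
 * XOR-then-subtract avoids relying on implementation-defined casts. */
int32_t load_byte_signed(uint8_t byte) {
    return (int32_t)(byte ^ 0x80) - 0x80;
}

/* Model of the LBu extension step: zero-extend a loaded byte to 32 bits. */
uint32_t load_byte_unsigned(uint8_t byte) {
    return (uint32_t)byte;
}
```

With the byte values from the note, 0x80 becomes -128 (0xffffff80) under the signed load and 128 (0x00000080) under the unsigned load, while 0x7f becomes 127 in both cases.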
5 Conditions include the following comparisons: >, >=, ==, !=, <=, <. SW is actually set by the subtraction of the two register operands, and the
address of register $t9; when user writes “jr $lr” meaning it jump back to the caller function (since $lr is the return address). For user read ability,
Cpu0 prints “ret $lr” instead of “jr $lr”.
From introductory computer science textbooks, we know that SUB can be replaced by ADD as follows,
• (A - B) = (A + (-B))
Since Mips uses 32 bits to represent the C int type, if B has the value -2G, then
• (A - (-2G)) = (A + (2G))
The problem is that -2G can be represented on a 32-bit machine while 2G cannot, since the range of the 32-bit 2’s complement representation is (-2G .. 2G-1). Because the 2’s complement representation allows fast computation in circuit design, it is widely used in real CPU implementations. That is why almost every CPU provides a SUB instruction rather than using ADD in its place.
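The range argument above can be checked with a small C sketch. It widens the operand to 64 bits before negating, because negating the most negative 32-bit value directly would overflow. The helper name `negation_fits_int32` is hypothetical and is used only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when -b is representable as a 32-bit two's complement
 * integer. The negation is done in 64 bits so it cannot overflow. */
bool negation_fits_int32(int32_t b) {
    int64_t negated = -(int64_t)b;
    return negated >= INT32_MIN && negated <= INT32_MAX;
}
```

Every 32-bit value except INT32_MIN (the -2G of the text) passes this check, which is exactly the one case where rewriting A - B as A + (-B) breaks down.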
The Cpu0 status word register (SW) contains the state of the Negative (N), Zero (Z), Carry (C), Overflow (V), Debug
(D), Mode (M), and Interrupt (I) flags. The bit layout of the SW register is shown in Fig. 2.3 below.
When a CMP Ra, Rb instruction executes, the condition flags will change. For example:
• If Ra > Rb, then N = 0, Z = 0
• If Ra < Rb, then N = 1, Z = 0
• If Ra = Rb, then N = 0, Z = 1
The direction (i.e. taken/not taken) of the conditional jump instructions JGT, JLT, JGE, JLE, JEQ, JNE is determined
by the N and Z flags in the SW register.
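As a rough illustration of how the N and Z flags could drive the six conditional jumps, the sketch below models only the three cases listed above. It is an assumption-laden model, not the Cpu0 Verilog: real hardware derives the flags from a subtraction, the Carry and Overflow flags are ignored here, and the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Only the two flags discussed in the text are modelled. */
typedef struct {
    bool n;  /* Negative: set when Ra < Rb  */
    bool z;  /* Zero:     set when Ra == Rb */
} Cpu0Flags;

/* Model of CMP Ra, Rb following the three listed cases. */
Cpu0Flags cpu0_cmp(int32_t ra, int32_t rb) {
    Cpu0Flags f;
    f.n = ra < rb;
    f.z = ra == rb;
    return f;
}

/* Jump conditions expressed purely in terms of N and Z. */
bool jeq_taken(Cpu0Flags f) { return f.z; }
bool jne_taken(Cpu0Flags f) { return !f.z; }
bool jlt_taken(Cpu0Flags f) { return f.n; }
bool jge_taken(Cpu0Flags f) { return !f.n; }
bool jgt_taken(Cpu0Flags f) { return !f.n && !f.z; }
bool jle_taken(Cpu0Flags f) { return f.n || f.z; }
```

For example, after comparing 5 with 3 both flags are clear, so in this model JGT, JGE and JNE would be taken and the other three would fall through.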
The Cpu0 architecture has a five-stage pipeline. The stages are instruction fetch (IF), instruction decode (ID), execute (EX), memory access (MEM) and write-back (WB). Here is a description of what happens in the processor for each stage:
1. Instruction fetch (IF)
• The Cpu0 fetches the instruction pointed to by the Program Counter (PC) into the Instruction Register (IR): IR
= [PC].
• The PC is then updated to point to the next instruction: PC = PC + 4.
2. Instruction decode (ID)
• The control unit decodes the instruction stored in IR, which routes necessary data stored in registers to the ALU,
and sets the ALU’s operation mode based on the current instruction’s opcode.
3. Execute (EX)
• The ALU executes the operation designated by the control unit upon data in registers. Except for load and store instructions, the result is stored in the destination register after the ALU is done.
4. Memory access (MEM)
• Read data from the data cache into pipeline register MEM/WB if it is a load instruction; write data from a register to the data cache if it is a store instruction.
5. Write-back (WB)
• Move data from pipeline register MEM/WB to the register if it is a load instruction.
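The instruction fetch stage described above (IR = [PC], then PC = PC + 4) can be sketched as follows. Memory is modelled as an array of 32-bit words indexed by PC / 4, which is a simplifying assumption that sidesteps byte order; the names `Cpu0Fetch` and `cpu0_fetch` are hypothetical and do not come from the Cpu0 Verilog.

```c
#include <stdint.h>

/* Minimal fetch state: the program counter and instruction register. */
typedef struct {
    uint32_t pc;
    uint32_t ir;
} Cpu0Fetch;

/* One instruction fetch step: IR = [PC], then PC = PC + 4.
 * memory_words holds one 32-bit instruction per element (assumption),
 * so the word index is the byte address divided by 4. */
void cpu0_fetch(Cpu0Fetch *cpu, const uint32_t *memory_words) {
    cpu->ir = memory_words[cpu->pc / 4];
    cpu->pc += 4;
}
```

Calling `cpu0_fetch` repeatedly walks through the instruction words in order, which is what the PC increment of 4 implies for a fixed 32-bit instruction size.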
This section introduces the compiler data structure, algorithm and mechanism that llvm uses.
The text in this and the following sub-section comes from the AOSA chapter on LLVM written by Chris Lattner 10 .
The most popular design for a traditional static compiler (like most C compilers) is the three phase design whose major
components are the front end, the optimizer and the back end, as seen in Fig. 2.4. The front end parses source code,
checking it for errors, and builds a language-specific Abstract Syntax Tree (AST) to represent the input code. The
AST is optionally converted to a new representation for optimization, and the optimizer and back end are run on the
code.
The optimizer is responsible for doing a broad variety of transformations to try to improve the code’s running time,
such as eliminating redundant computations, and is usually more or less independent of language and target. The back
end (also known as the code generator) then maps the code onto the target instruction set. In addition to making correct
code, it is responsible for generating good code that takes advantage of unusual features of the supported architecture.
Common parts of a compiler back end include instruction selection, register allocation, and instruction scheduling.
This model applies equally well to interpreters and JIT compilers. The Java Virtual Machine (JVM) is also an imple-
mentation of this model, which uses Java bytecode as the interface between the front end and optimizer.
The most important win of this classical design comes when a compiler decides to support multiple source languages
or target architectures. If the compiler uses a common code representation in its optimizer, then a front end can be
written for any language that can compile to it, and a back end can be written for any target that can compile from it,
as shown in Fig. 2.5.
With this design, porting the compiler to support a new source language (e.g., Algol or BASIC) requires implementing
a new front end, but the existing optimizer and back end can be reused. If these parts weren’t separated, implementing
a new source language would require starting over from scratch, so supporting N targets and M source languages
would need N*M compilers.
Another advantage of the three-phase design (which follows directly from retargetability) is that the compiler serves a
broader set of programmers than it would if it only supported one source language and one target. For an open source
project, this means that there is a larger community of potential contributors to draw from, which naturally leads to
more enhancements and improvements to the compiler. This is the reason why open source compilers that serve many
communities (like GCC) tend to generate better optimized machine code than narrower compilers like FreePASCAL.
This isn’t the case for proprietary compilers, whose quality is directly related to the project’s budget. For example, the
Intel ICC Compiler is widely known for the quality of code it generates, even though it serves a narrow audience.
A final major win of the three-phase design is that the skills required to implement a front end are different than
those required for the optimizer and back end. Separating these makes it easier for a “front-end person” to enhance
and maintain their part of the compiler. While this is a social issue, not a technical one, it matters a lot in practice,
particularly for open source projects that want to reduce the barrier to contributing as much as possible.
The most important aspect of its design is the LLVM Intermediate Representation (IR), which is the form it uses to rep-
resent code in the compiler. LLVM IR is designed to host mid-level analyses and transformations that you find in the
optimizer chapter of a compiler. It was designed with many specific goals in mind, including supporting lightweight
runtime optimizations, cross-function/interprocedural optimizations, whole program analysis, and aggressive restruc-
turing transformations, etc. The most important aspect of it, though, is that it is itself defined as a first class language
with well-defined semantics. To make this concrete, here is a simple example of a .ll file:
define i32 @add1(i32 %a, i32 %b) {
entry:
  %tmp1 = add i32 %a, %b
  ret i32 %tmp1
}
define i32 @add2(i32 %a, i32 %b) {
entry:
  %tmp1 = icmp eq i32 %a, 0
  br i1 %tmp1, label %done, label %recurse
recurse:
  %tmp2 = sub i32 %a, 1
  %tmp3 = add i32 %b, 1
  %tmp4 = call i32 @add2(i32 %tmp2, i32 %tmp3)
  ret i32 %tmp4
done:
  ret i32 %b
}
// Above LLVM IR corresponds to this C code, which provides two different ways to
// add integers:
unsigned add1(unsigned a, unsigned b) {
return a+b;
}
// Perhaps not the most efficient way to add two numbers.
unsigned add2(unsigned a, unsigned b) {
if (a == 0) return b;
return add2(a-1, b+1);
}
As you can see from this example, LLVM IR is a low-level RISC-like virtual instruction set. Like a real RISC
instruction set, it supports linear sequences of simple instructions like add, subtract, compare, and branch. These
instructions are in three address form, which means that they take some number of inputs and produce a result in a
different register. LLVM IR supports labels and generally looks like a weird form of assembly language.
Unlike most RISC instruction sets, LLVM is strongly typed with a simple type system (e.g., i32 is a 32-bit integer,
i32** is a pointer to pointer to 32-bit integer) and some details of the machine are abstracted away. For example, the
calling convention is abstracted through call and ret instructions and explicit arguments. Another significant difference
from machine code is that the LLVM IR doesn’t use a fixed set of named registers, it uses an infinite set of temporaries
named with a % character.
Beyond being implemented as a language, LLVM IR is actually defined in three isomorphic forms: the textual format
above, an in-memory data structure inspected and modified by optimizations themselves, and an efficient and dense
on-disk binary “bitcode” format. The LLVM Project also provides tools to convert the on-disk format from text to
binary: llvm-as assembles the textual .ll file into a .bc file containing the bitcode goop and llvm-dis turns a .bc file into
a .ll file.
The intermediate representation of a compiler is interesting because it can be a “perfect world” for the compiler
optimizer: unlike the front end and back end of the compiler, the optimizer isn’t constrained by either a specific source
language or a specific target machine. On the other hand, it has to serve both well: it has to be designed to be easy for
a front end to generate and be expressive enough to allow important optimizations to be performed for real targets.
The “mix and match” approach allows target authors to choose what makes sense for their architecture and permits a
large amount of code reuse across different targets. This brings up another challenge: each shared component needs
to be able to reason about target specific properties in a generic way. For example, a shared register allocator needs
to know the register file of each target and the constraints that exist between instructions and their register operands.
LLVM’s solution to this is for each target to provide a target description in a declarative domain-specific language (a
set of .td files) processed by the tblgen tool. The (simplified) build process for the x86 target is shown in Fig. 2.6.
The different subsystems supported by the .td files allow target authors to build up the different pieces of their target.
For example, the x86 back end defines a register class that holds all of its 32-bit registers named “GR32” (in the .td
files, target specific definitions are all caps) like this:
The language used in .td files is a target (hardware) description language that lets LLVM backend engineers
define the transformation from LLVM IR to the machine instructions of their CPUs. In the front end, compiler
development tools provide a “parser generator”; in the back end, they provide a “machine code generator”,
as the following figures show.
Because C++’s grammar is more context-sensitive than context-free, the LLVM front-end project clang uses a
handwritten parser rather than a BNF generator tool. In back-end development, the transformation from IR to machine
instructions benefits greatly from the TableGen tools. Although a C++ compiler cannot benefit from BNF generator
tools, many programming and scripting languages are closer to context-free and can.
The following comes from Wikipedia:
Java syntax has a context-free grammar that can be parsed by a simple LALR parser. Parsing C++ is more complicated 9 .
The GNU g++ compiler abandoned BNF tools since version 3.x. I think another reason, beyond C++ having a more
context-sensitive grammar, is that a handwritten parser can provide better error diagnostics than a BNF tool, since a
BNF tool always selects a rule from the grammar whenever it matches.
9 https://en.wikipedia.org/wiki/Comparison_of_Java_and_C%2B%2B
Fig. 2.7: tricore_llvm.pdf: Code generation sequence. On the path from LLVM code to assembly code, numerous
passes are run through and several data structures are used to represent the intermediate results.
LLVM is a Static Single Assignment (SSA) based representation. LLVM provides an unbounded set of virtual registers
that can hold values of primitive type (integral, floating point, or pointer values). So, every operand can be kept in
a different virtual register in the LLVM SSA representation. Comments begin with “;” in LLVM IR. The following are
LLVM SSA instructions.
We explain the code generation process below. If it feels unfamiliar, please read section 4.2 of tricore_llvm.pdf
first. You can also read “The LLVM Target-Independent Code Generator” here 12 and the “LLVM Language Reference
Manual” here 13 before going ahead, but we think section 4.2 of tricore_llvm.pdf is enough. We suggest reading
those web documents only if things are still unclear after you have read this section and the next two sections
on DAG and Instruction Selection.
1. Instruction Selection
12 http://llvm.org/docs/CodeGenerator.html
13 http://llvm.org/docs/LangRef.html
// At this stage the llvm opcode is translated into a machine opcode, but the
// operands are still llvm virtual operands.
store i16 0, i16* %a // store the i16 value 0 to the address held in virtual
// register %a.
=> st i16 0, i32* %a // Use the Cpu0 backend instruction st instead of IR store.
// The reorder version needs 2 registers only (by allocate %a and %b in the same
// register)
=> %a = add i32 1, i32 0
st %a, i32* %c, 1
%b = add i32 2, i32 0
st %b, i32* %c, 2
OPTIONS:
...
-debug-pass - Print PassManager debugging information
=None - disable debug output
=Arguments - print pass arguments to pass to 'opt'
=Structure - print pass structure before run()
=Executions - print pass name before it is executed
=Details - print pass details when it is executed
Module Verifier
Machine Function Analysis
Natural Loop Information
Branch Probability Analysis
* MIPS DAG->DAG Pattern Instruction Selection
Expand ISel Pseudo-instructions
Tail Duplication
Optimize machine instruction PHIs
MachineDominator Tree Construction
Slot index numbering
Merge disjoint stack slots
Local Stack Slot Allocation
Remove dead machine instructions
MachineDominator Tree Construction
Machine Natural Loop Construction
Machine Loop Invariant Code Motion
Machine Common Subexpression Elimination
Machine code sinking
* Peephole Optimizations
Process Implicit Definitions
Remove unreachable machine basic blocks
Live Variable Analysis
Eliminate PHI nodes for register allocation
Two-Address instruction pass
Slot index numbering
Live Interval Analysis
Debug Variable Analysis
Simple Register Coalescing
Live Stack Slot Analysis
Calculate spill weights
Virtual Register Map
Live Register Matrix
Bundle Machine CFG Edges
Spill Code Placement Analysis
* Greedy Register Allocator
Virtual Register Rewriter
Stack Slot Coloring
Machine Loop Invariant Code Motion
* Prologue/Epilogue Insertion & Frame Finalization
Control Flow Optimizer
Tail Duplication
Machine Copy Propagation Pass
* Post-RA pseudo instruction expansion pass
MachineDominator Tree Construction
Machine Natural Loop Construction
Post RA top-down list latency scheduler
Analyze Machine Code For Garbage Collection
Machine Block Frequency Analysis
Branch Probability Basic Block Placement
Mips Delay Slot Filler
Mips Long Branch
MachineDominator Tree Construction
Machine Natural Loop Construction
* Mips Assembly Printer
Delete Garbage Collector Information
SSA form requires that each variable be assigned exactly once. LLVM IR is in SSA form and has an unbounded set of
virtual registers (each variable is assigned exactly once and is kept in its own virtual register). As a result, the
optimization steps in the code generation sequence, which include Instruction Selection, Scheduling and Formation,
and Register Allocation, do not lose any optimization opportunity. For example, suppose a limited set of virtual
registers were used instead of an unlimited one, as in the following code,
The code above uses a limited set of virtual registers, so virtual register %a is used twice. The compiler has to
generate the following code, since it assigns virtual register %a as the output of two different statements.
=> %a = add i32 1, i32 0
st %a, i32* %c, 1
%a = add i32 2, i32 0
st %a, i32* %c, 2
The code above has to run in sequence. On the other hand, the SSA form below can be reordered and run in
parallel, as the following versions show 14 .
// version 1
=> %a = add i32 1, i32 0
st %a, i32* %c, 0
%b = add i32 2, i32 0
st %b, i32* %d, 0
// version 2
=> %a = add i32 1, i32 0
%b = add i32 2, i32 0
st %a, i32* %c, 0
st %b, i32* %d, 0
// version 3
=> %b = add i32 2, i32 0
st %b, i32* %d, 0
%a = add i32 1, i32 0
st %a, i32* %c, 0
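To see why the three orderings are interchangeable, the following minimal sketch simulates the two instruction kinds used in the listing. ToyMachine and the run... functions are hypothetical helpers written for this illustration only; they are not LLVM APIs. The last function reuses one register for both values, as in the limited-register case, and shows that the same reordering then changes the result.

```cpp
#include <map>
#include <string>

using Memory = std::map<std::string, int>;

// Hypothetical toy machine: named virtual registers and named memory cells.
struct ToyMachine {
  std::map<std::string, int> reg;
  Memory mem;

  // Models "%dst = add i32 lhs, i32 rhs".
  void add(const std::string &dst, int lhs, int rhs) { reg[dst] = lhs + rhs; }
  // Models "st %src, i32* %addr, 0".
  void st(const std::string &src, const std::string &addr) {
    mem[addr] = reg.at(src);
  }
};

// Version 1: each add is followed directly by its store.
Memory runVersion1() {
  ToyMachine m;
  m.add("a", 1, 0);
  m.st("a", "c");
  m.add("b", 2, 0);
  m.st("b", "d");
  return m.mem;
}

// Version 2: both adds first, then both stores.
Memory runVersion2() {
  ToyMachine m;
  m.add("a", 1, 0);
  m.add("b", 2, 0);
  m.st("a", "c");
  m.st("b", "d");
  return m.mem;
}

// Version 3: the %b pair runs before the %a pair.
Memory runVersion3() {
  ToyMachine m;
  m.add("b", 2, 0);
  m.st("b", "d");
  m.add("a", 1, 0);
  m.st("a", "c");
  return m.mem;
}

// Same reordering as version 2, but %a is reused for both values, so the
// second add overwrites the first before it has been stored.
Memory runReusedRegisterReordered() {
  ToyMachine m;
  m.add("a", 1, 0);
  m.add("a", 2, 0);
  m.st("a", "c");
  m.st("a", "d");
  return m.mem;
}
```

Calling the three version functions yields identical memory contents, while the reused-register variant stores 2 into both cells.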
For the source program above, the following are the SSA forms at the source code level and the llvm IR level, respectively.
14 Refer section 10.2.3 of book Compilers: Principles, Techniques, and Tools (2nd Edition)
b[i] = f(t);
}
In some internet video applications and multi-core (SMP) platforms, splitting g() and f() into two separate loops
gives better performance. DSA can split the loop as follows, while SSA cannot. Of course, it is possible to do extra
analysis on %temp in SSA and turn it back into %t_idx and %t_addr as in the following DSA. But compiler discussions
assume translation from a high level down to low-level machine code. Besides, as you can see, the llvm IR has already
lost the for-loop information, though it can be reconstructed by extra analysis. So this book, like almost every
compiler paper, works under this high-to-low premise; otherwise the discussion would be about reverse engineering
of assembler or compiler output.
Now the data dependences exist only on t[i], between “t[i] = g(a[i])” and “b[i] = f(t[i])”, for each i = (0..999). The
program can run in many different orders, which provides many parallel processing opportunities for multi-core
(SMP) and heterogeneous processors. For instance, g(x) can run on a GPU while f(x) runs on a CPU.
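The loop split described above can be sketched in plain C++. This is only an illustration: the bodies of g() and f() are hypothetical stand-ins (the text does not define them), and makeInput() is a helper invented for the example.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t N = 1000;
using Vec = std::array<int, N>;

// Hypothetical stand-ins for g() and f().
int g(int x) { return x * 2; }
int f(int x) { return x + 1; }

// Helper for the example: a[i] = i.
Vec makeInput() {
  Vec a{};
  for (std::size_t i = 0; i < N; ++i) a[i] = static_cast<int>(i);
  return a;
}

// One loop with a single scalar temporary: every iteration reuses t, so
// g() and f() are tied together inside the same iteration.
Vec fusedLoop(const Vec &a) {
  Vec b{};
  for (std::size_t i = 0; i < N; ++i) {
    int t = g(a[i]);
    b[i] = f(t);
  }
  return b;
}

// DSA-style version: each t[i] is written exactly once, so the work can be
// split into two loops whose only dependence is on t[i].
Vec splitLoops(const Vec &a) {
  Vec t{};
  Vec b{};
  for (std::size_t i = 0; i < N; ++i) t[i] = g(a[i]);
  for (std::size_t i = 0; i < N; ++i) b[i] = f(t[i]);
  return b;
}
```

Both functions compute the same b; in the split form the first loop could, in principle, be handed to one processor and the second to another.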
Many important techniques for local optimization begin by transforming a basic block into a DAG 15 . For example, a
basic block and its corresponding DAG are shown in Fig. 2.8.
If b is not live on exit from the block, then we can perform common subexpression elimination, as in the following table.
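As a rough sketch of how a DAG exposes common subexpressions, the code below builds DAG node numbers for a basic block. It assumes the block is the four-statement example from the cited textbook section (a = b + c; b = a - d; c = b + c; d = a - d), which may differ from the block in Fig. 2.8; Stmt, exampleBlock() and buildDag() are names invented for this illustration.

```cpp
#include <map>
#include <string>
#include <tuple>
#include <vector>

// One three-address statement: dst = lhs op rhs.
struct Stmt {
  std::string dst;
  char op;
  std::string lhs;
  std::string rhs;
};

// Assumed example block (see the lead-in).
std::vector<Stmt> exampleBlock() {
  return {{"a", '+', "b", "c"},
          {"b", '-', "a", "d"},
          {"c", '+', "b", "c"},
          {"d", '-', "a", "d"}};
}

// Returns, for each statement, the id of the DAG node that holds its value.
// Two statements with the same id compute a common subexpression.
std::vector<int> buildDag(const std::vector<Stmt> &block) {
  std::map<std::string, int> current;               // variable -> node id
  std::map<std::tuple<char, int, int>, int> nodes;  // (op, left, right) -> id
  int nextId = 0;

  // Node currently holding v; creates a leaf for v's initial value on first use.
  auto nodeOf = [&](const std::string &v) -> int {
    auto it = current.find(v);
    if (it != current.end()) return it->second;
    int id = nextId++;
    current[v] = id;
    return id;
  };

  std::vector<int> result;
  for (const Stmt &s : block) {
    int l = nodeOf(s.lhs);
    int r = nodeOf(s.rhs);
    auto key = std::make_tuple(s.op, l, r);
    auto it = nodes.find(key);
    int id;
    if (it != nodes.end()) {
      id = it->second;  // same operator and operands: reuse the node
    } else {
      id = nextId++;
      nodes[key] = id;
    }
    current[s.dst] = id;
    result.push_back(id);
  }
  return result;
}
```

For the assumed block, the second and fourth statements map to the same node, so “a - d” needs to be computed only once; the first and third statements get different nodes because b has changed in between.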
15 Refer section 8.5 of book Compilers: Principles, Techniques, and Tools (2nd Edition)
The major function of the back end is to translate IR code into machine code at the Instruction Selection stage, as shown in Fig. 2.9.
For machine instruction selection, the best solution is to represent both IR and machine instructions as DAGs. To
simplify the view, the register leaves are omitted in Fig. 2.10. The rj + rk is the IR DAG representation (in symbolic
notation, not llvm SSA form). ADD is the machine instruction.
The IR DAG and machine instruction DAG can also be represented as lists. For example, (+ ri, rj) and (- ri, 1) are lists
for the IR DAG; (ADD ri, rj) and (SUBI ri, 1) are lists for the machine instruction DAG.
Now, let’s check the ADDiu instruction defined in Cpu0InstrInfo.td as follows,
lbdex/chapters/Chapter2/Cpu0InstrFormats.td
//===----------------------------------------------------------------------===//
// Format L instruction class in Cpu0 : <|opcode|ra|rb|cx|>
//===----------------------------------------------------------------------===//
class FL<bits<8> op, dag outs, dag ins, string asmstr, list<dag> pattern,
InstrItinClass itin>: Cpu0Inst<outs, ins, asmstr, pattern, itin, FrmL>
{
bits<4> ra;
bits<4> rb;
bits<16> imm16;
lbdex/chapters/Chapter2/Cpu0InstrInfo.td
Fig. 2.11 shows how pattern matching works between the IR node add and the instruction node ADDiu, both of which are
defined in Cpu0InstrInfo.td. In this example, the IR node “add %a, 5” is translated to “addiu $r1, 5” after %a is
allocated to register $r1 in the register allocation stage, because the IR pattern [(set RC:$ra, (OpNode RC:$rb,
imm_type:$imm16))] is set in ADDiu and its second operand is a signed immediate, which matches “%a, 5”. In addition
to the pattern, the .td file also sets the assembly string “addiu” and the opcode 0x09. With this information, LLVM
TableGen generates the instruction in both assembly and binary form automatically (the binary instruction can be
emitted into an ELF object file, which is explained in a later chapter). Similarly, the machine instruction DAG nodes
LD and ST can be translated from the IR DAG nodes load and store. Notice that $rb in Fig. 2.11 is a virtual register
name (not a machine register).
Fig. 2.11: Pattern match for ADDiu instruction and IR node add
Cpu0InstrInfo.td
def immSExt16 : PatLeaf<(imm), [{ return isInt<16>(N->getSExtValue()); }]>;
class ArithLogicI<bits<8> op, string instr_asm, SDNode OpNode,
                  Operand Od, PatLeaf imm_type, RegisterClass RC> :
  FL<op, (outs RC:$ra), (ins RC:$rb, Od:$imm16),
     !strconcat(instr_asm, "\t$ra, $rb, $imm16"),
     [(set RC:$ra, (OpNode RC:$rb, imm_type:$imm16))], IIAlu> {
  let isReMaterializable = 1;
}
def ADDiu : ArithLogicI<0x09, "addiu", add, simm16, immSExt16, CPURegs>;
As mentioned for DAG instruction selection, a leaf node must be a data node. ADDiu is a format L instruction, whose
last operand must fit in a 16-bit range. So Cpu0InstrInfo.td defines the PatLeaf immSExt16 to tell llvm the
permitted range. If the imm16 value is out of this range, “isInt<16>(N->getSExtValue())” returns false
and this pattern will not select ADDiu in the instruction selection stage.
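The range test behind immSExt16 can be sketched as a stand-alone function. fitsSigned is a hypothetical name chosen for this example; it is meant to mirror what the isInt<16> call in the pattern checks, without depending on the LLVM headers.

```cpp
#include <cstdint>

// True when x fits in an N-bit two's-complement signed field, i.e.
// -2^(N-1) <= x < 2^(N-1).
template <unsigned N> constexpr bool fitsSigned(std::int64_t x) {
  static_assert(N > 0 && N < 64, "width must be between 1 and 63");
  return x >= -(INT64_C(1) << (N - 1)) && x < (INT64_C(1) << (N - 1));
}
```

With N = 16 the accepted range is -32768 to 32767, so an immediate such as 5 passes the test, while 32768 does not and would have to be materialized by some other instruction sequence.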
Some CPUs/FPUs (floating point processors) have a floating point multiply-and-add instruction, fmadd. It can be
represented by the DAG list (fadd (fmul ra, rc), rb). To implement it, we can assign the fmadd DAG pattern to the
instruction in the .td file as follows,
As with ADDiu, [(set F4RC:$FRT, (fadd (fmul F4RC:$FRA, F4RC:$FRC), F4RC:$FRB))] is the pattern, which
includes the nodes fmul and fadd.
Now, for the following basic block notation IR and llvm SSA IR code,
d = a * c
e = d + b
...
%d = fmul %a, %c
%e = fadd %d, %b
...
the Instruction Selection process will translate these two IR DAG nodes, (fmul %a, %c) and (fadd %d, %b), into one
machine instruction DAG node (fmadd %a, %c, %b), rather than into two machine instruction nodes fmul and
fadd, if FMADDS appears before FMUL and FADD in your td file.
As you can see, the IR notation is easier to read than the llvm SSA IR form, so this notation is sometimes used
in this book.
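The fmadd selection described above can be illustrated with a toy tree matcher. This is a minimal sketch, not LLVM's selector: Node, leaf(), bin() and lower() are hypothetical names, and the matcher simply tries the larger (fadd (fmul a, c), b) pattern before falling back to one instruction per node.

```cpp
#include <memory>
#include <string>
#include <vector>

// Toy expression tree: an operator with two children, or a leaf name.
struct Node {
  std::string op;
  std::vector<std::shared_ptr<Node>> kids;
};
using NodePtr = std::shared_ptr<Node>;

NodePtr leaf(const std::string &name) {
  return std::make_shared<Node>(Node{name, {}});
}
NodePtr bin(const std::string &op, NodePtr l, NodePtr r) {
  return std::make_shared<Node>(Node{op, {l, r}});
}

// Emits instructions for the tree rooted at n into out and returns the name
// that holds the result (a leaf name, or a temporary t1, t2, ...).
std::string lowerNode(const NodePtr &n, std::vector<std::string> &out) {
  if (n->kids.empty()) return n->op;
  const NodePtr &first = n->kids[0];
  if (n->op == "fadd" && first->op == "fmul" && !first->kids.empty()) {
    // Larger pattern first: (fadd (fmul a, c), b) -> fmadd a, c, b
    std::string a = lowerNode(first->kids[0], out);
    std::string c = lowerNode(first->kids[1], out);
    std::string b = lowerNode(n->kids[1], out);
    out.push_back("fmadd " + a + ", " + c + ", " + b);
  } else {
    // Fallback: one machine instruction per IR node.
    std::string l = lowerNode(n->kids[0], out);
    std::string r = lowerNode(n->kids[1], out);
    out.push_back(n->op + " " + l + ", " + r);
  }
  return "t" + std::to_string(out.size());
}

std::vector<std::string> lower(const NodePtr &root) {
  std::vector<std::string> out;
  lowerNode(root, out);
  return out;
}
```

Lowering bin("fadd", bin("fmul", leaf("a"), leaf("c")), leaf("b")) yields the single instruction “fmadd a, c, b”, whereas a lone fmul tree still produces “fmul a, c”.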
For the following basic block code,
a = b + c // in notation IR form
d = a - d
%e = fmadd %a, %c, %b // in llvm SSA IR form
We can apply Fig. 2.6 Instruction Tree Patterns to get the following machine code,
lbdex/input/ch9_caller_callee_save_registers.cpp
int caller()
{
  int t1 = 3;
  int result = add1(t1);
  result = result - t1;
  return result;
}
Running the Mips backend with the above input produces the following result.
jal _Z4add1i
nop
sw $2, 16($fp) # $2 : the return value of function add1()
lw $1, 20($fp) # load t1 from 20($fp)
subu $1, $2, $1
sw $1, 16($fp)
move $2, $1 # move result to return register $2
move $sp, $fp
lw $fp, 24($sp) # 4-byte Folded Reload
lw $ra, 28($sp) # 4-byte Folded Reload
addiu $sp, $sp, 32
jr $ra
nop
.set at
.set macro
.set reorder
.end _Z6callerv
$func_end0:
.size _Z6callerv, ($func_end0)-_Z6callerv
.cfi_endproc
As the assembly output above shows, Mips allocates the variable t1 to register $1 and does not need to spill $1,
since $1 is a caller-saved register. On the other hand, $ra is a callee-saved register, so it is spilled at the
beginning of the assembly output, since jal uses the $ra register. Cpu0 $lr plays the same role as Mips $ra, so the
backend calls setAliasRegs(MF, SavedRegs, Cpu0::LR) in determineCalleeSaves() of Cpu0SEFrameLowering.cpp when the
function calls another function.
Consider the example in the last sub-section. $ra is a “live in” register, since the return address is decided by the
caller. $2 is a “live out” register, since the return value of the function is saved in this register and the caller
can get the result by reading it directly, as the comment in the above example notes. By marking “live in” and “live
out” registers, the backend gives the llvm middle layer the information it needs to remove useless instructions in
variable accesses. Of course, llvm applies the DAG analysis mentioned in the previous sub-section to do this. Since C
supports separate compilation of different functions, the “live in” and “live out” information from the backend
provides optimization opportunities to llvm. LLVM provides the function addLiveIn() to mark a “live in” register but
provides no addLiveOut() function. For the “live out” register, the Mips backend marks it with
DAG=DAG.getCopyToReg(..., $2, ...) and returns DAG instead, since no local variables exist after the function exits.
From now on, the Cpu0 backend will be created from scratch, step by step. To make the backend structure easy to
follow, the Cpu0 example code can be generated chapter by chapter through the command described here 11 . The Cpu0
example code, lbdex, can be found near the bottom left of this web site, or at
http://jonathan2251.github.io/lbd/lbdex.tar.gz.
To create a new backend, some files in <<llvm root dir>> need to be modified. The added information
includes both the ID and name of the machine, and the relocation records. The chapter “ELF Support” introduces the
relocation records. The following files are modified to add the Cpu0 backend,
11 http://jonathan2251.github.io/lbd/doc.html#generate-cpu0-document
lbdex/src/modify/src/config-ix.cmake
...
elseif (LLVM_NATIVE_ARCH MATCHES "cpu0")
set(LLVM_NATIVE_ARCH Cpu0)
...
lbdex/src/modify/src/CMakeLists.txt
set(LLVM_ALL_TARGETS
...
Cpu0
...
)
lbdex/src/modify/src/include/llvm/ADT/Triple.h
...
#undef mips
#undef cpu0
...
class Triple {
public:
enum ArchType {
...
cpu0, // For Tutorial Backend Cpu0
cpu0el,
...
};
...
}
lbdex/src/modify/src/include/llvm/Object/ELFObjectFile.h
...
template <class ELFT>
StringRef ELFObjectFile<ELFT>::getFileFormatName() const {
switch (EF.getHeader()->e_ident[ELF::EI_CLASS]) {
case ELF::ELFCLASS32:
switch (EF.getHeader()->e_machine) {
...
case ELF::EM_CPU0: // llvm-objdump -t -r
return "ELF32-cpu0";
...
}
...
}
...
template <class ELFT>
unsigned ELFObjectFile<ELFT>::getArch() const {
bool IsLittleEndian = ELFT::TargetEndianness == support::little;
switch (EF.getHeader()->e_machine) {
...
case ELF::EM_CPU0: // llvm-objdump -t -r
switch (EF.getHeader()->e_ident[ELF::EI_CLASS]) {
case ELF::ELFCLASS32:
return IsLittleEndian ? Triple::cpu0el : Triple::cpu0;
default:
report_fatal_error("Invalid ELFCLASS!");
}
...
}
}
lbdex/src/modify/src/include/llvm/Support/ELF.h
enum {
...
EM_CPU0 = 999 // Document LLVM Backend Tutorial Cpu0
};
...
// Cpu0 Specific e_flags
enum {
EF_CPU0_NOREORDER = 0x00000001, // Don't reorder instructions
EF_CPU0_PIC = 0x00000002, // Position independent code
EF_CPU0_ARCH_32 = 0x50000000, // CPU032 instruction set per linux not elf.h
EF_CPU0_ARCH = 0xf0000000 // Mask for applying EF_CPU0_ARCH_ variant
};
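As an illustration of how such e_flags constants are typically consumed, the sketch below repeats the four values from the listing and tests a flags word against them. The helper names isCpu032() and isPIC() are hypothetical and do not appear in the Cpu0 sources.

```cpp
#include <cstdint>

// Values copied from the Cpu0 e_flags listing above.
enum : std::uint32_t {
  EF_CPU0_NOREORDER = 0x00000001,  // Don't reorder instructions
  EF_CPU0_PIC = 0x00000002,        // Position independent code
  EF_CPU0_ARCH_32 = 0x50000000,    // CPU032 instruction set
  EF_CPU0_ARCH = 0xf0000000        // Mask for the EF_CPU0_ARCH_ variant
};

// Hypothetical helper: the architecture variant lives in the top four bits,
// so it is extracted with the mask before being compared.
bool isCpu032(std::uint32_t eFlags) {
  return (eFlags & EF_CPU0_ARCH) == EF_CPU0_ARCH_32;
}

// Hypothetical helper: single-bit flags are tested with a plain AND.
bool isPIC(std::uint32_t eFlags) { return (eFlags & EF_CPU0_PIC) != 0; }
```

For example, a header whose e_flags is EF_CPU0_ARCH_32 | EF_CPU0_PIC satisfies both predicates.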
lbdex/src/modify/src/lib/MC/MCSubtargetInfo.cpp
...
}
lbdex/src/modify/src/lib/MC/SubtargetFeature.cpp
FeatureBitset
SubtargetFeatures::ApplyFeatureFlag(FeatureBitset Bits, StringRef Feature,
ArrayRef<SubtargetFeatureKV> FeatureTable) {
...
if (!Cpu0DisableUnreconginizedMessage) // For Cpu0
...
}
FeatureBitset
SubtargetFeatures::getFeatureBits(StringRef CPU,
ArrayRef<SubtargetFeatureKV> CPUTable,
ArrayRef<SubtargetFeatureKV> FeatureTable) {
...
if (!Cpu0DisableUnreconginizedMessage) // For Cpu0
...
}
lib/object/ELF.cpp
...
include/llvm/Support/ELFRelocs/Cpu0.def
#ifndef ELF_RELOC
#error "ELF_RELOC must be defined"
#endif
ELF_RELOC(R_CPU0_NONE, 0)
ELF_RELOC(R_CPU0_32, 2)
ELF_RELOC(R_CPU0_HI16, 5)
ELF_RELOC(R_CPU0_LO16, 6)
ELF_RELOC(R_CPU0_GPREL16, 7)
ELF_RELOC(R_CPU0_LITERAL, 8)
ELF_RELOC(R_CPU0_GOT16, 9)
ELF_RELOC(R_CPU0_PC16, 10)
ELF_RELOC(R_CPU0_CALL16, 11)
ELF_RELOC(R_CPU0_GPREL32, 12)
ELF_RELOC(R_CPU0_PC24, 13)
ELF_RELOC(R_CPU0_GOT_HI16, 22)
ELF_RELOC(R_CPU0_GOT_LO16, 23)
ELF_RELOC(R_CPU0_RELGOT, 36)
ELF_RELOC(R_CPU0_TLS_GD, 42)
ELF_RELOC(R_CPU0_TLS_LDM, 43)
ELF_RELOC(R_CPU0_TLS_DTP_HI16, 44)
ELF_RELOC(R_CPU0_TLS_DTP_LO16, 45)
ELF_RELOC(R_CPU0_TLS_GOTTPREL, 46)
ELF_RELOC(R_CPU0_TLS_TPREL32, 47)
ELF_RELOC(R_CPU0_TLS_TP_HI16, 49)
ELF_RELOC(R_CPU0_TLS_TP_LO16, 50)
ELF_RELOC(R_CPU0_GLOB_DAT, 51)
ELF_RELOC(R_CPU0_JUMP_SLOT, 127)
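A .def file of this shape is meant to be expanded several times with different definitions of ELF_RELOC. The sketch below shows the idea in a self-contained form. As an assumption for the example, the list is held in a local macro, CPU0_RELOCS, with only the first four entries, instead of being pulled in from Cpu0.def with #include; relocName() is likewise a hypothetical name.

```cpp
#include <string>

// Stand-in for the contents of Cpu0.def (first entries only).
#define CPU0_RELOCS(X) \
  X(R_CPU0_NONE, 0)    \
  X(R_CPU0_32, 2)      \
  X(R_CPU0_HI16, 5)    \
  X(R_CPU0_LO16, 6)

// First expansion: each entry becomes an enumerator.
#define ELF_RELOC(name, value) name = value,
enum Cpu0Reloc { CPU0_RELOCS(ELF_RELOC) };
#undef ELF_RELOC

// Second expansion: each entry becomes a case that returns its own name.
std::string relocName(unsigned type) {
  switch (type) {
#define ELF_RELOC(name, value) \
  case value:                  \
    return #name;
    CPU0_RELOCS(ELF_RELOC)
#undef ELF_RELOC
  default:
    return "Unknown";
  }
}
```

Keeping the list in one place means the enumerators and the name table cannot drift apart when a relocation type is added.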
lbdex/src/modify/src/lib/Support/Triple.cpp
...
}
...
static Triple::ArchType parseArch(StringRef ArchName) {
return StringSwitch<Triple::ArchType>(ArchName)
...
.Cases("cpu0", "cpu0eb", "cpu0allegrex", Triple::cpu0)
.Cases("cpu0el", "cpu0allegrexel", Triple::cpu0el)
...
}
...
static Triple::ObjectFormatType getDefaultFormat(const Triple &T) {
...
case Triple::cpu0:
case Triple::cpu0el:
...
}
...
static unsigned getArchPointerBitWidth(llvm::Triple::ArchType Arch) {
switch (Arch) {
...
case llvm::Triple::cpu0:
case llvm::Triple::cpu0el:
...
return 32;
}
}
...
Triple Triple::get32BitArchVariant() const {
Triple T(*this);
switch (getArch()) {
...
case Triple::cpu0:
case Triple::cpu0el:
...
// Already 32-bit.
break;
}
return T;
}
As discussed in the previous section, LLVM uses target description files (which use the .td file extension)
to describe various components of a target’s backend. For example, these .td files may describe a target’s register
set, instruction set, scheduling information for instructions, and calling conventions. When your backend is being
compiled, the tablegen tool that ships with LLVM will translate these .td files into C++ source code written to files
that have a .inc extension. Please refer to 21 for more information regarding how to use tablegen.
Every backend has its own .td to define some target information. These files have a similar syntax to C++. For Cpu0,
the target description file is called Cpu0Other.td, which is shown below:
21 http://llvm.org/docs/TableGen/index.html
lbdex/chapters/Chapter2/Cpu0Other.td
//===-- Cpu0Other.td - Describe the Cpu0 Target Machine ----*- tablegen -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
// This is the top level entry point for the Cpu0 target.
//===----------------------------------------------------------------------===//
//===----------------------------------------------------------------------===//
// Target-independent interfaces
//===----------------------------------------------------------------------===//
include "llvm/Target/Target.td"
//===----------------------------------------------------------------------===//
// Target-dependent interfaces
//===----------------------------------------------------------------------===//
include "Cpu0RegisterInfo.td"
include "Cpu0RegisterInfoGPROutForOther.td" // except AsmParser
include "Cpu0.td"
Cpu0Other.td and Cpu0.td include a few other .td files. Cpu0RegisterInfo.td (shown below) describes the Cpu0
register set. In this file, each register is given a name. For example, “def PC” indicates that there is a register
named PC. Besides register information, the file also defines register classes. You may have multiple register
classes, such as CPURegs, SR, C0Regs and GPROut. GPROut, defined in Cpu0RegisterInfoGPROutForOther.td, includes all
of CPURegs except SW, so SW will not be allocated as an output register in the register allocation stage.
lbdex/chapters/Chapter2/Cpu0RegisterInfo.td
//===----------------------------------------------------------------------===//
// Declarations that describe the CPU0 register file
//===----------------------------------------------------------------------===//
// Co-processor 0 Registers
class Cpu0C0Reg<bits<16> Enc, string n> : Cpu0Reg<Enc, n>;
//===----------------------------------------------------------------------===//
//@Registers
//===----------------------------------------------------------------------===//
// The register string, such as "9" or "gp" will show on "llvm-objdump -d"
//@ All registers definition
let Namespace = "Cpu0" in {
//@ General Purpose Registers
def ZERO : Cpu0GPRReg<0, "zero">, DwarfRegNum<[0]>;
def AT : Cpu0GPRReg<1, "1">, DwarfRegNum<[1]>;
def V0 : Cpu0GPRReg<2, "2">, DwarfRegNum<[2]>;
def V1 : Cpu0GPRReg<3, "3">, DwarfRegNum<[3]>;
def A0 : Cpu0GPRReg<4, "4">, DwarfRegNum<[4]>;
def A1 : Cpu0GPRReg<5, "5">, DwarfRegNum<[5]>;
def T9 : Cpu0GPRReg<6, "t9">, DwarfRegNum<[6]>;
def T0 : Cpu0GPRReg<7, "7">, DwarfRegNum<[7]>;
def T1 : Cpu0GPRReg<8, "8">, DwarfRegNum<[8]>;
def S0 : Cpu0GPRReg<9, "9">, DwarfRegNum<[9]>;
def S1 : Cpu0GPRReg<10, "10">, DwarfRegNum<[10]>;
def GP : Cpu0GPRReg<11, "gp">, DwarfRegNum<[11]>;
def FP : Cpu0GPRReg<12, "fp">, DwarfRegNum<[12]>;
def SP : Cpu0GPRReg<13, "sp">, DwarfRegNum<[13]>;
def LR : Cpu0GPRReg<14, "lr">, DwarfRegNum<[14]>;
def SW : Cpu0GPRReg<15, "sw">, DwarfRegNum<[15]>;
// def MAR : Register< 16, "mar">, DwarfRegNum<[16]>;
// def MDR : Register< 17, "mdr">, DwarfRegNum<[17]>;
//===----------------------------------------------------------------------===//
//@Register Classes
//===----------------------------------------------------------------------===//
lbdex/chapters/Chapter2/Cpu0RegisterInfoGPROutForOther.td
//===----------------------------------------------------------------------===//
// Register Classes
//===----------------------------------------------------------------------===//
In C++, a class typically provides a structure to lay out some data and functions, while definitions are used to
allocate memory for specific instances of a class. For example:
The class Date has the members year, month, and day, but these do not yet belong to an actual object. By defining
an instance of Date called birthday, you have allocated memory for a specific object, and can set the year, month,
and day of this instance of the class.
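The distinction described above can be sketched as follows. The listing is a reconstruction from the prose rather than the original example; makeBirthday() and the date values are hypothetical and only serve the illustration.

```cpp
// Declaration: describes the layout of a Date but allocates nothing.
class Date {
public:
  int year;
  int month;
  int day;
};

// Definition: birthday is an actual object with its own storage, whose
// members can now be set (the values here are arbitrary).
Date makeBirthday() {
  Date birthday;
  birthday.year = 1990;
  birthday.month = 1;
  birthday.day = 31;
  return birthday;
}
```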
In .td files, a class describes how data is laid out, while definitions act as the specific instances of the
class. If you look back at the Cpu0RegisterInfo.td file, you will see a class called Cpu0Reg, which is derived from
the Register class provided by LLVM. Cpu0Reg inherits all the fields that exist in the Register class. The statement
“let HWEncoding = Enc” assigns the parameter Enc to the field HWEncoding. Since Cpu0 reserves 4 bits for 16
registers in its instruction format, the assigned values range from 0 to 15. Once 0 to 15 have been assigned to
HWEncoding, the backend register number can be obtained from the functions of the llvm register class, since TableGen
sets this number automatically.
The def keyword is used to create instances of a class. In the following line, the ZERO register is defined as a
member of the Cpu0GPRReg class:
def ZERO : Cpu0GPRReg<0, "zero">, DwarfRegNum<[0]>;
The def ZERO indicates the name of this register. <0, “zero”> are the parameters used when creating this specific
instance of the Cpu0GPRReg class; thus the field Enc is set to 0, and the string n is set to “zero”.
As the register lives in the Cpu0 namespace, you can refer to the ZERO register in backend C++ code by using
Cpu0::ZERO.
Notice the use of the let expressions: these allow you to override values that are initially defined in a superclass.
For example, let Namespace = “Cpu0” in the Cpu0Reg class overrides the default namespace declared in the Register
class. Cpu0RegisterInfo.td also defines CPURegs as an instance of the class RegisterClass, which is a
built-in LLVM class. A RegisterClass is a set of Register instances, so CPURegs can be described as a set of
registers.
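For readers who think in C++, the class, def, and let relationships above can be loosely mirrored as follows. This is an analogy only: it is not the code TableGen generates, and the struct layout, constructor, and makeCPURegsSubset() are assumptions made for the illustration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Plays the role of LLVM's Register class: fields with default values.
struct Register {
  std::string Namespace = "";
  std::uint16_t HWEncoding = 0;
  std::string Name;
};

// Plays the role of Cpu0Reg<Enc, n>: the constructor arguments correspond to
// the template parameters, and the assignments to the "let" overrides.
struct Cpu0Reg : Register {
  Cpu0Reg(std::uint16_t Enc, const std::string &n) {
    HWEncoding = Enc;    // let HWEncoding = Enc
    Name = n;
    Namespace = "Cpu0";  // let Namespace = "Cpu0"
  }
};

// Each "def" becomes one concrete instance, reachable as Cpu0::ZERO etc.
namespace Cpu0 {
const Cpu0Reg ZERO(0, "zero");
const Cpu0Reg SW(15, "sw");
}  // namespace Cpu0

// A register class is a set of register instances (two are shown here).
std::vector<Cpu0Reg> makeCPURegsSubset() { return {Cpu0::ZERO, Cpu0::SW}; }
```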
The Cpu0 instruction description file is named Cpu0InstrInfo.td; its contents are as follows,
lbdex/chapters/Chapter2/Cpu0InstrInfo.td
//===- Cpu0InstrInfo.td - Target Description for Cpu0 Target -*- tablegen -*-=//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file contains the Cpu0 implementation of the TargetInstrInfo class.
//
//===----------------------------------------------------------------------===//
//===----------------------------------------------------------------------===//
// Cpu0 profiles and nodes
//===----------------------------------------------------------------------===//
// Return
def Cpu0Ret : SDNode<"Cpu0ISD::Ret", SDTNone,
[SDNPHasChain, SDNPOptInGlue, SDNPVariadic]>;
//===----------------------------------------------------------------------===//
// Instruction format superclass
//===----------------------------------------------------------------------===//
include "Cpu0InstrFormats.td"
//===----------------------------------------------------------------------===//
// Cpu0 Operand, Complex Patterns and Transformations Definitions.
//===----------------------------------------------------------------------===//
// Instruction operand types
// Signed Operand
def simm16 : Operand<i32> {
let DecoderMethod= "DecodeSimm16";
}
// Address operand
def mem : Operand<iPTR> {
let PrintMethod = "printMemOperand";
let MIOperandInfo = (ops CPURegs, simm16);
let EncoderMethod = "getMemEncoding";
}
//===----------------------------------------------------------------------===//
// Load/Store PatFrags.
def load_a : AlignedLoad<load>;
def store_a : AlignedStore<store>;
//===----------------------------------------------------------------------===//
// Instructions specific format
//===----------------------------------------------------------------------===//
class FMem<bits<8> op, dag outs, dag ins, string asmstr, list<dag> pattern,
InstrItinClass itin>: FL<op, outs, ins, asmstr, pattern, itin> {
bits<20> addr;
let Inst{19-16} = addr{19-16};
let Inst{15-0} = addr{15-0};
let DecoderMethod = "DecodeMem";
}
// Memory Load/Store
let canFoldAsLoad = 1 in
class LoadM<bits<8> op, string instr_asm, PatFrag OpNode, RegisterClass RC,
Operand MemOpnd, bit Pseudo>:
FMem<op, (outs RC:$ra), (ins MemOpnd:$addr),
!strconcat(instr_asm, "\t$ra, $addr"),
[(set RC:$ra, (OpNode addr:$addr))], IILoad> {
let isPseudo = Pseudo;
}
// 32-bit store.
multiclass StoreM32<bits<8> op, string instr_asm, PatFrag OpNode,
bit Pseudo = 0> {
def #NAME# : StoreM<op, instr_asm, OpNode, CPURegs, mem, Pseudo>;
}
//@JumpFR {
let isBranch=1, isTerminator=1, isBarrier=1, imm16=0, hasDelaySlot = 1,
isIndirectBranch = 1 in
class JumpFR<bits<8> op, string instr_asm, RegisterClass RC>:
FL<op, (outs), (ins RC:$ra),
!strconcat(instr_asm, "\t$ra"), [(brind RC:$ra)], IIBranch> {
let rb = 0;
let imm16 = 0;
}
//@JumpFR }
// Return instruction
class RetBase<RegisterClass RC>: JumpFR<0x3c, "ret", RC> {
let isReturn = 1;
let isCodeGenOnly = 1;
let hasCtrlDep = 1;
let hasExtraSrcRegAllocReq = 1;
}
//===----------------------------------------------------------------------===//
// Instruction definition
//===----------------------------------------------------------------------===//
//===----------------------------------------------------------------------===//
// Cpu0 Instructions
//===----------------------------------------------------------------------===//
/// No operation
let addr=0 in
def NOP : FJ<0, (outs), (ins), "nop", [], IIAlu>;
//===----------------------------------------------------------------------===//
// Arbitrary patterns that map to one or more instructions
//===----------------------------------------------------------------------===//
// Small immediates
def : Pat<(i32 immSExt16:$in),
(ADDiu ZERO, imm:$in)>;
lbdex/chapters/Chapter2/Cpu0InstrFormats.td
//===----------------------------------------------------------------------===//
// Describe CPU0 instructions format
//
// CPU INSTRUCTION FORMATS
//
// opcode - operation code.
// ra - dst reg, only used on 3 regs instr.
// rb - src reg.
// rc - src reg (on a 3 reg instr).
// cx - immediate
//
//===----------------------------------------------------------------------===//
// Format specifies the encoding used by the instruction. This is part of the
// ad-hoc solution used to emit machine instruction encodings by our machine
// code emitter.
class Format<bits<4> val> {
bits<4> Value = val;
}
let Size = 4;
bits<8> Opcode = 0;
//
// Attributes specific to Cpu0 instructions...
//
bits<4> FormBits = Form.Value;
//===----------------------------------------------------------------------===//
// Format A instruction class in Cpu0 : <|opcode|ra|rb|rc|cx|>
//===----------------------------------------------------------------------===//
//@class FL {
//===----------------------------------------------------------------------===//
// Format L instruction class in Cpu0 : <|opcode|ra|rb|cx|>
//===----------------------------------------------------------------------===//
class FL<bits<8> op, dag outs, dag ins, string asmstr, list<dag> pattern,
InstrItinClass itin>: Cpu0Inst<outs, ins, asmstr, pattern, itin, FrmL>
{
bits<4> ra;
bits<4> rb;
bits<16> imm16;
//===----------------------------------------------------------------------===//
// Format J instruction class in Cpu0 : <|opcode|address|>
//===----------------------------------------------------------------------===//
class FJ<bits<8> op, dag outs, dag ins, string asmstr, list<dag> pattern,
InstrItinClass itin>: Cpu0Inst<outs, ins, asmstr, pattern, itin, FrmJ>
{
bits<24> addr;
ADDiu is an instance of class ArithLogicI, which inherits from FL. It can be expanded further to get its member values as
follows,
So,
op = 0x09
instr_asm = “addiu”
OpNode = add
Od = simm16
imm_type = immSExt16
RC = CPURegs
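To make the L format concrete, the sketch below packs the fields declared in class FL (8-bit opcode, 4-bit ra, 4-bit rb, 16-bit imm16) into one 32-bit word. It assumes the fields are laid out from the most significant bit down, in the order the format comment lists them. The function name encodeFormatL is hypothetical; it is not part of the backend.

```cpp
#include <cassert>
#include <cstdint>

// Pack an L-format word: |opcode(8)|ra(4)|rb(4)|imm16(16)|.
// Assumption: fields occupy the word from the most significant bit down,
// in the order the format comment lists them.
uint32_t encodeFormatL(uint8_t Opcode, unsigned Ra, unsigned Rb,
                       int16_t Imm16) {
  assert(Ra < 16 && Rb < 16 && "register fields are 4 bits wide");
  return (uint32_t(Opcode) << 24) | (uint32_t(Ra) << 20) |
         (uint32_t(Rb) << 16) | uint32_t(uint16_t(Imm16));
}
```

Under that assumption, op = 0x09 (addiu) with ra = 2, rb = 0 and imm16 = 5 packs to 0x09200005.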
lbdex/chapters/Chapter2/Cpu0Schedule.td
//===----------------------------------------------------------------------===//
// Functional units across Cpu0 chips sets. Based on GCC/Cpu0 backend files.
//===----------------------------------------------------------------------===//
def ALU : FuncUnit;
def IMULDIV : FuncUnit;
//===----------------------------------------------------------------------===//
// Instruction Itinerary classes used for Cpu0
//===----------------------------------------------------------------------===//
def IIAlu : InstrItinClass;
def II_CLO : InstrItinClass;
def II_CLZ : InstrItinClass;
def IILoad : InstrItinClass;
def IIStore : InstrItinClass;
def IIBranch : InstrItinClass;
//===----------------------------------------------------------------------===//
// Cpu0 Generic instruction itineraries.
//===----------------------------------------------------------------------===//
//@ http://llvm.org/docs/doxygen/html/structllvm_1_1InstrStage.html
def Cpu0GenericItineraries : ProcessorItineraries<[ALU, IMULDIV], [], [
//@2
InstrItinData<IIAlu , [InstrStage<1, [ALU]>]>,
InstrItinData<II_CLO , [InstrStage<1, [ALU]>]>,
InstrItinData<II_CLZ , [InstrStage<1, [ALU]>]>,
InstrItinData<IILoad , [InstrStage<3, [ALU]>]>,
InstrItinData<IIStore , [InstrStage<1, [ALU]>]>,
InstrItinData<IIBranch , [InstrStage<1, [ALU]>]>
]>;
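The itinerary table above assigns each instruction class a number of cycles on the ALU functional unit: three for loads and one for everything else. A minimal sketch of the same table as a lookup function, with the hypothetical names Itinerary and aluCycles chosen for illustration:

```cpp
#include <cassert>

// Itinerary classes declared in Cpu0Schedule.td.
enum class Itinerary { IIAlu, II_CLO, II_CLZ, IILoad, IIStore, IIBranch };

// Cycles each class occupies the ALU, copied from the InstrStage entries.
unsigned aluCycles(Itinerary Class) {
  switch (Class) {
  case Itinerary::IILoad:
    return 3; // InstrStage<3, [ALU]>
  default:
    return 1; // InstrStage<1, [ALU]>
  }
}
```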
The Target/Cpu0 directory has two files, CMakeLists.txt and LLVMBuild.txt, with contents as follows,
lbdex/chapters/Chapter2/CMakeLists.txt
set(LLVM_TARGET_DEFINITIONS Cpu0Other.td)
add_subdirectory(TargetInfo)
add_subdirectory(MCTargetDesc)
lbdex/chapters/Chapter2/LLVMBuild.txt
[common]
subdirectories =
MCTargetDesc TargetInfo
[component_0]
# TargetGroup components are an extension of LibraryGroups, specifically for
# defining LLVM targets (which are handled specially in a few places).
type = TargetGroup
# The name of the component should always be the name of the target. (should
# match "def Cpu0 : Target" in Cpu0.td)
name = Cpu0
# Cpu0 component is located in directory Target/
parent = Target
# Whether this target defines an assembly parser, assembly printer, disassembler
# , and supports JIT compilation. They are optional.
[component_1]
# component_1 is of type Library and is named Cpu0CodeGen. After the build it
# will be in lib/libLLVMCpu0CodeGen.a of your build directory.
type = Library
name = Cpu0CodeGen
# Cpu0CodeGen component(Library) is located in directory Cpu0/
parent = Cpu0
# If given, a list of the names of Library or LibraryGroup components which
# must also be linked in whenever this library is used. That is, the link time
# dependencies for this component. When tools are built, the build system will
# include the transitive closure of all required_libraries for the components
# the tool needs.
required_libraries =
CodeGen Core MC
Cpu0Desc
Cpu0Info
SelectionDAG
Support
Target
# end of required_libraries
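The layout of LLVMBuild.txt ([section] headers, key = value pairs, # comments, and values continued on indented lines) is simple enough to read with a few lines of code. The following is a minimal sketch of such a reader, written only to illustrate the file format. It is not the tooling LLVM uses, and the name parseBuildFile is hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

using Section = std::map<std::string, std::string>;

// Strip leading and trailing whitespace.
static std::string trim(const std::string &S) {
  std::size_t B = 0, E = S.size();
  while (B < E && std::isspace(static_cast<unsigned char>(S[B]))) ++B;
  while (E > B && std::isspace(static_cast<unsigned char>(S[E - 1]))) --E;
  return S.substr(B, E - B);
}

// Parse INI-like text into section -> (key -> value).
// Indented lines continue the value of the previous key.
std::map<std::string, Section> parseBuildFile(const std::string &Text) {
  std::map<std::string, Section> Result;
  std::string CurSection, CurKey, Line;
  std::istringstream In(Text);
  while (std::getline(In, Line)) {
    std::string T = trim(Line);
    if (T.empty() || T[0] == '#')
      continue; // blank line or comment
    if (T.front() == '[' && T.back() == ']') { // [section] header
      CurSection = T.substr(1, T.size() - 2);
      CurKey.clear();
      continue;
    }
    bool Indented = std::isspace(static_cast<unsigned char>(Line[0])) != 0;
    std::size_t Eq = T.find('=');
    if (Indented && !CurKey.empty()) { // continuation line
      std::string &V = Result[CurSection][CurKey];
      V += (V.empty() ? "" : " ") + T;
    } else if (Eq != std::string::npos) { // key = value
      CurKey = trim(T.substr(0, Eq));
      Result[CurSection][CurKey] = trim(T.substr(Eq + 1));
    }
  }
  return Result;
}
```

Given the component_1 section above with its continuation lines indented, the required_libraries key would collect all the listed library names into one space-separated value.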
CMakeLists.txt holds the build information for cmake. File LLVMBuild.txt is written in a simple variant of the INI
(configuration) file format. Comments are prefixed by # in both files. We explain the settings of these two files
in their comments; please read them. The “tablegen(” in the above CMakeLists.txt is defined in
cmake/modules/TableGen.cmake as below,
src/cmake/modules/TableGen.cmake
src/utils/TableGen/CMakeLists.txt
add_tablegen(llvm-tblgen LLVM
...
)
src/cmake/modules/TableGen.cmake
...
endfunction()
Since the executable llvm-tblgen is built before any llvm backend source code is compiled during the llvm build,
llvm-tblgen is always ready for the backend’s TableGen requests.
This book breaks the whole backend source code down by function and adds code chapter by chapter. Don’t try to
understand everything from the text of the book alone; the code added in each chapter is reading material too. To
understand computer-related knowledge at the concept level you can ignore source code, but you cannot do so when
implementing on top of existing open source software. In programming, documentation cannot totally replace the source
code. Reading source code is a big opportunity in open source development.
Both CMakeLists.txt and LLVMBuild.txt also exist in the sub-directories MCTargetDesc and TargetInfo. The contents of
CMakeLists.txt and LLVMBuild.txt in these two directories instruct llvm to generate the Cpu0Desc and Cpu0Info libraries,
respectively. After building, you will find three libraries, libLLVMCpu0CodeGen.a, libLLVMCpu0Desc.a and
libLLVMCpu0Info.a, in lib/ of your build directory. For more details please see “Building LLVM with CMake” 16
and “LLVMBuild Guide” 17 .
16 http://llvm.org/docs/CMake.html
17 http://llvm.org/docs/LLVMBuild.html
You must also register your target with the TargetRegistry. After registration, llvm tools are able to look up and use
your target at runtime. The TargetRegistry can be used directly, but for most targets there are helper templates which
should take care of the work for you.
All targets should declare a global Target object which is used to represent the target during registration. Then, in the
target’s TargetInfo library, the target should define that object and use the RegisterTarget template to register the target.
For example, the file TargetInfo/Cpu0TargetInfo.cpp registers TheCpu0Target for big endian and TheCpu0elTarget for
little endian, as follows.
lbdex/chapters/Chapter2/Cpu0.h
//===-- Cpu0.h - Top-level interface for Cpu0 representation ----*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file contains the entry points for global functions defined in
// the LLVM Cpu0 back-end.
//
//===----------------------------------------------------------------------===//
#ifndef LLVM_LIB_TARGET_CPU0_CPU0_H
#define LLVM_LIB_TARGET_CPU0_CPU0_H
#include "Cpu0Config.h"
#include "MCTargetDesc/Cpu0MCTargetDesc.h"
#include "llvm/Target/TargetMachine.h"
namespace llvm {
class Cpu0TargetMachine;
class FunctionPass;
#endif
lbdex/chapters/Chapter2/TargetInfo/Cpu0TargetInfo.cpp
#include "Cpu0.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/TargetRegistry.h"
using namespace llvm;
RegisterTarget<Triple::cpu0el,
/*HasJIT=*/true> Y(TheCpu0elTarget, "cpu0el", "Cpu0el");
}
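The registration idiom described above (a global Target object plus a helper whose constructor enters it into a registry) can be illustrated without LLVM. The sketch below uses simplified stand-ins: the Target struct, registry(), lookupTarget() and initializeCpu0TargetInfo() are hypothetical names for this example and are not LLVM's actual classes or signatures.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for llvm::Target: just a name and a description.
struct Target {
  std::string Name, Desc;
};

// Stand-in for the registry: a list of registered targets.
std::vector<Target *> &registry() {
  static std::vector<Target *> Targets;
  return Targets;
}

// Mirrors the RegisterTarget idea: constructing the helper fills in the
// target object and adds it to the registry.
struct RegisterTarget {
  RegisterTarget(Target &T, const char *Name, const char *Desc) {
    T.Name = Name;
    T.Desc = Desc;
    registry().push_back(&T);
  }
};

// Lookup by name, as a tool would do at run time.
const Target *lookupTarget(const std::string &Name) {
  for (const Target *T : registry())
    if (T->Name == Name)
      return T;
  return nullptr;
}

// One global object per target variant.
Target TheCpu0Target, TheCpu0elTarget;

// Hypothetical initializer; static locals make registration happen once.
void initializeCpu0TargetInfo() {
  static RegisterTarget X(TheCpu0Target, "cpu0", "Cpu0");
  static RegisterTarget Y(TheCpu0elTarget, "cpu0el", "Cpu0el");
}
```

After the initializer runs, a lookup of "cpu0" or "cpu0el" succeeds, which is the same effect that lets llc list both names among its registered targets.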
lbdex/chapters/Chapter2/TargetInfo/CMakeLists.txt
add_llvm_library(LLVMCpu0Info
Cpu0TargetInfo.cpp
)
lbdex/chapters/Chapter2/TargetInfo/LLVMBuild.txt
[component_0]
type = Library
name = Cpu0Info
parent = Cpu0
required_libraries = Support
add_to_library_groups = Cpu0
Files Cpu0TargetMachine.cpp and MCTargetDesc/Cpu0MCTargetDesc.cpp just define empty initialize functions,
since we register nothing at this point.
lbdex/chapters/Chapter2/Cpu0TargetMachine.cpp
#include "Cpu0TargetMachine.h"
#include "Cpu0.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/CodeGen/Passes.h"
#include "llvm/CodeGen/TargetPassConfig.h"
#include "llvm/Support/TargetRegistry.h"
using namespace llvm;
lbdex/chapters/Chapter2/MCTargetDesc/Cpu0MCTargetDesc.h
#ifndef LLVM_LIB_TARGET_CPU0_MCTARGETDESC_CPU0MCTARGETDESC_H
#define LLVM_LIB_TARGET_CPU0_MCTARGETDESC_CPU0MCTARGETDESC_H
#include "Cpu0Config.h"
#include "llvm/Support/DataTypes.h"
namespace llvm {
class Target;
class Triple;
// Defines symbolic names for Cpu0 registers. This defines a mapping from
// register name to register number.
#define GET_REGINFO_ENUM
#include "Cpu0GenRegisterInfo.inc"
#define GET_SUBTARGETINFO_ENUM
#include "Cpu0GenSubtargetInfo.inc"
#endif
lbdex/chapters/Chapter2/MCTargetDesc/Cpu0MCTargetDesc.cpp
#include "Cpu0MCTargetDesc.h"
#include "llvm/MC/MachineLocation.h"
#include "llvm/MC/MCELFStreamer.h"
#include "llvm/MC/MCInstrAnalysis.h"
#include "llvm/MC/MCInstPrinter.h"
#include "llvm/MC/MCInstrInfo.h"
#include "llvm/MC/MCRegisterInfo.h"
#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/MC/MCSymbol.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/FormattedStream.h"
#include "llvm/Support/TargetRegistry.h"
#define GET_INSTRINFO_MC_DESC
#include "Cpu0GenInstrInfo.inc"
#define GET_SUBTARGETINFO_MC_DESC
#include "Cpu0GenSubtargetInfo.inc"
#define GET_REGINFO_MC_DESC
#include "Cpu0GenRegisterInfo.inc"
//@2 {
extern "C" void LLVMInitializeCpu0TargetMC() {
}
//@2 }
lbdex/chapters/Chapter2/MCTargetDesc/CMakeLists.txt
# MCTargetDesc/CMakeLists.txt
add_llvm_library(LLVMCpu0Desc
Cpu0MCTargetDesc.cpp
)
lbdex/chapters/Chapter2/MCTargetDesc/LLVMBuild.txt
[component_0]
type = Library
name = Cpu0Desc
parent = Cpu0
required_libraries = MC
Cpu0Info
Support
add_to_library_groups = Cpu0
18 http://llvm.org/docs/WritingAnLLVMBackend.html#target-registration
19 http://clang.llvm.org/get_started.html
~/llvm/test/src/lib/Target/Cpu0/Cpu0SetChapter.h
#define CH CH2
Now, run the cmake command and Xcode to build the td files (the following cmake command is for my setting),
118-165-78-230:cmake_debug_build Jonathan$ cmake -DCMAKE_CXX_COMPILER=clang++
-DCMAKE_C_COMPILER=clang -DCMAKE_BUILD_TYPE=Debug -G "Xcode" ../src/
-- Targeting Cpu0
...
-- Targeting XCore
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/Jonathan/llvm/test/cmake_debug_build
118-165-78-230:cmake_debug_build Jonathan$
After the build, you can run the command llc --version to find the cpu0 backend,
118-165-78-230:cmake_debug_build Jonathan$ /Users/Jonathan/llvm/test/
cmake_debug_build/Debug/bin/llc --version
LLVM (http://llvm.org/):
...
Registered Targets:
arm - ARM
...
cpp - C++ backend
cpu0 - Cpu0
cpu0el - Cpu0el
...
The command llc --version displays the Registered Targets “cpu0” and “cpu0el” because of the code in file
TargetInfo/Cpu0TargetInfo.cpp that we made in the last sub-section, “Target Registration” 20 .
Let’s build lbdex/chapters/Chapter2 code as follows,
118-165-75-57:test Jonathan$ pwd
/Users/Jonathan/test
118-165-75-57:test Jonathan$ cp -rf lbdex/Cpu0 ~/llvm/test/src/lib/Target/.
To save time, we build only the Cpu0 target with the option -DLLVM_TARGETS_TO_BUILD=Cpu0. After
cmake, please open Xcode and build the Xcode project file as described in appendix A, or refer to appendix A to
build it on linux if you work on a unix/linux platform. After that, you can find the *.inc files in directory
/Users/Jonathan/llvm/test/cmake_debug_build/lib/Target/Cpu0 as follows,
cmake_debug_build/lib/Target/Cpu0/Cpu0GenRegisterInfo.inc
namespace Cpu0 {
enum {
NoRegister,
AT = 1,
EPC = 2,
FP = 3,
GP = 4,
HI = 5,
LO = 6,
LR = 7,
PC = 8,
SP = 9,
SW = 10,
ZERO = 11,
A0 = 12,
A1 = 13,
S0 = 14,
S1 = 15,
T0 = 16,
T1 = 17,
T9 = 18,
V0 = 19,
V1 = 20,
NUM_TARGET_REGS // 21
};
}
...
lbdex/input/ch3.cpp
int main()
{
return 0;
}
First, compile it with clang to get the output ch3.bc as follows,
As above, we compile C to .bc with clang -target mips-unknown-linux-gnu because Cpu0 borrows the ABI
from Mips. Next, convert the bitcode .bc to human-readable text format as follows,
// ch3.ll
; ModuleID = 'ch3.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "mips-unknown-linux-gnu"
Now, compiling ch3.bc gives the following error message,
At this point, we have finished the Target Registration for the Cpu0 backend. The backend compiler command llc can now
recognize the Cpu0 backend. So far we have only defined the target td files (Cpu0.td, Cpu0Other.td, Cpu0RegisterInfo.td, ...).
According to the LLVM structure, we need to define our target machine and include those td related files. The error message
says we have not defined our target machine. This book develops the backend step by step. You can review the hundreds
of lines of Chapter2 example code to see how the Target Registration is done.
CHAPTER
THREE
BACKEND STRUCTURE
• TargetMachine structure
• Add AsmPrinter
• Add Cpu0DAGToDAGISel class
• Handle return register $lr
• Add Prologue/Epilogue functions
– Concept
– Prologue and Epilogue functions
– Handle stack slot for local variables
– Large stack
• Data operands DAGs
• Summary of this Chapter
This chapter first introduces the backend class inheritance tree and class members. Next, following the backend
structure, each section adds the implementation of individual classes. At the end of this chapter, we will have a backend
that compiles llvm intermediate code into Cpu0 assembly code.
Many lines of code are added in this chapter. Almost all of them are common to every backend except for the backend name
(Cpu0 or Mips ...). Actually, we copy almost all the code from Mips and replace the name with Cpu0. In addition to
learning DAG pattern matching in compiler theory and in the real llvm code generation phase, please focus on the
relationships among the classes in this backend structure. Once you know the structure, you can create your backend
structure as quickly as we did, even though around 5000 lines of code are added in this chapter.
lbdex/chapters/Chapter3_1/Cpu0TargetObjectFile.h
//
//===----------------------------------------------------------------------===//
#ifndef LLVM_LIB_TARGET_CPU0_CPU0TARGETOBJECTFILE_H
#define LLVM_LIB_TARGET_CPU0_CPU0TARGETOBJECTFILE_H
#include "Cpu0Config.h"
#include "Cpu0TargetMachine.h"
#include "llvm/CodeGen/TargetLoweringObjectFileImpl.h"
namespace llvm {
class Cpu0TargetMachine;
class Cpu0TargetObjectFile : public TargetLoweringObjectFileELF {
MCSection *SmallDataSection;
MCSection *SmallBSSSection;
const Cpu0TargetMachine *TM;
public:
};
} // end namespace llvm
#endif
lbdex/chapters/Chapter3_1/Cpu0TargetObjectFile.cpp
#include "Cpu0TargetObjectFile.h"
#include "Cpu0Subtarget.h"
#include "Cpu0TargetMachine.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/GlobalVariable.h"
#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCSectionELF.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/ELF.h"
#include "llvm/Target/TargetMachine.h"
using namespace llvm;
static cl::opt<unsigned>
SSThreshold("cpu0-ssection-threshold", cl::Hidden,
cl::desc("Small data and bss section threshold size (default=8)"),
cl::init(8));
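// The option above sets a size threshold (default 8) for the small data and
// bss sections. A minimal sketch of how such a threshold is typically applied
// follows. The rule (non-zero size, no larger than the threshold) is an
// assumption modelled on the Mips backend, and the names
// DefaultSmallSectionThreshold and fitsSmallSection are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Default taken from the cl::init(8) of cpu0-ssection-threshold.
constexpr uint64_t DefaultSmallSectionThreshold = 8;

// Assumed rule: an object qualifies for .sdata/.sbss when its size is
// non-zero and no larger than the threshold.
constexpr bool
fitsSmallSection(uint64_t SizeInBytes,
                 uint64_t Threshold = DefaultSmallSectionThreshold) {
  return SizeInBytes > 0 && SizeInBytes <= Threshold;
}
```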
SmallDataSection = getContext().getELFSection(
".sdata", ELF::SHT_PROGBITS, ELF::SHF_WRITE | ELF::SHF_ALLOC);
lbdex/chapters/Chapter3_1/Cpu0TargetMachine.h
#ifndef LLVM_LIB_TARGET_CPU0_CPU0TARGETMACHINE_H
#define LLVM_LIB_TARGET_CPU0_CPU0TARGETMACHINE_H
#include "Cpu0Config.h"
#include "MCTargetDesc/Cpu0ABIInfo.h"
#include "Cpu0Subtarget.h"
#include "llvm/CodeGen/Passes.h"
#include "llvm/CodeGen/SelectionDAGISel.h"
#include "llvm/Target/TargetFrameLowering.h"
#include "llvm/Target/TargetMachine.h"
namespace llvm {
class formatted_raw_ostream;
class Cpu0RegisterInfo;
#endif
lbdex/chapters/Chapter3_1/Cpu0TargetMachine.cpp
//
// Implements the info about Cpu0 target spec.
//
//===----------------------------------------------------------------------===//
#include "Cpu0TargetMachine.h"
#include "Cpu0.h"
#include "Cpu0Subtarget.h"
#include "Cpu0TargetObjectFile.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/CodeGen/Passes.h"
#include "llvm/CodeGen/TargetPassConfig.h"
#include "llvm/Support/TargetRegistry.h"
using namespace llvm;
Ret += "-m:m";
// 8 and 16 bit integers only need to have natural alignment, but try to
// align them to 32 bits. 64 bit integers have natural alignment.
Ret += "-i8:8:32-i16:16:32-i64:64";
// 32 bit registers are always available and the stack is at least 64 bit
// aligned.
Ret += "-n32-S64";
return Ret;
}
Cpu0TargetMachine::~Cpu0TargetMachine() {}
void Cpu0ebTargetMachine::anchor() { }
void Cpu0elTargetMachine::anchor() { }
const Cpu0Subtarget *
Cpu0TargetMachine::getSubtargetImpl(const Function &F) const {
Attribute CPUAttr = F.getFnAttribute("target-cpu");
Attribute FSAttr = F.getFnAttribute("target-features");
: TargetFS;
namespace {
//@Cpu0PassConfig {
/// Cpu0 Code Generator Pass Configuration Options.
class Cpu0PassConfig : public TargetPassConfig {
public:
Cpu0PassConfig(Cpu0TargetMachine *TM, PassManagerBase &PM)
: TargetPassConfig(TM, PM) {}
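The data layout string in Cpu0TargetMachine.cpp is assembled piece by piece. The sketch below rebuilds only the pieces visible in the listing. It assumes the string begins with the usual endianness marker ("e" for little endian, "E" for big endian), and it does not try to reproduce any component the listing leaves out. The name sketchDataLayout is hypothetical.

```cpp
#include <cassert>
#include <string>

// Builds the part of the data layout string visible in the listing.
// Assumption: the string starts with "e" (little endian) or "E" (big
// endian); components elided from the listing are not reproduced here.
std::string sketchDataLayout(bool IsLittleEndian) {
  std::string Ret = IsLittleEndian ? "e" : "E";
  Ret += "-m:m";                      // Mips-style symbol mangling
  Ret += "-i8:8:32-i16:16:32-i64:64"; // integer alignments
  Ret += "-n32-S64";                  // native 32-bit ints, 64-bit stack
  return Ret;
}
```

Each component is separated by a hyphen, so further components slot in at the matching Ret += line.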
include/llvm/Target/TargetInstrInfo.h
lbdex/chapters/Chapter3_1/Cpu0.td
include "Cpu0CallingConv.td"
// Without this will have error: 'cpu032I' is not a recognized processor for
// this target (ignoring processor)
//===----------------------------------------------------------------------===//
// Cpu0 Subtarget features //
//===----------------------------------------------------------------------===//
FeatureChapter8_2, FeatureChapter9_1,
FeatureChapter9_2, FeatureChapter9_3,
FeatureChapter10_1,
FeatureChapter11_1, FeatureChapter11_2,
FeatureChapter12_1]>;
//===----------------------------------------------------------------------===//
// Cpu0 processors supported.
//===----------------------------------------------------------------------===//
lbdex/chapters/Chapter3_1/Cpu0CallingConv.td
lbdex/chapters/Chapter3_1/Cpu0FrameLowering.h
//===-- Cpu0FrameLowering.h - Define frame lowering for Cpu0 ----*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
//
//
//===----------------------------------------------------------------------===//
#ifndef LLVM_LIB_TARGET_CPU0_CPU0FRAMELOWERING_H
#define LLVM_LIB_TARGET_CPU0_CPU0FRAMELOWERING_H
#include "Cpu0Config.h"
#include "Cpu0.h"
#include "llvm/Target/TargetFrameLowering.h"
namespace llvm {
class Cpu0Subtarget;
public:
explicit Cpu0FrameLowering(const Cpu0Subtarget &sti, unsigned Alignment)
: TargetFrameLowering(StackGrowsDown, Alignment, 0, Alignment),
STI(sti) {
}
};
#endif
lbdex/chapters/Chapter3_1/Cpu0FrameLowering.cpp
#include "Cpu0FrameLowering.h"
#include "Cpu0InstrInfo.h"
#include "Cpu0MachineFunction.h"
#include "Cpu0Subtarget.h"
#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineModuleInfo.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Target/TargetOptions.h"
//===----------------------------------------------------------------------===//
//
// Stack Frame Processing methods
// +----------------------------+
//
// The stack is allocated decrementing the stack pointer on
// the first instruction of a function prologue. Once decremented,
// all stack references are done through a positive offset
// from the stack/frame pointer, so the stack is considered
// to grow up! Otherwise terrible hacks would have to be made
// to get this stack ABI compliant :)
//
// The stack frame required by the ABI (after call):
// Offset
//
// 0 ----------
// 4 Args to pass
// . saved $GP (used in PIC)
// . Alloca allocations
// . Local Area
// . CPU "Callee Saved" Registers
// . saved FP
// . saved RA
// . FPU "Callee Saved" Registers
// StackSize -----------
//
// Offset - offset from sp after stack allocation on function prologue
//
// The sp is the stack pointer subtracted/added from the stack size
// at the Prologue/Epilogue
//
// hasFP - Return true if the specified function should have a dedicated frame
// pointer register. This is true if the function has variable sized allocas,
// if it needs dynamic stack realignment, if frame pointer elimination is
// disabled, or if the frame address is taken.
bool Cpu0FrameLowering::hasFP(const MachineFunction &MF) const {
const MachineFrameInfo *MFI = MF.getFrameInfo();
const TargetRegisterInfo *TRI = STI.getRegisterInfo();
return MF.getTarget().Options.DisableFramePointerElim(MF) ||
MFI->hasVarSizedObjects() || MFI->isFrameAddressTaken() ||
TRI->needsStackRealignment(MF);
}
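The two ideas in Cpu0FrameLowering.cpp, the hasFP condition and the stack pointer adjustment described in the comment block, can be restated without LLVM types. This is a minimal sketch under that simplification. FrameFacts, needsFramePointer, spAfterPrologue and spAfterEpilogue are hypothetical names, and real prologue code also has to save registers and respect alignment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of the per-function facts that hasFP consults.
struct FrameFacts {
  bool FramePointerElimDisabled;
  bool HasVarSizedObjects;
  bool FrameAddressTaken;
  bool NeedsStackRealignment;
};

// Same disjunction as Cpu0FrameLowering::hasFP.
bool needsFramePointer(const FrameFacts &F) {
  return F.FramePointerElimDisabled || F.HasVarSizedObjects ||
         F.FrameAddressTaken || F.NeedsStackRealignment;
}

// The prologue lowers sp by the frame size; afterwards every slot is
// addressed as sp plus a non-negative offset.
uint32_t spAfterPrologue(uint32_t Sp, uint32_t StackSize) {
  return Sp - StackSize;
}

// The epilogue adds the frame size back, restoring the caller's sp.
uint32_t spAfterEpilogue(uint32_t Sp, uint32_t StackSize) {
  return Sp + StackSize;
}
```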
lbdex/chapters/Chapter3_1/Cpu0SEFrameLowering.h
#ifndef LLVM_LIB_TARGET_CPU0_CPU0SEFRAMELOWERING_H
#define LLVM_LIB_TARGET_CPU0_CPU0SEFRAMELOWERING_H
#include "Cpu0Config.h"
#include "Cpu0FrameLowering.h"
namespace llvm {
/// emitProlog/emitEpilog - These methods insert prolog and epilog code into
/// the function.
void emitPrologue(MachineFunction &MF, MachineBasicBlock &MBB) const override;
void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const override;
};
#endif
lbdex/chapters/Chapter3_1/Cpu0SEFrameLowering.cpp
#include "Cpu0SEFrameLowering.h"
#include "Cpu0MachineFunction.h"
#include "Cpu0SEInstrInfo.h"
#include "Cpu0Subtarget.h"
#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineModuleInfo.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/RegisterScavenging.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Target/TargetOptions.h"
//@emitPrologue {
void Cpu0SEFrameLowering::emitPrologue(MachineFunction &MF,
MachineBasicBlock &MBB) const {
}
//}
//@emitEpilogue {
void Cpu0SEFrameLowering::emitEpilogue(MachineFunction &MF,
MachineBasicBlock &MBB) const {
}
//}
const Cpu0FrameLowering *
llvm::createCpu0SEFrameLowering(const Cpu0Subtarget &ST) {
return new Cpu0SEFrameLowering(ST);
}
lbdex/chapters/Chapter3_1/Cpu0InstrInfo.h
#ifndef LLVM_LIB_TARGET_CPU0_CPU0INSTRINFO_H
#define LLVM_LIB_TARGET_CPU0_CPU0INSTRINFO_H
#include "Cpu0Config.h"
#include "Cpu0.h"
#include "Cpu0RegisterInfo.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/Target/TargetInstrInfo.h"
#define GET_INSTRINFO_HEADER
#include "Cpu0GenInstrInfo.inc"
namespace llvm {
/// Return the number of bytes of code the specified instruction may be.
unsigned GetInstSizeInBytes(const MachineInstr &MI) const;
protected:
};
const Cpu0InstrInfo *createCpu0SEInstrInfo(const Cpu0Subtarget &STI);
}
#endif
lbdex/chapters/Chapter3_1/Cpu0InstrInfo.cpp
#include "Cpu0InstrInfo.h"
#include "Cpu0TargetMachine.h"
#include "Cpu0MachineFunction.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/TargetRegistry.h"
#define GET_INSTRINFO_CTOR_DTOR
#include "Cpu0GenInstrInfo.inc"
//@Cpu0InstrInfo {
Cpu0InstrInfo::Cpu0InstrInfo(const Cpu0Subtarget &STI)
:
Subtarget(STI) {}
//@GetInstSizeInBytes {
/// Return the number of bytes of code the specified instruction may be.
unsigned Cpu0InstrInfo::GetInstSizeInBytes(const MachineInstr &MI) const {
//@GetInstSizeInBytes - body
switch (MI.getOpcode()) {
default:
return MI.getDesc().getSize();
}
}
lbdex/chapters/Chapter3_1/Cpu0InstrInfo.td
//===----------------------------------------------------------------------===//
// Cpu0 Instruction Predicate Definitions.
//===----------------------------------------------------------------------===//
AssemblerPredicate<"FeatureChapter11_1">;
def Ch11_2 : Predicate<"Subtarget->hasChapter11_2()">,
AssemblerPredicate<"FeatureChapter11_2">;
def Ch12_1 : Predicate<"Subtarget->hasChapter12_1()">,
AssemblerPredicate<"FeatureChapter12_1">;
def Ch_all : Predicate<"Subtarget->hasChapterAll()">,
AssemblerPredicate<"FeatureChapterAll">;
lbdex/chapters/Chapter3_1/Cpu0ISelLowering.h
#ifndef LLVM_LIB_TARGET_CPU0_CPU0ISELLOWERING_H
#define LLVM_LIB_TARGET_CPU0_CPU0ISELLOWERING_H
#include "Cpu0Config.h"
#include "MCTargetDesc/Cpu0ABIInfo.h"
#include "Cpu0.h"
#include "llvm/CodeGen/CallingConvLower.h"
#include "llvm/CodeGen/SelectionDAG.h"
#include "llvm/IR/Function.h"
#include "llvm/Target/TargetLowering.h"
#include <deque>
namespace llvm {
namespace Cpu0ISD {
enum NodeType {
// Start the numbering from where ISD NodeType finishes.
FIRST_NUMBER = ISD::BUILTIN_OP_END,
// Tail call
TailCall,
// Thread Pointer
ThreadPointer,
// Return
Ret,
EH_RETURN,
// DivRem(u)
DivRem,
DivRemU,
Wrapper,
DynAlloc,
Sync
};
}
//===--------------------------------------------------------------------===//
// TargetLowering Implementation
//===--------------------------------------------------------------------===//
class Cpu0FunctionInfo;
class Cpu0Subtarget;
//@class Cpu0TargetLowering
class Cpu0TargetLowering : public TargetLowering {
public:
explicit Cpu0TargetLowering(const Cpu0TargetMachine &TM,
const Cpu0Subtarget &STI);
protected:
protected:
// Subtarget Info
const Cpu0Subtarget &Subtarget;
// Cache the ABI from the TargetMachine, we use it everywhere.
const Cpu0ABIInfo &ABI;
private:
};
const Cpu0TargetLowering *
createCpu0SETargetLowering(const Cpu0TargetMachine &TM, const Cpu0Subtarget &STI);
}
#endif // Cpu0ISELLOWERING_H
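The Cpu0ISD::NodeType enum starts at ISD::BUILTIN_OP_END so that target-specific opcodes never collide with the generic ones. The sketch below shows the numbering scheme together with a small name lookup. The GenericISD namespace and its value 300 are hypothetical stand-ins (the real ISD::BUILTIN_OP_END depends on the LLVM version), and targetNodeName is an illustrative name.

```cpp
#include <cassert>

namespace GenericISD {
// Hypothetical stand-in for ISD::BUILTIN_OP_END.
enum { BUILTIN_OP_END = 300 };
}

namespace Cpu0ISD {
enum NodeType {
  FIRST_NUMBER = GenericISD::BUILTIN_OP_END, // start past generic opcodes
  TailCall,
  ThreadPointer,
  Ret
};
}

// Name lookup for target opcodes; unknown opcodes give nullptr.
const char *targetNodeName(unsigned Opcode) {
  switch (Opcode) {
  case Cpu0ISD::TailCall:
    return "Cpu0ISD::TailCall";
  case Cpu0ISD::Ret:
    return "Cpu0ISD::Ret";
  default:
    return nullptr;
  }
}
```

Because each enumerator simply follows the previous one, adding a new target node only requires a new enum entry and, if a printable name is wanted, a new case in the lookup.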
lbdex/chapters/Chapter3_1/Cpu0ISelLowering.cpp
#include "Cpu0MachineFunction.h"
#include "Cpu0TargetMachine.h"
#include "Cpu0TargetObjectFile.h"
#include "Cpu0Subtarget.h"
#include "llvm/ADT/Statistic.h"
#include "llvm/CodeGen/CallingConvLower.h"
#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/SelectionDAG.h"
#include "llvm/CodeGen/ValueTypes.h"
#include "llvm/IR/CallingConv.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/GlobalVariable.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/raw_ostream.h"
//@3_1 1 {
const char *Cpu0TargetLowering::getTargetNodeName(unsigned Opcode) const {
switch (Opcode) {
case Cpu0ISD::JmpLink: return "Cpu0ISD::JmpLink";
case Cpu0ISD::TailCall: return "Cpu0ISD::TailCall";
case Cpu0ISD::Hi: return "Cpu0ISD::Hi";
case Cpu0ISD::Lo: return "Cpu0ISD::Lo";
case Cpu0ISD::GPRel: return "Cpu0ISD::GPRel";
case Cpu0ISD::Ret: return "Cpu0ISD::Ret";
case Cpu0ISD::EH_RETURN: return "Cpu0ISD::EH_RETURN";
case Cpu0ISD::DivRem: return "Cpu0ISD::DivRem";
case Cpu0ISD::DivRemU: return "Cpu0ISD::DivRemU";
case Cpu0ISD::Wrapper: return "Cpu0ISD::Wrapper";
default: return NULL;
}
}
//@3_1 1 }
//@Cpu0TargetLowering {
Cpu0TargetLowering::Cpu0TargetLowering(const Cpu0TargetMachine &TM,
const Cpu0Subtarget &STI)
: TargetLowering(TM), Subtarget(STI), ABI(TM.getABI()) {
//===----------------------------------------------------------------------===//
// Lower helper functions
//===----------------------------------------------------------------------===//
//===----------------------------------------------------------------------===//
// Misc Lower Operation implementation
//===----------------------------------------------------------------------===//
#include "Cpu0GenCallingConv.inc"
//===----------------------------------------------------------------------===//
//@ Formal Arguments Calling Convention Implementation
//===----------------------------------------------------------------------===//
//@LowerFormalArguments {
/// LowerFormalArguments - transform physical registers into virtual registers
/// and generate load operations for arguments places on the stack.
SDValue
Cpu0TargetLowering::LowerFormalArguments(SDValue Chain,
CallingConv::ID CallConv,
bool IsVarArg,
const SmallVectorImpl<ISD::InputArg> &Ins,
const SDLoc &DL, SelectionDAG &DAG,
SmallVectorImpl<SDValue> &InVals)
const {
return Chain;
}
// @LowerFormalArguments }
//===----------------------------------------------------------------------===//
//@ Return Value Calling Convention Implementation
//===----------------------------------------------------------------------===//
SDValue
Cpu0TargetLowering::LowerReturn(SDValue Chain,
CallingConv::ID CallConv, bool IsVarArg,
const SmallVectorImpl<ISD::OutputArg> &Outs,
const SmallVectorImpl<SDValue> &OutVals,
const SDLoc &DL, SelectionDAG &DAG) const {
return DAG.getNode(Cpu0ISD::Ret, DL, MVT::Other,
Chain, DAG.getRegister(Cpu0::LR, MVT::i32));
}
lbdex/chapters/Chapter3_1/Cpu0SEISelLowering.h
#ifndef LLVM_LIB_TARGET_CPU0_CPU0SEISELLOWERING_H
#define LLVM_LIB_TARGET_CPU0_CPU0SEISELLOWERING_H
#include "Cpu0Config.h"
#include "Cpu0ISelLowering.h"
#include "Cpu0RegisterInfo.h"
namespace llvm {
class Cpu0SETargetLowering : public Cpu0TargetLowering {
public:
explicit Cpu0SETargetLowering(const Cpu0TargetMachine &TM,
const Cpu0Subtarget &STI);
#endif // Cpu0ISEISELLOWERING_H
lbdex/chapters/Chapter3_1/Cpu0SEISelLowering.cpp
#include "Cpu0RegisterInfo.h"
#include "Cpu0TargetMachine.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetInstrInfo.h"
static cl::opt<bool>
EnableCpu0TailCalls("enable-cpu0-tail-calls", cl::Hidden,
cl::desc("CPU0: Enable tail calls."), cl::init(false));
//@Cpu0SETargetLowering {
Cpu0SETargetLowering::Cpu0SETargetLowering(const Cpu0TargetMachine &TM,
const Cpu0Subtarget &STI)
: Cpu0TargetLowering(TM, STI) {
//@Cpu0SETargetLowering body {
const Cpu0TargetLowering *
llvm::createCpu0SETargetLowering(const Cpu0TargetMachine &TM,
const Cpu0Subtarget &STI) {
return new Cpu0SETargetLowering(TM, STI);
}
lbdex/chapters/Chapter3_1/Cpu0MachineFunction.h
//===-- Cpu0MachineFunctionInfo.h - Private data used for Cpu0 ----*- C++ -*-=//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file declares the Cpu0 specific subclass of MachineFunctionInfo.
//
//===----------------------------------------------------------------------===//
#ifndef LLVM_LIB_TARGET_CPU0_CPU0MACHINEFUNCTION_H
#define LLVM_LIB_TARGET_CPU0_CPU0MACHINEFUNCTION_H
#include "Cpu0Config.h"
#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineMemOperand.h"
#include "llvm/CodeGen/PseudoSourceValue.h"
#include "llvm/Target/TargetFrameLowering.h"
#include "llvm/Target/TargetMachine.h"
#include <map>
namespace llvm {
//@1 {
/// Cpu0FunctionInfo - This class is derived from MachineFunctionInfo and holds
/// private Cpu0 target-specific information for each MachineFunction.
class Cpu0FunctionInfo : public MachineFunctionInfo {
public:
Cpu0FunctionInfo(MachineFunction& MF)
: MF(MF),
VarArgsFrameIndex(0),
MaxCallFrameSize(0)
{}
~Cpu0FunctionInfo();
private:
virtual void anchor();
MachineFunction& MF;
unsigned MaxCallFrameSize;
};
//@1 }
} // end of namespace llvm
#endif // LLVM_LIB_TARGET_CPU0_CPU0MACHINEFUNCTION_H
lbdex/chapters/Chapter3_1/Cpu0MachineFunction.cpp
#include "Cpu0MachineFunction.h"
#include "Cpu0InstrInfo.h"
#include "Cpu0Subtarget.h"
#include "llvm/IR/Function.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
bool FixGlobalBaseReg;
Cpu0FunctionInfo::~Cpu0FunctionInfo() {}
void Cpu0FunctionInfo::anchor() { }
lbdex/chapters/Chapter3_1/MCTargetDesc/Cpu0ABIInfo.h
#ifndef LLVM_LIB_TARGET_CPU0_MCTARGETDESC_CPU0ABIINFO_H
#define LLVM_LIB_TARGET_CPU0_MCTARGETDESC_CPU0ABIINFO_H
#include "Cpu0Config.h"
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/Triple.h"
#include "llvm/IR/CallingConv.h"
#include "llvm/MC/MCRegisterInfo.h"
namespace llvm {
class MCTargetOptions;
class StringRef;
class TargetRegisterClass;
class Cpu0ABIInfo {
public:
enum class ABI { Unknown, O32, S32 };
protected:
ABI ThisABI;
public:
Cpu0ABIInfo(ABI ThisABI) : ThisABI(ThisABI) {}
/// Obtain the size of the area allocated by the callee for arguments.
/// CallingConv::FastCall affects the value for O32.
unsigned GetCalleeAllocdArgSizeInBytes(CallingConv::ID CC) const;
#endif
lbdex/chapters/Chapter3_1/MCTargetDesc/Cpu0ABIInfo.cpp
#include "Cpu0Config.h"
#include "Cpu0ABIInfo.h"
#include "Cpu0RegisterInfo.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/StringSwitch.h"
#include "llvm/MC/MCTargetOptions.h"
#include "llvm/Support/CommandLine.h"
static cl::opt<bool>
EnableCpu0S32Calls("cpu0-s32-calls", cl::Hidden,
cl::desc("CPU0 S32 call: use stack only to pass arguments."),
cl::init(false));
namespace {
static const MCPhysReg O32IntRegs[4] = {Cpu0::A0, Cpu0::A1};
static const MCPhysReg S32IntRegs = {};
}
Cpu0ABIInfo Cpu0ABIInfo::computeTargetABI() {
Cpu0ABIInfo abi(ABI::Unknown);
if (EnableCpu0S32Calls)
abi = ABI::S32;
else
abi = ABI::O32;
// Assert exactly one ABI was chosen.
assert(abi.ThisABI != ABI::Unknown);
return abi;
}
return EhDataReg[I];
}
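As a rough illustration of what the two ABIs mean for argument passing, the sketch below assigns each integer argument either a register or a stack slot. It is a standalone model, not code from the Cpu0 backend: the names Abi, ArgLoc and assignArgs are hypothetical, and the 4-byte stack slots starting at offset 0 are an assumption. From the listing it takes only that O32 has the integer argument registers A0 and A1 and that S32 uses the stack only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the backend's ABI enum.
enum class Abi { O32, S32 };

struct ArgLoc {
  bool InReg;        // true: passed in a register, false: passed on the stack
  std::string Reg;   // register name when InReg is true
  unsigned StackOff; // byte offset in the argument area when InReg is false
};

// Assign a location to each of NumArgs 32-bit integer arguments.
std::vector<ArgLoc> assignArgs(Abi A, std::size_t NumArgs) {
  // O32 has two integer argument registers; S32 has none.
  const std::vector<std::string> Regs =
      (A == Abi::O32) ? std::vector<std::string>{"A0", "A1"}
                      : std::vector<std::string>{};
  std::vector<ArgLoc> Locs;
  unsigned NextOff = 0; // assumed: 4-byte stack slots, starting at offset 0
  for (std::size_t I = 0; I < NumArgs; ++I) {
    if (I < Regs.size()) {
      Locs.push_back({true, Regs[I], 0});
    } else {
      Locs.push_back({false, "", NextOff});
      NextOff += 4;
    }
  }
  return Locs;
}
```

Calling assignArgs(Abi::O32, 3) in this model places the first two arguments in A0 and A1 and the third on the stack, while assignArgs(Abi::S32, 3) places all three on the stack.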
lbdex/chapters/Chapter3_1/Cpu0Subtarget.h
#include "Cpu0FrameLowering.h"
#include "Cpu0ISelLowering.h"
#include "Cpu0InstrInfo.h"
#include "llvm/CodeGen/SelectionDAGTargetInfo.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/MC/MCInstrItineraries.h"
#include "llvm/Target/TargetSubtargetInfo.h"
#include <string>
#define GET_SUBTARGETINFO_HEADER
#include "Cpu0GenSubtargetInfo.inc"
namespace llvm {
class StringRef;
class Cpu0TargetMachine;
public:
bool HasChapterDummy;
bool HasChapterAll;
return false;
#endif
}
protected:
enum Cpu0ArchEnum {
Cpu032I,
Cpu032II
};
bool EnableOverflow;
InstrItineraryData InstrItins;
Triple TargetTriple;
public:
bool isPositionIndependent() const;
const Cpu0ABIInfo &getABI() const;
#endif
lbdex/chapters/Chapter3_1/Cpu0Subtarget.cpp
//
// This file implements the Cpu0 specific subclass of TargetSubtargetInfo.
//
//===----------------------------------------------------------------------===//
#include "Cpu0Subtarget.h"
#include "Cpu0MachineFunction.h"
#include "Cpu0.h"
#include "Cpu0RegisterInfo.h"
#include "Cpu0TargetMachine.h"
#include "llvm/IR/Attributes.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/TargetRegistry.h"
#define GET_SUBTARGETINFO_TARGET_DESC
#define GET_SUBTARGETINFO_CTOR
#include "Cpu0GenSubtargetInfo.inc"
void Cpu0Subtarget::anchor() { }
//@1 {
Cpu0Subtarget::Cpu0Subtarget(const Triple &TT, const std::string &CPU,
const std::string &FS, bool little,
const Cpu0TargetMachine &_TM) :
//@1 }
// Cpu0GenSubtargetInfo will display features by llc -march=cpu0 -mcpu=help
Cpu0GenSubtargetInfo(TT, CPU, FS),
IsLittle(little), TM(_TM), TargetTriple(TT), TSInfo(),
InstrInfo(
Cpu0InstrInfo::create(initializeSubtargetDependencies(CPU, FS, TM))),
FrameLowering(Cpu0FrameLowering::create(*this)),
TLInfo(Cpu0TargetLowering::create(TM, *this)) {
Cpu0Subtarget &
Cpu0Subtarget::initializeSubtargetDependencies(StringRef CPU, StringRef FS,
const TargetMachine &TM) {
if (TargetTriple.getArch() == Triple::cpu0 ||
TargetTriple.getArch() == Triple::cpu0el) {
CPU = "";
return *this;
}
else if (CPU != "cpu032I" && CPU != "cpu032II") {
CPU = "cpu032II";
}
}
else {
errs() << "!!!Error, TargetTriple.getArch() = " << TargetTriple.getArch()
<< ", CPU = " << CPU << "\n";
exit(0);
}
if (CPU == "cpu032I")
Cpu0ArchVersion = Cpu032I;
else if (CPU == "cpu032II")
Cpu0ArchVersion = Cpu032II;
if (isCpu032I()) {
HasCmp = true;
HasSlt = false;
}
else if (isCpu032II()) {
HasCmp = true;
HasSlt = true;
}
else {
errs() << "-mcpu must be empty(default:cpu032II), cpu032I or cpu032II" << "\n";
}
return *this;
}
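The dependency between -mcpu, Cpu0ArchVersion and the HasCmp/HasSlt flags in the listing above can be modelled outside LLVM. The following is a minimal standalone sketch, assuming that an empty or unrecognized CPU name falls back to cpu032II (as the listing's fallback branch and error message suggest). SubtargetModel and resolveCpu are hypothetical names, not part of the backend.

```cpp
#include <cassert>
#include <string>

enum Cpu0ArchEnum { Cpu032I, Cpu032II };

// Hypothetical model of the fields that the subtarget derives from -mcpu.
struct SubtargetModel {
  Cpu0ArchEnum ArchVersion;
  bool HasCmp;
  bool HasSlt;
};

// Map a -mcpu string to an architecture version and its feature flags.
SubtargetModel resolveCpu(std::string CPU) {
  // Anything other than the two known names selects the default CPU.
  if (CPU != "cpu032I" && CPU != "cpu032II")
    CPU = "cpu032II";

  SubtargetModel S;
  S.ArchVersion = (CPU == "cpu032I") ? Cpu032I : Cpu032II;
  // Both CPUs have cmp; only cpu032II adds slt.
  S.HasCmp = true;
  S.HasSlt = (S.ArchVersion == Cpu032II);
  return S;
}
```

Because ArchVersion is always assigned before the flags are derived from it, the model never reads an uninitialized version, which is the same reason the real code must initialize Cpu0ArchVersion.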
lbdex/chapters/Chapter3_1/Cpu0RegisterInfo.h
//
// This file contains the Cpu0 implementation of the TargetRegisterInfo class.
//
//===----------------------------------------------------------------------===//
#ifndef LLVM_LIB_TARGET_CPU0_CPU0REGISTERINFO_H
#define LLVM_LIB_TARGET_CPU0_CPU0REGISTERINFO_H
#include "Cpu0Config.h"
#include "Cpu0.h"
#include "llvm/Target/TargetRegisterInfo.h"
#define GET_REGINFO_HEADER
#include "Cpu0GenRegisterInfo.inc"
namespace llvm {
class Cpu0Subtarget;
class TargetInstrInfo;
class Type;
public:
Cpu0RegisterInfo(const Cpu0Subtarget &Subtarget);
#endif
lbdex/chapters/Chapter3_1/Cpu0RegisterInfo.cpp
#include "Cpu0RegisterInfo.h"
#include "Cpu0.h"
#include "Cpu0Subtarget.h"
#include "Cpu0MachineFunction.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Type.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/raw_ostream.h"
#define GET_REGINFO_TARGET_DESC
#include "Cpu0GenRegisterInfo.inc"
//===----------------------------------------------------------------------===//
// Callee Saved Registers methods
//===----------------------------------------------------------------------===//
/// Cpu0 Callee Saved Registers
// In Cpu0CallConv.td,
// def CSR_O32 : CalleeSavedRegs<(add LR, FP,
// (sequence "S%u", 2, 0))>;
// llc creates CSR_O32_SaveList and CSR_O32_RegMask from the definition above.
const MCPhysReg *
Cpu0RegisterInfo::getCalleeSavedRegs(const MachineFunction *MF) const {
return CSR_O32_SaveList;
}
const uint32_t *
Cpu0RegisterInfo::getCallPreservedMask(const MachineFunction &MF,
CallingConv::ID) const {
return CSR_O32_RegMask;
}
BitVector Cpu0RegisterInfo::
getReservedRegs(const MachineFunction &MF) const {
//@getReservedRegs body {
static const uint16_t ReservedCPURegs[] = {
Cpu0::ZERO, Cpu0::AT, Cpu0::SP, Cpu0::LR, Cpu0::PC
};
BitVector Reserved(getNumRegs());
return Reserved;
}
//@eliminateFrameIndex {
//- Without eliminateFrameIndex(), it will hang at run time.
// It is a pure virtual method.
// A FrameIndex represents an object inside an abstract stack.
// We must replace each FrameIndex with a direct stack/frame pointer
// reference.
void Cpu0RegisterInfo::
eliminateFrameIndex(MachineBasicBlock::iterator II, int SPAdj,
unsigned FIOperandNum, RegScavenger *RS) const {
}
//}
bool
Cpu0RegisterInfo::requiresRegisterScavenging(const MachineFunction &MF) const {
return true;
}
bool
Cpu0RegisterInfo::trackLivenessAfterRegAlloc(const MachineFunction &MF) const {
return true;
}
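getReservedRegs() above returns a bit vector with one bit per register, although the listing does not show the bits being set. A minimal standalone sketch of the idea follows, using std::vector<bool> in place of llvm::BitVector. The register numbering in the Reg enum is made up for illustration and does not match the generated Cpu0 register numbers.

```cpp
#include <cassert>
#include <vector>

// Illustrative register numbers only; the real ones come from
// Cpu0GenRegisterInfo.inc.
enum Reg : unsigned { ZERO, AT, V0, A0, SP, LR, PC, NumRegs };

// Build a bit vector in which bit R is true when register R is reserved,
// i.e. unavailable to the register allocator.
std::vector<bool> getReservedRegs() {
  static const unsigned ReservedCPURegs[] = {ZERO, AT, SP, LR, PC};
  std::vector<bool> Reserved(NumRegs, false);
  for (unsigned R : ReservedCPURegs) {
    assert(R < NumRegs && "register number out of range");
    Reserved[R] = true;
  }
  return Reserved;
}
```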
lbdex/chapters/Chapter3_1/Cpu0SERegisterInfo.h
#ifndef LLVM_LIB_TARGET_CPU0_CPU0SEREGISTERINFO_H
#define LLVM_LIB_TARGET_CPU0_CPU0SEREGISTERINFO_H
#include "Cpu0Config.h"
#include "Cpu0RegisterInfo.h"
namespace llvm {
class Cpu0SEInstrInfo;
#endif
lbdex/chapters/Chapter3_1/Cpu0SERegisterInfo.cpp
#include "Cpu0SERegisterInfo.h"
const TargetRegisterClass *
Cpu0SERegisterInfo::intRegClass(unsigned Size) const {
return &Cpu0::CPURegsRegClass;
}
cmake_debug_build/lib/Target/Cpu0/Cpu0GenInstrInfo.inc
#define GET_INSTRINFO_HEADER
#include "Cpu0GenInstrInfo.inc"
//- Cpu0InstrInfo.h
class Cpu0InstrInfo : public Cpu0GenInstrInfo {
Cpu0TargetMachine &TM;
public:
explicit Cpu0InstrInfo(Cpu0TargetMachine &TM);
};
Chapter3_1 adds most of the Cpu0 backend classes. Its code is summarized in Fig. 3.1. Class
Cpu0Subtarget provides the interfaces getInstrInfo(), getFrameLowering(), ..., for reaching the other Cpu0 classes. Most classes
(such as Cpu0InstrInfo, Cpu0RegisterInfo, ...) hold a Subtarget reference member, which lets them access the other classes
through the Cpu0Subtarget interface. If a backend module has no Subtarget reference, these classes can still ...
Because LLVM has a deep inheritance tree, it is not explored here. Thanks to that inheritance structure, little
code needs to be written in the instruction, frame/stack and DAG-selection classes, since most of it
is already implemented by their parent classes. llvm-tblgen generates Cpu0GenInstrInfo.inc from the information
in Cpu0InstrInfo.td. Cpu0InstrInfo.h extracts the code it needs from Cpu0GenInstrInfo.inc by defining
“#define GET_INSTRINFO_HEADER” before including it. With TableGen, the backend code size is reduced further through the
pattern-matching theory of compiler development, as explained in the “DAG” and “Instruction Selection” sections
of the last chapter. The following code fragment is from Cpu0GenInstrInfo.inc. The code between “#ifdef
GET_INSTRINFO_HEADER” and “#endif // GET_INSTRINFO_HEADER” is what gets extracted into Cpu0InstrInfo.h.
cmake_debug_build/lib/Target/Cpu0/Cpu0GenInstrInfo.inc
namespace llvm {
struct Cpu0GenInstrInfo : public TargetInstrInfoImpl {
explicit Cpu0GenInstrInfo(int SO = -1, int DO = -1);
};
} // End llvm namespace
#endif // GET_INSTRINFO_HEADER
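The guarded-section mechanism can be sketched in a single file. In the sketch below, the text between the two marker comments plays the role of a generated .inc file, and the macro defined just before it selects which section becomes visible. DEMO_GEN_HEADER, DEMO_GEN_TABLES, DemoGenInfo and DemoInfo are hypothetical names; a real backend keeps the guarded sections in a separate generated file and pulls them in with #include.

```cpp
#include <cassert>

// The "client" header asks for the header section only.
#define DEMO_GEN_HEADER

// ---- begin: contents that would live in the generated .inc file ----
#ifdef DEMO_GEN_HEADER
#undef DEMO_GEN_HEADER
struct DemoGenInfo {
  explicit DemoGenInfo(int SO = -1, int DO = -1) : SO(SO), DO(DO) {}
  int SO;
  int DO;
};
#endif // DEMO_GEN_HEADER

#ifdef DEMO_GEN_TABLES
// Not requested by this client, so the preprocessor drops this section.
static const int DemoTable[] = {1, 2, 3};
#endif // DEMO_GEN_TABLES
// ---- end of the simulated .inc contents ----

// The hand-written class derives from the generated one and adds to it.
struct DemoInfo : DemoGenInfo {
  DemoInfo() : DemoGenInfo() {}
  bool hasDefaults() const { return SO == -1 && DO == -1; }
};
```

The same generated file can therefore serve several clients: each one defines a different selector macro before the include and receives only the section it asked for.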
lbdex/chapters/Chapter3_1/CMakeLists.txt
Cpu0FrameLowering.cpp
Please take a look at the Chapter3_1 code. After that, build Chapter3_1 by setting “#define CH CH3_1” in Cpu0Config.h
as follows, and rebuild with Xcode on iMac or make on Linux.
~/llvm/test/src/lib/Target/Cpu0SetChapter.h
#define CH CH3_1
With the Chapter3_1 implementation, the Chapter2 error message “Could not allocate target machine!” is gone. The
new error says that there is no Target AsmPrinter. We will add it in the next section.
Chapter3_1 creates FeatureCpu032I and FeatureCpu032II for the CPUs cpu032I and cpu032II, respectively. Beyond that, it
defines two more features, FeatureCmp and FeatureSlt. This book creates two CPUs in order to demonstrate the
“instruction set design choice” to readers. Readers will realize why the Mips CPU uses the SLT instruction instead of CMP
after they have read the later chapter “Control flow statement”. With the code added to Cpu0.td and Cpu0InstrInfo.td
in Chapter3_1 to support cpu032I and cpu032II, the command llc -march=cpu0 -mcpu=help displays
messages as follows,
1 http://llvm.org/docs/WritingAnLLVMBackend.html#target-machine
2 http://llvm.org/docs/LangRef.html#data-layout
When the user inputs -mcpu=cpu032I, the variable IsCpu032I from Cpu0InstrInfo.td will be true, since the
function isCpu032I() defined in Cpu0Subtarget.h returns true (initializeSubtargetDependencies(), called from the
constructor, sets Cpu0ArchVersion to Cpu032I; the variable CPU in the constructor is “cpu032I” when the user
inputs -mcpu=cpu032I). Please notice that the variable Cpu0ArchVersion must be initialized in Cpu0Subtarget.cpp;
otherwise Cpu0ArchVersion can hold any value, and the functions isCpu032I() and isCpu032II(), which support llc
-mcpu=cpu032I and llc -mcpu=cpu032II, respectively, will misbehave. The values of the variables HasCmp
and HasSlt are set depending on Cpu0ArchVersion. Instructions slt and beq, ... are supported only when HasSlt is
true, and furthermore, HasSlt is true only when Cpu0ArchVersion is Cpu032II. Similarly, Ch4_1, Ch4_2, ..., are used
to enable or disable instruction definitions. Through Subtarget->hasChapter4_1(), which exists both
in Cpu0.td and Cpu0Subtarget.h, a Predicate such as Ch4_1, defined in Cpu0InstrInfo.td, can be enabled or disabled.
For example, the shift-rotate instructions can be enabled by defining CH to be greater than or equal to CH4_1 as follows,
lbdex/Cpu0/Cpu0InstrInfo.td
~/llvm/test/src/lib/Target/Cpu0SetChapter.h
#define CH CH4_1
Conversely, they can be disabled by defining CH to be less than CH4_1, for instance CH3_5, as follows,
~/llvm/test/src/lib/Target/Cpu0SetChapter.h
#define CH CH3_5
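The chapter switch is ordinary preprocessor arithmetic. The sketch below is a standalone model of it, assuming that each chapter macro expands to an increasing integer. The numeric values and the function shiftRotateSupport are invented for illustration and are not taken from the Cpu0 sources.

```cpp
#include <cassert>
#include <string>

// Assumed encoding: later chapters get larger numbers.
#define CH3_1 310
#define CH3_5 350
#define CH4_1 410

// Select the chapter to build.
#define CH CH4_1

// Report whether the code for chapter 4.1 is compiled in.
constexpr bool hasChapter4_1() {
#if CH >= CH4_1
  return true;
#else
  return false;
#endif
}

// A feature that only exists from chapter 4.1 onwards.
std::string shiftRotateSupport() {
  return hasChapter4_1() ? "enabled" : "disabled";
}
```

Changing the selection line to "#define CH CH3_5" and recompiling makes hasChapter4_1() return false, so everything guarded by it is switched off at build time rather than at run time.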
lbdex/chapters/Chapter2/Cpu0.td
lbdex/chapters/Chapter3_2/InstPrinter/Cpu0InstPrinter.h
//=== Cpu0InstPrinter.h - Convert Cpu0 MCInst to assembly syntax -*- C++ -*-==//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This class prints a Cpu0 MCInst to a .s file.
//
//===----------------------------------------------------------------------===//
#ifndef LLVM_LIB_TARGET_CPU0_INSTPRINTER_CPU0INSTPRINTER_H
#define LLVM_LIB_TARGET_CPU0_INSTPRINTER_CPU0INSTPRINTER_H
#include "Cpu0Config.h"
#include "llvm/MC/MCInstPrinter.h"
namespace llvm {
// These enumeration declarations were originally in Cpu0InstrInfo.h but
// had to be moved here to avoid circular dependencies between
// LLVMCpu0CodeGen and LLVMCpu0AsmPrinter.
class TargetMachine;
// Autogenerated by tblgen.
void printInstruction(const MCInst *MI, raw_ostream &O);
static const char *getRegisterName(unsigned RegNo);
private:
void printOperand(const MCInst *MI, unsigned OpNo, raw_ostream &O);
void printUnsignedImm(const MCInst *MI, int opNum, raw_ostream &O);
void printMemOperand(const MCInst *MI, int opNum, raw_ostream &O);
//#if CH >= CH7_1
void printMemOperandEA(const MCInst *MI, int opNum, raw_ostream &O);
//#endif
};
} // end namespace llvm
#endif
lbdex/chapters/Chapter3_2/InstPrinter/Cpu0InstPrinter.cpp
#include "Cpu0InstPrinter.h"
#include "Cpu0InstrInfo.h"
#include "llvm/ADT/StringExtras.h"
#include "llvm/MC/MCExpr.h"
#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCInstrInfo.h"
#include "llvm/MC/MCSymbol.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;
#define PRINT_ALIAS_INSTR
#include "Cpu0GenAsmWriter.inc"
//@1 {
void Cpu0InstPrinter::printInst(const MCInst *MI, raw_ostream &O,
StringRef Annot, const MCSubtargetInfo &STI) {
// Try to print any aliases first.
if (!printAliasInstr(MI, O))
//@1 }
//- printInstruction(MI, O) is defined in Cpu0GenAsmWriter.inc, which is
// generated from the description in Cpu0.td.
printInstruction(MI, O);
printAnnotation(O, Annot);
}
if (Op.isImm()) {
O << Op.getImm();
return;
}
void Cpu0InstPrinter::
printMemOperand(const MCInst *MI, int opNum, raw_ostream &O) {
// Load/Store memory operands -- imm($reg)
// If PIC target the target is loaded as the
// pattern ld $t9,%call16($gp)
printOperand(MI, opNum+1, O);
O << "(";
printOperand(MI, opNum, O);
O << ")";
}
lbdex/chapters/Chapter3_2/InstPrinter/CMakeLists.txt
add_llvm_library(LLVMCpu0AsmPrinter
Cpu0InstPrinter.cpp
)
lbdex/chapters/Chapter3_2/InstPrinter/LLVMBuild.txt
[component_0]
type = Library
name = Cpu0AsmPrinter
parent = Cpu0
required_libraries = MC Support
add_to_library_groups = Cpu0
lbdex/chapters/Chapter2/Cpu0InstrInfo.td
// Address operand
def mem : Operand<i32> {
let PrintMethod = "printMemOperand";
let MIOperandInfo = (ops CPURegs, simm16);
let EncoderMethod = "getMemEncoding";
}
...
// 32-bit load.
multiclass LoadM32<bits<8> op, string instr_asm, PatFrag OpNode,
bit Pseudo = 0> {
def #NAME# : LoadM<op, instr_asm, OpNode, GPROut, mem, Pseudo>;
}
// 32-bit store.
multiclass StoreM32<bits<8> op, string instr_asm, PatFrag OpNode,
bit Pseudo = 0> {
def #NAME# : StoreM<op, instr_asm, OpNode, CPURegs, mem, Pseudo>;
}
Cpu0InstPrinter::printMemOperand() prints the backend operands for “local variable access”, as in the following,
ld $2, 16($fp)
st $2, 8($fp)
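The operand order matters here: the mem operand is declared as (ops CPURegs, simm16), so the register is sub-operand opNum and the offset is opNum+1, yet the offset is printed first. A minimal standalone sketch of that formatting follows, assuming a simplified operand struct; MemOperand and formatMemOperand are hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Simplified stand-in for the two sub-operands of a Cpu0 'mem' operand.
struct MemOperand {
  std::string BaseReg; // e.g. "fp"
  int Offset;          // signed 16-bit displacement
};

// Render the operand in the assembly form imm($reg).
std::string formatMemOperand(const MemOperand &Op) {
  assert(Op.Offset >= -32768 && Op.Offset <= 32767 && "offset must fit simm16");
  std::ostringstream OS;
  OS << Op.Offset << "($" << Op.BaseReg << ")";
  return OS.str();
}
```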
lbdex/chapters/Chapter3_2/Cpu0MCInstLower.h
#ifndef LLVM_LIB_TARGET_CPU0_CPU0MCINSTLOWER_H
#define LLVM_LIB_TARGET_CPU0_CPU0MCINSTLOWER_H
#include "Cpu0Config.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/MachineOperand.h"
#include "llvm/Support/Compiler.h"
namespace llvm {
class MCContext;
class MCInst;
class MCOperand;
class MachineInstr;
class MachineFunction;
class Cpu0AsmPrinter;
//@1 {
/// This class is used to lower a MachineInstr into an MCInst.
class LLVM_LIBRARY_VISIBILITY Cpu0MCInstLower {
//@2
typedef MachineOperand::MachineOperandType MachineOperandType;
MCContext *Ctx;
Cpu0AsmPrinter &AsmPrinter;
public:
Cpu0MCInstLower(Cpu0AsmPrinter &asmprinter);
void Initialize(MCContext* C);
void Lower(const MachineInstr *MI, MCInst &OutMI) const;
MCOperand LowerOperand(const MachineOperand& MO, unsigned offset = 0) const;
};
}
#endif
lbdex/chapters/Chapter3_2/Cpu0MCInstLower.cpp
//
// This file contains code to lower Cpu0 MachineInstrs to their corresponding
// MCInst records.
//
//===----------------------------------------------------------------------===//
#include "Cpu0MCInstLower.h"
#include "Cpu0AsmPrinter.h"
#include "Cpu0InstrInfo.h"
#include "MCTargetDesc/Cpu0BaseInfo.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineOperand.h"
#include "llvm/IR/Mangler.h"
#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCExpr.h"
#include "llvm/MC/MCInst.h"
Cpu0MCInstLower::Cpu0MCInstLower(Cpu0AsmPrinter &asmprinter)
: AsmPrinter(asmprinter) {}
void Cpu0MCInstLower::Initialize(MCContext* C) {
Ctx = C;
}
//@LowerOperand {
MCOperand Cpu0MCInstLower::LowerOperand(const MachineOperand& MO,
unsigned offset) const {
MachineOperandType MOTy = MO.getType();
switch (MOTy) {
//@2
default: llvm_unreachable("unknown operand type");
case MachineOperand::MO_Register:
// Ignore all implicit register operands.
if (MO.isImplicit()) break;
return MCOperand::createReg(MO.getReg());
case MachineOperand::MO_Immediate:
return MCOperand::createImm(MO.getImm() + offset);
case MachineOperand::MO_RegisterMask:
break;
}
return MCOperand();
}
if (MCOp.isValid())
OutMI.addOperand(MCOp);
}
}
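The effect of LowerOperand() is easiest to see on a small example: explicit registers and immediates are carried over, while implicit registers and register masks yield an invalid operand that the caller skips. The following standalone sketch models that with simplified types. Kind, MOperand, LoweredOperand and lowerAll are hypothetical names, and the real MachineOperand and MCOperand classes carry far more information.

```cpp
#include <cassert>
#include <vector>

enum class Kind { Register, Immediate, RegisterMask };

// Simplified machine operand.
struct MOperand {
  Kind K;
  long Value;      // register number or immediate value
  bool IsImplicit; // only meaningful for registers
};

// Simplified lowered operand; Valid is false for operands that are dropped.
struct LoweredOperand {
  bool Valid;
  bool IsReg;
  long Value;
};

LoweredOperand lowerOperand(const MOperand &MO, long Offset = 0) {
  switch (MO.K) {
  case Kind::Register:
    if (MO.IsImplicit)
      break; // implicit register operands are ignored
    return {true, true, MO.Value};
  case Kind::Immediate:
    return {true, false, MO.Value + Offset};
  case Kind::RegisterMask:
    break; // register masks are skipped as well
  }
  return {false, false, 0};
}

// Lower every operand of an instruction, keeping only the valid results.
std::vector<LoweredOperand> lowerAll(const std::vector<MOperand> &Ops) {
  std::vector<LoweredOperand> Out;
  for (const MOperand &MO : Ops) {
    LoweredOperand L = lowerOperand(MO);
    if (L.Valid)
      Out.push_back(L);
  }
  return Out;
}
```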
lbdex/chapters/Chapter3_2/MCTargetDesc/Cpu0BaseInfo.h
//===-- Cpu0BaseInfo.h - Top level definitions for CPU0 MC ------*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file contains small standalone helper functions and enum definitions for
// the Cpu0 target useful for the compiler back-end and the MC libraries.
//
//===----------------------------------------------------------------------===//
#ifndef LLVM_LIB_TARGET_CPU0_MCTARGETDESC_CPU0BASEINFO_H
#define LLVM_LIB_TARGET_CPU0_MCTARGETDESC_CPU0BASEINFO_H
#include "Cpu0Config.h"
#include "Cpu0MCTargetDesc.h"
#include "llvm/MC/MCExpr.h"
#include "llvm/Support/DataTypes.h"
#include "llvm/Support/ErrorHandling.h"
namespace llvm {
/// Cpu0II - This namespace holds all of the target specific flags that
/// instruction info tracks.
//@Cpu0II
namespace Cpu0II {
/// Target Operand Flag enum.
enum TOF {
//===------------------------------------------------------------------===//
// Cpu0 Specific MachineOperand flags.
MO_NO_FLAG,
/// MO_GOT_CALL - Represents the offset into the global offset table at
/// which the address of a call site relocation entry symbol resides
/// during execution. This is different from the above since this flag
/// can only be present in call instructions.
MO_GOT_CALL,
/// MO_GPREL - Represents the offset from the current gp value to be used
/// for the relocatable object file being produced.
MO_GPREL,
enum {
//===------------------------------------------------------------------===//
// Instruction encodings. These are the standard/most common forms for
// Cpu0 instructions.
//
FormMask = 15
};
}
#endif
lbdex/chapters/Chapter3_2/Cpu0MCAsmInfo.h
#ifndef LLVM_LIB_TARGET_CPU0_MCTARGETDESC_CPU0MCASMINFO_H
#define LLVM_LIB_TARGET_CPU0_MCTARGETDESC_CPU0MCASMINFO_H
#include "Cpu0Config.h"
#if CH >= CH3_2
#include "llvm/MC/MCAsmInfoELF.h"
namespace llvm {
class Triple;
} // namespace llvm
#endif
lbdex/chapters/Chapter3_2/Cpu0MCAsmInfo.cpp
#include "Cpu0MCAsmInfo.h"
#if CH >= CH3_2
#include "llvm/ADT/Triple.h"
void Cpu0MCAsmInfo::anchor() { }
AlignmentIsInBytes = false;
Data16bitsDirective = "\t.2byte\t";
Data32bitsDirective = "\t.4byte\t";
Data64bitsDirective = "\t.8byte\t";
PrivateGlobalPrefix = "$";
// PrivateLabelPrefix: display $BB for the labels of basic block
PrivateLabelPrefix = "$";
CommentString = "#";
ZeroDirective = "\t.space\t";
GPRel32Directive = "\t.gpword\t";
GPRel64Directive = "\t.gpdword\t";
WeakRefDirective = "\t.weak\t";
UseAssignmentForEHBegin = true;
SupportsDebugInformation = true;
ExceptionsType = ExceptionHandling::DwarfCFI;
DwarfRegNumForCFI = true;
}
Finally, add code to Cpu0MCTargetDesc.cpp to register Cpu0InstPrinter as below. It also registers the other classes
(register, instruction and subtarget) that were defined in Chapter3_1.
lbdex/chapters/Chapter3_2/MCTargetDesc/Cpu0MCTargetDesc.h
namespace llvm {
class MCAsmBackend;
class MCCodeEmitter;
class MCContext;
class MCInstrInfo;
class MCObjectWriter;
class MCRegisterInfo;
class MCSubtargetInfo;
class StringRef;
...
class raw_ostream;
...
}
lbdex/chapters/Chapter3_2/MCTargetDesc/Cpu0MCTargetDesc.cpp
#include "InstPrinter/Cpu0InstPrinter.h"
#include "Cpu0MCAsmInfo.h"
/// Select the Cpu0 Architecture Feature for the given triple and cpu name.
/// The function will be called at command 'llvm-objdump -d' for Cpu0 elf input.
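// Return by value: Cpu0ArchFeature is a local std::string, so returning a
// StringRef to it would leave the caller with a dangling reference.
static std::string selectCpu0ArchFeature(const Triple &TT, StringRef CPU) {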
std::string Cpu0ArchFeature;
if (CPU.empty() || CPU == "generic") {
if (TT.getArch() == Triple::cpu0 || TT.getArch() == Triple::cpu0el) {
if (CPU.empty() || CPU == "cpu032II") {
Cpu0ArchFeature = "+cpu032II";
}
else {
if (CPU == "cpu032I") {
Cpu0ArchFeature = "+cpu032I";
}
}
}
}
return Cpu0ArchFeature;
}
//@1 }
return MAI;
}
namespace {
public:
Cpu0MCInstrAnalysis(const MCInstrInfo *Info) : MCInstrAnalysis(Info) {}
};
}
//@2 {
extern "C" void LLVMInitializeCpu0TargetMC() {
for (Target *T : {&TheCpu0Target, &TheCpu0elTarget}) {
// Register the MC asm info.
RegisterMCAsmInfoFn X(*T, createCpu0MCAsmInfo);
}
//@2 }
lbdex/chapters/Chapter3_2/MCTargetDesc/CMakeLists.txt
Cpu0MCAsmInfo.cpp
lbdex/chapters/Chapter3_2/MCTargetDesc/LLVMBuild.txt
Cpu0AsmPrinter
To make the registration clear, it is summarized in the following diagram, Fig. 3.3.
The createCpu0MCAsmInfo() above registers an object of class Cpu0MCAsmInfo for the targets TheCpu0Target and
TheCpu0elTarget. TheCpu0Target is for big endian and TheCpu0elTarget is for little endian. Cpu0MCAsmInfo is
derived from MCAsmInfo, which is an LLVM built-in class. Most of the code is implemented in its parent; the backend
reuses that code through inheritance.
The createCpu0MCInstrInfo() above instantiates the MCInstrInfo object X and initializes it with InitCpu0MCInstrInfo(X).
Since InitCpu0MCInstrInfo(X) is defined in Cpu0GenInstrInfo.inc, this function adds the information we specified in
Cpu0InstrInfo.td.
The createCpu0MCInstPrinter() above instantiates Cpu0InstPrinter to take care of printing instructions.
The createCpu0MCRegisterInfo() above is similar to “Register function of MC instruction info”, but it initializes the
register information specified in Cpu0RegisterInfo.td. They share some values from the instruction/register td descriptions,
so there is no need to specify them again in the Initialize routine as long as they are consistent with the td description files.
The createCpu0MCSubtargetInfo() above instantiates an MCSubtargetInfo object and initializes it with the Cpu0.td information.
According to the section “Target Registration” 3 , we can register the Cpu0 backend classes in LLVMInitializeCpu0TargetMC()
on demand through the dynamic registration mechanism, as that function does above.
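The registration mechanism boils down to storing factory functions in a per-target record and calling them later on demand. The sketch below is a simplified standalone model of that idea, not the LLVM TargetRegistry API: AsmInfoModel, TargetRecord, registerAsmInfoFn and the other names are hypothetical.

```cpp
#include <cassert>
#include <initializer_list>
#include <memory>
#include <string>

// Simplified stand-in for an MCAsmInfo-like object.
struct AsmInfoModel {
  std::string CommentString;
  bool IsLittleEndian;
};

// Factory signature stored in the target record.
using AsmInfoCtorFn = std::unique_ptr<AsmInfoModel> (*)(bool IsLittle);

// Simplified stand-in for a Target: it only remembers its factory.
struct TargetRecord {
  bool IsLittle;
  AsmInfoCtorFn AsmInfoCtor;
};

// Hypothetical counterpart of a create...MCAsmInfo() function.
std::unique_ptr<AsmInfoModel> createDemoAsmInfo(bool IsLittle) {
  return std::unique_ptr<AsmInfoModel>(new AsmInfoModel{"#", IsLittle});
}

// Registration only records the function pointer; nothing is created yet.
void registerAsmInfoFn(TargetRecord &T, AsmInfoCtorFn Fn) { T.AsmInfoCtor = Fn; }

// Later, a client asks the target to build the object on demand.
std::unique_ptr<AsmInfoModel> createAsmInfo(const TargetRecord &T) {
  assert(T.AsmInfoCtor && "no asm info factory registered");
  return T.AsmInfoCtor(T.IsLittle);
}

// Register the same factory for the big- and little-endian targets.
void initializeDemoTargets(TargetRecord &Big, TargetRecord &Little) {
  for (TargetRecord *T : {&Big, &Little})
    registerAsmInfoFn(*T, createDemoAsmInfo);
}
```

Separating registration from construction means a tool that never prints assembly never pays for building the printer-related objects.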
Now, it’s time to work with AsmPrinter as follows,
lbdex/chapters/Chapter3_2/Cpu0AsmPrinter.h
3 http://jonathan2251.github.io/lbd/llvmstructure.html#target-registration
#ifndef LLVM_LIB_TARGET_CPU0_CPU0ASMPRINTER_H
#define LLVM_LIB_TARGET_CPU0_CPU0ASMPRINTER_H
#include "Cpu0Config.h"
#include "Cpu0MachineFunction.h"
#include "Cpu0MCInstLower.h"
#include "Cpu0Subtarget.h"
#include "Cpu0TargetMachine.h"
#include "llvm/CodeGen/AsmPrinter.h"
#include "llvm/MC/MCStreamer.h"
#include "llvm/Support/Compiler.h"
#include "llvm/Target/TargetMachine.h"
namespace llvm {
class MCStreamer;
class MachineInstr;
class MachineBasicBlock;
class Module;
class raw_ostream;
private:
public:
#endif
lbdex/chapters/Chapter3_2/Cpu0AsmPrinter.cpp
#include "Cpu0AsmPrinter.h"
#include "InstPrinter/Cpu0InstPrinter.h"
#include "MCTargetDesc/Cpu0BaseInfo.h"
#include "Cpu0.h"
#include "Cpu0InstrInfo.h"
#include "llvm/ADT/SmallString.h"
#include "llvm/ADT/StringExtras.h"
#include "llvm/ADT/Twine.h"
#include "llvm/CodeGen/MachineConstantPool.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineMemOperand.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Mangler.h"
#include "llvm/MC/MCAsmInfo.h"
#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCStreamer.h"
#include "llvm/MC/MCSymbol.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Support/TargetRegistry.h"
#include "llvm/Target/TargetLoweringObjectFile.h"
#include "llvm/Target/TargetOptions.h"
AsmPrinter::runOnMachineFunction(MF);
return true;
}
//@EmitInstruction {
//- EmitInstruction() must exist, or a run-time error will occur.
void Cpu0AsmPrinter::EmitInstruction(const MachineInstr *MI) {
//@EmitInstruction body {
if (MI->isDebugValue()) {
SmallString<128> Str;
raw_svector_ostream OS(Str);
PrintDebugValueComment(MI, OS);
return;
}
do {
if (I->isPseudo())
llvm_unreachable("Pseudo opcode found in EmitInstruction()");
MCInst TmpInst0;
MCInstLowering.Lower(&*I, TmpInst0);
OutStreamer->EmitInstruction(TmpInst0, getSubtargetInfo());
} while ((++I != E) && I->isInsideBundle()); // Delay slot check
}
//@EmitInstruction }
//===----------------------------------------------------------------------===//
//
// Cpu0 Asm Directives
//
// -- Frame directive "frame Stackpointer, Stacksize, RARegister"
// Describe the stack frame.
//
// -- Mask directives "(f)mask bitmask, offset"
// Tells the assembler which registers are saved and where.
// bitmask - contain a little endian bitset indicating which registers are
// saved on function prologue (e.g. with a 0x80000000 mask, the
// assembler knows the register 31 (RA) is saved at prologue.
// offset - the position before stack pointer subtraction indicating where
// the first saved register on prologue is located. (e.g. with a
//
// Consider the following function prologue:
//
// .frame $fp,48,$ra
// .mask 0xc0000000,-8
// addiu $sp, $sp, -48
// st $ra, 40($sp)
// st $fp, 36($sp)
//
// With a 0xc0000000 mask, the assembler knows the register 31 (RA) and
// 30 (FP) are saved at prologue. As the save order on prologue is from
//===----------------------------------------------------------------------===//
// Mask directives
//===----------------------------------------------------------------------===//
// .frame $sp,8,$lr
//-> .mask 0x00000000,0
// .set noreorder
// .set nomacro
// Create a bitmask with all callee saved registers for CPU or Floating Point
// registers. For CPU registers consider LR, GP and FP for saving if necessary.
void Cpu0AsmPrinter::printSavedRegsBitmask(raw_ostream &O) {
// CPU and FPU Saved Registers Bitmasks
unsigned CPUBitmask = 0;
int CPUTopSavedRegOff;
// Print CPUBitmask
O << "\t.mask \t"; printHex32(CPUBitmask, O);
O << ',' << CPUTopSavedRegOff << '\n';
}
//===----------------------------------------------------------------------===//
// Frame and Set directives
//===----------------------------------------------------------------------===//
//-> .frame $sp,8,$lr
// .mask 0x00000000,0
// .set noreorder
// .set nomacro
/// Frame Directive
void Cpu0AsmPrinter::emitFrameDirective() {
const TargetRegisterInfo &RI = *MF->getSubtarget().getRegisterInfo();
if (OutStreamer->hasRawTextSupport())
OutStreamer->EmitRawText("\t.frame\t$" +
StringRef(Cpu0InstPrinter::getRegisterName(stackReg)).lower() +
"," + Twine(stackSize) + ",$" +
StringRef(Cpu0InstPrinter::getRegisterName(returnReg)).lower());
}
// .type main,@function
//-> .ent main # @main
// main:
void Cpu0AsmPrinter::EmitFunctionEntryLabel() {
if (OutStreamer->hasRawTextSupport())
OutStreamer->EmitRawText("\t.ent\t" + Twine(CurrentFnSym->getName()));
OutStreamer->EmitLabel(CurrentFnSym);
}
// .frame $sp,8,$pc
// .mask 0x00000000,0
//-> .set noreorder
//@-> .set nomacro
/// EmitFunctionBodyStart - Targets can override this to emit stuff before
/// the first basic block in the function.
void Cpu0AsmPrinter::EmitFunctionBodyStart() {
MCInstLowering.Initialize(&MF->getContext());
emitFrameDirective();
if (OutStreamer->hasRawTextSupport()) {
SmallString<128> Str;
raw_svector_ostream OS(Str);
printSavedRegsBitmask(OS);
OutStreamer->EmitRawText(OS.str());
OutStreamer->EmitRawText(StringRef("\t.set\tnoreorder"));
OutStreamer->EmitRawText(StringRef("\t.set\tnomacro"));
if (Cpu0FI->getEmitNOAT())
OutStreamer->EmitRawText(StringRef("\t.set\tnoat"));
}
}
// .section .mdebug.abi32
// .previous
void Cpu0AsmPrinter::EmitStartOfAsmFile(Module &M) {
// FIXME: Use SwitchSection.
When an instruction is ready to be printed, the function Cpu0AsmPrinter::EmitInstruction() is triggered first. It then
calls OutStreamer->EmitInstruction() to print the opcode and register names according to the information from
Cpu0GenInstrInfo.inc and Cpu0GenRegisterInfo.inc, both of which were registered in the dynamic registration function
LLVMInitializeCpu0TargetMC(). Notice that Cpu0InstPrinter.cpp only prints operands, while the opcode information comes from
Cpu0InstrInfo.td.
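The .mask operand described in the Cpu0AsmPrinter.cpp comments above is a plain bitset in which bit n stands for register n. A minimal standalone sketch of how such a mask could be computed and printed follows. savedRegsMask and formatMask are hypothetical helpers, and the register numbers 31 and 30 are taken from the example in those comments.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Set bit N of the mask for every saved register number N (0..31).
std::uint32_t savedRegsMask(const std::vector<unsigned> &SavedRegs) {
  std::uint32_t Mask = 0;
  for (unsigned R : SavedRegs) {
    assert(R < 32 && "register number must fit in a 32-bit mask");
    Mask |= (std::uint32_t{1} << R);
  }
  return Mask;
}

// Format the directive in the style the comments show,
// e.g. ".mask 0xc0000000,-8".
std::string formatMask(std::uint32_t Mask, int TopSavedRegOff) {
  char Buf[48];
  std::snprintf(Buf, sizeof(Buf), ".mask 0x%08x,%d",
                static_cast<unsigned>(Mask), TopSavedRegOff);
  return Buf;
}
```

With registers 31 and 30 saved, the two top bits are set, which gives the 0xc0000000 value used in the prologue example.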
Add the following code to Cpu0ISelLowering.cpp.
lbdex/chapters/Chapter3_2/Cpu0ISelLowering.cpp
Cpu0TargetLowering::
Cpu0TargetLowering(Cpu0TargetMachine &TM)
: TargetLowering(TM, new Cpu0TargetObjectFile()),
Subtarget(&TM.getSubtarget<Cpu0Subtarget>()) {
Add the following code to Cpu0MachineFunction.h, since Cpu0AsmPrinter.cpp calls getEmitNOAT().
lbdex/chapters/Chapter3_2/Cpu0MachineFunction.h
...
bool getEmitNOAT() const { return EmitNOAT; }
void setEmitNOAT() { EmitNOAT = true; }
private:
...
bool EmitNOAT;
};
Beyond adding these new .cpp files to CMakeLists.txt, please remember to add the subdirectory InstPrinter, enable
asmprinter, and add the libraries AsmPrinter and Cpu0AsmPrinter to LLVMBuild.txt as follows,
lbdex/chapters/Chapter3_2/CMakeLists.txt
...
add_llvm_target(Cpu0CodeGen
Cpu0AsmPrinter.cpp
Cpu0MCInstLower.cpp
...
)
...
add_subdirectory(InstPrinter)
lbdex/chapters/Chapter3_2/LLVMBuild.txt
// LLVMBuild.txt
[common]
subdirectories =
InstPrinter
...
[component_0]
...
# Please enable asmprinter
has_asmprinter = 1
...
[component_1]
required_libraries =
AsmPrinter
...
Cpu0AsmPrinter
...
Now, running Chapter3_2/Cpu0 with AsmPrinter support produces a new error message as follows,
llc fails to compile the IR code into machine code since we have not implemented class Cpu0DAGToDAGISel.
The IR DAG to machine instruction DAG transformation was introduced in the previous chapter. Now, let’s check which
IR DAG nodes the file ch3.bc has. ch3.ll is listed as follows,
// ch3.ll
define i32 @main() nounwind uwtable {
%1 = alloca i32, align 4
store i32 0, i32* %1
ret i32 0
}
As shown above, ch3.ll uses the IR DAG nodes store and ret. So the definitions in Cpu0InstrInfo.td below are enough.
ADDiu is used for stack adjustment, which will be needed in the later section “Add Prologue/Epilogue functions” of this
chapter. The IR DAG nodes are defined in the file include/llvm/Target/TargetSelectionDAG.td.
lbdex/chapters/Chapter2/Cpu0InstrInfo.td
//===----------------------------------------------------------------------===//
Add class Cpu0DAGToDAGISel (Cpu0ISelDAGToDAG.cpp) to CMakeLists.txt, and add the following fragment to
Cpu0TargetMachine.cpp,
lbdex/chapters/Chapter3_3/CMakeLists.txt
add_llvm_target(
...
Cpu0ISelDAGToDAG.cpp
Cpu0SEISelDAGToDAG.cpp
...
)
The following code in Cpu0TargetMachine.cpp creates a pass in the instruction selection stage.
lbdex/chapters/Chapter3_3/Cpu0TargetMachine.cpp
#include "Cpu0SEISelDAGToDAG.h"
...
class Cpu0PassConfig : public TargetPassConfig {
public:
...
};
...
lbdex/chapters/Chapter3_3/Cpu0ISelDAGToDAG.h
#ifndef LLVM_LIB_TARGET_CPU0_CPU0ISELDAGTODAG_H
#define LLVM_LIB_TARGET_CPU0_CPU0ISELDAGTODAG_H
#include "Cpu0Config.h"
#include "Cpu0.h"
#include "Cpu0Subtarget.h"
#include "Cpu0TargetMachine.h"
#include "llvm/CodeGen/SelectionDAGISel.h"
#include "llvm/IR/Type.h"
#include "llvm/Support/Debug.h"
//===----------------------------------------------------------------------===//
// Instruction Selector Implementation
//===----------------------------------------------------------------------===//
//===----------------------------------------------------------------------===//
// Cpu0DAGToDAGISel - CPU0 specific code to select CPU0 machine
// instructions for SelectionDAG operations.
//===----------------------------------------------------------------------===//
namespace llvm {
// Pass Name
const char *getPassName() const override {
return "CPU0 DAG->DAG Pattern Instruction Selection";
}
protected:
/// Keep a pointer to the Cpu0Subtarget around so that we can make the right
/// decision when generating code for different targets.
const Cpu0Subtarget *Subtarget;
private:
// Include the pieces autogenerated from the target description.
#include "Cpu0GenDAGISel.inc"
// Complex Pattern.
bool SelectAddr(SDNode *Parent, SDValue N, SDValue &Base, SDValue &Offset);
};
#endif
lbdex/chapters/Chapter3_3/Cpu0ISelDAGToDAG.cpp
#include "Cpu0ISelDAGToDAG.h"
#include "Cpu0.h"
#include "Cpu0MachineFunction.h"
#include "Cpu0RegisterInfo.h"
#include "Cpu0SEISelDAGToDAG.h"
#include "Cpu0TargetMachine.h"
#include "llvm/CodeGen/MachineConstantPool.h"
#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/SelectionDAGISel.h"
#include "llvm/CodeGen/SelectionDAGNodes.h"
#include "llvm/IR/CFG.h"
#include "llvm/IR/GlobalValue.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/Type.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetMachine.h"
using namespace llvm;
//===----------------------------------------------------------------------===//
// Instruction Selector Implementation
//===----------------------------------------------------------------------===//
//===----------------------------------------------------------------------===//
// Cpu0DAGToDAGISel - CPU0 specific code to select CPU0 machine
// instructions for SelectionDAG operations.
//===----------------------------------------------------------------------===//
return Ret;
}
//@SelectAddr {
/// ComplexPattern used on Cpu0InstrInfo
/// Used on Cpu0 Load/Store instructions
bool Cpu0DAGToDAGISel::
SelectAddr(SDNode *Parent, SDValue Addr, SDValue &Base, SDValue &Offset) {
//@SelectAddr }
EVT ValTy = Addr.getValueType();
SDLoc DL(Addr);
Base = Addr;
Offset = CurDAG->getTargetConstant(0, DL, ValTy);
return true;
}
//@Select {
/// Select instructions not customized! Used for
/// expanded, promoted and normal instructions
void Cpu0DAGToDAGISel::Select(SDNode *Node) {
//@Select }
unsigned Opcode = Node->getOpcode();
switch(Opcode) {
default: break;
lbdex/chapters/Chapter3_3/Cpu0SEISelDAGToDAG.h
#ifndef LLVM_LIB_TARGET_CPU0_CPU0SEISELDAGTODAG_H
#define LLVM_LIB_TARGET_CPU0_CPU0SEISELDAGTODAG_H
#include "Cpu0Config.h"
#include "Cpu0ISelDAGToDAG.h"
namespace llvm {
public:
explicit Cpu0SEDAGToDAGISel(Cpu0TargetMachine &TM, CodeGenOpt::Level OL)
: Cpu0DAGToDAGISel(TM, OL) {}
private:
};
#endif
lbdex/chapters/Chapter3_3/Cpu0SEISelDAGToDAG.cpp
#include "Cpu0SEISelDAGToDAG.h"
#include "MCTargetDesc/Cpu0BaseInfo.h"
#include "Cpu0.h"
#include "Cpu0MachineFunction.h"
#include "Cpu0RegisterInfo.h"
#include "llvm/CodeGen/MachineConstantPool.h"
#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/SelectionDAGNodes.h"
#include "llvm/IR/CFG.h"
#include "llvm/IR/GlobalValue.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/Type.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetMachine.h"
using namespace llvm;
//@selectNode
bool Cpu0SEDAGToDAGISel::trySelect(SDNode *Node) {
unsigned Opcode = Node->getOpcode();
SDLoc DL(Node);
///
// Instruction Selection not handled by the auto-generated
// tablegen selection should be handled here.
///
EVT NodeTy = Node->getValueType(0);
unsigned MultOpc;
switch(Opcode) {
default: break;
return false;
}
Function Cpu0DAGToDAGISel::Select() of Cpu0ISelDAGToDAG.cpp selects “OP code DAG nodes”, while
Cpu0DAGToDAGISel::SelectAddr() selects “DATA DAG nodes with addr type”, which are defined in
lbdex/chapters/Chapter2/Cpu0InstrInfo.td
src/include/llvm/Target/TargetSelection.td
src/include/llvm/CodeGen/ValueTypes.td
Build Chapter3_3 and run it; the error message of Chapter3_2 is gone. Chapter3_3 reports a new error message
instead.
The error message can display the DAG node “Cpu0ISD::Ret” because of the following code, added in
Chapter3_1/Cpu0ISelLowering.cpp.
lbdex/chapters/Chapter3_1/Cpu0ISelLowering.cpp
The following is the result of running the Mips backend on ch3.cpp.
JonathantekiiMac:input Jonathan$ ~/llvm/release/cmake_debug_build/Debug/bin/llc
-march=mips -relocation-model=pic -filetype=asm ch3.bc -o -
.text
.abicalls
.section .mdebug.abi32,"",@progbits
.nan legacy
.file "ch3.bc"
.text
.globl main
.align 2
.type main,@function
.set nomicromips
.set nomips16
.ent main
main: # @main
.frame $fp,8,$ra
.mask 0x40000000,-4
.fmask 0x00000000,0
.set noreorder
.set nomacro
.set noat
# BB#0:
addiu $sp, $sp, -8
sw $fp, 4($sp) # 4-byte Folded Spill
move $fp, $sp
sw $zero, 0($fp)
addiu $2, $zero, 0
move $sp, $fp
lw $fp, 4($sp) # 4-byte Folded Reload
jr $ra
addiu $sp, $sp, 8
.set at
.set macro
.set reorder
.end main
$func_end0:
.size main, ($func_end0)-main
As you can see, Mips returns to the caller with “jr $ra”, where $ra is a dedicated register that holds the address of
the caller’s next instruction, and it places the return value in register $2. If we only create DAG nodes directly, we
run into the following two problems.
1. LLVM may allocate any register for the return value, for instance $3, rather than keeping it in $2.
2. LLVM will allocate an arbitrary register for “jr”, since jr needs one operand. For instance, it may generate
“jr $8” rather than “jr $ra”.
If the backend uses “jal sub-routine” and “jr” and always puts the return address in the specific register $ra, the
second problem does not arise. But Mips allows the programmer to use “jal $rx, sub-routine” and “jr $rx”, where $rx
is not $ra. Allowing registers other than $ra gives more flexibility when mixing a high level language such as C
with assembly. File ch8_2_longbranch.cpp below is an example: it uses jr $1 without spilling the $ra register. This
saves a lot of time if it is in a hot function.
lbdex/input/ch8_2_longbranch.cpp
int test_longbranch()
{
volatile int a = 2;
volatile int b = 1;
int result = 0;
if (a < b)
result = 1;
return result;
}
sw $2, 4($fp)
sw $zero, 0($fp)
lw $1, 8($fp)
lw $3, 4($fp)
slt $1, $1, $3
bnez $1, $BB0_3
nop
# BB#1:
addiu $sp, $sp, -8
sw $ra, 0($sp)
lui $1, %hi(($BB0_4)-($BB0_2))
bal $BB0_2
addiu $1, $1, %lo(($BB0_4)-($BB0_2))
$BB0_2:
addu $1, $ra, $1
lw $ra, 0($sp)
jr $1
addiu $sp, $sp, 8
$BB0_3:
sw $2, 0($fp)
$BB0_4:
lw $2, 0($fp)
move $sp, $fp
lw $fp, 12($sp) # 4-byte Folded Reload
jr $ra
addiu $sp, $sp, 16
.set at
.set macro
.set reorder
.end _Z15test_longbranchv
$func_end0:
.size _Z15test_longbranchv, ($func_end0)-_Z15test_longbranchv
lbdex/chapters/Chapter3_4/Cpu0CallingConv.td
lbdex/chapters/Chapter3_4/Cpu0InstrFormats.td
lbdex/chapters/Chapter3_4/Cpu0InstrInfo.td
lbdex/chapters/Chapter3_4/Cpu0ISelLowering.h
/// Cpu0CC - This class provides methods used to analyze formal and call
/// arguments and inquire about calling convention information.
class Cpu0CC {
public:
enum SpecialCallingConvType {
NoSpecialCallingConv
};
/// reservedArgArea - The size of the area the caller reserves for
/// register arguments. This is 16-byte if ABI is O32.
unsigned reservedArgArea() const;
private:
/// Return the type of the register which is used to pass an argument or
/// return a value. This function returns f64 if the argument is an i64
/// value which has been generated as a result of softening an f128 value.
/// Otherwise, it just returns VT.
MVT getRegVT(MVT VT, const Type *OrigTy, const SDNode *CallNode,
bool IsSoftFloat) const;
template<typename Ty>
void analyzeReturn(const SmallVectorImpl<Ty> &RetVals, bool IsSoftFloat,
const SDNode *CallNode, const Type *RetTy) const;
CCState &CCInfo;
CallingConv::ID CallConv;
bool IsO32;
SmallVector<ByValArgInfo, 2> ByValArgs;
};
lbdex/chapters/Chapter3_4/Cpu0ISelLowering.cpp
SDValue
Cpu0TargetLowering::LowerReturn(SDValue Chain,
CallingConv::ID CallConv, bool IsVarArg,
const SmallVectorImpl<ISD::OutputArg> &Outs,
const SmallVectorImpl<SDValue> &OutVals,
const SDLoc &DL, SelectionDAG &DAG) const {
SDValue Flag;
SmallVector<SDValue, 4> RetOps(1, Chain);
if (RVLocs[i].getValVT() != RVLocs[i].getLocVT())
Val = DAG.getNode(ISD::BITCAST, DL, RVLocs[i].getLocVT(), Val);
// Guarantee that all emitted copies are stuck together with flags.
Flag = Chain.getValue(1);
RetOps.push_back(DAG.getRegister(VA.getLocReg(), VA.getLocVT()));
}
if (!Reg)
llvm_unreachable("sret virtual register not created in the entry block");
SDValue Val =
DAG.getCopyFromReg(Chain, DL, Reg, getPointerTy(DAG.getDataLayout()));
unsigned V0 = Cpu0::V0;
template<typename Ty>
void Cpu0TargetLowering::Cpu0CC::
analyzeReturn(const SmallVectorImpl<Ty> &RetVals, bool IsSoftFloat,
const SDNode *CallNode, const Type *RetTy) const {
CCAssignFn *Fn;
Fn = RetCC_Cpu0;
void Cpu0TargetLowering::Cpu0CC::
analyzeCallResult(const SmallVectorImpl<ISD::InputArg> &Ins, bool IsSoftFloat,
const SDNode *CallNode, const Type *RetTy) const {
analyzeReturn(Ins, IsSoftFloat, CallNode, RetTy);
}
void Cpu0TargetLowering::Cpu0CC::
analyzeReturn(const SmallVectorImpl<ISD::OutputArg> &Outs, bool IsSoftFloat,
return VT;
}
lbdex/chapters/Chapter3_4/Cpu0MachineFunction.h
lbdex/chapters/Chapter3_4/Cpu0SEInstrInfo.h
//@expandPostRAPseudo
bool expandPostRAPseudo(MachineInstr &MI) const override;
private:
void expandRetLR(MachineBasicBlock &MBB, MachineBasicBlock::iterator I) const;
lbdex/chapters/Chapter3_4/Cpu0SEInstrInfo.cpp
//@expandPostRAPseudo
/// Expand Pseudo instructions into real backend instructions
bool Cpu0SEInstrInfo::expandPostRAPseudo(MachineInstr &MI) const {
//@expandPostRAPseudo-body
MachineBasicBlock &MBB = *MI.getParent();
switch (MI.getDesc().getOpcode()) {
default:
return false;
case Cpu0::RetLR:
expandRetLR(MBB, MI);
break;
MBB.erase(MI);
return true;
}
Build Chapter3_4 and run it; the error message of Chapter3_3 is gone. The compilation hangs, however, so press
“ctrl+C” to abort, as follows,
...
.text
.section .mdebug.abiO32
.previous
.file "ch3.bc"
^C
It hangs because the Cpu0 backend has not yet handled stack slots for local variables. The instruction
“store i32 0, i32* %1” in the IR above requires Cpu0 to allocate a stack slot and store to it. However, ch3.cpp can
be compiled with the clang option -O2 as follows,
118-165-78-230:input Jonathan$ clang -O2 -target mips-unknown-linux-gnu -c
ch3.cpp -emit-llvm -o ch3.bc
118-165-78-230:input Jonathan$ ~/llvm/test/cmake_debug_build/Debug/bin/llvm-dis
ch3.bc -o -
...
define i32 @main() #0 {
ret i32 0
}
To see how ‘DAG->DAG Pattern Instruction Selection’ works in llc, let’s compile with the llc options
-print-before-all -print-after-all and get the following result. The DAGs before and after the
instruction selection stage are shown as follows,
118-165-78-230:input Jonathan$ clang -O2 -target mips-unknown-linux-gnu -c
ch3.cpp -emit-llvm -o ch3.bc
118-165-78-12:input Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
Debug/bin/llc -march=cpu0 -relocation-model=pic -filetype=asm
-print-before-all -print-after-all ch3.bc -o -
...
*** IR Dump After Module Verifier ***
; Function Attrs: nounwind readnone
define i32 @main() #0 {
ret i32 0
}
...
Initial selection DAG: BB#0 'main:'
SelectionDAG has 5 nodes:
t0: ch = EntryToken
t3: ch,glue = CopyToReg t0, Register:i32 %V0, Constant:i32<0>
t4: ch = Cpu0ISD::Ret t3, Register:i32 %V0, t3:1
...
===== Instruction selection begins: BB#0 ''
Selecting: t4: ch = Cpu0ISD::Ret t3, Register:i32 %V0, t3:1
ISEL: Starting pattern match on root node: t4: ch = Cpu0ISD::Ret t3, Register:i32 %V0, t3:1
.globl main
.p2align 2
.type main,@function
.ent main # @main
main:
.frame $sp,0,$lr
.mask 0x00000000,0
.set noreorder
.set nomacro
# BB#0:
addiu $2, $zero, 0
ret $lr
.set macro
.set reorder
.end main
$func_end0:
.size main, ($func_end0)-main
lbdex/chapters/Chapter2/Cpu0InstrInfo.td
// Small immediates
def : Pat<(i32 immSExt16:$in),
(ADDiu ZERO, imm:$in)>;
lbdex/chapters/Chapter3_4/Cpu0InstrInfo.td
// Return
def Cpu0Ret : SDNode<"Cpu0ISD::Ret", SDTNone,
[SDNPHasChain, SDNPOptInGlue, SDNPVariadic]>;
2. Create the Cpu0ISD::Ret node in LowerReturn() of Cpu0ISelLowering.cpp, which is called when the return
keyword of C is encountered. Remember that LowerReturn() puts the return value in register $2 ($v0).
3. After instruction selection, Cpu0ISD::Ret is replaced by Cpu0::RetLR as below. This effect comes from “def
RetLR” in step 1.
4. Expand Cpu0::RetLR into the instruction Cpu0::RET $lr in the “Post-RA pseudo instruction expansion pass”
stage by the code in Chapter3_4/Cpu0SEInstrInfo.cpp shown above. This stage comes after register allocation,
so we can replace V0 ($r2) with LR ($lr) without any side effect.
5. Print assembly or obj according to the information in the *.inc files generated by TableGen from *.td, as in the
following, at the “Cpu0 Assembly Print” stage.
lbdex/chapters/Chapter2/Cpu0InstrInfo.td
//@JumpFR {
let isBranch=1, isTerminator=1, isBarrier=1, imm16=0, hasDelaySlot = 1,
isIndirectBranch = 1 in
class JumpFR<bits<8> op, string instr_asm, RegisterClass RC>:
FL<op, (outs), (ins RC:$ra),
!strconcat(instr_asm, "\t$ra"), [(brind RC:$ra)], IIBranch> {
let rb = 0;
let imm16 = 0;
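Steps 3 and 4 above can be pictured with a small stand-alone model. The sketch below is not LLVM code: Inst, Opcode and expandPseudos are hypothetical names introduced only for illustration, and a basic block is reduced to a vector of instructions. It only shows the idea that a pseudo opcode left behind by instruction selection is rewritten into a real instruction once register allocation is done.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy opcodes: RetLR is a pseudo, RET is the real Cpu0 return instruction.
enum class Opcode { ADDiu, RetLR, RET };

// Hypothetical, heavily simplified machine instruction.
struct Inst {
  Opcode Opc;
  std::string Operands; // printed operand list, e.g. "$2, $zero, 0"
};

// Rewrite every RetLR pseudo into "ret $lr"; other instructions are kept.
// Returns the number of pseudos that were expanded.
int expandPseudos(std::vector<Inst> &Block) {
  int Expanded = 0;
  for (Inst &I : Block) {
    if (I.Opc == Opcode::RetLR) {
      I = Inst{Opcode::RET, "$lr"};
      ++Expanded;
    }
  }
  return Expanded;
}
```

Running the model a second time over the same block expands nothing, which mirrors the fact that no pseudo remains after the expansion pass.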
3.5.1 Concept
Fig. 3.4: Addressing of a variable a located on the stack. If the stack frame has a variable size, the slot must be
addressed relative to the frame pointer.
lbdex/chapters/Chapter3_5/Cpu0SEFrameLowering.cpp
// Adjust stack.
TII.adjustStackPtr(SP, -StackSize, MBB, MBBI);
if (CSI.size()) {
// Find the instruction past the last instruction that saves a callee-saved
// register to the stack.
for (unsigned i = 0; i < CSI.size(); ++i)
++MBBI;
DebugLoc dl = MBBI->getDebugLoc();
Cpu0ABIInfo ABI = STI.getABI();
unsigned SP = Cpu0::SP;
if (!StackSize)
return;
// Adjust stack.
TII.adjustStackPtr(SP, StackSize, MBB, MBBI);
}
bool
Cpu0SEFrameLowering::hasReservedCallFrame(const MachineFunction &MF) const {
const MachineFrameInfo *MFI = MF.getFrameInfo();
// Reserve call frame if the size of the maximum call frame fits into 16-bit
// immediate field and there are no variable sized objects on the stack.
// Make sure the second register scavenger spill slot can be accessed with one
// instruction.
return isInt<16>(MFI->getMaxCallFrameSize() + getStackAlignment()) &&
!MFI->hasVarSizedObjects();
}
lbdex/chapters/Chapter3_5/Cpu0MachineFunction.h
void createEhDataRegsFI();
int getEhDataRegFI(unsigned Reg) const { return EhDataRegFI[Reg]; }
Now we explain the prologue and epilogue further with example code. For the following llvm IR code of ch3.cpp,
Chapter3_5 of the Cpu0 backend emits the corresponding machine instructions as follows,
LLVM gets the stack size by counting how many virtual registers are assigned to local variables. After that, it calls
emitPrologue().
lbdex/chapters/Chapter3_5/Cpu0SEInstrInfo.h
lbdex/chapters/Chapter3_5/Cpu0SEInstrInfo.cpp
if (isInt<16>(Amount)) {
// addiu sp, sp, amount
BuildMI(MBB, I, DL, get(ADDiu), SP).addReg(SP).addImm(Amount);
}
else { // Expand immediate that doesn't fit in 16-bit.
unsigned Reg = loadImmediate(Amount, MBB, I, DL, nullptr);
BuildMI(MBB, I, DL, get(ADDu), SP).addReg(SP).addReg(Reg, RegState::Kill);
}
}
/// This function generates the sequence of instructions needed to get the
/// result of adding register REG and immediate IMM.
unsigned
Cpu0SEInstrInfo::loadImmediate(int64_t Imm, MachineBasicBlock &MBB,
MachineBasicBlock::iterator II,
const DebugLoc &DL,
unsigned *NewImm) const {
Cpu0AnalyzeImmediate AnalyzeImm;
unsigned Size = 32;
unsigned LUi = Cpu0::LUi;
unsigned ZEROReg = Cpu0::ZERO;
unsigned ATReg = Cpu0::AT;
bool LastInstrIsADDiu = NewImm;
if (Inst->Opc == LUi)
BuildMI(MBB, II, DL, get(LUi), ATReg).addImm(SignExtend64<16>(Inst->ImmOpnd));
else
BuildMI(MBB, II, DL, get(Inst->Opc), ATReg).addReg(ZEROReg)
.addImm(SignExtend64<16>(Inst->ImmOpnd));
if (LastInstrIsADDiu)
*NewImm = Inst->ImmOpnd;
return ATReg;
}
emitPrologue() emits machine instructions to adjust sp (the stack pointer register) for local variables. For our
example, it emits the instruction “addiu $sp, $sp, -8”.
emitEpilogue() emits “addiu $sp, $sp, 8”, where 8 is the stack size.
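The decision made in adjustStackPtr() can be sketched outside LLVM as follows. This is a minimal model under stated assumptions, not the backend code: fitsSigned16 stands in for LLVM's isInt<16>, and stackAdjust, prologue and epilogue are hypothetical helpers that return the instruction text, or an empty string when the amount does not fit in one addiu.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Stand-in for llvm::isInt<16>(): true if X fits in a signed 16-bit field.
bool fitsSigned16(int64_t X) { return X >= -32768 && X <= 32767; }

// Hypothetical helper: text of the single-instruction stack adjustment, or an
// empty string when the amount must first be loaded into a register
// (the loadImmediate() + addu path of the real backend).
std::string stackAdjust(int64_t Amount) {
  if (fitsSigned16(Amount))
    return "addiu $sp, $sp, " + std::to_string(Amount);
  return "";
}

// The prologue subtracts the frame size, the epilogue adds it back.
std::string prologue(uint64_t StackSize) {
  return stackAdjust(-static_cast<int64_t>(StackSize));
}
std::string epilogue(uint64_t StackSize) {
  return stackAdjust(static_cast<int64_t>(StackSize));
}
```

Note the asymmetry at a stack size of 0x8000: -32768 still fits in the signed 16-bit field, while +32768 does not, so only the epilogue needs the longer sequence.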
In the ch3.cpp assembly output above, llc generates “addiu $2, $zero, 0” rather than “ori $2, $zero, 0” because ADDiu
is defined before ORi, as shown in the following, so it takes priority. Of course, if ORi were defined first, the
constant would be translated into an “ori” instruction.
lbdex/chapters/Chapter2/Cpu0InstrInfo.td
// Small immediates
def : Pat<(i32 immSExt16:$in),
(ADDiu ZERO, imm:$in)>;
lbdex/chapters/Chapter3_5/Cpu0InstrInfo.td
// Arbitrary immediates
def : Pat<(i32 imm:$imm),
(ORi (LUi (HI16 imm:$imm)), (LO16 imm:$imm))>;
} // let Predicates = [Ch3_4]
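The priority rule described above can be modelled as a table of patterns scanned in definition order, where the first pattern whose predicate accepts the immediate wins. The sketch is a deliberate simplification and an assumption-laden one: Pattern and selectImm are hypothetical names, the ori entry assumes an unsigned 16-bit predicate, and the selector that TableGen generates also ranks patterns by complexity before definition order matters.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// A toy selection pattern: a predicate on the immediate and the mnemonic of
// the instruction it selects.
struct Pattern {
  bool (*Matches)(int64_t);
  std::string Mnemonic;
};

bool fitsSigned16(int64_t V) { return V >= -32768 && V <= 32767; }
bool fitsUnsigned16(int64_t V) { return V >= 0 && V <= 0xffff; }

// Return the mnemonic of the first pattern in Table that accepts Imm,
// or an empty string if no pattern matches.
std::string selectImm(const std::vector<Pattern> &Table, int64_t Imm) {
  for (const Pattern &P : Table) {
    assert(P.Matches != nullptr && "pattern needs a predicate");
    if (P.Matches(Imm))
      return P.Mnemonic;
  }
  return "";
}
```

With the addiu pattern listed first, the constant 0 selects addiu; swapping the two entries selects ori for the same constant.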
The following code handles the stack slots for local variables.
lbdex/chapters/Chapter3_5/Cpu0RegisterInfo.cpp
unsigned i = 0;
while (!MI.getOperand(i).isFI()) {
++i;
assert(i < MI.getNumOperands() &&
"Instr doesn't have FrameIndex operand!");
}
if (CSI.size()) {
MinCSFI = CSI[0].getFrameIdx();
MaxCSFI = CSI[CSI.size() - 1].getFrameIdx();
}
// The following stack frame objects are always referenced relative to $sp:
// 1. Outgoing arguments.
// 2. Pointer to dynamically allocated stack space.
// 3. Locations for callee-saved registers.
// Everything else is referenced relative to whatever register
// getFrameRegister() returns.
unsigned FrameReg;
FrameReg = Cpu0::SP;
Offset += MI.getOperand(i+1).getImm();
DEBUG(errs() << "Offset : " << Offset << "\n" << "<--------->\n");
// If MI is not a debug value, make sure Offset fits in the 16-bit immediate
// field.
if (!MI.isDebugValue() && !isInt<16>(Offset)) {
assert("(!MI.isDebugValue() && !isInt<16>(Offset))");
}
MI.getOperand(i).ChangeToRegister(FrameReg, false);
MI.getOperand(i+1).ChangeToImmediate(Offset);
}
eliminateFrameIndex() of Cpu0RegisterInfo.cpp is called after the “instruction selection” and “register allocation”
stages. It translates a frame index into the correct offset from the stack pointer by
“spOffset = MF.getFrameInfo()->getObjectOffset(FrameIndex);”.
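The translation can be illustrated with a toy frame. The sketch below is an assumption-based model, not the code of Cpu0RegisterInfo.cpp: ToyFrame and resolveFrameIndex are hypothetical names, object offsets are assumed to be relative to the stack pointer value on function entry (so locals are negative), and the stack size is assumed to be added to rebase the offset onto the adjusted $sp. Only the final addition of the instruction's own immediate appears in the excerpt above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy frame description: ObjectOffset[FI] is the offset of stack object FI
// relative to the stack pointer on function entry (negative for locals).
struct ToyFrame {
  std::vector<int64_t> ObjectOffset;
  int64_t StackSize;
};

// Offset to place in the instruction once the frame-index operand has been
// replaced by $sp: the object offset, rebased to the adjusted $sp, plus the
// immediate already present in the instruction.
int64_t resolveFrameIndex(const ToyFrame &F, unsigned FI, int64_t Imm) {
  assert(FI < F.ObjectOffset.size() && "unknown frame index");
  int64_t SpOffset = F.ObjectOffset[FI];
  return SpOffset + F.StackSize + Imm;
}
```

For a frame of 8 bytes with one 4-byte local at entry offset -4, the model resolves the slot to offset 4 from the adjusted stack pointer.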
lbdex/chapters/Chapter3_5/Cpu0SEFrameLowering.cpp
if (MF.getFrameInfo()->hasCalls())
setAliasRegs(MF, SavedRegs, Cpu0::LR);
return;
}
determineCalleeSaves() of Cpu0SEFrameLowering.cpp, shown above, determines the registers to spill. Once the spill
registers are determined, the function eliminateFrameIndex() saves/restores registers to/from stack slots via the
following code.
lbdex/chapters/Chapter3_5/Cpu0InstrInfo.h
lbdex/chapters/Chapter3_5/Cpu0InstrInfo.cpp
MachineMemOperand *
Cpu0InstrInfo::GetMemOperand(MachineBasicBlock &MBB, int FI,
MachineMemOperand::Flags Flags) const {
lbdex/chapters/Chapter3_5/Cpu0SEInstrInfo.h
lbdex/chapters/Chapter3_5/Cpu0SEInstrInfo.cpp
void Cpu0SEInstrInfo::
storeRegToStack(MachineBasicBlock &MBB, MachineBasicBlock::iterator I,
unsigned SrcReg, bool isKill, int FI,
const TargetRegisterClass *RC, const TargetRegisterInfo *TRI,
int64_t Offset) const {
DebugLoc DL;
MachineMemOperand *MMO = GetMemOperand(MBB, FI, MachineMemOperand::MOStore);
unsigned Opc = 0;
Opc = Cpu0::ST;
assert(Opc && "Register class not handled!");
BuildMI(MBB, I, DL, get(Opc)).addReg(SrcReg, getKillRegState(isKill))
.addFrameIndex(FI).addImm(Offset).addMemOperand(MMO);
}
void Cpu0SEInstrInfo::
loadRegFromStack(MachineBasicBlock &MBB, MachineBasicBlock::iterator I,
unsigned DestReg, int FI, const TargetRegisterClass *RC,
const TargetRegisterInfo *TRI, int64_t Offset) const {
DebugLoc DL;
if (I != MBB.end()) DL = I->getDebugLoc();
MachineMemOperand *MMO = GetMemOperand(MBB, FI, MachineMemOperand::MOLoad);
unsigned Opc = 0;
Opc = Cpu0::LD;
assert(Opc && "Register class not handled!");
BuildMI(MBB, I, DL, get(Opc), DestReg).addFrameIndex(FI).addImm(Offset)
.addMemOperand(MMO);
}
lbdex/Cpu0/Cpu0CallingConv.td
.section .mdebug.abiO32
.previous
.file "ch3.bc"
Target didn't implement TargetInstrInfo::storeRegToStackSlot!
...
Stack dump:
...
Abort trap: 6
At this point, we have translated the very simple main() function consisting of the single statement “return 0;”.
Cpu0AnalyzeImmediate.cpp and the Cpu0InstrInfo.td instructions defined in Chapter3_5, listed in the following, take
care of the 32-bit stack size adjustments.
lbdex/chapters/Chapter3_5/CMakeLists.txt
add_llvm_target(
...
Cpu0AnalyzeImmediate.cpp
...
)
lbdex/chapters/Chapter3_5/Cpu0AnalyzeImmediate.h
#include "Cpu0Config.h"
#if CH >= CH3_5
#include "llvm/ADT/SmallVector.h"
#include "llvm/Support/DataTypes.h"
namespace llvm {
class Cpu0AnalyzeImmediate {
public:
struct Inst {
unsigned Opc, ImmOpnd;
Inst(unsigned Opc, unsigned ImmOpnd);
};
typedef SmallVector<Inst, 7 > InstSeq;
/// Analyze - Get an instruction sequence to load immediate Imm. The last
/// instruction in the sequence must be an ADDiu if LastInstrIsADDiu is
/// true;
const InstSeq &Analyze(uint64_t Imm, unsigned Size, bool LastInstrIsADDiu);
private:
typedef SmallVector<InstSeq, 5> InstSeqLs;
unsigned Size;
unsigned ADDiu, ORi, SHL, LUi;
InstSeq Insts;
};
}
#endif
lbdex/chapters/Chapter3_5/Cpu0AnalyzeImmediate.cpp
#include "Cpu0.h"
#if CH >= CH3_5
#include "llvm/Support/MathExtras.h"
// Do nothing if Imm is 0.
if (!MaskedImm)
return;
// Sign-extend and shift operand of ADDiu and see if it still fits in 16-bit.
int64_t Imm = SignExtend64<16>(Seq[0].ImmOpnd);
int64_t ShiftedImm = (uint64_t)Imm << (Seq[1].ImmOpnd - 16);
if (!isInt<16>(ShiftedImm))
return;
Insts.clear();
Insts.append(ShortestSeq->begin(), ShortestSeq->end());
}
const Cpu0AnalyzeImmediate::InstSeq
&Cpu0AnalyzeImmediate::Analyze(uint64_t Imm, unsigned Size,
bool LastInstrIsADDiu) {
this->Size = Size;
ADDiu = Cpu0::ADDiu;
ORi = Cpu0::ORi;
SHL = Cpu0::SHL;
LUi = Cpu0::LUi;
InstSeqLs SeqLs;
return Insts;
}
#endif
lbdex/chapters/Chapter3_5/Cpu0InstrInfo.h
#include "Cpu0AnalyzeImmediate.h"
lbdex/chapters/Chapter3_5/Cpu0InstrInfo.td
// Unsigned Operand
def uimm16 : Operand<i32> {
let PrintMethod = "printUnsignedImm";
}
// Immediate can be loaded with LUi (32-bit int with lower 16-bit cleared).
def immLow16Zero : PatLeaf<(imm), [{
int64_t Val = N->getSExtValue();
return isInt<32>(Val) && !(Val & 0xffff);
}]>;
}
}
// Arbitrary immediates
def : Pat<(i32 imm:$imm),
(ORi (LUi (HI16 imm:$imm)), (LO16 imm:$imm))>;
} // let Predicates = [Ch3_4]
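The “Arbitrary immediates” pattern splits a 32-bit constant into two 16-bit halves and rebuilds it with lui followed by ori. A minimal sketch of that arithmetic, assuming HI16 and LO16 simply take the upper and lower half-words (hi16, lo16 and luiOri are illustrative names, not functions of the backend):

```cpp
#include <cassert>
#include <cstdint>

// Upper and lower halves of a 32-bit immediate, as the HI16/LO16 transforms
// are assumed to compute them.
uint32_t hi16(uint32_t Imm) { return Imm >> 16; }
uint32_t lo16(uint32_t Imm) { return Imm & 0xffffu; }

// Value produced by "lui rd, Hi" followed by "ori rd, rd, Lo": lui places Hi
// in the upper half-word with the lower half cleared, ori fills in the rest.
uint32_t luiOri(uint32_t Hi, uint32_t Lo) {
  assert(Hi <= 0xffffu && Lo <= 0xffffu && "operands are 16-bit fields");
  return (Hi << 16) | Lo;
}
```

Because ori zero-extends its operand, the lower half can be used as-is; an addiu in its place would sign-extend and require the upper half to be adjusted.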
Cpu0AnalyzeImmediate.cpp is written recursively and its logic is a little complicated. However, recursion is used
throughout front end compiler books, so you should be familiar with it. Instead of tracing the code, we list the stack
sizes and the generated instructions in “Table: Cpu0 stack adjustment instructions before replace addiu and shl with lui
instruction” as follows and “Table: Cpu0 stack adjustment instructions after replace addiu and shl with lui instruction”
after it,
Table 3.3: Cpu0 stack adjustment instructions before replace addiu and shl with lui instruction

stack size range       ex. stack size   Cpu0 Prologue instructions   Cpu0 Epilogue instructions
---------------------  ---------------  ---------------------------  ---------------------------
0 ~ 0x7ff8             0x7ff8           addiu $sp, $sp, -32760;      addiu $sp, $sp, 32760;

0x8000 ~ 0xfff8        0x8000           addiu $sp, $sp, -32768;      addiu $1, $zero, 1;
                                                                     shl $1, $1, 16;
                                                                     addiu $1, $1, -32768;
                                                                     addu $sp, $sp, $1;

0x10000 ~ 0xfffffff8   0x7ffffff8       addiu $1, $zero, 8;          addiu $1, $zero, 8;
                                        shl $1, $1, 28;              shl $1, $1, 28;
                                        addiu $1, $1, 8;             addiu $1, $1, -8;
                                        addu $sp, $sp, $1;           addu $sp, $sp, $1;

0x10000 ~ 0xfffffff8   0x90008000       addiu $1, $zero, -9;         addiu $1, $zero, -28671;
                                        shl $1, $1, 28;              shl $1, $1, 16;
                                        addiu $1, $1, -32768;        addiu $1, $1, -32768;
                                        addu $sp, $sp, $1;           addu $sp, $sp, $1;
Since the Cpu0 stack is 8-byte aligned, stack sizes from 0x7ff9 to 0x7fff cannot occur.
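The alignment argument can be checked with the usual round-up computation. The helper below is a generic sketch (alignUp is an illustrative name, not a function of the Cpu0 backend): any requested size between 0x7ff9 and 0x7fff is rounded up to 0x8000, which is why the first table row ends at 0x7ff8.

```cpp
#include <cassert>
#include <cstdint>

// Round Size up to the next multiple of Align; Align must be a power of two.
uint64_t alignUp(uint64_t Size, uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "power of two expected");
  return (Size + Align - 1) & ~(Align - 1);
}
```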
Assume sp = 0xa0008000 and stack size = 0x90008000, then (0xa0008000 - 0x90008000) => 0x10000000. Verify
with the Cpu0 Prologue instructions as follows,
1. “addiu $1, $zero, -9” => ($1 = 0 + 0xfffffff7) => $1 = 0xfffffff7.
2. “shl $1, $1, 28;” => $1 = 0x70000000.
3. “addiu $1, $1, -32768” => $1 = (0x70000000 + 0xffff8000) => $1 = 0x6fff8000.
4. “addu $sp, $sp, $1” => $sp = (0xa0008000 + 0x6fff8000) => $sp = 0x10000000.
Verify with the Cpu0 Epilogue instructions with sp = 0x10000000 and stack size = 0x90008000 as follows,
1. “addiu $1, $zero, -28671” => ($1 = 0 + 0xffff9001) => $1 = 0xffff9001.
2. “shl $1, $1, 16;” => $1 = 0x90010000.
3. “addiu $1, $1, -32768” => $1 = (0x90010000 + 0xffff8000) => $1 = 0x90008000.
4. “addu $sp, $sp, $1” => $sp = (0x10000000 + 0x90008000) => $sp = 0xa0008000.
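The two verifications above can be replayed mechanically. The sketch models the three instructions as 32-bit wrap-around arithmetic; the function names are illustrative, and the semantics assumed here (addiu sign-extends its 16-bit immediate, shl is a logical left shift) are the ones the hand calculation relies on.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit register arithmetic with wrap-around.
uint32_t addiu(uint32_t Rs, int16_t Imm) {
  return Rs + static_cast<uint32_t>(static_cast<int32_t>(Imm));
}
uint32_t shl(uint32_t Rs, unsigned Amount) {
  assert(Amount < 32 && "shift amount out of range");
  return Rs << Amount;
}
uint32_t addu(uint32_t Rs, uint32_t Rt) { return Rs + Rt; }

// Prologue sequence from the table for stack size 0x90008000.
uint32_t prologueBefore(uint32_t Sp) {
  uint32_t R1 = addiu(0, -9);   // 0xfffffff7
  R1 = shl(R1, 28);             // 0x70000000
  R1 = addiu(R1, -32768);       // 0x6fff8000
  return addu(Sp, R1);
}

// Matching epilogue sequence.
uint32_t epilogueBefore(uint32_t Sp) {
  uint32_t R1 = addiu(0, -28671); // 0xffff9001
  R1 = shl(R1, 16);               // 0x90010000
  R1 = addiu(R1, -32768);         // 0x90008000
  return addu(Sp, R1);
}
```

Applying the epilogue after the prologue returns any starting stack pointer unchanged, since the two constants sum to zero modulo 2^32.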
Cpu0AnalyzeImmediate::GetShortestSeq() calls Cpu0AnalyzeImmediate::ReplaceADDiuSHLWithLUi() to replace addiu and
shl with the single instruction lui. The effect is shown in the following table.
Table 3.4: Cpu0 stack adjustment instructions after replace addiu and shl with lui instruction

stack size range       ex. stack size   Cpu0 Prologue instructions   Cpu0 Epilogue instructions
---------------------  ---------------  ---------------------------  ---------------------------
0x8000 ~ 0xfff8        0x8000           addiu $sp, $sp, -32768;      ori $1, $zero, 32768;
                                                                     addu $sp, $sp, $1;

0x10000 ~ 0xfffffff8   0x7ffffff8       lui $1, 32768;               lui $1, 32767;
                                        addiu $1, $1, 8;             ori $1, $1, 65528;
                                        addu $sp, $sp, $1;           addu $sp, $sp, $1;

0x10000 ~ 0xfffffff8   0x90008000       lui $1, 28671;               lui $1, 36865;
                                        ori $1, $1, 32768;           addiu $1, $1, -32768;
                                        addu $sp, $sp, $1;           addu $sp, $sp, $1;
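The folding condition used by ReplaceADDiuSHLWithLUi() can be sketched in isolation. The check below follows the excerpt of Cpu0AnalyzeImmediate.cpp shown earlier (sign-extend the addiu operand, shift it by the shl amount minus 16, and require the result to fit in a signed 16-bit field). LuiFold and foldAddiuShl are hypothetical names, and the guard on the shift amount is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Result of the check: whether "addiu r, $zero, Imm; shl r, r, Shamt" can be
// folded into one lui, and the 16-bit lui operand if so.
struct LuiFold {
  bool Ok;
  uint16_t LuiImm;
};

LuiFold foldAddiuShl(int16_t Imm, unsigned Shamt) {
  // Assumption: the shift must at least reach the upper half-word.
  if (Shamt < 16 || Shamt > 31)
    return {false, 0};
  // Multiply instead of shifting so that negative values stay well defined.
  int64_t Shifted = static_cast<int64_t>(Imm) * (int64_t{1} << (Shamt - 16));
  if (Shifted < -32768 || Shifted > 32767)
    return {false, 0};
  return {true, static_cast<uint16_t>(Shifted & 0xffff)};
}
```

For example, “addiu $1, $zero, -28671; shl $1, $1, 16” folds to a lui with operand 36865, whereas an addiu of 8 shifted by 28 does not fold, because 8 shifted left by 12 no longer fits in 16 signed bits.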
Assume sp = 0xa0008000 and stack size = 0x90008000, then (0xa0008000 - 0x90008000) => 0x10000000. Verify
with the Cpu0 Prologue instructions as follows,
1. “lui $1, 28671” => $1 = 0x6fff0000.
2. “ori $1, $1, 32768” => $1 = (0x6fff0000 + 0x00008000) => $1 = 0x6fff8000.
3. “addu $sp, $sp, $1” => $sp = (0xa0008000 + 0x6fff8000) => $sp = 0x10000000.
Verify with the Cpu0 Epilogue instructions with sp = 0x10000000 and stack size = 0x90008000 as follows,
1. “lui $1, 36865” => $1 = 0x90010000.
2. “addiu $1, $1, -32768” => $1 = (0x90010000 + 0xffff8000) => $1 = 0x90008000.
3. “addu $sp, $sp, $1” => $sp = (0x10000000 + 0x90008000) => $sp = 0xa0008000.
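The shorter sequences can be replayed in the same way. As before, this is a sketch with illustrative function names; it assumes lui loads its 16-bit operand into the upper half-word and clears the lower one, ori zero-extends its operand, and addiu sign-extends it.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit models of the four instructions used by the sequences.
uint32_t lui(uint16_t Imm) { return static_cast<uint32_t>(Imm) << 16; }
uint32_t ori(uint32_t Rs, uint16_t Imm) { return Rs | Imm; }
uint32_t addiu(uint32_t Rs, int16_t Imm) {
  return Rs + static_cast<uint32_t>(static_cast<int32_t>(Imm));
}
uint32_t addu(uint32_t Rs, uint32_t Rt) { return Rs + Rt; }

// Prologue sequence for stack size 0x90008000 after the lui replacement.
uint32_t prologueAfter(uint32_t Sp) {
  uint32_t R1 = lui(28671);  // 0x6fff0000
  R1 = ori(R1, 32768);       // 0x6fff8000
  return addu(Sp, R1);
}

// Matching epilogue sequence.
uint32_t epilogueAfter(uint32_t Sp) {
  uint32_t R1 = lui(36865);  // 0x90010000
  R1 = addiu(R1, -32768);    // 0x90008000
  return addu(Sp, R1);
}
```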
File ch3_largeframe.cpp contains the large frame test.
Running Chapter3_5 on ch3_largeframe.cpp gives the following result.
lbdex/input/ch3_largeframe.cpp
int test_largegframe() {
int a[469753856];
return 0;
}
.type _Z16test_largegframev,@function
.ent _Z16test_largegframev # @_Z16test_largegframev
_Z16test_largegframev:
.frame $fp,1879015424,$lr
.mask 0x00000000,0
.set noreorder
.set nomacro
.set noat
# BB#0:
lui $1, 36865
addiu $1, $1, -32768
addu $sp, $sp, $1
addiu $2, $zero, 0
lui $1, 28672
addiu $1, $1, -32768
addu $sp, $sp, $1
ret $lr
.set at
.set macro
.set reorder
.end _Z16test_largegframev
$func_end0:
.size _Z16test_largegframev, ($func_end0)-_Z16test_largegframev
From the above or from a compiler book, you can see that all the OP codes are internal nodes of the DAG, and operands
are the leaves of the DAG. To develop your backend, you can copy the related data operand DAG nodes from other
backends, since the IR data nodes are handled by all backends. You can understand some of the data DAG nodes
through Cpu0InstrInfo.td and find them with the command grep -R "<datadag>" `find src/include/llvm`, spending
a little more time to think or guess about them. Some data DAG nodes we know well, some we know a little, and
some remain unknown, but that is OK for us. Some of the data DAG nodes we understand and have encountered so far
are listed as follows,
include/llvm/Target/TargetSelectionDAG.td
// PatLeaf's are pattern fragments that have no operands. This is just a helper
// to define immediates and other common things concisely.
class PatLeaf<dag frag, code pred = [{}], SDNodeXForm xform = NOOP_SDNodeXForm>
: PatFrag<(ops), frag, pred, xform>;
lbdex/chapters/Chapter3_5/Cpu0InstrInfo.td
// Signed Operand
def simm16 : Operand<i32> {
let DecoderMethod= "DecodeSimm16";
}
// Unsigned Operand
def uimm16 : Operand<i32> {
let PrintMethod = "printUnsignedImm";
}
// Address operand
def mem : Operand<iPTR> {
let PrintMethod = "printMemOperand";
let MIOperandInfo = (ops CPURegs, simm16);
let EncoderMethod = "getMemEncoding";
// Immediate can be loaded with LUi (32-bit int with lower 16-bit cleared).
def immLow16Zero : PatLeaf<(imm), [{
int64_t Val = N->getSExtValue();
return isInt<32>(Val) && !(Val & 0xffff);
}]>;
//===----------------------------------------------------------------------===//
// Pattern fragment for load/store
//===----------------------------------------------------------------------===//
As mentioned in the sub-section “instruction selection” of the last chapter, immSExt16 is a data leaf DAG node that
returns true if its value is in the range of a signed 16-bit integer. load_a, store_a and others are similar, but they
also check alignment.
mem is explained in chapter3_2 for printing operands; addr is explained in chapter3_3 for data DAG selection.
simm16 and the other operands inherit from Operand<i32> because Cpu0 is a 32-bit machine. Such an operand may
exceed 16 bits, so the immSExt16 pattern leaf is used to restrict it, as in the ADDiu example mentioned in the last
chapter. PatLeaf immZExt16, immLow16Zero and ImmLeaf immZExt5 are similar to immSExt16.
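The range checks behind these pattern leaves are simple enough to state as plain predicates. In the sketch below, immLow16Zero follows the definition listed above; the bodies of immSExt16 and immZExt16 are assumptions based on their names and on the description in the text, written without any LLVM dependency.

```cpp
#include <cassert>
#include <cstdint>

// immSExt16: the value fits in a signed 16-bit field.
bool immSExt16(int64_t V) { return V >= -32768 && V <= 32767; }

// immZExt16 (assumed): the value fits in an unsigned 16-bit field.
bool immZExt16(int64_t V) { return V >= 0 && V <= 0xffff; }

// immLow16Zero: a 32-bit value whose lower 16 bits are clear, so that a
// single lui can load it.
bool immLow16Zero(int64_t V) {
  bool FitsInt32 = V >= INT32_MIN && V <= INT32_MAX;
  return FitsInt32 && (V & 0xffff) == 0;
}
```

The value 32768 shows why both signed and unsigned leaves are needed: it is rejected by immSExt16 but accepted by immZExt16.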
The functions for the llvm backend stages are summarized in the following table.
...
CPU0 DAG->DAG Pattern Instruction Selection
Initial selection DAG
Optimized lowered selection DAG
Type-legalized selection DAG
Optimized type-legalized selection DAG
Legalized selection DAG
Optimized legalized selection DAG
Instruction selection
Selected selection DAG
Scheduling
...
Greedy Register Allocator
...
Prologue/Epilogue Insertion & Frame Finalization
...
Post-RA pseudo instruction expansion pass
...
Cpu0 Assembly Printer
Instruction selection
• Cpu0DAGToDAGISel::Select
We added a pass in the Instruction Selection stage in the section “Add Cpu0DAGToDAGISel class”. You can embed your
code into other passes in the same way. Please check CodeGen/Passes.h for more information. Remember that passes
are called per function unit, as llc -debug-pass=Structure indicates.
We have finished a simple compiler for cpu0 which supports only the 8 instructions ld, st, addiu, ori, lui, addu, shl
and ret.
We are satisfied with this result. But you may think “After so much code, we get just these 8 instructions!”.
The point is that we have created a framework for the Cpu0 target machine (please look back at the llvm backend
structure class inheritance tree early in this chapter). Until now, we have over 3000 lines of source code with
comments, which includes the files *.cpp, *.h, *.td, CMakeLists.txt and LLVMBuild.txt. It can be counted with the
command wc `find dir -name *.cpp` for the files *.cpp, *.h, *.td, *.txt. The LLVM front end tutorial has 700 lines of
source code without comments in total. Don’t feel down about this result. In reality, writing a backend warms up
slowly but then runs fast. Clang has over 500,000 lines of source code with comments in the clang/lib directory,
which includes C++ and Obj C support. The Mips backend of llvm 3.1 has only 15,000 lines with comments. Even the
complicated X86 CPU, which is CISC outside and RISC inside (micro instructions), has only 45,000 lines in llvm 3.1
with comments. In the next chapter, we will show you that adding support for a new instruction is as easy as 123.
FOUR
• Arithmetic
– +, -, *, <<, and >>
– Display llvm IR nodes with Graphviz
– Operator % and /
* The DAG of %
* Arm solution
* Mips solution
* Full support %, and /
– Rotate instructions
• Logic
• Summary
This chapter first adds support for more Cpu0 arithmetic instructions. The section Display llvm IR nodes with Graphviz
shows the steps of DAG optimization and the corresponding llc display options. The DAG translations in these
optimization steps can be displayed with the graphic tool Graphviz, which presents useful information in a graphic
view. Support for logic instructions comes after the arithmetic section. Although the llvm backend handles only the
IR, we obtain the IR from the corresponding C operators with designed C example code. By compiling C code, readers
can see exactly which C statements are handled by the code added in each chapter. Instead of focusing on the class
relationships of the backend structure as in the last chapter, readers should focus on the mapping between C operators
and llvm IR, and on how to define the mapping between IR and instructions in td files. The HILO and C0 register
classes are defined in this chapter. From this chapter, readers will learn how to handle register classes other than the
general purpose register class, and why they are needed.
4.1 Arithmetic
lbdex/chapters/Chapter4_1/Cpu0Subtarget.cpp
...
EnableOverflow = EnableOverflowOpt;
...
}
lbdex/chapters/Chapter4_1/Cpu0InstrInfo.td
// DivRem(u) nodes
def Cpu0DivRem : SDNode<"Cpu0ISD::DivRem", SDT_Cpu0DivRem,
[SDNPOutGlue]>;
def Cpu0DivRemU : SDNode<"Cpu0ISD::DivRemU", SDT_Cpu0DivRem,
[SDNPOutGlue]>;
// Move to Lo/Hi
class MoveToLOHI<bits<8> op, string instr_asm, RegisterClass RC,
list<Register> DefRegs>:
FA<op, (outs), (ins RC:$ra),
!strconcat(instr_asm, "\t$ra"), [], IIHiLo> {
let rb = 0;
let rc = 0;
let shamt = 0;
let Defs = DefRegs;
let hasSideEffects = 0;
}
// Move to C0 Register
class MoveToC0<bits<8> op, string instr_asm, RegisterClass RC>:
FA<op, (outs C0Regs:$ra), (ins RC:$rb),
!strconcat(instr_asm, "\t$ra, $rb"), [], IIAlu> {
let rc = 0;
let shamt = 0;
let hasSideEffects = 0;
}
lbdex/chapters/Chapter4_1/Cpu0ISelLowering.h
lbdex/chapters/Chapter4_1/Cpu0ISelLowering.cpp
setTargetDAGCombine(ISD::SDIVREM);
setTargetDAGCombine(ISD::UDIVREM);
...
}
...
EVT Ty = N->getValueType(0);
unsigned LO = Cpu0::LO;
unsigned HI = Cpu0::HI;
unsigned Opc = N->getOpcode() == ISD::SDIVREM ? Cpu0ISD::DivRem :
Cpu0ISD::DivRemU;
SDLoc DL(N);
// insert MFLO
if (N->hasAnyUseOfValue(0)) {
SDValue CopyFromLo = DAG.getCopyFromReg(InChain, DL, LO, Ty,
InGlue);
DAG.ReplaceAllUsesOfValueWith(SDValue(N, 0), CopyFromLo);
InChain = CopyFromLo.getValue(1);
InGlue = CopyFromLo.getValue(2);
}
// insert MFHI
if (N->hasAnyUseOfValue(1)) {
SDValue CopyFromHi = DAG.getCopyFromReg(InChain, DL,
HI, Ty, InGlue);
return SDValue();
}
switch (Opc) {
default: break;
case ISD::SDIVREM:
case ISD::UDIVREM:
return performDivRemCombine(N, DAG, DCI, Subtarget);
}
return SDValue();
}
lbdex/chapters/Chapter4_1/Cpu0RegisterInfo.td
}
...
lbdex/chapters/Chapter4_1/Cpu0Schedule.td
]>;
lbdex/chapters/Chapter4_1/Cpu0SEISelDAGToDAG.h
lbdex/chapters/Chapter4_1/Cpu0SEISelDAGToDAG.cpp
if (HasLo) {
Lo = CurDAG->getMachineNode(Cpu0::MFLO, DL,
Ty, MVT::Glue, InFlag);
InFlag = SDValue(Lo, 1);
}
if (HasHi)
Hi = CurDAG->getMachineNode(Cpu0::MFHI, DL,
Ty, InFlag);
///
// Instruction Selection not handled by the auto-generated
// tablegen selection should be handled here.
///
EVT NodeTy = Node->getValueType(0);
unsigned MultOpc;
switch(Opcode) {
default: break;
case ISD::MULHS:
case ISD::MULHU: {
MultOpc = (Opcode == ISD::MULHU ? Cpu0::MULTu : Cpu0::MULT);
auto LoHi = selectMULT(Node, MultOpc, DL, NodeTy, false, true);
ReplaceNode(Node, LoHi.second);
return true;
case ISD::Constant: {
const ConstantSDNode *CN = dyn_cast<ConstantSDNode>(Node);
unsigned Size = CN->getValueSizeInBits(0);
if (Size == 32)
break;
return true;
}
}
...
}
lbdex/chapters/Chapter4_1/Cpu0SEInstrInfo.h
lbdex/chapters/Chapter4_1/Cpu0SEInstrInfo.cpp
if (DestReg)
MIB.addReg(DestReg, RegState::Define);
if (ZeroReg)
MIB.addReg(ZeroReg);
if (SrcReg)
MIB.addReg(SrcReg, getKillRegState(KillSrc));
}
The ADDu, ADD, SUBu, SUB and MUL defined in Chapter4_1/Cpu0InstrInfo.td are for operators +, -, *. SHL
(defined before) and SHLV are for <<. SRA, SRAV, SHR and SHRV are for >>.
In a RISC CPU such as Mips, the multiply/divide function unit and the add/sub/logic unit are built from two different
hardware circuits, and their data paths are separate. Cpu0 is the same, so these two function units can execute at the
same time (instruction level parallelism). See 1 for instruction itineraries.
Chapter4_1/ can handle the +, -, *, <<, and >> operators of the C language. The corresponding llvm IR instructions are
add, sub, mul, shl and ashr. The ‘ashr’ instruction (arithmetic shift right) returns the first operand shifted to the right
by a specified number of bits with sign extension. In brief, we call ashr “shift with sign extension fill”.
Note: ashr
Example: <result> = ashr i32 4, 1 ; yields {i32}:result = 2
<result> = ashr i8 -2, 1 ; yields {i8}:result = -1
<result> = ashr i32 1, 32 ; undefined
The semantics of the C operator >> for a negative operand are implementation-defined. Most compilers translate it into
“shift with sign extension fill”, which is what the Mips sra instruction does. The following is the Microsoft web site’s explanation,
In addition to ashr, llvm has the other instruction lshr, “shift with zero fill” (Mips implements lshr with the instruction
srl), with the following meaning.
Note: lshr
Example: <result> = lshr i8 -2, 1 ; yields {i8}:result = 0x7F
In llvm, the IR node sra is defined for the ashr IR instruction, and the node srl is defined for the lshr instruction (we don’t
know why it doesn’t use ashr and lshr as the IR node names directly). The summary is in Table: C operator >> implementation.
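The difference between the two right shifts can be checked with a small C++ sketch. The helper names lshr8 and ashr8 are made up for this illustration and are not part of llvm or lbdex; they only mimic the i8 examples in the notes above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: logical shift right on 8 bits (zero fill),
// mimicking llvm "lshr i8".
uint8_t lshr8(uint8_t v, unsigned n) {
  assert(n < 8);  // shifting by the bit width or more is not meaningful
  return static_cast<uint8_t>(v >> n);
}

// Hypothetical helper: arithmetic shift right on 8 bits (sign fill),
// mimicking llvm "ashr i8". The work is done on the unsigned bit pattern
// so the result does not depend on how >> treats negative values.
int8_t ashr8(int8_t v, unsigned n) {
  assert(n < 8);
  uint8_t bits = static_cast<uint8_t>(v);
  uint8_t shifted = static_cast<uint8_t>(bits >> n);
  if (v < 0 && n > 0)
    shifted |= static_cast<uint8_t>(0xFFu << (8 - n));  // refill with sign bits
  return static_cast<int8_t>(shifted);
}
```

Applied to the bit pattern of -2 (0xFE), ashr8 keeps the sign and gives -1, while lshr8 fills with zero and gives 0x7F.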
1 http://llvm.org/docs/doxygen/html/structllvm_1_1InstrStage.html
lbdex/input/ch4_math.ll
The example input ch4_1_math.cpp below is a C file that includes the +, -, *, <<, and >> operators. Clang generates
the corresponding llvm IR instructions add, sub, mul, shl and ashr from it, as Chapter 3 indicated.
lbdex/input/ch4_1_math.cpp
int test_math()
{
int a = 5;
int b = 2;
unsigned int a1 = -5;
int c, d, e, f, g, h, i;
unsigned int f1, g1, h1, i1;
c = a + b; // c = 7
d = a - b; // d = 3
e = a * b; // e = 10
f = (a << 2); // f = 20
f1 = (a1 << 1); // f1 = 0xfffffff6 = -10
g = (a >> 2); // g = 1
g1 = (a1 >> 30); // g1 = 0x03 = 3
h = (1 << a); // h = 0x20 = 32
h1 = (1 << b); // h1 = 0x04
i = (0x80 >> a); // i = 0x04
i1 = (b >> a); // i1 = 0x0
return (c+d+e+f+int(f1)+g+(int)g1+h+(int)h1+i+(int)i1);
// 7+3+10+20-10+1+3+32+4+4+0 = 74
}
The Cpu0 instructions add and sub trigger an overflow exception, while addu and subu truncate the overflowed value directly.
Compiling ch4_1_addsuboverflow.cpp with llc -cpu0-enable-overflow=true generates add and sub instructions
as follows,
lbdex/input/ch4_1_addsuboverflow.cpp
#include "debug.h"
int test_add_overflow()
{
int a = 0x70000000;
int b = 0x20000000;
int c = 0;
c = a + b;
return 0;
}
int test_sub_overflow()
{
int a = -0x70000000;
int b = 0x20000000;
int c = 0;
c = a - b;
return 0;
}
On modern CPUs, programmers are used to truncating overflow instructions for the C operators + and -. Nevertheless,
the option -cpu0-enable-overflow=true gives programmers the chance to compile a program with overflow exceptions
enabled. This option is usually used for debugging. Compiling with it can help to identify a bug and fix it
early.
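As a rough illustration of the two behaviors, the following C++ sketch models them in software. The function names are hypothetical and the code is not taken from lbdex; it only shows which inputs an overflow-checking add or sub would trap on, and what a truncating addu would produce instead.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical model of "add": true if the signed 32-bit sum overflows,
// which is the case where the instruction would raise an exception.
bool addOverflows(int32_t a, int32_t b) {
  int64_t wide = static_cast<int64_t>(a) + b;
  return wide > std::numeric_limits<int32_t>::max() ||
         wide < std::numeric_limits<int32_t>::min();
}

// Hypothetical model of "sub": same check for the difference.
bool subOverflows(int32_t a, int32_t b) {
  int64_t wide = static_cast<int64_t>(a) - b;
  return wide > std::numeric_limits<int32_t>::max() ||
         wide < std::numeric_limits<int32_t>::min();
}

// Hypothetical model of "addu": the sum simply wraps to 32 bits.
int32_t adduWrap(int32_t a, int32_t b) {
  return static_cast<int32_t>(static_cast<uint32_t>(a) +
                              static_cast<uint32_t>(b));
}

// The operands of ch4_1_addsuboverflow.cpp both overflow.
void checkOverflowExamples() {
  assert(addOverflows(0x70000000, 0x20000000));
  assert(subOverflows(-0x70000000, 0x20000000));
  assert(!addOverflows(5, 2));
}
```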
The previous section displayed the DAG translation process as text on the terminal with the option llc -debug. llc also
supports graphic display. The section Install other tools on iMac covers the download and installation of the Graphviz
tool. This section introduces the llc graphic display with Graphviz. A graphic display is easier to read than text in a
terminal. It is not a must-have, but it helps a lot, especially when you are tired of tracking the DAG translation
process. The llc graphic support options, from the sub-section “SelectionDAG Instruction Selection Process” of the web
document “The LLVM Target-Independent Code Generator” 5, are as follows,
By tracking llc -debug, you can see the steps of DAG translation as follows,
Initial selection DAG
Optimized lowered selection DAG
Type-legalized selection DAG
Optimized type-legalized selection DAG
Legalized selection DAG
Optimized legalized selection DAG
Instruction selection
Selected selection DAG
Scheduling
...
Let’s run llc with option -view-dag-combine1-dags, and open the output result with Graphviz as follows,
118-165-12-177:input Jonathan$ /Users/Jonathan/llvm/test/
cmake_debug_build/Debug/bin/llc -view-dag-combine1-dags -march=cpu0
-relocation-model=pic -filetype=asm ch4_1_mult.bc -o ch4_1_mult.cpu0.s
Writing '/tmp/llvm_84ibpm/dag.main.dot'... done.
118-165-12-177:input Jonathan$ Graphviz /tmp/llvm_84ibpm/dag.main.dot
Note: llc Graphviz options and the corresponding stages of DAG translation
-view-dag-combine1-dags: Initial selection DAG
-view-legalize-dags: Optimized type-legalized selection DAG
-view-dag-combine2-dags: Legalized selection DAG
-view-isel-dags: Optimized legalized selection DAG
-view-sched-dags: Selected selection DAG
The option -view-isel-dags is important and often used by llvm backend writers because it shows the DAGs before
instruction selection. To write the pattern match for an instruction in the target description file .td, a backend
programmer needs to know what the DAG nodes are for a given C operator.
The DAG of %
The example input code ch4_1_mult.cpp, which contains the C operator “%”, and its corresponding llvm IR are as follows,
lbdex/input/ch4_1_mult.cpp
int test_mult()
{
int b = 11;
b = (b+1)%12;
return b;
}
...
define i32 @_Z8test_multv() #0 {
%b = alloca i32, align 4
store i32 11, i32* %b, align 4
%1 = load i32* %b, align 4
%2 = add nsw i32 %1, 1
%3 = srem i32 %2, 12
store i32 %3, i32* %b, align 4
%4 = load i32* %b, align 4
ret i32 %4
}
LLVM srem is the IR corresponding to “%”; see 6 for reference. Copy the reference as follows,
Running Chapter3_5/ with the input file ch4_1_mult.bc and the option llc -view-isel-dags gives the following error
message and the llvm DAGs of Fig. 4.2 below.
6 http://llvm.org/docs/LangRef.html#srem-instruction
LLVM replaces the srem divide operation with a multiply operation in DAG optimization because a DIV operation costs
more time than a MUL. The example code “int b = 11; b=(b+1)%12;” is translated into the DAGs of Fig. 4.2. The generated
DAGs can be verified and explained by calculating the value in each node. Since 0xC*0x2AAAAAAB=0x2,00000004,
(mulhs 0xC, 0x2AAAAAAB) gets the signed multiply high word (32 bits). Multiplying two operands of 1 word size may
generate a result of 2 words size, here (0x2, 0x00000004). The high word of the result is 0x2 in this case.
The final result (sub 12, 12) is 0, which matches the statement (11+1)%12.
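The calculation above can be replayed in plain C++. This is a minimal sketch, not code from llvm: mulhs here is a hypothetical helper that returns the high word of a signed 32x32 multiply, and the shift-and-add correction after it is the usual form of this divide-by-constant technique, which is assumed here to correspond to the nodes of Fig. 4.2. The sketch also assumes >> on a negative signed value is an arithmetic shift, as it is on common compilers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: high 32 bits of the signed 64-bit product,
// the value a mulhs node yields.
int32_t mulhs(int32_t a, int32_t b) {
  return static_cast<int32_t>((static_cast<int64_t>(a) * b) >> 32);
}

// x % 12 without a divide instruction. 0x2AAAAAAB is the constant from
// the text; the shift by 1 and the sign-bit add turn the high word into
// the quotient x / 12.
int32_t srem12(int32_t x) {
  int32_t hi = mulhs(x, 0x2AAAAAAB);
  int32_t q = (hi >> 1) + static_cast<int32_t>(static_cast<uint32_t>(hi) >> 31);
  return x - q * 12;  // remainder = x - quotient * 12
}

// The worked example from the text: (11+1)%12.
void checkWorkedExample() {
  assert(mulhs(0xC, 0x2AAAAAAB) == 0x2);  // high word of 0x2,00000004
  assert(srem12(11 + 1) == 0);            // sub 12, 12
}
```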
Arm solution
To run with ARM solution, change Cpu0InstrInfo.td and Cpu0ISelDAGToDAG.cpp from Chapter4_1/ as follows,
lbdex/chapters/Chapter4_1/Cpu0InstrInfo.td
lbdex/chapters/Chapter4_1/Cpu0ISelDAGToDAG.cpp
#if 0
/// Select multiply instructions.
std::pair<SDNode*, SDNode*>
Cpu0DAGToDAGISel::SelectMULT(SDNode *N, unsigned Opc, SDLoc DL, EVT Ty,
bool HasLo, bool HasHi) {
SDNode *Lo = 0, *Hi = 0;
SDNode *Mul = CurDAG->getMachineNode(Opc, DL, MVT::Glue, N->getOperand(0),
N->getOperand(1));
SDValue InFlag = SDValue(Mul, 0);
if (HasLo) {
Lo = CurDAG->getMachineNode(Cpu0::MFLO, DL,
Ty, MVT::Glue, InFlag);
InFlag = SDValue(Lo, 1);
}
if (HasHi)
Hi = CurDAG->getMachineNode(Cpu0::MFHI, DL,
Ty, InFlag);
default: break;
#if 0
case ISD::MULHS:
case ISD::MULHU: {
MultOpc = (Opcode == ISD::MULHU ? Cpu0::MULTu : Cpu0::MULT);
return SelectMULT(Node, MultOpc, DL, NodeTy, false, true).second;
}
#endif
...
}
Let’s run the above changes with ch4_1_mult.cpp and the llc -view-sched-dags option to get Fig. 4.3. The instruction
SMMUL gets the high word of the multiply result.
The following is the result of running the above changes with ch4_1_mult.bc.
The other instruction UMMUL and the llvm IR mulhu handle operator % for the unsigned int type. You can check this by
uncommenting the “unsigned int b = 11;” in ch4_1_mult.cpp.
Using the SMMUL instruction to get the high word of the multiplication result is the approach adopted in ARM.
Mips solution
Mips uses the MULT instruction and saves the high and low parts of the result to the registers HI and LO, respectively.
After that, it uses mfhi/mflo to move register HI/LO to a general purpose register. ARM SMMUL is fast if you only
need the HI part of the result (it ignores the LO part of the operation). ARM also provides SMULL (signed multiply
long) to get the whole 64-bit result. If you need only the LO part of the result, you can use the Cpu0 MUL instruction.
Chapter4_1/ is implemented in the Mips MULT style. We choose it as the implementation of this book in order to add
as few instructions as possible. This approach makes Cpu0 better both as a tutorial architecture for school teaching
and as learning material for engineers in compiler design. The MULT, MULTu, MFHI, MFLO, MTHI and MTLO
instructions are added in Chapter4_1/Cpu0InstrInfo.td; the HI and LO registers are in Chapter4_1/Cpu0RegisterInfo.td
include/llvm/Target/TargetSelectionDAG.td
Except for the custom type, llvm IR operations of type expand and promote call Cpu0DAGToDAGISel::Select()
during the instruction selection of DAG translation. SelectMULT(), which is called by Select(), returns the HI part of
the multiplication result in the HI register for the IR operations mulhs and mulhu. After that, the MFHI instruction
moves the HI register to the Cpu0 field “a” register, $ra. The MFHI instruction is in FL format and uses only the Cpu0
field “a” register, so we set $rb and imm16 to 0. Fig. 4.4 and ch4_1_mult.cpu0.s are the results of compiling ch4_1_mult.bc.
Attentive readers may notice that llvm uses “multiplication” instead of “div” to get the “%” result only because
our example uses a constant as the divisor, “(b+1)%12”. If the programmer uses a variable as the divisor, as in
“(b+1)%a”, what happens? The answer is that our code fails to handle it.
Cpu0, just like Mips, uses the LO and HI registers to hold the “quotient” and “remainder”, and uses the instructions
“mflo” and “mfhi” to get the result from the LO or HI register. With this solution, “c = a / b” can be done
by “div a, b” and “mflo c”; “c = a % b” can be done by “div a, b” and “mfhi c”.
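The register protocol just described can be modeled in a few lines of C++. This is only a sketch for illustration: the HiLo struct and the function names are hypothetical and do not appear in lbdex.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the HI/LO register pair.
struct HiLo {
  int32_t lo = 0;  // holds the quotient after a div
  int32_t hi = 0;  // holds the remainder after a div
};

// "div a, b": writes the quotient to LO and the remainder to HI.
void divInstr(HiLo &regs, int32_t a, int32_t b) {
  assert(b != 0);
  regs.lo = a / b;
  regs.hi = a % b;
}

// "mflo" and "mfhi": move LO or HI to a general purpose register.
int32_t mflo(const HiLo &regs) { return regs.lo; }
int32_t mfhi(const HiLo &regs) { return regs.hi; }

// c = a / b  ->  div a, b ; mflo c
int32_t modelQuotient(int32_t a, int32_t b) {
  HiLo regs;
  divInstr(regs, a, b);
  return mflo(regs);
}

// c = a % b  ->  div a, b ; mfhi c
int32_t modelRemainder(int32_t a, int32_t b) {
  HiLo regs;
  divInstr(regs, a, b);
  return mfhi(regs);
}
```

One div instruction therefore serves both operators; only the following move instruction differs.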
To support the operators “%” and “/”, the following code is added in Chapter4_1.
1. SDIV, UDIV and their referenced classes and nodes in Cpu0InstrInfo.td.
2. The copyPhysReg() declared and defined in Cpu0InstrInfo.h and Cpu0InstrInfo.cpp.
3. The setOperationAction(ISD::SDIV, MVT::i32, Expand), ..., setTargetDAGCombine(ISD::SDIVREM)
in the constructor in Cpu0ISelLowering.cpp; performDivRemCombine() and PerformDAGCombine() in
Cpu0ISelLowering.cpp.
The IR instruction sdiv stands for signed div while udiv stands for unsigned div.
lbdex/input/ch4_1_mult2.cpp
int test_mult()
{
int b = 11;
int a = 12;
b = (b+1)%a;
return b;
}
If we run with ch4_1_mult2.cpp, “div” is still not generated for the operator “%”. llvm still uses “multiplication” instead
of “div” in ch4_1_mult2.cpp because it applies “Constant Propagation Optimization” here. ch4_1_mod.cpp
does get “div” for “%” since its volatile variable makes llvm “Constant Propagation Optimization” ineffective.
lbdex/input/ch4_1_mod.cpp
int test_mod()
{
int b = 11;
volatile int a = 12;
b = (b+1)%a;
return b;
}
To explain how “div” works, let’s run ch4_1_mod.cpp with the debug option as follows,
Table 4.4: Functions that handle the DAG translation and pattern match for C operator %
Translation                           Done by
srem => sdivrem                       setOperationAction(ISD::SREM, MVT::i32, Expand);
sdivrem => Cpu0ISD::DivRem            setTargetDAGCombine(ISD::SDIVREM);
sdivrem => CopyFromReg xx, Hi, xx     performDivRemCombine();
Cpu0ISD::DivRem => div                SDIV (Cpu0InstrInfo.td)
CopyFromReg xx, Hi, xx => mfhi        MFHI (Cpu0InstrInfo.td)
Step 2 above is triggered by the code “setOperationAction(ISD::SREM, MVT::i32, Expand);” in
Cpu0ISelLowering.cpp. About Expand, please refer to 7 and 8. Step 3 is triggered by the code
“setTargetDAGCombine(ISD::SDIVREM);” in Cpu0ISelLowering.cpp. Step 4 is done by performDivRemCombine(), which is
called by PerformDAGCombine(). Since the srem corresponding to % makes “N->hasAnyUseOfValue(1)” true
in performDivRemCombine(), it creates the “CopyFromReg” DAG. Using “/” in C makes
“N->hasAnyUseOfValue(0)” true. For sdivrem, sdiv makes “N->hasAnyUseOfValue(0)” true while srem makes
“N->hasAnyUseOfValue(1)” true.
The above steps change the DAGs while llc is running. After that, the pattern match defined in
Chapter4_1/Cpu0InstrInfo.td translates Cpu0ISD::DivRem into div, and “CopyFromReg xxDAG, Register %H,
Cpu0ISD::DivRem” into mfhi.
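The decision made from the two hasAnyUseOfValue() tests can be summarized with a toy C++ model. It does not use any llvm API; lowerDivRem is a hypothetical function that only lists, as strings, which Cpu0 instructions end up being selected depending on whether the quotient (value 0) or the remainder (value 1) of sdivrem is used.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: value 0 of sdivrem is the quotient (read from LO),
// value 1 is the remainder (read from HI). The div itself is always needed.
std::vector<std::string> lowerDivRem(bool quotientUsed, bool remainderUsed) {
  std::vector<std::string> out{"div"};
  if (quotientUsed)
    out.push_back("mflo");  // copy from LO, as for C operator /
  if (remainderUsed)
    out.push_back("mfhi");  // copy from HI, as for C operator %
  return out;
}

// C operator % uses only value 1, so the result is div followed by mfhi.
void checkRemainderOnly() {
  std::vector<std::string> insts = lowerDivRem(false, true);
  assert(insts.size() == 2);
  assert(insts[0] == "div");
  assert(insts[1] == "mfhi");
}
```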
ch4_1_div.cpp tests the / (div) operator.
Chapter4_1 includes the translation of rotate operations. The instructions “rol”, “ror”, “rolv” and “rorv” defined in
Cpu0InstrInfo.td handle the translation. Compiling ch4_1_rotate.cpp generates the Cpu0 “rol” instruction.
lbdex/input/ch4_1_rotate.cpp
//#define TEST_ROXV
int test_rotate_left()
{
unsigned int a = 8;
int result = ((a << 30) | (a >> 2));
return result;
}
#ifdef TEST_ROXV
int test_rotate_left1()
{
volatile unsigned int a = 4;
volatile int n = 30;
int result = ((a << n) | (a >> (32 - n)));
return result;
}
7 http://llvm.org/docs/WritingAnLLVMBackend.html#expand
8 http://llvm.org/docs/CodeGenerator.html#selectiondag-legalizetypes-phase
int test_rotate_right()
{
volatile unsigned int a = 1;
volatile int n = 30;
int result = ((a >> n) | (a << (32 - n)));
return result;
}
#endif
The instructions “rolv” and “rorv” cannot be tested at this moment because they need the logic “or” implementation,
which is supported in the next section. As a previous subsection of this chapter mentioned, some IRs in the function
@_Z16test_rotate_leftv() are combined into one IR, rotl, during DAG translation.
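The shift-and-or expressions in ch4_1_rotate.cpp are the standard way to write a rotate in C. A minimal C++ sketch of the same idiom follows; rotl32 and rotr32 are names chosen for this illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Rotate left: bits shifted out at the top re-enter at the bottom.
uint32_t rotl32(uint32_t a, unsigned n) {
  assert(n < 32);
  return n == 0 ? a : (a << n) | (a >> (32 - n));
}

// Rotate right: bits shifted out at the bottom re-enter at the top.
uint32_t rotr32(uint32_t a, unsigned n) {
  assert(n < 32);
  return n == 0 ? a : (a >> n) | (a << (32 - n));
}
```

With a = 8 as in test_rotate_left(), rotl32(8, 30) equals ((8 << 30) | (8 >> 2)), which is 2.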
4.2 Logic
Chapter4_2 supports the logic operators &, |, ^, !, ==, !=, <, <=, > and >=. They are straightforward. The added
code with comments, and a table of the IR, DAG and instructions for these operators, are listed below. Please check
them against the bc and asm output for ch4_2_logic.cpp shown below.
lbdex/chapters/Chapter4_2/Cpu0InstrInfo.td
// SetCC
let Predicates = [Ch4_2] in {
class SetCC_R<bits<8> op, string instr_asm, PatFrag cond_op,
RegisterClass RC>:
FA<op, (outs GPROut:$ra), (ins RC:$rb, RC:$rc),
!strconcat(instr_asm, "\t$ra, $rb, $rc"),
[(set GPROut:$ra, (cond_op RC:$rb, RC:$rc))],
IIAlu>, Requires<[HasSlt]> {
let shamt = 0;
}
// setcc patterns
// a < b
multiclass SetltPatsCmp<RegisterClass RC> {
def : Pat<(setlt RC:$lhs, RC:$rhs),
(ANDi (CMP RC:$lhs, RC:$rhs), 1)>;
// if cpu0 `define N `SW[31] instead of `SW[0] // Negative flag, then need
// 2 more instructions as follows,
// (XORi (ANDi (SHR (CMP RC:$lhs, RC:$rhs), (LUi 0x8000), 31), 1), 1)>;
def : Pat<(setult RC:$lhs, RC:$rhs),
(ANDi (CMP RC:$lhs, RC:$rhs), 1)>;
}
// a <= b
multiclass SetlePatsCmp<RegisterClass RC> {
def : Pat<(setle RC:$lhs, RC:$rhs),
// a <= b is equal to (XORi (b < a), 1)
(XORi (ANDi (CMP RC:$rhs, RC:$lhs), 1), 1)>;
def : Pat<(setule RC:$lhs, RC:$rhs),
(XORi (ANDi (CMP RC:$rhs, RC:$lhs), 1), 1)>;
}
// a > b
multiclass SetgtPatsCmp<RegisterClass RC> {
def : Pat<(setgt RC:$lhs, RC:$rhs),
// a > b is equal to b < a is equal to setlt(b, a)
(ANDi (CMP RC:$rhs, RC:$lhs), 1)>;
def : Pat<(setugt RC:$lhs, RC:$rhs),
(ANDi (CMP RC:$rhs, RC:$lhs), 1)>;
}
// a >= b
multiclass SetgePatsCmp<RegisterClass RC> {
def : Pat<(setge RC:$lhs, RC:$rhs),
// a >= b is equal to (XORi (a < b), 1)
(XORi (ANDi (CMP RC:$lhs, RC:$rhs), 1), 1)>;
def : Pat<(setuge RC:$lhs, RC:$rhs),
(XORi (ANDi (CMP RC:$lhs, RC:$rhs), 1), 1)>;
}
// a <= b
multiclass SetlePatsSlt<RegisterClass RC, Instruction SLTOp, Instruction SLTuOp> {
def : Pat<(setle RC:$lhs, RC:$rhs),
// a <= b is equal to (XORi (b < a), 1)
(XORi (SLTOp RC:$rhs, RC:$lhs), 1)>;
def : Pat<(setule RC:$lhs, RC:$rhs),
(XORi (SLTuOp RC:$rhs, RC:$lhs), 1)>;
}
// a > b
multiclass SetgtPatsSlt<RegisterClass RC, Instruction SLTOp, Instruction SLTuOp> {
def : Pat<(setgt RC:$lhs, RC:$rhs),
// a > b is equal to b < a is equal to setlt(b, a)
(SLTOp RC:$rhs, RC:$lhs)>;
def : Pat<(setugt RC:$lhs, RC:$rhs),
(SLTuOp RC:$rhs, RC:$lhs)>;
}
// a >= b
multiclass SetgePatsSlt<RegisterClass RC, Instruction SLTOp, Instruction SLTuOp> {
def : Pat<(setge RC:$lhs, RC:$rhs),
// a >= b is equal to (XORi (a < b), 1)
(XORi (SLTOp RC:$lhs, RC:$rhs), 1)>;
def : Pat<(setuge RC:$lhs, RC:$rhs),
(XORi (SLTuOp RC:$lhs, RC:$rhs), 1)>;
}
defm : SetgePatsCmp<CPURegs>;
}
} // let Predicates = [Ch4_2]
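The slt-based patterns above rely on the fact that every ordering comparison can be derived from a single “less than” plus an optional xor with 1. The following C++ sketch checks those identities; the function names are stand-ins chosen for this illustration and are not the tablegen records themselves.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a "set on less than" result: 1 if a < b, else 0.
int slt(int32_t a, int32_t b) { return a < b ? 1 : 0; }

// Stand-in for "xori x, 1": flips a 0/1 result.
int xori1(int v) { return v ^ 1; }

int setgt(int32_t a, int32_t b) { return slt(b, a); }         // a > b  is b < a
int setle(int32_t a, int32_t b) { return xori1(slt(b, a)); }  // a <= b is !(b < a)
int setge(int32_t a, int32_t b) { return xori1(slt(a, b)); }  // a >= b is !(a < b)

// Compare each derived relation with the built-in C++ operator.
void checkRelations() {
  for (int32_t a = -3; a <= 3; ++a) {
    for (int32_t b = -3; b <= 3; ++b) {
      assert(setgt(a, b) == (a > b ? 1 : 0));
      assert(setle(a, b) == (a <= b ? 1 : 0));
      assert(setge(a, b) == (a >= b ? 1 : 0));
    }
  }
}
```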
lbdex/chapters/Chapter4_2/Cpu0ISelLowering.cpp
...
}
lbdex/input/ch4_2_logic.cpp
int test_andorxornot()
{
int a = 5;
int b = 3;
int c = 0, d = 0, e = 0;
c = (a & b); // c = 1
d = (a | b); // d = 7
e = (a ^ b); // e = 6
b = !a; // b = 0
return (c+d+e+b); // 14
}
int test_setxx()
{
int a = 5;
int b = 3;
int c, d, e, f, g, h;
c = (a == b); // seq, c = 0
d = (a != b); // sne, d = 1
e = (a < b); // slt, e = 0
f = (a <= b); // sle, f = 0
g = (a > b); // sgt, g = 1
h = (a >= b); // sge, h = 1
return (c+d+e+f+g+h); // 3
}
.globl _Z16test_andorxornotv
...
and $3, $4, $3
...
or $3, $4, $3
...
xor $3, $4, $3
...
cmp $sw, $3, $2
andi $2, $sw, 2
shr $2, $2, 1
...
.globl _Z10test_setxxv
...
cmp $sw, $3, $2
andi $2, $sw, 2
shr $2, $2, 1
...
cmp $sw, $3, $2
andi $2, $sw, 2
shr $2, $2, 1
xori $2, $2, 1
...
cmp $sw, $3, $2
andi $2, $sw, 1
...
cmp $sw, $3, $2
andi $2, $sw, 1
xori $2, $2, 1
...
cmp $sw, $3, $2
andi $2, $sw, 1
...
cmp $sw, $3, $2
andi $2, $sw, 1
xori $2, $2, 1
...
C operator  .bc                           Optimized legalized selection DAG   Cpu0 instructions (cmp)
==          %cmp = icmp eq i32 %0, %1     %cmp = (setcc %0, %1, seteq)        cmp $sw, $3, $2
            %conv = zext i1 %cmp to i32   and %cmp, 1                         andi $2, $sw, 2
                                                                              shr $2, $2, 1
                                                                              andi $2, $2, 1
!=          %cmp = icmp ne i32 %0, %1     %cmp = (setcc %0, %1, setne)        cmp $sw, $3, $2
            %conv = zext i1 %cmp to i32   and %cmp, 1                         andi $2, $sw, 2
                                                                              shr $2, $2, 1
                                                                              andi $2, $2, 1
<           %cmp = icmp slt i32 %0, %1    (setcc %0, %1, setlt)               cmp $sw, $3, $2
            %conv = zext i1 %cmp to i32   and %cmp, 1                         andi $2, $sw, 2
                                                                              andi $2, $2, 1
                                                                              andi $2, $2, 1
<=          %cmp = icmp sle i32 %0, %1    (setcc %0, %1, setle)               cmp $sw, $2, $3
            %conv = zext i1 %cmp to i32   and %cmp, 1                         andi $2, $sw, 1
                                                                              xori $2, $2, 1
                                                                              andi $2, $2, 1
>           %cmp = icmp sgt i32 %0, %1    (setcc %0, %1, setgt)               cmp $sw, $2, $3
            %conv = zext i1 %cmp to i32   and %cmp, 1                         andi $2, $sw, 2
                                                                              andi $2, $2, 1
>=          %cmp = icmp sge i32 %0, %1    (setcc %0, %1, setge)               cmp $sw, $3, $2
            %conv = zext i1 %cmp to i32   and %cmp, 1                         andi $2, $sw, 1
                                                                              xori $2, $2, 1
                                                                              andi $2, $2, 1

C operator  .bc                           Optimized legalized selection DAG   Cpu0 instructions (slt)
==          %cmp = icmp eq i32 %0, %1     %cmp = (setcc %0, %1, seteq)        xor $2, $3, $2
            %conv = zext i1 %cmp to i32   and %cmp, 1                         sltiu $2, $2, 1
                                                                              andi $2, $2, 1
!=          %cmp = icmp ne i32 %0, %1     %cmp = (setcc %0, %1, setne)        xor $2, $3, $2
            %conv = zext i1 %cmp to i32   and %cmp, 1                         sltu $2, $zero, 2
                                                                              shr $2, $2, 1
                                                                              andi $2, $2, 1
<           %cmp = icmp slt i32 %0, %1    (setcc %0, %1, setlt)               slt $2, $3, $2
            %conv = zext i1 %cmp to i32   and %cmp, 1                         andi $2, $2, 1
<=          %cmp = icmp sle i32 %0, %1    (setcc %0, %1, setle)               slt $2, $3, $2
            %conv = zext i1 %cmp to i32   and %cmp, 1                         xori $2, $2, 1
                                                                              andi $2, $2, 1
>           %cmp = icmp sgt i32 %0, %1    (setcc %0, %1, setgt)               slt $2, $3, $2
            %conv = zext i1 %cmp to i32   and %cmp, 1                         andi $2, $2, 1
>=          %cmp = icmp sge i32 %0, %1    (setcc %0, %1, setge)               slt $2, $3, $2
            %conv = zext i1 %cmp to i32   and %cmp, 1                         xori $2, $2, 1
                                                                              andi $2, $2, 1
From the above results, slt takes fewer instructions than cmp to translate relational operators. Beyond that, slt uses
general purpose registers while cmp uses the dedicated $sw register.
lbdex/input/ch4_2_slt_explain.cpp
int test_OptSlt()
{
int a = 3, b = 1;
int d = 0, e = 0, f = 0;
d = (a < 1);
e = (b < 2);
f = d + e;
return (f);
}
Run Chapter4_2 with ch4_2_slt_explain.cpp under these two llc -mcpu options to get the above results. Regardless of
the move between $sw and a general purpose register in llc -mcpu=cpu032I, its two cmp instructions have a hazard
in instruction reordering since both of them use the $sw register. llc -mcpu=cpu032II does not have this problem
because it uses slti 9 . The slti version can be reordered as follows,
9 See book Computer Architecture: A Quantitative Approach (The Morgan Kaufmann Series in Computer Architecture and Design)
...
ld $2, 16($sp)
slti $2, $2, 2
andi $2, $2, 1
st $2, 8($sp)
ld $2, 20($sp)
slti $2, $2, 1
andi $2, $2, 1
st $2, 12($sp)
...
Chapter4_2 includes both the cmp and slt instructions. Although cpu032II includes both of them, slt takes
priority since “let Predicates = [HasSlt]” appears before “let Predicates = [HasCmp]” in Cpu0InstrInfo.td.
4.3 Summary
The C operators, the IR of .bc, the Optimized legalized selection DAG and the Cpu0 instructions implemented in this
chapter are listed in Table: Chapter 4 mathematic operators. In total, this chapter supports over 20 mathematic and
logic operators and takes 4xx lines of source code.
C operator  .bc                           Optimized legalized selection DAG   Cpu0 instructions
!           %tobool = icmp ne i32 %0, 0   %lnot = (setcc %tobool, 0, seteq)   %1 = (xor %tobool, 0)
            %lnot = xor i1 %tobool, true  %conv = (and %lnot, 1)              %true = (addiu $r0, 1)
                                                                              %lnot = (xor %1, %true)
FIVE
The previous chapters introduced assembly code generation only. This chapter adds ELF obj support and verifies the
generated obj with the objdump utility. With LLVM support, the Cpu0 backend can generate both big endian
and little endian obj files with only a little added code. The Target Registration mechanism and its structure are
introduced in this chapter.
Currently, we only support the translation of llvm IR code into assembly code. If you try running Chapter4_2/ to
translate it into obj code, you will get the following error message,
Chapter5_1/ supports obj file generation. It produces obj files for big endian and little endian with the commands llc
-march=cpu0 and llc -march=cpu0el , respectively. Running them produces the obj files as follows,
The first instruction is “addiu $sp, -56” and its corresponding obj is 0x09ddffc8. The opcode of addiu is 0x09 (8 bits);
the $sp register number is 13 (0xd, 4 bits); and the immediate is the 16-bit value -56 (=0xffc8), so it is correct. The
third instruction is “st $2, 52($fp)” and its corresponding obj is 0x022b0034. The st opcode is 0x02, $2 is 0x2, $fp is
0xb and the immediate is 52 (0x0034). Thanks to the Cpu0 instruction format, in which the sizes of the opcode, register
operands and offset (immediate value) are multiples of 4 bits, the obj format is easy to check by eye. For big endian,
(B0, B1, B2, B3) = (09, dd, ff, c8), so objdump from B0 to B3 shows 0x09ddffc8; for little endian,
(B3, B2, B1, B0) = (09, dd, ff, c8), so objdump from B0 to B3 shows 0xc8ffdd09.
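The two encodings checked above can be reproduced with a short C++ sketch. The helpers encodeWord and toBytes are hypothetical names for this illustration and are not part of the Cpu0 backend; the field layout (8-bit opcode, two 4-bit register numbers, 16-bit immediate) is the one read off from the examples in the text.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: pack opcode (8 bits), ra (4 bits), rb (4 bits) and
// a 16-bit immediate into one 32-bit instruction word.
uint32_t encodeWord(uint8_t opcode, unsigned ra, unsigned rb, int16_t imm) {
  assert(ra < 16 && rb < 16);  // register numbers must fit in 4 bits
  return (static_cast<uint32_t>(opcode) << 24) |
         (static_cast<uint32_t>(ra) << 20) |
         (static_cast<uint32_t>(rb) << 16) |
         static_cast<uint16_t>(imm);
}

// Hypothetical helper: lay the word out as bytes B0..B3 in memory order.
std::array<uint8_t, 4> toBytes(uint32_t word, bool isLittle) {
  std::array<uint8_t, 4> bytes{};
  for (unsigned i = 0; i < 4; ++i) {
    // Little endian stores the lowest byte first, big endian the highest.
    unsigned shift = isLittle ? 8 * i : 8 * (3 - i);
    bytes[i] = static_cast<uint8_t>(word >> shift);
  }
  return bytes;
}
```

For example, encodeWord(0x09, 13, 13, -56) gives 0x09ddffc8, whose big endian bytes are 09 dd ff c8 and whose little endian bytes are c8 ff dd 09.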
To support ELF obj generation, the following code is changed and added in Chapter5_1.
lbdex/chapters/Chapter5_1/InstPrinter/Cpu0InstPrinter.cpp
#include "MCTargetDesc/Cpu0MCExpr.h"
lbdex/chapters/Chapter5_1/MCTargetDesc/CMakeLists.txt
Cpu0AsmBackend.cpp
Cpu0MCCodeEmitter.cpp
Cpu0MCExpr.cpp
Cpu0ELFObjectWriter.cpp
Cpu0TargetStreamer.cpp
lbdex/chapters/Chapter5_1/MCTargetDesc/Cpu0AsmBackend.h
#ifndef LLVM_LIB_TARGET_CPU0_MCTARGETDESC_CPU0ASMBACKEND_H
#define LLVM_LIB_TARGET_CPU0_MCTARGETDESC_CPU0ASMBACKEND_H
#include "Cpu0Config.h"
#include "MCTargetDesc/Cpu0FixupKinds.h"
#include "llvm/ADT/Triple.h"
#include "llvm/MC/MCAsmBackend.h"
namespace llvm {
class MCAssembler;
struct MCFixupKindInfo;
class Target;
class MCObjectWriter;
public:
Cpu0AsmBackend(const Target &T, Triple::OSType _OSType, bool IsLittle)
: MCAsmBackend(), OSType(_OSType), IsLittle(IsLittle) {}
/// @}
} // namespace
#endif
lbdex/chapters/Chapter5_1/MCTargetDesc/Cpu0AsmBackend.cpp
#include "MCTargetDesc/Cpu0FixupKinds.h"
#include "MCTargetDesc/Cpu0AsmBackend.h"
#include "MCTargetDesc/Cpu0MCTargetDesc.h"
#include "llvm/MC/MCAsmBackend.h"
#include "llvm/MC/MCAssembler.h"
#include "llvm/MC/MCDirectives.h"
#include "llvm/MC/MCELFObjectWriter.h"
#include "llvm/MC/MCFixupKindInfo.h"
#include "llvm/MC/MCObjectWriter.h"
#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/raw_ostream.h"
//@adjustFixupValue {
// Prepare value for the target space for it
static unsigned adjustFixupValue(const MCFixup &Fixup, uint64_t Value,
MCContext *Ctx = nullptr) {
return Value;
}
//@adjustFixupValue }
MCObjectWriter *
Cpu0AsmBackend::createObjectWriter(raw_pwrite_stream &OS) const {
return createCpu0ELFObjectWriter(OS,
MCELFObjectTargetWriter::getOSABI(OSType), IsLittle);
}
/// ApplyFixup - Apply the \p Value for given \p Fixup into the provided
/// data fragment, at the offset specified by the fixup and following the
/// fixup kind as appropriate.
void Cpu0AsmBackend::applyFixup(const MCFixup &Fixup, char *Data,
unsigned DataSize, uint64_t Value,
bool IsPCRel) const {
MCFixupKind Kind = Fixup.getKind();
Value = adjustFixupValue(Fixup, Value);
if (!Value)
return; // Doesn't change encoding.
switch ((unsigned)Kind) {
default:
FullSize = 4;
break;
}
//@getFixupKindInfo {
const MCFixupKindInfo &Cpu0AsmBackend::
getFixupKindInfo(MCFixupKind Kind) const {
const static MCFixupKindInfo Infos[Cpu0::NumTargetFixupKinds] = {
// This table *must* be in the same order as the fixup_* kinds in
// Cpu0FixupKinds.h.
//
// name offset bits flags
{ "fixup_Cpu0_32", 0, 32, 0 },
{ "fixup_Cpu0_HI16", 0, 16, 0 },
{ "fixup_Cpu0_LO16", 0, 16, 0 },
{ "fixup_Cpu0_GPREL16", 0, 16, 0 },
{ "fixup_Cpu0_GOT", 0, 16, 0 },
{ "fixup_Cpu0_GOT_HI16", 0, 16, 0 },
{ "fixup_Cpu0_GOT_LO16", 0, 16, 0 }
};
// MCAsmBackend
MCAsmBackend *llvm::createCpu0AsmBackendEL32(const Target &T,
const MCRegisterInfo &MRI,
const Triple &TT, StringRef CPU) {
return new Cpu0AsmBackend(T, TT.getOS(), /*IsLittle*/true);
}
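The reason the Infos table and the Fixups enum must share one order is that the table is indexed by the fixup kind minus the first target kind. The following self-contained C++ sketch illustrates this with simplified stand-ins: FixupInfo replaces MCFixupKindInfo, the value 128 for FirstTargetFixupKind is illustrative only, and the enum order shown is an assumption chosen to match the table above.

```cpp
#include <cassert>
#include <cstring>

// Stand-in for llvm's FirstTargetFixupKind; the value is illustrative only.
enum { FirstTargetFixupKind = 128 };

// Assumed enum order, chosen to match the rows of the table below.
enum Fixups {
  fixup_Cpu0_32 = FirstTargetFixupKind,
  fixup_Cpu0_HI16,
  fixup_Cpu0_LO16,
  fixup_Cpu0_GPREL16,
  fixup_Cpu0_GOT,
  fixup_Cpu0_GOT_HI16,
  fixup_Cpu0_GOT_LO16,
  LastTargetFixupKind,
  NumTargetFixupKinds = LastTargetFixupKind - FirstTargetFixupKind
};

// Simplified stand-in for MCFixupKindInfo.
struct FixupInfo {
  const char *name;
  unsigned offset;
  unsigned bits;
  unsigned flags;
};

// Row i of the table describes the enumerator FirstTargetFixupKind + i,
// so swapping two rows would silently describe the wrong fixup.
const FixupInfo &lookupFixup(Fixups kind) {
  static const FixupInfo Infos[NumTargetFixupKinds] = {
      {"fixup_Cpu0_32", 0, 32, 0},      {"fixup_Cpu0_HI16", 0, 16, 0},
      {"fixup_Cpu0_LO16", 0, 16, 0},    {"fixup_Cpu0_GPREL16", 0, 16, 0},
      {"fixup_Cpu0_GOT", 0, 16, 0},     {"fixup_Cpu0_GOT_HI16", 0, 16, 0},
      {"fixup_Cpu0_GOT_LO16", 0, 16, 0}};
  assert(kind >= FirstTargetFixupKind && kind < LastTargetFixupKind);
  return Infos[kind - FirstTargetFixupKind];
}

// True if the table row found for kind carries the expected name.
bool nameMatches(Fixups kind, const char *expected) {
  return std::strcmp(lookupFixup(kind).name, expected) == 0;
}
```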
lbdex/chapters/Chapter5_1/MCTargetDesc/Cpu0BaseInfo.h
#include "Cpu0FixupKinds.h"
lbdex/chapters/Chapter5_1/MCTargetDesc/Cpu0ELFObjectWriter.cpp
#include "Cpu0Config.h"
#include "MCTargetDesc/Cpu0BaseInfo.h"
#include "MCTargetDesc/Cpu0FixupKinds.h"
#include "MCTargetDesc/Cpu0MCTargetDesc.h"
#include "llvm/MC/MCAssembler.h"
#include "llvm/MC/MCELFObjectWriter.h"
#include "llvm/MC/MCExpr.h"
#include "llvm/MC/MCSection.h"
#include "llvm/MC/MCValue.h"
#include "llvm/Support/ErrorHandling.h"
#include <list>
namespace {
class Cpu0ELFObjectWriter : public MCELFObjectTargetWriter {
public:
Cpu0ELFObjectWriter(uint8_t OSABI);
~Cpu0ELFObjectWriter() override;
Cpu0ELFObjectWriter::Cpu0ELFObjectWriter(uint8_t OSABI)
: MCELFObjectTargetWriter(/*_is64Bit=false*/ false, OSABI, ELF::EM_CPU0,
/*HasRelocationAddend*/ false) {}
Cpu0ELFObjectWriter::~Cpu0ELFObjectWriter() {}
//@GetRelocType {
unsigned Cpu0ELFObjectWriter::getRelocType(MCContext &Ctx,
const MCValue &Target,
const MCFixup &Fixup,
bool IsPCRel) const {
// determine the type of the relocation
unsigned Type = (unsigned)ELF::R_CPU0_NONE;
unsigned Kind = (unsigned)Fixup.getKind();
switch (Kind) {
default:
llvm_unreachable("invalid fixup kind!");
case FK_Data_4:
Type = ELF::R_CPU0_32;
break;
case Cpu0::fixup_Cpu0_32:
Type = ELF::R_CPU0_32;
break;
case Cpu0::fixup_Cpu0_GPREL16:
Type = ELF::R_CPU0_GPREL16;
break;
case Cpu0::fixup_Cpu0_GOT:
Type = ELF::R_CPU0_GOT16;
break;
case Cpu0::fixup_Cpu0_HI16:
Type = ELF::R_CPU0_HI16;
break;
case Cpu0::fixup_Cpu0_LO16:
Type = ELF::R_CPU0_LO16;
break;
case Cpu0::fixup_Cpu0_GOT_HI16:
Type = ELF::R_CPU0_GOT_HI16;
break;
case Cpu0::fixup_Cpu0_GOT_LO16:
Type = ELF::R_CPU0_GOT_LO16;
break;
}
return Type;
}
//@GetRelocType }
bool
Cpu0ELFObjectWriter::needsRelocateWithSymbol(const MCSymbol &Sym,
unsigned Type) const {
// FIXME: This is extremely conservative. This really needs to use a
// whitelist with a clear explanation for why each relocation needs to
// point to the symbol, not to the section.
switch (Type) {
default:
return true;
case ELF::R_CPU0_GOT16:
// For Cpu0 pic mode, I think it's OK to return true but I didn't confirm.
// llvm_unreachable("Should have been handled already");
return true;
case ELF::R_CPU0_GPREL16:
return false;
}
}
lbdex/chapters/Chapter5_1/MCTargetDesc/Cpu0FixupKinds.h
#ifndef LLVM_LIB_TARGET_CPU0_MCTARGETDESC_CPU0FIXUPKINDS_H
#define LLVM_LIB_TARGET_CPU0_MCTARGETDESC_CPU0FIXUPKINDS_H
#include "Cpu0Config.h"
#include "llvm/MC/MCFixup.h"
namespace llvm {
namespace Cpu0 {
// Although most of the current fixup types reflect a unique relocation
// one can have multiple fixup types for a given relocation and thus need
// to be uniquely named.
//
// This table *must* be in the same order of
// MCFixupKindInfo Infos[Cpu0::NumTargetFixupKinds]
// in Cpu0AsmBackend.cpp.
//@Fixups {
enum Fixups {
//@ Pure upper 32 bit fixup resulting in - R_CPU0_32.
fixup_Cpu0_32 = FirstTargetFixupKind,
// resulting in - R_CPU0_GOT_HI16
fixup_Cpu0_GOT_HI16,
// resulting in - R_CPU0_GOT_LO16
fixup_Cpu0_GOT_LO16,
// Marker
LastTargetFixupKind,
NumTargetFixupKinds = LastTargetFixupKind - FirstTargetFixupKind
};
//@Fixups }
} // namespace Cpu0
} // namespace llvm
#endif // LLVM_LIB_TARGET_CPU0_MCTARGETDESC_CPU0FIXUPKINDS_H
lbdex/chapters/Chapter5_1/MCTargetDesc/Cpu0MCCodeEmitter.h
#ifndef LLVM_LIB_TARGET_CPU0_MCTARGETDESC_CPU0MCCODEEMITTER_H
#define LLVM_LIB_TARGET_CPU0_MCTARGETDESC_CPU0MCCODEEMITTER_H
#include "Cpu0Config.h"
#include "llvm/MC/MCCodeEmitter.h"
#include "llvm/Support/DataTypes.h"
namespace llvm {
class MCContext;
class MCExpr;
class MCInst;
class MCInstrInfo;
class MCFixup;
class MCOperand;
class MCSubtargetInfo;
class raw_ostream;
public:
Cpu0MCCodeEmitter(const MCInstrInfo &mcii, MCContext &Ctx_, bool IsLittle)
: MCII(mcii), Ctx(Ctx_), IsLittleEndian(IsLittle) {}
~Cpu0MCCodeEmitter() override {}
#endif
lbdex/chapters/Chapter5_1/MCTargetDesc/Cpu0MCCodeEmitter.cpp
#include "Cpu0MCCodeEmitter.h"
#include "MCTargetDesc/Cpu0BaseInfo.h"
#include "MCTargetDesc/Cpu0FixupKinds.h"
#include "MCTargetDesc/Cpu0MCExpr.h"
#include "MCTargetDesc/Cpu0MCTargetDesc.h"
#include "llvm/ADT/APFloat.h"
#include "llvm/MC/MCCodeEmitter.h"
#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCExpr.h"
#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCInstrInfo.h"
#include "llvm/MC/MCRegisterInfo.h"
#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/Support/raw_ostream.h"
#define GET_INSTRMAP_INFO
#include "Cpu0GenInstrInfo.inc"
#undef GET_INSTRMAP_INFO
namespace llvm {
MCCodeEmitter *llvm::createCpu0MCCodeEmitterEB(const MCInstrInfo &MCII,
const MCRegisterInfo &MRI,
MCContext &Ctx) {
return new Cpu0MCCodeEmitter(MCII, Ctx, false);
}
}
}
//@CH8_1 {
/// getBranch16TargetOpValue - Return binary encoding of the branch
/// target operand. If the machine operand requires relocation,
/// record the relocation and return zero.
unsigned Cpu0MCCodeEmitter::
getBranch16TargetOpValue(const MCInst &MI, unsigned OpNo,
SmallVectorImpl<MCFixup> &Fixups,
const MCSubtargetInfo &STI) const {
return 0;
}
//@getJumpTargetOpValue {
unsigned Cpu0MCCodeEmitter::
getJumpTargetOpValue(const MCInst &MI, unsigned OpNo,
SmallVectorImpl<MCFixup> &Fixups,
const MCSubtargetInfo &STI) const {
return 0;
}
//@CH8_1 }
//@getExprOpValue {
unsigned Cpu0MCCodeEmitter::
getExprOpValue(const MCExpr *Expr,SmallVectorImpl<MCFixup> &Fixups,
const MCSubtargetInfo &STI) const {
//@getExprOpValue body {
MCExpr::ExprKind Kind = Expr->getKind();
if (Kind == MCExpr::Constant) {
return cast<MCConstantExpr>(Expr)->getValue();
}
if (Kind == MCExpr::Binary) {
unsigned Res = getExprOpValue(cast<MCBinaryExpr>(Expr)->getLHS(), Fixups, STI);
Res += getExprOpValue(cast<MCBinaryExpr>(Expr)->getRHS(), Fixups, STI);
return Res;
}
if (Kind == MCExpr::Target) {
const Cpu0MCExpr *Cpu0Expr = cast<Cpu0MCExpr>(Expr);
#include "Cpu0GenMCCodeEmitter.inc"
lbdex/chapters/Chapter5_1/MCTargetDesc/Cpu0MCExpr.h
#ifndef LLVM_LIB_TARGET_CPU0_MCTARGETDESC_CPU0MCEXPR_H
#define LLVM_LIB_TARGET_CPU0_MCTARGETDESC_CPU0MCEXPR_H
#include "Cpu0Config.h"
#if CH >= CH5_1
#include "llvm/MC/MCAsmLayout.h"
#include "llvm/MC/MCExpr.h"
#include "llvm/MC/MCValue.h"
namespace llvm {
CEK_GOT_DISP,
CEK_GOT_HI16,
CEK_GOT_LO16,
CEK_GPREL,
CEK_TLSGD,
CEK_TLSLDM,
CEK_TP_HI,
CEK_TP_LO,
CEK_Special,
};
private:
const Cpu0ExprKind Kind;
const MCExpr *Expr;
public:
static const Cpu0MCExpr *create(Cpu0ExprKind Kind, const MCExpr *Expr,
MCContext &Ctx);
static const Cpu0MCExpr *create(const MCSymbol *Symbol,
Cpu0MCExpr::Cpu0ExprKind Kind, MCContext &Ctx);
static const Cpu0MCExpr *createGpOff(Cpu0ExprKind Kind, const MCExpr *Expr,
MCContext &Ctx);
#endif
lbdex/chapters/Chapter5_1/MCTargetDesc/Cpu0MCExpr.cpp
#include "Cpu0.h"
#include "Cpu0MCExpr.h"
#include "llvm/MC/MCAsmInfo.h"
#include "llvm/MC/MCAssembler.h"
#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCObjectStreamer.h"
#include "llvm/MC/MCSymbolELF.h"
#include "llvm/Support/ELF.h"
MCContext &Ctx) {
const MCSymbolRefExpr *MCSym =
MCSymbolRefExpr::create(Symbol, MCSymbolRefExpr::VK_None, Ctx);
return new (Ctx) Cpu0MCExpr(Kind, MCSym);
}
switch (Kind) {
case CEK_None:
case CEK_Special:
llvm_unreachable("CEK_None and CEK_Special are invalid");
break;
case CEK_CALL_HI16:
OS << "%call_hi";
break;
case CEK_CALL_LO16:
OS << "%call_lo";
break;
case CEK_DTP_HI:
OS << "%dtp_hi";
break;
case CEK_DTP_LO:
OS << "%dtp_lo";
break;
case CEK_GOT:
OS << "%got";
break;
case CEK_GOTTPREL:
OS << "%gottprel";
break;
case CEK_GOT_CALL:
OS << "%call16";
break;
case CEK_GOT_DISP:
OS << "%got_disp";
break;
case CEK_GOT_HI16:
OS << "%got_hi";
break;
case CEK_GOT_LO16:
OS << "%got_lo";
break;
case CEK_GPREL:
OS << "%gp_rel";
break;
case CEK_ABS_HI:
OS << "%hi";
break;
case CEK_ABS_LO:
OS << "%lo";
break;
case CEK_TLSGD:
OS << "%tlsgd";
break;
case CEK_TLSLDM:
OS << "%tlsldm";
break;
case CEK_TP_HI:
OS << "%tp_hi";
break;
case CEK_TP_LO:
OS << "%tp_lo";
break;
}
OS << '(';
if (Expr->evaluateAsAbsolute(AbsVal))
OS << AbsVal;
else
Expr->print(OS, MAI, true);
OS << ')';
}
bool
Cpu0MCExpr::evaluateAsRelocatableImpl(MCValue &Res,
#endif
lbdex/chapters/Chapter5_1/MCTargetDesc/Cpu0MCTargetDesc.h
lbdex/chapters/Chapter5_1/MCTargetDesc/Cpu0MCTargetDesc.cpp
lbdex/chapters/Chapter5_1/Cpu0MCInstLower.h
#include "MCTargetDesc/Cpu0MCExpr.h"
[Figure: calling sequence of the ELF encoder functions]
The figure above shows the calling sequence of the ELF encoder functions. The llc driver sets
AsmPrinter::OutStreamer to MCObjectStreamer when the user runs llc -filetype=obj .
[Figure: how the encoder obtains the instruction operand information]
The figure above shows how the encoder obtains the operand information of an instruction. The steps are as follows,
1. Function encodeInstruction() passes MI.Opcode to getBinaryCodeForInstr().
2. getBinaryCodeForInstr() passes MI.Operand[n] to getMachineOpValue(), and then
3. gets the register number returned by getMachineOpValue().
4. getBinaryCodeForInstr() returns the MI encoding, with all register numbers filled in, to encodeInstruction().
The MI.Opcode is set in the Instruction Selection stage. The TableGen-generated function getBinaryCodeForInstr()
gets all the operand information from the td files written by the programmer, as the following figure shows.
Cpu0GenMCCodeEmitter.inc
uint64_t Cpu0MCCodeEmitter::getBinaryCodeForInstr(const MCInst &MI, ...) {
...
static const uint64_t InstBits[] = {
...
UINT64_C(318767104),// ADD /// 318767104=0x13000000
...
};
const unsigned opcode = MI.getOpcode();
uint64_t Value = InstBits[opcode];
...
switch (opcode) {
...
case Cpu0::ADD:
...
// op: ra
op = getMachineOpValue(MI, MI.getOperand(0), Fixups, STI);
Value |= (op & UINT64_C(15)) << 20;
// op: rb
op = getMachineOpValue(MI, MI.getOperand(1), Fixups, STI);
Value |= (op & UINT64_C(15)) << 16;
// op: rc
op = getMachineOpValue(MI, MI.getOperand(2), Fixups, STI);
Value |= (op & UINT64_C(15)) << 12;
break;
}
...
return Value;
}
Cpu0InstrInfo.td
class ArithLogicR<bits<8> op, ...
let Inst{23-20} = ra;
let Inst{19-16} = rb;
let Inst{15-12} = rc;
...
For instance, the Cpu0 backend generates “addu $v0, $at, $v1” for the IR “%0 = add %1, %2” once llvm allocates
registers $v0, $at and $v1 for operands %0, %1 and %2, respectively. The MCOperand structure of MI.Operands[]
holds the register number assigned by the register allocation pass, and getMachineOpValue() can read it.
The getEncodingValue(Reg) call in getMachineOpValue(), shown below, maps a register name such as AT, V0 or V1
to its encoding RegNo by using the TableGen information from Cpu0RegisterInfo.td. My comments come after
“///”.
unsigned Cpu0MCCodeEmitter::
getMachineOpValue(const MCInst &MI, const MCOperand &MO,
SmallVectorImpl<MCFixup> &Fixups,
const MCSubtargetInfo &STI) const {
if (MO.isReg()) {
unsigned Reg = MO.getReg();
unsigned RegNo = Ctx.getRegisterInfo()->getEncodingValue(Reg);
return RegNo;
...
}
include/llvm/MC/MCRegisterInfo.h
void InitMCRegisterInfo(...,
const uint16_t *RET) {
...
RegEncodingTable = RET;
}
lbdex/chapters/Chapter5_1/Cpu0RegisterInfo.td
cmake_debug_build/lib/Target/Cpu0/Cpu0GenRegisterInfo.inc
namespace Cpu0 {
enum {
NoRegister,
AT = 1,
...
V0 = 19,
V1 = 20,
NUM_TARGET_REGS // 21
};
} // end namespace Cpu0
0,
13,
15,
0,
4,
5,
9,
10,
7,
8,
6,
2, /// 19, V0
3, /// 20, V1
};
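The two listings above (bit packing in getBinaryCodeForInstr() and the register encoding table) can be tried out together in a small stand-alone sketch. It does not use LLVM. The names Cpu0Reg, encodingOf() and encodeArithLogicR() are hypothetical and exist only for this illustration, and the encoding AT -> 1 is an assumption, since only V0 -> 2 and V1 -> 3 are visible in the table above.

```cpp
#include <cassert>
#include <cstdint>

// Register ids as in the generated enum above (only the ones used here).
enum Cpu0Reg : unsigned { NoRegister = 0, AT = 1, V0 = 19, V1 = 20 };

// Hypothetical stand-in for getEncodingValue(). V0 -> 2 and V1 -> 3 come
// from the table above; AT -> 1 is an assumption for this sketch.
inline unsigned encodingOf(Cpu0Reg Reg) {
  switch (Reg) {
  case AT: return 1;
  case V0: return 2;
  case V1: return 3;
  default: return 0;
  }
}

// Packs ra, rb and rc into bits 23-20, 19-16 and 15-12 of the opcode bits,
// in the same way as the generated switch case for Cpu0::ADD.
inline uint32_t encodeArithLogicR(uint32_t InstBits, Cpu0Reg Ra, Cpu0Reg Rb,
                                  Cpu0Reg Rc) {
  uint32_t Value = InstBits;
  Value |= (encodingOf(Ra) & 15u) << 20;
  Value |= (encodingOf(Rb) & 15u) << 16;
  Value |= (encodingOf(Rc) & 15u) << 12;
  return Value;
}

// Usage: "add $v0, $at, $v1" with InstBits 0x13000000 for ADD.
inline void checkAddEncoding() {
  assert(encodeArithLogicR(0x13000000u, V0, AT, V1) == 0x13213000u);
}
```

Note that the register id (19 for V0) and the encoding (2 for V0) are different numbers, which is why the lookup through the encoding table is needed before packing.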
The applyFixup() of Cpu0AsmBackend.cpp fixes up the jeq, jsub, ... instructions of the “address control flow
statements” or “function call statements” used in later chapters. Whether needsRelocateWithSymbol() of
Cpu0ELFObjectWriter.cpp returns true or false for a relocation record depends on whether that record needs its
address value adjusted at link time. If it returns true, the linker has the chance to adjust the address value with
the correct information. If it returns false, the linker has no correct information to adjust the relocation record.
Relocation records are introduced in the later chapter ELF Support.
When llc emits instructions in ELF obj format, encodeInstruction() of Cpu0MCCodeEmitter.cpp is called, since it
overrides the function of the same name in the parent class MCCodeEmitter.
lbdex/chapters/Chapter2/Cpu0InstrInfo.td
// Address operand
def mem : Operand<iPTR> {
let PrintMethod = "printMemOperand";
let MIOperandInfo = (ops CPURegs, simm16);
let EncoderMethod = "getMemEncoding";
}
// 32-bit store.
multiclass StoreM32<bits<8> op, string instr_asm, PatFrag OpNode,
bit Pseudo = 0> {
def #NAME# : StoreM<op, instr_asm, OpNode, CPURegs, mem, Pseudo>;
}
The “let EncoderMethod = “getMemEncoding”;” in Cpu0InstrInfo.td above makes llvm call function
getMemEncoding() whenever an ld or st instruction is emitted to the ELF obj, since both instructions use the mem
operand. The other functions in Cpu0MCCodeEmitter.cpp are called by these two functions.
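To make the idea of an EncoderMethod concrete, here is a minimal stand-alone sketch of what a mem operand encoder could compute. It is not the code of getMemEncoding(). The function name encodeMemOperand() is hypothetical, and the layout (base register in bits 19-16, following “let Inst{19-16} = rb” above, and a signed 16-bit offset in bits 15-0) is an assumption for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoder for a (base register, simm16) memory operand.
// Assumed layout: base register number in bits 19-16, offset in bits 15-0.
inline uint32_t encodeMemOperand(unsigned BaseRegNo, int32_t Offset) {
  assert(BaseRegNo < 16 && "register number must fit in 4 bits");
  assert(Offset >= -32768 && Offset <= 32767 && "simm16 out of range");
  uint32_t RegBits = static_cast<uint32_t>(BaseRegNo) << 16;
  // Keep only the low 16 bits so a negative offset does not spill into
  // the register field.
  uint32_t OffBits = static_cast<uint32_t>(Offset) & 0xFFFFu;
  return RegBits | OffBits;
}
```

For the operand of “ld $2, 0($2)”, encodeMemOperand(2, 0) gives 0x00020000, which the instruction encoder would then OR into the opcode and ra bits.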
After encoding, the following code writes the encoded instructions to the buffer.
src/lib/MC/MCELFStreamer.cpp
Now, let’s examine Cpu0MCTargetDesc.cpp. Cpu0MCTargetDesc.cpp does the target registration as mentioned in the
previous chapter here 1 , and the assembly output has been explained here 2 . The functions registered for ELF obj
output are listed as follows,
// MCELFStreamer.cpp
MCStreamer *llvm::createELFStreamer(MCContext &Context, MCAsmBackend &MAB,
raw_pwrite_stream &OS, MCCodeEmitter *CE,
bool RelaxAll) {
MCELFStreamer *S = new MCELFStreamer(Context, MAB, OS, CE);
if (RelaxAll)
S->getAssembler().setRelaxAll(true);
return S;
}
The createELFStreamer() above takes care of the ELF obj streamer. Fig. 5.1 below is the MCELFStreamer inheritance
tree. You can find a lot of operations in that inheritance tree.
1 http://jonathan2251.github.io/lbd/llvmstructure.html#target-registration
2 http://jonathan2251.github.io/lbd/backendstructure.html#add-asmprinter
MCInstPrinter *InstPrint,
bool isVerboseAsm) {
return new Cpu0TargetAsmStreamer(S, OS);
}
// Cpu0TargetStreamer.h
class Cpu0TargetStreamer : public MCTargetStreamer {
public:
Cpu0TargetStreamer(MCStreamer &S);
};
public:
Cpu0TargetAsmStreamer(MCStreamer &S, formatted_raw_ostream &OS);
};
// Cpu0MCCodeEmitter.cpp
MCCodeEmitter *llvm::createCpu0MCCodeEmitterEB(const MCInstrInfo &MCII,
const MCRegisterInfo &MRI,
MCContext &Ctx) {
return new Cpu0MCCodeEmitter(MCII, Ctx, false);
}
The code above instantiates two Cpu0MCCodeEmitter objects, one for big endian and the other for little endian. They
take care of the generated obj format, while RegisterELFStreamer() reuses the ELF streamer class.
Readers may ask: “What are the actual arguments in createCpu0MCCodeEmitterEB(const MCInstrInfo &MCII,
const MCRegisterInfo &MRI, MCContext &Ctx)?” and “When are they assigned?” We do not assign them at this point.
We only register the createXXX() function by function pointer (as in C,
TargetRegistry::RegisterXXX(TheCpu0Target, createXXX), where createXXX is a function pointer). LLVM keeps a
function pointer to createXXX() when we call the target registry through RegisterXXX(), and calls createXXX() back
at the proper time with the actual arguments.
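The register-now, call-back-later idea can be shown with a small stand-alone sketch. It does not use the LLVM TargetRegistry. The types Emitter, Context and Registry and the functions createEmitterEB() and createEmitterEL() are hypothetical stand-ins that only mirror the shape of the mechanism described above.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for MCCodeEmitter and for its creation arguments.
struct Emitter { bool IsLittleEndian; };
struct Context { std::string TripleName; };

// A factory is just a function pointer, as in C.
using EmitterCtor = Emitter *(*)(const Context &Ctx);

class Registry {
  std::map<std::string, EmitterCtor> Ctors;

public:
  // Registration only stores the function pointer. Nothing is created yet
  // and no arguments exist yet.
  void registerEmitter(const std::string &Target, EmitterCtor Fn) {
    assert(Fn && "factory must not be null");
    Ctors[Target] = Fn;
  }

  // Later, the framework supplies the actual arguments and calls back.
  std::unique_ptr<Emitter> create(const std::string &Target,
                                  const Context &Ctx) const {
    auto It = Ctors.find(Target);
    if (It == Ctors.end())
      return nullptr;
    return std::unique_ptr<Emitter>(It->second(Ctx));
  }
};

Emitter *createEmitterEB(const Context &) { return new Emitter{false}; }
Emitter *createEmitterEL(const Context &) { return new Emitter{true}; }
```

A caller would register both factories once, for example under the keys "cpu0" and "cpu0el", and the framework side would call create() only when an object file is actually requested.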
// Cpu0AsmBackend.cpp
MCAsmBackend *llvm::createCpu0AsmBackendEL32(const Target &T,
const MCRegisterInfo &MRI,
const Triple &TT, StringRef CPU) {
return new Cpu0AsmBackend(T, TT.getOS(), /*IsLittle*/true);
}
// Cpu0AsmBackend.h
class Cpu0AsmBackend : public MCAsmBackend {
...
}
The Cpu0AsmBackend class above is the bridge from asm to obj. Two objects take care of big endian and little endian,
respectively. It is derived from MCAsmBackend. Most of the code for object file generation is implemented by
MCELFStreamer and its parent, MCObjectStreamer.
CHAPTER
SIX
GLOBAL VARIABLES
In the last three chapters, we only accessed local variables. This chapter deals with the translation of global
variable access.
The global variable DAG translation differs from the DAG translations we have seen so far. It creates IR
DAG nodes at run time in the backend C++ code according to the llc -relocation-model option, while the other
DAGs are translated directly from IR DAG to Machine DAG according to the IR DAGs of the input file (except the Pseudo
instruction RetLR used in Chapter3_4). Readers should focus on how to add code that creates DAG nodes at run time
and how to define the pattern match in td for the DAG nodes created at run time. In addition, take care of the machine
instruction printing function for global variable related assembly directives (macros) if your backend has them.
Chapter6_1/ supports global variables. Let’s compile ch6_1.cpp with this version first, and explain the code changes
after that.
lbdex/input/ch6_1.cpp
int gStart = 3;
int gI = 100;
int test_global()
{
int c = 0;
c = gI;
return c;
}
Just like Mips, Cpu0 supports both static and pic mode. In static mode there are two different layouts of
global variables, controlled by option cpu0-use-small-section. Chapter6_1/
supports the global variable translation. Let’s run Chapter6_1/ with ch6_1.cpp via four different
option combinations, llc -relocation-model=static -cpu0-use-small-section=false
, llc -relocation-model=static -cpu0-use-small-section=true ,
llc -relocation-model=pic -cpu0-use-small-section=false and llc
-relocation-model=pic -cpu0-use-small-section=true , to trace the DAGs and Cpu0
instructions.
118-165-78-166:input Jonathan$ clang -target mips-unknown-linux-gnu -c
ch6_1.cpp -emit-llvm -o ch6_1.bc
118-165-78-166:input Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
Debug/bin/llc -march=cpu0 -relocation-model=static -cpu0-use-small-section=false
-filetype=asm -debug ch6_1.bc -o -
...
Type-legalized selection DAG: BB#0 '_Z11test_globalv:'
SelectionDAG has 12 nodes:
...
0x7ffd5902cc10: <multiple use>
0x7ffd5902cf10: ch = store 0x7ffd5902cd10, 0x7ffd5902ca10, 0x7ffd5902ce10,
0x7ffd5902cc10<ST4[%c]> [ORD=2] [ID=-3]
...
Type-legalized selection DAG: BB#0 '_Z11test_globalv:'
SelectionDAG has 12 nodes:
...
0x7fc5f382cc10: <multiple use>
0x7fc5f382cf10: ch = store 0x7fc5f382cd10, 0x7fc5f382ca10, 0x7fc5f382ce10,
0x7fc5f382cc10<ST4[%c]> [ORD=2] [ID=-3]
...
Type-legalized selection DAG: BB#0 '_Z11test_globalv:'
SelectionDAG has 11 nodes:
...
0x7fe03c02e010: <multiple use>
0x7fe03c02e118: ch = store 0x7fe03b50dee0, 0x7fe03c02de00, 0x7fe03c02df08,
0x7fe03c02e010<ST4[%c]> [ORD=3] [ID=-3]
...
Type-legalized selection DAG: BB#0 '_Z11test_globalv:'
SelectionDAG has 11 nodes:
...
0x7fad7102cc10: <multiple use>
0x7fad7102cf10: ch = store 0x7fad7102cd10, 0x7fad7102ca10, 0x7fad7102ce10,
0x7fad7102cc10<ST4[%c]> [ORD=2] [ID=-3]
lbdex/chapters/Chapter6_1/Cpu0Subtarget.h
1 http://llvm.org/docs/CommandLine.html
...
};
lbdex/chapters/Chapter6_1/Cpu0Subtarget.cpp
bool Cpu0ReserveGP;
bool Cpu0NoCpload;
// Set UseSmallSection.
UseSmallSection = UseSmallSectionOpt;
Cpu0ReserveGP = ReserveGPOpt;
Cpu0NoCpload = NoCploadOpt;
...
}
The options ReserveGPOpt and NoCploadOpt will be used by the Cpu0 linker in a later chapter. Next, add the
following code to files Cpu0BaseInfo.h, Cpu0TargetObjectFile.h, Cpu0TargetObjectFile.cpp, Cpu0RegisterInfo.cpp and
Cpu0ISelLowering.cpp.
lbdex/chapters/Chapter6_1/Cpu0BaseInfo.h
enum TOF {
...
/// MO_GOT16 - Represents the offset into the global offset table at which
/// the address the relocation entry symbol resides during execution.
MO_GOT16,
MO_GOT,
...
}; // enum TOF {
lbdex/chapters/Chapter6_1/Cpu0TargetObjectFile.h
lbdex/chapters/Chapter6_1/Cpu0TargetObjectFile.cpp
// An address must be loaded from a small section if its size does not exceed
// the small section size threshold. Data in this section must be addressed
// using gp_rel operator.
static bool IsInSmallSection(uint64_t Size) {
return Size > 0 && Size <= SSThreshold;
}
/// Return true if this global address should be placed into small data/bss
/// section. This method does all the work, except for checking the section
/// kind.
bool Cpu0TargetObjectFile::
IsGlobalInSmallSectionImpl(const GlobalValue *GV,
const TargetMachine &TM) const {
const Cpu0Subtarget &Subtarget =
*static_cast<const Cpu0TargetMachine &>(TM).getSubtargetImpl();
MCSection *
Cpu0TargetObjectFile::SelectSectionForGlobal(const GlobalValue *GV,
SectionKind Kind, Mangler &Mang,
const TargetMachine &TM) const {
// TODO: Could also support "weak" symbols as well with ".gnu.linkonce.s.*"
// sections?
lbdex/chapters/Chapter6_1/Cpu0RegisterInfo.cpp
BitVector Cpu0RegisterInfo::
getReservedRegs(const MachineFunction &MF) const {
...
Reserved.set(Cpu0::GP);
...
}
lbdex/chapters/Chapter6_1/Cpu0ISelLowering.h
// This method creates the following nodes, which are necessary for
// computing a local symbol's address:
//
// (add (load (wrapper $gp, %got(sym)), %lo(sym))
template<class NodeTy>
SDValue getAddrLocal(NodeTy *N, EVT Ty, SelectionDAG &DAG) const {
SDLoc DL(N);
//@getAddrGlobal {
// This method creates the following nodes, which are necessary for
// computing a global symbol's address:
//
// (load (wrapper $gp, %got(sym)))
template<class NodeTy>
SDValue getAddrGlobal(NodeTy *N, EVT Ty, SelectionDAG &DAG,
unsigned Flag, SDValue Chain,
const MachinePointerInfo &PtrInfo) const {
SDLoc DL(N);
SDValue Tgt = DAG.getNode(Cpu0ISD::Wrapper, DL, Ty, getGlobalReg(DAG, Ty),
getTargetNode(N, Ty, DAG, Flag));
return DAG.getLoad(Ty, DL, Chain, Tgt, PtrInfo);
}
//@getAddrGlobal }
//@getAddrGlobalLargeGOT {
// This method creates the following nodes, which are necessary for
// computing a global symbol's address in large-GOT mode:
//
// (load (wrapper (add %hi(sym), $gp), %lo(sym)))
template<class NodeTy>
SDValue getAddrGlobalLargeGOT(NodeTy *N, EVT Ty, SelectionDAG &DAG,
unsigned HiFlag, unsigned LoFlag,
SDValue Chain,
const MachinePointerInfo &PtrInfo) const {
SDLoc DL(N);
SDValue Hi = DAG.getNode(Cpu0ISD::Hi, DL, Ty,
getTargetNode(N, Ty, DAG, HiFlag));
Hi = DAG.getNode(ISD::ADD, DL, Ty, Hi, getGlobalReg(DAG, Ty));
SDValue Wrapper = DAG.getNode(Cpu0ISD::Wrapper, DL, Ty, Hi,
getTargetNode(N, Ty, DAG, LoFlag));
return DAG.getLoad(Ty, DL, Chain, Wrapper, PtrInfo);
}
//@getAddrGlobalLargeGOT }
//@getAddrNonPIC
// This method creates the following nodes, which are necessary for
// computing a symbol's address in non-PIC mode:
//
// (add %hi(sym), %lo(sym))
template<class NodeTy>
SDValue getAddrNonPIC(NodeTy *N, EVT Ty, SelectionDAG &DAG) const {
SDLoc DL(N);
SDValue Hi = getTargetNode(N, Ty, DAG, Cpu0II::MO_ABS_HI);
SDValue Lo = getTargetNode(N, Ty, DAG, Cpu0II::MO_ABS_LO);
lbdex/chapters/Chapter6_1/Cpu0ISelLowering.cpp
//@getTargetNode(GlobalAddressSDNode
SDValue Cpu0TargetLowering::getTargetNode(GlobalAddressSDNode *N, EVT Ty,
SelectionDAG &DAG,
unsigned Flag) const {
return DAG.getTargetGlobalAddress(N->getGlobal(), SDLoc(N), Ty, 0, Flag);
}
//@getTargetNode(ExternalSymbolSDNode
SDValue Cpu0TargetLowering::getTargetNode(ExternalSymbolSDNode *N, EVT Ty,
SelectionDAG &DAG,
unsigned Flag) const {
return DAG.getTargetExternalSymbol(N->getSymbol(), Ty, Flag);
}
SDValue Cpu0TargetLowering::
LowerOperation(SDValue Op, SelectionDAG &DAG) const
{
switch (Op.getOpcode())
{
}
return SDValue();
}
getTargetMachine().getObjFileLowering());
//@lga 1 {
EVT Ty = Op.getValueType();
GlobalAddressSDNode *N = cast<GlobalAddressSDNode>(Op);
const GlobalValue *GV = N->getGlobal();
//@lga 1 }
if (!isPositionIndependent()) {
//@ %gp_rel relocation
if (TLOF->IsGlobalInSmallSection(GV, getTargetMachine())) {
SDValue GA = DAG.getTargetGlobalAddress(GV, DL, MVT::i32, 0,
Cpu0II::MO_GPREL);
SDValue GPRelNode = DAG.getNode(Cpu0ISD::GPRel, DL,
DAG.getVTList(MVT::i32), GA);
SDValue GPReg = DAG.getRegister(Cpu0::GP, MVT::i32);
return DAG.getNode(ISD::ADD, DL, MVT::i32, GPReg, GPRelNode);
}
//@large section
if (!TLOF->IsGlobalInSmallSection(GV, getTargetMachine()))
return getAddrGlobalLargeGOT(
N, Ty, DAG, Cpu0II::MO_GOT_HI16, Cpu0II::MO_GOT_LO16,
DAG.getEntryNode(),
MachinePointerInfo::getGOT(DAG.getMachineFunction()));
return getAddrGlobal(
N, Ty, DAG, Cpu0II::MO_GOT, DAG.getEntryNode(),
MachinePointerInfo::getGOT(DAG.getMachineFunction()));
}
The setOperationAction(ISD::GlobalAddress, MVT::i32, Custom) tells llc that we implement the global address
operation in the C++ function Cpu0TargetLowering::LowerOperation(). LLVM calls this function only when it wants
to translate the IR DAG of loading a global variable into machine code. All the Custom type IR operations set
by setOperationAction(ISD::XXX, MVT::XXX, Custom) in the constructor Cpu0TargetLowering() make
llvm call Cpu0TargetLowering::LowerOperation() in stage “Legalized selection DAG”. Inside it, the global address
access operation is identified by checking whether the opcode of the DAG node is ISD::GlobalAddress.
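The choice made in the lowerGlobalAddress() fragment listed above can be summarized by a small stand-alone sketch. It does not use LLVM. The names AddrForm and selectAddrForm() are hypothetical, and the sketch only covers the branches visible in the listing (elided branches, if any, are not modeled).

```cpp
#include <cassert>

// The four address forms that appear in the lowerGlobalAddress() fragment.
enum class AddrForm {
  GpRel,    // static, small section: add $gp, %gp_rel(sym)
  AbsHiLo,  // static, data/bss:      %hi(sym) combined with %lo(sym)
  GotLarge, // pic, data/bss:         MO_GOT_HI16 / MO_GOT_LO16
  Got       // pic, small section:    MO_GOT
};

// Hypothetical summary of the decision, driven by the two llc options
// -relocation-model and -cpu0-use-small-section.
inline AddrForm selectAddrForm(bool IsPIC, bool InSmallSection) {
  if (!IsPIC)
    return InSmallSection ? AddrForm::GpRel : AddrForm::AbsHiLo;
  return InSmallSection ? AddrForm::Got : AddrForm::GotLarge;
}

// Usage: the four option combinations traced in this chapter.
inline void checkAllCombinations() {
  assert(selectAddrForm(false, false) == AddrForm::AbsHiLo);
  assert(selectAddrForm(false, true) == AddrForm::GpRel);
  assert(selectAddrForm(true, false) == AddrForm::GotLarge);
  assert(selectAddrForm(true, true) == AddrForm::Got);
}
```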
Finally, add the following code in Cpu0ISelDAGToDAG.cpp and Cpu0InstrInfo.td.
lbdex/chapters/Chapter6_1/Cpu0ISelDAGToDAG.h
SDNode *getGlobalBaseReg();
lbdex/chapters/Chapter6_1/Cpu0ISelDAGToDAG.cpp
//@static
if (TM.getRelocationModel() != Reloc::PIC_) {
if ((Addr.getOpcode() == ISD::TargetExternalSymbol ||
Addr.getOpcode() == ISD::TargetGlobalAddress))
return false;
}
...
}
...
}
lbdex/chapters/Chapter6_1/Cpu0InstrInfo.td
// hi/lo relocs
let Predicates = [Ch6_1] in {
def : Pat<(Cpu0Hi tglobaladdr:$in), (LUi tglobaladdr:$in)>;
}
// gp_rel relocs
let Predicates = [Ch6_1] in {
def : Pat<(add CPURegs:$gp, (Cpu0GPRel tglobaladdr:$in)),
(ORi CPURegs:$gp, tglobaladdr:$in)>;
}
//@ wrapper_pic
let Predicates = [Ch6_1] in {
class WrapperPat<SDNode node, Instruction ORiOp, RegisterClass RC>:
Pat<(Cpu0Wrapper RC:$gp, node:$in),
(ORiOp RC:$gp, node:$in)>;
From Table: Cpu0 global variable options, option cpu0-use-small-section=false puts the global variable in data/bss
while cpu0-use-small-section=true puts it in sdata/sbss. The sdata stands for small data area. Sections data and sdata
are areas for global variables with an initial value (such as int gI = 100 in this example), while sections bss and sbss
are areas for global variables without an initial value (for instance, int gI;).
The data/bss are 32-bit addressable areas since Cpu0 is a 32-bit architecture. Option cpu0-use-small-section=false
generates the following instructions.
...
lui $2, %hi(gI)
ori $2, $2, %lo(gI)
ld $2, 0($2)
...
.type gStart,@object # @gStart
.data
.globl gStart
.align 2
gStart:
.4byte 2 # 0x2
.size gStart, 4
In the code above, lui loads the high 16 bits of the gI address into register $2 and shifts them left by 16 bits. Now
register $2 holds the high part of the gI absolute address. Next, ori combines register $2 with the low 16 bits of the
gI absolute address and puts the result into $2. At this point, $2 holds the gI memory address. Finally, the
instruction “ld $2, 0($2)” loads the content of gI.
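The %hi/%lo split can be checked with a few lines of stand-alone C++. This is only an arithmetic sketch. The helper names hi16(), lo16() and materialize() are hypothetical, and it assumes that ori zero-extends its 16-bit immediate, so no carry adjustment of the high part is needed.

```cpp
#include <cassert>
#include <cstdint>

// High and low 16-bit halves of a 32-bit absolute address.
inline uint32_t hi16(uint32_t Addr) { return Addr >> 16; }
inline uint32_t lo16(uint32_t Addr) { return Addr & 0xFFFFu; }

// Mimics "lui $2, %hi(sym)" followed by "ori $2, $2, %lo(sym)".
inline uint32_t materialize(uint32_t Addr) {
  uint32_t Reg = hi16(Addr) << 16; // lui: place the high half, low half is 0
  Reg |= lo16(Addr);               // ori: fill in the low half
  return Reg;
}

// Usage: an address whose low half has bit 15 set still round-trips,
// because OR-ing a zero-extended immediate never borrows from the high half.
inline void checkHiLo() {
  assert(hi16(0x12348000u) == 0x1234u);
  assert(lo16(0x12348000u) == 0x8000u);
  assert(materialize(0x12348000u) == 0x12348000u);
}
```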
The llc -relocation-model=static is for the absolute address mode, which must be used with static linking.
Dynamic linking must be encoded with Position Independent Addressing. As you can see, a PC relative address
can be resolved at static link time (the offset between the address of gI and the instruction “lui $2, %hi(gI)” can be
calculated). Since Cpu0 uses PC relative address coding, this program can be loaded to any address and run correctly
there. If this program uses absolute addresses and is loaded at a specific address known at link time, the relocation
records of the gI access instructions, such as “lui $2, %hi(gI)” and “ori $2, $2, %lo(gI)”, can be resolved at link time.
On the other hand, if this program uses absolute addresses and the loading address is only known at load time, then
these relocation records are resolved by the loader at load time.
IsGlobalInSmallSection() returns true or false depending on UseSmallSectionOpt.
With option llc -relocation-model=static -cpu0-use-small-section=false , the following code fragment of
lowerGlobalAddress() translates DAG (GlobalAddress<i32* @gI> 0) into (add Cpu0ISD::Hi<gI offset Hi16>
Cpu0ISD::Lo<gI offset Lo16>) in stage “Legalized selection DAG” as below.
lbdex/chapters/Chapter6_1/Cpu0ISelLowering.h
// This method creates the following nodes, which are necessary for
// computing a symbol's address in non-PIC mode:
//
// (add %hi(sym), %lo(sym))
template<class NodeTy>
SDValue getAddrNonPIC(NodeTy *N, EVT Ty, SelectionDAG &DAG) const {
SDLoc DL(N);
SDValue Hi = getTargetNode(N, Ty, DAG, Cpu0II::MO_ABS_HI);
SDValue Lo = getTargetNode(N, Ty, DAG, Cpu0II::MO_ABS_LO);
return DAG.getNode(ISD::ADD, DL, Ty,
DAG.getNode(Cpu0ISD::Hi, DL, Ty, Hi),
DAG.getNode(Cpu0ISD::Lo, DL, Ty, Lo));
}
lbdex/chapters/Chapter6_1/Cpu0ISelLowering.cpp
if (getTargetMachine().getRelocationModel() != Reloc::PIC_) {
...
// %hi/%lo relocation
return getAddrNonPIC(N, Ty, DAG);
}
...
}
...
Type-legalized selection DAG: BB#0 '_Z3funv:entry'
SelectionDAG has 12 nodes:
...
0x7ffd5902cc10: <multiple use>
0x7ffd5902cf10: ch = store 0x7ffd5902cd10, 0x7ffd5902ca10, 0x7ffd5902ce10,
0x7ffd5902cc10<ST4[%c]> [ORD=2] [ID=-3]
Finally, the pattern defined in Cpu0InstrInfo.td as follows translates DAG (add Cpu0ISD::Hi<gI offset Hi16>
Cpu0ISD::Lo<gI offset Lo16>) into Cpu0 instructions as below.
lbdex/chapters/Chapter6_1/Cpu0InstrInfo.td
// hi/lo relocs
let Predicates = [Ch6_1] in {
def : Pat<(Cpu0Hi tglobaladdr:$in), (LUi tglobaladdr:$in)>;
}
...
lui $2, %hi(gI)
ori $2, $2, %lo(gI)
...
As above, Pat<(...),(...)> includes two DAG lists. The left is the IR DAG and the right is the machine instruction
DAG. “Pat<(Cpu0Hi tglobaladdr:$in), (LUi tglobaladdr:$in)>;” translates DAG (Cpu0ISD::Hi tglobaladdr)
into (lui (ori ZERO, tglobaladdr), 16). “Pat<(add CPURegs:$hi, (Cpu0Lo tglobaladdr:$lo)), (ORi CPURegs:$hi,
tglobaladdr:$lo)>;” translates DAG (add Cpu0ISD::Hi, Cpu0ISD::Lo) into the Cpu0 instruction (ori Cpu0ISD::Hi,
Cpu0ISD::Lo).
The sdata/sbss are 16-bit addressable areas placed in the ELF file for fast access. Option
cpu0-use-small-section=true generates the following instructions.
.globl gI
.align 2
gI:
.4byte 100 # 0x64
.size gI, 4
lbdex/chapters/Chapter6_1/Cpu0ISelLowering.cpp
if (!isPositionIndependent()) {
//@ %gp_rel relocation
if (TLOF->IsGlobalInSmallSection(GV, getTargetMachine())) {
SDValue GA = DAG.getTargetGlobalAddress(GV, DL, MVT::i32, 0,
Cpu0II::MO_GPREL);
SDValue GPRelNode = DAG.getNode(Cpu0ISD::GPRel, DL,
DAG.getVTList(MVT::i32), GA);
SDValue GPReg = DAG.getRegister(Cpu0::GP, MVT::i32);
return DAG.getNode(ISD::ADD, DL, MVT::i32, GPReg, GPRelNode);
}
...
}
...
}
...
Type-legalized selection DAG: BB#0 '_Z3funv:entry'
SelectionDAG has 12 nodes:
...
0x7fc5f382cc10: <multiple use>
0x7fc5f382cf10: ch = store 0x7fc5f382cd10, 0x7fc5f382ca10, 0x7fc5f382ce10,
0x7fc5f382cc10<ST4[%c]> [ORD=2] [ID=-3]
Finally, the pattern defined in Cpu0InstrInfo.td as follows translates DAG (add register %GP
Cpu0ISD::GPRel<gI offset>) into a Cpu0 instruction as below.
lbdex/chapters/Chapter6_1/Cpu0InstrInfo.td
// gp_rel relocs
let Predicates = [Ch6_1] in {
def : Pat<(add CPURegs:$gp, (Cpu0GPRel tglobaladdr:$in)),
(ORi CPURegs:$gp, tglobaladdr:$in)>;
}
“Pat<(add CPURegs:$gp, (Cpu0GPRel tglobaladdr:$in)), (ORi CPURegs:$gp, tglobaladdr:$in)>;” translates
(add register %GP Cpu0ISD::GPRel tglobaladdr) into (ori $gp, tglobaladdr).
In this mode, the $gp content is assigned at compile/link time, changes only when the program is loaded, and stays
fixed while the program runs. In contrast, with -relocation-model=pic the $gp can change while the program runs.
For this example, if the loader assigns $gp to the start address of .sdata when program ch6_1.cpu0.s is loaded,
then the linker can calculate %gp_rel(gI) (= the relative address distance between gI and the start of the .sdata
section). This means the relocation record can be resolved at link time, which is why it is static mode.
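The calculation the linker performs for %gp_rel(sym) can be sketched as follows. This is a stand-alone illustration, not linker code. The function name gpRelOffset() and the addresses in the usage check are hypothetical, and the sketch assumes that $gp points at the start of .sdata and that the offset is an unsigned 16-bit distance.

```cpp
#include <cassert>
#include <cstdint>

// Computes the distance between a symbol and the start of .sdata.
// Returns false when the symbol is not reachable with a 16-bit offset.
inline bool gpRelOffset(uint32_t SymAddr, uint32_t SdataStart,
                        uint16_t &Offset) {
  if (SymAddr < SdataStart)
    return false; // symbol lies before the small data area
  uint32_t Diff = SymAddr - SdataStart;
  if (Diff > 0xFFFFu)
    return false; // too far away for a 16-bit offset
  Offset = static_cast<uint16_t>(Diff);
  return true;
}

// Usage with made-up addresses: .sdata starts at 0x2000, gStart is placed
// first (4 bytes), so gI sits 4 bytes after the start.
inline void checkGpRel() {
  uint16_t Off = 0;
  assert(gpRelOffset(0x2004u, 0x2000u, Off) && Off == 4);
  assert(!gpRelOffset(0x20000u, 0x2000u, Off));
}
```

Both inputs are known once the linker has laid out .sdata, which matches the statement above that this relocation can be resolved at link time.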
In this mode, we reserve $gp for a specific fixed address determined when the program is loaded. As a result, $gp
cannot be allocated as a general purpose register for variables. The following code tells llvm never to allocate $gp
for variables.
lbdex/chapters/Chapter6_1/Cpu0Subtarget.cpp
// Set UseSmallSection.
UseSmallSection = UseSmallSectionOpt;
Cpu0ReserveGP = ReserveGPOpt;
Cpu0NoCpload = NoCploadOpt;
#ifdef ENABLE_GPRESTORE
if (!TM.isPositionIndependent() && !UseSmallSection && !Cpu0ReserveGP)
FixGlobalBaseReg = false;
else
#endif
FixGlobalBaseReg = true;
lbdex/chapters/Chapter6_1/Cpu0RegisterInfo.cpp
BitVector Cpu0RegisterInfo::
getReservedRegs(const MachineFunction &MF) const {
//@getReservedRegs body {
...
}
.data
.globl gStart
.align 2
gStart:
.4byte 2 # 0x2
.size gStart, 4
The following code fragment of Cpu0AsmPrinter.cpp emits the .cpload asm pseudo instruction at the function entry
point as below.
lbdex/chapters/Chapter6_1/Cpu0MachineFunction.h
GlobalBaseReg(0),
...
};
lbdex/chapters/Chapter6_1/Cpu0MachineFunction.cpp
unsigned Cpu0FunctionInfo::getGlobalBaseReg() {
return GlobalBaseReg = Cpu0::GP;
}
lbdex/chapters/Chapter6_1/Cpu0AsmPrinter.cpp
} else if (EmitCPLoad) {
SmallVector<MCInst, 4> MCInsts;
MCInstLowering.LowerCPLOAD(MCInsts);
for (SmallVector<MCInst, 4>::iterator I = MCInsts.begin();
I != MCInsts.end(); ++I)
OutStreamer->EmitInstruction(*I, getSubtargetInfo());
...
.set noreorder
.cpload $6
.set nomacro
...
The .cpload is an assembly directive (macro) which expands to several instructions. Issue .cpload before .set
nomacro, since the .set nomacro option causes the assembler to print a warning message whenever an assembler
operation generates more than one machine language instruction; see the Mips ABI 2 .
The following code expands .cpload into machine instructions as below. “0fa00000 09aa0000 13aa6000” is the
machine code of the .cpload instructions, displayed in the comments of Cpu0MCInstLower.cpp.
lbdex/chapters/Chapter6_1/Cpu0MCInstLower.h
2 http://www.linux-mips.org/pub/linux/mips/doc/ABI/mipsabi.pdf
private:
MCOperand LowerSymbolOperand(const MachineOperand &MO,
MachineOperandType MOTy, unsigned Offset) const;
...
}
lbdex/chapters/Chapter6_1/Cpu0MCInstLower.cpp
MCInsts.resize(3);
Note: // Mips ABI: _gp_disp After calculating the gp, a function allocates the local stack space and saves the gp on
the stack, so it can be restored after subsequent function calls. In other words, the gp is a caller saved register.
...
_gp_disp represents the offset between the beginning of the function and the global offset table. Various optimizations
are possible in this code example and the others that follow. For example, the calculation of gp need not be done for a
position-independent function that is strictly local to an object module.
The _gp_disp above is a relocation record. It means that both machine instructions 0da00000 (offset 0) and 0daa0000
(offset 8), which correspond to the assembly “ori $gp, $zero, %hi(_gp_disp)” and “ori $gp, $gp, %lo(_gp_disp)”
respectively, are relocation records that depend on _gp_disp. When the loader or OS loads the dynamic function at
memory address x, it can calculate _gp_disp as (x - start address of .data) and adjust the offsets in these two
instructions accordingly. Since a shared function is loaded when it is called, the relocation record
“ld $2, %got(gI)($gp)” cannot be resolved at link time. Although the relocation record is resolved at load time, the
name binding is static, since the linker delivers the memory address to the loader, and the loader can resolve it just
by calculating the offset directly. The memory reference is bound to the offset of _gp_disp at link time. The ELF
relocation records will be introduced in Chapter ELF Support, so don’t worry if you don’t quite understand it at this point.
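To make the patching step more concrete, here is a minimal sketch of how a loader could rewrite the two relocated instructions once _gp_disp is known. It is only an illustration under stated assumptions: the 16-bit immediate is assumed to occupy the low half of each 32-bit instruction word, a shift left by 16 is assumed to execute between the two ori instructions (the instruction at offset 4), and the names patchImm16, computeGp and kGpDisp are hypothetical, not part of lbdex or LLVM.

```cpp
#include <cstdint>

// Hypothetical helper: overwrite the low 16 bits (assumed to be the
// immediate field) of a 32-bit instruction word, keeping the rest intact.
constexpr uint32_t patchImm16(uint32_t insn, uint32_t imm16) {
  return (insn & 0xFFFF0000u) | (imm16 & 0xFFFFu);
}

// Model of the assumed sequence
//   ori $gp, $zero, %hi(_gp_disp)
//   shl $gp, $gp, 16
//   ori $gp, $gp, %lo(_gp_disp)
// reading the immediates back out of the two patched instruction words.
constexpr uint32_t computeGp(uint32_t oriHi, uint32_t oriLo) {
  return ((oriHi & 0xFFFFu) << 16) | (oriLo & 0xFFFFu);
}

// Example displacement, chosen only for illustration.
constexpr uint32_t kGpDisp = 0x12345678u;
constexpr uint32_t kInsn0 = patchImm16(0x0da00000u, kGpDisp >> 16);
constexpr uint32_t kInsn8 = patchImm16(0x0daa0000u, kGpDisp);

static_assert(kInsn0 == 0x0da01234u, "high half patched into offset 0");
static_assert(kInsn8 == 0x0daa5678u, "low half patched into offset 8");
static_assert(computeGp(kInsn0, kInsn8) == kGpDisp, "gp is reassembled");
```

The static_assert lines check the sketch at compile time: only the immediate fields change, and the two halves recombine into the original displacement.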
With the llc option -relocation-model=pic, the following code fragment of lowerGlobalAddress() translates
the DAG (GlobalAddress<i32* @gI> 0) into (load EntryToken,
(Cpu0ISD::Wrapper Register %GP, TargetGlobalAddress<i32* @gI> 0)) in the stage “Legalized selection DAG”, as
below.
lbdex/chapters/Chapter6_1/Cpu0ISelLowering.h
// This method creates the following nodes, which are necessary for
// computing a global symbol's address:
//
// (load (wrapper $gp, %got(sym)))
template<class NodeTy>
SDValue getAddrGlobal(NodeTy *N, EVT Ty, SelectionDAG &DAG,
unsigned Flag, SDValue Chain,
const MachinePointerInfo &PtrInfo) const {
SDLoc DL(N);
SDValue Tgt = DAG.getNode(Cpu0ISD::Wrapper, DL, Ty, getGlobalReg(DAG, Ty),
getTargetNode(N, Ty, DAG, Flag));
return DAG.getLoad(Ty, DL, Chain, Tgt, PtrInfo);
}
lbdex/chapters/Chapter6_1/Cpu0ISelLowering.cpp
EVT Ty = Op.getValueType();
GlobalAddressSDNode *N = cast<GlobalAddressSDNode>(Op);
const GlobalValue *GV = N->getGlobal();
if (TLOF->IsGlobalInSmallSection(GV, getTargetMachine())) {
SDValue GA = DAG.getTargetGlobalAddress(GV, DL, MVT::i32, 0,
Cpu0II::MO_GPREL);
SDValue GPRelNode = DAG.getNode(Cpu0ISD::GPRel, DL,
DAG.getVTList(MVT::i32), GA);
SDValue GPReg = DAG.getRegister(Cpu0::GP, MVT::i32);
return DAG.getNode(ISD::ADD, DL, MVT::i32, GPReg, GPRelNode);
...
}
lbdex/chapters/Chapter6_1/Cpu0ISelDAGToDAG.cpp
...
}
...
Type-legalized selection DAG: BB#0 '_Z3funv:entry'
SelectionDAG has 12 nodes:
...
0x7fad7102cc10: <multiple use>
0x7fad7102cf10: ch = store 0x7fad7102cd10, 0x7fad7102ca10, 0x7fad7102ce10,
0x7fad7102cc10<ST4[%c]> [ORD=2] [ID=-3]
Finally, the pattern of the Cpu0 instruction ld, defined earlier in Cpu0InstrInfo.td, translates the DAG (load EntryToken,
(Cpu0ISD::Wrapper Register %GP, TargetGlobalAddress<i32* @gI> 0)) into the Cpu0 instruction as follows,
...
ld $2, %got(gI)($gp)
...
Recall that in pic mode, Cpu0 uses “.cpload” and “ld $2, %got(gI)($gp)” to access a global variable, as Mips does. It takes
4 instructions in both Cpu0 and Mips. The cost comes from not assuming that register $gp is always assigned
to the address of .sdata and fixed there. Even if we reserve $gp in this function, the $gp register can be changed in other
functions. In the last sub-section, $gp is assumed to be preserved in every function. If $gp is fixed during run time,
then “.cpload” can be removed here and global variable access costs only one instruction. The price of
removing “.cpload” is losing one general purpose register, $gp, which could otherwise be allocated for variables. In the last
sub-section, .sdata mode, we remove “.cpload” since it is static link. In pic mode, dynamic loading already takes
much time, so removing “.cpload” at the cost of losing one general purpose register in all functions is not worthwhile
here. The relocation records of “.cpload” from llc -relocation-model=pic can also be resolved at the link stage
if we want to link this function statically.
lbdex/chapters/Chapter6_1/Cpu0ISelLowering.h
// This method creates the following nodes, which are necessary for
// computing a global symbol's address in large-GOT mode:
//
// (load (wrapper (add %hi(sym), $gp), %lo(sym)))
template<class NodeTy>
SDValue getAddrGlobalLargeGOT(NodeTy *N, EVT Ty, SelectionDAG &DAG,
unsigned HiFlag, unsigned LoFlag,
SDValue Chain,
const MachinePointerInfo &PtrInfo) const {
SDLoc DL(N);
SDValue Hi = DAG.getNode(Cpu0ISD::Hi, DL, Ty,
getTargetNode(N, Ty, DAG, HiFlag));
Hi = DAG.getNode(ISD::ADD, DL, Ty, Hi, getGlobalReg(DAG, Ty));
SDValue Wrapper = DAG.getNode(Cpu0ISD::Wrapper, DL, Ty, Hi,
getTargetNode(N, Ty, DAG, LoFlag));
return DAG.getLoad(Ty, DL, Chain, Wrapper, PtrInfo);
}
lbdex/chapters/Chapter6_1/Cpu0ISelLowering.cpp
EVT Ty = Op.getValueType();
GlobalAddressSDNode *N = cast<GlobalAddressSDNode>(Op);
const GlobalValue *GV = N->getGlobal();
if (!TLOF->IsGlobalInSmallSection(GV, getTargetMachine()))
return getAddrGlobalLargeGOT(
N, Ty, DAG, Cpu0II::MO_GOT_HI16, Cpu0II::MO_GOT_LO16,
DAG.getEntryNode(),
MachinePointerInfo::getGOT(DAG.getMachineFunction()));
return getAddrGlobal(
N, Ty, DAG, Cpu0II::MO_GOT, DAG.getEntryNode(),
MachinePointerInfo::getGOT(DAG.getMachineFunction()));
}
...
Type-legalized selection DAG: BB#0 '_Z3funv:'
SelectionDAG has 10 nodes:
...
0x7fb77a02cd10: ch = store 0x7fb779c10a08, 0x7fb77a02ca10, 0x7fb77a02cb10,
0x7fb77a02cc10<ST4[%c]> [ORD=1] [ID=-3]
Finally, the pattern of the Cpu0 instruction ld, defined earlier in Cpu0InstrInfo.td, translates the DAG (load EntryToken,
(Cpu0ISD::Wrapper (add Cpu0ISD::Hi<gI offset Hi16>, Register %GP), Cpu0ISD::Lo<gI offset Lo16>)) into Cpu0
instructions as below.
...
ori $2, $zero, %got_hi(gI)
shl $2, $2, 16
add $2, $2, $gp
ld $2, %got_lo(gI)($2)
...
The following code in Cpu0InstrInfo.td is needed for the example input ch8_2_select_global_pic.cpp. Since
ch8_2_select_global_pic.cpp uses the llvm IR select, it cannot be run at this point. It will be run later in Chapter Control
flow statements.
lbdex/chapters/Chapter6_1/Cpu0InstrInfo.td
lbdex/input/ch8_2_select_global_pic.cpp
volatile int a1 = 1;
volatile int b1 = 2;
int test_select_global_pic()
{
if (a1 < b1)
return gI1;
else
return gJ1;
}
The code above is for global address DAG translation. Next, add the following code to Cpu0MCInstLower.cpp and
Cpu0ISelLowering.cpp for the operand printing function of global variables.
lbdex/chapters/Chapter6_1/Cpu0MCInstLower.cpp
switch(MO.getTargetFlags()) {
default: llvm_unreachable("Invalid target flag!");
case Cpu0II::MO_NO_FLAG:
break;
case Cpu0II::MO_GOT:
TargetKind = Cpu0MCExpr::CEK_GOT;
break;
// ABS_HI and ABS_LO is for llc -march=cpu0 -relocation-model=static (global
// var in .data).
case Cpu0II::MO_ABS_HI:
TargetKind = Cpu0MCExpr::CEK_ABS_HI;
break;
case Cpu0II::MO_ABS_LO:
TargetKind = Cpu0MCExpr::CEK_ABS_LO;
break;
case Cpu0II::MO_GOT_HI16:
TargetKind = Cpu0MCExpr::CEK_GOT_HI16;
break;
case Cpu0II::MO_GOT_LO16:
TargetKind = Cpu0MCExpr::CEK_GOT_LO16;
break;
}
switch (MOTy) {
case MachineOperand::MO_GlobalAddress:
Symbol = AsmPrinter.getSymbol(MO.getGlobal());
Offset += MO.getOffset();
break;
default:
llvm_unreachable("<unknown operand type>");
}
if (Offset) {
// Assume offset is never negative.
assert(Offset > 0);
Expr = MCBinaryExpr::createAdd(Expr, MCConstantExpr::create(Offset, *Ctx),
*Ctx);
}
if (TargetKind != Cpu0MCExpr::CEK_None)
Expr = Cpu0MCExpr::create(TargetKind, Expr, *Ctx);
return MCOperand::createExpr(Expr);
switch (MOTy) {
case MachineOperand::MO_GlobalAddress:
//@1
return LowerSymbolOperand(MO, MOTy, offset);
...
}
...
}
The Cpu0MCExpr::printImpl() of Cpu0InstPrinter.cpp from the last chapter is also used for printing global variable
operands.
The following function prints the DAG node names of this chapter for llc -debug. It was already added in Chapter3_1.
lbdex/chapters/Chapter3_1/Cpu0ISelLowering.cpp
6.5 Summary
Instruction selection for global variables differs from ordinary IR node translation: it has a static
(absolute address) mode and a pic mode. The backend handles this translation by creating DAG nodes in function
lowerGlobalAddress(), which is called by LowerOperation(). Function LowerOperation() takes care of all operations of
Custom type. The backend marks the global address as a Custom operation by
“setOperationAction(ISD::GlobalAddress, MVT::i32, Custom);” in the Cpu0TargetLowering() constructor. Each addressing
mode creates its own DAG list at run time. By setting the pattern Pat<> in Cpu0InstrInfo.td, llvm can apply the
compiler mechanism, pattern matching, in the Instruction Selection stage.
There are three types for setXXXAction(): Promote, Expand and Custom. Except for Custom, the other two usually need
no coding. See the reference 3 .
As shown in this chapter, global variables can be placed in .sdata/.sbss with the option -cpu0-use-small-section=true.
The small data section (16-bit addressable) may overflow at the link stage. When that happens,
the linker reports the error and forces the toolchain user to fix it. As a result, the toolchain user needs to
reconsider which global variables should be moved from .sdata/.sbss to .data/.bss by setting the option
-cpu0-use-small-section=false in the Makefile as follows,
Makefile
The rule for global variable allocation is “put the small and frequently used variables in the small 16-bit addressable area”.
3 http://llvm.org/docs/WritingAnLLVMBackend.html#the-selectiondag-legalize-phase
SEVEN
Until now, we have only handled the int and long types of 32-bit size. This chapter introduces other types, such as
pointers, and types that are not 32-bit in size, which include bool, char, short int and long long.
To support pointers to local variables, add the following code fragment to Cpu0InstrInfo.td and Cpu0InstPrinter.cpp,
lbdex/chapters/Chapter7_1/Cpu0InstrInfo.td
lbdex/chapters/Chapter3_2/InstPrinter/Cpu0InstPrinter.h
lbdex/chapters/Chapter3_2/InstPrinter/Cpu0InstPrinter.cpp
As commented in Cpu0InstPrinter.cpp, printMemOperandEA was added early, in Chapter3_2, since the DAG data node
mem_ea of Cpu0InstrInfo.td cannot be disabled by ch7_1_localpointer; only opcode nodes can be disabled. Running
ch7_1_localpointer.cpp with the code of Chapter7_1/, which supports pointers to local variables, gives the following result,
lbdex/input/ch7_1_localpointer.cpp
int test_local_pointer()
{
int b = 3;
int* p = &b;
return *p;
}
To support the signed and unsigned types of char and short int, add the following code to Chapter7_1/.
lbdex/chapters/Chapter7_1/Cpu0InstrInfo.td
lbdex/input/ch7_1_char_in_struct.cpp
struct Date
{
short year;
char month;
char day;
char hour;
char minute;
char second;
};
int test_char()
{
unsigned char a = b[1];
char c = (char)b[1];
return 0;
}
.type b,@object # @b
.data
.globl b
b:
.asciz "abc"
.size b, 4
lbdex/input/ch7_1_char_short.cpp
int test_signed_char()
{
char a = 0x80;
int i = (signed int)a;
i = i + 2; // i = (-128+2) = -126
return i;
}
int test_unsigned_char()
{
unsigned char c = 0x80;
unsigned int ui = (unsigned int)c;
ui = ui + 2; // ui = (128+2) = 130
return (int)ui;
}
int test_signed_short()
{
short a = 0x8000;
int i = (signed int)a;
i = i + 2; // i = (-32768+2) = -32766
return i;
}
int test_unsigned_short()
{
unsigned short c = 0x8000;
unsigned int ui = (unsigned int)c;
ui = ui + 2; // ui = (32768+2) = 32770
c = (unsigned short)ui;
return (int)ui;
}
attributes #0 = { nounwind }
...
.globl _Z16test_signed_charv
...
lb $2, 4($sp)
...
.end _Z16test_signed_charv
.globl _Z18test_unsigned_charv
...
lbu $2, 4($sp)
...
.end _Z18test_unsigned_charv
.globl _Z17test_signed_shortv
...
lh $2, 4($sp)
...
.end _Z17test_signed_shortv
.globl _Z19test_unsigned_shortv
...
lhu $2, 4($sp)
...
.end _Z19test_unsigned_shortv
...
As you can see, lb/lh are for the signed byte/short types while lbu/lhu are for the unsigned byte/short types. To support
the C type-cast or type-conversion feature efficiently, Cpu0 provides the instruction “lb” to convert type char to int
with a single instruction. The other instructions lbu, lh, lhu, sb and sh serve the signed and unsigned conversions of
the byte and short types. Their differences have been explained in Chapter 2.
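The difference between the sign-extending loads (lb, lh) and the zero-extending loads (lbu, lhu) can be sketched on the host with plain integer arithmetic. This is only an illustrative model of what the loads do to the fetched bits; the helper names sext8, zext8, sext16 and zext16 are hypothetical and do not appear in the Cpu0 code.

```cpp
#include <cstdint>

// Sign-extend the low 8 bits to 32 bits, as lb does with a loaded byte.
constexpr int32_t sext8(uint32_t raw) {
  return static_cast<int32_t>((raw & 0xFFu) ^ 0x80u) - 0x80;
}

// Zero-extend the low 8 bits to 32 bits, as lbu does.
constexpr uint32_t zext8(uint32_t raw) { return raw & 0xFFu; }

// Sign-extend the low 16 bits to 32 bits, as lh does with a loaded halfword.
constexpr int32_t sext16(uint32_t raw) {
  return static_cast<int32_t>((raw & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

// Zero-extend the low 16 bits to 32 bits, as lhu does.
constexpr uint32_t zext16(uint32_t raw) { return raw & 0xFFFFu; }

// The same values as in ch7_1_char_short.cpp.
static_assert(sext8(0x80) + 2 == -126, "signed char 0x80 is -128");
static_assert(zext8(0x80) + 2 == 130u, "unsigned char 0x80 is 128");
static_assert(sext16(0x8000) + 2 == -32766, "signed short 0x8000 is -32768");
static_assert(zext16(0x8000) + 2 == 32770u, "unsigned short 0x8000 is 32768");
```

The four checks reproduce the values given in the comments of the test functions above.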
To support loading the bool type, the following code is added.
lbdex/chapters/Chapter7_1/Cpu0ISelLowering.cpp
...
}
The purpose of setBooleanContents() is described in the following comment, but I don’t know it well. Without it, ch7_1_bool2.ll still works as
below. The IR input file ch7_1_bool2.ll is used for testing here since the C++ version needs flow control, which is not
supported at this point. File ch_run_backend.cpp includes the test fragment for bool as below.
include/llvm/Target/TargetLowering.h
lbdex/input/ch7_1_bool2.ll
.section .mdebug.abi32
.previous
.file "ch7_1_bool2.ll"
.text
.globl verify_load_bool
.align 2
.type verify_load_bool,@function
.ent verify_load_bool # @verify_load_bool
verify_load_bool:
.cfi_startproc
.frame $sp,8,$lr
.mask 0x00000000,0
.set noreorder
.set nomacro
# BB#0: # %entry
addiu $sp, $sp, -8
$tmp1:
.cfi_def_cfa_offset 8
The ch7_1_bool.cpp is the C language version of the bool test. You can run it with Chapter8_1 to get a result similar
to that of ch7_1_bool2.ll.
lbdex/input/ch7_1_bool.cpp
bool test_load_bool()
{
int a = 1;
if (a < 0)
return false;
return true;
}
Table 7.1: The C, IR, and DAG translation for char, short and bool translation (ch7_1_char_short.cpp and ch7_1_bool2.ll).
C                                   .bc                               Optimized legalized selection DAG
char a = 0x80;                      %1 = load i8* %a, align 1
int i = (signed int)a;              %2 = sext i8 %1 to i32            load ..., <..., sext from i8>
unsigned char c = 0x80;             %1 = load i8* %c, align 1
unsigned int ui = (unsigned int)c;  %2 = zext i8 %1 to i32            load ..., <..., zext from i8>
short a = 0x8000;                   %1 = load i16* %a, align 2
int i = (signed int)a;              %2 = sext i16 %1 to i32           load ..., <..., sext from i16>
unsigned short c = 0x8000;          %1 = load i16* %c, align 2
unsigned int ui = (unsigned int)c;  %2 = zext i16 %1 to i32           load ..., <..., zext from i16>
c = (unsigned short)ui;             %6 = trunc i32 %5 to i16
                                    store i16 %6, i16* %c, align 2    store ...,<..., trunc to i16>
return true;                        store i1 1, i1* %retval, align 1  store ...,<..., trunc to i8>
Table 7.2: The backend translation for char, short and bool translation (ch7_1_char_short.cpp and
ch7_1_bool2.ll).
Optimized legalized selection DAG  Cpu0  Pattern in Cpu0InstrInfo.td
load ..., <..., sext from i8>      lb    LB : LoadM32<0x03, "lb", sextloadi8>;
load ..., <..., zext from i8>      lbu   LBu : LoadM32<0x04, "lbu", zextloadi8>;
load ..., <..., sext from i16>     lh    LH : LoadM32<0x06, "lh", sextloadi16_a>;
load ..., <..., zext from i16>     lhu   LHu : LoadM32<0x07, "lhu", zextloadi16_a>;
store ...,<..., trunc to i16>      sh    SH : StoreM32<0x08, "sh", truncstorei16_a>;
store ...,<..., trunc to i8>       sb    SB : StoreM32<0x05, "sb", truncstorei8>;
Like Mips, the type long of Cpu0 is 32-bit and type long long is 64-bit for C language. To support type long long, we
add the following code to Chapter7_1/.
lbdex/chapters/Chapter7_1/Cpu0SEISelDAGToDAG.cpp
SDNode *Carry;
if (Subtarget->hasCpu032II())
Carry = CurDAG->getMachineNode(Cpu0::SLTu, DL, VT, Ops);
else {
SDNode *StatusWord = CurDAG->getMachineNode(Cpu0::CMP, DL, VT, Ops);
SDValue Constant1 = CurDAG->getTargetConstant(1, DL, VT);
Carry = CurDAG->getMachineNode(Cpu0::ANDi, DL, VT,
SDValue(StatusWord,0), Constant1);
}
SDNode *AddCarry = CurDAG->getMachineNode(Cpu0::ADDu, DL, VT,
SDValue(Carry,0), RHS);
///
// Instruction Selection not handled by the auto-generated
// tablegen selection should be handled here.
///
///
// Instruction Selection not handled by the auto-generated
// tablegen selection should be handled here.
///
EVT NodeTy = Node->getValueType(0);
unsigned MultOpc;
switch(Opcode) {
default: break;
case ISD::SUBE: {
SDValue InFlag = Node->getOperand(2);
selectAddESubE(Cpu0::SUBu, InFlag, InFlag.getOperand(0), DL, Node);
return true;
}
case ISD::ADDE: {
SDValue InFlag = Node->getOperand(2);
selectAddESubE(Cpu0::ADDu, InFlag, InFlag.getValue(0), DL, Node);
return true;
}
if (!SDValue(Node, 0).use_empty())
ReplaceUses(SDValue(Node, 0), SDValue(LoHi.first, 0));
if (!SDValue(Node, 1).use_empty())
ReplaceUses(SDValue(Node, 1), SDValue(LoHi.second, 0));
CurDAG->RemoveDeadNode(Node);
return true;
}
...
}
lbdex/chapters/Chapter7_1/Cpu0ISelLowering.h
...
}
lbdex/chapters/Chapter7_1/Cpu0ISelLowering.cpp
...
}
The code added to Cpu0ISelLowering.cpp is for shift operations that support the 64-bit type long long. Applying
the operators << and >> to 64-bit variables creates the DAG nodes SHL_PARTS, SRA_PARTS and SRL_PARTS, which
take care of the 32-bit operands during llvm DAG translation. File ch9_7.cpp, which contains 64-bit shift operations,
cannot be run at this point. It will be verified in the later chapter “Function call”.
Run Chapter7_1 with ch7_1_longlong.cpp to get the following result,
lbdex/input/ch7_1_longlong.cpp
st $2, 48($fp)
lui $2, 768
ori $2, $2, 4096
st $2, 44($fp)
lui $2, 512
ori $2, $2, 4096
st $2, 40($fp)
ld $2, 52($fp)
ld $3, 60($fp)
addu $3, $3, $2
ld $4, 56($fp)
ld $5, 48($fp)
st $3, 36($fp)
cmp $sw, $3, $2
andi $2, $sw, 1
addu $2, $2, $5
addu $2, $4, $2
st $2, 32($fp)
ld $2, 52($fp)
ld $3, 60($fp)
subu $4, $3, $2
ld $5, 56($fp)
ld $t9, 48($fp)
st $4, 28($fp)
cmp $sw, $3, $2
andi $2, $sw, 1
addu $2, $2, $t9
subu $2, $5, $2
st $2, 24($fp)
ld $2, 52($fp)
ld $3, 60($fp)
multu $3, $2
mflo $4
mfhi $5
ld $t9, 56($fp)
ld $7, 48($fp)
st $4, 20($fp)
mul $3, $3, $7
addu $3, $5, $3
mul $2, $t9, $2
addu $2, $3, $2
st $2, 16($fp)
ld $2, 40($fp)
ld $3, 44($fp)
mult $3, $2
mflo $2
mfhi $4
st $2, 12($fp)
st $4, 8($fp)
ld $5, 28($fp)
ld $3, 36($fp)
addu $t9, $3, $5
ld $7, 20($fp)
addu $8, $t9, $7
addu $3, $8, $2
cmp $sw, $3, $2
andi $2, $sw, 1
addu $2, $2, $4
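The addu/cmp/andi and multu/mflo/mfhi sequences in the listing above build 64-bit arithmetic out of 32-bit halves. The following host-side sketch models that idea under stated assumptions: the struct U64Parts and the functions split, join, add64, sub64 and mul64 are hypothetical names for illustration, the carry and borrow are derived with an unsigned comparison in the way the cmp/andi pair is used above, and the host's 64-bit multiply stands in for the multu/mflo/mfhi triple.

```cpp
#include <cstdint>

// A 64-bit value held as two 32-bit halves, like a register pair.
struct U64Parts {
  uint32_t lo;
  uint32_t hi;
};

constexpr U64Parts split(uint64_t v) {
  return {static_cast<uint32_t>(v), static_cast<uint32_t>(v >> 32)};
}

constexpr uint64_t join(U64Parts p) {
  return (static_cast<uint64_t>(p.hi) << 32) | p.lo;
}

constexpr U64Parts add64(U64Parts a, U64Parts b) {
  uint32_t lo = a.lo + b.lo;               // addu, wraps modulo 2^32
  uint32_t carry = (lo < b.lo) ? 1u : 0u;  // unsigned compare yields the carry
  return {lo, a.hi + (carry + b.hi)};      // carry joins the high halves
}

constexpr U64Parts sub64(U64Parts a, U64Parts b) {
  uint32_t lo = a.lo - b.lo;                 // subu, wraps modulo 2^32
  uint32_t borrow = (a.lo < b.lo) ? 1u : 0u; // unsigned compare yields the borrow
  return {lo, a.hi - (borrow + b.hi)};
}

constexpr U64Parts mul64(U64Parts a, U64Parts b) {
  uint64_t p = static_cast<uint64_t>(a.lo) * b.lo;  // multu: full 64-bit product
  uint32_t lo = static_cast<uint32_t>(p);           // mflo
  uint32_t hi = static_cast<uint32_t>(p >> 32);     // mfhi
  hi += a.lo * b.hi + a.hi * b.lo;                  // cross terms, low 32 bits only
  return {lo, hi};
}

// Operands chosen only for illustration.
constexpr uint64_t kA = 0x123456789ABCDEF0ull;
constexpr uint64_t kB = 0x0FEDCBA987654321ull;
static_assert(join(add64(split(kA), split(kB))) == kA + kB, "add with carry");
static_assert(join(sub64(split(kB), split(kA))) == kB - kA, "sub with borrow");
static_assert(join(mul64(split(kA), split(kB))) == kA * kB, "low 64 bits of product");
```

Each static_assert compares the half-by-half result with the host's native 64-bit arithmetic, which wraps modulo 2^64 in the same way.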
Cpu0 only has integer instructions at this point. For float operations, the Cpu0 backend calls library functions to
translate integer to float. These float (or double) function calls will be supported for Cpu0 after the chapter on function
calls. For hardware cost reasons, many CPUs have no hardware float instructions; they call library functions to carry out
float operations. Mips separates float operations into a separate co-processor for applications that need them (“float
intended” applications).
In order to support the floating point library (part of compiler-rt) 2 , the following code is added to support the
instructions clz and clo.
lbdex/chapters/Chapter7_1/Cpu0InstrInfo.td
2 http://jonathan2251.github.io/lbt/lib.html#compiler-rt
lbdex/input/ch7_1_globalstructoffset.cpp
struct Date
{
int year;
int month;
int day;
};
int test_struct()
{
int day = date.day;
int i = a[1];
// ch7_1_globalstructoffset.ll
; ModuleID = 'ch7_1_globalstructoffset.bc'
...
%struct.Date = type { i32, i32, i32 }
1 http://llvm.org/docs/LangRef.html#getelementptr-instruction
ret i32 %5
}
Running Chapter6_1/ with ch7_1_globalstructoffset.bc in static mode produces the incorrect asm file as follows,
1-160-134-62:input Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/bin/
Debug/llc -march=cpu0 -relocation-model=static -filetype=asm
ch7_1_globalstructoffset.bc -o -
...
lui $2, %hi(date)
ori $2, $2, %lo(date)
ld $2, 0($2) // the correct one is ld $2, 8($2)
...
For “day = date.day”, the correct instruction is “ld $2, 8($2)”, not “ld $2, 0($2)”, since date.day is at offset 8 from
date (type int is 4 bytes in Cpu0, and date.day has the fields year and month before it). Let’s use the debug option of
llc to see what’s wrong,
jonathantekiimac:input Jonathan$ /Users/Jonathan/llvm/test/
cmake_debug_build/Debug/bin/llc -march=cpu0 -debug -relocation-model=static
-filetype=asm ch6_2.bc -o ch6_2.cpu0.static.s
...
=== main
Initial selection DAG: BB#0 'main:entry'
SelectionDAG has 20 nodes:
0x7f7f5b02d210: i32 = undef [ORD=1]
...
...
Through llc -debug, you can see the DAG translation process. As above, the DAG list for date.day (add
GlobalAddress<[3 x i32]* @a> 0, Constant<8>) with 3 nodes is replaced by 1 node GlobalAddress<%struct.Date* @date> +
8. The same happens to the DAG list for a[1]. The replacement occurs because
TargetLowering.cpp::isOffsetFoldingLegal(...) returns true in static addressing mode (llc -relocation-model=static), as
below. In Cpu0 the ld instruction format is “ld $r1, offset($r2)”, which means load the content at address $r2+offset
into $r1. So, we just replace the isOffsetFoldingLegal(...) function through the override mechanism as below.
lib/CodeGen/SelectionDAG/TargetLowering.cpp
bool
TargetLowering::isOffsetFoldingLegal(const GlobalAddressSDNode *GA) const {
// Assume that everything is safe in static mode.
if (getTargetMachine().getRelocationModel() == Reloc::Static)
return true;
lbdex/chapters/Chapter7_1/Cpu0ISelLowering.cpp
bool
Cpu0TargetLowering::isOffsetFoldingLegal(const GlobalAddressSDNode *GA) const {
// The Cpu0 target isn't yet aware of offsets.
return false;
}
lbdex/chapters/Chapter7_1/Cpu0ISelDAGToDAG.cpp
else
Base = Addr.getOperand(0);
...
}
Recall that we have translated the DAG list for date.day (add GlobalAddress<[3 x i32]* @a> 0, Constant<8>) into (add (add
Cpu0ISD::Hi (Cpu0II::MO_ABS_HI), Cpu0ISD::Lo(Cpu0II::MO_ABS_LO)), Constant<8>) by the following code
in Cpu0ISelLowering.cpp.
lbdex/chapters/Chapter6_1/Cpu0ISelLowering.h
// This method creates the following nodes, which are necessary for
// computing a symbol's address in non-PIC mode:
//
// (add %hi(sym), %lo(sym))
template<class NodeTy>
SDValue getAddrNonPIC(NodeTy *N, EVT Ty, SelectionDAG &DAG) const {
SDLoc DL(N);
SDValue Hi = getTargetNode(N, Ty, DAG, Cpu0II::MO_ABS_HI);
SDValue Lo = getTargetNode(N, Ty, DAG, Cpu0II::MO_ABS_LO);
return DAG.getNode(ISD::ADD, DL, Ty,
DAG.getNode(Cpu0ISD::Hi, DL, Ty, Hi),
DAG.getNode(Cpu0ISD::Lo, DL, Ty, Lo));
}
So, when SelectAddr(...) of Cpu0ISelDAGToDAG.cpp is called, the Addr SDValue in SelectAddr(..., Addr, ...) is the
DAG list for date.day (add (add Cpu0ISD::Hi (Cpu0II::MO_ABS_HI), Cpu0ISD::Lo(Cpu0II::MO_ABS_LO)),
Constant<8>). Since Addr.getOpcode() = ISD::ADD, Addr.getOperand(0) = (add Cpu0ISD::Hi (Cpu0II::MO_ABS_HI),
Cpu0ISD::Lo(Cpu0II::MO_ABS_LO)) and Addr.getOperand(1).getOpcode() = ISD::Constant, we get Base = SDValue
(add Cpu0ISD::Hi (Cpu0II::MO_ABS_HI), Cpu0ISD::Lo(Cpu0II::MO_ABS_LO)) and Offset = Constant<8>. After
Base and Offset are set, the load DAG translates the global address date.day into the machine instruction “ld $r1, 8($r2)”
in the Instruction Selection stage.
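The Base/Offset split described above can be illustrated with a toy model. The types below are hypothetical stand-ins and are NOT LLVM's SDValue or SDNode; the function selectAddr only mimics the decision explained in the text, and the signed 16-bit range check is an assumption about what fits in the ld displacement field.

```cpp
#include <cstdint>

// Toy stand-ins for DAG nodes (hypothetical, for illustration only).
enum class Op { Add, Constant, Other };

struct Node {
  Op op;
  const Node *lhs;  // operand 0, used by Add
  const Node *rhs;  // operand 1, used by Add
  int32_t value;    // payload, used by Constant
};

struct BaseOffset {
  const Node *base;
  int32_t offset;
};

// If the address has the form (add X, Constant<c>) and c fits in a signed
// 16-bit displacement (assumed field width), use X as the base and c as
// the offset. Otherwise the whole address is the base and the offset is 0.
inline BaseOffset selectAddr(const Node &addr) {
  if (addr.op == Op::Add && addr.rhs != nullptr &&
      addr.rhs->op == Op::Constant) {
    const int32_t c = addr.rhs->value;
    if (c >= -32768 && c <= 32767)
      return {addr.lhs, c};
  }
  return {&addr, 0};
}
```

For the date.day case, an Add node whose operand 1 is Constant<8> yields the symbol expression as base and 8 as offset, which corresponds to the displacement in “ld $r1, 8($r2)”.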
Chapter7_1/ includes the changes above. You can run it with ch7_1_globalstructoffset.cpp to get the correct
generated instruction “ld $r1, 8($r2)” for the date.day access, as follows.
...
lui $2, %hi(date)
ori $2, $2, %lo(date)
ld $2, 8($2) // correct
...
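The displacement 8 in the corrected instruction can be checked on the host with offsetof. This is a small sketch, assuming a 4-byte int as in Cpu0; int32_t is used so the check does not depend on the host's int size, and elementOffset is a hypothetical helper for the a[1] case.

```cpp
#include <cstddef>
#include <cstdint>

// Mirrors struct Date from ch7_1_globalstructoffset.cpp, with int32_t
// standing in for Cpu0's 4-byte int.
struct Date {
  int32_t year;
  int32_t month;
  int32_t day;
};

// Byte offset of element `index` in an array of 4-byte ints.
constexpr std::size_t elementOffset(std::size_t index) {
  return index * sizeof(int32_t);
}

// year and month (4 bytes each) precede day, so day sits at offset 8.
static_assert(offsetof(Date, day) == 8, "date.day is loaded with ld $2, 8($2)");
// a[1] follows one 4-byte element, so it sits at offset 4.
static_assert(elementOffset(1) == 4, "a[1] is at offset 4 from a");
```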
The ch7_1_localarrayinit.cpp tests local variable initialization. The result is as follows,
lbdex/input/ch7_1_localarrayinit.cpp
int main()
{
int a[3]={0, 1, 2};
return 0;
}
Vector types are used when multiple primitive data are operated on in parallel using a single instruction (SIMD) 3 . Mips
supports the llvm IRs “icmp slt” and “sext” for vector types, and Cpu0 supports them too.
lbdex/input/ch7_1_vector.cpp
int test_cmplt_short() {
volatile vector8short a0 = {0, 1, 2, 3};
volatile vector8short b0 = {2, 2, 2, 2};
volatile vector8short c0;
c0 = a0 < b0; // c0[0] = -2147483647=0x80000001, c0[1] = -2147483647=0x80000001,
// c0[2] = 0, c0[3] = 0
return (int)(c0[0]+c0[1]+c0[2]+c0[3]); // 2
}
int test_cmplt_long() {
volatile vector8long a0 = {2, 2, 2, 2, 1, 1, 1, 1};
volatile vector8long b0 = {1, 1, 1, 1, 2, 2, 2, 2};
volatile vector8long c0;
c0 = a0 < b0; // c0[0..3] = {0, 0, ...}, c0[4..7] = {-2147483647=0x80000001, ...}
3 http://llvm.org/docs/LangRef.html#vector-type
st $3, 8($sp)
shl $3, $5, 31
sra $3, $3, 31
st $3, 4($sp)
slt $2, $2, $t9
shl $2, $2, 31
sra $2, $2, 31
st $2, 0($sp)
ld $2, 12($sp)
ld $2, 8($sp)
ld $2, 4($sp)
ld $2, 0($sp)
ld $3, 4($sp)
addu $2, $2, $3
ld $3, 12($sp)
ld $3, 8($sp)
ld $3, 0($sp)
ld $3, 8($sp)
addu $2, $2, $3
ld $3, 12($sp)
ld $3, 4($sp)
ld $3, 0($sp)
ld $3, 12($sp)
addu $2, $2, $3
ld $3, 8($sp)
ld $3, 4($sp)
ld $3, 0($sp)
addiu $sp, $sp, 48
ret $lr
.set macro
.set reorder
.end _Z16test_cmplt_shortv
$func_end0:
.size _Z16test_cmplt_shortv, ($func_end0)-_Z16test_cmplt_shortv
lbdex/chapters/Chapter7_1/Cpu0ISelLowering.h
lbdex/chapters/Chapter7_1/Cpu0ISelLowering.cpp
return VT.changeVectorElementTypeToInteger();
}
EIGHT
This chapter illustrates the corresponding IR for control flow statements, such as the “if else”, “while” and “for” loop
statements in C, and how to translate these control flow statements of llvm IR into Cpu0 instructions in section I.
In section “Cpu0 backend Optimization: Remove useless JMP”, a backend optimization pass for control flow is
introduced. It’s a simple tutorial program that shows readers how to add a backend optimization pass and program it.
Section “Conditional instruction” covers the handling of conditional instructions, since clang generates the specific IRs
select and select_cc to support backend optimization of control flow statements.
lbdex/input/ch8_1_1.cpp
int test_ifctrl()
{
unsigned int a = 0;
if (a == 0) {
a++; // a = 1
}
return a;
}
...
%0 = load i32* %a, align 4
%cmp = icmp eq i32 %0, 0
br i1 %cmp, label %if.then, label %if.end
; <label>:3: ; preds = %0
%1 = load i32* %a, align 4
%inc = add i32 %1, 1
store i32 %inc, i32* %a, align 4
br label %if.end
...
The “icmp ne” stands for integer compare NotEqual, “slt” stands for Set Less Than, and “sle” stands for Set Less or
Equal. Run version Chapter8_1/ with the llc -view-isel-dags or -debug option, and you can see that the if statement
is translated into (br (brcond (%1, setcc(%2, Constant<c>, setne)), BasicBlock_02), BasicBlock_01). Ignoring %1,
we get the form (br (brcond (setcc(%2, Constant<c>, setne)), BasicBlock_02), BasicBlock_01). For explanation,
the IR DAG is listed as follows,
For the last IR br, we translate the unconditional branch (br BasicBlock_01) into jmp BasicBlock_01 by the following
pattern definition,
lbdex/chapters/Chapter8_1/Cpu0InstrInfo.td
...
def JMP : UncondBranch<0x26, "jmp">;
The pattern [(br bb:$imm24)] in class UncondBranch is translated into the jmp machine instruction. The translation for
the pair of Cpu0 instructions, cmp and jne, has not been handled before this chapter. To solve this chained IR to machine
lbdex/chapters/Chapter8_1/Cpu0InstrInfo.td
// brcond patterns
multiclass BrcondPatsCmp<RegisterClass RC, Instruction JEQOp, Instruction JNEOp,
Instruction JLTOp, Instruction JGTOp, Instruction JLEOp, Instruction JGEOp,
Instruction CMPOp> {
...
def : Pat<(brcond (i32 (setne RC:$lhs, RC:$rhs)), bb:$dst),
(JNEOp (CMPOp RC:$lhs, RC:$rhs), bb:$dst)>;
...
def : Pat<(brcond RC:$cond, bb:$dst),
(JNEOp (CMPOp RC:$cond, ZEROReg), bb:$dst)>;
...
}
Since the BrcondPats pattern above uses RC (Register Class) as operand, the following ADDiu pattern defined in
Chapter2 will generate the instruction addiu before the instruction cmp for the first IR, setcc(%2, Constant<c>, setne),
as above.
lbdex/chapters/Chapter2/Cpu0InstrInfo.td
// Small immediates
def : Pat<(i32 immSExt16:$in),
(ADDiu ZERO, imm:$in)>;
The definition of BrcondPats supports setne, seteq, setlt, ..., for register operand comparison, and setult, setugt, ...,
for the unsigned int type. In addition to seteq and setne, we define setueq and setune by referring to the Mips code, even
though we have not found how to generate the setune IR from the C language. We have tried defining variables of unsigned
int type, but clang still generates setne instead of setune. Patterns are searched in the order in which they appear. The
last pattern (brcond RC:$cond, bb:$dst) means branch to $dst if $cond != 0. So we set the corresponding translation to
(JNEOp (CMPOp RC:$cond, ZEROReg), bb:$dst).
The CMP instruction stores its result in register SW, and then JNE checks the condition based on the SW status, as in Fig.
8.1. Since SW belongs to a different register class, the code remains correct even if an instruction is inserted between
CMP and JNE as follows,
cmp %2, %3
addiu $r1, $r2, 3 // $r1 register is never allocated to $SW because in
// class ArithLogicI, GPROut is the output register
// class and the GPROut is defined without $SW in
// Cpu0RegisterInfoGPROutForOther.td
jne BasicBlock_02
The reserved registers are set by the following function, which we defined before,
lbdex/chapters/Chapter3_1/Cpu0RegisterInfo.cpp
BitVector Cpu0RegisterInfo::
getReservedRegs(const MachineFunction &MF) const {
//@getReservedRegs body {
static const uint16_t ReservedCPURegs[] = {
return Reserved;
}
Although the following definition in Cpu0RegisterInfo.td has no real effect on Reserved Registers, it’s better to
comment the Reserved Registers in it for readability. SW is set in both register classes, CPURegs and SR, to allow
RISC instructions like andi to access SW and to allow programmers to use the traditional assembly instruction cmp. The
copyPhysReg() is called when DestReg and SrcReg belong to different Register Classes.
lbdex/chapters/Chapter2/Cpu0RegisterInfo.td
//===----------------------------------------------------------------------===//
lbdex/chapters/Chapter2/Cpu0RegisterInfoGPROutForOther.td
//===----------------------------------------------------------------------===//
// Register Classes
//===----------------------------------------------------------------------===//
Chapter8_1/ includes support for control flow statements. Run it with the following llc option to get
the obj file. Then dump its content with gobjdump or hexdump as follows,
The immediate value of jne (op 0x31) is 16; the offset between jne and $BB0_2 is 20 (5 words = 5*4 bytes). Suppose
the jne address is X, then the label $BB0_2 is at X+20. Cpu0’s instruction set is designed as a RISC CPU with a 5-stage
pipeline, just like the 5 stages of Mips. Cpu0 executes branch instructions at the decode stage, as Mips does too.
After the jne instruction is fetched, the PC (Program Counter) is X+4, since cpu0 updates the PC at the fetch stage. The
$BB0_2 address is therefore equal to PC+16, because the jne branch instruction executes at the decode stage. List and
explain this again as follows,
If Cpu0 did “jne” in the execution stage, then we would set PC=PC+12, which is the offset of ($BB0_2, jne $BB02) - 8, in
this example.
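The arithmetic above can be written down as a short sketch. The function branchImmediate and the address kJne are hypothetical and chosen only for illustration; pcAdvance models how far the PC has moved past the branch when the branch is resolved (4 at the decode stage, 8 if it were resolved at the execution stage).

```cpp
#include <cstdint>

// Immediate to encode in a PC-relative branch: the distance from the PC
// value seen when the branch resolves to the branch target.
constexpr int32_t branchImmediate(uint32_t branchAddr, uint32_t targetAddr,
                                  uint32_t pcAdvance) {
  return static_cast<int32_t>(static_cast<int64_t>(targetAddr) -
                              static_cast<int64_t>(branchAddr) -
                              static_cast<int64_t>(pcAdvance));
}

// Hypothetical address X of the jne instruction.
constexpr uint32_t kJne = 0x100;
// $BB0_2 is 5 words after jne.
constexpr uint32_t kTarget = kJne + 20;

static_assert(branchImmediate(kJne, kTarget, 4) == 16,
              "branch resolved at decode stage: PC = X+4");
static_assert(branchImmediate(kJne, kTarget, 8) == 12,
              "branch resolved at execution stage: PC = X+8");
```

The same function gives a negative immediate for a backward branch, since the subtraction is done in a wider signed type.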
In reality, the conditional branch is important to the performance of a CPU design. According to benchmark information,
one branch instruction is met in every 7 instructions on average. The cpu032I spends 2 instructions on a conditional
branch, (jne(cmp...)), while cpu032II uses one instruction (bne) as follows,
Besides the brcond explained in this section, the code above also includes the DAG opcode br_jt and the label JumpTable,
which occur during DAG translation for some kinds of programs.
The ch8_1_ctrl.cpp includes “nest if”, “for loop”, “while loop”, “continue”, “break” and “goto”. The
ch8_1_br_jt.cpp is for the br_jt and JumpTable test. The ch8_1_blockaddr.cpp is for the blockaddress and indirectbr
test. You can run them if you would like to test more.
The control flow statements of C, IR, DAG and Cpu0 instructions are listed in the following table.
Table 8.1: Control flow statements of C, IR, DAG and Cpu0 instructions
C    if, else, for, while, goto, switch, break
IR   (icmp + (eq, ne, sgt, sge, slt, sle)) + br
DAG  (seteq, setne, setgt, setge, setlt, setle) + brcond,
     (setueq, setune, setugt, setuge, setult, setule) + brcond
As in the last section, cpu032II uses beq and bne to improve performance, but the jump offset is reduced from 24 bits to
16 bits. If a program has a branch whose offset exceeds 16 bits, cpu032II will fail to generate code. The Mips backend
has a solution, and Cpu0 borrows the solution from it.
To support long branch the following code added in Chapter8_1.
lbdex/chapters/Chapter8_2/CMakeLists.txt
Cpu0LongBranch.cpp
lbdex/chapters/Chapter8_2/Cpu0.h
lbdex/chapters/Chapter8_2/MCTargetDesc/Cpu0MCCodeEmitter.cpp
unsigned Cpu0MCCodeEmitter::
getJumpTargetOpValue(const MCInst &MI, unsigned OpNo,
SmallVectorImpl<MCFixup> &Fixups,
const MCSubtargetInfo &STI) const {
...
}
lbdex/chapters/Chapter8_2/Cpu0AsmPrinter.h
lbdex/chapters/Chapter8_2/Cpu0AsmPrinter.cpp
...
}
lbdex/chapters/Chapter8_2/Cpu0InstrInfo.h
lbdex/chapters/Chapter8_2/Cpu0InstrInfo.td
lbdex/chapters/Chapter8_2/Cpu0LongBranch.cpp
#include "Cpu0.h"
#include "MCTargetDesc/Cpu0BaseInfo.h"
#include "Cpu0MachineFunction.h"
#include "Cpu0TargetMachine.h"
#include "llvm/ADT/Statistic.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/MathExtras.h"
#include "llvm/Target/TargetInstrInfo.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/Target/TargetRegisterInfo.h"
namespace {
typedef MachineBasicBlock::iterator Iter;
typedef MachineBasicBlock::reverse_iterator ReverseIter;
struct MBBInfo {
uint64_t Size, Address;
bool HasLongBranch;
MachineInstr *Br;
public:
static char ID;
Cpu0LongBranch(TargetMachine &tm)
: MachineFunctionPass(ID), TM(tm), IsPIC(TM.isPositionIndependent()),
ABI(static_cast<const Cpu0TargetMachine &>(TM).getABI()) {}
private:
void splitMBB(MachineBasicBlock *MBB);
void initMBBInfo();
int64_t computeOffset(const MachineInstr *Br);
void replaceBranch(MachineBasicBlock &MBB, Iter Br, const DebugLoc &DL,
MachineBasicBlock *MBBOpnd);
void expandToLongBranch(MBBInfo &Info);
};
char Cpu0LongBranch::ID = 0;
} // end of anonymous namespace
/// Iterate over list of Br's operands and search for a MachineBasicBlock
/// operand.
static MachineBasicBlock *getTargetMBB(const MachineInstr &Br) {
for (unsigned I = 0, E = Br.getDesc().getNumOperands(); I < E; ++I) {
const MachineOperand &MO = Br.getOperand(I);
if (MO.isMBB())
return MO.getMBB();
}
return nullptr; // no MachineBasicBlock operand was found (E is out of scope here)
}
// Create a new MBB. Move instructions in MBB to the newly created MBB.
MachineBasicBlock *NewMBB =
MF->CreateMachineBasicBlock(MBB->getBasicBlock());
// Fill MBBInfos.
void Cpu0LongBranch::initMBBInfo() {
// Split the MBBs if they have two branches. Each basic block should have at
// most one branch after this loop is executed.
for (auto &MBB : *MF)
splitMBB(&MBB);
MF->RenumberBlocks();
MBBInfos.clear();
MBBInfos.resize(MF->size());
return Offset + 4;
}
Offset += MBBInfos[N].Size;
return -Offset + 4;
}
// Replace Br with a branch which has the opposite condition code and a
// MachineBasicBlock operand MBBOpnd.
void Cpu0LongBranch::replaceBranch(MachineBasicBlock &MBB, Iter Br,
const DebugLoc &DL,
MachineBasicBlock *MBBOpnd) {
const Cpu0InstrInfo *TII = static_cast<const Cpu0InstrInfo *>(
MBB.getParent()->getSubtarget().getInstrInfo());
unsigned NewOpc = TII->getOppositeBranchOpc(Br->getOpcode());
const MCInstrDesc &NewDesc = TII->get(NewOpc);
if (!MO.isReg()) {
assert(MO.isMBB() && "MBB operand expected.");
break;
}
MIB.addReg(MO.getReg());
}
MIB.addMBB(MBBOpnd);
if (Br->hasDelaySlot()) {
// Bundle the instruction in the delay slot to the newly created branch
// and erase the original branch.
assert(Br->isBundledWithSucc());
MachineBasicBlock::instr_iterator II = Br.getInstrIterator();
MIBundleBuilder(&*MIB).append((++II)->removeFromBundle());
}
Br->eraseFromParent();
}
MF->insert(FallThroughMBB, LongBrMBB);
MBB->replaceSuccessor(TgtMBB, LongBrMBB);
if (IsPIC) {
MachineBasicBlock *BalTgtMBB = MF->CreateMachineBasicBlock(BB);
MF->insert(FallThroughMBB, BalTgtMBB);
LongBrMBB->addSuccessor(BalTgtMBB);
BalTgtMBB->addSuccessor(TgtMBB);
// $longbr:
// addiu $sp, $sp, -8
// st $lr, 0($sp)
// lui $at, %hi($tgt - $baltgt)
// addiu $lr, $lr, %lo($tgt - $baltgt)
// bal $baltgt
// nop
// $baltgt:
// addu $at, $lr, $at
// addiu $sp, $sp, 8
// ld $lr, 0($sp)
// jr $at
// nop
// $fallthrough:
//
Pos = LongBrMBB->begin();
// LUi and ADDiu instructions create 32-bit offset of the target basic
// block from the target of BAL instruction. We cannot use immediate
// value for this offset because it cannot be determined accurately when
// the program has inline assembly statements. We therefore use the
// relocation expressions %hi($tgt-$baltgt) and %lo($tgt-$baltgt) which
// are resolved during the fixup, so the values will always be correct.
//
// Since we cannot create %hi($tgt-$baltgt) and %lo($tgt-$baltgt)
// expressions at this point (it is possible only at the MC layer),
// we replace LUi and ADDiu with pseudo instructions
// LONG_BRANCH_LUi and LONG_BRANCH_ADDiu, and add both basic
// blocks as operands to these instructions. When lowering these pseudo
// instructions to LUi and ADDiu in the MC layer, we will create
// %hi($tgt-$baltgt) and %lo($tgt-$baltgt) expressions and add them as
// operands to lowered instructions.
Pos = BalTgtMBB->begin();
MIBundleBuilder(*BalTgtMBB, Pos)
.append(BuildMI(*MF, DL, TII->get(Cpu0::JR)).addReg(Cpu0::AT))
.append(BuildMI(*MF, DL, TII->get(Cpu0::NOP)));
assert(LongBrMBB->size() == LongBranchSeqSize);
}
if (I.Br->isUnconditionalBranch()) {
// Change branch destination.
assert(I.Br->getDesc().getNumOperands() == 1);
I.Br->RemoveOperand(0);
I.Br->addOperand(MachineOperand::CreateMBB(LongBrMBB));
} else
// Change branch destination and reverse condition.
replaceBranch(*MBB, I.Br, DL, &*FallThroughMBB);
}
if (!STI.enableLongBranchPass())
return false;
MF = &F;
initMBBInfo();
SmallVectorImpl<MBBInfo>::iterator I, E = MBBInfos.end();
bool EverMadeChange = false, MadeChange = true;
while (MadeChange) {
MadeChange = false;
int ShVal = 4;
int64_t Offset = computeOffset(I->Br) / ShVal;
I->HasLongBranch = true;
I->Size += LongBranchSeqSize * 4;
++LongBranches;
EverMadeChange = MadeChange = true;
}
}
if (!EverMadeChange)
return true;
// Do the expansion.
for (I = MBBInfos.begin(); I != E; ++I)
if (I->HasLongBranch)
expandToLongBranch(*I);
MF->RenumberBlocks();
return true;
}
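The iteration in the listing above (compute the offsets, mark the branches that are out of range, grow their blocks, and repeat until nothing changes) can be sketched without any LLVM types. This is only a simplified model: BlockInfo, fitsSigned16() and markLongBranches() are hypothetical names, each block holds at most one branch, and the 16-bit range check on the word offset is an assumption based on the 16-bit jump offset mentioned in the text.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for MBBInfo: at most one branch per block.
struct BlockInfo {
  int64_t size;        // size of the block in bytes
  int target;          // index of the branch target block, or -1 if no branch
  bool hasLongBranch;  // set once the branch must be expanded
};

// Same shape as computeOffset() in the listing: sum the sizes of the blocks
// lying between the branch and its target.
int64_t computeOffset(const std::vector<BlockInfo> &blocks, int self) {
  int64_t offset = 0;
  const int target = blocks[self].target;
  if (self < target) {  // forward branch
    for (int n = self + 1; n < target; ++n)
      offset += blocks[n].size;
    return offset + 4;
  }
  for (int n = self; n >= target; --n)  // backward branch
    offset += blocks[n].size;
  return -offset + 4;
}

// Assumed range check: the word offset must fit in a signed 16-bit field.
bool fitsSigned16(int64_t v) { return v >= -32768 && v <= 32767; }

// Repeat until no block changes: expanding one branch grows its block,
// which can push another branch out of range.
int markLongBranches(std::vector<BlockInfo> &blocks, int64_t longSeqBytes) {
  int count = 0;
  bool madeChange = true;
  while (madeChange) {
    madeChange = false;
    for (int i = 0; i < static_cast<int>(blocks.size()); ++i) {
      BlockInfo &b = blocks[i];
      if (b.target < 0 || b.hasLongBranch)
        continue;  // no branch, or already marked
      const int64_t words = computeOffset(blocks, i) / 4;
      if (fitsSigned16(words))
        continue;  // short branch is enough
      b.hasLongBranch = true;
      b.size += longSeqBytes;  // the block grows by the long branch sequence
      ++count;
      madeChange = true;
    }
  }
  return count;
}
```

The loop has to run to a fixed point rather than once, because every expansion changes the sizes that later offset computations depend on.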
lbdex/chapters/Chapter8_2/Cpu0MCInstLower.h
lbdex/chapters/Chapter8_2/Cpu0MCInstLower.cpp
void Cpu0MCInstLower::
lowerLongBranchLUi(const MachineInstr *MI, MCInst &OutMI) const {
OutMI.setOpcode(Cpu0::LUi);
// Create %hi($tgt-$baltgt).
OutMI.addOperand(createSub(MI->getOperand(1).getMBB(),
MI->getOperand(2).getMBB(),
Cpu0MCExpr::CEK_ABS_HI));
}
void Cpu0MCInstLower::
lowerLongBranchADDiu(const MachineInstr *MI, MCInst &OutMI, int Opcode,
Cpu0MCExpr::Cpu0ExprKind Kind) const {
OutMI.setOpcode(Opcode);
case Cpu0::LONG_BRANCH_LUi:
lowerLongBranchLUi(MI, OutMI);
return true;
case Cpu0::LONG_BRANCH_ADDiu:
lowerLongBranchADDiu(MI, OutMI, Cpu0::ADDiu,
Cpu0MCExpr::CEK_ABS_LO);
return true;
}
}
if (lowerLongBranch(MI, OutMI))
return;
...
}
lbdex/chapters/Chapter8_2/Cpu0SEInstrInfo.h
lbdex/chapters/Chapter8_2/Cpu0SEInstrInfo.cpp
lbdex/chapters/Chapter8_2/Cpu0TargetMachine.cpp
addPass(createCpu0LongBranchPass(TM));
return;
}
lbdex/input/ch8_2_longbranch.cpp
int test_longbranch()
{
volatile int a = 2;
volatile int b = 1;
int result = 0;
if (a < b)
result = 1;
return result;
}
nop
.LBB0_3:
st $2, 0($fp)
.LBB0_4:
ld $2, 0($fp)
move $sp, $fp
ld $fp, 12($sp) # 4-byte Folded Reload
addiu $sp, $sp, 16
ret $lr
nop
.set macro
.set reorder
.end _Z15test_longbranchv
$func_end0:
.size _Z15test_longbranchv, ($func_end0)-_Z15test_longbranchv
LLVM uses function passes both in code generation and in optimization. Following the 3 tiers of compiler architecture,
LLVM does most optimization in the middle tier, on LLVM IR in SSA form. Beyond middle tier optimization, there are
optimization opportunities that depend on backend features. The “fill delay slot” in Mips is an example of a
backend optimization used in pipelined RISC machines. You can port it from Mips if your backend is a pipelined RISC
with a delay slot. In this section, we apply the “delete useless jmp” optimization in the Cpu0 backend. This algorithm is
simple and effective, which makes it a good tutorial example. Through this example, you can understand how to add an
optimization pass and code your own, more complicated optimization algorithm in the backend of a real project.
Chapter8_2/ supports the “delete useless jmp” optimization algorithm by adding code as follows,
lbdex/chapters/Chapter8_2/CMakeLists.txt
Cpu0DelUselessJMP.cpp
lbdex/chapters/Chapter8_2/Cpu0.h
lbdex/chapters/Chapter8_2/Cpu0TargetMachine.cpp
addPass(createCpu0DelJmpPass(TM));
lbdex/chapters/Chapter8_2/Cpu0DelUselessJMP.cpp
#include "Cpu0.h"
#if CH >= CH8_2
#include "Cpu0TargetMachine.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/Target/TargetInstrInfo.h"
#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/Statistic.h"
namespace {
struct DelJmp : public MachineFunctionPass {
static char ID;
DelJmp(TargetMachine &tm)
: MachineFunctionPass(ID) { }
};
char DelJmp::ID = 0;
} // end of anonymous namespace
bool DelJmp::
runOnMachineBasicBlock(MachineBasicBlock &MBB, MachineBasicBlock &MBBN) {
bool Changed = false;
MachineBasicBlock::iterator I = MBB.end();
if (I != MBB.begin())
I--; // set I to the last instruction
else
return Changed;
#endif
In the code above, all files other than Cpu0DelUselessJMP.cpp are changed in order to register class DelJmp as a function
pass. As the comments in the code say, MBB is the current block and MBBN is the next block. For the last instruction
of every MBB, we check whether it is a JMP instruction whose operand is the next basic block. With getMBB()
in MachineOperand, you can get the MBB address. For the member functions of MachineOperand, please check
include/llvm/CodeGen/MachineOperand.h. Now, let’s run Chapter8_2/ with ch8_2_deluselessjmp.cpp for explanation.
lbdex/input/ch8_2_deluselessjmp.cpp
int test_DelUselessJMP()
{
int a = 1; int b = -2; int c = 3;
if (a == 0) {
a++;
}
if (b == 0) {
a = a + 3;
b++;
} else if (b < 0) {
a = a + b;
b--;
}
if (c > 0) {
a = a + c;
c++;
}
return a;
}
The terminal displays “Number of useless jmp deleted” with the llc -stats option because we set
“STATISTIC(NumDelJmp, “Number of useless jmp deleted”)” in the code. It deletes 2 jmp instructions, from block “# BB#0” and
“$BB0_6”. You can check it with the llc -enable-cpu0-del-useless-jmp=false option to see the difference
from the non-optimized version. If you run with ch8_1_1.cpp, you will find that 10 jmp instructions are deleted from 120 lines
of assembly code, which means about 8% improvement in speed and code size 1 .
1 On a platform with cache and DRAM, a cache miss costs several tens of instruction cycles. Usually, the compiler engineers who work
for platform solution vendors spend much effort on reducing cache misses for speed. Reducing code size decreases the cache
miss frequency too.
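The check described in this section (the last instruction of a block is a jmp whose operand is the next block) can be sketched on a toy model, independent of LLVM. The types JmpInstr and JmpBlock and the function deleteUselessJmps() are hypothetical; the real pass works on MachineBasicBlock and MachineInstr and is only partly shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction: an opcode name plus the index of the block it jumps to
// (-1 when the instruction has no block operand).
struct JmpInstr {
  std::string opcode;
  int targetBlock;
};
using JmpBlock = std::vector<JmpInstr>;

// For every block except the last one, drop a trailing "jmp" whose target is
// the block that follows immediately. Returns the number of deleted jumps.
int deleteUselessJmps(std::vector<JmpBlock> &blocks) {
  int deleted = 0;
  for (std::size_t i = 0; i + 1 < blocks.size(); ++i) {
    JmpBlock &mbb = blocks[i];
    if (mbb.empty())
      continue;  // nothing to inspect in an empty block
    const JmpInstr &last = mbb.back();
    if (last.opcode == "jmp" && last.targetBlock == static_cast<int>(i + 1)) {
      mbb.pop_back();  // control falls through to the next block anyway
      ++deleted;
    }
  }
  return deleted;
}
```

A jmp to any other block, such as a backward jump closing a loop, is left in place.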
The Cpu0 instruction set is designed to be a classical RISC pipeline machine. Classical RISC machines have many desirable
features 3 4 . I changed the Cpu0 backend to a classical 5-stage RISC pipeline machine with one delay slot, like some
Mips models (the original Cpu0 from its author is a 3-stage RISC machine). With this change, the backend
needs to fill the branch delay slot with a NOP instruction. In order to keep this tutorial simple for learning, the Cpu0
backend code does not fill the branch delay slot with any useful instruction for optimization. Readers can refer to
MipsDelaySlotFiller.cpp to learn how to insert useful instructions in a backend optimization. The following code is added in
Chapter8_2 for NOP fill in the branch delay slot.
lbdex/chapters/Chapter8_2/CMakeLists.txt
Cpu0DelaySlotFiller.cpp
lbdex/chapters/Chapter8_2/Cpu0.h
lbdex/chapters/Chapter8_2/Cpu0TargetMachine.cpp
addPass(createCpu0DelaySlotFillerPass(TM));
lbdex/chapters/Chapter8_2/Cpu0DelaySlotFiller.cpp
3 See book Computer Architecture: A Quantitative Approach (The Morgan Kaufmann Series in Computer Architecture and Design)
4 http://en.wikipedia.org/wiki/Classic_RISC_pipeline
#include "Cpu0.h"
#if CH >= CH8_2
#include "Cpu0InstrInfo.h"
#include "Cpu0TargetMachine.h"
#include "llvm/ADT/BitVector.h"
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/CodeGen/MachineBranchProbabilityInfo.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/PseudoSourceValue.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Target/TargetInstrInfo.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/Target/TargetRegisterInfo.h"
namespace {
typedef MachineBasicBlock::iterator Iter;
typedef MachineBasicBlock::reverse_iterator ReverseIter;
/// runOnMachineBasicBlock - Fill in delay slots for the given basic block.
/// We assume there is only one delay slot per delayed instruction.
bool Filler::runOnMachineBasicBlock(MachineBasicBlock &MBB) {
bool Changed = false;
const Cpu0Subtarget &STI = MBB.getParent()->getSubtarget<Cpu0Subtarget>();
const Cpu0InstrInfo *TII = STI.getInstrInfo();
++FilledSlots;
Changed = true;
return Changed;
}
#endif
To keep the basic block label the same, the statement MIBundleBuilder() needs to be inserted after the statement
BuildMI(..., NOP) of Cpu0DelaySlotFiller.cpp. MIBundleBuilder() bundles the branch instruction and the NOP
into one instruction (the first part is the branch instruction and the second part is the NOP).
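The effect of the NOP filler can be sketched on a toy instruction list. SlotInstr and fillDelaySlots() are hypothetical names used only for this illustration; the real pass builds the NOP with BuildMI() and bundles it with the branch, which this model does not attempt to show.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction: opcode name plus a flag telling whether it has a delay slot.
struct SlotInstr {
  std::string opcode;
  bool hasDelaySlot;
};

// Insert a "nop" right after every instruction that has a delay slot.
// Returns the number of slots filled (FilledSlots in the listing).
int fillDelaySlots(std::vector<SlotInstr> &mbb) {
  int filled = 0;
  for (std::size_t i = 0; i < mbb.size(); ++i) {
    if (!mbb[i].hasDelaySlot)
      continue;
    mbb.insert(mbb.begin() + static_cast<std::ptrdiff_t>(i + 1),
               SlotInstr{"nop", false});
    ++filled;
    ++i;  // skip the nop that was just inserted
  }
  return filled;
}
```

An optimizing filler, as in MipsDelaySlotFiller.cpp, would first look for a useful instruction to move into the slot and fall back to the NOP only when none is found.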
lbdex/chapters/Chapter3_2/Cpu0AsmPrinter.cpp
do {
MCInst TmpInst0;
MCInstLowering.Lower(&*I, TmpInst0);
OutStreamer->EmitInstruction(TmpInst0, getSubtargetInfo());
} while ((++I != E) && I->isInsideBundle()); // Delay slot check
}
In order to print the NOP, the Cpu0AsmPrinter.cpp of Chapter3_2 prints all instructions of a bundle in a loop. Without the
loop, only the first part of the bundle (the branch instruction) is printed. In llvm 3.1 the basic block label
remains the same even if you do not bundle the instructions. For some reason this changed in a later llvm version,
and you now need to “bundle” in order to keep the block label unchanged in later llvm phases.
lbdex/input/ch8_2_select.cpp
// The following files will generate IR select even compile with clang -O0.
int test_movx_1()
{
volatile int a = 1;
int c = 0;
c = !a ? 1:3;
return c;
}
int test_movx_2()
{
volatile int a = 1;
int c = 0;
c = a ? 1:3;
return c;
}
As the above llvm IR of ch8_2_select.bc shows, clang generates select IR for a small basic control block (an if statement that
includes only one assignment statement). This select IR is the result of an optimization for CPUs with conditional instruction
support. And from the above error message, it is clear that IR select is changed to select_cc during the DAG optimization stages.
Chapter8_2 supports select with the following code added and changed.
lbdex/chapters/Chapter8_2/Cpu0InstrInfo.td
lbdex/chapters/Chapter8_2/Cpu0CondMov.td
// Conditional moves:
// These instructions are expanded in
// Cpu0ISelLowering::EmitInstrWithCustomInserter if target does not have
// conditional move instructions.
// cond:int, data:int
class CondMovIntInt<RegisterClass CRC, RegisterClass DRC, bits<8> op,
string instr_asm> :
// select patterns
multiclass MovzPats0Slt<RegisterClass CRC, RegisterClass DRC,
Instruction MOVZInst, Instruction SLTOp,
Instruction SLTuOp, Instruction SLTiOp,
Instruction SLTiuOp> {
def : Pat<(select (i32 (setge CRC:$lhs, CRC:$rhs)), DRC:$T, DRC:$F),
(MOVZInst DRC:$T, (SLTOp CRC:$lhs, CRC:$rhs), DRC:$F)>;
def : Pat<(select (i32 (setuge CRC:$lhs, CRC:$rhs)), DRC:$T, DRC:$F),
(MOVZInst DRC:$T, (SLTuOp CRC:$lhs, CRC:$rhs), DRC:$F)>;
def : Pat<(select (i32 (setge CRC:$lhs, immSExt16:$rhs)), DRC:$T, DRC:$F),
(MOVZInst DRC:$T, (SLTiOp CRC:$lhs, immSExt16:$rhs), DRC:$F)>;
def : Pat<(select (i32 (setuge CRC:$lh, immSExt16:$rh)), DRC:$T, DRC:$F),
(MOVZInst DRC:$T, (SLTiuOp CRC:$lh, immSExt16:$rh), DRC:$F)>;
def : Pat<(select (i32 (setle CRC:$lhs, CRC:$rhs)), DRC:$T, DRC:$F),
(MOVZInst DRC:$T, (SLTOp CRC:$rhs, CRC:$lhs), DRC:$F)>;
def : Pat<(select (i32 (setule CRC:$lhs, CRC:$rhs)), DRC:$T, DRC:$F),
(MOVZInst DRC:$T, (SLTuOp CRC:$rhs, CRC:$lhs), DRC:$F)>;
}
// Instantiation of instructions.
def MOVZ_I_I : CondMovIntInt<CPURegs, CPURegs, 0x0a, "movz">;
lbdex/chapters/Chapter8_2/Cpu0ISelLowering.h
lbdex/chapters/Chapter8_2/Cpu0ISelLowering.cpp
SDValue Cpu0TargetLowering::
LowerOperation(SDValue Op, SelectionDAG &DAG) const
{
switch (Op.getOpcode())
{
}
...
}
SDValue Cpu0TargetLowering::
lowerSELECT(SDValue Op, SelectionDAG &DAG) const
{
return Op;
}
Setting ISD::SELECT_CC to “Expand” stops llvm optimization from merging “setcc” and “select” into one IR
“select_cc” 2 . Next, LowerOperation() returns Op directly for ISD::SELECT. Finally, the patterns defined in
Cpu0CondMov.td translate the select IR into the conditional instructions movz or movn. Let’s run Chapter8_2 with
ch8_2_select.cpp to get the following result. Again, cpu032II uses slt instead of cmp, which slightly reduces the
number of instructions.
114-37-150-209:input Jonathan$ ~/llvm/test/cmake_debug_build/Debug/bin/llc
-march=cpu0 -mcpu=cpu032I -relocation-model=static -filetype=asm ch8_2_select.bc -o -
...
.type _Z11test_movx_1v,@function
...
addiu $2, $zero, 3
movz $2, $3, $4
...
.type _Z11test_movx_2v,@function
2 http://llvm.org/docs/WritingAnLLVMBackend.html#expand
...
addiu $2, $zero, 3
movn $2, $3, $4
...
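The two instruction sequences in the output above can be read as follows. This sketch assumes the Mips meaning of the conditional moves, which Cpu0 borrows: movz copies the source when the condition register is zero, and movn copies it when the condition register is non-zero. The C++ function names are hypothetical, and the register roles ($2 result, $3 holding 1, $4 holding a) are inferred from the listing.

```cpp
#include <cassert>

// movz rd, rs, rt : rd = rs when rt == 0, otherwise rd is left unchanged.
inline void movz(int &rd, int rs, int rt) {
  if (rt == 0)
    rd = rs;
}

// movn rd, rs, rt : rd = rs when rt != 0, otherwise rd is left unchanged.
inline void movn(int &rd, int rs, int rt) {
  if (rt != 0)
    rd = rs;
}

// c = !a ? 1 : 3  ->  addiu $2, $zero, 3 ; movz $2, $3, $4
int testMovx1(int a) {
  int r2 = 3;
  movz(r2, 1, a);
  return r2;
}

// c = a ? 1 : 3   ->  addiu $2, $zero, 3 ; movn $2, $3, $4
int testMovx2(int a) {
  int r2 = 3;
  movn(r2, 1, a);
  return r2;
}
```

In both cases the “false” value is loaded unconditionally first, and the conditional move overwrites it only when the condition holds, so no branch is needed.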
Clang uses select IR in small basic blocks to reduce the branch cost on pipeline machines, since a branch makes
the pipeline “stall”. But this needs conditional instruction support 3 . If your backend has no conditional
instructions and needs the clang compiler with optimization level O1 or above, you can change clang to force it
to generate traditional branch basic blocks instead of IR select. RISC CPUs came from the advantage of
pipelining and added more and more instructions as time passed. Comparing Mips and ARM, Mips has only the two
instructions movz and movn, while ARM has many. We create Cpu0 as a RISC pipeline machine with a simple instruction
set for this compiler toolchain tutorial. However, the cmp instruction is adopted because many programmers have been used
to it, in the past and now (ARM uses it). This instruction matches the way of thinking in assembly programming, but the slt
instruction is more efficient in a RISC pipeline. If you design a backend aimed at the C/C++ high-level languages, you may
consider slt instead of cmp, since assembly code is rarely used in programming and, besides, assembly programmers can accept
slt without difficulty since they are usually professionals.
File ch8_2_select2.cpp generates IR select when compiled with clang -O1.
lbdex/input/ch8_2_select2.cpp
// The following files will generate IR select when compile with clang -O1 but
// clang -O0 won't generate IR select.
volatile int a = 1;
volatile int b = 2;
int test_movx_3()
{
int c = 0;
if (a < b)
return 1;
else
return 2;
}
int test_movx_4()
{
int c = 0;
if (a)
c = 1;
else
c = 3;
return c;
}
List the conditional statements of C, IR, DAG and Cpu0 instructions as the following table.
lbdex/input/ch8_2_select_global_pic.cpp
volatile int a1 = 1;
volatile int b1 = 2;
int test_select_global_pic()
{
if (a1 < b1)
return gI1;
else
return gJ1;
}
.section .mdebug.abi32
.previous
.file "ch8_2_select_global_pic.bc"
.text
.globl _Z18test_select_globalv
.align 2
.type _Z18test_select_globalv,@function
.ent _Z18test_select_globalv # @_Z18test_select_globalv
_Z18test_select_globalv:
.frame $sp,0,$lr
.mask 0x00000000,0
.set noreorder
.cpload $t9
.set nomacro
# BB#0:
lui $2, %got_hi(a1)
addu $2, $2, $gp
ld $2, %got_lo(a1)($2)
ld $2, 0($2)
lui $3, %got_hi(b1)
addu $3, $3, $gp
ld $3, %got_lo(b1)($3)
ld $3, 0($3)
cmp $sw, $2, $3
andi $2, $sw, 1
lui $3, %got_hi(gJ1)
addu $3, $3, $gp
ori $3, $3, %got_lo(gJ1)
lui $4, %got_hi(gI1)
addu $4, $4, $gp
ori $4, $4, %got_lo(gI1)
movn $3, $4, $2
ld $2, 0($3)
ld $2, 0($2)
ret $lr
.set macro
.set reorder
.end _Z18test_select_globalv
$tmp0:
.size _Z18test_select_globalv, ($tmp0)-_Z18test_select_globalv
Since the phi (Φ) node is widely used in SSA form 5 , llvm applies phi nodes in IR for optimization work as well. The phi node
exists for “live variable analysis”. An example for C is here 6 . As mentioned on the wiki web site referenced above,
by finding dominance frontiers the compiler knows where to insert Φ functions. The following input shows
the benefits of the phi node,
lbdex/input/ch8_2_phinode.cpp
if (a == 0) {
a++; // a = 1
}
else if (b != 0) {
a--;
}
else if (c == 0) {
a += 2;
}
d = a + b;
return d;
}
; <label>:2 ; preds = %0
%3 = icmp eq i32 %b, 0
br i1 %3, label %6, label %4
; <label>:4 ; preds = %2
%5 = add nsw i32 %a, -1
5 https://en.wikipedia.org/wiki/Static_single_assignment_form
6 http://stackoverflow.com/questions/11485531/what-exactly-phi-instruction-does-and-how-to-use-it-in-llvm
br label %9
; <label>:6 ; preds = %2
%7 = icmp eq i32 %c, 0
%8 = add nsw i32 %a, 2
%.a = select i1 %7, i32 %8, i32 %a
br label %9
; <label>:6 ; preds = %0
%7 = load i32, i32* %1, align 4
%8 = add nsw i32 %7, 1
store i32 %8, i32* %1, align 4
br label %23
; <label>:9 ; preds = %0
%10 = load i32, i32* %2, align 4
%11 = icmp ne i32 %10, 0
br i1 %11, label %12, label %15
; <label>:12 ; preds = %9
%13 = load i32, i32* %1, align 4
%14 = add nsw i32 %13, -1
store i32 %14, i32* %1, align 4
br label %22
; <label>:15 ; preds = %9
%16 = load i32, i32* %3, align 4
%17 = icmp eq i32 %16, 0
br i1 %17, label %18, label %21
br label %21
Compiling with clang -O3 generates the phi function. The phi function can assign a virtual register value directly from
multiple basic blocks. Compiling with clang -O0 does not generate phi; it assigns the virtual register value by loading a stack
slot, which was stored earlier in each of the multiple basic blocks. In this example the pointer %1 points to the
stack slot, and “store i32 %8, i32* %1”, “store i32 %14, i32* %1” and “store i32 %20, i32* %1” appear in labels 6, 12 and 18,
respectively. In other words, it needs 3 store instructions. It is possible that the compiler finds that a == 0 is always
true after optimization analysis through the phi node. If so, the phi node version brings a better result, because the clang
-O0 version uses load and store through pointer %1, which may cut off the optimization opportunity.
If you are interested in more details than the wiki web site offers, please refer to the book here 7 for the phi node, or the
book here 8 for the dominator tree analysis, if you have these books.
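To make the role of the phi function concrete, the following sketch emulates the -O3 control flow by hand: each path defines its own value, and the merge point picks one of them according to the predecessor block that was taken, without any store to a stack slot. The function and enum names are hypothetical, the block names follow the labels of the -O3 IR fragment above, and the value 1 for the a == 0 path is inferred from the source (a++ when a is 0).

```cpp
#include <cassert>

// Source-level function with the same shape as ch8_2_phinode.cpp.
int testPhinode(int a, int b, int c) {
  if (a == 0)
    a++;
  else if (b != 0)
    a--;
  else if (c == 0)
    a += 2;
  return a + b;
}

// Which predecessor block control came from when reaching the merge block.
enum class Pred { Entry, Block4, Block6 };

// Hand-written SSA-style version: no variable is assigned twice.
int testPhinodeSSA(int a, int b, int c) {
  Pred from;
  int v4 = 0;  // value defined in block 4 (a - 1)
  int v6 = 0;  // value defined in block 6 (the select)
  if (a == 0) {
    from = Pred::Entry;
  } else if (b != 0) {
    v4 = a - 1;
    from = Pred::Block4;
  } else {
    v6 = (c == 0) ? a + 2 : a;  // corresponds to the select instruction
    from = Pred::Block6;
  }
  // The phi: choose the incoming value that belongs to the predecessor taken.
  const int merged =
      from == Pred::Entry ? 1 : (from == Pred::Block4 ? v4 : v6);
  return merged + b;
}
```

Both functions return the same result for every input; the second form simply makes explicit what the phi instruction expresses in the IR.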
As mentioned in the previous section, Cpu0 is a RISC (Reduced Instruction Set Computer) CPU with a 5-stage
pipeline. RISC CPUs are everywhere in the world; even the X86, a CISC (Complex Instruction Set Computer), is RISC inside (it
translates CISC instructions into micro-instructions which are pipelined like RISC). Knowledge of RISC concepts will
serve you well in compiler design. Below are two excellent books we have read, listed for reference. Certainly there are
many books on computer architecture, and some of them contain the real RISC CPU knowledge needed, but these two are
excellent and popular.
Computer Organization and Design: The Hardware/Software Interface (The Morgan Kaufmann Series in Computer
Architecture and Design)
Computer Architecture: A Quantitative Approach (The Morgan Kaufmann Series in Computer Architecture and Design)
The book “Computer Organization and Design: The Hardware/Software Interface” (there are 4 editions at the time this book
is written) is an introduction, while “Computer Architecture: A Quantitative Approach” is more complicated and
goes deeper into CPU architecture (there are 5 editions at the time this book is written).
The above two books use the Mips CPU as an example, since Mips is more RISC-like than other CPUs on the market. The ARM
series of CPUs dominates the embedded market, especially in mobile phones and other portable devices. The following book is
good, and I am reading it now.
ARM System Developer’s Guide: Designing and Optimizing System Software (The Morgan Kaufmann Series in
Computer Architecture and Design).
7 Section 8.11 of Muchnick, Steven S. (1997). Advanced Compiler Design and Implementation. Morgan Kaufmann. ISBN 1-55860-320-4.
8 Refer chapter 9 of book Compilers: Principles, Techniques, and Tools (2nd Edition)
CHAPTER
NINE
FUNCTION CALL
This chapter supports the translation of subroutine/function calls in the backend. A lot of code is needed to support
function calls. It is added according to the interfaces supplied by llvm, so that it can be explained easily. This chapter
starts by introducing the Mips stack frame structure, since we borrow many parts of the ABI from it. Although each CPU has
its own ABI, most ABIs for RISC CPUs are similar. The section “4.5 DAG Lowering” of tricore_llvm.pdf contains
knowledge about the lowering process. Section “4.5.1 Calling Conventions” of tricore_llvm.pdf is related material
that you can refer to further.
If you have problems reading the stack frame illustrated in the first three sections of this chapter, you can read
appendix B, “Procedure Call Convention”, of the book “Computer Organization and Design, 1st Edition” 1 , “Run Time
Memory” in a compiler book, or “Function Call Sequence” and “Stack Frame” of the Mips ABI 3 .
The first thing in designing the Cpu0 function call is deciding how to pass arguments in a function call. There are two
options. One is passing all arguments on the stack. The other is passing arguments in the registers which are reserved for
function arguments, and putting the remaining arguments on the stack if there are more arguments than reserved registers.
For example, Mips passes the first 4 arguments in registers $a0, $a1, $a2, $a3, and the other arguments on the stack if
there are more than 4 arguments. Fig. 9.1 is the Mips stack frame.
Run llc -march=mips on ch9_1.bc, and you will get the following result. See the comments marked “//”.
lbdex/input/ch9_1.cpp
int gI = 100;
int sum_i(int x1, int x2, int x3, int x4, int x5, int x6)
{
int sum = gI + x1 + x2 + x3 + x4 + x5 + x6;
return sum;
}
int main()
{
int a = sum_i(1, 2, 3, 4, 5, 6);
1 Computer Organization and Design: The Hardware/Software Interface 1st edition (The Morgan Kaufmann Series in Computer Architecture
and Design)
3 http://www.linux-mips.org/pub/linux/mips/doc/ABI/mipsabi.pdf
return a;
}
.cfi_endproc
.globl main
.align 2
.type main,@function
.set nomips16 # @main
.ent main
main:
.cfi_startproc
.frame $sp,40,$ra
.mask 0x80000000,-4
.fmask 0x00000000,0
.set noreorder
.set nomacro
.set noat
# BB#0:
lui $2, %hi(_gp_disp)
ori $2, $2, %lo(_gp_disp)
addiu $sp, $sp, -40
$tmp5:
.cfi_def_cfa_offset 40
sw $ra, 36($sp) # 4-byte Folded Spill
$tmp6:
.cfi_offset 31, -4
addu $gp, $2, $25
sw $zero, 32($sp)
addiu $1, $zero, 6
sw $1, 20($sp) // Save argument 6 to 20($sp)
addiu $1, $zero, 5
sw $1, 16($sp) // Save argument 5 to 16($sp)
lw $25, %call16(_Z5sum_iiiiiii)($gp)
addiu $4, $zero, 1 // Pass argument 1 to $4 (=$a0)
addiu $5, $zero, 2 // Pass argument 2 to $5 (=$a1)
addiu $6, $zero, 3
jalr $25
addiu $7, $zero, 4
sw $2, 28($sp)
lw $ra, 36($sp) # 4-byte Folded Reload
jr $ra
addiu $sp, $sp, 40
.set at
.set macro
.set reorder
.end main
$tmp7:
.size main, ($tmp7)-main
.cfi_endproc
From the Mips assembly code generated above, we see that it passes the first 4 arguments in $a0..$a3 and saves the last
2 arguments to 16($sp) and 20($sp). Fig. 9.2 shows the location of arguments for the example code ch9_1.cpp. sum_i() loads
argument 5 from 48($sp), since argument 5 is saved to 16($sp) in main(). The stack size of sum_i() is
32, so 16+32($sp) is the location of incoming argument 5.
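The address arithmetic above is small enough to write down directly. The helper name below is hypothetical; it only restates that the callee has moved $sp down by its own frame size, so an argument the caller stored at some offset from its $sp is found at that offset plus the callee frame size.

```cpp
#include <cassert>

// Offset, relative to the callee's $sp, of an argument that the caller stored
// at outgoingOffset relative to its own $sp (illustrative helper only).
constexpr int incomingArgOffset(int calleeFrameSize, int outgoingOffset) {
  return calleeFrameSize + outgoingOffset;
}
```

For ch9_1.cpp this gives 48 for argument 5 (32 + 16) and, by the same rule, 52 for argument 6 (32 + 20).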
The 007-2418-003.pdf here 2 is the Mips assembly language manual. Here 3 is the Mips Application Binary Interface,
which includes Fig. 9.1.
2 http://math-atlas.sourceforge.net/devel/assembly/007-2418-003.pdf
From the last section, in order to support function calls, we need to implement the argument passing mechanism with the
stack frame. Before doing so, let’s run the old version of the code, Chapter8_2/, with ch9_1.cpp and see what happens.
118-165-79-31:input Jonathan$ /Users/Jonathan/llvm/test/
cmake_debug_build/Debug/bin/llc -march=cpu0 -relocation-model=pic -filetype=asm
ch9_1.bc -o ch9_1.cpu0.s
Assertion failed: (InVals.size() == Ins.size() && "LowerFormalArguments didn't
emit the correct number of values!"), function LowerArguments, file /Users/
Jonathan/llvm/test/src/lib/CodeGen/SelectionDAG/
SelectionDAGBuilder.cpp, ...
...
0. Program arguments: /Users/Jonathan/llvm/test/cmake_debug_build/
Debug/bin/llc -march=cpu0 -relocation-model=pic -filetype=asm ch9_1.bc -o
ch9_1.cpu0.s
1. Running pass 'Function Pass Manager' on module 'ch9_1.bc'.
2. Running pass 'CPU0 DAG->DAG Pattern Instruction Selection' on function
'@_Z5sum_iiiiiii'
Illegal instruction: 4
Since Chapter8_2/ defines LowerFormalArguments() with an empty body, we get the error messages above. Before
defining LowerFormalArguments(), we have to choose how to pass arguments in a function call. For demonstration,
Cpu0 passes the first two arguments in registers under the default setting, llc -cpu0-s32-calls=false. With llc
-cpu0-s32-calls=true, Cpu0 passes all its arguments on the stack.
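The two settings can be modelled with a short sketch. Everything here is illustrative: ArgLoc and assignArgs() are hypothetical names, only 4-byte integer arguments are considered, and the stack offsets assume that every argument keeps a 4-byte slot in the outgoing area in the way the Mips example above places argument 5 at 16($sp). The real decisions are made by the Cpu0CC class shown below.

```cpp
#include <cassert>
#include <vector>

// Where one argument lives: in an argument register or in a stack slot.
struct ArgLoc {
  bool inReg;       // true: passed in an argument register
  int regIndex;     // index of the argument register, or -1
  int stackOffset;  // byte offset from the caller's $sp, or -1
};

// s32Calls == false: the first two arguments go to registers, the rest to the
// stack. s32Calls == true: every argument goes to the stack.
std::vector<ArgLoc> assignArgs(int numArgs, bool s32Calls) {
  const int numArgRegs = s32Calls ? 0 : 2;
  std::vector<ArgLoc> locs;
  for (int i = 0; i < numArgs; ++i) {
    if (i < numArgRegs)
      locs.push_back(ArgLoc{true, i, -1});
    else
      locs.push_back(ArgLoc{false, -1, 4 * i});  // assumed slot layout
  }
  return locs;
}
```

With six arguments and the default setting, arguments 1 and 2 land in registers and arguments 3 to 6 on the stack; with -cpu0-s32-calls=true all six are on the stack.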
Function LowerFormalArguments() is in charge of creating the incoming arguments. We define it as follows,
lbdex/chapters/Chapter9_1/Cpu0ISelLowering.h
/// Cpu0CC - This class provides methods used to analyze formal and call
/// arguments and inquire about calling convention information.
class Cpu0CC {
/// Return the function that analyzes fixed argument list functions.
llvm::CCAssignFn *fixedArgFn() const;
};
...
/// copyByValArg - Copy argument registers which were used to pass a byval
/// argument to the stack. Create a stack frame object for the byval
/// argument.
void copyByValRegs(SDValue Chain, const SDLoc &DL,
std::vector<SDValue> &OutChains, SelectionDAG &DAG,
const ISD::ArgFlagsTy &Flags,
SmallVectorImpl<SDValue> &InVals,
const Argument *FuncArg,
const Cpu0CC &CC, const ByValArgInfo &ByVal) const;
...
}
lbdex/chapters/Chapter9_1/Cpu0ISelLowering.cpp
// addLiveIn - This helper function adds the specified physical register to the
// MachineFunction as a live in value. It also creates a corresponding
// virtual register for it.
static unsigned
addLiveIn(MachineFunction &MF, unsigned PReg, const TargetRegisterClass *RC)
{
unsigned VReg = MF.getRegInfo().createVirtualRegister(RC);
MF.getRegInfo().addLiveIn(PReg, VReg);
return VReg;
}
//===----------------------------------------------------------------------===//
// TODO: Implement a generic logic using tblgen that can support this.
// Cpu0 32 ABI rules:
// ---
//===----------------------------------------------------------------------===//
LocVT = MVT::i32;
if (ArgFlags.isSExt())
LocInfo = CCValAssign::SExt;
else if (ArgFlags.isZExt())
LocInfo = CCValAssign::ZExt;
else
LocInfo = CCValAssign::AExt;
}
unsigned Reg;
// f32 and f64 are allocated in A0, A1 when either of the following
// is true: function is vararg, argument is 3rd or higher, there is previous
// argument which is not f32 or f64.
bool AllocateFloatsInIntReg = true;
unsigned OrigAlign = ArgFlags.getOrigAlign();
bool isI64 = (ValVT == MVT::i32 && OrigAlign == 8);
if (!Reg) {
unsigned Offset = State.AllocateStack(ValVT.getSizeInBits() >> 3,
OrigAlign);
State.addLoc(CCValAssign::getMem(ValNo, ValVT, Offset, LocVT, LocInfo));
} else
State.addLoc(CCValAssign::getReg(ValNo, ValVT, Reg, LocVT, LocInfo));
return false;
}
//===----------------------------------------------------------------------===//
// Call Calling Convention Implementation
//===----------------------------------------------------------------------===//
//@LowerCall {
/// LowerCall - functions arguments are copied from virtual regs to
/// (physical regs)/(stack frame), CALLSEQ_START and CALLSEQ_END are emitted.
SDValue
Cpu0TargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI,
SmallVectorImpl<SDValue> &InVals) const {
return CLI.Chain;
//===----------------------------------------------------------------------===//
//@LowerFormalArguments {
/// LowerFormalArguments - transform physical registers into virtual registers
/// and generate load operations for arguments places on the stack.
SDValue
Cpu0TargetLowering::LowerFormalArguments(SDValue Chain,
CallingConv::ID CallConv,
bool IsVarArg,
const SmallVectorImpl<ISD::InputArg> &Ins,
const SDLoc &DL, SelectionDAG &DAG,
SmallVectorImpl<SDValue> &InVals)
const {
MachineFunction &MF = DAG.getMachineFunction();
MachineFrameInfo *MFI = MF.getFrameInfo();
Cpu0FunctionInfo *Cpu0FI = MF.getInfo<Cpu0FunctionInfo>();
Cpu0FI->setVarArgsFrameIndex(0);
Function::const_arg_iterator FuncArg =
DAG.getMachineFunction().getFunction()->arg_begin();
bool UseSoftFloat = Subtarget.abiUsesSoftFloat();
unsigned CurArgIdx = 0;
Cpu0CC::byval_iterator ByValArg = Cpu0CCInfo.byval_begin();
//@2 {
//@byval pass {
if (Flags.isByVal()) {
assert(Flags.getByValSize() &&
"ByVal args of size 0 should have been ignored by front-end.");
assert(ByValArg != Cpu0CCInfo.byval_end());
copyByValRegs(Chain, DL, OutChains, DAG, Flags, InVals, &*FuncArg,
Cpu0CCInfo, *ByValArg);
++ByValArg;
continue;
}
//@byval pass }
// Arguments stored on registers
if (ABI.IsO32() && IsRegLoc) {
MVT RegVT = VA.getLocVT();
unsigned ArgReg = VA.getLocReg();
const TargetRegisterClass *RC = getRegClassFor(RegVT);
// sanity check
assert(VA.isMemLoc());
// All stores are grouped in one node to allow the matching between
// the size of Ins and InVals. This only happens when on varg functions
if (!OutChains.empty()) {
OutChains.push_back(Chain);
Chain = DAG.getNode(ISD::TokenFactor, DL, MVT::Other, OutChains);
}
return Chain;
}
// @LowerFormalArguments }
//===----------------------------------------------------------------------===//
void Cpu0TargetLowering::Cpu0CC::
analyzeFormalArguments(const SmallVectorImpl<ISD::InputArg> &Args,
bool IsSoftFloat, Function::const_arg_iterator FuncArg) {
unsigned NumArgs = Args.size();
llvm::CCAssignFn *FixedFn = fixedArgFn();
unsigned CurArgIdx = 0;
if (ArgFlags.isByVal()) {
handleByValArg(I, ArgVT, ArgVT, CCValAssign::Full, ArgFlags);
continue;
}
#ifndef NDEBUG
dbgs() << "Formal Arg #" << I << " has unhandled type "
<< EVT(ArgVT).getEVTString();
#endif
llvm_unreachable(nullptr);
}
}
if (useRegsForByval())
allocateRegs(ByVal, ByValSize, Align);
else // IsS32
return CC_Cpu0S32;
}
ByVal.FirstIdx = CCInfo.getFirstUnallocated(IntArgRegs);
Recall from the section “Global variable” 4 that we handled global variable translation by first creating the IR DAG
in LowerGlobalAddress() and then finishing instruction selection according to the corresponding machine instruction
DAGs in Cpu0InstrInfo.td. LowerGlobalAddress() is called when llc meets a global variable access.
LowerFormalArguments() works the same way. It is called when a function is entered. It gets the incoming argument
information through CCInfo(CallConv,..., ArgLocs, ...) before entering the “for loop”. In ch9_1.cpp, the call to
sum_i(...) has 6 arguments, so ArgLocs.size() is 6 and the information for each argument is in ArgLocs[i]. When
VA.isRegLoc() is true, the argument is passed in a register. When VA.isMemLoc() is true, the argument is passed on
the memory stack. For an argument passed in a register, it marks the register as “live in” and copies the value
directly from the register. For an argument passed on the memory stack, it creates a frame index object at the
argument's stack offset, creates a load node that reads from it, and then puts the load node into the vector InVals.
With llc -cpu0-s32-calls=false, the first two arguments are passed in registers and the remaining arguments in the
stack frame. With llc -cpu0-s32-calls=true, all arguments are passed in the stack frame.
Before handling the arguments as above, it calls analyzeFormalArguments(). analyzeFormalArguments() calls
fixedArgFn(), which returns the function pointer of CC_Cpu0O32() or CC_Cpu0S32(). ArgFlags.isByVal() is true when
it meets the “struct pointer byval” keyword, such as “%struct.S* byval” in tailcall.ll. With llc
-cpu0-s32-calls=false the stack offsets begin at 8 (in case the argument registers need to be spilled), while with
llc -cpu0-s32-calls=true the stack offsets begin at 0.
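The placement rule described above can be sketched as a small stand-alone model. This is only an illustration under stated assumptions: the type ArgLoc and the function assignArgLocs() are hypothetical names, not the LLVM CCValAssign/CCState classes, and every argument is assumed to be a 32-bit integer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: hypothetical stand-in for the information kept in ArgLocs[i].
struct ArgLoc {
  bool inReg;      // true: passed in a register, false: passed on the stack
  unsigned value;  // argument register index (0 or 1) or stack offset in bytes
};

// O32 style (s32Calls == false): the first two arguments go in registers and
// their stack slots stay reserved, so the first stack argument is at offset 8.
// S32 style (s32Calls == true): every argument goes on the stack from offset 0.
std::vector<ArgLoc> assignArgLocs(std::size_t numArgs, bool s32Calls) {
  const unsigned slotSize = 4;  // assumption: all arguments are 32-bit
  const std::size_t numArgRegs = s32Calls ? 0 : 2;
  std::vector<ArgLoc> locs;
  for (std::size_t i = 0; i < numArgs; ++i) {
    if (i < numArgRegs)
      locs.push_back({true, static_cast<unsigned>(i)});
    else
      locs.push_back({false, static_cast<unsigned>(i) * slotSize});
  }
  assert(locs.size() == numArgs);
  return locs;
}
```

For the 6 arguments of sum_i(), this model places all of them at offsets 0 to 20 in the S32 case, and places the third argument at offset 8 in the O32 case.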
For example, for ch9_1.cpp with llc -cpu0-s32-calls=true (using only the memory stack to pass arguments),
LowerFormalArguments() is called twice. The first call is for sum_i(), which creates 6 “load DAGs” for the 6
incoming arguments passed into this function. The second call is for main(), which creates no “load DAG” since no
incoming argument is passed into main(). In addition to LowerFormalArguments(), which creates the “load DAG”, we
need loadRegFromStackSlot() (defined in an earlier chapter) to issue the machine instruction “ld $r, offset($sp)”
that loads incoming arguments from their stack frame offsets. GetMemOperand(..., FI, ...) returns the memory location
of the frame index variable, which is the offset.
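The register/memory split inside the “for loop” can be pictured with the following sketch. It is a simplified model, not the real SelectionDAG code: ArgLoc, LoweredArgs and lowerFormalArgs() are hypothetical names, and DAG nodes are represented by plain strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of ArgLocs.
struct ArgLoc {
  bool inReg;      // corresponds to VA.isRegLoc()
  unsigned value;  // register number or stack offset in bytes
};

struct LoweredArgs {
  std::vector<unsigned> liveInRegs;  // registers marked as "live in"
  std::vector<std::string> inVals;   // one entry per incoming argument
};

LoweredArgs lowerFormalArgs(const std::vector<ArgLoc> &argLocs) {
  LoweredArgs out;
  for (const ArgLoc &va : argLocs) {
    if (va.inReg) {
      // Register argument: mark the register live-in and copy from it.
      out.liveInRegs.push_back(va.value);
      out.inVals.push_back("CopyFromReg r" + std::to_string(va.value));
    } else {
      // Stack argument: create a load from the argument's frame offset.
      out.inVals.push_back("load [frame offset " + std::to_string(va.value) + "]");
    }
  }
  // InVals must end up with exactly one value per incoming argument.
  assert(out.inVals.size() == argLocs.size());
  return out;
}
```

With six stack locations, as for sum_i() under -cpu0-s32-calls=true, the model produces six load entries and no live-in register.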
4 http://jonathan2251.github.io/lbd/globalvar.html#global-variable
For the input ch9_incoming.cpp below, LowerFormalArguments() generates the DAG nodes in the red boxes of the next
two figures, for llc -cpu0-s32-calls=true and llc -cpu0-s32-calls=false, respectively. The root node at bottom is
created by
lbdex/input/ch9_incoming.cpp
return sum;
}
[Figure: DAG nodes generated for ch9_incoming.cpp with llc -cpu0-s32-calls=true]
[Figure: DAG nodes generated for ch9_incoming.cpp with llc -cpu0-s32-calls=false]
In addition to the calling convention and LowerFormalArguments(), Chapter9_1/ adds the following code for instruction
selection and printing of the Cpu0 instructions swi (software interrupt), jsub and jalr (function call).
lbdex/chapters/Chapter9_1/Cpu0InstrInfo.td
// Call
def Cpu0JmpLink : SDNode<"Cpu0ISD::JmpLink",SDT_Cpu0JmpLink,
[SDNPHasChain, SDNPOutGlue, SDNPOptInGlue,
SDNPVariadic]>;
class IsTailCall {
bit isCall = 1;
bit isTerminator = 1;
bit isReturn = 1;
bit isBarrier = 1;
bit hasExtraSrcRegAllocReq = 1;
bit isCodeGenOnly = 1;
}
lbdex/chapters/Chapter9_1/Cpu0MCInstLower.cpp
switch(MO.getTargetFlags()) {
case Cpu0II::MO_GOT_CALL:
TargetKind = Cpu0MCExpr::CEK_GOT_CALL;
break;
...
}
switch (MOTy) {
...
case MachineOperand::MO_ExternalSymbol:
Symbol = AsmPrinter.GetExternalSymbolSymbol(MO.getSymbolName());
Offset += MO.getOffset();
break;
...
}
...
}
switch (MOTy) {
//@2
case MachineOperand::MO_ExternalSymbol:
...
}
...
}
lbdex/chapters/Chapter9_1/MCTargetDesc/Cpu0AsmBackend.cpp
case Cpu0::fixup_Cpu0_CALL16:
...
}
...
}
lbdex/chapters/Chapter9_1/MCTargetDesc/Cpu0ELFObjectWriter.cpp
switch (Kind) {
case Cpu0::fixup_Cpu0_CALL16:
Type = ELF::R_CPU0_CALL16;
break;
...
}
...
}
lbdex/chapters/Chapter9_1/MCTargetDesc/Cpu0FixupKinds.h
enum Fixups {
// resulting in - R_CPU0_CALL16.
fixup_Cpu0_CALL16,
...
}
lbdex/chapters/Chapter9_1/MCTargetDesc/Cpu0MCCodeEmitter.cpp
unsigned Cpu0MCCodeEmitter::
getJumpTargetOpValue(const MCInst &MI, unsigned OpNo,
SmallVectorImpl<MCFixup> &Fixups,
const MCSubtargetInfo &STI) const {
Fixups.push_back(MCFixup::create(0, Expr,
MCFixupKind(Cpu0::fixup_Cpu0_PC24)));
...
}
unsigned Cpu0MCCodeEmitter::
getExprOpValue(const MCExpr *Expr,SmallVectorImpl<MCFixup> &Fixups,
const MCSubtargetInfo &STI) const {
// switch(cast<MCSymbolRefExpr>(Expr)->getKind()) {
case Cpu0MCExpr::CEK_GOT_CALL:
FixupKind = Cpu0::fixup_Cpu0_CALL16;
break;
...
}
...
}
lbdex/chapters/Chapter9_1/Cpu0MachineFunction.h
InArgFIRange(std::make_pair(-1, 0)),
OutArgFIRange(std::make_pair(-1, 0)), GPFI(0), DynAllocFI(0),
...
};
lbdex/chapters/Chapter9_1/Cpu0SEFrameLowering.h
lbdex/chapters/Chapter9_1/Cpu0SEFrameLowering.cpp
bool Cpu0SEFrameLowering::
spillCalleeSavedRegisters(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI,
const std::vector<CalleeSavedInfo> &CSI,
const TargetRegisterInfo *TRI) const {
MachineFunction *MF = MBB.getParent();
MachineBasicBlock *EntryBlock = &MF->front();
const TargetInstrInfo &TII = *MF->getSubtarget().getInstrInfo();
// method Cpu0TargetLowering::LowerRETURNADDR.
// It's killed at the spill, unless the register is LR and return address
// is taken.
unsigned Reg = CSI[i].getReg();
bool IsRAAndRetAddrIsTaken = (Reg == Cpu0::LR)
&& MF->getFrameInfo()->isReturnAddressTaken();
if (!IsRAAndRetAddrIsTaken)
EntryBlock->addLiveIn(Reg);
return true;
}
Both JSUB and JALR, defined in Cpu0InstrInfo.td as above, use the Cpu0JmpLink node. They are distinguishable because
JSUB uses an “imm” operand while JALR uses a register operand.
lbdex/chapters/Chapter9_1/Cpu0InstrInfo.td
The code tells TableGen to generate pattern-matching code that first tries to match “imm” against the “tglobaladdr”
pattern. If that fails, it tries to match “texternalsym” next. A function you declare belongs to “tglobaladdr” (for
instance, the function sum_i(...) defined in ch9_1.cpp); a function used implicitly by llvm belongs to
“texternalsym” (for instance, “memcpy”). The “memcpy” call is generated when defining a long string. ch9_1_2.cpp is
an example that generates a “memcpy” function call. It is shown in the next section with the Chapter9_2 example
code. Cpu0GenDAGISel.inc contains the pattern matching information for JSUB and JALR, generated by TableGen, as
follows,
/*SwitchOpcode*/ 74, TARGET_VAL(Cpu0ISD::JmpLink),// ->734
/*660*/ OPC_RecordNode, // #0 = 'Cpu0JmpLink' chained node
/*661*/ OPC_CaptureGlueInput,
/*662*/ OPC_RecordChild1, // #1 = $target
/*663*/ OPC_Scope, 57, /*->722*/ // 2 children in Scope
/*665*/ OPC_MoveChild, 1,
/*667*/ OPC_SwitchOpcode /*3 cases */, 22, TARGET_VAL(ISD::Constant),
// ->693
/*671*/ OPC_MoveParent,
/*672*/ OPC_EmitMergeInputChains1_0,
/*673*/ OPC_EmitConvertToTarget, 1,
/*675*/ OPC_Scope, 7, /*->684*/ // 2 children in Scope
/*684*/ /*Scope*/ 7, /*->692*/
/*685*/ OPC_MorphNodeTo, TARGET_VAL(Cpu0::JSUB), 0|OPFL_Chain|
OPFL_GlueInput|OPFL_GlueOutput|OPFL_Variadic1,
0/*#VTs*/, 1/*#Ops*/, 2,
// Src: (Cpu0JmpLink (imm:iPTR):$target) - Complexity = 6
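The way the operand kind decides between JSUB and JALR can be summarized with a small sketch. The enum CalleeKind and the function selectCallOpcode() are hypothetical and only mirror the description in the text; the real selection is done by the TableGen-generated matcher table.

```cpp
#include <cassert>
#include <string>

// Hypothetical operand kinds for the call target of a Cpu0JmpLink node.
enum class CalleeKind { GlobalAddress, ExternalSymbol, Register };

// Returns the name of the opcode that the call target kind would select.
std::string selectCallOpcode(CalleeKind kind) {
  switch (kind) {
  case CalleeKind::GlobalAddress:   // "tglobaladdr", e.g. sum_i() in ch9_1.cpp
  case CalleeKind::ExternalSymbol:  // "texternalsym", e.g. memcpy
    return "JSUB";                  // both are matched as an "imm" operand
  case CalleeKind::Register:        // call target held in a register
    return "JALR";
  }
  assert(false && "unknown callee kind");
  return "";
}
```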
After the above changes, you can run Chapter9_1/ with ch9_1.cpp and see what happens in the following,
Now LowerFormalArguments() has the correct number of values, but LowerCall() does not!
Fig. 9.2 depicts the two steps of argument passing. One stores the outgoing arguments in the caller function, the
other loads the incoming arguments in the callee function. We defined LowerFormalArguments() for “load incoming
arguments” in the callee function in the last section. Now we will finish “store outgoing arguments” in the caller
function. LowerCall() is responsible for this. The implementation is as follows,
lbdex/chapters/Chapter9_2/Cpu0MachineFunction.h
lbdex/chapters/Chapter9_2/Cpu0MachineFunction.cpp
lbdex/chapters/Chapter9_2/Cpu0ISelLowering.h
/// This function fills Ops, which is the list of operands that will later
/// be used when a function call node is created. It also generates
/// copyToReg nodes to set up argument registers.
virtual void
getOpndList(SmallVectorImpl<SDValue> &Ops,
std::deque< std::pair<unsigned, SDValue> > &RegsToPass,
bool IsPICCall, bool GlobalOrExternal, bool InternalLinkage,
CallLoweringInfo &CLI, SDValue Callee, SDValue Chain) const;
/// Cpu0CC - This class provides methods used to analyze formal and call
/// arguments and inquire about calling convention information.
class Cpu0CC {
};
lbdex/chapters/Chapter9_2/Cpu0ISelLowering.cpp
SDValue
Cpu0TargetLowering::passArgOnStack(SDValue StackPtr, unsigned Offset,
SDValue Chain, SDValue Arg, const SDLoc &DL,
bool IsTailCall, SelectionDAG &DAG) const {
if (!IsTailCall) {
SDValue PtrOff =
DAG.getNode(ISD::ADD, DL, getPointerTy(DAG.getDataLayout()), StackPtr,
DAG.getIntPtrConstant(Offset, DL));
return DAG.getStore(Chain, DL, Arg, PtrOff, MachinePointerInfo());
}
void Cpu0TargetLowering::
getOpndList(SmallVectorImpl<SDValue> &Ops,
std::deque< std::pair<unsigned, SDValue> > &RegsToPass,
bool IsPICCall, bool GlobalOrExternal, bool InternalLinkage,
CallLoweringInfo &CLI, SDValue Callee, SDValue Chain) const {
// T9 should contain the address of the callee function if
// -relocation-model=pic or it is an indirect call.
if (IsPICCall || !GlobalOrExternal) {
unsigned T9Reg = Cpu0::T9;
RegsToPass.push_front(std::make_pair(T9Reg, Callee));
} else
Ops.push_back(Callee);
// Add argument registers to the end of the list so that they are
// known live into the call.
for (unsigned i = 0, e = RegsToPass.size(); i != e; ++i)
Ops.push_back(CLI.DAG.getRegister(RegsToPass[i].first,
RegsToPass[i].second.getValueType()));
if (InFlag.getNode())
Ops.push_back(InFlag);
}
Cpu0CCInfo.analyzeCallOperands(Outs, IsVarArg,
Subtarget.abiUsesSoftFloat(),
Callee.getNode(), CLI.getArgs());
//@TailCall 1 {
// Check if it's really possible to do a tail call.
if (IsTailCall)
IsTailCall =
isEligibleForTailCallOptimization(Cpu0CCInfo, NextStackOffset,
*MF.getInfo<Cpu0FunctionInfo>());
if (IsTailCall)
++NumTailCalls;
//@TailCall 1 }
//@TailCall 2 {
if (!IsTailCall)
Chain = DAG.getCALLSEQ_START(Chain, NextStackOffsetVal, DL);
//@TailCall 2 }
SDValue StackPtr =
DAG.getCopyFromReg(Chain, DL, Cpu0::SP,
getPointerTy(DAG.getDataLayout()));
//@1 {
//@ByVal Arg {
if (Flags.isByVal()) {
assert(Flags.getByValSize() &&
"ByVal args of size 0 should have been ignored by front-end.");
assert(ByValArg != Cpu0CCInfo.byval_end());
assert(!IsTailCall &&
"Do not tail-call optimize if there is a byval argument.");
passByValArg(Chain, DL, RegsToPass, MemOpChains, StackPtr, MFI, DAG, Arg,
Cpu0CCInfo, *ByValArg, Flags, Subtarget.isLittle());
++ByValArg;
continue;
}
//@ByVal Arg }
// Transform all store nodes into one single node because all store
// nodes are independent of each other.
if (!MemOpChains.empty())
Chain = DAG.getNode(ISD::TokenFactor, DL, MVT::Other, MemOpChains);
if (GlobalAddressSDNode *G = dyn_cast<GlobalAddressSDNode>(Callee)) {
if (IsPICCall) {
const GlobalValue *Val = G->getGlobal();
InternalLinkage = Val->hasInternalLinkage();
if (InternalLinkage)
Callee = getAddrLocal(G, Ty, DAG);
else
Callee = getAddrGlobal(G, Ty, DAG, Cpu0II::MO_GOT_CALL, Chain,
FuncInfo->callPtrInfo(Val));
} else
Callee = DAG.getTargetGlobalAddress(G->getGlobal(), DL,
getPointerTy(DAG.getDataLayout()), 0,
Cpu0II::MO_NO_FLAG);
GlobalOrExternal = true;
}
else if (ExternalSymbolSDNode *S = dyn_cast<ExternalSymbolSDNode>(Callee)) {
const char *Sym = S->getSymbol();
if (!IsPIC) // static
Callee = DAG.getTargetExternalSymbol(Sym,
getPointerTy(DAG.getDataLayout()),
Cpu0II::MO_NO_FLAG);
else // PIC
Callee = getAddrGlobal(S, Ty, DAG, Cpu0II::MO_GOT_CALL, Chain,
FuncInfo->callPtrInfo(Sym));
GlobalOrExternal = true;
}
//@TailCall 3 {
if (IsTailCall)
return DAG.getNode(Cpu0ISD::TailCall, DL, MVT::Other, Ops);
//@TailCall 3 }
// Handle result values, copying them out of physregs into vregs that we
// return.
return LowerCallResult(Chain, InFlag, CallConv, IsVarArg,
Ins, DL, DAG, InVals, CLI.Callee.getNode(), CLI.RetTy);
}
Cpu0CCInfo.analyzeCallResult(Ins, Subtarget.abiUsesSoftFloat(),
CallNode, RetTy);
if (RVLocs[i].getValVT() != RVLocs[i].getLocVT())
Val = DAG.getNode(ISD::BITCAST, DL, RVLocs[i].getValVT(), Val);
InVals.push_back(Val);
}
return Chain;
}
bool
Cpu0TargetLowering::CanLowerReturn(CallingConv::ID CallConv,
MachineFunction &MF, bool IsVarArg,
const SmallVectorImpl<ISD::OutputArg> &Outs,
LLVMContext &Context) const {
SmallVector<CCValAssign, 16> RVLocs;
CCState CCInfo(CallConv, IsVarArg, MF,
RVLocs, Context);
return CCInfo.CheckReturn(Outs, RetCC_Cpu0);
}
Cpu0TargetLowering::Cpu0CC::SpecialCallingConvType
Cpu0TargetLowering::getSpecialCallingConv(SDValue Callee) const {
Cpu0CC::SpecialCallingConvType SpecialCallingConv =
Cpu0CC::NoSpecialCallingConv;
return SpecialCallingConv;
}
void Cpu0TargetLowering::Cpu0CC::
analyzeCallOperands(const SmallVectorImpl<ISD::OutputArg> &Args,
bool IsVarArg, bool IsSoftFloat, const SDNode *CallNode,
std::vector<ArgListEntry> &FuncArgs) {
//@analyzeCallOperands body {
assert((CallConv != CallingConv::Fast || !IsVarArg) &&
"CallingConv::Fast shouldn't be used for vararg functions.");
//@3 {
for (unsigned I = 0; I != NumOpnds; ++I) {
//@3 }
MVT ArgVT = Args[I].VT;
ISD::ArgFlagsTy ArgFlags = Args[I].Flags;
bool R;
if (ArgFlags.isByVal()) {
handleByValArg(I, ArgVT, ArgVT, CCValAssign::Full, ArgFlags);
continue;
}
{
MVT RegVT = getRegVT(ArgVT, FuncArgs[Args[I].OrigArgIndex].Ty, CallNode,
IsSoftFloat);
R = FixedFn(I, ArgVT, RegVT, CCValAssign::Full, ArgFlags, CCInfo);
}
if (R) {
#ifndef NDEBUG
dbgs() << "Call operand #" << I << " has unhandled type "
<< EVT(ArgVT).getEVTString();
#endif
llvm_unreachable(nullptr);
}
}
}
Just as when loading incoming arguments from the stack frame, we call CCInfo(CallConv,..., ArgLocs, ...) to get the
outgoing argument information before entering the “for loop”. The “for loop” is almost the same as the one in
LowerFormalArguments(), except that LowerCall() creates a “store DAG vector” instead of a “load DAG vector”. After
the “for loop”, it creates “ld $t9, %call16(_Z5sum_iiiiiii)($gp)” and “jalr $t9” to call the subroutine (the $6 is
$t9) in PIC mode. As with loading incoming arguments, we rely on storeRegToStackSlot(), implemented in an earlier
chapter. DAG.getCALLSEQ_START() and DAG.getCALLSEQ_END() are called before and after the “for loop”, respectively.
They insert CALLSEQ_START and CALLSEQ_END, which are later translated into the pseudo machine instructions
ADJCALLSTACKDOWN and ADJCALLSTACKUP according to the Cpu0InstrInfo.td definition as follows.
lbdex/chapters/Chapter9_2/Cpu0InstrInfo.td
//===----------------------------------------------------------------------===//
// Pseudo instructions
//===----------------------------------------------------------------------===//
//@def CPRESTORE {
// When handling PIC code the assembler needs .cpload and .cprestore
// directives. If the real instructions corresponding these directives
// are used, we have the same behavior, but get also a bunch of warnings
// from the assembler.
let hasSideEffects = 0 in
def CPRESTORE : Cpu0Pseudo<(outs), (ins i32imm:$loc, CPURegs:$gp),
".cprestore\t$loc", []>;
} // let Predicates = [Ch9_2]
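The order in which LowerCall() emits its nodes, as described in the text and in the listing of Cpu0ISelLowering.cpp, can be sketched as follows. This is a simplified model under stated assumptions: buildCallSequence() is a hypothetical function, nodes are plain strings, and all outgoing arguments are assumed to be 32-bit values passed on the stack.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the node names in emission order for a call with numStackArgs
// stack-passed arguments.
std::vector<std::string> buildCallSequence(unsigned numStackArgs, bool isTailCall) {
  std::vector<std::string> nodes;
  if (!isTailCall)
    nodes.push_back("CALLSEQ_START");
  // One store per outgoing argument (the "store DAG vector").
  for (unsigned i = 0; i < numStackArgs; ++i)
    nodes.push_back("store arg" + std::to_string(i) + " -> " +
                    std::to_string(4 * i) + "($sp)");
  // Independent stores are merged into a single TokenFactor node.
  if (numStackArgs != 0)
    nodes.push_back("TokenFactor");
  if (isTailCall) {
    nodes.push_back("TailCall");
    return nodes;
  }
  nodes.push_back("JmpLink");
  nodes.push_back("CALLSEQ_END");
  assert(nodes.front() == "CALLSEQ_START" && nodes.back() == "CALLSEQ_END");
  return nodes;
}
```

In this model a normal call is always bracketed by CALLSEQ_START and CALLSEQ_END, while a tail call emits neither.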
With the definition below, eliminateCallFramePseudoInstr() is called when llvm meets the pseudo instructions
ADJCALLSTACKDOWN and ADJCALLSTACKUP. It just discards these two pseudo instructions, and llvm adds the offset to the
stack.
lbdex/chapters/Chapter9_2/Cpu0InstrInfo.cpp
Cpu0GenInstrInfo(Cpu0::ADJCALLSTACKDOWN, Cpu0::ADJCALLSTACKUP),
lbdex/chapters/Chapter9_2/Cpu0FrameLowering.h
MachineBasicBlock::iterator
eliminateCallFramePseudoInstr(MachineFunction &MF,
MachineBasicBlock &MBB,
MachineBasicBlock::iterator I) const override;
lbdex/chapters/Chapter9_2/Cpu0FrameLowering.cpp
return MBB.erase(I);
}
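The effect of discarding the two pseudo instructions can be pictured with a toy instruction list. The function eliminateCallFramePseudos() below is hypothetical and works on opcode names only; the real hook erases one MachineInstr at a time through MBB.erase(I).

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Removes every call frame pseudo instruction from a block of opcode names.
void eliminateCallFramePseudos(std::vector<std::string> &block) {
  auto isPseudo = [](const std::string &op) {
    return op == "ADJCALLSTACKDOWN" || op == "ADJCALLSTACKUP";
  };
  block.erase(std::remove_if(block.begin(), block.end(), isPseudo), block.end());
  assert(std::none_of(block.begin(), block.end(), isPseudo));
}
```

The remaining instructions keep their original order, so the stores and the call between the two pseudo instructions are left untouched.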
The complete DAGs created for outgoing arguments are shown in “Figure Outgoing arguments DAG (A)...” below for
ch9_outgoing.cpp with cpu032I. LowerCall() (excluding the call to LowerCallResult()) generates the DAG nodes shown
in “Figure Outgoing arguments DAG (B)...” below for ch9_outgoing.cpp with cpu032I. The code corresponding to the
Store and TargetGlobalAddress DAGs is listed in the figure; the other DAGs can be matched to the function
LowerCall() easily. Using the Graphviz tool with the llc option -view-dag-combine1-dags, you can write a small C or
llvm IR input and then check the DAGs to understand the code in LowerCall() and LowerFormalArguments(). For the
sub-sections “variable arguments” and “dynamic stack allocation support” later in this chapter, you can write input
examples that use these features and check the DAGs against these two functions again to make sure you understand
their code. About Graphviz, please refer to the section “Display llvm IR nodes with Graphviz” of chapter 4,
Arithmetic and logic instructions. The DAG diagrams can be produced with the llc option as follows,
lbdex/input/ch9_outgoing.cpp
int call_sum_i() {
return sum_i(1);
}
0dfaf1.dot'... done.
Running 'Graphviz' program...
[Figure: Outgoing arguments DAG (A) for ch9_outgoing.cpp with cpu032I]
[Figure: Outgoing arguments DAG (B) for ch9_outgoing.cpp with cpu032I, annotated with the following code]
// Transform all store nodes into one single node because all store
// nodes are independent of each other.
if (!MemOpChains.empty())
Chain = DAG.getNode(ISD::TokenFactor, DL, MVT::Other, MemOpChains);
...
if (!IsPIC) // static
Callee = DAG.getTargetExternalSymbol(Sym,
getPointerTy(DAG.getDataLayout()),
Cpu0II::MO_NO_FLAG);
...
As mentioned in the last section, the option llc -cpu0-s32-calls=true uses the S32 calling convention, which passes
all arguments on the stack, while the option llc -cpu0-s32-calls=false uses the O32 calling convention, which passes
the first two arguments in registers and the other arguments on the stack. The result is as follows,
ld $3, 48($sp)
ld $4, 44($sp)
ld $5, 40($sp)
ld $t9, 36($sp)
ld $7, 32($sp)
st $7, 28($sp)
st $t9, 24($sp)
st $5, 20($sp)
st $4, 16($sp)
st $3, 12($sp)
lui $3, %got_hi(gI)
addu $3, $3, $gp
st $2, 8($sp)
ld $3, %got_lo(gI)($3)
ld $3, 0($3)
ld $4, 28($sp)
addu $3, $3, $4
ld $4, 24($sp)
addu $3, $3, $4
ld $4, 20($sp)
addu $3, $3, $4
ld $4, 16($sp)
addu $3, $3, $4
ld $4, 12($sp)
addu $3, $3, $4
addu $2, $3, $2
st $2, 4($sp)
addiu $sp, $sp, 32
ret $lr
nop
.set macro
.set reorder
.end _Z5sum_iiiiiii
$tmp0:
.size _Z5sum_iiiiiii, ($tmp0)-_Z5sum_iiiiiii
.globl main
.align 2
.type main,@function
.ent main # @main
main:
.frame $fp,40,$lr
.mask 0x00004000,-4
.set noreorder
.cpload $t9
.set nomacro
# BB#0:
addiu $sp, $sp, -40
st $lr, 36($sp) # 4-byte Folded Spill
addiu $2, $zero, 0
st $2, 32($sp)
addiu $2, $zero, 6
st $2, 20($sp)
addiu $2, $zero, 5
st $2, 16($sp)
addiu $2, $zero, 4
st $2, 12($sp)
addiu $2, $zero, 3
st $2, 8($sp)
addiu $2, $zero, 2
st $2, 4($sp)
addiu $2, $zero, 1
st $2, 0($sp)
ld $t9, %call16(_Z5sum_iiiiiii)($gp)
jalr $t9
nop
st $2, 28($sp)
ld $lr, 36($sp) # 4-byte Folded Reload
addiu $sp, $sp, 40
ret $lr
nop
.set macro
.set reorder
.end main
$tmp1:
.size main, ($tmp1)-main
nop
st $2, 28($sp)
ld $lr, 36($sp) # 4-byte Folded Reload
addiu $sp, $sp, 40
ret $lr
nop
.set macro
.set reorder
.end main
The last section mentioned the “JSUB texternalsym” pattern. Run Chapter9_2 with ch9_1_2.cpp to get the result below.
For a long string, llvm calls memcpy() to initialize the string (char str[81] = “Hello world” in this case). For a
short string, the “call memcpy” is translated into “store with constant” during the optimization stages.
lbdex/input/ch9_1_2.cpp
int main()
{
char str[81] = "Hello world";
char s[6] = "Hello";
return 0;
}
ret i32 0
}
\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000
\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000
\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
.size $_ZZ4mainE3str, 81
The “call memcpy” for the short string is optimized by llvm before the “DAG->DAG Pattern Instruction Selection”
stage, which translates it into “store with constant” as follows,
JonathantekiiMac:input Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build
/Debug/bin/llc -march=cpu0 -mcpu=cpu032II -cpu0-s32-calls=true
-relocation-model=static -filetype=asm ch9_1_2.bc -debug -o -
...
The incoming arguments are the formal arguments defined in compiler and programming language books. The outgoing
arguments are the actual arguments. They are summarized in Table: Callee incoming arguments and caller outgoing
arguments.
The following code in Chapter9_1/ and Chapter3_4/ supports the ordinary structure type in function calls.
lbdex/chapters/Chapter9_1/Cpu0ISelLowering.cpp
Reg = MF.getRegInfo().createVirtualRegister(
getRegClassFor(MVT::i32));
Cpu0FI->setSRetReturnReg(Reg);
}
SDValue Copy = DAG.getCopyToReg(DAG.getEntryNode(), DL, Reg, InVals[i]);
Chain = DAG.getNode(ISD::TokenFactor, DL, MVT::Other, Copy, Chain);
break;
}
}
SDValue
Cpu0TargetLowering::LowerReturn(SDValue Chain,
CallingConv::ID CallConv, bool IsVarArg,
const SmallVectorImpl<ISD::OutputArg> &Outs,
const SmallVectorImpl<SDValue> &OutVals,
const SDLoc &DL, SelectionDAG &DAG) const {
// The cpu0 ABIs for returning structs by value requires that we copy
// the sret argument into $v0 for the return. We saved the argument into
// a virtual register in the entry block, so now we copy the value out
// and into $v0.
if (MF.getFunction()->hasStructRetAttr()) {
Cpu0FunctionInfo *Cpu0FI = MF.getInfo<Cpu0FunctionInfo>();
unsigned Reg = Cpu0FI->getSRetReturnReg();
if (!Reg)
llvm_unreachable("sret virtual register not created in the entry block");
SDValue Val =
DAG.getCopyFromReg(Chain, DL, Reg, getPointerTy(DAG.getDataLayout()));
unsigned V0 = Cpu0::V0;
In addition to the above code, we defined the calling convention in an earlier chapter as follows,
lbdex/chapters/Chapter3_4/Cpu0CallingConv.td
It means that the return value is kept in the registers V0, V1, A0 and A1 if its size does not exceed 4 registers.
If it exceeds 4 registers, cpu0 saves it in memory and passes a pointer to that memory in a register. To illustrate,
let’s run Chapter9_2/ with ch9_1_struct.cpp and explain with this example.
lbdex/input/ch9_1_struct.cpp
struct Date
{
int year;
int month;
int day;
int hour;
int minute;
int second;
};
static Date gDate = {2012, 10, 12, 1, 2, 3};
struct Time
{
int hour;
int minute;
int second;
};
static Time gTime = {2, 20, 30};
int test_func_arg_struct()
{
Time time1 = {1, 10, 12};
Date date1 = getDate();
Date date2 = copyDate(date1);
Date date3 = copyDate(&date1);
Time time2 = copyTime(time1);
Time time3 = copyTime(&time1);
if (!(date1.year == 2012 && date1.month == 10 && date1.day == 12 && date1.hour
== 1 && date1.minute == 2 && date1.second == 3))
return 1;
if (!(date2.year == 2012 && date2.month == 10 && date2.day == 12 && date2.hour
== 1 && date2.minute == 2 && date2.second == 3))
return 1;
if (!(time2.hour == 1 && time2.minute == 10 && time2.second == 12))
return 1;
if (!(time3.hour == 1 && time3.minute == 10 && time3.second == 12))
return 1;
#ifdef PRINT_TEST
printf("date1 = %d %d %d %d %d %d", date1.year, date1.month, date1.day,
date1.hour, date1.minute, date1.second); // date1 = 2012 10 12 1 2 3
if (date1.year == 2012 && date1.month == 10 && date1.day == 12 && date1.hour
== 1 && date1.minute == 2 && date1.second == 3)
printf(", PASS\n");
else
printf(", FAIL\n");
printf("date2 = %d %d %d %d %d %d", date2.year, date2.month, date2.day,
date2.hour, date2.minute, date2.second); // date2 = 2012 10 12 1 2 3
if (date2.year == 2012 && date2.month == 10 && date2.day == 12 && date2.hour
== 1 && date2.minute == 2 && date2.second == 3)
printf(", PASS\n");
else
printf(", FAIL\n");
// time2 = 1 10 12
printf("time2 = %d %d %d", time2.hour, time2.minute, time2.second);
if (time2.hour == 1 && time2.minute == 10 && time2.second == 12)
printf(", PASS\n");
else
printf(", FAIL\n");
// time3 = 1 10 12
printf("time3 = %d %d %d", time3.hour, time3.minute, time3.second);
if (time3.hour == 1 && time3.minute == 10 && time3.second == 12)
printf(", PASS\n");
else
printf(", FAIL\n");
#endif
return 0;
}
.set nomacro
# BB#0:
lui $2, %got_hi(gDate)
addu $2, $2, $gp
ld $3, %got_lo(gDate)($2)
ld $2, 0($sp)
ld $4, 20($3) // save gDate contents to 212..192($sp)
st $4, 20($2)
ld $4, 16($3)
st $4, 16($2)
ld $4, 12($3)
st $4, 12($2)
ld $4, 8($3)
st $4, 8($2)
ld $4, 4($3)
st $4, 4($2)
ld $3, 0($3)
st $3, 0($2)
ret $lr
nop
.set macro
.set reorder
.end _Z7getDatev
$tmp0:
.size _Z7getDatev, ($tmp0)-_Z7getDatev
.cfi_endproc
...
.globl _Z20test_func_arg_structv
.align 2
.type _Z20test_func_arg_structv,@function
.ent _Z20test_func_arg_structv # @main
_Z20test_func_arg_structv:
.cfi_startproc
.frame $sp,248,$lr
.mask 0x00004180,-4
.set noreorder
.cpload $t9
.set nomacro
# BB#0:
addiu $sp, $sp, -200
st $lr, 196($sp) # 4-byte Folded Spill
st $8, 192($sp) # 4-byte Folded Spill
ld $2, %got($_ZZ20test_func_arg_structvE5time1)($gp)
ori $2, $2, %lo($_ZZ20test_func_arg_structvE5time1)
ld $3, 8($2)
st $3, 184($sp)
ld $3, 4($2)
st $3, 180($sp)
ld $2, 0($2)
st $2, 176($sp)
addiu $8, $sp, 152
st $8, 0($sp)
ld $t9, %call16(_Z7getDatev)($gp) // copy gDate contents to date1, 176..152($sp)
jalr $t9
nop
ld $gp, 176($sp)
ld $2, 172($sp)
st $2, 124($sp)
ld $2, 168($sp)
st $2, 120($sp)
ld $2, 164($sp)
st $2, 116($sp)
ld $2, 160($sp)
st $2, 112($sp)
ld $2, 156($sp)
st $2, 108($sp)
ld $2, 152($sp)
st $2, 104($sp)
...
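The assembly above passes the address of the result area at 0($sp) and getDate() copies gDate through that pointer, leaving the pointer in $2. The same idea can be written by hand in C++ as a rough sketch. The function names getDateSRet() and callerYear() are hypothetical, and the code only models the hidden pointer argument; it does not claim to be what the frontend literally emits.

```cpp
#include <cassert>

struct Date {
  int year, month, day, hour, minute, second;
};

static Date gDate = {2012, 10, 12, 1, 2, 3};

// Hand-written equivalent of "Date getDate()" with a struct-return pointer:
// the caller supplies the address of the result slot as a hidden argument,
// the callee fills the slot and returns the same address.
Date *getDateSRet(Date *result) {
  assert(result != nullptr);
  *result = gDate;  // field-by-field copy, like the ld/st pairs above
  return result;    // the pointer is what ends up in the return register
}

// Caller side: reserve the result slot in its own frame and pass its address.
int callerYear() {
  Date date1;
  Date *ret = getDateSRet(&date1);
  return ret->year;
}
```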
ch9_1_constructor.cpp includes an implementation of the C++ class “Date”. It can be translated by the cpu0 backend
too, since the frontend (clang in this example) translates it into C language form. If you comment out the “if
hasStructRetAttr()” part in both of the above functions, the cpu0 code generated for ch9_1_struct.cpp will use $3
instead of $2 as the return register, as follows,
.text
.section .mdebug.abiS32
.previous
.file "ch9_1_struct.bc"
.globl _Z7getDatev
.align 2
.type _Z7getDatev,@function
.ent _Z7getDatev # @_Z7getDatev
_Z7getDatev:
.frame $fp,0,$lr
.mask 0x00000000,0
.set noreorder
.cpload $t9
.set nomacro
# BB#0:
lui $2, %got_hi(gDate)
addu $2, $2, $gp
ld $2, %got_lo(gDate)($2)
ld $3, 0($sp)
ld $4, 20($2)
st $4, 20($3)
ld $4, 16($2)
st $4, 16($3)
ld $4, 12($2)
st $4, 12($3)
ld $4, 8($2)
st $4, 8($3)
ld $4, 4($2)
st $4, 4($3)
ld $2, 0($2)
st $2, 0($3)
ret $lr
nop
...
The following code in Chapter9_1/ and Chapter9_2/ supports the byval structure type in function calls.
lbdex/chapters/Chapter9_1/Cpu0ISelLowering.cpp
void Cpu0TargetLowering::
copyByValRegs(SDValue Chain, const SDLoc &DL, std::vector<SDValue> &OutChains,
SelectionDAG &DAG, const ISD::ArgFlagsTy &Flags,
SmallVectorImpl<SDValue> &InVals, const Argument *FuncArg,
const Cpu0CC &CC, const ByValArgInfo &ByVal) const {
MachineFunction &MF = DAG.getMachineFunction();
MachineFrameInfo *MFI = MF.getFrameInfo();
unsigned RegAreaSize = ByVal.NumRegs * CC.regSize();
unsigned FrameObjSize = std::max(Flags.getByValSize(), RegAreaSize);
int FrameObjOffset;
if (RegAreaSize)
FrameObjOffset = (int)CC.reservedArgArea() -
(int)((CC.numIntArgRegs() - ByVal.FirstIdx) * CC.regSize());
else
FrameObjOffset = ByVal.Address;
if (!ByVal.NumRegs)
return;
if (Flags.isByVal()) {
assert(Flags.getByValSize() &&
"ByVal args of size 0 should have been ignored by front-end.");
assert(ByValArg != Cpu0CCInfo.byval_end());
copyByValRegs(Chain, DL, OutChains, DAG, Flags, InVals, &*FuncArg,
Cpu0CCInfo, *ByValArg);
++ByValArg;
continue;
}
...
}
...
}
lbdex/chapters/Chapter9_2/Cpu0ISelLowering.cpp
if (ByVal.NumRegs) {
const ArrayRef<MCPhysReg> ArgRegs = CC.intArgRegs();
bool LeftoverBytes = (ByVal.NumRegs * RegSizeInBytes > ByValSizeInBytes);
unsigned I = 0;
// Copy the remainder of the byval argument with sub-word loads and shifts.
if (LeftoverBytes) {
assert((ByValSizeInBytes > OffsetInBytes) &&
(ByValSizeInBytes < OffsetInBytes + RegSizeInBytes) &&
"Size of the remainder should be smaller than RegSizeInBytes.");
SDValue Val;
// Load subword.
SDValue LoadPtr = DAG.getNode(ISD::ADD, DL, PtrTy, Arg,
DAG.getConstant(OffsetInBytes, DL, PtrTy));
SDValue LoadVal = DAG.getExtLoad(
ISD::ZEXTLOAD, DL, RegTy, Chain, LoadPtr, MachinePointerInfo(),
MVT::getIntegerVT(LoadSizeInBytes * 8), Alignment);
MemOpChains.push_back(LoadVal.getValue(1));
if (isLittle)
Shamt = TotalBytesLoaded * 8;
else
Shamt = (RegSizeInBytes - (TotalBytesLoaded + LoadSizeInBytes)) * 8;
if (Val.getNode())
OffsetInBytes += LoadSizeInBytes;
TotalBytesLoaded += LoadSizeInBytes;
Alignment = std::min(Alignment, LoadSizeInBytes);
}
if (Flags.isByVal()) {
assert(Flags.getByValSize() &&
"ByVal args of size 0 should have been ignored by front-end.");
assert(ByValArg != Cpu0CCInfo.byval_end());
assert(!IsTailCall &&
"Do not tail-call optimize if there is a byval argument.");
passByValArg(Chain, DL, RegsToPass, MemOpChains, StackPtr, MFI, DAG, Arg,
Cpu0CCInfo, *ByValArg, Flags, Subtarget.isLittle());
++ByValArg;
continue;
}
...
}
...
}
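The shift amount computed for the leftover bytes above can be illustrated outside of LLVM. The following is a minimal sketch, not the backend code: packLeftoverBytes() is a hypothetical helper, the 4-byte register size is an assumption, and the byte vector stands in for memory. It applies the same rule as the listing: on a little endian target the shift is TotalBytesLoaded * 8, otherwise it is (RegSizeInBytes - (TotalBytesLoaded + LoadSizeInBytes)) * 8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the Cpu0 backend: packs the last 1..3
// bytes of a byval object into one 32-bit register value, using sub-word
// loads (halfword first, then byte) and the shift rule from the listing.
std::uint32_t packLeftoverBytes(const std::vector<std::uint8_t> &bytes,
                                bool isLittle) {
  const unsigned RegSizeInBytes = 4;  // assumption: 32-bit registers
  const unsigned size = static_cast<unsigned>(bytes.size());
  assert(size > 0 && size < RegSizeInBytes);

  std::uint32_t val = 0;
  unsigned totalBytesLoaded = 0;
  unsigned loadSize = RegSizeInBytes / 2;
  while (totalBytesLoaded < size) {
    if (size - totalBytesLoaded < loadSize) {
      loadSize /= 2;  // remaining bytes do not fill this load size
      continue;
    }
    // Zero-extending sub-word load in the byte order of the target.
    std::uint32_t loaded = 0;
    for (unsigned i = 0; i < loadSize; ++i) {
      std::uint32_t b = bytes[totalBytesLoaded + i];
      unsigned pos = isLittle ? i : (loadSize - 1 - i);
      loaded |= b << (pos * 8);
    }
    unsigned shamt =
        isLittle ? totalBytesLoaded * 8
                 : (RegSizeInBytes - (totalBytesLoaded + loadSize)) * 8;
    val |= loaded << shamt;
    totalBytesLoaded += loadSize;
  }
  return val;
}
```

For the three bytes 0xAA, 0xBB, 0xCC this gives 0xAABBCC00 on a big endian target and 0x00CCBBAA on a little endian target, so the bytes keep their memory order inside the register.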
In LowerCall(), Flags.isByVal() is true when the caller passes a struct argument marked byval, as follows,
lbdex/input/tailcall.ll
In LowerFormalArguments(), Flags.isByVal() is true when the callee receives a byval argument, as follows,
lbdex/input/tailcall.ll
At this point, I do not know how to make clang generate byval IR from C code.
Tail call optimization applies to some function calls. In these cases the caller and the callee can share the same stack memory. When this applies to a recursive function call, it often asymptotically reduces stack space requirements from linear, or O(n), to constant, or O(1) 5 . LLVM IR supports tailcall here 6 .
The tailcall code in Cpu0ISelLowering.cpp and Cpu0InstrInfo.td implements tail call optimization.
lbdex/input/ch9_2_tailcall.cpp
int factorial(int x)
{
if (x > 0)
return x*factorial(x-1);
else
return 1;
}
int test_tailcall(int a)
{
return factorial(a);
}
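Note that factorial() above multiplies after the recursive call returns, so the call is not in tail position as written. The following sketch, which is only an illustration and not part of lbdex, shows an accumulator form where the recursive call is the last action, together with the loop a compiler may reduce it to. The names factorial_acc(), factorial_tail() and factorial_loop() are hypothetical.

```cpp
#include <cassert>

// Accumulator form: the recursive call is the last action of the function,
// so the callee can reuse the caller's stack frame.
int factorial_acc(int x, int acc) {
  if (x > 0)
    return factorial_acc(x - 1, acc * x);
  return acc;
}

// Entry point with the same interface as factorial() in the listing.
int factorial_tail(int x) {
  assert(x <= 12 && "13! does not fit in a 32-bit int");
  return factorial_acc(x, 1);
}

// The loop form: one frame, O(1) stack space.
int factorial_loop(int x) {
  int acc = 1;
  while (x > 0) {
    acc *= x;
    --x;
  }
  return acc;
}
```

Both forms return the same value for the same input; only the stack usage differs when the tail call is not optimized.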
$BB0_2: # %tailrecurse._crit_edge
ret $lr
nop
.set macro
.set reorder
.end _Z9factoriali
$tmp0:
.size _Z9factoriali, ($tmp0)-_Z9factoriali
.globl _Z13test_tailcalli
.align 2
.type _Z13test_tailcalli,@function
.ent _Z13test_tailcalli # @_Z13test_tailcalli
_Z13test_tailcalli:
.frame $sp,0,$lr
.mask 0x00000000,0
.set noreorder
.set nomacro
# BB#0:
jmp _Z9factoriali
nop
.set macro
.set reorder
.end _Z13test_tailcalli
$tmp1:
.size _Z13test_tailcalli, ($tmp1)-_Z13test_tailcalli
===-------------------------------------------------------------------------===
... Statistics Collected ...
===-------------------------------------------------------------------------===
...
1 cpu0-lower - Number of tail calls
...
Tail call optimization lets the caller and the callee share one stack frame. In this example it is applied for cpu032II only (it uses “jmp _Z9factoriali” instead of “jsub _Z9factoriali”). cpu032I (which passes all arguments on the stack) does not satisfy the condition NextStackOffset <= FI.getIncomingArgSize() in isEligibleForTailCallOptimization(), so the function returns false, as follows,
lbdex/chapters/Chapter9_2/Cpu0SEISelLowering.cpp
bool Cpu0SETargetLowering::
isEligibleForTailCallOptimization(const Cpu0CC &Cpu0CCInfo,
unsigned NextStackOffset,
const Cpu0FunctionInfo& FI) const {
if (!EnableCpu0TailCalls)
return false;
lbdex/chapters/Chapter9_2/Cpu0ISelLowering.cpp
if (IsTailCall)
++NumTailCalls;
if (!IsTailCall)
Chain = DAG.getCALLSEQ_START(Chain, NextStackOffsetVal, DL);
if (IsTailCall)
return DAG.getNode(Cpu0ISD::TailCall, DL, MVT::Other, Ops);
...
}
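The eligibility test discussed above can be sketched as a plain predicate. This is a simplified model under stated assumptions, not the body of isEligibleForTailCallOptimization(): the struct CallSiteInfo and the function isEligibleSketch() are hypothetical, and the fields only mirror the names that appear in the text (EnableCpu0TailCalls, NextStackOffset, getIncomingArgSize()) plus the byval restriction stated in the LowerCall() assertion.

```cpp
#include <cassert>

// Hypothetical stand-in for the values the backend consults at a call site.
struct CallSiteInfo {
  bool tailCallsEnabled;     // models the EnableCpu0TailCalls option
  bool hasByValArg;          // a byval argument forbids the tail call
  unsigned nextStackOffset;  // stack bytes needed by the outgoing arguments
  unsigned incomingArgSize;  // stack bytes the caller itself received
};

// Returns true when the callee's outgoing arguments fit in the stack area
// the caller already owns, so the frame can be shared.
bool isEligibleSketch(const CallSiteInfo &c) {
  if (!c.tailCallsEnabled)
    return false;
  if (c.hasByValArg)
    return false;
  return c.nextStackOffset <= c.incomingArgSize;
}
```

The numbers passed to such a predicate depend on the ABI. The sketch only shows that once the outgoing arguments need more stack than the caller received, the call falls back to jsub.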
Since tail call optimization emits a jmp instruction directly instead of jsub, the callseq_start, callseq_end, and the DAG nodes created in LowerCallResult() and LowerReturn() are not needed. It creates the DAGs for ch9_2_tailcall.cpp shown in the following figure,
[Figure: DAGs created for ch9_2_tailcall.cpp with tail call optimization]
Finally, the DAG translation of the tail call is listed in the following table.
lbdex/chapters/Chapter9_1/Cpu0InstrInfo.td
// Tail call
def Cpu0TailCall : SDNode<"Cpu0ISD::TailCall", SDT_Cpu0JmpLink,
[SDNPHasChain, SDNPOptInGlue, SDNPVariadic]>;
lbdex/chapters/Chapter9_1/Cpu0InstrInfo.td
lbdex/chapters/Chapter9_1/Cpu0AsmPrinter.h
// tblgen'erated function.
bool emitPseudoExpansionLowering(MCStreamer &OutStreamer,
const MachineInstr *MI);
lbdex/chapters/Chapter9_1/Cpu0AsmPrinter.cpp
PrintDebugValueComment(MI, OS);
return;
}
do {
// Do any auto-generated pseudo lowerings.
if (emitPseudoExpansionLowering(*OutStreamer, &*I))
continue;
MCInst TmpInst0;
MCInstLowering.Lower(&*I, TmpInst0);
OutStreamer->EmitInstruction(TmpInst0, getSubtargetInfo());
} while ((++I != E) && I->isInsideBundle()); // Delay slot check
}
As described in the last section, cpu032I cannot apply tail call optimization to ch9_2_tailcall.cpp because the argument size limit is not satisfied. When compiled with the clang -O3 option, it can get the same or better performance than tail call, as follows,
JonathantekiiMac:input Jonathan$ clang -O1 -target mips-unknown-linux-gnu -c
ch9_2_tailcall.cpp -emit-llvm -o ch9_2_tailcall.bc
JonathantekiiMac:input Jonathan$ ~/llvm/test/cmake_debug_build/bin/
llvm-dis ch9_2_tailcall.bc -o -
...
; Function Attrs: nounwind readnone
define i32 @_Z9factoriali(i32 %x) #0 {
%1 = icmp sgt i32 %x, 0
br i1 %1, label %tailrecurse.preheader, label %tailrecurse._crit_edge
tailrecurse.preheader: ; preds = %0
br label %tailrecurse
tailrecurse.i.preheader: ; preds = %0
br label %tailrecurse.i
.end _Z9factoriali
$tmp0:
.size _Z9factoriali, ($tmp0)-_Z9factoriali
.globl _Z13test_tailcalli
.align 2
.type _Z13test_tailcalli,@function
.ent _Z13test_tailcalli # @_Z13test_tailcalli
_Z13test_tailcalli:
.frame $sp,0,$lr
.mask 0x00000000,0
.set noreorder
.set nomacro
# BB#0:
addiu $2, $zero, 1
ld $3, 0($sp)
cmp $sw, $3, $2
jlt $sw, $BB1_2
nop
$BB1_1: # %tailrecurse.i
# =>This Inner Loop Header: Depth=1
mul $2, $2, $3
addiu $3, $3, -1
addiu $4, $zero, 0
cmp $sw, $3, $4
jgt $sw, $BB1_1
nop
$BB1_2: # %_Z9factoriali.exit
ret $lr
nop
.set macro
.set reorder
.end _Z13test_tailcalli
$tmp1:
.size _Z13test_tailcalli, ($tmp1)-_Z13test_tailcalli
According to the above llvm IR, the clang -O3 option replaces the recursion with a loop by inlining the recursive callee. This is a frontend optimization based on cross-function analysis.
Cpu0 doesn’t support fastcc 7 but it accepts the fastcc keyword in IR. Mips supports fastcc by using as many registers as possible without following the ABI specification.
This section adds support for “the $gp register as a caller saved register in PIC addressing mode”, “variable number of arguments” and “dynamic stack allocation”.
Run Chapter9_2/ with ch9_3_vararg.cpp to get the following error,
lbdex/input/ch9_3_vararg.cpp
#include <stdarg.h>
7 http://llvm.org/docs/LangRef.html#calling-conventions
va_list vl;
va_start(vl, amount);
for (i = 0; i < amount; i++)
{
val = va_arg(vl, int);
sum += val;
}
va_end(vl);
return sum;
}
int test_vararg()
{
int a = sum_i(6, 0, 1, 2, 3, 4, 5);
return a;
}
lbdex/input/ch9_3_alloc.cpp
//#include <alloca.h>
//#include <stdlib.h>
int sum(int x1, int x2, int x3, int x4, int x5, int x6)
{
int sum = x1 + x2 + x3 + x4 + x5 + x6;
return sum;
}
int weight_sum(int x1, int x2, int x3, int x4, int x5, int x6)
{
// int *b = (int*)alloca(sizeof(int) * 1 * x1);
int test_alloc()
{
int a = weight_sum(1, 2, 3, 4, 5, 6); // 31
return a;
}
9.6.1 The $gp register caller saved register in PIC addressing mode
According to the original cpu0 web site, cpu0 only supports “jsub” with a 24-bit address range. We add “jalr” to cpu0 and extend it to 32-bit addresses. We make this change for two reasons. One is that cpu0 can be extended to a 32-bit address space by adding only this instruction, and the other is that cpu0 and this book are designed for tutorial purposes.
We reserve “jalr” for PIC mode dynamic linking functions to demonstrate:
1. How the caller handles the caller saved register $gp when calling a function.
2. How the code in a shared library function uses $gp to access global variable addresses.
3. That jalr for dynamic linking functions is easier to implement and faster, as depicted in section “pic mode” of chapter “Global variables, structs and arrays, other type”. This solution is popular in practice and justifies changing the official cpu0 design for a compiler book.
In chapter “Global variable”, we mentioned two link types, static link and dynamic link. The option -relocation-model=static is for statically linked functions while the option -relocation-model=pic is for dynamically linked functions. One use of dynamically linked functions is calling functions in a shared library. A shared library contains many dynamically linked functions that are usually loaded at run time. Since a shared library can be loaded at different memory addresses, the address of a global variable it accesses cannot be decided at link time. However, the distance between the global variable address and the start address of the shared library function can be calculated once the library has been loaded.
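The point of the paragraph above is that absolute addresses change with the load address while the distance between a function and a global variable does not. The following sketch illustrates this with plain arithmetic. LoadedImage and its fields are hypothetical names used only for this illustration, and the offsets are made-up values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a shared library image after it has been loaded.
struct LoadedImage {
  std::uint32_t base;          // load address, chosen at run time
  std::uint32_t funcOffset;    // offset of a function inside the image
  std::uint32_t globalOffset;  // offset of a global variable inside the image
};

// Absolute address of the function: depends on the load address.
std::uint32_t funcAddress(const LoadedImage &img) {
  return img.base + img.funcOffset;
}

// Absolute address of the global variable: depends on the load address.
std::uint32_t globalAddress(const LoadedImage &img) {
  return img.base + img.globalOffset;
}

// Distance from the function start to the global variable: the load
// address cancels out, so this value is the same for every load address.
std::int64_t distanceToGlobal(const LoadedImage &img) {
  return static_cast<std::int64_t>(globalAddress(img)) -
         static_cast<std::int64_t>(funcAddress(img));
}
```

This is why PIC code reaches global variables through a base register such as $gp plus a fixed offset instead of an absolute address.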
Let’s run Chapter9_3/ with ch9_gprestore.cpp to get the following result. We put comments in the result for explanation.
lbdex/input/ch9_gprestore.cpp
int call_sum_i() {
int a = sum_i(1);
a += sum_i(2);
return a;
}
As the above code comments say, “.cprestore 8” is a pseudo instruction that saves $gp to 8($sp), while the instruction “ld $gp, 8($sp)” restores $gp; refer to Table 8-1 of “MIPSpro TM Assembly Language Programmer’s Guide” 2 . In other words, $gp is a caller saved register, so main() needs to save/restore $gp before/after calling the shared library function _Z5sum_ii(). LLVM Mips 3.5 removed .cprestore in PIC mode, which means $gp is no longer a caller saved register in PIC mode. However, it still exists in Cpu0, and this feature can be removed by not defining it in Cpu0Config.h. The #ifdef ENABLE_GPRESTORE part of the Cpu0 code can be removed, but this comes at the cost of reserving the $gp register as a dedicated register that cannot be allocated to program variables in PIC mode. As explained in the earlier chapter Global variable, PIC is not a critical function and the performance advantage can be ignored in dynamic linking, so we keep this feature in Cpu0. Reserving $gp as a dedicated register in PIC mode saves a lot of code in programming. When $gp is reserved, .cprestore can be disabled by the option “-cpu0-reserve-gp”. The .cpload is needed even when $gp is reserved (considering that programmers may implement a boot code function in mixed C and assembly, a programmer can set the $gp value by having .cpload issued).
If “-cpu0-no-cpload” is enabled, and either ENABLE_GPRESTORE is undefined or “-cpu0-reserve-gp” is enabled, .cpload and the $gp save/restore won’t be issued, as follows,
LLVM Mips 3.1 issues .cpload and .cprestore, and Cpu0 borrows them from that version. Now, however, llvm Mips replaces .cpload with real instructions and removes .cprestore. It treats $gp as a reserved register in PIC mode. Since the Mips assembly document I reference says $gp is a “caller save register”, Cpu0 follows that document on this point and provides reserving the $gp register as an option.
# BB#0: # %entry
lui $2, %hi(_gp_disp)
ori $2, $2, %lo(_gp_disp)
addiu $sp, $sp, -32
$tmp0:
.cfi_def_cfa_offset 32
sw $ra, 28($sp) # 4-byte Folded Spill
sw $fp, 24($sp) # 4-byte Folded Spill
sw $16, 20($sp) # 4-byte Folded Spill
$tmp1:
.cfi_offset 31, -4
$tmp2:
.cfi_offset 30, -8
$tmp3:
.cfi_offset 16, -12
move $fp, $sp
$tmp4:
.cfi_def_cfa_register 30
addu $16, $2, $25
lw $25, %call16(_Z5sum_ii)($16)
addiu $4, $zero, 1
jalr $25
move $gp, $16
sw $2, 16($fp)
lw $25, %call16(_Z5sum_ii)($16)
jalr $25
addiu $4, $zero, 2
lw $1, 16($fp)
addu $2, $1, $2
sw $2, 16($fp)
move $sp, $fp
lw $16, 20($sp) # 4-byte Folded Reload
lw $fp, 24($sp) # 4-byte Folded Reload
lw $ra, 28($sp) # 4-byte Folded Reload
jr $ra
addiu $sp, $sp, 32
The following code added in Chapter9_3/ issues “.cprestore” or the corresponding machine code before the first PIC function call.
lbdex/chapters/Chapter9_3/Cpu0ISelLowering.cpp
#ifdef ENABLE_GPRESTORE
if (!Cpu0ReserveGP) {
// If this is the first call, create a stack frame object that points to
// a location to which .cprestore saves $gp.
if (IsPIC && Cpu0FI->globalBaseRegFixed() && !Cpu0FI->getGPFI())
Cpu0FI->setGPFI(MFI->CreateFixedObject(4, 0, true));
if (Cpu0FI->needGPSaveRestore())
MFI->setObjectOffset(Cpu0FI->getGPFI(), NextStackOffset);
}
#endif
...
}
lbdex/chapters/Chapter9_3/Cpu0MachineFunction.h
#ifdef ENABLE_GPRESTORE
bool needGPSaveRestore() const { return getGPFI(); }
#endif
lbdex/chapters/Chapter9_3/Cpu0SEFrameLowering.cpp
#ifdef ENABLE_GPRESTORE
// Restore GP from the saved stack location
if (Cpu0FI->needGPSaveRestore()) {
unsigned Offset = MFI->getObjectOffset(Cpu0FI->getGPFI());
BuildMI(MBB, MBBI, dl, TII.get(Cpu0::CPRESTORE)).addImm(Offset)
.addReg(Cpu0::GP);
}
#endif
lbdex/chapters/Chapter9_3/Cpu0RegisterInfo.cpp
...
}
lbdex/chapters/Chapter9_3/Cpu0InstrInfo.td
// When handling PIC code the assembler needs .cpload and .cprestore
// directives. If the real instructions corresponding these directives
// are used, we have the same behavior, but get also a bunch of warnings
// from the assembler.
let hasSideEffects = 0 in
def CPRESTORE : Cpu0Pseudo<(outs), (ins i32imm:$loc, CPURegs:$gp),
".cprestore\t$loc", []>;
lbdex/chapters/Chapter9_3/Cpu0AsmPrinter.cpp
#ifdef ENABLE_GPRESTORE
void Cpu0AsmPrinter::EmitInstrWithMacroNoAT(const MachineInstr *MI) {
MCInst TmpInst;
MCInstLowering.Lower(MI, TmpInst);
OutStreamer->EmitRawText(StringRef("\t.set\tmacro"));
if (Cpu0FI->getEmitNOAT())
OutStreamer->EmitRawText(StringRef("\t.set\tat"));
OutStreamer->EmitInstruction(TmpInst, getSubtargetInfo());
if (Cpu0FI->getEmitNOAT())
OutStreamer->EmitRawText(StringRef("\t.set\tnoat"));
OutStreamer->EmitRawText(StringRef("\t.set\tnomacro"));
}
#endif
#ifdef ENABLE_GPRESTORE
void Cpu0AsmPrinter::emitPseudoCPRestore(MCStreamer &OutStreamer,
const MachineInstr *MI) {
unsigned Opc = MI->getOpcode();
SmallVector<MCInst, 4> MCInsts;
const MachineOperand &MO = MI->getOperand(0);
assert(MO.isImm() && "CPRESTORE's operand must be an immediate.");
int64_t Offset = MO.getImm();
if (OutStreamer.hasRawTextSupport()) {
// output assembly
if (!isInt<16>(Offset)) {
EmitInstrWithMacroNoAT(MI);
return;
}
MCInst TmpInst0;
MCInstLowering.Lower(MI, TmpInst0);
OutStreamer.EmitInstruction(TmpInst0, getSubtargetInfo());
} else {
// output elf
MCInstLowering.LowerCPRESTORE(Offset, MCInsts);
return;
}
}
#endif
#ifdef ENABLE_GPRESTORE
if (I->getOpcode() == Cpu0::CPRESTORE) {
emitPseudoCPRestore(*OutStreamer, &*I);
continue;
}
#endif
...
}
lbdex/chapters/Chapter9_3/Cpu0MCInstLower.h
#ifdef ENABLE_GPRESTORE
void LowerCPRESTORE(int64_t Offset, SmallVector<MCInst, 4>& MCInsts);
#endif
lbdex/chapters/Chapter9_3/Cpu0MCInstLower.cpp
#ifdef ENABLE_GPRESTORE
// Lower ".cprestore offset" to "st $gp, offset($sp)".
void Cpu0MCInstLower::LowerCPRESTORE(int64_t Offset,
SmallVector<MCInst, 4>& MCInsts) {
assert(isInt<32>(Offset) && (Offset >= 0) &&
"Imm operand of .cprestore must be a non-negative 32-bit value.");
if (!isInt<16>(Offset)) {
unsigned Hi = ((Offset + 0x8000) >> 16) & 0xffff;
Offset &= 0xffff;
MCOperand ATReg = MCOperand::createReg(Cpu0::AT);
BaseReg = ATReg;
// lui at,hi
// add at,at,sp
MCInsts.resize(2);
CreateMCInst(MCInsts[0], Cpu0::LUi, ATReg, ZEROReg, MCOperand::createImm(Hi));
CreateMCInst(MCInsts[1], Cpu0::ADD, ATReg, ATReg, SPReg);
}
MCInst St;
CreateMCInst(St, Cpu0::ST, GPReg, BaseReg, MCOperand::createImm(Offset));
MCInsts.push_back(St);
}
#endif
The code added to Cpu0AsmPrinter.cpp above calls LowerCPRESTORE() when the user runs llc -filetype=obj . The code added to Cpu0MCInstLower.cpp above handles the machine instructions for .cprestore.
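LowerCPRESTORE() splits an offset that does not fit in 16 bits into a high part for lui and a low part for the st immediate. The following sketch replays that arithmetic on its own. It assumes, as is usual for this kind of split, that the 16-bit store immediate is sign-extended, which is why 0x8000 is added before taking the high half. HiLo, splitOffset(), signExtend16() and rebuildOffset() are hypothetical names for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// The two 16-bit pieces used by "lui at,hi" and "st $gp, lo(at)".
struct HiLo {
  std::uint32_t hi;
  std::uint32_t lo;
};

// Same arithmetic as in LowerCPRESTORE(): round up the high half when the
// low half will be treated as a negative immediate.
HiLo splitOffset(std::int64_t offset) {
  assert(offset >= 0 && offset <= INT32_MAX);
  HiLo parts;
  parts.hi = static_cast<std::uint32_t>(((offset + 0x8000) >> 16) & 0xffff);
  parts.lo = static_cast<std::uint32_t>(offset & 0xffff);
  return parts;
}

// Sign-extend a 16-bit immediate to 32 bits.
std::int32_t signExtend16(std::uint32_t v) {
  return static_cast<std::int32_t>((v & 0xffff) ^ 0x8000) - 0x8000;
}

// What the hardware computes: (hi << 16) + sign-extended lo.
std::uint32_t rebuildOffset(HiLo parts) {
  return (parts.hi << 16) + static_cast<std::uint32_t>(signExtend16(parts.lo));
}
```

For the offset 40000 (0x9C40), the split gives hi = 1 and lo = 0x9C40. The low part sign-extends to -25536, and 65536 - 25536 gives 40000 again.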
Running llc -static emits the jsub instruction instead of jalr, as follows,
Run ch9_1.bc with llc -filetype=obj , and you will find that the Cx of “jsub Cx” is 0, since Cx is calculated by the linker as below. Mips has the same 0 in its jal instruction.
The following code emits “ld $gp, ($gp save slot on stack)” after jalr by creating the file Cpu0EmitGPRestore.cpp, which runs as a function pass.
lbdex/chapters/Chapter9_3/CMakeLists.txt
Cpu0EmitGPRestore.cpp
lbdex/chapters/Chapter9_3/Cpu0TargetMachine.cpp
#ifdef ENABLE_GPRESTORE
void addPreRegAlloc() override;
#endif
#ifdef ENABLE_GPRESTORE
void Cpu0PassConfig::addPreRegAlloc() {
if (!Cpu0ReserveGP) {
// $gp is a caller-saved register.
addPass(createCpu0EmitGPRestorePass(getCpu0TargetMachine()));
}
return;
}
#endif
lbdex/chapters/Chapter9_3/Cpu0.h
#ifdef ENABLE_GPRESTORE
FunctionPass *createCpu0EmitGPRestorePass(Cpu0TargetMachine &TM);
#endif
lbdex/chapters/Chapter9_3/Cpu0EmitGPRestore.cpp
#include "Cpu0.h"
#if CH >= CH9_3
#ifdef ENABLE_GPRESTORE
#include "Cpu0TargetMachine.h"
#include "Cpu0MachineFunction.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/Target/TargetInstrInfo.h"
#include "llvm/ADT/Statistic.h"
namespace {
struct Inserter : public MachineFunctionPass {
TargetMachine &TM;
if ((TM.getRelocationModel() != Reloc::PIC_) ||
(!Cpu0FI->globalBaseRegFixed()))
return false;
// Insert ld.
++I;
DebugLoc dl = I != MBB.end() ? I->getDebugLoc() : DebugLoc();
BuildMI(MBB, I, dl, TII->get(Cpu0::LD), Cpu0::GP).addFrameIndex(FI)
.addImm(0);
Changed = true;
}
while (I != MFI->end()) {
if (I->getOpcode() != Cpu0::JALR) {
++I;
continue;
DebugLoc dl = I->getDebugLoc();
// emit ld $gp, ($gp save slot on stack) after jalr
BuildMI(MBB, ++I, dl, TII->get(Cpu0::LD), Cpu0::GP).addFrameIndex(FI)
.addImm(0);
Changed = true;
}
}
return Changed;
}
#endif
#endif
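The effect of the pass above can be shown on a toy instruction list. The following is only a sketch under simplifying assumptions: instructions are plain strings rather than MachineInstr objects, and withGpRestore() is a hypothetical function, not part of Cpu0EmitGPRestore.cpp. It inserts the restore instruction after every jalr, which is what the while loop in the listing does with BuildMI().

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of a basic block: one string per instruction.
// Returns a copy of the block with `restore` inserted after every jalr.
std::vector<std::string> withGpRestore(const std::vector<std::string> &block,
                                       const std::string &restore) {
  assert(!restore.empty());
  std::vector<std::string> out;
  for (const std::string &inst : block) {
    out.push_back(inst);
    // A call through a register may clobber $gp, so reload it afterwards.
    if (inst.compare(0, 5, "jalr ") == 0)
      out.push_back(restore);
  }
  return out;
}
```

In the real pass the frame index of the $gp save slot replaces the fixed text of the restore instruction, and the pass only runs in PIC mode when the global base register is fixed.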
Until now, we have supported a fixed number of arguments in formal function definitions (Incoming Arguments). This subsection adds support for a variable number of arguments, since the C language supports this feature.
Run Chapter9_3/ with ch9_3_vararg.cpp, using the clang option clang -target mips-unknown-linux-gnu, to get the following result,
.globl _Z11test_varargv
.align 2
.type _Z11test_varargv,@function
.ent _Z11test_varargv # @_Z11test_varargv
_Z11test_varargv:
.frame $sp,88,$lr
.mask 0x00004000,-4
.set noreorder
.cpload $t9
.set nomacro
# BB#0:
addiu $sp, $sp, -48
st $lr, 44($sp) # 4-byte Folded Spill
The analysis of the output ch9_3_vararg.cpu0.s is in the comments above. As the code in # BB#0 shows, we get the first argument “amount” from “ld $2, 24($fp)” since the stack size of the callee function “_Z5sum_iiz()” is 24. It then sets the argument pointer, arg_ptr, to 0($fp), &arg[1]. Next, it checks i < amount in block $BB0_1. If i < amount, it enters $BB0_2. In $BB0_2, it does sum += *arg_ptr and arg_ptr+=4. In # BB#3, it does i+=1.
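The block-by-block walk described above amounts to a pointer stepping through the argument area on the stack. The following sketch models that area as a vector of int, with slot 0 holding amount and the following slots holding the variable arguments. sumFromArgArea() is a hypothetical function written for this illustration. The real code uses va_start() and va_arg() and reads the caller's stack directly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of the incoming argument area seen by sum_i():
// argArea[0] is `amount`, argArea[1..amount] are the variable arguments.
int sumFromArgArea(const std::vector<int> &argArea) {
  assert(!argArea.empty());
  int amount = argArea[0];
  assert(amount >= 0 && static_cast<std::size_t>(amount) < argArea.size());

  const int *arg_ptr = argArea.data() + 1;  // va_start: arg_ptr = &arg[1]
  int sum = 0;
  for (int i = 0; i < amount; ++i) {        // the i < amount check
    sum += *arg_ptr;                        // sum += *arg_ptr
    ++arg_ptr;                              // arg_ptr += 4 bytes for int
  }
  return sum;
}
```

With the argument list of test_vararg(), that is 6 followed by 0, 1, 2, 3, 4, 5, the walk returns 15.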
To support a variable number of arguments, the following code needs to be added in Chapter9_3/. The ch9_3_template.cpp is a C++ template example; it can be translated into cpu0 backend code too.
lbdex/chapters/Chapter9_3/Cpu0ISelLowering.h
/// Cpu0CC - This class provides methods used to analyze formal and call
/// arguments and inquire about calling convention information.
class Cpu0CC {
/// Return the function that analyzes variable argument list functions.
llvm::CCAssignFn *varArgFn() const;
...
};
...
};
lbdex/chapters/Chapter9_3/Cpu0ISelLowering.cpp
//@llvm.stacksave
// Use the default for now
setOperationAction(ISD::STACKSAVE, MVT::Other, Expand);
setOperationAction(ISD::STACKRESTORE, MVT::Other, Expand);
...
}
SDValue Cpu0TargetLowering::
LowerOperation(SDValue Op, SelectionDAG &DAG) const
{
switch (Op.getOpcode())
{
}
return SDValue();
}
SDLoc DL = SDLoc(Op);
SDValue FI = DAG.getFrameIndex(FuncInfo->getVarArgsFrameIndex(),
getPointerTy(MF.getDataLayout()));
// vastart just stores the address of the VarArgsFrameIndex slot into the
// memory location argument.
const Value *SV = cast<SrcValueSDNode>(Op.getOperand(2))->getValue();
return DAG.getStore(Op.getOperand(0), DL, FI, Op.getOperand(1),
MachinePointerInfo(SV));
}
if (IsVarArg)
writeVarArgRegs(OutChains, Cpu0CCInfo, Chain, DL, DAG);
...
}
void Cpu0TargetLowering::Cpu0CC::
analyzeCallOperands(const SmallVectorImpl<ISD::OutputArg> &Args,
bool IsVarArg, bool IsSoftFloat, const SDNode *CallNode,
std::vector<ArgListEntry> &FuncArgs) {
...
}
...
}
if (NumRegs == Idx)
VaArgOffset = alignTo(CCInfo.getNextStackOffset(), RegSize);
else
VaArgOffset = (int)CC.reservedArgArea() - (int)(RegSize * (NumRegs - Idx));
// Copy the integer registers that have not been used for argument passing
// to the argument register save area. For O32, the save area is allocated
// in the caller's stack frame, while for N32/64, it is allocated in the
// callee's stack frame.
for (unsigned I = Idx; I < NumRegs; ++I, VaArgOffset += RegSize) {
unsigned Reg = addLiveIn(MF, ArgRegs[I], RC);
SDValue ArgValue = DAG.getCopyFromReg(Chain, DL, Reg, RegTy);
FI = MFI->CreateFixedObject(RegSize, VaArgOffset, true);
SDValue PtrOff = DAG.getFrameIndex(FI, getPointerTy(DAG.getDataLayout()));
SDValue Store = DAG.getStore(Chain, DL, ArgValue, PtrOff,
MachinePointerInfo());
cast<StoreSDNode>(Store.getNode())->getMemOperand()->setValue(
(Value *)nullptr);
OutChains.push_back(Store);
}
}
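The offset computation at the start of the fragment above decides where the first unused argument register is saved. The following sketch isolates that rule as plain arithmetic. The function names are hypothetical, alignToSketch() is a local stand-in for LLVM's alignTo(), and the values for the register size, the number of argument registers and the reserved argument area used in any call are assumptions that depend on the ABI.

```cpp
#include <cassert>

// Local stand-in for llvm::alignTo(): round value up to a multiple of align.
unsigned alignToSketch(unsigned value, unsigned align) {
  assert(align != 0);
  return (value + align - 1) / align * align;
}

// Offset of the first saved variable-argument register.
// idx is the index of the first argument register not used by fixed
// arguments; numRegs is the number of integer argument registers.
int firstVarArgSaveOffset(unsigned idx, unsigned numRegs, unsigned regSize,
                          unsigned reservedArgArea, unsigned nextStackOffset) {
  assert(idx <= numRegs);
  if (numRegs == idx)
    // All argument registers are used: variable arguments start on the
    // stack right after the fixed arguments.
    return static_cast<int>(alignToSketch(nextStackOffset, regSize));
  // Otherwise the unused registers are saved at the end of the reserved
  // argument area, one slot per register.
  return static_cast<int>(reservedArgArea) -
         static_cast<int>(regSize * (numRegs - idx));
}
```

For example, with 4-byte registers, two argument registers and an 8-byte reserved area (all assumed values), idx = 1 gives offset 4 and idx = 0 gives offset 0.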
lbdex/input/ch9_3_template.cpp
#include <stdarg.h>
template<class T>
T sum(T amount, ...)
{
T i = 0;
T val = 0;
T sum = 0;
va_list vl;
va_start(vl, amount);
for (i = 0; i < amount; i++)
{
val = va_arg(vl, T);
sum += val;
}
va_end(vl);
return sum;
}
int test_template()
{
int a = sum<int>(6, 0, 1, 2, 3, 4, 5);
return a;
}
With the Mips qemu reference 8 , you can download and run it with gcc to verify the result with the printf() function at this point. We will verify the correctness of the code in chapter “Verify backend on Verilog simulator” on the CPU0 Verilog machine.
Even though the C language rarely uses dynamic stack allocation, some languages use it frequently. The following C example code uses it.
Chapter9_3 supports dynamic stack allocation with the following code added.
lbdex/chapters/Chapter9_2/Cpu0FrameLowering.cpp
if (!hasReservedCallFrame(MF)) {
int64_t Amount = I->getOperand(0).getImm();
if (I->getOpcode() == Cpu0::ADJCALLSTACKDOWN)
Amount = -Amount;
return MBB.erase(I);
}
8 http://developer.mips.com/clang-llvm/
lbdex/chapters/Chapter9_3/Cpu0SEFrameLowering.cpp
unsigned FP = Cpu0::FP;
unsigned ZERO = Cpu0::ZERO;
unsigned ADDu = Cpu0::ADDu;
unsigned FP = Cpu0::FP;
unsigned ZERO = Cpu0::ZERO;
unsigned ADDu = Cpu0::ADDu;
unsigned FP = Cpu0::FP;
lbdex/chapters/Chapter9_3/Cpu0ISelLowering.cpp
setStackPointerRegisterToSaveRestore(Cpu0::SP);
lbdex/chapters/Chapter9_3/Cpu0RegisterInfo.cpp
BitVector Cpu0RegisterInfo::
getReservedRegs(const MachineFunction &MF) const {
if (Cpu0FI->isOutArgFI(FrameIndex) || Cpu0FI->isGPFI(FrameIndex) ||
Cpu0FI->isDynAllocFI(FrameIndex))
Offset = spOffset;
Running Chapter9_3 with ch9_3_alloc.cpp gives the following correct result.
define i32 @_Z5sum_iiiiiii(i32 %x1, i32 %x2, i32 %x3, i32 %x4, i32 %x5, i32 %x6)
nounwind uwtable ssp {
...
%9 = alloca i8, i32 %8 // int* b = (int*)__builtin_alloca(sizeof(int) * 1 * x1);
ld $3, 20($fp)
ld $4, 28($fp)
ld $t9, 24($fp)
ld $7, 16($fp)
addiu $sp, $sp, -24
st $7, 20($sp)
st $t9, 12($sp)
st $4, 8($sp)
shl $3, $3, 1
st $3, 16($sp)
addiu $3, $zero, 3
mul $4, $2, $3
ld $t9, %call16(_Z3sumiiiiii)($gp)
jalr $t9
nop
ld $gp, 24($fp)
addiu $sp, $sp, 24
st $2, 4($fp)
ld $3, 8($fp)
ld $3, 0($3)
addu $2, $2, $3
move $sp, $fp
ld $fp, 40($sp) # 4-byte Folded Reload
ld $lr, 44($sp) # 4-byte Folded Reload
addiu $sp, $sp, 48
ret $lr
nop
.set macro
.set reorder
.end _Z10weight_sumiiiiii
$func_end1:
.size _Z10weight_sumiiiiii, ($func_end1)-_Z10weight_sumiiiiii
...
As you can see, dynamic stack allocation needs the support of the frame pointer register fp. In the above assembly, sp is adjusted to (sp - 48) on function entry as usual by the instruction addiu $sp, $sp, -48. Next, fp is set to sp, the position just above the alloca() area as in Fig. 9.3, by the instruction move $fp, $sp. After that, sp is moved to the area just below alloca(). Note that the alloca() area that b points to, “*b = (int*)__builtin_alloca(sizeof(int) * 1 * x1)”, is allocated at run time, since the size of the space depends on the x1 variable and cannot be calculated at link time.
Fig. 9.4 depicts how the stack pointer changes back to the caller stack bottom. As above, fp is set to the address just above alloca(). The first step changes sp to fp by the instruction move $sp, $fp. Next, sp is changed back to the caller stack bottom by the instruction addiu $sp, $sp, 48.
Using fp to keep the old stack pointer value is not the only solution. We could instead keep the size of the alloca() space at a specific memory address and set sp back to the old sp by adding that size. Most ABIs, such as Mips and ARM, access the area above alloca() by fp and the area below alloca() by sp, as Fig. 9.5 depicts. The reason for this definition is the speed of local variable access. Since a RISC CPU uses immediate offsets for load and store as below, using fp and sp to access both areas of local variables gives better performance than using sp only.
ld $2, 64($fp)
st $3, 4($sp)
Cpu0 uses fp and sp to access the areas above and below alloca() too. As in ch9_3_alloc.cpu0.s, it accesses local variables (above alloca()) by fp offset and outgoing arguments (below alloca()) by sp offset.
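The movement of sp and fp around alloca() can be traced with a small simulation. This is a sketch under simplifying assumptions, not backend code: Regs, prologue(), dynAlloc(), epilogue() and spAfterReturn() are hypothetical names, registers are plain integers, and the reload of the old fp from the stack is omitted.

```cpp
#include <cassert>
#include <cstdint>

// The two registers of interest.
struct Regs {
  std::uint32_t sp;
  std::uint32_t fp;
};

// Prologue: addiu $sp, $sp, -frameSize ; move $fp, $sp
Regs prologue(Regs r, std::uint32_t frameSize) {
  r.sp -= frameSize;
  r.fp = r.sp;
  return r;
}

// alloca(): only sp moves down; fp keeps pointing just above the area.
Regs dynAlloc(Regs r, std::uint32_t bytes) {
  r.sp -= bytes;
  return r;
}

// Epilogue: move $sp, $fp ; addiu $sp, $sp, frameSize
Regs epilogue(Regs r, std::uint32_t frameSize) {
  r.sp = r.fp;
  r.sp += frameSize;
  return r;
}

// Value of sp after the function returns, for any run time alloca() size.
std::uint32_t spAfterReturn(std::uint32_t entrySp, std::uint32_t frameSize,
                            std::uint32_t allocaBytes) {
  assert(entrySp >= frameSize && entrySp - frameSize >= allocaBytes);
  Regs r{entrySp, 0};
  r = prologue(r, frameSize);
  r = dynAlloc(r, allocaBytes);
  r = epilogue(r, frameSize);
  return r.sp;
}
```

Whatever size is passed for the alloca() area, sp returns to its entry value, because the epilogue recovers it from fp rather than from a size known at compile time.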
In addition, “move $sp, $fp” is an alias of “addu $sp, $fp, $zero”. The machine code is the latter, and the former is only for easier reading by the user. This alias comes from code added in Chapter3_2 and Chapter3_5 as follows,
lbdex/chapters/Chapter3_2/InstPrinter/Cpu0InstPrinter.cpp
lbdex/chapters/Chapter3_5/Cpu0InstrInfo.td
lbdex/input/ch9_3_longlongshift.cpp
#include "debug.h"
long long c;
c = (b >> a);
return c; // 22
}
st $2, 44($fp)
addiu $4, $zero, 0
st $4, 40($fp)
addiu $5, $zero, 18
st $5, 36($fp)
st $4, 32($fp)
ld $2, 44($fp)
st $2, 8($sp)
jsub __lshrdi3
nop
st $3, 28($fp)
st $2, 24($fp)
ld $2, 44($fp)
st $2, 8($sp)
ld $4, 32($fp)
ld $5, 36($fp)
jsub __ashldi3
nop
st $3, 20($fp)
st $2, 16($fp)
ld $4, 28($fp)
addu $4, $4, $3
cmp $sw, $4, $3
andi $3, $sw, 1
addu $2, $3, $2
ld $3, 24($fp)
addu $2, $3, $2
addu $3, $zero, $4
move $sp, $fp
ld $fp, 48($sp) # 4-byte Folded Reload
ld $lr, 52($sp) # 4-byte Folded Reload
addiu $sp, $sp, 56
ret $lr
nop
.set macro
.set reorder
.end _Z20test_longlong_shift1v
$tmp0:
.size _Z20test_longlong_shift1v, ($tmp0)-_Z20test_longlong_shift1v
.globl _Z20test_longlong_shift2v
.align 2
.type _Z20test_longlong_shift2v,@function
.ent _Z20test_longlong_shift2v # @_Z20test_longlong_shift2v
_Z20test_longlong_shift2v:
.frame $fp,48,$lr
.mask 0x00005000,-4
.set noreorder
.set nomacro
# BB#0:
addiu $sp, $sp, -48
st $lr, 44($sp) # 4-byte Folded Spill
st $fp, 40($sp) # 4-byte Folded Spill
move $fp, $sp
addiu $2, $zero, 48
st $2, 36($fp)
addiu $2, $zero, 0
st $2, 32($fp)
LLVM supports variable sized arrays in C99 9 . The following code is added for this support. The operations are set to Expand, meaning llvm replaces them with other DAG nodes.
lbdex/chapters/Chapter9_3/Cpu0ISelLowering.cpp
SDValue Cpu0TargetLowering::
LowerOperation(SDValue Op, SelectionDAG &DAG) const
{
switch (Op.getOpcode())
{
...
}
...
}
lbdex/input/ch9_3_stacksave.cpp
int test_stacksaverestore(unsigned x) {
// CHECK: call i8* @llvm.stacksave()
char s1[x];
9 http://www.llvm.org/docs/LangRef.html#llvm-stacksave-intrinsic
s1[x] = 5;
return s1[x];
// CHECK: call void @llvm.stackrestore(i8*
}
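The pairing of llvm.stacksave and llvm.stackrestore around a variable sized array can be modelled with a toy stack. The following is only a sketch: ToyStack, useVla() and vlaScopeRestoresSp() are hypothetical names, and the stack is a byte vector rather than real memory. Unlike the test above, the sketch allocates x + 1 bytes so that the access at index x stays inside the allocated area.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy downward-growing stack with an explicit stack pointer.
class ToyStack {
public:
  explicit ToyStack(std::size_t bytes) : mem_(bytes), sp_(bytes) {}

  // Models llvm.stacksave: remember the current stack pointer.
  std::size_t save() const { return sp_; }

  // Models llvm.stackrestore: release everything allocated since save().
  void restore(std::size_t saved) {
    assert(saved >= sp_ && saved <= mem_.size());
    sp_ = saved;
  }

  // Models a dynamic allocation: move the stack pointer down by n bytes.
  unsigned char *allocate(std::size_t n) {
    assert(n <= sp_);
    sp_ -= n;
    return mem_.data() + sp_;
  }

  std::size_t sp() const { return sp_; }

private:
  std::vector<unsigned char> mem_;
  std::size_t sp_;
};

// A scope that uses a variable sized array of x + 1 bytes.
int useVla(ToyStack &stack, unsigned x) {
  std::size_t saved = stack.save();            // on scope entry
  unsigned char *s1 = stack.allocate(x + 1u);  // char s1[x + 1]
  s1[x] = 5;
  int value = s1[x];
  stack.restore(saved);                        // on scope exit
  return value;
}

// True when the scope returns 5 and leaves the stack pointer unchanged.
bool vlaScopeRestoresSp(unsigned x) {
  ToyStack stack(256);
  int value = useVla(stack, x);
  return value == 5 && stack.sp() == 256;
}
```

Setting the two operations to Expand lets llvm implement them with ordinary copies from and to the stack pointer register, which is the role of sp_ in this model.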
I think these llvm intrinsic IRs are for the implementation of exception handling 10 [#returnaddr]. With these IRs, a programmer can record the frame address and return address for use in implementing an exception handler in C++, as in the example below. To support these llvm intrinsic IRs, the following code is added to the Cpu0 backend.
lbdex/chapters/Chapter9_3/Cpu0ISelLowering.cpp
10 http://llvm.org/docs/ExceptionHandling.html#overview
...
}
SDValue Cpu0TargetLowering::
LowerOperation(SDValue Op, SelectionDAG &DAG) const
{
switch (Op.getOpcode())
{
...
}
...
}
SDValue Cpu0TargetLowering::
lowerFRAMEADDR(SDValue Op, SelectionDAG &DAG) const {
// check the depth
assert((cast<ConstantSDNode>(Op.getOperand(0))->getZExtValue() == 0) &&
"Frame address can only be determined for current frame.");
// Return LR, which contains the return address. Mark it an implicit live-in.
unsigned Reg = MF.addLiveIn(LR, getRegClassFor(VT));
return DAG.getCopyFromReg(DAG.getEntryNode(), SDLoc(Op), Reg, VT);
}
Cpu0FI->setCallsEhReturn();
SDValue Chain = Op.getOperand(0);
SDValue Offset = Op.getOperand(1);
SDValue Handler = Op.getOperand(2);
SDLoc DL(Op);
EVT Ty = MVT::i32;
// Store stack offset in V1, store jump target in V0. Glue CopyToReg and
// EH_RETURN nodes, so that instructions are emitted back-to-back.
unsigned OffsetReg = Cpu0::V1;
unsigned AddrReg = Cpu0::V0;
Chain = DAG.getCopyToReg(Chain, DL, OffsetReg, Offset, SDValue());
Chain = DAG.getCopyToReg(Chain, DL, AddrReg, Handler, Chain.getValue(1));
return DAG.getNode(Cpu0ISD::EH_RETURN, DL, MVT::Other, Chain,
DAG.getRegister(OffsetReg, Ty),
DAG.getRegister(AddrReg, getPointerTy(MF.getDataLayout())),
Chain.getValue(1));
}
Cpu0FI->setCallsEhDwarf();
return Op;
}
lbdex/input/ch9_3_frame_return_addr.cpp
int display_frameaddress() {
return (int)__builtin_frame_address(0);
}
int display_returnaddress() {
int a = (int)__builtin_return_address(0);
fn();
return a;
}
.end _Z20display_frameaddressv
$func_end0:
.size _Z20display_frameaddressv, ($func_end0)-_Z20display_frameaddressv
.globl _Z22display_returnaddress1v
.align 2
.type _Z22display_returnaddress1v,@function
.ent _Z22display_returnaddress1v # @_Z22display_returnaddress1v
_Z22display_returnaddress1v:
.cfi_startproc
.frame $fp,24,$lr
.mask 0x00005000,-4
.set noreorder
.set nomacro
# BB#0:
addiu $sp, $sp, -24
$tmp0:
.cfi_def_cfa_offset 24
st $lr, 20($sp) # 4-byte Folded Spill
$tmp1:
.cfi_offset 14, -4
$tmp2:
.cfi_offset 12, -8
move $fp, $sp
$tmp3:
.cfi_def_cfa_register 12
st $lr, 12($fp)
jsub _Z2fnv
nop
ld $2, 12($fp)
move $sp, $fp
ld $fp, 16($sp) # 4-byte Folded Reload
The asm “ld $2, 12($fp)” in function _Z22display_returnaddress1v reloads $lr to $2 after “jsub _Z2fnv”. Cpu0 doesn’t produce “addu $2, $zero, $lr” because if a buggy program in _Z2fnv changes the $lr value without following the ABI, $2 would get the wrong $lr. The following code kills the $lr register and makes references to $lr load from the stack slot rather than use the register directly.
lbdex/chapters/Chapter9_1/Cpu0SEFrameLowering.cpp
bool Cpu0SEFrameLowering::
spillCalleeSavedRegisters(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI,
const std::vector<CalleeSavedInfo> &CSI,
const TargetRegisterInfo *TRI) const {
...
for (unsigned i = 0, e = CSI.size(); i != e; ++i) {
// Add the callee-saved register as live-in. Do not add if the register is
// LR and return address is taken, because it has already been added in
// method Cpu0TargetLowering::LowerRETURNADDR.
// It's killed at the spill, unless the register is LR and return address
// is taken.
unsigned Reg = CSI[i].getReg();
bool IsRAAndRetAddrIsTaken = (Reg == Cpu0::LR)
&& MF->getFrameInfo()->isReturnAddressTaken();
if (!IsRAAndRetAddrIsTaken)
EntryBlock->addLiveIn(Reg);
eh.return intrinsic
Besides lowerRETURNADDR() in Cpu0ISelLowering, the following code is for eh.return support only, and it can run with the input ch9_3_detect_exception.cpp below.
lbdex/chapters/Chapter9_3/Cpu0SEFrameLowering.cpp
if (Cpu0FI->callsEhReturn()) {
// Insert instructions that spill eh data registers.
for (int I = 0; I < ABI.EhDataRegSize(); ++I) {
if (!MBB.isLiveIn(ABI.GetEhDataReg(I)))
MBB.addLiveIn(ABI.GetEhDataReg(I));
TII.storeRegToStackSlot(MBB, MBBI, ABI.GetEhDataReg(I), false,
Cpu0FI->getEhDataRegFI(I), RC, &RegInfo);
}
...
}
if (Cpu0FI->callsEhReturn()) {
const TargetRegisterClass *RC = &Cpu0::GPROutRegClass;
...
}
...
}
lbdex/chapters/Chapter9_3/Cpu0InstrInfo.td
lbdex/chapters/Chapter9_3/Cpu0SEInstrInfo.h
lbdex/chapters/Chapter9_3/Cpu0SEInstrInfo.cpp
case Cpu0::CPU0eh_return32:
expandEhReturn(MBB, MI);
break;
...
}
expandRetLR(MBB, I);
}
lbdex/input/ch9_3_detect_exception.cpp
__attribute__ ((weak))
int test_detect_exception(bool exception) {
exceptionOccur = false;
void* handler = (void*)(&exception_handler);
if (exception) {
returnAddr = __builtin_return_address(0);
__builtin_eh_return(0, handler); // no warning, eh_return never returns.
}
else {
return 0;
}
}
; No predecessors!
ret void
}
; <label>:5 ; preds = %0
%6 = call i8* @llvm.returnaddress(i32 0)
store i8* %6, i8** @returnAddr, align 4
%7 = load i8*, i8** %handler, align 4
call void @llvm.eh.return.i32(i32 0, i8* %7)
unreachable
; <label>:8 ; preds = %0
ret i32 0
}
# BB#0:
addiu $sp, $sp, -16
st $fp, 12($sp) # 4-byte Folded Spill
st $4, 4($fp)
st $5, 0($fp)
move $fp, $sp
lui $2, %got_hi(exceptionOccur)
addu $2, $2, $gp
ld $2, %got_lo(exceptionOccur)($2)
addiu $3, $zero, 1
sb $3, 0($2)
st $fp, 8($fp)
lui $2, %got_hi(returnAddr)
addu $2, $2, $gp
ld $2, %got_lo(returnAddr)($2)
ld $2, 0($2)
addiu $3, $zero, 0
move $sp, $fp
ld $4, 4($fp)
ld $5, 0($fp)
ld $fp, 12($sp) # 4-byte Folded Reload
addiu $sp, $sp, 16
move $t9, $2
move $lr, $2
addu $sp, $sp, $3
ret $lr
nop
.set macro
.set reorder
.end _Z17exception_handlerv
$func_end0:
.size _Z17exception_handlerv, ($func_end0)-_Z17exception_handlerv
.weak _Z21test_detect_exceptionb
.align 2
.type _Z21test_detect_exceptionb,@function
.ent _Z21test_detect_exceptionb # @_Z21test_detect_exceptionb
_Z21test_detect_exceptionb:
.cfi_startproc
.frame $fp,24,$lr
.mask 0x00001000,-4
.set noreorder
.cpload $t9
.set nomacro
# BB#0:
addiu $sp, $sp, -24
$tmp0:
.cfi_def_cfa_offset 24
st $fp, 20($sp) # 4-byte Folded Spill
$tmp1:
.cfi_offset 12, -4
st $4, 8($fp)
st $5, 4($fp)
$tmp2:
.cfi_offset 4, -16
$tmp3:
.cfi_offset 5, -20
move $fp, $sp
$tmp4:
.cfi_def_cfa_register 12
sb $4, 16($fp)
lui $2, %got_hi(exceptionOccur)
addu $2, $2, $gp
ld $2, %got_lo(exceptionOccur)($2)
addiu $3, $zero, 0
sb $3, 0($2)
lui $2, %got_hi(_Z17exception_handlerv)
addu $2, $2, $gp
ld $2, %got_lo(_Z17exception_handlerv)($2)
st $2, 12($fp)
lbu $2, 16($fp)
andi $2, $2, 1
beq $2, $zero, .LBB1_2
nop
jmp .LBB1_1
nop
.LBB1_2:
addiu $2, $zero, 0
move $sp, $fp
ld $4, 8($fp)
ld $5, 4($fp)
ld $fp, 20($sp) # 4-byte Folded Reload
addiu $sp, $sp, 24
ret $lr
nop
.LBB1_1:
lui $2, %got_hi(returnAddr)
addu $2, $2, $gp
ld $2, %got_lo(returnAddr)($2)
st $lr, 0($2)
ld $2, 12($fp)
addiu $3, $zero, 0
move $sp, $fp
ld $4, 8($fp)
ld $5, 4($fp)
ld $fp, 20($sp) # 4-byte Folded Reload
addiu $sp, $sp, 24
move $t9, $2
move $lr, $2
addu $sp, $sp, $3
ret $lr
nop
.set macro
.set reorder
.end _Z21test_detect_exceptionb
$func_end1:
.size _Z21test_detect_exceptionb, ($func_end1)-_Z21test_detect_exceptionb
.cfi_endproc
If you remove “__attribute__ ((weak))” from the C file, the IR will have “nounwind” in attributes #3. The side effect
in the asm output is that no .cfi_offset directives are issued, as in function exception_handler().
This example implements an exception handler that gets the frame and return address and calls the exception handler through the
__builtin_xxx functions of clang in C, without introducing any assembly instructions. The example can be verified
in the chapter “Cpu0 ELF linker” of the other book “llvm tool chain for Cpu0” 12 . By examining whether the global variable
exceptionOccur is true or false, one can check that the program transferred control to exception_handler(), or not, as intended.
eh.dwarf intrinsic
Besides lowerADD() in Cpu0ISelLowering, the following code exists only to support eh.dwarf. It can be run with the
input eh-dwarf-cfa.ll as below.
lbdex/chapters/Chapter9_3/Cpu0SEFrameLowering.cpp
...
}
lbdex/input/eh-dwarf-cfa.ll
12 http://jonathan2251.github.io/lbt/lld.html
bswap intrinsic
lbdex/chapters/Chapter12_1/Cpu0ISelLowering.cpp
...
}
13 http://llvm.org/docs/LangRef.html#llvm-bswap-intrinsics
lbdex/input/ch9_3_bswap.cpp
int test_bswap16() {
volatile int a = 0x1234;
int result = (__builtin_bswap16(a) ^ 0x3412);
return result;
}
int test_bswap32() {
volatile int a = 0x1234;
int result = (__builtin_bswap32(a) ^ 0x34120000);
return result;
}
int test_bswap64() {
volatile int a = 0x1234;
int result = (__builtin_bswap64(a) ^ 0x3412000000000000);
return result;
}
int test_bswap() {
int result = test_bswap16() + test_bswap32() + test_bswap64();
return result;
}
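The constants that the test input XORs against can be checked on any host. The following is a minimal sketch that writes the byte swaps out by hand instead of calling the builtins; the names swap16, swap32, swap64 and checkAll are our own for this illustration and do not appear in lbdex.

```cpp
#include <cstdint>

// Portable byte swaps, written out so the expected constants of
// ch9_3_bswap.cpp can be verified without any compiler builtin.
inline uint16_t swap16(uint16_t v) {
  return static_cast<uint16_t>((v << 8) | (v >> 8));
}

inline uint32_t swap32(uint32_t v) {
  return (v << 24) | ((v & 0x0000ff00u) << 8) |
         ((v & 0x00ff0000u) >> 8) | (v >> 24);
}

inline uint64_t swap64(uint64_t v) {
  // The swapped low half becomes the high half and vice versa.
  return (static_cast<uint64_t>(swap32(static_cast<uint32_t>(v))) << 32) |
         swap32(static_cast<uint32_t>(v >> 32));
}

// Mirrors the XOR checks of the test input: 0 means every swap matched.
inline int checkAll() {
  int r16 = swap16(0x1234) ^ 0x3412;
  int r32 = static_cast<int>(swap32(0x1234) ^ 0x34120000u);
  int r64 = static_cast<int>(swap64(0x1234) ^ 0x3412000000000000ull);
  return r16 + r32 + r64;
}
```

If the backend lowers the bswap intrinsics correctly, test_bswap() in the input above should return 0 for the same reason that checkAll() does here.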
9.7 Summary
Now the Cpu0 backend can handle both integer function calls and control statements, just as the example code
of the LLVM frontend tutorial does. It can also translate some C++ OOP code into Cpu0 instructions without
much effort in the backend, because the most complex parts of the language, such as C++ syntax, are handled by the frontend.
LLVM is a real structure that follows compiler theory, and any LLVM backend benefits from this structure. The
best part of the three-tier compiler structure is that a backend automatically gains language support as the frontend
supports more and more languages, provided the frontend does not add any new IR for a new language.
TEN
ELF SUPPORT
• ELF format
– ELF header and Section header table
– Relocation Record
– Cpu0 ELF related files
• llvm-objdump
– llvm-objdump -t -r
– llvm-objdump -d
The Cpu0 backend generates obj files in ELF format. ELF (Executable and Linkable Format) is a common standard
file format for executables, object code, shared libraries and core dumps. First published in the System V Application
Binary Interface specification, and later in the Tool Interface Standard, it was quickly accepted among different vendors
of Unix systems. In 1999 it was chosen as the standard binary file format for Unix and Unix-like systems on x86 by
the 86open project. Please refer to 1 .
The binary encoding of the Cpu0 instruction set in obj files has been checked in the previous chapters. But we did not dig into
the ELF file format itself, such as the ELF header and relocation records, at that time. This chapter uses the binutils that were
installed in the sub-section “Install other tools on iMac” of Appendix A: “Installing LLVM” 2 to check the generated
cpu0 ELF file. You will learn the objdump, readelf, ..., tools and understand the ELF file format by using
these tools to analyze the obj generated for cpu0 in this chapter. LLVM has the llvm-objdump tool, which is like objdump.
We will make cpu0 support llvm-objdump later in this chapter. The binutils are a cross-compiler tool chain that
includes ELF dump support for a number of CPUs. A Linux platform already has binutils, so there is no need to install them
separately. We use the Linux binutils in this chapter only because my iMac displays Chinese text. The
corresponding binutils on iMac work without problems, except that they add a g prefix to the command names and display messages in your local language
instead of pure English. For example, when using gobjdump instead of objdump, I get output in Chinese
instead of pure English on my iMac.
The binutils tools we use are not part of the llvm tools, but they are powerful for ELF analysis. This chapter introduces the
tools to readers since we think knowledge of the popular ELF format and of the binutils ELF analysis tools is valuable.
An LLVM compiler engineer is responsible for making sure the backend generates a correct obj, since the obj
will be handled by a linker or loader later. With these tools, you can verify the ELF files you generate.
The cpu0 author has published a “System Software” book which introduces the topics of assembler, linker, loader,
compiler and OS in concept, and at the same time demonstrates how to use binutils and gcc to analyze ELF through
the example code in the book. It is a Chinese book on “System Software” in concept and practice. This book does
1 http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
2 http://jonathan2251.github.io/lbd/install.html#install-other-tools-on-imac
the real analysis through binutils. “System Software” 3 , written by Beck, is a famous book that explains in concept
what the compiler outputs, what the linker outputs, what the loader outputs, and how they work
together. But it covers the concepts only. You can refer to it to understand how the “Relocation Record” works if
you need to refresh or learn this knowledge for this chapter.
References 4 , 5 and 6 are Chinese documents available on the cpu0 author's web site.
ELF is a format used for both obj and executable files, so it has two views, as shown in Fig. 10.1.
As Fig. 10.1 shows, the “Section header table” includes sections .text, .rodata, ..., .data, which lay out code,
read-only data, ..., and read/write data, respectively. The “Program header table” includes segments for run-time code and
data. Segments define the run-time layout of code and data, while sections define the link-time layout of code
and data.
3 Leland Beck, System Software: An Introduction to Systems Programming.
4 http://ccckmit.wikidot.com/lk:aout
5 http://ccckmit.wikidot.com/lk:objfile
6 http://ccckmit.wikidot.com/lk:elf
Let’s run Chapter9_3/ with ch6_1.cpp and dump the ELF header information with readelf -h to see what information
the ELF header contains.
As the ELF header display above shows, it contains the magic number, version, ABI, and so on. The Machine field of cpu0
is unknown while mips is known as MIPSR3000. It is unknown because cpu0 is not a popular CPU recognized by the readelf utility.
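The header fields mentioned above sit at fixed offsets, so a few lines of C++ are enough to read them. The following is a minimal sketch that assumes the ELF32 layout of the System V ABI; the names ElfIdent, read16 and parseElfIdent are our own for this illustration and are not part of LLVM or binutils. The value 998 is the EM_CPU0 number that this chapter adds to ELF.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical summary of the header fields discussed in the text.
struct ElfIdent {
  bool isElf;        // first four bytes are 0x7f 'E' 'L' 'F'
  bool is32Bit;      // EI_CLASS (byte 4) == ELFCLASS32 (1)
  bool isBigEndian;  // EI_DATA (byte 5) == ELFDATA2MSB (2)
  uint16_t type;     // e_type at offset 16, 1 == ET_REL (relocatable)
  uint16_t machine;  // e_machine at offset 18, 998 for EM_CPU0 here
};

// Read a 16-bit field in the byte order announced by EI_DATA.
inline uint16_t read16(const std::vector<uint8_t> &b, std::size_t off,
                       bool bigEndian) {
  return bigEndian ? static_cast<uint16_t>((b[off] << 8) | b[off + 1])
                   : static_cast<uint16_t>((b[off + 1] << 8) | b[off]);
}

inline ElfIdent parseElfIdent(const std::vector<uint8_t> &b) {
  assert(b.size() >= 20 && "need e_ident, e_type and e_machine");
  ElfIdent h{};
  h.isElf = b[0] == 0x7f && b[1] == 'E' && b[2] == 'L' && b[3] == 'F';
  h.is32Bit = b[4] == 1;
  h.isBigEndian = b[5] == 2;
  h.type = read16(b, 16, h.isBigEndian);
  h.machine = read16(b, 18, h.isBigEndian);
  return h;
}
```

A tool that has no table entry for the machine number it reads this way can only report the field as unknown.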
This result is expected, because the cpu0 obj is for linking only, not for execution, so the segments are empty. Check the ELF
section information as follows. Every section has offset and size information.
[Gamma@localhost input]$ readelf -S ch6_1.cpu0.o
There are 10 section headers, starting at offset 0xd4:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00000000 000034 000034 00 AX 0 0 4
[ 2] .rel.text REL 00000000 000310 000018 08 8 1 4
[ 3] .data PROGBITS 00000000 000068 000004 00 WA 0 0 4
[ 4] .bss NOBITS 00000000 00006c 000000 00 WA 0 0 4
[ 5] .eh_frame PROGBITS 00000000 00006c 000028 00 A 0 0 4
[ 6] .rel.eh_frame REL 00000000 000328 000008 08 8 5 4
[ 7] .shstrtab STRTAB 00000000 000094 00003e 00 0 0 1
[ 8] .symtab SYMTAB 00000000 000264 000090 10 9 6 4
[ 9] .strtab STRTAB 00000000 0002f4 00001b 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
[Gamma@localhost input]$
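The offsets and sizes in the readelf -S listing above can be cross-checked by hand. The following is a minimal sketch of that arithmetic; the helper names alignUp, sectionEnd and sectionTableEnd are our own for this illustration, and the only outside fact used is that an ELF32 section header entry is 40 bytes long.

```cpp
#include <cassert>
#include <cstdint>

// Round offset up to the next multiple of align (a power of two).
inline uint32_t alignUp(uint32_t offset, uint32_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (offset + align - 1) & ~(align - 1);
}

// File offset just past a section that starts at off and holds size bytes.
inline uint32_t sectionEnd(uint32_t off, uint32_t size) { return off + size; }

// The section header table holds count entries of 40 bytes each (ELF32).
inline uint32_t sectionTableEnd(uint32_t tableOff, uint32_t count) {
  return tableOff + count * 40u;
}
```

With the numbers of the listing, .text ends at 0x34 + 0x34 = 0x68 where .data begins, .shstrtab ends at 0x94 + 0x3e = 0xd2, which rounds up to the table start 0xd4, and the 10 headers end at 0xd4 + 10 * 40 = 0x264 where .symtab begins.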
.globl gI
.align 2
gI:
.4byte 100 # 0x64
.size gI, 4
Section Headers:
[Nr] Name
Type Addr Off Size ES Lk Inf Al
Flags
[ 0]
NULL 00000000 000000 000000 00 0 0 0
[00000000]:
[ 1] .text
PROGBITS 00000000 000034 000044 00 0 0 4
[00000006]: ALLOC, EXEC
[ 2] .rel.text
REL 00000000 0002a8 000020 08 6 1 4
[00000000]:
[ 3] .data
PROGBITS 00000000 000078 000008 00 0 0 4
[00000003]: WRITE, ALLOC
[ 4] .bss
NOBITS 00000000 000080 000000 00 0 0 4
[00000003]: WRITE, ALLOC
[ 5] .shstrtab
STRTAB 00000000 000080 000030 00 0 0 1
[00000000]:
[ 6] .symtab
SYMTAB 00000000 0001f0 000090 10 7 5 4
[00000000]:
[ 7] .strtab
STRTAB 00000000 000280 000025 00 0 0 1
[00000000]:
Section Headers:
[Nr] Name
Type Addr Off Size ES Lk Inf Al
Flags
[ 0]
NULL 00000000 000000 000000 00 0 0 0
[00000000]:
[ 1] .text
PROGBITS 00000000 000034 000038 00 0 0 4
[00000006]: ALLOC, EXEC
[ 2] .rel.text
REL 00000000 0002f8 000018 08 7 1 4
[00000000]:
[ 3] .data
PROGBITS 00000000 00006c 000008 00 0 0 4
[00000003]: WRITE, ALLOC
[ 4] .bss
NOBITS 00000000 000074 000000 00 0 0 4
[00000003]: WRITE, ALLOC
[ 5] .reginfo
MIPS_REGINFO 00000000 000074 000018 00 0 0 1
[00000002]: ALLOC
[ 6] .shstrtab
STRTAB 00000000 00008c 000039 00 0 0 1
[00000000]:
[ 7] .symtab
SYMTAB 00000000 000230 0000a0 10 8 6 4
[00000000]:
[ 8] .strtab
STRTAB 00000000 0002d0 000025 00 0 0 1
[00000000]:
As described in the section Handle $gp register in PIC addressing mode, “.cpload %reg” is translated into the following.
The _gp_disp value is determined by the loader, so it is undefined in the obj. Both Relocation Records
for offsets 0 and 4 of the .text section refer to the _gp_disp value. Offsets 0 and 4 of the .text section hold the instructions “lui
$gp, %hi(_gp_disp)” and “ori $gp, $gp, %lo(_gp_disp)”, whose obj encodings are 0fa00000 and
0daa0000, respectively. The obj encodes %hi(_gp_disp) and %lo(_gp_disp) as 0 because the loader knows the _gp_disp
value only when it loads this obj into memory at run time, and it then updates the two locations named by these relocation records
with the correct values. You can check that the cpu0 %hi(_gp_disp) and %lo(_gp_disp) are correct against the
mips Relocation Records R_MIPS_HI(_gp_disp) and R_MIPS_LO(_gp_disp) above, even though cpu0 is not a CPU
recognized by the readelf utility. The instruction “ld $2, %got(gI)($gp)” is handled the same way, since the address
at which the .data section variable will be loaded is unknown. So Cpu0 encodes the address as 0 and creates a relocation record at 0x00000020
of the .text section. The linker or loader changes this address when the program is linked or loaded, depending on whether the
program is statically or dynamically linked.
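To make the patching step concrete, the following minimal sketch shows how a linker or loader could fill the zeroed 16-bit fields of the two instruction words once the value is known. It is not the Cpu0 linker's code: the helper names are our own, and the plain high/low split assumes that ori zero-extends its immediate, so no carry adjustment is applied to the high half.

```cpp
#include <cstdint>

// Replace the low 16 bits (the immediate field) of an instruction word.
inline uint32_t patchImm16(uint32_t insn, uint16_t imm) {
  return (insn & 0xffff0000u) | imm;
}

// Assumed split for a lui/ori pair: upper and lower halves taken as-is.
inline uint16_t hi16(uint32_t value) {
  return static_cast<uint16_t>(value >> 16);
}
inline uint16_t lo16(uint32_t value) {
  return static_cast<uint16_t>(value & 0xffffu);
}
```

For a made-up value of 0x00012340, the word 0fa00000 becomes 0fa00001 and the word 0daa0000 becomes 0daa2340, while the opcode and register fields stay untouched.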
The files Cpu0ELFObjectWriter.cpp and Cpu0MC*.cpp take care of the obj format. Most of the obj encoding
of specific instructions is defined by Cpu0InstrInfo.td and Cpu0RegisterInfo.td. With these td descriptions, LLVM
translates Cpu0 instructions into obj format automatically.
10.2 llvm-objdump
10.2.1 llvm-objdump -t -r
On iMac, gobjdump -tr can display relocation record information like readelf -tr . The LLVM tool
llvm-objdump is the counterpart of objdump. Let’s run the gobjdump and llvm-objdump commands as follows to see the
differences.
SYMBOL TABLE:
00000000 l df *ABS* 00000000 ch9_3.bc
00000000 l d .text 00000000 .text
00000000 l d .data 00000000 .data
00000000 l d .bss 00000000 .bss
00000000 g F .text 00000084 _Z5sum_iiz
00000084 g F .text 00000080 main
00000000 *UND* 00000000 _gp_disp
SYMBOL TABLE:
00000000 l df *ABS* 00000000 ch9_3.bc
00000000 l d .text 00000000 .text
00000000 l d .data 00000000 .data
00000000 l d .bss 00000000 .bss
00000000 g F .text 00000084 _Z5sum_iiz
00000084 g F .text 00000080 main
00000000 *UND* 00000000 _gp_disp
llvm-objdump can display the file format and relocation record information correctly while objdump cannot, since
we added the relocation record information to ELF.h as follows,
include/llvm/Support/ELF.h
// Machine architectures
enum {
...
EM_CPU0 = 998, // Document LLVM Backend Tutorial Cpu0
EM_CPU0_LE = 999 // EM_CPU0_LE: little endian; EM_CPU0: big endian
};
lib/Object/ELF.cpp
...
include/llvm/Support/ELFRelocs/Cpu0.def
#ifndef ELF_RELOC
#error "ELF_RELOC must be defined"
#endif
ELF_RELOC(R_CPU0_NONE, 0)
ELF_RELOC(R_CPU0_32, 2)
ELF_RELOC(R_CPU0_HI16, 5)
ELF_RELOC(R_CPU0_LO16, 6)
ELF_RELOC(R_CPU0_GPREL16, 7)
ELF_RELOC(R_CPU0_LITERAL, 8)
ELF_RELOC(R_CPU0_GOT16, 9)
ELF_RELOC(R_CPU0_PC16, 10)
ELF_RELOC(R_CPU0_CALL16, 11)
ELF_RELOC(R_CPU0_GPREL32, 12)
ELF_RELOC(R_CPU0_PC24, 13)
ELF_RELOC(R_CPU0_GOT_HI16, 22)
ELF_RELOC(R_CPU0_GOT_LO16, 23)
ELF_RELOC(R_CPU0_RELGOT, 36)
ELF_RELOC(R_CPU0_TLS_GD, 42)
ELF_RELOC(R_CPU0_TLS_LDM, 43)
ELF_RELOC(R_CPU0_TLS_DTP_HI16, 44)
ELF_RELOC(R_CPU0_TLS_DTP_LO16, 45)
ELF_RELOC(R_CPU0_TLS_GOTTPREL, 46)
ELF_RELOC(R_CPU0_TLS_TPREL32, 47)
ELF_RELOC(R_CPU0_TLS_TP_HI16, 49)
ELF_RELOC(R_CPU0_TLS_TP_LO16, 50)
ELF_RELOC(R_CPU0_GLOB_DAT, 51)
ELF_RELOC(R_CPU0_JUMP_SLOT, 127)
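A .def file like the one above is meant to be expanded more than once, each time with a different definition of ELF_RELOC. The following minimal sketch shows the idea in a self-contained form: since it cannot include Cpu0.def, it uses a four-entry stand-in list (CPU0_RELOC_LIST, our own name), and the enum and function names are also assumptions made for this illustration.

```cpp
#include <cstring>

// Four-entry stand-in for Cpu0.def; the real file lists every relocation.
#define CPU0_RELOC_LIST(X) \
  X(R_CPU0_NONE, 0)        \
  X(R_CPU0_32, 2)          \
  X(R_CPU0_HI16, 5)        \
  X(R_CPU0_LO16, 6)

// First expansion: each entry becomes an enumerator.
enum Cpu0Reloc : unsigned {
#define ELF_RELOC(name, value) name = value,
  CPU0_RELOC_LIST(ELF_RELOC)
#undef ELF_RELOC
};

// Second expansion: each entry becomes a case that returns its own name.
inline const char *relocName(unsigned type) {
  switch (type) {
#define ELF_RELOC(name, value) \
  case value:                  \
    return #name;
    CPU0_RELOC_LIST(ELF_RELOC)
#undef ELF_RELOC
  default:
    return "Unknown";
  }
}
```

Keeping the numbers in one list means the enumerators and the names printed by a dump tool cannot drift apart.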
include/llvm/Object/ELFObjectFile.h
In addition to llvm-objdump -t -r , llvm-readobj -h can display the Cpu0 ELF header information
once EM_CPU0 is defined as above.
10.2.2 llvm-objdump -d
Run the example code of the last chapter with the command llvm-objdump -d to dump the ELF file as hex, as follows,
JonathantekiiMac:input Jonathan$ clang -target mips-unknown-linux-gnu -c ch8_1_1.cpp -emit-llvm -o ch8_1_1.bc
JonathantekiiMac:input Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/Debug/bin/llc -march=cpu0 -relocation-model=pic -filetype=obj ch8_1_1.bc -o ch8_1_1.cpu0.o
JonathantekiiMac:input Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/Debug/bin/llvm-objdump -d ch8_1_1.cpu0.o
To support llvm-objdump, the following code is added to Chapter10_1/ (the DecoderMethod for brtarget24 was
added in the previous chapter).
lbdex/chapters/Chapter10_1/CMakeLists.txt
add_subdirectory(Disassembler)
lbdex/chapters/Chapter10_1/LLVMBuild.txt
subdirectories =
Disassembler
has_disassembler = 1
lbdex/chapters/Chapter10_1/Cpu0InstrInfo.td
lbdex/chapters/Chapter10_1/Disassembler/CMakeLists.txt
add_llvm_library(LLVMCpu0Disassembler
Cpu0Disassembler.cpp
)
lbdex/chapters/Chapter10_1/Disassembler/LLVMBuild.txt
[component_0]
type = Library
name = Cpu0Disassembler
parent = Cpu0
required_libraries = MCDisassembler Support Cpu0Info
add_to_library_groups = Cpu0
lbdex/chapters/Chapter10_1/Disassembler/Cpu0Disassembler.cpp
//
// This file is part of the Cpu0 Disassembler.
//
//===----------------------------------------------------------------------===//
#include "Cpu0.h"
#include "Cpu0RegisterInfo.h"
#include "Cpu0Subtarget.h"
#include "llvm/MC/MCDisassembler/MCDisassembler.h"
#include "llvm/MC/MCFixedLenDisassembler.h"
#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/Support/MathExtras.h"
#include "llvm/Support/TargetRegistry.h"
namespace {
virtual ~Cpu0DisassemblerBase() {}
protected:
bool IsBigEndian;
};
namespace llvm {
extern Target TheCpu0elTarget, TheCpu0Target, TheCpu064Target,
TheCpu064elTarget;
#include "Cpu0GenDisassemblerTables.inc"
/// Read four bytes from the ArrayRef and return 32 bit word sorted
/// according to the given endianness
static DecodeStatus readInstruction32(ArrayRef<uint8_t> Bytes, uint64_t Address,
uint64_t &Size, uint32_t &Insn,
bool IsBigEndian) {
// We want to read exactly 4 Bytes of data.
if (Bytes.size() < 4) {
Size = 0;
return MCDisassembler::Fail;
}
if (IsBigEndian) {
// Encoded as a big-endian 32-bit word in the stream.
Insn = (Bytes[3] << 0) |
(Bytes[2] << 8) |
(Bytes[1] << 16) |
(Bytes[0] << 24);
}
else {
// Encoded as a little-endian 32-bit word in the stream.
Insn = (Bytes[0] << 0) |
(Bytes[1] << 8) |
(Bytes[2] << 16) |
(Bytes[3] << 24);
}
return MCDisassembler::Success;
}
DecodeStatus
Cpu0Disassembler::getInstruction(MCInst &Instr, uint64_t &Size,
ArrayRef<uint8_t> Bytes,
uint64_t Address,
raw_ostream &VStream,
raw_ostream &CStream) const {
uint32_t Insn;
DecodeStatus Result;
if (Result == MCDisassembler::Fail)
return MCDisassembler::Fail;
return MCDisassembler::Fail;
}
Inst.addOperand(MCOperand::createReg(CPURegsTable[RegNo]));
return MCDisassembler::Success;
}
Inst.addOperand(MCOperand::createReg(C0RegsTable[RegNo]));
return MCDisassembler::Success;
}
//@DecodeMem {
static DecodeStatus DecodeMem(MCInst &Inst,
unsigned Insn,
uint64_t Address,
const void *Decoder) {
//@DecodeMem body {
int Offset = SignExtend32<16>(Insn & 0xffff);
int Reg = (int)fieldFromInstruction(Insn, 20, 4);
int Base = (int)fieldFromInstruction(Insn, 16, 4);
Inst.addOperand(MCOperand::createReg(CPURegsTable[Reg]));
Inst.addOperand(MCOperand::createReg(CPURegsTable[Base]));
Inst.addOperand(MCOperand::createImm(Offset));
return MCDisassembler::Success;
}
/* CBranch instruction define $ra and then imm24; The printOperand() print
operand 1 (operand 0 is $ra and operand 1 is imm24), so we Create register
operand first and create imm24 next, as follows,
// Cpu0InstrInfo.td
class CBranch<bits<8> op, string instr_asm, RegisterClass RC,
list<Register> UseRegs>:
FJ<op, (outs), (ins RC:$ra, brtarget:$addr),
!strconcat(instr_asm, "\t$addr"),
[(brcond RC:$ra, bb:$addr)], IIBranch> {
// Cpu0AsmWriter.inc
void Cpu0InstPrinter::printInstruction(const MCInst *MI, raw_ostream &O) {
...
case 3:
// CMP, JEQ, JGE, JGT, JLE, JLT, JNE
printOperand(MI, 1, O);
break;
*/
static DecodeStatus DecodeBranch24Target(MCInst &Inst,
unsigned Insn,
uint64_t Address,
const void *Decoder) {
int BranchOffset = fieldFromInstruction(Insn, 0, 24);
// Bit 23 is the sign bit of the 24-bit offset.
if (BranchOffset > 0x7fffff)
BranchOffset = -1*(0x1000000 - BranchOffset);
Inst.addOperand(MCOperand::createReg(Cpu0::SW));
Inst.addOperand(MCOperand::createImm(BranchOffset));
return MCDisassembler::Success;
As the code above shows, the Disassembler directory handles the reverse translation from obj to assembly. So we add
Disassembler/Cpu0Disassembler.cpp, modify CMakeLists.txt and LLVMBuild.txt to build the Disassembler directory,
and enable generation of the disassembler table with “has_disassembler = 1”. Most of the work is handled by the tables defined
in the *.td files. Not every instruction in *.td can be disassembled without trouble, even though it can be translated
into assembly and obj successfully. For those that cannot, LLVM supplies the “let DecoderMethod”
keyword to let programmers implement their own decode functions. For example, in Cpu0 we define the functions
DecodeBranch24Target(), DecodeJumpTarget() and DecodeJumpFR() in Cpu0Disassembler.cpp and tell llvm-tblgen about them
by writing “let DecoderMethod = ...” in the corresponding instruction definitions or ISD nodes of Cpu0InstrInfo.td.
LLVM calls these decoder methods when the user runs a disassembler tool such as llvm-objdump -d .
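The custom decoder methods mostly extract bit fields and sign-extend them. The following minimal sketch shows both steps outside of LLVM, using the field positions of DecodeMem() above; the names field, signExtend24, MemOperand and decodeMem are our own for this illustration and are not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Extract numBits bits starting at startBit (bit 0 is least significant).
inline uint32_t field(uint32_t insn, unsigned startBit, unsigned numBits) {
  assert(numBits > 0 && numBits < 32 && startBit + numBits <= 32);
  return (insn >> startBit) & ((1u << numBits) - 1u);
}

// Interpret the low 24 bits as a two's-complement branch offset.
inline int32_t signExtend24(uint32_t value) {
  value &= 0x00ffffffu;
  return (value & 0x00800000u) ? static_cast<int32_t>(value) - 0x01000000
                               : static_cast<int32_t>(value);
}

struct MemOperand {
  unsigned reg;    // bits 23..20
  unsigned base;   // bits 19..16
  int32_t offset;  // bits 15..0, sign-extended
};

inline MemOperand decodeMem(uint32_t insn) {
  MemOperand m;
  m.reg = field(insn, 20, 4);
  m.base = field(insn, 16, 4);
  int32_t low = static_cast<int32_t>(insn & 0xffffu);
  m.offset = (low & 0x8000) ? low - 0x10000 : low;
  return m;
}
```

For example, decodeMem(0x012dfffc) yields register field 2, base field 13 and offset -4, and signExtend24(0xffffff) yields -1.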
Finally, cpu032II includes the whole cpu032I instruction set and adds some instructions. When llvm-objdump -d is
invoked, the function selectCpu0ArchFeature() shown below is called through createCpu0MCSubtargetInfo().
llvm-objdump cannot set the cpu option the way llc does with llc -mcpu=cpu032I , so the variable CPU in
selectCpu0ArchFeature() is empty when invoked by llvm-objdump -d . Setting Cpu0ArchFeature to “+cpu032II”
then lets it disassemble all instructions.
lbdex/chapters/Chapter10_1/MCTargetDesc/Cpu0MCTargetDesc.cpp
/// Select the Cpu0 Architecture Feature for the given triple and cpu name.
/// The function will be called at command 'llvm-objdump -d' for Cpu0 elf input.
Now, running Chapter10_1/ with the command llvm-objdump -d ch8_1_1.cpu0.o gives the following result.
ELEVEN
ASSEMBLER
• AsmParser support
– Code structure explanation
– Code list and some detail functions explanation
• Inline assembly
ordinary assembly
inline assembly
In LLVM, the first is supported by the LLVM AsmParser, and the second by the inline assembly handler. With AsmParser and
inline assembly support in the Cpu0 backend, we can hand-code assembly language in a C/C++ file and translate it into
obj (ELF format).
This section lists all the AsmParser code for the Cpu0 backend with only a few explanations. Please refer to 1 for more
explanation of AsmParser.
Running Chapter10_1/ with ch11_1.cpp gives the following error message.
1 http://www.embecosm.com/appnotes/ean10/ean10-howto-llvmas-1.0.html
lbdex/input/ch11_1.cpp
Since we haven’t implemented the Cpu0 assembler yet, we get the error message above. Cpu0 can translate LLVM IR
into assembly and obj directly, but it cannot translate hand-coded assembly instructions into obj.
The AsmParser directory handles the translation from assembly to obj. The assembling Data Flow Diagram (DFD) is as follows,
[Figure: the assembling data flow diagram]
Given the example assembly instruction “add $v1, $v0, $at”, the LLVM AsmParser kernel calls the backend's
ParseInstruction() in Cpu0AsmParser.cpp when it recognises that the first token at the beginning of a line is an identifier.
ParseInstruction() parses one assembly instruction, creates the Operands and returns to the LLVM AsmParser. AsmParser then
calls the backend's MatchAndEmitInstruction() to set the Opcode and Operands of an MCInst. The encoder can then encode the binary
instruction from the MCInst with the information from Cpu0InstrInfo.td, which includes the binary values for the Opcode ID
and Operand IDs of the instruction.
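The parsing step described above can be pictured with a much smaller stand-alone parser. The following is only a sketch under simplifying assumptions: it accepts a mnemonic followed by comma-separated register operands, and the names ParsedInstruction and parseLine are hypothetical and do not exist in Cpu0AsmParser.cpp, which works on lexer tokens and Cpu0Operand objects instead.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified record of one parsed line.
struct ParsedInstruction {
  std::string mnemonic;
  std::vector<std::string> registers;  // register names without the '$'
};

// Split a line such as "add $v1, $v0, $at" into mnemonic and registers.
// Returns false on input that this sketch does not handle.
inline bool parseLine(const std::string &line, ParsedInstruction &out) {
  std::size_t i = 0;
  auto skipSpace = [&]() -> void {
    while (i < line.size() && std::isspace(static_cast<unsigned char>(line[i])))
      ++i;
  };
  auto readWord = [&]() -> std::string {
    std::size_t start = i;
    while (i < line.size() && std::isalnum(static_cast<unsigned char>(line[i])))
      ++i;
    return line.substr(start, i - start);
  };

  out = ParsedInstruction();
  skipSpace();
  out.mnemonic = readWord();  // the first identifier on the line
  if (out.mnemonic.empty())
    return false;

  skipSpace();
  while (i < line.size()) {
    if (line[i] != '$')
      return false;  // only register operands are handled here
    ++i;             // eat the dollar token
    std::string reg = readWord();
    if (reg.empty())
      return false;
    out.registers.push_back(reg);
    skipSpace();
    if (i < line.size()) {
      if (line[i] != ',')
        return false;
      ++i;  // eat the comma
      skipSpace();
      if (i == line.size())
        return false;  // a trailing comma has no operand after it
    }
  }
  return true;
}
```

Parsing “add $v1, $v0, $at” this way produces the mnemonic add and the registers v1, v0 and at, which is the information the matcher needs in the next step.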
The key functions and data structures of MatchAndEmitInstruction() and encodeInstruction() are listed below, with explanations in
comments that begin with ///.
llvm/cmake_debug_build/lib/Target/Cpu0/Cpu0GenAsmMatcher.inc
enum InstructionConversionKind {
Convert__Reg1_0__Reg1_1__Reg1_2,
Convert__Reg1_0__Reg1_1__Imm1_2,
...
CVT_NUM_SIGNATURES
};
struct MatchEntry {
uint16_t Mnemonic;
uint16_t Opcode;
uint8_t ConvertFn;
uint32_t RequiredFeatures;
uint8_t Classes[3];
StringRef getMnemonic() const {
return StringRef(MnemonicTable + Mnemonic + 1,
MnemonicTable[Mnemonic]);
}
};
unsigned Cpu0AsmParser::
MatchInstructionImpl(const OperandVector &Operands,
MCInst &Inst, uint64_t &ErrorInfo,
bool matchingInlineAsm, unsigned VariantID) {
...
// Find the appropriate table for this asm variant.
const MatchEntry *Start, *End;
switch (VariantID) {
default: llvm_unreachable("invalid variant!");
case 0: Start = std::begin(MatchTable0); End = std::end(MatchTable0); break;
}
// Search the table.
auto MnemonicRange = std::equal_range(Start, End, Mnemonic, LessOpcode());
...
for (const MatchEntry *it = MnemonicRange.first, *ie = MnemonicRange.second;
it != ie; ++it) {
...
// We have selected a definite instruction, convert the parsed
// operands into the appropriate MCInst.
unsigned OpIdx;
Inst.setOpcode(Opcode);
for (const uint8_t *p = Converter; *p; p+= 2) {
OpIdx = *(p + 1);
switch (*p) {
default: llvm_unreachable("invalid conversion entry!");
case CVT_Reg:
static_cast<Cpu0Operand&>(*Operands[OpIdx]).addRegOperands(Inst, 1);
break;
...
}
}
}
lbdex/chapters/Chapter11_1/AsmParser/Cpu0AsmParser.cpp
/// For "ADD , V1, AT, V0", ParseInstruction() set Operands[1].Reg.RegNum = V1,
/// Operands[2].Reg.RegNum = AT, ..., by Cpu0Operand::CreateReg(RegNo, S,
/// Parser.getTok().getLoc()) in calling ParseOperand().
/// So, after (*Operands[1..3]).addRegOperands(Inst, 1),
/// Inst.Opcode = ADD, Inst.Operand[0] = V1, Inst.Operand[1] = AT,
/// Inst.Operand[2] = V0.
class Cpu0Operand : public MCParsedAsmOperand {
...
void addRegOperands(MCInst &Inst, unsigned N) const {
assert(N == 1 && "Invalid number of operands!");
Inst.addOperand(MCOperand::createReg(getReg()));
}
...
unsigned getReg() const override {
assert((Kind == k_Register) && "Invalid access!");
return Reg.RegNum;
}
...
}
lbdex/chapters/Chapter11_1/MCTargetDesc/Cpu0MCCodeEmitter.cpp
void Cpu0MCCodeEmitter::
encodeInstruction(const MCInst &MI, raw_ostream &OS,
SmallVectorImpl<MCFixup> &Fixups,
const MCSubtargetInfo &STI) const
{
uint32_t Binary = getBinaryCodeForInstr(MI, Fixups, STI);
...
EmitInstruction(Binary, Size, OS);
}
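What getBinaryCodeForInstr() and EmitInstruction() achieve together can be sketched in a few lines. The field positions below are an assumption taken from DecodeMem() in the disassembler of the previous chapter and from the encodings 0fa00000 and 0daa0000 quoted in the ELF chapter; the function names encodeL and emitWord are our own and are not the generated code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Pack opcode, two 4-bit register fields and a 16-bit immediate:
// bits 31..24 opcode, 23..20 ra, 19..16 rb, 15..0 immediate.
inline uint32_t encodeL(unsigned op, unsigned ra, unsigned rb, uint16_t imm) {
  assert(op < 256 && ra < 16 && rb < 16);
  return (static_cast<uint32_t>(op) << 24) |
         (static_cast<uint32_t>(ra) << 20) |
         (static_cast<uint32_t>(rb) << 16) | imm;
}

// Append the word byte by byte: most significant byte first for big
// endian, least significant byte first for little endian.
inline void emitWord(uint32_t word, bool bigEndian, std::vector<uint8_t> &out) {
  for (unsigned i = 0; i < 4; ++i) {
    unsigned shift = bigEndian ? (3 - i) * 8 : i * 8;
    out.push_back(static_cast<uint8_t>(word >> shift));
  }
}
```

With these assumptions, encodeL(0x0f, 0xa, 0, 0) gives 0x0fa00000 and encodeL(0x0d, 0xa, 0xa, 0) gives 0x0daa0000, the two words seen earlier for the lui and ori instructions.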
llvm/cmake_debug_build/lib/Target/Cpu0/Cpu0GenMCCodeEmitter.inc
MatchTable0 includes all the possible combinations of opcode and operand types. Even if an assembly instruction
entered by the user passes the syntax check in Cpu0AsmParser, MatchAndEmitInstruction() can still fail. For example,
the instruction “asm("move $3, $2");” passes but “asm("move $3, $2, $1");” fails.
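The lookup behind this behaviour can be imitated with a tiny table. The following is a minimal sketch, not the generated matcher: the table kTable, its two entries, the opcode IDs and the names Entry, LessMnemonic and matchInstruction are all hypothetical, and operand classes are reduced to a count of register operands.

```cpp
#include <algorithm>
#include <cstring>
#include <iterator>
#include <string>

// Hypothetical stand-in for one row of MatchTable0.
struct Entry {
  const char *mnemonic;
  unsigned numRegs;  // number of register operands this form accepts
  unsigned opcode;   // illustrative ID only, not a real Cpu0 opcode
};

// Rows must be sorted by mnemonic for std::equal_range to work.
static const Entry kTable[] = {
    {"add", 3, 1},
    {"move", 2, 2},
};

// Orders table rows and looked-up mnemonics by string comparison.
struct LessMnemonic {
  bool operator()(const Entry &a, const Entry &b) const {
    return std::strcmp(a.mnemonic, b.mnemonic) < 0;
  }
  bool operator()(const Entry &e, const std::string &m) const {
    return std::strcmp(e.mnemonic, m.c_str()) < 0;
  }
  bool operator()(const std::string &m, const Entry &e) const {
    return std::strcmp(m.c_str(), e.mnemonic) < 0;
  }
};

// Returns the opcode ID of the matching row, or 0 when no row with this
// mnemonic accepts the given number of register operands.
inline unsigned matchInstruction(const std::string &mnemonic,
                                 unsigned numRegs) {
  auto range = std::equal_range(std::begin(kTable), std::end(kTable),
                                mnemonic, LessMnemonic());
  for (const Entry *it = range.first; it != range.second; ++it)
    if (it->numRegs == numRegs)
      return it->opcode;
  return 0;
}
```

Here matchInstruction("move", 2) finds a row while matchInstruction("move", 3) does not, which mirrors why a line can be syntactically valid and still fail to match.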
lbdex/chapters/Chapter11_1/AsmParser/Cpu0AsmParser.cpp
#include "Cpu0.h"
#if CH >= CH11_1
#include "MCTargetDesc/Cpu0MCExpr.h"
#include "MCTargetDesc/Cpu0MCTargetDesc.h"
#include "Cpu0RegisterInfo.h"
#include "llvm/ADT/APInt.h"
#include "llvm/ADT/StringSwitch.h"
#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCExpr.h"
#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCInstBuilder.h"
#include "llvm/MC/MCParser/MCAsmLexer.h"
#include "llvm/MC/MCParser/MCParsedAsmOperand.h"
#include "llvm/MC/MCParser/MCTargetAsmParser.h"
#include "llvm/MC/MCStreamer.h"
#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/MC/MCSymbol.h"
#include "llvm/MC/MCParser/MCAsmLexer.h"
#include "llvm/MC/MCParser/MCParsedAsmOperand.h"
#include "llvm/MC/MCValue.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/MathExtras.h"
#include "llvm/Support/TargetRegistry.h"
namespace {
class Cpu0AssemblerOptions {
public:
Cpu0AssemblerOptions():
reorder(true), macro(true) {
}
private:
bool reorder;
bool macro;
};
}
namespace {
class Cpu0AsmParser : public MCTargetAsmParser {
MCAsmParser &Parser;
Cpu0AssemblerOptions Options;
#define GET_ASSEMBLER_HEADER
#include "Cpu0GenAsmMatcher.inc"
bool parseDirectiveSet();
bool parseSetAtDirective();
bool parseSetNoAtDirective();
bool parseSetMacroDirective();
bool parseSetNoMacroDirective();
bool parseSetReorderDirective();
bool parseSetNoReorderDirective();
public:
Cpu0AsmParser(const MCSubtargetInfo &sti, MCAsmParser &parser,
const MCInstrInfo &MII, const MCTargetOptions &Options)
: MCTargetAsmParser(Options, sti), Parser(parser) {
// Initialize the set of available features.
setAvailableFeatures(ComputeAvailableFeatures(getSTI().getFeatureBits()));
}
};
}
namespace {
enum KindTy {
k_Immediate,
k_Memory,
k_Register,
k_Token
} Kind;
public:
Cpu0Operand(KindTy K) : MCParsedAsmOperand(), Kind(K) {}
struct Token {
const char *Data;
unsigned Length;
};
struct PhysRegOp {
unsigned RegNum; /// Register Number
};
struct ImmOp {
const MCExpr *Val;
};
struct MemOp {
unsigned Base;
const MCExpr *Off;
};
union {
struct Token Tok;
struct PhysRegOp Reg;
struct ImmOp Imm;
struct MemOp Mem;
};
public:
void addRegOperands(MCInst &Inst, unsigned N) const {
assert(N == 1 && "Invalid number of operands!");
Inst.addOperand(MCOperand::createReg(getReg()));
}
Inst.addOperand(MCOperand::createReg(getMemBase()));
/// getStartLoc - Get the location of the first token of this operand.
SMLoc getStartLoc() const override { return StartLoc; }
/// getEndLoc - Get the location of the last token of this operand.
SMLoc getEndLoc() const override { return EndLoc; }
case k_Immediate:
OS << "Imm<";
OS << *Imm.Val;
OS << ">";
break;
case k_Memory:
OS << "Mem<";
OS << Mem.Base;
OS << ", ";
OS << *Mem.Off;
OS << ">";
break;
case k_Register:
OS << "Register<" << Reg.RegNum << ">";
break;
case k_Token:
OS << Tok.Data;
break;
}
}
};
}
//@1 {
bool Cpu0AsmParser::needsExpansion(MCInst &Inst) {
switch(Inst.getOpcode()) {
case Cpu0::LoadImm32Reg:
case Cpu0::LoadAddr32Imm:
case Cpu0::LoadAddr32Reg:
return true;
default:
return false;
}
}
tmpInst.addOperand(MCOperand::createReg(RegOp.getReg()));
tmpInst.addOperand(MCOperand::createImm(ImmValue & 0xffff));
Instructions.push_back(tmpInst);
}
}
//@2 {
bool Cpu0AsmParser::MatchAndEmitInstruction(SMLoc IDLoc, unsigned &Opcode,
OperandVector &Operands,
MCStreamer &Out,
uint64_t &ErrorInfo,
bool MatchingInlineAsm) {
printCpu0Operands(Operands);
MCInst Inst;
unsigned MatchResult = MatchInstructionImpl(Operands, Inst, ErrorInfo,
MatchingInlineAsm);
switch (MatchResult) {
default: break;
case Match_Success: {
if (needsExpansion(Inst)) {
SmallVector<MCInst, 4> Instructions;
expandInstruction(Inst, IDLoc, Instructions);
for(unsigned i =0; i < Instructions.size(); i++){
Out.EmitInstruction(Instructions[i], getSTI());
}
} else {
Inst.setLoc(IDLoc);
Out.EmitInstruction(Inst, getSTI());
}
return false;
}
//@2 }
case Match_MissingFeature:
Error(IDLoc, "instruction requires a CPU feature not currently enabled");
return true;
case Match_InvalidOperand: {
SMLoc ErrorLoc = IDLoc;
if (ErrorInfo != ~0ULL) {
if (ErrorInfo >= Operands.size())
return Error(IDLoc, "too few operands for instruction");
int CC;
CC = StringSwitch<unsigned>(Name)
.Case("zero", Cpu0::ZERO)
.Case("at", Cpu0::AT)
.Case("v0", Cpu0::V0)
.Case("v1", Cpu0::V1)
.Case("a0", Cpu0::A0)
.Case("a1", Cpu0::A1)
.Case("t9", Cpu0::T9)
.Case("t0", Cpu0::T0)
.Case("t1", Cpu0::T1)
.Case("s0", Cpu0::S0)
.Case("s1", Cpu0::S1)
.Case("sw", Cpu0::SW)
.Case("gp", Cpu0::GP)
.Case("fp", Cpu0::FP)
.Case("sp", Cpu0::SP)
.Case("lr", Cpu0::LR)
.Case("pc", Cpu0::PC)
.Case("hi", Cpu0::HI)
.Case("lo", Cpu0::LO)
.Case("epc", Cpu0::EPC)
.Default(-1);
if (CC != -1)
return CC;
return -1;
}
if (Tok.is(AsmToken::Identifier)) {
std::string lowerCase = Tok.getString().lower();
RegNum = matchRegisterName(lowerCase);
} else if (Tok.is(AsmToken::Integer))
RegNum = matchRegisterByNumber(static_cast<unsigned>(Tok.getIntVal()),
Mnemonic.lower());
else
return RegNum; //error
return RegNum;
}
bool Cpu0AsmParser::
tryParseRegisterOperand(OperandVector &Operands,
StringRef Mnemonic){
SMLoc S = Parser.getTok().getLoc();
RegNo = tryParseRegister(Mnemonic);
if (RegNo == -1)
return true;
Operands.push_back(Cpu0Operand::CreateReg(RegNo, S,
Parser.getTok().getLoc()));
Parser.Lex(); // Eat register token.
return false;
}
switch (getLexer().getKind()) {
default:
Error(Parser.getTok().getLoc(), "unexpected token in operand");
return true;
case AsmToken::Dollar: {
// parse register
SMLoc S = Parser.getTok().getLoc();
Parser.Lex(); // Eat dollar token.
// parse register operand
if (!tryParseRegisterOperand(Operands, Mnemonic)) {
if (getLexer().is(AsmToken::LParen)) {
// check if it is indexed addressing operand
Operands.push_back(Cpu0Operand::CreateToken("(", S));
Parser.Lex(); // eat parenthesis
if (getLexer().isNot(AsmToken::Dollar))
return true;
if (!getLexer().is(AsmToken::RParen))
return true;
S = Parser.getTok().getLoc();
Operands.push_back(Cpu0Operand::CreateToken(")", S));
Parser.Lex();
}
return false;
}
Operands.push_back(Cpu0Operand::CreateImm(Res, S, E));
return false;
}
case AsmToken::Identifier:
case AsmToken::LParen:
case AsmToken::Minus:
case AsmToken::Plus:
case AsmToken::Integer:
case AsmToken::String: {
// quoted label names
const MCExpr *IdVal;
SMLoc S = Parser.getTok().getLoc();
if (getParser().parseExpression(IdVal))
return true;
SMLoc E = SMLoc::getFromPointer(Parser.getTok().getLoc().getPointer() - 1);
Operands.push_back(Cpu0Operand::CreateImm(IdVal, S, E));
return false;
}
case AsmToken::Percent: {
// it is a symbol reference or constant expression
const MCExpr *IdVal;
SMLoc S = Parser.getTok().getLoc(); // start location of the operand
if (parseRelocOperand(IdVal))
return true;
Operands.push_back(Cpu0Operand::CreateImm(IdVal, S, E));
return false;
} // case AsmToken::Percent
} // switch(getLexer().getKind())
return true;
}
.Case("got_lo", Cpu0MCExpr::CEK_GOT_LO16)
.Case("gottprel", Cpu0MCExpr::CEK_GOTTPREL)
.Case("gp_rel", Cpu0MCExpr::CEK_GPREL)
.Case("hi", Cpu0MCExpr::CEK_ABS_HI)
.Case("lo", Cpu0MCExpr::CEK_ABS_LO)
.Case("tlsgd", Cpu0MCExpr::CEK_TLSGD)
.Case("tlsldm", Cpu0MCExpr::CEK_TLSLDM)
.Case("tp_hi", Cpu0MCExpr::CEK_TP_HI)
.Case("tp_lo", Cpu0MCExpr::CEK_TP_LO)
.Default(Cpu0MCExpr::CEK_None);
assert(Kind != Cpu0MCExpr::CEK_None);
return Cpu0MCExpr::create(Kind, Expr, getContext());
}
if (getLexer().getKind() == AsmToken::LParen) {
while (1) {
Parser.Lex(); // eat '(' token
if (getLexer().getKind() == AsmToken::Percent) {
Parser.Lex(); // eat % token
const AsmToken &nextTok = Parser.getTok();
if (nextTok.isNot(AsmToken::Identifier))
return true;
Str += "(%";
Str += nextTok.getIdentifier();
Parser.Lex(); // eat identifier
if (getLexer().getKind() != AsmToken::LParen)
return true;
} else
break;
}
if (getParser().parseParenExpression(IdVal,EndLoc))
return true;
} else
return true; // parenthesis must follow reloc operand
StartLoc = Parser.getTok().getLoc();
RegNo = tryParseRegister("");
EndLoc = Parser.getTok().getLoc();
return (RegNo == (unsigned)-1);
}
SMLoc S;
switch(getLexer().getKind()) {
default:
return true;
case AsmToken::Integer:
case AsmToken::Minus:
case AsmToken::Plus:
return (getParser().parseExpression(Res));
case AsmToken::Percent:
return parseRelocOperand(Res);
case AsmToken::LParen:
return false; // it's probably assuming 0
}
return true;
}
if (parseMemOffset(IdVal))
return MatchOperand_ParseFail;
} else {
Error(Parser.getTok().getLoc(), "unexpected token in operand");
return MatchOperand_ParseFail;
}
if (!IdVal)
IdVal = MCConstantExpr::create(0, getContext());
bool Cpu0AsmParser::
parseMathOperation(StringRef Name, SMLoc NameLoc,
OperandVector &Operands) {
// split the format
size_t Start = Name.find('.'), Next = Name.rfind('.');
StringRef Format1 = Name.slice(Start, Next);
// and add the first format to the operands
Operands.push_back(Cpu0Operand::CreateToken(Format1, NameLoc));
// now for the second format
StringRef Format2 = Name.slice(Next, StringRef::npos);
Operands.push_back(Cpu0Operand::CreateToken(Format2, NameLoc));
if (getLexer().isNot(AsmToken::Comma)) {
}
Parser.Lex(); // Eat the comma.
if (getLexer().isNot(AsmToken::EndOfStatement)) {
SMLoc Loc = getLexer().getLoc();
Parser.eatToEndOfStatement();
return Error(Loc, "unexpected token in argument list");
}
bool Cpu0AsmParser::
ParseInstruction(ParseInstructionInfo &Info, StringRef Name, SMLoc NameLoc,
OperandVector &Operands) {
// Create the leading tokens for the mnemonic, split by '.' characters.
size_t Start = 0, Next = Name.find('.');
StringRef Mnemonic = Name.slice(Start, Next);
// Refer to the explanation in source code of function DecodeJumpFR(...) in
// Cpu0Disassembler.cpp
if (Mnemonic == "ret")
Mnemonic = "jr";
Operands.push_back(Cpu0Operand::CreateToken(Mnemonic, NameLoc));
while (getLexer().is(AsmToken::Comma) ) {
Parser.Lex(); // Eat the comma.
if (getLexer().isNot(AsmToken::EndOfStatement)) {
SMLoc Loc = getLexer().getLoc();
Parser.eatToEndOfStatement();
return Error(Loc, "unexpected token in argument list");
}
bool Cpu0AsmParser::parseSetReorderDirective() {
Parser.Lex();
// if this is not the end of the statement, report error
if (getLexer().isNot(AsmToken::EndOfStatement)) {
reportParseError("unexpected token in statement");
return false;
}
Options.setReorder();
Parser.Lex(); // Consume the EndOfStatement
return false;
}
bool Cpu0AsmParser::parseSetNoReorderDirective() {
Parser.Lex();
// if this is not the end of the statement, report error
if (getLexer().isNot(AsmToken::EndOfStatement)) {
reportParseError("unexpected token in statement");
return false;
}
Options.setNoreorder();
Parser.Lex(); // Consume the EndOfStatement
return false;
}
bool Cpu0AsmParser::parseSetMacroDirective() {
Parser.Lex();
// if this is not the end of the statement, report error
if (getLexer().isNot(AsmToken::EndOfStatement)) {
reportParseError("unexpected token in statement");
return false;
}
Options.setMacro();
Parser.Lex(); // Consume the EndOfStatement
return false;
}
bool Cpu0AsmParser::parseSetNoMacroDirective() {
Parser.Lex();
// if this is not the end of the statement, report error
if (getLexer().isNot(AsmToken::EndOfStatement)) {
if (Tok.getString() == "reorder") {
return parseSetReorderDirective();
} else if (Tok.getString() == "noreorder") {
return parseSetNoReorderDirective();
} else if (Tok.getString() == "macro") {
return parseSetMacroDirective();
} else if (Tok.getString() == "nomacro") {
return parseSetNoMacroDirective();
}
return true;
}
if (DirectiveID.getString() == ".ent") {
// ignore this directive for now
Parser.Lex();
return false;
}
if (DirectiveID.getString() == ".end") {
// ignore this directive for now
Parser.Lex();
return false;
}
if (DirectiveID.getString() == ".frame") {
// ignore this directive for now
Parser.eatToEndOfStatement();
return false;
}
if (DirectiveID.getString() == ".set") {
return parseDirectiveSet();
}
if (DirectiveID.getString() == ".fmask") {
// ignore this directive for now
Parser.eatToEndOfStatement();
return false;
}
if (DirectiveID.getString() == ".mask") {
// ignore this directive for now
Parser.eatToEndOfStatement();
return false;
}
if (DirectiveID.getString() == ".gpword") {
// ignore this directive for now
Parser.eatToEndOfStatement();
return false;
}
return true;
}
#define GET_REGISTER_MATCHER
#define GET_MATCHER_IMPLEMENTATION
#include "Cpu0GenAsmMatcher.inc"
lbdex/chapters/Chapter11_1/AsmParser/CMakeLists.txt
add_llvm_library(LLVMCpu0AsmParser
Cpu0AsmParser.cpp
)
lbdex/chapters/Chapter11_1/AsmParser/LLVMBuild.txt
[component_0]
type = Library
name = Cpu0AsmParser
parent = Cpu0
required_libraries = Cpu0Desc Cpu0Info MC MCParser Support
add_to_library_groups = Cpu0
Cpu0AsmParser.cpp contains about one thousand lines of code that parse the assembly language. It can be
understood with a little patience. To build the files in the AsmParser directory, modify CMakeLists.txt and
LLVMBuild.txt as follows,
lbdex/chapters/Chapter11_1/CMakeLists.txt
set(LLVM_TARGET_DEFINITIONS Cpu0Asm.td)
tablegen(LLVM Cpu0GenAsmMatcher.inc -gen-asm-matcher)
lbdex/chapters/Chapter11_1/LLVMBuild.txt
subdirectories =
AsmParser
has_asmparser = 1
lbdex/chapters/Chapter11_1/Cpu0Asm.td
//===-- Cpu0Asm.td - Describe the Cpu0 Target Machine ------*- tablegen -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
// This is the top level entry point for the Cpu0 target.
//===----------------------------------------------------------------------===//
//===----------------------------------------------------------------------===//
// Target-independent interfaces
//===----------------------------------------------------------------------===//
include "llvm/Target/Target.td"
//===----------------------------------------------------------------------===//
// Target-dependent interfaces
//===----------------------------------------------------------------------===//
include "Cpu0RegisterInfo.td"
include "Cpu0RegisterInfoGPROutForAsm.td"
include "Cpu0.td"
lbdex/chapters/Chapter11_1/Cpu0RegisterInfoGPROutForAsm.td
//===----------------------------------------------------------------------===//
// Register Classes
//===----------------------------------------------------------------------===//
lbdex/chapters/Chapter2/Cpu0RegisterInfo.td
lbdex/chapters/Chapter11_1/Cpu0.td
lbdex/chapters/Chapter11_1/Cpu0InstrFormats.td
lbdex/chapters/Chapter11_1/Cpu0InstrInfo.td
// Address operand
def mem : Operand<i32> {
...
let ParserMatchClass = Cpu0MemAsmOperand;
}
...
//===----------------------------------------------------------------------===//
// Pseudo Instruction definition
//===----------------------------------------------------------------------===//
Cpu0InstrInfo.td above declares let ParserMethod = “parseMemOperand”, and parseMemOperand() is implemented
in Cpu0AsmParser.cpp to handle the “mem” operand used by the Cpu0 instructions ld and st. For example,
in ld $2, 4($sp), the mem operand is 4($sp). Together with “let ParserMatchClass = Cpu0MemAsmOperand;”,
this makes LLVM call parseMemOperand() of Cpu0AsmParser.cpp when it meets the assembly mem operand 4($sp). With
the above “let” assignments, TableGen generates the following structure and functions in Cpu0GenAsmMatcher.inc.
cmake_debug_build/lib/Target/Cpu0/Cpu0GenAsmMatcher.inc
enum OperandMatchResultTy {
MatchOperand_Success, // operand matched successfully
MatchOperand_NoMatch, // operand did not match
MatchOperand_ParseFail // operand matched but had errors
};
OperandMatchResultTy MatchOperandParserImpl(
OperandVector &Operands,
StringRef Mnemonic);
OperandMatchResultTy tryCustomParseOperand(
OperandVector &Operands,
unsigned MCK);
...
Cpu0AsmParser::OperandMatchResultTy Cpu0AsmParser::
tryCustomParseOperand(OperandVector &Operands,
unsigned MCK) {
switch(MCK) {
case MCK_Mem:
return parseMemOperand(Operands);
default:
return MatchOperand_NoMatch;
}
return MatchOperand_NoMatch;
}
Cpu0AsmParser::OperandMatchResultTy Cpu0AsmParser::
MatchOperandParserImpl(OperandVector &Operands,
StringRef Mnemonic) {
...
}
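To make the mem operand more concrete, the following is a minimal stand-alone sketch of splitting text such as 4($sp) into an offset and a base register. It is not the code of Cpu0AsmParser.cpp: the names MemOperandText and parseMemOperandText are hypothetical, it works on a plain string instead of lexer tokens, and it only accepts a plain integer offset (the real parser also accepts expressions such as %lo(sym)). A missing offset is treated as 0, which mirrors the IdVal fallback to a zero constant in the listing.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical stand-in for the parsed "mem" operand. The real parser keeps
// an MCExpr offset and a register number; plain values are used here.
struct MemOperandText {
  long Offset;
  std::string Base;
};

// Parses "offset($reg)" or "($reg)". Returns true on success to keep the
// sketch simple (the parser methods in the listing return false on success).
bool parseMemOperandText(const std::string &Text, MemOperandText &Out) {
  const std::string::size_type LParen = Text.find('(');
  const std::string::size_type RParen = Text.find(')');
  // Exactly one closing parenthesis is expected, at the very end.
  if (LParen == std::string::npos || RParen != Text.size() - 1)
    return false;

  Out.Offset = 0; // a missing offset is treated as 0
  if (LParen > 0) {
    const std::string OffsetText = Text.substr(0, LParen);
    char *End = 0;
    Out.Offset = std::strtol(OffsetText.c_str(), &End, 0);
    if (End == OffsetText.c_str() || *End != '\0')
      return false; // the offset is not a plain integer
  }

  // Layout is '(' '$' name ')', so the name starts at LParen + 2 and at
  // least one name character must be present.
  if (Text.size() < LParen + 4 || Text[LParen + 1] != '$')
    return false;
  Out.Base = Text.substr(LParen + 2, Text.size() - LParen - 3);
  return true;
}
```

Called with "4($sp)", the sketch yields offset 4 and base sp; "($fp)" yields offset 0 and base fp.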
The three pseudo instruction definitions in Cpu0InstrInfo.td above, such as LoadImm32Reg, are handled by
Cpu0AsmParser.cpp as follows,
lbdex/chapters/Chapter11_1/AsmParser/Cpu0AsmParser.cpp
switch(Inst.getOpcode()) {
case Cpu0::LoadImm32Reg:
case Cpu0::LoadAddr32Imm:
case Cpu0::LoadAddr32Reg:
return true;
default:
return false;
}
...
}
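The expansion of a 32-bit immediate load can be illustrated with a small stand-alone sketch. The listing only shows the low half being taken with ImmValue & 0xffff; pairing it with a lui for the high half followed by an ori is an assumption here, and the names ImmHalves, splitImm32 and applyLuiOri are hypothetical.

```cpp
#include <cstdint>

// The two 16-bit fields that a lui/ori pair would carry (assumed expansion).
struct ImmHalves {
  uint32_t Hi16;
  uint32_t Lo16;
};

// Split a 32-bit immediate into its upper and lower 16 bits.
ImmHalves splitImm32(uint32_t Imm) {
  ImmHalves H;
  H.Hi16 = (Imm >> 16) & 0xffffu;
  H.Lo16 = Imm & 0xffffu; // same mask as in the listing
  return H;
}

// Model what the register holds after "lui reg, Hi16" then "ori reg, reg, Lo16".
uint32_t applyLuiOri(ImmHalves H) {
  uint32_t Reg = H.Hi16 << 16; // lui: place Hi16 in the upper half, clear the lower
  Reg |= H.Lo16;               // ori: OR in the zero-extended lower half
  return Reg;
}
```

For the value 0xFFFF0005 (that is, -65531 as a signed 32-bit number) the halves are 65535 and 5, and recombining them gives back the original value.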
Finally, remember that the registers in CPURegs below must follow the order of their register numbers, because
AsmParser uses them when encoding register numbers.
lbdex/chapters/Chapter11_1/Cpu0RegisterInfo.td
//===----------------------------------------------------------------------===//
// The register string, such as "9" or "gp" will show on "llvm-objdump -d"
//@ All registers definition
let Namespace = "Cpu0" in {
//@ General Purpose Registers
def ZERO : Cpu0GPRReg<0, "zero">, DwarfRegNum<[0]>;
def AT : Cpu0GPRReg<1, "1">, DwarfRegNum<[1]>;
def V0 : Cpu0GPRReg<2, "2">, DwarfRegNum<[2]>;
//===----------------------------------------------------------------------===//
//@Register Classes
//===----------------------------------------------------------------------===//
The instructions cmp and jeq are printed with an explicit $sw in both assembly and disassembly. You can change the
code in AsmParser and the Disassembler (the last chapter) to hide the $sw in these instructions (for example, “jeq 20”
rather than “jeq $sw, 20”).
Both AsmParser and Cpu0AsmParser inherit from MCAsmParser as follows,
src/lib/MC/MCParser/AsmParser.cpp
lbdex/input/ch11_2.cpp
// call i32 asm sideeffect "addu $0,$1,$2", "=r,r,r"(i32 %1, i32 %2) #1, !srcloc !1
__asm__ __volatile__("addu %0,%1,%2"
:"=r"(foo) // 5
:"r"(foo), "r"(bar)
);
return foo;
}
int inlineasm_longlong(void)
{
int a, b;
const long long bar = 0x0000000500000006;
int* p = (int*)&bar;
// int* q = (p+1); // Do not set q here.
// call i32 asm sideeffect "ld $0,$1", "=r,*m"(i32* %2) #2, !srcloc !2
__asm__ __volatile__("ld %0,%1"
:"=r"(a) // 0x700070007000700b
:"m"(*p)
);
int* q = (p+1); // Set q just before inline asm refer to avoid register clobbered.
// call i32 asm sideeffect "ld $0,$1", "=r,*m"(i32* %6) #2, !srcloc !3
__asm__ __volatile__("ld %0,%1"
:"=r"(b) // 11
:"m"(*q)
// Or use :"m"(*(p+1)) to avoid register clobbered.
);
return (a+b);
}
int inlineasm_constraint(void)
{
int foo = 10;
const int n_5 = -5;
const int n5 = 5;
const int n0 = 0;
const unsigned int un5 = 5;
const int n65536 = 0x10000;
const int n_65531 = -65531;
// call i32 asm sideeffect "addiu $0,$1,$2", "=r,r,I"(i32 %1, i32 15) #1, !srcloc !2
__asm__ __volatile__("addiu %0,%1,%2"
:"=r"(foo) // 15
:"r"(foo), "I"(n_5)
);
return foo;
}
return w;
}
int inlineasm_global()
{
int c, d;
__asm__ __volatile__("ld %0,%1"
:"=r"(c) // c=3
:"m"(g[2])
);
return d;
}
#ifdef TESTSOFTFLOATLIB
// test_float() will call soft float library
int inlineasm_float()
{
float a = 2.2;
float b = 3.3;
int d;
__asm__ __volatile__("addiu %0,%1,1"
:"=r"(d)
:"r"(c)
);
return d;
}
#endif
int test_inlineasm()
{
int a, b, c, d, e, f;
a = inlineasm_addu(); // 25
b = inlineasm_longlong(); // 11
c = inlineasm_constraint(); // 15
d = inlineasm_arg(1, 10); // -9
e = inlineasm_arg(6, 3); // 3
__asm__ __volatile__("addiu %0,%1,1"
:"=r"(f) // f=4
:"r"(e)
);
ch11_2.cpp is an inline assembly example. Clang supports inline assembly in the same way as gcc. Inline assembly
is used in C/C++ when a program needs to access the specific register or memory location allocated to a C/C++ variable. For
example, the variable foo of ch11_2.cpp may be allocated by the compiler to register $2, $3 or any other register.
Inline assembly fills the gap between the high-level language and assembly language. See reference 2 . Chapter11_2
supports inline assembly as follows,
lbdex/chapters/Chapter11_2/Cpu0AsmPrinter.h
lbdex/chapters/Chapter11_2/Cpu0AsmPrinter.cpp
2 http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html
return false;
case 'm': // decimal const int minus 1
if ((MO.getType()) != MachineOperand::MO_Immediate)
return true;
O << MO.getImm() - 1;
return false;
case 'z': {
// $0 if zero, regular printing otherwise
if (MO.getType() != MachineOperand::MO_Immediate)
return true;
int64_t Val = MO.getImm();
if (Val)
O << Val;
else
O << "$0";
return false;
}
}
}
return false;
}
if (MO.getTargetFlags())
closeP = true;
switch(MO.getTargetFlags()) {
case Cpu0II::MO_GPREL: O << "%gp_rel("; break;
case Cpu0II::MO_GOT_CALL: O << "%call16("; break;
case Cpu0II::MO_GOT: O << "%got("; break;
case Cpu0II::MO_ABS_HI: O << "%hi("; break;
case Cpu0II::MO_ABS_LO: O << "%lo("; break;
case Cpu0II::MO_GOT_HI16: O << "%got_hi16("; break;
case Cpu0II::MO_GOT_LO16: O << "%got_lo16("; break;
}
switch (MO.getType()) {
case MachineOperand::MO_Register:
O << '$'
<< StringRef(Cpu0InstPrinter::getRegisterName(MO.getReg())).lower();
break;
case MachineOperand::MO_Immediate:
O << MO.getImm();
break;
case MachineOperand::MO_MachineBasicBlock:
O << *MO.getMBB()->getSymbol();
return;
case MachineOperand::MO_GlobalAddress:
O << *getSymbol(MO.getGlobal());
break;
case MachineOperand::MO_BlockAddress: {
MCSymbol *BA = GetBlockAddressSymbol(MO.getBlockAddress());
O << BA->getName();
break;
}
case MachineOperand::MO_ExternalSymbol:
O << *GetExternalSymbolSymbol(MO.getSymbolName());
break;
case MachineOperand::MO_JumpTableIndex:
O << MAI->getPrivateGlobalPrefix() << "JTI" << getFunctionNumber()
<< '_' << MO.getIndex();
break;
case MachineOperand::MO_ConstantPoolIndex:
O << MAI->getPrivateGlobalPrefix() << "CPI"
<< getFunctionNumber() << "_" << MO.getIndex();
if (MO.getOffset())
O << "+" << MO.getOffset();
break;
default:
llvm_unreachable("<unknown operand type>");
}
lbdex/chapters/Chapter11_2/Cpu0InstrInfo.cpp
/// Return the number of bytes of code the specified instruction may be.
unsigned Cpu0InstrInfo::GetInstSizeInBytes(const MachineInstr &MI) const {
lbdex/chapters/Chapter11_2/Cpu0ISelDAGToDAG.h
lbdex/chapters/Chapter11_2/Cpu0ISelDAGToDAG.cpp
// inlineasm begin
bool Cpu0DAGToDAGISel::
SelectInlineAsmMemoryOperand(const SDValue &Op, unsigned ConstraintID,
std::vector<SDValue> &OutOps) {
// All memory constraints can at least accept raw pointers.
switch(ConstraintID) {
default:
llvm_unreachable("Unexpected asm memory constraint");
case InlineAsm::Constraint_m:
OutOps.push_back(Op);
return false;
}
return true;
}
// inlineasm end
lbdex/chapters/Chapter11_2/Cpu0ISelLowering.h
/// Examine constraint string and operand type and determine a weight value.
/// The operand object must already have been set up with the operand type.
ConstraintWeight getSingleConstraintMatchWeight(
AsmOperandInfo &info, const char *constraint) const override;
lbdex/chapters/Chapter11_2/Cpu0ISelLowering.cpp
//===----------------------------------------------------------------------===//
// Cpu0 Inline Assembly Support
//===----------------------------------------------------------------------===//
/// Examine constraint type and operand type and determine a weight value.
/// This object must already have been set up with the operand type
/// and the current alternative constraint selected.
TargetLowering::ConstraintWeight
Cpu0TargetLowering::getSingleConstraintMatchWeight(
AsmOperandInfo &info, const char *constraint) const {
ConstraintWeight weight = CW_Invalid;
Value *CallOperandVal = info.CallOperandVal;
// If we don't have a value, we can't do a match,
// but allow it at the lowest weight.
if (!CallOperandVal)
return CW_Default;
Type *type = CallOperandVal->getType();
// Look at the constraint type.
switch (*constraint) {
default:
weight = TargetLowering::getSingleConstraintMatchWeight(info, constraint);
break;
case 'c': // $t9 for indirect jumps
if (type->isIntegerTy())
weight = CW_SpecificReg;
break;
case 'I': // signed 16 bit immediate
case 'J': // integer zero
case 'K': // unsigned 16 bit immediate
case 'L': // signed 32 bit immediate where lower 16 bits are 0
case 'N': // immediate in the range of -65535 to -1 (inclusive)
case 'O': // signed 15 bit immediate (+- 16383)
case 'P': // immediate in the range of 65535 to 1 (inclusive)
if (isa<ConstantInt>(CallOperandVal))
weight = CW_Constant;
break;
case 'R':
weight = CW_Memory;
break;
}
return weight;
}
/// This is a helper function to parse a physical register string and split it
/// into non-numeric and numeric parts (Prefix and Reg). The first boolean flag
/// that is returned indicates whether parsing was successful. The second flag
/// is true if the numeric part exists.
static std::pair<bool, bool>
parsePhysicalReg(const StringRef &C, std::string &Prefix,
unsigned long long &Reg) {
if (C.front() != '{' || C.back() != '}')
return std::make_pair(false, false);
Prefix.assign(B, I - B);
if (!R.first)
return std::make_pair(0U, nullptr);
if (!R.second)
return std::make_pair(0U, nullptr);
// Parse $0-$15.
assert(Prefix == "$");
/// Given a register class constraint, like 'r', if this corresponds directly
/// to an LLVM register class, return a register of 0 and the register class
/// pointer.
std::pair<unsigned, const TargetRegisterClass *>
Cpu0TargetLowering::getRegForInlineAsmConstraint(const TargetRegisterInfo *TRI,
StringRef Constraint,
MVT VT) const
{
if (Constraint.size() == 1) {
switch (Constraint[0]) {
case 'r':
if (VT == MVT::i32 || VT == MVT::i16 || VT == MVT::i8) {
return std::make_pair(0U, &Cpu0::CPURegsRegClass);
}
if (VT == MVT::i64)
return std::make_pair(0U, &Cpu0::CPURegsRegClass);
// This will generate an error message
return std::make_pair(0u, static_cast<const TargetRegisterClass*>(0));
case 'c': // register suitable for indirect jump
if (VT == MVT::i32)
return std::make_pair((unsigned)Cpu0::T9, &Cpu0::CPURegsRegClass);
assert(false && "Unexpected type."); // a bare string literal is always true
}
}
if (R.second)
return R;
if (Result.getNode()) {
Ops.push_back(Result);
return;
}
switch (AM.Scale) {
case 0: // "r+i" or just "i", depending on HasBaseReg.
break;
case 1:
if (!AM.HasBaseReg) // allow "r+i".
break;
return false; // disallow "r+r" or "r+r+i".
default:
return false;
}
return true;
}
As with the backend structure, the structure of inline assembly can be divided by file name, as shown in Table: the structure of
inline assembly.
Except for Cpu0ISelDAGToDAG.cpp, the functions are the same as in the backend’s compile code. The inline asm code of
Cpu0ISelLowering.cpp is explained after the result of running ch11_2.cpp. Cpu0ISelDAGToDAG.cpp
just saves the OP code in SelectInlineAsmMemoryOperand(). Since the OP code is a Cpu0 inline assembly instruction,
no llvm IR DAG translation needed further, just save OP directly and return false to notify llvm system that Cpu0
.file "ch11_2.bc"
.text
.globl _Z14inlineasm_adduv
.align 2
.type _Z14inlineasm_adduv,@function
.ent _Z14inlineasm_adduv # @_Z14inlineasm_adduv
_Z14inlineasm_adduv:
.frame $fp,16,$lr
.mask 0x00001000,-4
.set noreorder
.set nomacro
# BB#0:
addiu $sp, $sp, -16
st $fp, 12($sp) # 4-byte Folded Spill
addu $fp, $sp, $zero
addiu $2, $zero, 10
st $2, 8($fp)
addiu $2, $zero, 15
st $2, 4($fp)
ld $3, 8($fp)
#APP
addu $2,$3,$2
#NO_APP
st $2, 8($fp)
addu $sp, $fp, $zero
ld $fp, 12($sp) # 4-byte Folded Reload
addiu $sp, $sp, 16
ret $lr
nop
.set macro
.set reorder
.end _Z14inlineasm_adduv
$tmp3:
.size _Z14inlineasm_adduv, ($tmp3)-_Z14inlineasm_adduv
.globl _Z18inlineasm_longlongv
.align 2
.type _Z18inlineasm_longlongv,@function
.ent _Z18inlineasm_longlongv # @_Z18inlineasm_longlongv
_Z18inlineasm_longlongv:
.frame $fp,32,$lr
.mask 0x00001000,-4
.set noreorder
.set nomacro
# BB#0:
addiu $sp, $sp, -32
st $fp, 28($sp) # 4-byte Folded Spill
addu $fp, $sp, $zero
addiu $2, $zero, 6
st $2, 12($fp)
addiu $2, $zero, 5
st $2, 8($fp)
addiu $2, $fp, 8
st $2, 4($fp)
#APP
ld $2,0($2)
#NO_APP
st $2, 24($fp)
ld $2, 4($fp)
addiu $2, $2, 4
st $2, 0($fp)
#APP
ld $2,0($2)
#NO_APP
st $2, 20($fp)
ld $3, 24($fp)
addu $2, $3, $2
addu $sp, $fp, $zero
ld $fp, 28($sp) # 4-byte Folded Reload
addiu $sp, $sp, 32
ret $lr
.set macro
.set reorder
.end _Z18inlineasm_longlongv
$tmp7:
.size _Z18inlineasm_longlongv, ($tmp7)-_Z18inlineasm_longlongv
.globl _Z20inlineasm_constraintv
.align 2
.type _Z20inlineasm_constraintv,@function
.ent _Z20inlineasm_constraintv # @_Z20inlineasm_constraintv
_Z20inlineasm_constraintv:
.frame $fp,32,$lr
.mask 0x00001000,-4
.set noreorder
.set nomacro
# BB#0:
addiu $sp, $sp, -32
st $fp, 28($sp) # 4-byte Folded Spill
addu $fp, $sp, $zero
addiu $2, $zero, 10
st $2, 24($fp)
addiu $2, $zero, -5
st $2, 20($fp)
addiu $2, $zero, 5
st $2, 16($fp)
addiu $3, $zero, 0
st $3, 12($fp)
st $2, 8($fp)
lui $2, 1
st $2, 4($fp)
lui $2, 65535
ori $2, $2, 5
st $2, 0($fp)
ld $2, 24($fp)
#APP
addiu $2,$2,-5
#NO_APP
st $2, 24($fp)
#APP
addiu $2,$2,0
#NO_APP
st $2, 24($fp)
#APP
addiu $2,$2,5
#NO_APP
st $2, 24($fp)
#APP
ori $2,$2,65536
#NO_APP
st $2, 24($fp)
#APP
addiu $2,$2,-65531
#NO_APP
st $2, 24($fp)
#APP
addiu $2,$2,-5
#NO_APP
st $2, 24($fp)
#APP
addiu $2,$2,5
#NO_APP
st $2, 24($fp)
addu $sp, $fp, $zero
ld $fp, 28($sp) # 4-byte Folded Reload
addiu $sp, $sp, 32
ret $lr
nop
.set macro
.set reorder
.end _Z20inlineasm_constraintv
$tmp11:
.size _Z20inlineasm_constraintv, ($tmp11)-_Z20inlineasm_constraintv
.globl _Z13inlineasm_argii
.align 2
.type _Z13inlineasm_argii,@function
.ent _Z13inlineasm_argii # @_Z13inlineasm_argii
_Z13inlineasm_argii:
.frame $fp,16,$lr
.mask 0x00001000,-4
.set noreorder
.set nomacro
# BB#0:
addiu $sp, $sp, -16
st $fp, 12($sp) # 4-byte Folded Spill
addu $fp, $sp, $zero
ld $2, 16($fp)
st $2, 8($fp)
ld $2, 20($fp)
st $2, 4($fp)
ld $3, 8($fp)
#APP
subu $2,$3,$2
#NO_APP
st $2, 0($fp)
addu $sp, $fp, $zero
ld $fp, 12($sp) # 4-byte Folded Reload
addiu $sp, $sp, 16
ret $lr
nop
.set macro
.set reorder
.end _Z13inlineasm_argii
$tmp15:
.size _Z13inlineasm_argii, ($tmp15)-_Z13inlineasm_argii
.globl _Z16inlineasm_globalv
.align 2
.type _Z16inlineasm_globalv,@function
.ent _Z16inlineasm_globalv # @_Z16inlineasm_globalv
_Z16inlineasm_globalv:
.frame $fp,16,$lr
.mask 0x00001000,-4
.set noreorder
.set nomacro
# BB#0:
addiu $sp, $sp, -16
st $fp, 12($sp) # 4-byte Folded Spill
addu $fp, $sp, $zero
lui $2, %hi(g)
ori $2, $2, %lo(g)
addiu $2, $2, 8
#APP
ld $2,0($2)
#NO_APP
st $2, 8($fp)
#APP
addiu $2,$2,1
#NO_APP
st $2, 4($fp)
addu $sp, $fp, $zero
ld $fp, 12($sp) # 4-byte Folded Reload
addiu $sp, $sp, 16
ret $lr
nop
.set macro
.set reorder
.end _Z16inlineasm_globalv
$tmp19:
.size _Z16inlineasm_globalv, ($tmp19)-_Z16inlineasm_globalv
.globl _Z14test_inlineasmv
.align 2
.type _Z14test_inlineasmv,@function
.ent _Z14test_inlineasmv # @_Z14test_inlineasmv
_Z14test_inlineasmv:
.frame $fp,48,$lr
.mask 0x00005000,-4
.set noreorder
.set nomacro
# BB#0:
addiu $sp, $sp, -48
st $lr, 44($sp) # 4-byte Folded Spill
st $fp, 40($sp) # 4-byte Folded Spill
addu $fp, $sp, $zero
jsub _Z14inlineasm_adduv
nop
st $2, 36($fp)
jsub _Z18inlineasm_longlongv
nop
st $2, 32($fp)
jsub _Z20inlineasm_constraintv
nop
st $2, 28($fp)
addiu $2, $zero, 10
st $2, 4($sp)
addiu $2, $zero, 1
st $2, 0($sp)
jsub _Z13inlineasm_argii
nop
st $2, 24($fp)
addiu $2, $zero, 3
st $2, 4($sp)
addiu $2, $zero, 6
st $2, 0($sp)
jsub _Z13inlineasm_argii
nop
st $2, 20($fp)
#APP
addiu $2,$2,1
#NO_APP
st $2, 16($fp)
jsub _Z16inlineasm_globalv
nop
st $2, 12($fp)
ld $3, 32($fp)
ld $4, 36($fp)
addu $3, $4, $3
ld $4, 28($fp)
addu $3, $3, $4
ld $4, 24($fp)
addu $3, $3, $4
ld $4, 20($fp)
addu $3, $3, $4
ld $4, 16($fp)
addu $3, $3, $4
addu $2, $3, $2
addu $sp, $fp, $zero
ld $fp, 40($sp) # 4-byte Folded Reload
ld $lr, 44($sp) # 4-byte Folded Reload
addiu $sp, $sp, 48
ret $lr
nop
.set macro
.set reorder
.end _Z14test_inlineasmv
$tmp23:
.size _Z14test_inlineasmv, ($tmp23)-_Z14test_inlineasmv
.type g,@object # @g
.data
.globl g
.align 2
g:
.4byte 1 # 0x1
.4byte 2 # 0x2
.4byte 3 # 0x3
.size g, 12
Clang first translates gcc-style inline assembly __asm__ into LLVM IR Inline Assembler Expressions 3 , then the
variable registers in SSA form are replaced with physical registers during the llc register allocation stage. In the above example,
the functions LowerAsmOperandForConstraint() and getSingleConstraintMatchWeight() of Cpu0ISelLowering.cpp
create constant operands of different ranges for I, J, K, L, N, O, or P, and a register operand for r. For instance, the following
__asm__ will create the LLVM IR asm immediately after it.
%2 = call i32 asm sideeffect "addiu $0,$1,$2", "=r,r,I"(i32 %1, i32 -5) #0, !srcloc !1
%10 = call i32 asm sideeffect "addiu $0,$1,$2", "=r,r,N"(i32 %9, i32 -65531) #0, !srcloc !5
%14 = call i32 asm sideeffect "addiu $0,$1,$2", "=r,r,P"(i32 %13, i32 5) #0, !srcloc !7
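The visible difference between the __asm__ template in ch11_2.cpp ("addu %0,%1,%2") and the asm string in the LLVM IR ("addu $0,$1,$2") is the operand marker. The following stand-alone sketch shows only that renaming. It is not clang's implementation: toIRAsmTemplate is a hypothetical name, and a real front end also has to handle cases such as %%, operand modifiers, and escaping of literal $ characters, none of which is modelled here.

```cpp
#include <cctype>
#include <string>

// Rewrite gcc-style operand references (%0, %1, ...) into the $0, $1, ...
// form that appears in the LLVM IR asm string. Everything else is copied.
std::string toIRAsmTemplate(const std::string &GccTemplate) {
  std::string Out;
  for (std::string::size_type I = 0; I < GccTemplate.size(); ++I) {
    const char C = GccTemplate[I];
    // A '%' counts as an operand reference only when a digit follows it.
    const bool IsOperandRef =
        C == '%' && I + 1 < GccTemplate.size() &&
        std::isdigit(static_cast<unsigned char>(GccTemplate[I + 1]));
    Out += IsOperandRef ? '$' : C;
  }
  return Out;
}
```

Applied to "addu %0,%1,%2" the sketch returns "addu $0,$1,$2", which matches the asm string in the IR comments of ch11_2.cpp.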
The r in __asm__ generates a register, %1, in the LLVM IR asm, while I in __asm__ generates a constant operand, -5, in the
LLVM IR asm. Note that LowerAsmOperandForConstraint() limits positive and negative constant operand
values to 16 bits, since the immediate operand of an FL type Cpu0 instruction is 16 bits. So the range of N is -65535 to -1 and
the range of P is 65535 to 1. For any value outside the range, the code in LowerAsmOperandForConstraint() treats
it as an error because of the 16-bit limit of the FL instruction format.
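The ranges of the immediate constraints can be written down as a small stand-alone check. The sketch below follows the comments in getSingleConstraintMatchWeight() of Cpu0ISelLowering.cpp; it is not the code of LowerAsmOperandForConstraint(), and fitsImmConstraint is a hypothetical name. For O the bound of ±16383 is taken from the source comment; a 15-bit two's complement field would also admit -16384, so treat that bound as an assumption.

```cpp
#include <cstdint>

// Returns true when Val lies in the range documented for the constraint letter.
bool fitsImmConstraint(char Constraint, int64_t Val) {
  switch (Constraint) {
  case 'I': // signed 16 bit immediate
    return Val >= -32768 && Val <= 32767;
  case 'J': // integer zero
    return Val == 0;
  case 'K': // unsigned 16 bit immediate
    return Val >= 0 && Val <= 65535;
  case 'L': // signed 32 bit immediate where the lower 16 bits are 0
    return Val >= INT32_MIN && Val <= INT32_MAX && (Val & 0xffff) == 0;
  case 'N': // immediate in the range of -65535 to -1 (inclusive)
    return Val >= -65535 && Val <= -1;
  case 'O': // signed 15 bit immediate, bound taken from the comment (+- 16383)
    return Val >= -16383 && Val <= 16383;
  case 'P': // immediate in the range of 1 to 65535 (inclusive)
    return Val >= 1 && Val <= 65535;
  default:  // not an immediate constraint
    return false;
  }
}
```

With the constants of inlineasm_constraint(), -5 fits I, 0 fits J, 5 fits K and P, 0x10000 fits L, and -65531 fits N.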
3 http://llvm.org/docs/LangRef.html#inline-assembler-expressions
CHAPTER
TWELVE
C++ SUPPORT
• Exception handle
• Thread variable
• Atomic
Chapter11_2 can build and run the C++ polymorphism example code ch12_inherit.cpp as follows,
lbdex/input/ch12_inherit.cpp
#ifdef COUT_TEST
#include <iostream>
using namespace std;
#endif
ppoly1->set_values (4,5);
ppoly2->set_values (4,5);
ppoly3->set_values (4,5);
ppoly1->printarea();
ppoly2->printarea();
ppoly3->printarea();
if (ppoly1->area() == 20 && ppoly2->area() == 10 && ppoly3->area() == 5)
return 0;
return 0;
}
#endif
If cout is used instead of printf in ch12_inherit.cpp, no exception handler IRs are generated on Linux, whereas the
invoke, landingpad, resume and unreachable exception handler IRs are generated on iMac. The example code ch12_eh.cpp
below, which uses try and catch exception handling, generates these exception handler IRs on both iMac and
Linux.
lbdex/input/ch12_eh.cpp
if (a > b) {
throw ex1;
}
}
int test_try_catch() {
try {
throw_exception(2, 1);
}
catch(...) {
return 1;
}
return 0;
}
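For readers who want to try the control flow on a host compiler first, the following is a stand-alone variant of the same throw and catch pattern. It is a sketch and not the ch12_eh.cpp file itself: the empty class Ex1 is assumed from the IR type %class.Ex1 = type { i8 }, and try_catch_result is a hypothetical helper added so that both the normal path and the landing pad path can be exercised.

```cpp
// Empty exception class; assumed from "%class.Ex1 = type { i8 }" in the IR.
class Ex1 {};

// Throws only when a > b, as in the listing.
void throw_exception(int a, int b) {
  Ex1 ex1;
  if (a > b) {
    throw ex1;
  }
}

// Hypothetical helper: returns 1 when an exception was caught (the landing
// pad path in the IR) and 0 when the call returned normally.
int try_catch_result(int a, int b) {
  try {
    throw_exception(a, b);
  } catch (...) {
    return 1;
  }
  return 0;
}
```

Calling try_catch_result(2, 1) takes the catch path, like test_try_catch() in the listing, while try_catch_result(1, 2) returns through the normal path.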
; ModuleID = 'ch12_eh.bc'
source_filename = "ch12_eh.bc"
target datalayout = "E-m:m-p:32:32-i8:8:32-i16:16:32-i64:64-n32-S64"
target triple = "mips-unknown-linux-gnu"
%class.Ex1 = type { i8 }
; <label>:6: ; preds = %0
%7 = call i8* @__cxa_allocate_exception(i32 1) #1
; <label>:9: ; preds = %0
ret void
}
; <label>:5: ; preds = %0
br label %13
; <label>:6: ; preds = %0
%7 = landingpad { i8*, i32 }
catch i8* null
%8 = extractvalue { i8*, i32 } %7, 0
store i8* %8, i8** %2
%9 = extractvalue { i8*, i32 } %7, 1
store i32 %9, i32* %3
br label %10
; <label>:10: ; preds = %6
%11 = load i8*, i8** %2
%12 = call i8* @__cxa_begin_catch(i8* %11) #1
store i32 1, i32* %1
store i32 1, i32* %4
call void @__cxa_end_catch()
br label %14
; <label>:13: ; preds = %5
store i32 0, i32* %1
br label %14
!llvm.ident = !{!0}
For the LLVM IRs of exception handling, see reference 1 . Chapter12_1 supports the LLVM IRs corresponding to
the C++ try and catch exception keywords. It can compile ch12_eh.bc as follows,
lbdex/chapters/Chapter12_1/Cpu0ISelLowering.h
/// If a physical register, this returns the register that receives the
/// exception address on entry to an EH pad.
unsigned
getExceptionPointerRegister(const Constant *PersonalityFn) const override {
return Cpu0::A0;
}
/// If a physical register, this returns the register that receives the
/// exception typeid on entry to a landing pad.
unsigned
getExceptionSelectorRegister(const Constant *PersonalityFn) const override {
return Cpu0::A1;
}
.text
.section .mdebug.abiO32
.previous
.file "ch12_eh.bc"
.globl _Z15throw_exceptionii
.p2align 2
.type _Z15throw_exceptionii,@function
.ent _Z15throw_exceptionii # @_Z15throw_exceptionii
_Z15throw_exceptionii:
.cfi_startproc
.frame $fp,40,$lr
1 http://llvm.org/docs/ExceptionHandling.html
.mask 0x00005000,-4
.set noreorder
.set nomacro
# BB#0:
addiu $sp, $sp, -40
$tmp0:
.cfi_def_cfa_offset 40
st $lr, 36($sp) # 4-byte Folded Spill
st $fp, 32($sp) # 4-byte Folded Spill
$tmp1:
.cfi_offset 14, -4
$tmp2:
.cfi_offset 12, -8
move $fp, $sp
$tmp3:
.cfi_def_cfa_register 12
st $4, 28($fp)
st $5, 24($fp)
ld $2, 28($fp)
cmp $sw, $2, $5
jle $sw, $BB0_2
nop
jmp $BB0_1
$BB0_2:
move $sp, $fp
ld $fp, 32($sp) # 4-byte Folded Reload
ld $lr, 36($sp) # 4-byte Folded Reload
addiu $sp, $sp, 40
ret $lr
nop
$BB0_1:
addiu $4, $zero, 1
jsub __cxa_allocate_exception
nop
addiu $3, $zero, 0
st $3, 8($sp)
lui $3, %hi(_ZTI3Ex1)
ori $5, $3, %lo(_ZTI3Ex1)
addu $4, $zero, $2
jsub __cxa_throw
nop
.set macro
.set reorder
.end _Z15throw_exceptionii
$func_end0:
.size _Z15throw_exceptionii, ($func_end0)-_Z15throw_exceptionii
.cfi_endproc
.globl _Z14test_try_catchv
.p2align 2
.type _Z14test_try_catchv,@function
.ent _Z14test_try_catchv # @_Z14test_try_catchv
_Z14test_try_catchv:
$tmp7:
$func_begin0 = ($tmp7)
.cfi_startproc
.cfi_personality 0, __gxx_personality_v0
.cfi_lsda 0, $exception0
.frame $fp,40,$lr
.mask 0x00005200,-4
.set noreorder
.set nomacro
# BB#0:
addiu $sp, $sp, -40
$tmp8:
.cfi_def_cfa_offset 40
st $lr, 36($sp) # 4-byte Folded Spill
st $fp, 32($sp) # 4-byte Folded Spill
st $9, 28($sp) # 4-byte Folded Spill
$tmp9:
.cfi_offset 14, -4
$tmp10:
.cfi_offset 12, -8
$tmp11:
.cfi_offset 9, -12
move $fp, $sp
$tmp12:
.cfi_def_cfa_register 12
$tmp4:
addiu $4, $zero, 2
addiu $9, $zero, 1
addu $5, $zero, $9
jsub _Z15throw_exceptionii
nop
$tmp5:
# BB#2:
addiu $2, $zero, 0
st $2, 24($fp)
$BB1_3:
ld $2, 24($fp)
move $sp, $fp
ld $9, 28($sp) # 4-byte Folded Reload
ld $fp, 32($sp) # 4-byte Folded Reload
ld $lr, 36($sp) # 4-byte Folded Reload
addiu $sp, $sp, 40
ret $lr
nop
$BB1_1:
$tmp6:
st $4, 20($fp)
st $5, 16($fp)
ld $4, 20($fp)
jsub __cxa_begin_catch
nop
st $9, 24($fp)
st $9, 12($fp)
jsub __cxa_end_catch
nop
jmp $BB1_3
.set macro
.set reorder
.end _Z14test_try_catchv
$func_end1:
.size _Z14test_try_catchv, ($func_end1)-_Z14test_try_catchv
.cfi_endproc
.section .gcc_except_table,"a",@progbits
.p2align 2
GCC_except_table1:
$exception0:
.byte 255 # @LPStart Encoding = omit
.byte 0 # @TType Encoding = absptr
.asciz "\242\200\200" # @TType base offset
.byte 3 # Call site Encoding = udata4
.byte 26 # Call site table length
.4byte ($tmp4)-($func_begin0) # >> Call Site 1 <<
.4byte ($tmp5)-($tmp4) # Call between $tmp4 and $tmp5
.4byte ($tmp6)-($func_begin0) # jumps to $tmp6
.byte 1 # On action: 1
.4byte ($tmp5)-($func_begin0) # >> Call Site 2 <<
.4byte ($func_end1)-($tmp5) # Call between $tmp5 and $func_end1
.4byte 0 # has no landing pad
.byte 0 # On action: cleanup
.byte 1 # >> Action Record 1 <<
# Catch TypeInfo 1
.byte 0 # No further actions
# >> Catch TypeInfos <<
.4byte 0 # TypeInfo 1
.p2align 2
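The call-site table above holds two records of 13 bytes each (three udata4 fields plus a one-byte action), which matches the table length of 26. As a rough illustration of how such records can be searched, the following sketch looks up the landing pad for a given offset inside the function. The struct, the function name and the numeric offsets in the usage note are hypothetical; a real personality routine also decodes the encodings and the action records, which this sketch omits.

```cpp
#include <cstdint>
#include <vector>

// One call-site record, mirroring the four fields of each entry in the
// listing: region start, region length, landing pad (all relative to the
// function start) and an action index.
struct CallSite {
  std::uint32_t start;
  std::uint32_t length;
  std::uint32_t landingPad;  // 0 means "has no landing pad"
  std::uint8_t action;       // 0 means no catch action
};

// Hypothetical helper: return the landing-pad offset of the region that
// covers pcOffset, or 0 when no region covers it or the region has no
// landing pad.
std::uint32_t findLandingPad(const std::vector<CallSite> &table,
                             std::uint32_t pcOffset) {
  for (const CallSite &cs : table) {
    if (pcOffset >= cs.start && pcOffset < cs.start + cs.length)
      return cs.landingPad;
  }
  return 0;
}
```

With an invented table such as `{{0x30, 0x14, 0x70, 1}, {0x44, 0x5c, 0, 0}}`, an offset inside the first region maps to the landing pad at 0x70 (the role of $tmp6 above), while an offset inside the second region yields 0.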
lbdex/input/ch12_thread_var.cpp
__thread int a = 0;
thread_local int b = 0; // need option -std=c++11
int test_thread_var()
{
a = 2;
return a;
}
int test_thread_var_2()
{
b = 3;
return b;
}
While a global variable is a single instance shared by all threads in a process, a thread variable has a separate
instance for each thread in the process. Code running in the same thread shares that thread's instance, but different
threads each have their own thread variable with the same name 2 .
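The difference can be observed directly in C++11. The following is a minimal sketch, not part of lbdex; the variable and function names are hypothetical. A worker thread writes to its own copy of a thread_local variable and to an ordinary global, and the calling thread then checks what it sees.

```cpp
#include <thread>

// One instance of `counter` exists per thread, like `b` in
// ch12_thread_var.cpp; `shared` is an ordinary global with one instance.
thread_local int counter = 0;
int shared = 0;

// Hypothetical demo: returns the calling thread's view of `counter` after
// a worker thread has written to its own copy.
int callerViewAfterWorker() {
  counter = 1;               // the calling thread's instance
  std::thread worker([] {
    counter = 42;            // the worker's private instance
    shared = counter;        // the single global, visible after join()
  });
  worker.join();
  return counter;            // the worker never touched this copy
}
```

After the call, the function returns 1 while `shared` holds 42, because only the global is common to both threads.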
To support thread variables, tlsgd, tlsldm, dtp_hi, dtp_lo, gottp, tp_hi and tp_lo must be handled in both
evaluateRelocExpr() of Cpu0AsmParser.cpp and printImpl() of Cpu0MCExpr.cpp, and the following code is required.
Most of it handles and displays relocation records, since thread variables are created by the OS or by the language
library that supports multi-threaded programming.
lbdex/chapters/Chapter12_1/MCTargetDesc/Cpu0AsmBackend.cpp
{ "fixup_Cpu0_TLSGD", 0, 16, 0 },
{ "fixup_Cpu0_GOTTP", 0, 16, 0 },
{ "fixup_Cpu0_TP_HI", 0, 16, 0 },
{ "fixup_Cpu0_TP_LO", 0, 16, 0 },
{ "fixup_Cpu0_TLSLDM", 0, 16, 0 },
{ "fixup_Cpu0_DTP_HI", 0, 16, 0 },
{ "fixup_Cpu0_DTP_LO", 0, 16, 0 },
...
};
...
}
lbdex/chapters/Chapter12_1/MCTargetDesc/Cpu0BaseInfo.h
namespace Cpu0II {
/// Target Operand Flag enum.
enum TOF {
//===------------------------------------------------------------------===//
// Cpu0 Specific MachineOperand flags.
2 http://en.wikipedia.org/wiki/Thread-local_storage
/// MO_TLSGD - Represents the offset into the global offset table at which
// the module ID and TLS block offset reside during execution (General
// Dynamic TLS).
MO_TLSGD,
/// MO_TLSLDM - Represents the offset into the global offset table at which
// the module ID and TLS block offset reside during execution (Local
// Dynamic TLS).
MO_TLSLDM,
MO_DTP_HI,
MO_DTP_LO,
/// MO_GOTTPREL - Represents the offset from the thread pointer (Initial
// Exec TLS).
MO_GOTTPREL,
/// MO_TP_HI/LO - Represents the hi and low part of the offset from
// the thread pointer (Local Exec TLS).
MO_TP_HI,
MO_TP_LO,
...
};
...
}
lbdex/chapters/Chapter12_1/MCTargetDesc/Cpu0ELFObjectWriter.cpp
switch (Kind) {
case Cpu0::fixup_Cpu0_TLSGD:
Type = ELF::R_CPU0_TLS_GD;
break;
case Cpu0::fixup_Cpu0_GOTTPREL:
Type = ELF::R_CPU0_TLS_GOTTPREL;
break;
...
}
lbdex/chapters/Chapter12_1/MCTargetDesc/Cpu0FixupKinds.h
enum Fixups {
// resulting in - R_CPU0_TLS_GD.
fixup_Cpu0_TLSGD,
// resulting in - R_CPU0_TLS_GOTTPREL.
fixup_Cpu0_GOTTPREL,
// resulting in - R_CPU0_TLS_TPREL_HI16.
fixup_Cpu0_TP_HI,
// resulting in - R_CPU0_TLS_TPREL_LO16.
fixup_Cpu0_TP_LO,
// resulting in - R_CPU0_TLS_LDM.
fixup_Cpu0_TLSLDM,
// resulting in - R_CPU0_TLS_DTP_HI16.
fixup_Cpu0_DTP_HI,
// resulting in - R_CPU0_TLS_DTP_LO16.
fixup_Cpu0_DTP_LO,
...
};
lbdex/chapters/Chapter12_1/MCTargetDesc/Cpu0MCCodeEmitter.cpp
unsigned Cpu0MCCodeEmitter::
getExprOpValue(const MCExpr *Expr,SmallVectorImpl<MCFixup> &Fixups,
const MCSubtargetInfo &STI) const {
case Cpu0MCExpr::CEK_TLSGD:
FixupKind = Cpu0::fixup_Cpu0_TLSGD;
break;
case Cpu0MCExpr::CEK_TLSLDM:
FixupKind = Cpu0::fixup_Cpu0_TLSLDM;
break;
case Cpu0MCExpr::CEK_DTP_HI:
FixupKind = Cpu0::fixup_Cpu0_DTP_HI;
break;
case Cpu0MCExpr::CEK_DTP_LO:
FixupKind = Cpu0::fixup_Cpu0_DTP_LO;
break;
case Cpu0MCExpr::CEK_GOTTPREL:
FixupKind = Cpu0::fixup_Cpu0_GOTTPREL;
break;
case Cpu0MCExpr::CEK_TP_HI:
FixupKind = Cpu0::fixup_Cpu0_TP_HI;
break;
case Cpu0MCExpr::CEK_TP_LO:
FixupKind = Cpu0::fixup_Cpu0_TP_LO;
break;
...
}
lbdex/chapters/Chapter12_1/Cpu0InstrInfo.td
// TpHi and TpLo nodes are used to handle Local Exec TLS
def Cpu0TpHi : SDNode<"Cpu0ISD::TpHi", SDTIntUnaryOp>;
def Cpu0TpLo : SDNode<"Cpu0ISD::TpLo", SDTIntUnaryOp>;
lbdex/chapters/Chapter12_1/Cpu0ISelLowering.cpp
...
}
SDValue Cpu0TargetLowering::
LowerOperation(SDValue Op, SelectionDAG &DAG) const
{
switch (Op.getOpcode())
{
...
}
...
}
SDValue Cpu0TargetLowering::
lowerGlobalTLSAddress(SDValue Op, SelectionDAG &DAG) const
{
// If the relocation model is PIC, use the General Dynamic TLS Model or
// Local Dynamic TLS model, otherwise use the Initial Exec or
// Local Exec TLS Model.
SDLoc DL(GA);
const GlobalValue *GV = GA->getGlobal();
EVT PtrVT = getPointerTy(DAG.getDataLayout());
ArgListTy Args;
ArgListEntry Entry;
Entry.Node = Argument;
Entry.Ty = PtrTy;
Args.push_back(Entry);
TargetLowering::CallLoweringInfo CLI(DAG);
CLI.setDebugLoc(DL).setChain(DAG.getEntryNode())
.setCallee(CallingConv::C, PtrTy, TlsGetAddr, std::move(Args));
std::pair<SDValue, SDValue> CallResult = LowerCallTo(CLI);
if (model != TLSModel::LocalDynamic)
return Ret;
SDValue Offset;
if (model == TLSModel::InitialExec) {
lbdex/chapters/Chapter12_1/Cpu0ISelLowering.h
lbdex/chapters/Chapter12_1/Cpu0MCInstLower.cpp
switch(MO.getTargetFlags()) {
case Cpu0II::MO_TLSGD:
TargetKind = Cpu0MCExpr::CEK_TLSGD;
break;
case Cpu0II::MO_TLSLDM:
TargetKind = Cpu0MCExpr::CEK_TLSLDM;
break;
case Cpu0II::MO_DTP_HI:
TargetKind = Cpu0MCExpr::CEK_DTP_HI;
break;
case Cpu0II::MO_DTP_LO:
TargetKind = Cpu0MCExpr::CEK_DTP_LO;
break;
case Cpu0II::MO_GOTTPREL:
TargetKind = Cpu0MCExpr::CEK_GOTTPREL;
break;
case Cpu0II::MO_TP_HI:
TargetKind = Cpu0MCExpr::CEK_TP_HI;
break;
case Cpu0II::MO_TP_LO:
TargetKind = Cpu0MCExpr::CEK_TP_LO;
break;
...
}
...
}
; ModuleID = 'ch12_thread_var.bc'
source_filename = "ch12_thread_var.bc"
target datalayout = "E-m:m-p:32:32-i8:8:32-i16:16:32-i64:64-n32-S64"
target triple = "mips-unknown-linux-gnu"
!llvm.ident = !{!0}
.text
.section .mdebug.abiO32
.previous
.file "ch12_thread_var.bc"
.globl _Z15test_thread_varv
.p2align 2
.type _Z15test_thread_varv,@function
.ent _Z15test_thread_varv # @_Z15test_thread_varv
_Z15test_thread_varv:
.frame $fp,16,$lr
.mask 0x00005000,-4
.set noreorder
.cpload $t9
.set nomacro
# BB#0:
addiu $sp, $sp, -16
st $lr, 12($sp) # 4-byte Folded Spill
st $fp, 8($sp) # 4-byte Folded Spill
move $fp, $sp
.cprestore 8
ld $t9, %call16(__tls_get_addr)($gp)
ori $4, $gp, %tlsgd(a)
jalr $t9
nop
ld $gp, 8($fp)
addiu $3, $zero, 2
st $3, 0($2)
addu $2, $zero, $3
move $sp, $fp
ld $fp, 8($sp) # 4-byte Folded Reload
ld $lr, 12($sp) # 4-byte Folded Reload
addiu $sp, $sp, 16
ret $lr
nop
.set macro
.set reorder
.end _Z15test_thread_varv
$func_end0:
.size _Z15test_thread_varv, ($func_end0)-_Z15test_thread_varv
.globl _Z17test_thread_var_2v
.p2align 2
.type _Z17test_thread_var_2v,@function
.ent _Z17test_thread_var_2v # @_Z17test_thread_var_2v
_Z17test_thread_var_2v:
.cfi_startproc
.frame $fp,16,$lr
.mask 0x00005000,-4
.set noreorder
.cpload $t9
.set nomacro
# BB#0:
addiu $sp, $sp, -16
$tmp0:
.cfi_def_cfa_offset 16
st $lr, 12($sp) # 4-byte Folded Spill
st $fp, 8($sp) # 4-byte Folded Spill
$tmp1:
.cfi_offset 14, -4
$tmp2:
.cfi_offset 12, -8
move $fp, $sp
$tmp3:
.cfi_def_cfa_register 12
.cprestore 8
ld $t9, %call16(_ZTW1b)($gp)
jalr $t9
nop
ld $gp, 8($fp)
addiu $3, $zero, 3
st $3, 0($2)
ld $t9, %call16(_ZTW1b)($gp)
jalr $t9
nop
ld $gp, 8($fp)
ld $2, 0($2)
move $sp, $fp
ld $fp, 8($sp) # 4-byte Folded Reload
ld $lr, 12($sp) # 4-byte Folded Reload
addiu $sp, $sp, 16
ret $lr
nop
.set macro
.set reorder
.end _Z17test_thread_var_2v
$func_end1:
.size _Z17test_thread_var_2v, ($func_end1)-_Z17test_thread_var_2v
.cfi_endproc
.hidden _ZTW1b
.weak _ZTW1b
.p2align 2
.type _ZTW1b,@function
.ent _ZTW1b # @_ZTW1b
_ZTW1b:
.cfi_startproc
.frame $sp,16,$lr
.mask 0x00004000,-4
.set noreorder
.cpload $t9
.set nomacro
# BB#0:
addiu $sp, $sp, -16
$tmp4:
.cfi_def_cfa_offset 16
st $lr, 12($sp) # 4-byte Folded Spill
$tmp5:
.cfi_offset 14, -4
.cprestore 8
ld $t9, %call16(__tls_get_addr)($gp)
ori $4, $gp, %tlsgd(b)
jalr $t9
nop
ld $gp, 8($sp)
ld $lr, 12($sp) # 4-byte Folded Reload
addiu $sp, $sp, 16
ret $lr
nop
.set macro
.set reorder
.end _ZTW1b
$func_end2:
.size _ZTW1b, ($func_end2)-_ZTW1b
.cfi_endproc
.type a,@object # @a
.section .tbss,"awT",@nobits
.globl a
.p2align 2
a:
.4byte 0 # 0x0
.size a, 4
.type b,@object # @b
.globl b
.p2align 2
b:
.4byte 0 # 0x0
.size b, 4
In PIC mode, the __thread variable is accessed by calling the function __tls_get_addr with an argument that identifies
the thread variable (%tlsgd(a) in the listing above). The C++11 thread_local variable is accessed by calling the
function _ZTW1b, which in turn calls __tls_get_addr to get the address of the thread_local variable. In static mode,
the thread variable is accessed by machine instructions as follows,
.text
.section .mdebug.abiO32
.previous
.file "ch12_thread_var.bc"
.globl _Z15test_thread_varv
.p2align 2
.type _Z15test_thread_varv,@function
.ent _Z15test_thread_varv # @_Z15test_thread_varv
_Z15test_thread_varv:
.frame $fp,8,$lr
.mask 0x00001000,-4
.set noreorder
.set nomacro
# BB#0:
addiu $sp, $sp, -8
st $fp, 4($sp) # 4-byte Folded Spill
.globl _Z17test_thread_var_2v
.p2align 2
.type _Z17test_thread_var_2v,@function
.ent _Z17test_thread_var_2v # @_Z17test_thread_var_2v
_Z17test_thread_var_2v:
.cfi_startproc
.frame $fp,16,$lr
.mask 0x00005000,-4
.set noreorder
.set nomacro
# BB#0:
addiu $sp, $sp, -16
$tmp0:
.cfi_def_cfa_offset 16
st $lr, 12($sp) # 4-byte Folded Spill
st $fp, 8($sp) # 4-byte Folded Spill
$tmp1:
.cfi_offset 14, -4
$tmp2:
.cfi_offset 12, -8
move $fp, $sp
$tmp3:
.cfi_def_cfa_register 12
jsub _ZTW1b
nop
addiu $3, $zero, 3
st $3, 0($2)
jsub _ZTW1b
nop
ld $2, 0($2)
move $sp, $fp
ld $fp, 8($sp) # 4-byte Folded Reload
ld $lr, 12($sp) # 4-byte Folded Reload
addiu $sp, $sp, 16
ret $lr
nop
.set macro
.set reorder
.end _Z17test_thread_var_2v
$func_end1:
.size _Z17test_thread_var_2v, ($func_end1)-_Z17test_thread_var_2v
.cfi_endproc
.hidden _ZTW1b
.weak _ZTW1b
.p2align 2
.type _ZTW1b,@function
.ent _ZTW1b # @_ZTW1b
_ZTW1b:
.cfi_startproc
.frame $sp,0,$lr
.mask 0x00000000,0
.set noreorder
.set nomacro
# BB#0:
lui $2, %tp_hi(b)
ori $2, $2, %tp_lo(b)
ret $lr
nop
.set macro
.set reorder
.end _ZTW1b
$func_end2:
.size _ZTW1b, ($func_end2)-_ZTW1b
.cfi_endproc
.type a,@object # @a
.section .tbss,"awT",@nobits
.globl a
.p2align 2
a:
.4byte 0 # 0x0
.size a, 4
.type b,@object # @b
.globl b
.p2align 2
b:
.4byte 0 # 0x0
.size b, 4
While Mips uses the rdhwr instruction to access thread variables, as shown below, Cpu0 accesses thread variables
without inventing any new instruction. The thread variables are kept in a thread variable memory area that is
addressed through %tp_hi and %tp_lo; furthermore, this area of memory is protected by the kernel-mode program.
Thus a user-mode program cannot access this area, which leaves no room for a malicious program to exploit it.
JonathantekiiMac:input Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
Debug/bin/llc -march=mips -relocation-model=static -filetype=asm
ch12_thread_var.bc -o -
...
lui $1, %tprel_hi(a)
ori $1, $1, %tprel_lo(a)
.set push
.set mips32r2
rdhwr $3, $29
.set pop
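In the Cpu0 static-mode listing, _ZTW1b builds the address of b with a lui/ori pair on %tp_hi(b) and %tp_lo(b). The sketch below illustrates the arithmetic of such a pair, using the lui and ori semantics given in cpu0.v (uc16<<16 and Rb|uc16). The split of the offset into an upper and a lower 16-bit half is an assumption made for illustration; the exact value the linker computes for each relocation is not shown here, and all function names are hypothetical.

```cpp
#include <cstdint>

// Assumed split of a 32-bit offset into two 16-bit immediates.
std::uint16_t hi16(std::uint32_t v) {
  return static_cast<std::uint16_t>(v >> 16);
}
std::uint16_t lo16(std::uint32_t v) {
  return static_cast<std::uint16_t>(v & 0xFFFFu);
}

// lui: place the immediate in the upper half of the register.
std::uint32_t lui(std::uint16_t imm) {
  return static_cast<std::uint32_t>(imm) << 16;
}

// ori: OR the zero-extended immediate into the register.
std::uint32_t ori(std::uint32_t reg, std::uint16_t imm) {
  return reg | imm;
}

// The two-instruction sequence: lui followed by ori rebuilds the offset.
std::uint32_t materialize(std::uint32_t offset) {
  return ori(lui(hi16(offset)), lo16(offset));
}
```

Because ori zero-extends its immediate, no carry correction between the two halves is needed in this model, so `materialize(x)` returns `x` for any 32-bit value.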
In static mode, a thread variable is handled much like a global variable. In general, the two are translated the same
way in IR, DAG and machine code. They are listed in the following tables. You can check them with the debug option
enabled.
12.3 Atomic
Traditionally, C relies on APIs provided by the OS or a library to support multi-threaded programming, for example
the POSIX thread API on Unix/Linux, the MS Windows API, and so on. To synchronize threads and avoid race conditions,
each OS provides its own lock or semaphore functions to the programmer, but this solution is OS dependent.
Since C++11, programmers can use atomics and run the same code on every platform, because threads and atomics are
part of the C++ standard. Besides portability, the other important benefit is that the compiler can generate
high-performance code from the target's hardware instructions rather than counting on a lock() function only 3 4 5 .
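As a small illustration of the portable style, the following sketch increments a shared counter from several threads using only the C++11 standard library. The function name is hypothetical and the code is not part of lbdex. Clang translates the fetch_add call into an atomicrmw add in LLVM IR, which is one of the atomic IRs a backend has to support.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical demo: `threads` workers each add 1 to a shared counter
// `perThread` times. fetch_add is a single atomic read-modify-write, so no
// increment is lost and the source code takes no OS-specific lock.
int countWithAtomics(int threads, int perThread) {
  std::atomic<int> counter{0};
  std::vector<std::thread> pool;
  for (int t = 0; t < threads; ++t) {
    pool.emplace_back([&counter, perThread] {
      for (int i = 0; i < perThread; ++i)
        counter.fetch_add(1, std::memory_order_relaxed);
    });
  }
  for (std::thread &th : pool)
    th.join();
  return counter.load();
}
```

Calling `countWithAtomics(4, 10000)` returns 40000 on every run; with a plain int and no synchronization the result would be unpredictable.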
To support atomics in C++ and Java, LLVM provides the atomic IR described here 6 7 .
To support the LLVM atomic IR, the following code is added to Chapter12_1.
3 https://en.wikipedia.org/wiki/Memory_model_%28programming%29
4 http://stackoverflow.com/questions/6319146/c11-introduced-a-standardized-memory-model-what-does-it-mean-and-how-is-it-g
5 http://herbsutter.com/2013/02/11/atomic-weapons-the-c-memory-model-and-modern-hardware/
6 http://llvm.org/docs/Atomics.html
7 http://llvm.org/docs/LangRef.html#ordering
lbdex/chapters/Chapter12_1/Disassembler/Cpu0Disassembler.cpp
if(Inst.getOpcode() == Cpu0::SC){
Inst.addOperand(MCOperand::createReg(Reg));
}
...
}
lbdex/chapters/Chapter12_1/Cpu0InstrInfo.td
def : Cpu0InstAlias<"sync",
(SYNC 0), 1>;
lbdex/chapters/Chapter12_1/Cpu0ISelLowering.h
MachineBasicBlock *
EmitInstrWithCustomInserter(MachineInstr &MI,
MachineBasicBlock *MBB) const override;
lbdex/chapters/Chapter12_1/Cpu0ISelLowering.cpp
...
}
SDValue Cpu0TargetLowering::
LowerOperation(SDValue Op, SelectionDAG &DAG) const
{
switch (Op.getOpcode())
{
...
}
MachineBasicBlock *
Cpu0TargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
MachineBasicBlock *BB) const {
switch (MI.getOpcode()) {
default:
llvm_unreachable("Unexpected instr type to insert");
case Cpu0::ATOMIC_LOAD_ADD_I8:
return emitAtomicBinaryPartword(MI, BB, 1, Cpu0::ADDu);
case Cpu0::ATOMIC_LOAD_ADD_I16:
return emitAtomicBinaryPartword(MI, BB, 2, Cpu0::ADDu);
case Cpu0::ATOMIC_LOAD_ADD_I32:
return emitAtomicBinary(MI, BB, 4, Cpu0::ADDu);
case Cpu0::ATOMIC_LOAD_AND_I8:
return emitAtomicBinaryPartword(MI, BB, 1, Cpu0::AND);
case Cpu0::ATOMIC_LOAD_AND_I16:
return emitAtomicBinaryPartword(MI, BB, 2, Cpu0::AND);
case Cpu0::ATOMIC_LOAD_AND_I32:
return emitAtomicBinary(MI, BB, 4, Cpu0::AND);
case Cpu0::ATOMIC_LOAD_OR_I8:
return emitAtomicBinaryPartword(MI, BB, 1, Cpu0::OR);
case Cpu0::ATOMIC_LOAD_OR_I16:
return emitAtomicBinaryPartword(MI, BB, 2, Cpu0::OR);
case Cpu0::ATOMIC_LOAD_OR_I32:
return emitAtomicBinary(MI, BB, 4, Cpu0::OR);
case Cpu0::ATOMIC_LOAD_XOR_I8:
return emitAtomicBinaryPartword(MI, BB, 1, Cpu0::XOR);
case Cpu0::ATOMIC_LOAD_XOR_I16:
return emitAtomicBinaryPartword(MI, BB, 2, Cpu0::XOR);
case Cpu0::ATOMIC_LOAD_XOR_I32:
return emitAtomicBinary(MI, BB, 4, Cpu0::XOR);
case Cpu0::ATOMIC_LOAD_NAND_I8:
return emitAtomicBinaryPartword(MI, BB, 1, 0, true);
case Cpu0::ATOMIC_LOAD_NAND_I16:
return emitAtomicBinaryPartword(MI, BB, 2, 0, true);
case Cpu0::ATOMIC_LOAD_NAND_I32:
return emitAtomicBinary(MI, BB, 4, 0, true);
case Cpu0::ATOMIC_LOAD_SUB_I8:
return emitAtomicBinaryPartword(MI, BB, 1, Cpu0::SUBu);
case Cpu0::ATOMIC_LOAD_SUB_I16:
return emitAtomicBinaryPartword(MI, BB, 2, Cpu0::SUBu);
case Cpu0::ATOMIC_LOAD_SUB_I32:
return emitAtomicBinary(MI, BB, 4, Cpu0::SUBu);
case Cpu0::ATOMIC_SWAP_I8:
return emitAtomicBinaryPartword(MI, BB, 1, 0);
case Cpu0::ATOMIC_SWAP_I16:
return emitAtomicBinaryPartword(MI, BB, 2, 0);
case Cpu0::ATOMIC_SWAP_I32:
return emitAtomicBinary(MI, BB, 4, 0);
case Cpu0::ATOMIC_CMP_SWAP_I8:
return emitAtomicCmpSwapPartword(MI, BB, 1);
case Cpu0::ATOMIC_CMP_SWAP_I16:
return emitAtomicCmpSwapPartword(MI, BB, 2);
case Cpu0::ATOMIC_CMP_SWAP_I32:
return emitAtomicCmpSwap(MI, BB, 4);
}
LL = Cpu0::LL;
SC = Cpu0::SC;
AND = Cpu0::AND;
XOR = Cpu0::XOR;
ZERO = Cpu0::ZERO;
BEQ = Cpu0::BEQ;
// thisMBB:
// ...
// fallthrough --> loopMBB
BB->addSuccessor(loopMBB);
loopMBB->addSuccessor(loopMBB);
loopMBB->addSuccessor(exitMBB);
// loopMBB:
// ll oldval, 0(ptr)
// <binop> storeval, oldval, incr
// sc success, storeval, 0(ptr)
// beq success, $0, loopMBB
BB = loopMBB;
return exitMBB;
}
MachineBasicBlock *Cpu0TargetLowering::emitSignExtendToI32InReg(
MachineInstr &MI, MachineBasicBlock *BB, unsigned Size, unsigned DstReg,
unsigned SrcReg) const {
const TargetInstrInfo *TII = Subtarget.getInstrInfo();
DebugLoc DL = MI.getDebugLoc();
return BB;
}
MachineBasicBlock *Cpu0TargetLowering::emitAtomicBinaryPartword(
MachineInstr &MI, MachineBasicBlock *BB, unsigned Size, unsigned BinOpcode,
bool Nand) const {
assert((Size == 1 || Size == 2) &&
"Unsupported size for EmitAtomicBinaryPartial.");
MF->insert(It, loopMBB);
MF->insert(It, sinkMBB);
MF->insert(It, exitMBB);
BB->addSuccessor(loopMBB);
loopMBB->addSuccessor(loopMBB);
loopMBB->addSuccessor(sinkMBB);
sinkMBB->addSuccessor(exitMBB);
// thisMBB:
// addiu masklsb2,$0,-4 # 0xfffffffc
// and alignedaddr,ptr,masklsb2
// andi ptrlsb2,ptr,3
// sll shiftamt,ptrlsb2,3
// ori maskupper,$0,255 # 0xff
// sll mask,maskupper,shiftamt
// xor mask2,$0,mask
// xor mask3,$0,mask2
// sll incr2,incr,shiftamt
if (Subtarget.isLittle()) {
BuildMI(BB, DL, TII->get(Cpu0::SHL), ShiftAmt).addReg(PtrLSB2).addImm(3);
} else {
unsigned Off = RegInfo.createVirtualRegister(RC);
BuildMI(BB, DL, TII->get(Cpu0::XORi), Off)
.addReg(PtrLSB2).addImm((Size == 1) ? 3 : 2);
BuildMI(BB, DL, TII->get(Cpu0::SHL), ShiftAmt).addReg(Off).addImm(3);
}
BuildMI(BB, DL, TII->get(Cpu0::ORi), MaskUpper)
.addReg(Cpu0::ZERO).addImm(MaskImm);
BuildMI(BB, DL, TII->get(Cpu0::SHLV), Mask)
.addReg(MaskUpper).addReg(ShiftAmt);
BuildMI(BB, DL, TII->get(Cpu0::XOR), Mask2).addReg(Cpu0::ZERO).addReg(Mask);
BuildMI(BB, DL, TII->get(Cpu0::XOR), Mask3).addReg(Cpu0::ZERO).addReg(Mask2);
BuildMI(BB, DL, TII->get(Cpu0::SHLV), Incr2).addReg(Incr).addReg(ShiftAmt);
// atomic.load.binop
// loopMBB:
// ll oldval,0(alignedaddr)
// binop binopres,oldval,incr2
// and newval,binopres,mask
// and maskedoldval0,oldval,mask3
// or storeval,maskedoldval0,newval
// sc success,storeval,0(alignedaddr)
// beq success,$0,loopMBB
// atomic.swap
// loopMBB:
// ll oldval,0(alignedaddr)
// and newval,incr2,mask
// and maskedoldval0,oldval,mask3
// or storeval,maskedoldval0,newval
// sc success,storeval,0(alignedaddr)
// beq success,$0,loopMBB
BB = loopMBB;
unsigned LL = Cpu0::LL;
BuildMI(BB, DL, TII->get(LL), OldVal).addReg(AlignedAddr).addImm(0);
if (Nand) {
// and andres, oldval, incr2
// xor binopres, $0, andres
// xor binopres2, $0, binopres
// and newval, binopres, mask
BuildMI(BB, DL, TII->get(Cpu0::AND), AndRes).addReg(OldVal).addReg(Incr2);
BuildMI(BB, DL, TII->get(Cpu0::XOR), BinOpRes)
.addReg(Cpu0::ZERO).addReg(AndRes);
BuildMI(BB, DL, TII->get(Cpu0::XOR), BinOpRes2)
.addReg(Cpu0::ZERO).addReg(BinOpRes);
BuildMI(BB, DL, TII->get(Cpu0::AND), NewVal).addReg(BinOpRes).addReg(Mask);
} else if (BinOpcode) {
// <binop> binopres, oldval, incr2
// and newval, binopres, mask
BuildMI(BB, DL, TII->get(BinOpcode), BinOpRes).addReg(OldVal).addReg(Incr2);
BuildMI(BB, DL, TII->get(Cpu0::AND), NewVal).addReg(BinOpRes).addReg(Mask);
} else { // atomic.swap
// and newval, incr2, mask
BuildMI(BB, DL, TII->get(Cpu0::AND), NewVal).addReg(Incr2).addReg(Mask);
}
// sinkMBB:
// and maskedoldval1,oldval,mask
// srl srlres,maskedoldval1,shiftamt
// sign_extend dest,srlres
BB = sinkMBB;
return exitMBB;
}
LL = Cpu0::LL;
SC = Cpu0::SC;
ZERO = Cpu0::ZERO;
BNE = Cpu0::BNE;
BEQ = Cpu0::BEQ;
MachineFunction::iterator It = ++BB->getIterator();
MF->insert(It, loop1MBB);
MF->insert(It, loop2MBB);
MF->insert(It, exitMBB);
// thisMBB:
// ...
// fallthrough --> loop1MBB
BB->addSuccessor(loop1MBB);
loop1MBB->addSuccessor(exitMBB);
loop1MBB->addSuccessor(loop2MBB);
loop2MBB->addSuccessor(loop1MBB);
loop2MBB->addSuccessor(exitMBB);
// loop1MBB:
// ll dest, 0(ptr)
// bne dest, oldval, exitMBB
BB = loop1MBB;
BuildMI(BB, DL, TII->get(LL), Dest).addReg(Ptr).addImm(0);
BuildMI(BB, DL, TII->get(BNE))
.addReg(Dest).addReg(OldVal).addMBB(exitMBB);
// loop2MBB:
// sc success, newval, 0(ptr)
// beq success, $0, loop1MBB
BB = loop2MBB;
BuildMI(BB, DL, TII->get(SC), Success)
.addReg(NewVal).addReg(Ptr).addImm(0);
BuildMI(BB, DL, TII->get(BEQ))
.addReg(Success).addReg(ZERO).addMBB(loop1MBB);
return exitMBB;
}
MachineBasicBlock *
Cpu0TargetLowering::emitAtomicCmpSwapPartword(MachineInstr &MI,
MachineBasicBlock *BB,
unsigned Size) const {
assert((Size == 1 || Size == 2) &&
"Unsupported size for EmitAtomicCmpSwapPartial.");
MF->insert(It, loop1MBB);
MF->insert(It, loop2MBB);
MF->insert(It, sinkMBB);
MF->insert(It, exitMBB);
BB->addSuccessor(loop1MBB);
loop1MBB->addSuccessor(sinkMBB);
loop1MBB->addSuccessor(loop2MBB);
loop2MBB->addSuccessor(loop1MBB);
loop2MBB->addSuccessor(sinkMBB);
sinkMBB->addSuccessor(exitMBB);
// andi maskednewval,newval,255
// shl shiftednewval,maskednewval,shiftamt
int64_t MaskImm = (Size == 1) ? 255 : 65535;
BuildMI(BB, DL, TII->get(Cpu0::ADDiu), MaskLSB2)
.addReg(Cpu0::ZERO).addImm(-4);
BuildMI(BB, DL, TII->get(Cpu0::AND), AlignedAddr)
.addReg(Ptr).addReg(MaskLSB2);
BuildMI(BB, DL, TII->get(Cpu0::ANDi), PtrLSB2).addReg(Ptr).addImm(3);
if (Subtarget.isLittle()) {
BuildMI(BB, DL, TII->get(Cpu0::SHL), ShiftAmt).addReg(PtrLSB2).addImm(3);
} else {
unsigned Off = RegInfo.createVirtualRegister(RC);
BuildMI(BB, DL, TII->get(Cpu0::XORi), Off)
.addReg(PtrLSB2).addImm((Size == 1) ? 3 : 2);
BuildMI(BB, DL, TII->get(Cpu0::SHL), ShiftAmt).addReg(Off).addImm(3);
}
BuildMI(BB, DL, TII->get(Cpu0::ORi), MaskUpper)
.addReg(Cpu0::ZERO).addImm(MaskImm);
BuildMI(BB, DL, TII->get(Cpu0::SHLV), Mask)
.addReg(MaskUpper).addReg(ShiftAmt);
BuildMI(BB, DL, TII->get(Cpu0::XOR), Mask2).addReg(Cpu0::ZERO).addReg(Mask);
BuildMI(BB, DL, TII->get(Cpu0::XOR), Mask3).addReg(Cpu0::ZERO).addReg(Mask2);
BuildMI(BB, DL, TII->get(Cpu0::ANDi), MaskedCmpVal)
.addReg(CmpVal).addImm(MaskImm);
BuildMI(BB, DL, TII->get(Cpu0::SHLV), ShiftedCmpVal)
.addReg(MaskedCmpVal).addReg(ShiftAmt);
BuildMI(BB, DL, TII->get(Cpu0::ANDi), MaskedNewVal)
.addReg(NewVal).addImm(MaskImm);
BuildMI(BB, DL, TII->get(Cpu0::SHLV), ShiftedNewVal)
.addReg(MaskedNewVal).addReg(ShiftAmt);
// loop1MBB:
// ll oldval,0(alignedaddr)
// and maskedoldval0,oldval,mask
// bne maskedoldval0,shiftedcmpval,sinkMBB
BB = loop1MBB;
unsigned LL = Cpu0::LL;
BuildMI(BB, DL, TII->get(LL), OldVal).addReg(AlignedAddr).addImm(0);
BuildMI(BB, DL, TII->get(Cpu0::AND), MaskedOldVal0)
.addReg(OldVal).addReg(Mask);
BuildMI(BB, DL, TII->get(Cpu0::BNE))
.addReg(MaskedOldVal0).addReg(ShiftedCmpVal).addMBB(sinkMBB);
// loop2MBB:
// and maskedoldval1,oldval,mask3
// or storeval,maskedoldval1,shiftednewval
// sc success,storeval,0(alignedaddr)
// beq success,$0,loop1MBB
BB = loop2MBB;
BuildMI(BB, DL, TII->get(Cpu0::AND), MaskedOldVal1)
.addReg(OldVal).addReg(Mask3);
BuildMI(BB, DL, TII->get(Cpu0::OR), StoreVal)
.addReg(MaskedOldVal1).addReg(ShiftedNewVal);
unsigned SC = Cpu0::SC;
BuildMI(BB, DL, TII->get(SC), Success)
.addReg(StoreVal).addReg(AlignedAddr).addImm(0);
BuildMI(BB, DL, TII->get(Cpu0::BEQ))
.addReg(Success).addReg(Cpu0::ZERO).addMBB(loop1MBB);
// sinkMBB:
// srl srlres,maskedoldval0,shiftamt
// sign_extend dest,srlres
BB = sinkMBB;
return exitMBB;
}
lbdex/chapters/Chapter12_1/Cpu0RegisterInfo.h
lbdex/chapters/Chapter12_1/Cpu0RegisterInfo.cpp
const TargetRegisterClass *
Cpu0RegisterInfo::getPointerRegClass(const MachineFunction &MF,
unsigned Kind) const {
return &Cpu0::CPURegsRegClass;
}
lbdex/chapters/Chapter12_1/Cpu0SEISelLowering.cpp
...
}
lbdex/chapters/Chapter12_1/Cpu0TargetMachine.cpp
...
};
void Cpu0PassConfig::addIRPasses() {
TargetPassConfig::addIRPasses();
addPass(createAtomicExpandPass(&getCpu0TargetMachine()));
}
Since the SC instruction uses the RegisterOperand type in Cpu0InstrInfo.td and SC uses the FMem node, whose
DecoderMethod is “DecodeMem”, DecodeMem() in Cpu0Disassembler.cpp needs to be changed as above.
The atomic nodes defined inside “let usesCustomInserter = 1 in” in Cpu0InstrInfo.td tell LLVM to call
EmitInstrWithCustomInserter() of Cpu0ISelLowering.cpp. For example, “def ATOMIC_LOAD_ADD_I8 :
Atomic2Ops<atomic_load_add_8, CPURegs>;” causes EmitInstrWithCustomInserter() to be called with the machine
instruction opcode “ATOMIC_LOAD_ADD_I8” when LLVM meets the IR “atomicrmw add i8*”.
The “setInsertFencesForAtomic(true);” in Cpu0ISelLowering.cpp triggers addIRPasses() of
Cpu0TargetMachine.cpp, and createAtomicExpandPass() in addIRPasses() then creates the LLVM IR ATOMIC_FENCE.
Next, lowerATOMIC_FENCE() of Cpu0ISelLowering.cpp creates Cpu0ISD::Sync when it meets the
IR ATOMIC_FENCE, because of “setOperationAction(ISD::ATOMIC_FENCE, MVT::Other, Custom);” in
Cpu0SEISelLowering.cpp. Finally, the pattern defined in Cpu0InstrInfo.td translates it into the instruction “sync”
through “def SYNC” and the alias “SYNC 0”.
This part of the Cpu0 backend code is the same as Mips, except that Cpu0 has no “nor” instruction.
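The partword functions above work on the aligned 32-bit word that contains the byte or halfword, because ll and sc operate on whole words. The following sketch mirrors that idea in portable C++ for an atomic add on one byte: compare_exchange_weak stands in for the ll/sc pair, and the retry loop stands in for the branch back to loopMBB. It is an illustration under stated assumptions and not the backend's code: the function name is hypothetical, little-endian byte numbering is assumed (the isLittle() branch), and the inverted mask is written with the C++ ~ operator.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: atomically add `incr` to byte `byteIndex` (0..3,
// little-endian numbering) of an aligned 32-bit word; returns the old byte.
std::uint8_t atomicAddByte(std::atomic<std::uint32_t> &word,
                           unsigned byteIndex, std::uint8_t incr) {
  assert(byteIndex < 4 && "byte must lie inside the aligned word");
  const unsigned shiftAmt = byteIndex * 8;        // shiftamt = ptrlsb2 * 8
  const std::uint32_t mask = 0xFFu << shiftAmt;   // selects the target byte
  const std::uint32_t incr2 = static_cast<std::uint32_t>(incr) << shiftAmt;

  std::uint32_t oldVal = word.load();             // plays the role of ll
  for (;;) {
    // binop on the whole word, then keep only the target byte
    const std::uint32_t newVal = (oldVal + incr2) & mask;
    // merge with the three untouched bytes of the old word
    const std::uint32_t storeVal = (oldVal & ~mask) | newVal;
    // plays the role of sc: on failure oldVal is reloaded and we retry
    if (word.compare_exchange_weak(oldVal, storeVal))
      break;
  }
  return static_cast<std::uint8_t>((oldVal & mask) >> shiftAmt);
}
```

For a word holding 0x11223344, adding 0xF0 to byte 1 returns the old byte 0x33 and leaves 0x11222344 in memory: the byte wraps around to 0x23 and the carry does not leak into the neighbouring byte.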
The atomic IRs and their corresponding DAGs and opcodes are listed in the following table.
Table 12.3: The atomic related IRs, their corresponding DAGs and Opcode of Cpu0ISelLowering.cpp

IR               DAG                        Opcode
load atomic      AtomicLoad                 ATOMIC_CMP_SWAP_XXX
store atomic     AtomicStore                ATOMIC_SWAP_XXX
atomicrmw add    AtomicLoadAdd              ATOMIC_LOAD_ADD_XXX
atomicrmw sub    AtomicLoadSub              ATOMIC_LOAD_SUB_XXX
atomicrmw xor    AtomicLoadXor              ATOMIC_LOAD_XOR_XXX
atomicrmw and    AtomicLoadAnd              ATOMIC_LOAD_AND_XXX
atomicrmw nand   AtomicLoadNand             ATOMIC_LOAD_NAND_XXX
atomicrmw or     AtomicLoadOr               ATOMIC_LOAD_OR_XXX
cmpxchg          AtomicCmpSwapWithSuccess   ATOMIC_CMP_SWAP_XXX
atomicrmw xchg   AtomicLoadSwap             ATOMIC_SWAP_XXX
Input files atomics.ll and atomics-fences.ll contain tests of the LLVM atomic IR. Input files ch12_atomics.cpp and
ch12_atomics-fences.cpp are the C++ files used to generate the LLVM atomic IR. The C++ files need to be compiled
with the clang options “clang++ -pthread -std=c++11”.
CHAPTER
THIRTEEN
So far, we have an LLVM backend that compiles C or assembly, shown as the white part
of the following figure. If the program has no global variables, the ELF object can be dumped
to a hex file via llvm-objdump -d, as completed in Chapter ELF Support.
[Figure: Cpu0 tool flow. The white part is the LLVM backend that compiles C or assembly; the gray part is the Cpu0 Verilog machine.]
This chapter implements the Cpu0 instructions in the Verilog language, shown as the gray part of the figure above.
With this Verilog machine, we can run the hex program on the Cpu0 Verilog machine on a PC and see the results of
executing the Cpu0 instructions.
Verilog is an IEEE standard language for IC design. There are many books and documents about it, and free
documents are available on the Web 1 2 3 4 5 . Verilog is also called Verilog HDL, but it is not VHDL. VHDL is a
language with the same purpose that competes with Verilog; for VHDL, see 6 . The example code, lbdex/verilog/cpu0.v,
is the Cpu0 design in Verilog. In Appendix A, we downloaded and installed the Icarus Verilog tool on both iMac and
Linux. The cpu0.v is a simple design with only a few hundred lines of code in total. This implementation has no
pipeline features, but by implementing the delay slot simulation (the SIMULATE_DELAY_SLOT part of the code), the
exact pipeline machine cycles can be calculated.
1 http://ccckmit.wikidot.com/ve:main
2 http://www.ece.umd.edu/courses/enee359a/
3 http://www.ece.umd.edu/courses/enee359a/verilog_tutorial.pdf
4 http://d1.amobbs.com/bbs_upload782111/files_33/ourdev_585395BQ8J9A.pdf
5 http://en.wikipedia.org/wiki/Verilog
6 http://en.wikipedia.org/wiki/VHDL
Verilog is a C-like language in syntax, and this book is a compiler book, so we list cpu0.v and the build
command below without explanation. We expect readers can understand the Verilog code with a little patience.
Computer architectures offer two types of I/O: one is memory-mapped I/O, the other is
instruction I/O. Cpu0 uses memory-mapped I/O, with memory address 0x80000 as the output port. When it meets the
instruction “st $ra, cx($rb)”, where cx($rb) is 0x80000, Cpu0 displays the content as follows,
ST : begin
...
if (R[b]+c16 == `IOADDR) begin
outw(R[a]);
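The same decision can be modelled in a few lines of C++. The sketch below is only an illustration of memory-mapped output and is not derived from cpu0.v beyond the address 0x80000: the struct, its members, the tiny word-indexed memory and the decimal output format are all assumptions.

```cpp
#include <cstdint>
#include <string>
#include <vector>

constexpr std::uint32_t kIoAddr = 0x80000;  // same value as `IOADDR in cpu0.v

// Hypothetical machine model: a word store goes either to the output port
// or to ordinary memory, depending only on the effective address.
struct TinyMachine {
  std::vector<std::uint32_t> memory = std::vector<std::uint32_t>(16, 0);
  std::string output;  // stands in for the simulator's display

  // st Ra, cx(Rb): the effective address is Rb plus the sign-extended cx.
  void storeWord(std::uint32_t ra, std::uint32_t rb, std::int16_t cx) {
    const std::uint32_t addr =
        rb + static_cast<std::uint32_t>(static_cast<std::int32_t>(cx));
    if (addr == kIoAddr)
      output += std::to_string(ra) + "\n";  // port write: display the value
    else
      memory.at(addr / 4) = ra;             // normal store, word-indexed here
  }
};
```

Storing 7 with rb = 0x80000 and cx = 0 appends "7" to the output, while storing 9 with rb = 8 and cx = 4 writes word 3 of the memory and leaves the output untouched. No special I/O instruction is involved; only the address selects the port.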
lbdex/verilog/cpu0.v
`define SIMULATE_DELAY_SLOT
`define MEMSIZE 'h80000
`define MEMEMPTY 8'hFF
`define NULL 8'h00
`define IOADDR 'h80000 // IO mapping address
// Operand width
`define INT32 2'b11 // 32 bits
`define INT24 2'b10 // 24 bits
`define INT16 2'b01 // 16 bits
`define BYTE 2'b00 // 8 bits
// register name
`define SP R[13] // Stack Pointer
`define LR R[14] // Link Register
`define SW R[15] // Status Word
// C0 register name
`define PC C0R[0] // Program Counter
`define EPC C0R[1] // exception PC value
// SW Flag
next_state = Decode;
end
Decode: begin // Tick 2 : instruction decode, ir = m[PC]
memReadEnd(ir); // IR = dbus = m[PC]
{op,a,b,c} = ir[31:12];
c24 = $signed(ir[23:0]);
c16 = $signed(ir[15:0]);
uc16 = ir[15:0];
c12 = $signed(ir[11:0]);
c5 = ir[4:0];
Ra = R[a];
Rb = R[b];
Rc = R[c];
URa = R[a];
URb = R[b];
URc = R[c];
next_state = Execute;
end
Execute: begin // Tick 3 : instruction execution
case (op)
NOP: ;
// load and store instructions
LD: memReadStart(Rb+c16, `INT32); // LD Ra,[Rb+Cx]; Ra<=[Rb+Cx]
ST: memWriteStart(Rb+c16, Ra, `INT32); // ST Ra,[Rb+Cx]; Ra=>[Rb+Cx]
// LB Ra,[Rb+Cx]; Ra<=(byte)[Rb+Cx]
LB: memReadStart(Rb+c16, `BYTE);
// LBu Ra,[Rb+Cx]; Ra<=(byte)[Rb+Cx]
LBu: memReadStart(Rb+c16, `BYTE);
// SB Ra,[Rb+Cx]; Ra=>(byte)[Rb+Cx]
SB: memWriteStart(Rb+c16, Ra, `BYTE);
LH: memReadStart(Rb+c16, `INT16); // LH Ra,[Rb+Cx]; Ra<=(2bytes)[Rb+Cx]
LHu: memReadStart(Rb+c16, `INT16); // LHu Ra,[Rb+Cx]; Ra<=(2bytes)[Rb+Cx]
// SH Ra,[Rb+Cx]; Ra=>(2bytes)[Rb+Cx]
SH: memWriteStart(Rb+c16, Ra, `INT16);
// Conditional move
MOVZ: if (Rc==0) regSet(a, Rb); // move if Rc equal to 0
MOVN: if (Rc!=0) regSet(a, Rb); // move if Rc not equal to 0
// Mathematic
ADDiu: regSet(a, Rb+c16); // ADDiu Ra, Rb+Cx; Ra<=Rb+Cx
CMP: begin `N=(Rb-Rc<0);`Z=(Rb-Rc==0); end // CMP Rb, Rc; SW=(Rb >=< Rc)
ADDu: regSet(a, Rb+Rc); // ADDu Ra,Rb,Rc; Ra<=Rb+Rc
ADD: begin regSet(a, Rb+Rc); if (a < Rb) `V = 1; else `V = 0;
if (`V) begin `I0 = 1; `I = 1; end
end
// ADD Ra,Rb,Rc; Ra<=Rb+Rc
SUBu: regSet(a, Rb-Rc); // SUBu Ra,Rb,Rc; Ra<=Rb-Rc
SUB: begin regSet(a, Rb-Rc); if (Rb < 0 && Rc > 0 && a >= 0)
`V = 1; else `V =0;
if (`V) begin `I0 = 1; `I = 1; end
end // SUB Ra,Rb,Rc; Ra<=Rb-Rc
CLZ: begin
for (i=0; (i<32)&&((Rb&32'h80000000)==32'h00000000); i=i+1) begin
Rb=Rb<<1;
end
regSet(a, i);
end
CLO: begin
for (i=0; (i<32)&&((Rb&32'h80000000)==32'h80000000); i=i+1) begin
Rb=Rb<<1;
end
regSet(a, i);
end
MUL: regSet(a, Rb*Rc); // MUL Ra,Rb,Rc; Ra<=Rb*Rc
DIVu: regHILOSet(URa%URb, URa/URb); // DIVu URa,URb; HI<=URa%URb;
// LO<=URa/URb
// without exception overflow
DIV: begin regHILOSet(Ra%Rb, Ra/Rb);
if ((Ra < 0 && Rb < 0) || (Ra == 0)) `V = 1;
else `V =0; end // DIV Ra,Rb; HI<=Ra%Rb; LO<=Ra/Rb; With overflow
AND: regSet(a, Rb&Rc); // AND Ra,Rb,Rc; Ra<=(Rb and Rc)
ANDi: regSet(a, Rb&uc16); // ANDi Ra,Rb,c16; Ra<=(Rb and c16)
OR: regSet(a, Rb|Rc); // OR Ra,Rb,Rc; Ra<=(Rb or Rc)
ORi: regSet(a, Rb|uc16); // ORi Ra,Rb,c16; Ra<=(Rb or c16)
XOR: regSet(a, Rb^Rc); // XOR Ra,Rb,Rc; Ra<=(Rb xor Rc)
XORi: regSet(a, Rb^uc16); // XORi Ra,Rb,c16; Ra<=(Rb xor c16)
LUi: regSet(a, uc16<<16);
SHL: regSet(a, Rb<<c5); // Shift Left; SHL Ra,Rb,Cx; Ra<=(Rb << Cx)
SRA: regSet(a, (Rb&'h80000000)|(Rb>>c5));
// Shift Right with signed bit fill;
// SRA Ra,Rb,Cx; Ra<=(Rb&0x80000000)|(Rb>>Cx)
SHR: regSet(a, Rb>>c5); // Shift Right with 0 fill;
// SHR Ra,Rb,Cx; Ra<=(Rb >> Cx)
SHLV: regSet(a, Rb<<Rc); // Shift Left; SHLV Ra,Rb,Rc; Ra<=(Rb << Rc)
SRAV: regSet(a, (Rb&'h80000000)|(Rb>>Rc));
// Shift Right with signed bit fill;
// SRAV Ra,Rb,Rc; Ra<=(Rb&0x80000000)|(Rb>>Rc)
SHRV: regSet(a, Rb>>Rc); // Shift Right with 0 fill;
// SHRV Ra,Rb,Rc; Ra<=(Rb >> Rc)
ROL: regSet(a, (Rb<<c5)|(Rb>>(32-c5))); // Rotate Left;
ROR: regSet(a, (Rb>>c5)|(Rb<<(32-c5))); // Rotate Right;
ROLV: begin // Can set Rc to -32<=Rc<=32 more efficiently.
while (Rc < -32) Rc=Rc+32;
while (Rc > 32) Rc=Rc-32;
regSet(a, (Rb<<Rc)|(Rb>>(32-Rc))); // Rotate Left;
end
RORV: begin
while (Rc < -32) Rc=Rc+32;
while (Rc > 32) Rc=Rc-32;
regSet(a, (Rb>>Rc)|(Rb<<(32-Rc))); // Rotate Right;
end
MFLO: regSet(a, LO); // MFLO Ra; Ra<=LO
MFHI: regSet(a, HI); // MFHI Ra; Ra<=HI
MTLO: LO = Ra; // MTLO Ra; LO<=Ra
MTHI: HI = Ra; // MTHI Ra; HI<=Ra
MULT: {HI, LO}=Ra*Rb; // MULT Ra,Rb; HI<=((Ra*Rb)>>32);
// LO<=((Ra*Rb) and 0x00000000ffffffff);
// with exception overflow
MULTu: {HI, LO}=URa*URb; // MULTu URa,URb; HI<=((URa*URb)>>32);
// LO<=((URa*URb) and 0x00000000ffffffff);
// without exception overflow
MFC0: regSet(a, C0R[b]); // MFC0 a, b; Ra<=C0R[b]
MTC0: C0regSet(a, Rb); // MTC0 a, b; C0R[a]<=Rb
C0MOV: C0regSet(a, C0R[b]); // C0MOV a, b; C0R[a]<=C0R[b]
`ifdef CPU0II
// set
SLT: if (Rb < Rc) R[a]=1; else R[a]=0;
case (op)
MULT, MULTu, DIV, DIVu, MTHI, MTLO :
if (`D)
$display("%4dns %8x : %8x HI=%8x LO=%8x SW=%8x", $stime, pc0, ir, HI,
LO, `SW);
ST : begin
if (`D)
$display("%4dns %8x : %8x m[%-04x+%-04x]=%8x SW=%8x", $stime, pc0,
ir, R[b], c16, R[a], `SW);
if (R[b]+c16 == `IOADDR) begin
outw(R[a]);
end
end
SB : begin
if (`D)
$display("%4dns %8x : %8x m[%-04x+%-04x]=%c SW=%8x, R[a]=%8x",
$stime, pc0, ir, R[b], c16, R[a][7:0], `SW, R[a]);
if (R[b]+c16 == `IOADDR) begin
if (`LE)
outc(R[a][7:0]);
else
outc(R[a][7:0]);
end
end
MFC0, MTC0 :
if (`D)
$display("%4dns %8x : %8x R[%02d]=%-8x C0R[%02d]=%-8x SW=%8x",
$stime, pc0, ir, a, R[a], a, C0R[a], `SW);
C0MOV :
if (`D)
$display("%4dns %8x : %8x C0R[%02d]=%-8x C0R[%02d]=%-8x SW=%8x",
$stime, pc0, ir, a, C0R[a], b, C0R[b], `SW);
default :
if (`D) // Display the written register content
$display("%4dns %8x : %8x R[%02d]=%-8x SW=%8x", $stime, pc0, ir,
a, R[a], `SW);
endcase
if (`PC < 0) begin
$display("total cpu cycles = %-d", cycles);
$display("RET to PC < 0, finished!");
$finish;
end
next_state = Fetch;
end
endcase
end endtask
m_en = 0;
state = Fetch;
end else if (inExe == 0 && itype == `RESET) begin
// Condition itype == `RESET must after the other `IE condition
taskInterrupt(`RESET);
`M = `RESET;
state = Fetch;
end else begin
`ifdef TRACE
`D = 1; // Trace register content at beginning
`endif
taskExecute();
state = next_state;
end
pc = `PC;
cycles = cycles + 1;
end
endmodule
integer i;
initial begin
// erase memory
for (i=0; i < `MEMSIZE; i=i+1) begin
m[i] = `MEMEMPTY;
end
// load config from file to memory
$readmemh("cpu0.config", mconfig);
// load program from file to memory
$readmemh("cpu0.hex", m);
// display memory contents
`ifdef TRACE
for (i=0; i < `MEMSIZE && (m[i] != `MEMEMPTY || m[i+1] != `MEMEMPTY ||
m[i+2] != `MEMEMPTY || m[i+3] != `MEMEMPTY); i=i+4) begin
$display("%8x: %8x", i, {m[i], m[i+1], m[i+2], m[i+3]});
end
`endif
end
module main;
reg clock;
reg [2:0] itype;
wire [2:0] tick;
wire [31:0] pc, ir, mar, mdr, dbus;
wire m_en, m_rw;
wire [1:0] m_size;
wire cfg;
initial
begin
clock = 0;
itype = `RESET;
#300000000 $finish;
end
endmodule
lbdex/verilog/Makefile
#TRACE=-D TRACE
all:
iverilog ${TRACE} -o cpu0Is cpu0.v
iverilog ${TRACE} -D CPU0II -o cpu0IIs cpu0.v
.PHONY: clean
clean:
rm -rf cpu0.hex cpu0Is cpu0IIs
rm -f *~ cpu0.config
Since the Cpu0 Verilog machine supports both big and little endian, the memory and CPU modules are connected by a wire
that carries the endian setting. The endian information is stored in the ROM of the memory module, and the memory
module sends it to the CPU when it starts up, according to the following code,
lbdex/verilog/cpu0.v
Instead of doing the endian transfer in the memory module, it can also be done in the CPU module, with the memory
module always returning big endian data. I am not a professional engineer in FPGA/CPU hardware design. But according
to the book “Computer Architecture: A Quantitative Approach”, some operations may have no time to spare in the
execution stage. Any endian swap there would make the clock cycle time longer and affect CPU performance. So, I put
the endian transfer in the memory module. In a system with a bus, I think it would be done in the bus system.
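As a rough illustration of what such an endian transfer amounts to, the following C++ sketch assembles a 32-bit word from four memory bytes in either byte order. It is only a host-side model that assumes a byte-addressed memory; the name loadWord and its parameters are hypothetical and do not appear in cpu0.v.

```cpp
#include <cstdint>

// Hypothetical helper (not part of cpu0.v): build a 32-bit word from four
// consecutive bytes of a byte-addressed memory.
//   littleEndian == false: m[0] is the most significant byte.
//   littleEndian == true : m[0] is the least significant byte.
inline std::uint32_t loadWord(const std::uint8_t *m, bool littleEndian) {
  const std::uint32_t b0 = m[0], b1 = m[1], b2 = m[2], b3 = m[3];
  if (littleEndian)
    return (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
  return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3;
}
```

With the bytes 26 00 00 0c stored in that order, loadWord(bytes, false) yields 0x2600000c. The same word is obtained from the reversed byte sequence when littleEndian is true, which is why the CPU sees identical instruction words whichever layout the hex file uses.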
Now let’s compile ch_run_backend.cpp as below. Code grows from low to high addresses and the stack grows
from high to low addresses, so $sp is set to 0x7fffc on the assumption that cpu0.v uses 0x80000 bytes of memory.
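The value 0x7fffc can be derived with a small calculation. The sketch below assumes a byte-addressed memory with 4-byte words and a stack that starts at the highest word-aligned address; kWordSize and initialStackPointer are hypothetical names used only for this illustration.

```cpp
#include <cstdint>

// Assumptions for illustration: memory is byte-addressed, a word is 4 bytes,
// and the stack starts at the highest word-aligned address of the memory.
constexpr std::uint32_t kWordSize = 4;

constexpr std::uint32_t initialStackPointer(std::uint32_t memSizeBytes) {
  return memSizeBytes - kWordSize;
}

// 0x80000 bytes of memory gives the $sp value quoted in the text.
static_assert(initialStackPointer(0x80000) == 0x7fffc,
              "0x80000 - 4 == 0x7fffc");
```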
lbdex/input/start.h
#define SET_SW \
asm("andi $sw, $zero, 0"); \
asm("ori $sw, $sw, 0x1e00"); // enable all interrupts
#define initRegs() \
asm("addiu $1, $zero, 0"); \
asm("addiu $2, $zero, 0"); \
asm("addiu $3, $zero, 0"); \
asm("addiu $4, $zero, 0"); \
asm("addiu $5, $zero, 0"); \
asm("addiu $t9, $zero, 0"); \
asm("addiu $7, $zero, 0"); \
asm("addiu $8, $zero, 0"); \
asm("addiu $9, $zero, 0"); \
asm("addiu $10, $zero, 0"); \
SET_SW; \
asm("addiu $fp, $zero, 0");
lbdex/input/boot.cpp
#include "start.h"
// boot:
asm("boot:");
// asm("_start:");
asm("jmp 12"); // RESET: jmp RESET_START;
asm("jmp 4"); // ERROR: jmp ERR_HANDLE;
asm("jmp 4"); // IRQ: jmp IRQ_HANDLE;
asm("jmp -4"); // ERR_HANDLE: jmp ERR_HANDLE; (loop forever)
// RESET_START:
initRegs();
asm("addiu $gp, $ZERO, 0");
asm("addiu $lr, $ZERO, -1");
lbdex/input/print.h
#ifndef _PRINT_H_
#define _PRINT_H_
lbdex/input/print.cpp
#include "print.h"
#include "itoa.cpp"
// For memory IO
void print_char(const char c)
{
char *p = (char*)OUT_MEM;
*p = c;
return;
}
return;
}
// For memory IO
void print_integer(int x)
{
char str[INT_DIGITS + 2];
itoa(str, x);
print_string(str);
return;
}
lbdex/input/ch_nolld.h
#include "debug.h"
#include "boot.cpp"
#include "print.h"
int test_nolld();
lbdex/input/ch_nolld.cpp
#define TEST_ROXV
#define RUN_ON_VERILOG
#include "print.cpp"
#include "ch4_1_math.cpp"
#include "ch4_1_rotate.cpp"
#include "ch4_1_mult2.cpp"
#include "ch4_1_mod.cpp"
#include "ch4_1_div.cpp"
#include "ch4_2_logic.cpp"
#include "ch7_1_localpointer.cpp"
#include "ch7_1_char_short.cpp"
#include "ch7_1_bool.cpp"
#include "ch7_1_longlong.cpp"
#include "ch7_1_vector.cpp"
#include "ch8_1_ctrl.cpp"
#include "ch8_2_deluselessjmp.cpp"
#include "ch8_2_select.cpp"
#include "ch9_3_vararg.cpp"
#include "ch9_3_stacksave.cpp"
#include "ch9_3_bswap.cpp"
#include "ch9_3_alloc.cpp"
#include "ch11_2.cpp"
void test_asm_build()
{
#include "ch11_1.cpp"
#ifdef CPU032II
#include "ch11_1_2.cpp"
#endif
}
int test_rotate()
{
int a = test_rotate_left1(); // rolv 4, 30 = 1
int b = test_rotate_left(); // rol 8, 30 = 2
int c = test_rotate_right(); // rorv 1, 30 = 4
return (a+b+c);
}
int test_nolld()
{
bool pass = true;
int a = 0;
a = test_math();
print_integer(a); // a = 74
if (a != 74) pass = false;
a = test_rotate();
print_integer(a); // a = 7
if (a != 7) pass = false;
a = test_mult();
print_integer(a); // a = 0
if (a != 0) pass = false;
a = test_mod();
print_integer(a); // a = 0
if (a != 0) pass = false;
a = test_div();
print_integer(a); // a = 253
if (a != 253) pass = false;
a = test_local_pointer();
print_integer(a); // a = 3
if (a != 3) pass = false;
a = (int)test_load_bool();
print_integer(a); // a = 1
if (a != 1) pass = false;
a = test_andorxornot();
print_integer(a); // a = 14
if (a != 14) pass = false;
a = test_setxx();
print_integer(a); // a = 3
if (a != 3) pass = false;
a = test_signed_char();
print_integer(a); // a = -126
if (a != -126) pass = false;
a = test_unsigned_char();
print_integer(a); // a = 130
if (a != 130) pass = false;
a = test_signed_short();
print_integer(a); // a = -32766
if (a != -32766) pass = false;
a = test_unsigned_short();
print_integer(a); // a = 32770
if (a != 32770) pass = false;
long long b = test_longlong();
print_integer((int)(b >> 32)); // 393307
if ((int)(b >> 32) != 393307) pass = false;
print_integer((int)b); // 16777222
if ((int)(b) != 16777222) pass = false;
a = test_cmplt_short();
print_integer(a); // a = 2
if (a != 2) pass = false;
a = test_cmplt_long();
print_integer(a); // a = 4
if (a != 4) pass = false;
a = test_control1();
print_integer(a); // a = 51
if (a != 51) pass = false;
a = test_DelUselessJMP();
print_integer(a); // a = 2
if (a != 2) pass = false;
a = test_movx_1();
print_integer(a); // a = 3
if (a != 3) pass = false;
a = test_movx_2();
print_integer(a); // a = 1
if (a != 1) pass = false;
print_integer(2147483647); // test mod % (mult) from itoa.cpp
print_integer(-2147483648); // test mod % (multu) from itoa.cpp
a = test_vararg();
print_integer(a); // a = 15
if (a != 15) pass = false;
a = test_stacksaverestore(100);
print_integer(a); // a = 5
if (a != 5) pass = false;
a = test_bswap();
print_integer(a); // a = 0
if (a != 0) pass = false;
a = test_alloc();
print_integer(a); // a = 31
if (a != 31) pass = false;
a = test_inlineasm();
print_integer(a); // a = 49
if (a != 49) pass = false;
return pass;
}
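The expected values in the comments of test_rotate() above can be cross-checked on the host. The following C++ sketch models the ROL/ROR formulas (Rb<<c5)|(Rb>>(32-c5)) and (Rb>>c5)|(Rb<<(32-c5)) from cpu0.v. The names rotl32 and rotr32 are hypothetical, and reducing the count to the range 0..31 is an addition made here to avoid undefined behavior in C++. It is not taken from the Verilog code.

```cpp
#include <cstdint>

// Rotate a 32-bit value left by n bits. The count is reduced modulo 32 and
// n == 0 is handled separately, because shifting a 32-bit value by 32 bits
// is undefined behavior in C++.
inline std::uint32_t rotl32(std::uint32_t x, unsigned n) {
  n &= 31u;
  return n == 0 ? x : (x << n) | (x >> (32u - n));
}

// Rotate a 32-bit value right by n bits, with the same count handling.
inline std::uint32_t rotr32(std::uint32_t x, unsigned n) {
  n &= 31u;
  return n == 0 ? x : (x >> n) | (x << (32u - n));
}
```

With these helpers, rotl32(4, 30) is 1, rotl32(8, 30) is 2 and rotr32(1, 30) is 4, so the sum is 7 as expected by test_nolld().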
lbdex/input/ch_run_backend.cpp
#include "ch_nolld.h"
int main()
{
bool pass = true;
pass = test_nolld();
return pass;
}
#include "ch_nolld.cpp"
lbdex/input/functions.sh
prologue() {
if [ $argNum == 0 ]; then
echo "usage: bash $sh_name cpu_type endian"
echo " cpu_type: cpu032I or cpu032II"
echo " endian: be (big endian, default) or le (little endian)"
echo "for example:"
echo " bash build-slinker.sh cpu032I be"
exit 1;
fi
if [ $arg1 != cpu032I ] && [ $arg1 != cpu032II ]; then
echo "1st argument is cpu032I or cpu032II"
exit 1
fi
OS=`uname -s`
echo "OS =" ${OS}
CPU=$arg1
echo "CPU =" "${CPU}"
bash clean.sh
}
isLittleEndian() {
echo "endian = " "$endian"
if [ "$endian" == "LittleEndian" ] ; then
le="true"
elif [ "$endian" == "BigEndian" ] ; then
le="false"
else
echo "!endian unknown"
exit 1
fi
}
elf2hex() {
${TOOLDIR}/llvm-objdump -elf2hex -le=${le} a.out > ../verilog/cpu0.hex
if [ ${le} == "true" ] ; then
echo "1 /* 0: big endian, 1: little endian */" > ../verilog/cpu0.config
else
echo "0 /* 0: big endian, 1: little endian */" > ../verilog/cpu0.config
fi
cat ../verilog/cpu0.config
}
epilogue() {
endian=`${TOOLDIR}/llvm-readobj -h a.out|grep "DataEncoding"|awk '{print $2}'`
isLittleEndian;
elf2hex;
}
lbdex/input/build-run_backend.sh
#!/usr/bin/env bash
source functions.sh
sh_name=build-run_backend.sh
argNum=$#
arg1=$1
arg2=$2
DEFFLAGS=""
if [ "$arg1" == cpu032II ] ; then
DEFFLAGS=${DEFFLAGS}" -DCPU032II"
fi
echo ${DEFFLAGS}
prologue;
> ../verilog/cpu0.hex
if [ "$arg2" == le ] ; then
echo "1 /* 0: big endian, 1: little endian */" > ../verilog/cpu0.config
else
echo "0 /* 0: big endian, 1: little endian */" > ../verilog/cpu0.config
fi
cat ../verilog/cpu0.config
To run the program without a linker implementation at this point, boot.cpp must be placed at the beginning of the code, and
main() of ch_run_backend.cpp comes immediately after it. Let’s run Chapter11_2/ with llvm-objdump -d on
input file ch_run_backend.cpp to generate the hex file via build-run_backend.sh, then feed the hex file to the cpu0Is Verilog
simulator to get the output result as below. Remember that ch_run_backend.cpp has to be compiled with the option clang
-target mips-unknown-linux-gnu, since the example code ch9_3_vararg.cpp, which uses varargs, needs
this option. For the other example codes, this option and the default option make no difference.
The “total cpu cycles” is calculated in this Verilog simulator so that the backend compiler and CPU performance can
be reviewed. Only the CPU cycles are counted in this implementation since the I/O cycle time is unknown. As explained
in chapter “Control flow statements”, cpu032II, which uses instructions slt and beq, has better performance than
cpu032I, which uses cmp and jeq. Instruction “jmp” has no delay slot, so it is better suited to the dynamic linker implementation.
You can trace the memory binary code and the destination register changed by each executed instruction by uncommenting
TRACE in the Makefile as below,
lbdex/verilog/Makefile
TRACE=-D TRACE
00000000: 2600000c
00000004: 26000004
00000008: 26000004
0000000c: 26fffffc
00000010: 09100000
00000014: 09200000
...
taskInterrupt(001)
1530ns 00000054 : 02ed002c m[28620+44 ]=-1 SW=00000000
1610ns 00000058 : 02bd0028 m[28620+40 ]=0 SW=00000000
...
RET to PC < 0, finished!
As the above result shows, cpu0.v dumps the memory first after reading the input file cpu0.hex. Next, it runs instructions from
address 0 and prints each destination register value in the fourth column. The first column is the time in nanoseconds.
The second is the instruction address. The third is the instruction content. Now, most example codes depicted in the
previous chapters are verified by printing the variable with print_integer().
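To read the instruction words in the third column by hand, the field split of the Decode stage in cpu0.v ({op,a,b,c} = ir[31:12], c16 = $signed(ir[15:0]), c24 = $signed(ir[23:0])) can be mirrored in a few lines of C++. This is only a sketch: it assumes that op is 8 bits wide and that a, b and c are 4 bits each, which adds up to the 20 bits of ir[31:12]. Cpu0Fields, signExtend and decode are hypothetical names, not part of lbdex.

```cpp
#include <cstdint>

// Fields of one 32-bit Cpu0 instruction word, following the Decode stage of
// cpu0.v. Field widths (8/4/4/4 bits) are an assumption of this sketch.
struct Cpu0Fields {
  std::uint8_t op;   // ir[31:24]
  std::uint8_t a;    // ir[23:20]
  std::uint8_t b;    // ir[19:16]
  std::uint8_t c;    // ir[15:12]
  std::int32_t c16;  // sign-extended ir[15:0]
  std::int32_t c24;  // sign-extended ir[23:0]
};

// Sign-extend the low `bits` bits of v (1 <= bits <= 31) using plain
// arithmetic, so the result does not depend on signed shift behavior.
inline std::int32_t signExtend(std::uint32_t v, unsigned bits) {
  const std::uint32_t mask = (1u << bits) - 1u;
  const std::uint32_t sign = 1u << (bits - 1u);
  v &= mask;
  if (v & sign)
    return static_cast<std::int32_t>(v) - static_cast<std::int32_t>(mask) - 1;
  return static_cast<std::int32_t>(v);
}

inline Cpu0Fields decode(std::uint32_t ir) {
  Cpu0Fields f;
  f.op = static_cast<std::uint8_t>(ir >> 24);
  f.a = static_cast<std::uint8_t>((ir >> 20) & 0xfu);
  f.b = static_cast<std::uint8_t>((ir >> 16) & 0xfu);
  f.c = static_cast<std::uint8_t>((ir >> 12) & 0xfu);
  f.c16 = signExtend(ir, 16);
  f.c24 = signExtend(ir, 24);
  return f;
}
```

For example, decode(0x26fffffc) gives op = 0x26 and c24 = -4, which matches the jmp -4 in boot.cpp, and decode(0x02ed002c) gives a = 14, b = 13 and c16 = 44, which appears consistent with the +44 offset printed in the trace line above.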
Since the cpu0.v machine is written in Verilog, it should be able to run on a real FPGA device (but I have never done
so). The real output hardware interface/port depends on the output device, such as RS232, speaker, LED, ....
You should implement the I/O interface/port when you want to program an FPGA and wire an I/O device to the I/O port.
By running the compiled code on the Verilog simulator, the compiled result of the Cpu0 backend is verified and its CPU
cycles are counted. Currently, this Cpu0 Verilog program is not a pipelined architecture, but according to the instruction set it
can be implemented as a pipeline model. The cycle count of a Cpu0 pipeline model would be more than 1/5 of the “total cpu cycles”
displayed above since there are dependences between instructions. Though the Verilog simulator is slow at
running a whole system program and does not count cache and I/O cycles, it is a simple and easy way to
verify your ideas about CPU design at an early stage with small program patterns. An overall system simulator is complex
to create. Even though the wiki web site here 7 lists tools for creating simulators, it takes a lot of effort.
To generate cpu032I as well as little endian code, you can run the following command. File build-run_backend.sh
writes the endian information to ../verilog/cpu0.config as below.
../verilog/cpu0.config
lbdex/input/ch_nolld2.h
#include "debug.h"
#include "boot.cpp"
#include "print.h"
int test_nolld2();
7 https://en.wikipedia.org/wiki/Computer_architecture_simulator
lbdex/input/ch_nolld2.cpp
#include "print.cpp"
#include "ch9_3_alloc.cpp"
int test_nolld2()
{
bool pass = true;
int a = 0;
a = test_alloc();
print_integer(a); // a = 31
if (a != 31) pass = false;
return pass;
}
lbdex/input/ch_run_backend2.cpp
#include "ch_nolld2.h"
int main()
{
bool pass = true;
pass = test_nolld2();
return pass;
}
#include "ch_nolld2.cpp"
lbdex/input/build-run_backend2.sh
#!/usr/bin/env bash
source functions.sh
sh_name=build-run_backend.sh
argNum=$#
arg1=$1
arg2=$2
DEFFLAGS=""
if [ "$arg1" == cpu032II ] ; then
DEFFLAGS=${DEFFLAGS}" -DCPU032II"
fi
echo ${DEFFLAGS}
prologue;
if [ "$arg2" == le ] ; then
echo "1 /* 0: big endian, 1: little endian */" > ../verilog/cpu0.config
else
echo "0 /* 0: big endian, 1: little endian */" > ../verilog/cpu0.config
fi
cat ../verilog/cpu0.config
You can find the Cpu0 ELF linker implementation, based on lld, the official LLVM linker project, as well as
elf2hex, which is modified from the llvm-objdump driver, at: http://jonathan2251.github.io/lbt/index.html.
CHAPTER
FOURTEEN
APPENDIX A: GETTING STARTED: INSTALLING LLVM AND THE CPU0 EXAMPLE CODE
The Cpu0 example code, lbdex, can be found near the bottom left of this web site, or here:
http://jonathan2251.github.io/lbd/lbdex.tar.gz.
In this chapter, we will run through how to set up LLVM if you are using Mac OS X or Linux. For information
on using cmake to build LLVM, please refer to the “Building LLVM with CMake” 1 documentation.
We will install two llvm directories in this chapter. One is the directory ~/llvm/release/ which contains the clang and
clang++ compiler we will use to translate the C/C++ input file into llvm IR. The other is the directory ~/llvm/test/
which contains our cpu0 backend program without clang and clang++.
This chapter details the installation of the software related to this book. If you know llvm/clang installation well or
think this is too detailed, you can run the bash script files as follows after you install Xcode and cmake,
1 http://llvm.org/docs/CMake.html?highlight=cmake
lbdex/install_llvm/get-llvm.sh
#!/usr/bin/env bash
VERSION=3.9.0
# Download address can be gotten from "Copy link location" of right clicking
# mouse on firefox browser on llvm.org download page.
curl -O http://llvm.org/releases/${VERSION}/llvm-${VERSION}.src.tar.xz
curl -O http://llvm.org/releases/${VERSION}/cfe-${VERSION}.src.tar.xz
lbdex/install_llvm/build-llvm-lbdex.sh
#!/usr/bin/env bash
VERSION=3.9.0
LLVM_DIR=~/llvm
LLVM_RELEASE_DIR=${LLVM_DIR}/release
LLVM_TEST_DIR=${LLVM_DIR}/test
if [ -e /proc/cpuinfo ]; then
export procs=`cat /proc/cpuinfo | grep processor | wc -l`
else
export procs=1
fi
Xcode already includes clang and llvm, so the following three sub-sections are not strictly needed. They are listed just for
readers who like to build clang and llvm with the cmake GUI interface.
Install Xcode from the Mac App Store. Then install cmake, which can be found here: 3 . Before installing cmake,
ensure you can install applications you download from the Internet. Open System Preferences → Security & Privacy.
Click the lock to make changes, and under “Allow applications downloaded from:” select the radio button next to
“Anywhere.” See Fig. 14.1 below for an illustration. You may want to revert this setting after installing cmake.
Alternatively, you can mount the cmake .dmg image file you downloaded. Untar the latest cmake for Darwin, copy
cmake to /Applications/ and set PATH as follows,
3 http://www.cmake.org/cmake/resources/software.html
The tools mentioned in this section are for coding and debugging. You can work even without them. The file comparison
tool Kdiff3 comes from web site 7 . FileMerge is a part of Xcode; you can type FileMerge in Finder – Applications as in
Fig. 14.2 and drag it into the Dock as in Fig. 14.3.
Download the Graphviz tool for displaying llvm IR nodes when debugging, 8 . We chose mountainlion as in Fig. 14.4 since our
iMac runs Mountain Lion.
After installing Graphviz, please add its path to .profile. For example, we installed Graphviz in directory
/Applications/Graphviz.app/Contents/MacOS/, so we add this path to /Users/Jonathan/.profile as follows,
118-165-12-177:InputFiles Jonathan$ cat /Users/Jonathan/.profile
export PATH=$PATH:/Applications/Xcode.app/Contents/bin:
/Applications/Graphviz.app/Contents/MacOS/:/Users/Jonathan/llvm/release/
cmake_release_build/Debug/bin
The Graphviz information for llvm is in section “SelectionDAG Instruction Selection Process” of “The LLVM Target-
Independent Code Generator” here 9 and in section “Viewing graphs while debugging code” of “LLVM Programmer’s
Manual” here 10 . TextWrangler is for editing files with line number display and for dumping binary files, such as the obj
file, *.o, that will be generated in chapter Generating object files, if you don’t have gobjdump available. You can download
it from the App Store. To dump a binary file, first open the binary file, next select menu “File – Hex Front Document” as in
Fig. 14.5. Then select “Front document’s file” as in Fig. 14.6.
7 http://kdiff3.sourceforge.net
8 http://www.graphviz.org/Download_macos.php
9 http://llvm.org/docs/CodeGenerator.html#selectiondag-instruction-selection-process
10 http://llvm.org/docs/ProgrammersManual.html#viewing-graphs-while-debugging-code
Install binutils by command brew install binutils as follows,
Xcode already includes a clang executable for compiling code, but if the version of Xcode’s clang is not as new as the
llvm we want to install later, then we need to build and install clang with llvm as described in this sub-section.
Please download the latest LLVM release version 3.9 (llvm, clang) from the “LLVM Download Page” 2 . Then extract
each of llvm-3.9.0.src.tar.xz and cfe-3.9.0.src.tar.xz with tar -xvf , and rename the llvm source
code root directory to src. After that, move the clang source code to src/tools/clang as shown below. The
compiler-rt should not be installed on iMac OS X 10.9 and Xcode 5.x. If you install it as the clang installation web
document describes, you will get a compiler error.
2 http://llvm.org/releases/download.html#3.9
118-165-78-111:Downloads Jonathan$ tar -xvf cfe-3.9.0.src.tar.xz
118-165-78-111:Downloads Jonathan$ tar -xvf llvm-3.9.0.src.tar.xz
118-165-78-111:Downloads Jonathan$ mv llvm-3.9.0.src src
118-165-78-111:Downloads Jonathan$ mv cfe-3.9.0.src src/tools/clang
118-165-78-111:Downloads Jonathan$ pwd
/Users/Jonathan/Downloads
118-165-78-111:Downloads Jonathan$ ls
cfe-3.9.0.src.tar.xz llvm-3.9.0.src.tar.xz
src
118-165-78-111:Downloads Jonathan$ ls src/tools/
CMakeLists.txt clang llvm-as llvm-dis llvm-mcmarkup
llvm-readobj llvm-stub LLVMBuild.txt gold llvm-bcanalyzer
llvm-dwarfdump llvm-nm llvm-rtdyld lto Makefile
llc llvm-config llvm-extract llvm-objdump llvm-shlib
macho-dump bugpoint lli llvm-cov llvm-link
llvm-prof llvm-size opt bugpoint-passes llvm-ar
llvm-diff llvm-mc llvm-ranlib llvm-stress
Next, copy the LLVM source to /Users/Jonathan/llvm/release/src by executing the terminal command cp -rf
/Users/Jonathan/Downloads/src /Users/Jonathan/llvm/release/. .
We installed the llvm source code with clang in directory /Users/Jonathan/llvm/release/ in the last section. Now, we will
generate LLVM.xcodeproj in this section.
114-43-213-176:release Jonathan$ pwd
/Users/Jonathan/llvm/release
114-43-213-176:release Jonathan$ mkdir cmake_release_build
114-43-213-176:release Jonathan$ cd cmake_release_build
114-43-213-176:cmake_release_build Jonathan$ cmake -DCMAKE_CXX_COMPILER=clang++
-DCMAKE_C_COMPILER=clang -DCMAKE_CXX_FLAGS=-std=c++11 -DCMAKE_BUILD_TYPE=Debug
-G "Xcode" ../src
...
114-43-213-176:cmake_release_build Jonathan$ ls
... LLVM.xcodeproj
Now, LLVM.xcodeproj is created. Open cmake_release_build/LLVM.xcodeproj with Xcode and click menu
“Product – Build” as in Fig. 14.7.
After a few minutes of building, clang, llc, llvm-as, ..., can be found in cmake_release_build/Debug/bin/ as follows.
118-165-78-111:cmake_release_build Jonathan$ cd Debug/bin/
118-165-78-111:bin Jonathan$ pwd
/Users/Jonathan/llvm/release/cmake_release_build/Debug/bin
118-165-78-111:bin Jonathan$ ls
...
clang
...
llc
...
llvm-as
...
To access those executables, edit .profile (if your .profile does not exist, please create it), save .profile to
/Users/Jonathan/, and enable $PATH with the command source .profile as follows. Please add path
/Applications/Xcode.app/Contents/Developer/usr/bin to .profile if you didn’t add it after the Xcode download.
We have installed llvm with clang in directory llvm/release/. Now, we want to install llvm with our cpu0 backend
code in directory llvm/test/ in this section.
This book is in the process of being merged into llvm trunk, but this is not finished yet. The merged llvm trunk version on the
lbd GitHub is the LLVM 3.9 release version. The lbd Cpu0 example code is also based on llvm 3.9. So, please install the
llvm 3.9 debug version in the same way as the llvm release 3.9 installation, but without clang, since clang would waste build
time when building the Cpu0 backend tutorial code. The steps are as follows,
The details of installing the Cpu0 backend example code are as follows,
118-165-78-111:llvm Jonathan$ mkdir test
118-165-78-111:llvm Jonathan$ cd test
118-165-78-111:test Jonathan$ pwd
/Users/Jonathan/llvm/test
118-165-78-111:test Jonathan$ cp /Users/Jonathan/Downloads/llvm-3.9.0.src.tar.xz .
118-165-78-111:test Jonathan$ tar -xvf llvm-3.9.0.src.tar.xz
118-165-78-111:test Jonathan$ mv llvm-3.9.0.src src
118-165-78-111:test Jonathan$ cp /Users/Jonathan/Downloads/
lbdex.tar.gz .
118-165-78-111:test Jonathan$ tar -zxvf lbdex.tar.gz
118-165-78-111:test Jonathan$ cp -rf lbdex/src/modify/src/* src/.
118-165-78-111:test Jonathan$ grep -R "Cpu0" src/include
...
src/include/llvm/MC/MCExpr.h: VK_Cpu0_GPREL,
src/include/llvm/MC/MCExpr.h: VK_Cpu0_GOT_CALL,
src/include/llvm/MC/MCExpr.h: VK_Cpu0_GOT16,
src/include/llvm/MC/MCExpr.h: VK_Cpu0_GOT,
src/include/llvm/MC/MCExpr.h: VK_Cpu0_ABS_HI,
src/include/llvm/MC/MCExpr.h: VK_Cpu0_ABS_LO,
...
src/lib/MC/MCExpr.cpp: case VK_Cpu0_GOT_PAGE: return "GOT_PAGE";
src/lib/MC/MCExpr.cpp: case VK_Cpu0_GOT_OFST: return "GOT_OFST";
src/lib/Target/LLVMBuild.txt:subdirectories = ARM CellSPU CppBackend Hexagon
MBlaze MSP430 NVPTX Mips Cpu0 PowerPC Sparc X86 XCore
118-165-78-111:test Jonathan$
Next, please copy the Cpu0 example code with the following commands,
118-165-78-111:test Jonathan$ pwd
/Users/Jonathan/llvm/test
118-165-78-111:test Jonathan$ cp -rf lbdex/Cpu0 src/lib/Target/.
118-165-78-111:test Jonathan$ ls src/lib/Target/Cpu0
CMakeLists.txt Cpu0InstrInfo.td Cpu0TargetMachine.cpp TargetInfo
Now, it’s ready for building llvm/test/src code by command cmake as follows.
118-165-78-111:test Jonathan$ pwd
/Users/Jonathan/llvm/test
118-165-78-111:test Jonathan$ ls
src
118-165-78-111:test Jonathan$ mkdir cmake_debug_build
118-165-78-111:test Jonathan$ cd cmake_debug_build/
118-165-78-111:cmake_debug_build Jonathan$ cmake -DCMAKE_CXX_COMPILER=clang++
-DCMAKE_C_COMPILER=clang -DCMAKE_BUILD_TYPE=Debug -DLLVM_TARGETS_TO_BUILD=Cpu0
-G "Xcode" ../src/
-- The C compiler identification is Clang 5.0
-- The CXX compiler identification is Clang 5.0
-- Check for working C compiler using: Xcode
...
-- Targeting Cpu0
...
-- Performing Test SUPPORTS_GLINE_TABLES_ONLY_FLAG
-- Performing Test SUPPORTS_GLINE_TABLES_ONLY_FLAG - Success
-- Performing Test SUPPORTS_NO_C99_EXTENSIONS_FLAG
-- Performing Test SUPPORTS_NO_C99_EXTENSIONS_FLAG - Success
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/Jonathan/llvm/test/cmake_debug_build
118-165-78-111:cmake_debug_build Jonathan$
Now, you can build this llvm with only the Cpu0 backend in Xcode.
On iMac, you can also run cmake and make with cmake -G "Unix Makefiles", the same as on Linux, as described in the
following section.
Since Xcode uses the clang compiler and lldb instead of gcc and gdb, we can debug with lldb as follows,
118-165-65-128:InputFiles Jonathan$ pwd
/Users/Jonathan/lbdex/InputFiles
118-165-65-128:InputFiles Jonathan$ clang -c ch3.cpp -emit-llvm -o ch3.bc
118-165-65-128:InputFiles Jonathan$ /Users/Jonathan/llvm/test/
cmake_debug_build/Debug/bin/llc -march=mips -relocation-model=pic -filetype=asm
ch3.bc -o ch3.mips.s
118-165-65-128:InputFiles Jonathan$ lldb -- /Users/Jonathan/llvm/test/
cmake_debug_build/Debug/bin/llc -march=mips -relocation-model=pic -filetype=
asm ch3.bc -o ch3.mips.s
Current executable set to '/Users/Jonathan/llvm/test/cmake_debug_build/bin/
Debug/llc' (x86_64).
(lldb) b MipsTargetInfo.cpp:19
breakpoint set --file 'MipsTargetInfo.cpp' --line 19
Breakpoint created: 1: file ='MipsTargetInfo.cpp', line = 19, locations = 1
(lldb) run
Process 6058 launched: '/Users/Jonathan/llvm/test/cmake_debug_build/Debug/bin/
llc' (x86_64)
Process 6058 stopped
* thread #1: tid = 0x1c03, 0x000000010077f231 llc`LLVMInitializeMipsTargetInfo
+ 33 at MipsTargetInfo.cpp:20, stop reason = breakpoint 1.1
frame #0: 0x000000010077f231 llc`LLVMInitializeMipsTargetInfo + 33 at
MipsTargetInfo.cpp:20
17
18 extern "C" void LLVMInitializeMipsTargetInfo() {
19 RegisterTarget<Triple::mips,
-> 20 /*HasJIT=*/true> X(TheMipsTarget, "mips", "Mips");
21
22 RegisterTarget<Triple::mipsel,
23 /*HasJIT=*/true> Y(TheMipselTarget, "mipsel", "Mipsel");
(lldb) n
Process 6058 stopped
* thread #1: tid = 0x1c03, 0x000000010077f24f llc`LLVMInitializeMipsTargetInfo
+ 63 at MipsTargetInfo.cpp:23, stop reason = step over
frame #0: 0x000000010077f24f llc`LLVMInitializeMipsTargetInfo + 63 at
MipsTargetInfo.cpp:23
20 /*HasJIT=*/true> X(TheMipsTarget, "mips", "Mips");
21
22 RegisterTarget<Triple::mipsel,
-> 23 /*HasJIT=*/true> Y(TheMipselTarget, "mipsel", "Mipsel");
24
25 RegisterTarget<Triple::mips64,
Download the snapshot version of the Icarus Verilog tool from web site ftp://icarus.com/pub/eda/verilog/snapshots, or go
to http://iverilog.icarus.com/ and click the snapshot version link. Follow the INSTALL file guide to install it.
Download Graphviz from 11 according to your Linux distribution. The file comparison tool Kdiff3 comes from web site 7 .
After cmake, run the command make ; you will then get clang, llc, llvm-as, ..., in cmake_release_build/bin/ after a few
tens of minutes of building. To speed up the make process with SMP, please check your number of cores with the following
command and then run make.
4 http://lldb.llvm.org/lldb-gdb.html
5 http://lldb.llvm.org/
11 http://www.graphviz.org/Download.php
PATH=$PATH:/usr/local/sphinx/bin:~/llvm/release/cmake_release_build/bin:
/opt/mips_linux_toolchain_clang/mips_linux_toolchain/bin:$HOME/.local/bin:
$HOME/bin
export PATH
[Gamma@localhost ~]$ source .bash_profile
[Gamma@localhost ~]$ $PATH
bash: /usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:
/usr/sbin:/usr/local/sphinx/bin:/home/Gamma/.local/bin:/home/Gamma/bin:
/usr/local/sphinx/bin:/home/cschen/llvm/release/cmake_release_build/bin
This book is in the process of being merged into llvm trunk, but this is not finished yet. The merged llvm trunk version on the
lbd GitHub is the LLVM 3.9 release version. The Cpu0 example code is also based on llvm 3.9. So, please install the llvm 3.9
debug version in the same way as the llvm release 3.9 installation, but without clang, since clang would waste build time
when building the Cpu0 backend tutorial code. The steps are as follows,
The details of installing the Cpu0 backend example code are in the following list of steps, and the corresponding
commands are shown below,
1. Enter ~/llvm/test/ and get the Cpu0 example code as well as llvm 3.9.
2. Make dir Cpu0 in src/lib/Target and download the example code.
3. Update the llvm modified source files to support cpu0 with the command cp -rf lbdex/src/modify/src/*
src/. .
4. Check that step 3 took effect with the command grep -R "Cpu0" . | more . We added the Cpu0 backend
support, so check with grep.
5. Copy the Cpu0 backend code with the command cp -rf lbdex/Cpu0 src/lib/Target/. .
6. Remove clang from ~/llvm/test/src/tools/clang, and mkdir test/cmake_debug_build. Otherwise you will waste
extra time in make when building the Cpu0 example code with clang.
/home/cschen/llvm/test
[Gamma@localhost test]$ cp /home/Gamma/Downloads/llvm-3.9.0.src.tar.xz .
[Gamma@localhost test]$ tar -xvf llvm-3.9.0.src.tar.xz
[Gamma@localhost test]$ mv llvm-3.9.0.src src
[Gamma@localhost test]$ cp /Users/Jonathan/Downloads/lbdex.tar.gz .
[Gamma@localhost test]$ tar -zxvf lbdex.tar.gz
...
[Gamma@localhost test]$ cp -rf lbdex/src/modify/src/* src/.
[Gamma@localhost test]$ grep -R "cpu0" src/include
src/include//llvm/ADT/Triple.h: cpu0, // For Tutorial Backend Cpu0
src/include//llvm/MC/MCExpr.h: VK_Cpu0_GPREL,
src/include//llvm/MC/MCExpr.h: VK_Cpu0_GOT_CALL,
...
[Gamma@localhost test]$ cp -rf lbdex/Cpu0 src/lib/Target/.
[Gamma@localhost test]$ ls src/lib/Target/Cpu0
AsmParser Cpu0RegisterInfoGPROutForAsm.td
CMakeLists.txt Cpu0RegisterInfoGPROutForOther.td
...
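Before running cmake, the result of steps 5 and 6 can be checked with a short script. This is only a sketch: the helper name check_cpu0_tree is hypothetical, and it tests nothing beyond the two paths named in the steps and listing above.

```shell
# Hypothetical helper: check that an LLVM source tree is ready for the
# Cpu0 build. $1 is the source directory (src/ in the session above).
check_cpu0_tree() (
  src="$1"
  status=0
  if [ ! -f "$src/lib/Target/Cpu0/CMakeLists.txt" ]; then
    echo "missing: $src/lib/Target/Cpu0/CMakeLists.txt"
    status=1
  fi
  if [ -d "$src/tools/clang" ]; then
    echo "still present: $src/tools/clang (step 6 removes it)"
    status=1
  fi
  [ "$status" -eq 0 ] && echo "ok"
  exit "$status"
)

# Usage (from ~/llvm/test): check_cpu0_tree src
```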
Now, create the directory cmake_debug_build and run cmake just as for the llvm/release build, except that we do a
Debug build with the Cpu0 backend only and use clang as the compiler, as follows,
[Gamma@localhost test]$ pwd
/home/cschen/llvm/test
[Gamma@localhost test]$ mkdir cmake_debug_build
[Gamma@localhost test]$ cd cmake_debug_build/
[Gamma@localhost cmake_debug_build]$ cmake -DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_C_COMPILER=clang -DCMAKE_BUILD_TYPE=Debug -DLLVM_TARGETS_TO_BUILD=Cpu0 \
-G "Unix Makefiles" ../src/
-- The C compiler identification is Clang 3.9.0
-- The CXX compiler identification is Clang 3.9.0
-- Check for working C compiler: /home/cschen/llvm/release/cmake_release_build/bin/clang
-- Check for working C compiler: /home/cschen/llvm/release/cmake_release_build/bin/clang -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /home/cschen/llvm/release/cmake_release_build/bin/clang++
-- Check for working CXX compiler: /home/cschen/llvm/release/cmake_release_build/bin/clang++ -- works
...
-- Targeting Mips
-- Targeting Cpu0
...
-- Configuring done
-- Generating done
-- Build files have been written to: /home/cschen/llvm/test/cmake_debug_build
[Gamma@localhost cmake_debug_build]$
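cmake records the value passed with -DLLVM_TARGETS_TO_BUILD in CMakeCache.txt inside the build directory. The following sketch reads it back to confirm which targets were configured; the helper name configured_targets is hypothetical.

```shell
# Hypothetical helper: print the LLVM_TARGETS_TO_BUILD value recorded
# in a CMakeCache.txt file ($1). Cache entries have the form
# NAME:TYPE=VALUE, so the "NAME:TYPE=" prefix is stripped.
configured_targets() {
  sed -n 's/^LLVM_TARGETS_TO_BUILD:[A-Z]*=//p' "$1"
}

# Usage (from cmake_debug_build/): configured_targets CMakeCache.txt
```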
Breakpoint 1, LLVMInitializeMipsTargetInfo ()
at /home/cschen/llvm/test/src/lib/Target/Mips/TargetInfo/MipsTargetInfo.cpp:20
20 /*HasJIT=*/true> X(TheMipsTarget, "mips", "Mips");
(gdb) next
23 /*HasJIT=*/true> Y(TheMipselTarget, "mipsel", "Mipsel");
(gdb) print X
$1 = {<No data fields>}
(gdb) quit
A debugging session is active.
Quit anyway? (y or n) y
[Gamma@localhost InputFiles]$
CHAPTER
FIFTEEN
• Cpu0 document
– Install sphinx
– Install pip and update Sphinx version
– Generate Cpu0 document
– About Cpu0 document
• Cpu0 Regression Test
LLVM and this book use Sphinx to generate html documents. This book also uses Sphinx to generate the pdf and epub
formats. Sphinx uses the reStructuredText format; see 3 4 5 . For the installation of Sphinx, see 1 . For the
code-block directive used in this document, see 6 7 .
On iMac or Linux, you can install Sphinx as follows,
The above installation can generate html documents but not pdf. To generate pdf/latex documents, install the following,
On iMac, install MacTex.pkg from here 2 .
On Linux, install texlive as follows,
3 http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html
4 http://docutils.sourceforge.net/docs/ref/rst/directives.html
5 http://docutils.sourceforge.net/rst.html
1 http://docs.geoserver.org/latest/en/docguide/install.html
6 http://llvm.org/docs/SphinxQuickstartTemplate.html
7 http://pygments.org/docs/lexers/
2 http://www.tug.org/mactex/
or
sudo yum install texlive texlive-latex-extra
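The first of the two Linux commands is not preserved in this copy. As a sketch only, the hypothetical helper below prints an install command for whichever package manager is present. The yum form is the one shown above; the apt-get form is an assumption for Debian-style systems, where packages with the same names exist.

```shell
# Hypothetical helper: print a TeX Live install command for the package
# manager named in $1, or for the first supported one found on this machine.
texlive_install_cmd() (
  pm="${1:-}"
  if [ -z "$pm" ]; then
    if command -v apt-get >/dev/null 2>&1; then
      pm=apt-get          # assumption: Debian/Ubuntu
    elif command -v yum >/dev/null 2>&1; then
      pm=yum              # as shown in the text
    fi
  fi
  case "$pm" in
    apt-get|yum) echo "sudo $pm install texlive texlive-latex-extra" ;;
    *) echo "no supported package manager found"; exit 1 ;;
  esac
)

# Print (do not run) the suggested command for this machine.
texlive_install_cmd || true
```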
On Fedora 17, the texlive-latex-extra package is missing. We install a package that includes pdflatex instead. For
instance, we install pdfjam on Fedora 17 as follows,
[root@localhost lbd]$ yum list pdfjam
Loaded plugins: langpacks, presto, refresh-packagekit
Installed Packages
pdfjam.noarch 2.08-3.fc17 @fedora
[root@localhost lbd]$
After upgrading the iMac to OS X 10.11.1, the pdflatex link is missing. Fix it by setting PATH in .profile as follows,
114-37-153-62:lbd Jonathan$ ls /usr/local/texlive/2012/bin/universal-darwin/pdflatex
/usr/local/texlive/2012/bin/universal-darwin/pdflatex
114-37-153-62:lbd Jonathan$ cat ~/.profile
export PATH=$PATH:...:/usr/local/texlive/2012/bin/universal-darwin
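After editing PATH, it is worth confirming that the document tools can be found. A minimal sketch; the helper name report_tools is hypothetical, and sphinx-build is the Sphinx command-line program:

```shell
# Hypothetical helper: report whether each named tool is on PATH.
report_tools() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found"
    else
      echo "$tool: missing"
    fi
  done
}

# html output needs sphinx-build; pdf output also needs pdflatex.
report_tools sphinx-build pdflatex
```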
Exception occurred:
File "/Library/Python/2.7/site-packages/sphinx/ext/intersphinx.py", line 148,
in _strip_basic_auth
url_parts = parse.urlsplit(url)
AttributeError: 'Module_six_moves_urllib_parse' object has no attribute 'urlsplit'
The full traceback has been saved in /var/folders/rf/
8bgdgt9d6vgf5sn8h8_zycd00000gn/T/sphinx-err-HgctP4.log, if you want to report
The Cpu0 example code is added chapter by chapter. It can be configured for a specific chapter by changing the CH
definition in Cpu0SetChapter.h. For example, the following definition configures it for chapter 2.
lbdex/Cpu0/Cpu0SetChapter.h
#define CH CH2
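Switching chapters by hand means editing that one line. The sketch below automates the edit; the helper name set_chapter is hypothetical, and it assumes the header contains exactly one line of the form "#define CH <name>".

```shell
# Hypothetical helper: rewrite the "#define CH ..." line of a header.
# $1 = path to Cpu0SetChapter.h, $2 = chapter macro name such as CH2.
set_chapter() (
  file="$1"
  chapter="$2"
  tmp=$(mktemp) || exit 1
  # Match "#define CH" followed by whitespace, so that definitions of the
  # chapter macros themselves (for example "#define CH2 ...") are untouched.
  sed "s/^#define CH[[:space:]].*/#define CH $chapter/" "$file" > "$tmp" || exit 1
  cat "$tmp" > "$file"    # overwrite in place, keeping the file's permissions
  rm -f "$tmp"
)

# Usage: set_chapter lbdex/Cpu0/Cpu0SetChapter.h CH2
```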
To help readers understand the backend structure step by step, the Cpu0 example code can be generated chapter by
chapter through commands as follows,
Besides the example code of each chapter, the html and pdf Cpu0 documents above also include the *.ll and *.s files in
lbd/lbdex/output.
Since llvm has a new release about every 6 months, and the name of any file, function, class, or variable may change,
maintaining the Cpu0 document takes effort because it adds the code chapter by chapter. To keep the document correct
and easy to maintain, I use the ":start-after:" and ":end-before:" options of the reStructuredText format to keep the
document up to date. For every new release, when the Cpu0 backend code is changed, the document reflects the changes
in most of its contents.
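The effect of these two options (accepted, for example, by Sphinx's literalinclude directive) can be imitated in a few lines of shell: print only the lines after the first line containing one marker and before the next line containing another. This is an illustration of the idea, not how Sphinx is implemented; the helper name extract_between and the markers in the usage line are hypothetical.

```shell
# Hypothetical helper: print the lines of file $1 that lie after the first
# line containing marker $2 and before the next line containing marker $3.
# Markers are matched as fixed strings; avoid backslashes in them, because
# awk -v interprets escape sequences.
extract_between() {
  awk -v s="$2" -v e="$3" '
    on && index($0, e) { exit }     # stop at the end marker
    on                 { print }    # inside the region: copy the line
    index($0, s)       { on = 1 }   # start marker seen: begin with the next line
  ' "$1"
}

# Usage: extract_between Cpu0.td "//@begin" "//@end"
```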
In lbdex/Cpu0, the text beginning with “//@” and “#ifdef CH > CHxx” is referenced by the document files *.rst.
In lbdex/src/modify/src, the *.rst files reference the code by copying it directly. Most of these references are in
llvmstructure.rst and elf.rst.
The example C/C++ code in lbdex/input comes from my own ideas and from the directory clang/test/CodeGen of the clang
source code release.
The last chapter can verify the code generated by the Cpu0 backend with a Verilog simulator, for code without global
variable access. The lld chapter at https://github.com/Jonathan2251/lbt.git will include an llvm ELF linker
implementation and can verify the test items that include global variable access. Besides these, LLVM has test cases
(regression tests) for each backend to verify the code generation 8 . The Cpu0 regression test items are in the
lbdex.tar.gz example code. Untar it to lbdex/, and:
For both iMac and Linux, copy lbdex/regression-test/Cpu0 to ~/llvm/test/src/test/CodeGen/Cpu0.
Then run as follows for a single test case and for all test cases on iMac.
Run as follows for a single test case and for all test cases on Linux.
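The run commands for the two platforms are not preserved in this copy. As a sketch, assuming the llvm-lit driver that an LLVM cmake build places in its bin/ directory, a single test case or the whole directory can be run with a small wrapper. The helper name run_cpu0_tests is hypothetical.

```shell
# Hypothetical helper: run one regression test, or the whole directory.
# $1 = test driver (for example the build's bin/llvm-lit)
# $2 = test directory
# $3 = optional single test file inside that directory
run_cpu0_tests() {
  if [ -n "${3:-}" ]; then
    "$1" "$2/$3"      # single test case
  else
    "$1" "$2"         # every test case in the directory
  fi
}

# Dry run: "echo" stands in for the driver and just prints the test path.
run_cpu0_tests echo "$HOME/llvm/test/src/test/CodeGen/Cpu0" seteq.ll
```

To run the tests for real, pass the path of llvm-lit in the debug build (under ~/llvm/test/cmake_debug_build/bin/ in the layout used in this appendix) as the first argument.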
The chapters of this book and the related regression test items are listed as follows,
8 http://llvm.org/docs/TestingGuide.html
inlineasmmemop.ll   v   11
selgt.ll            v   8
selle.ll            v   8
selltk.ll           v   8
selne.ll            v   8
selnek.ll           v   8
seteq.ll            v   8
seteqz.ll           v   8
setge.ll            v   8
setgek.ll           v   8
setle.ll            v   8
setlt.ll            v   8
setltk.ll           v   8
setne.ll            v   8
setuge.ll           v   8
setugt.ll           v   8
setult.ll           v   8
setultk.ll          v   8
These supported test cases are in lbdex/regression-test/Cpu0, which can be obtained with tar -xf lbdex.tar.gz.
SIXTEEN
TODO LIST
Todo
Fix centering for figure captions.
SEVENTEEN
EIGHTEEN
ALTERNATE FORMATS
NINETEEN
PRESENTATION FILES
TWENTY
• search