UM1237
User manual
STxP70 compiler
Overview
The STxP70 compilation driver (stxp70cc) manages the stages of the
compilation process: preprocessing, compiling into assembly language, assembling and
linking. The assembly file is assembled using stxp70-as and linked using stxp70-ld to
provide an STxP70 binary image. All these phases are hidden by the driver tool
stxp70cc.
This user manual provides detailed information to enable users to write efficient code
optimized to run on the STxP70 processors and to compile and link it ready for execution by
sxrun. The manual covers:
• stxp70cc driver options
• pragmas supported by stxp70cc
• compiler optimization techniques
• GNU C language extensions
• GNU asm construct
• built-in functions
The load/run tool sxrun and the STxP70 debugger sxgdb are described in the STxP70
Professional Toolset user manual (7833754).
Contents
Overview
Preface
Documentation suite
Conventions used in this guide
2 stxp70cc
2.1 Invoking the compiler
2.1.1 Input and output
2.2 Command-line options
2.2.1 Getting help
2.2.2 Overall options
2.2.3 stxp70cc core selection option
2.2.4 stxp70cc compiler generic options
2.2.5 C preprocessor options
2.2.6 C dialect options
2.2.7 Warning options
2.2.8 Debugging options
2.2.9 Profiling options
2.2.10 Code coverage options
2.2.11 Call trace instrumentation options
2.2.12 Optimization options
2.2.13 Code generation options
2.2.14 -OPT options
2.2.15 Inlining options
2.2.16 Interprocedural analysis
2.2.17 Position independent code generation (PIC)
2.2.18 Sending options to a specific phase
3 Pragmas
3.1 Pragmas short description and syntax
3.2 Loop optimization pragmas
3.2.1 #pragma unroll (n)
3.2.2 #pragma ivdep
3.2.3 #pragma loopdep
3.2.4 #pragma loopmod
3.2.5 #pragma looptrip (n)
3.2.6 #pragma hwloop
3.2.7 #pragma loopmin<itercount> (minc)
3.2.8 #pragma loopmax<itercount> (maxc)
3.2.9 Code generation pragmas
3.2.10 Heuristic pragmas
3.3 Miscellaneous pragmas
3.3.1 #pragma ident “string”
3.3.2 #pragma weak
3.3.3 #pragma disable_extgen ( fct1, fct2, ... )
3.3.4 #pragma force_extgen ( fct1, fct2, ... )
3.3.5 #pragma disable_specific_extgen ( extname[, fct1, fct2, ... ] )
3.3.6 #pragma force_specific_extgen ( extname[, fct1, fct2, ... ] )
4 Optimization guide
4.1 Inlining
4.1.1 Single file inlining
4.1.2 stxp70cc inlining options
4.1.3 Extern inline functions
4.1.4 Inlining pragmas
4.2 Loop unrolling
4.2.1 Default unrolling policy
4.2.2 Advanced control of the unroller
4.2.3 Precedence rules
Preface
This document is part of the documentation suite detailed below. Comments on this or other
manuals in the documentation suite should be made by contacting your local
STMicroelectronics sales office or distributor.
Documentation suite
STxP70 compiler user manual (8027948)
This manual describes the C compiler for STMicroelectronics STxP70 cores.
Software notation
Syntax definitions are presented in a modified Backus-Naur Form (BNF).
• Terminal strings of the language, that is those not built up by rules of the language, are
printed in teletype font. For example, void.
• Non-terminal strings of the language, that is those built up by rules of the language, are
printed in italic teletype font. For example, name.
• If a non-terminal string of the language starts with a non-italicized part, it is equivalent
to the same non-terminal string without that non-italicized part. For example,
vspace-name.
• Each phrase definition is built up using a double colon and an equals sign to separate
the two sides (‘::=’).
• Alternatives are separated by vertical bars (‘|’).
• Optional sequences are enclosed in square brackets (‘[’ and ‘]’).
• Items which may be repeated appear in braces (‘{’ and ‘}’).
The purpose of the stxp70cc compilation driver is to translate a program written in the C
language into the STxP70 assembly language so that it is suitable for assembly, linking, and
execution. The assembly file is assembled using stxp70-as and linked using stxp70-ld(a) to
provide an STxP70 binary image. All these phases are hidden by the driver tool
stxp70cc.
Note: The stxp70cc compilation driver and core compiler are common to both STxP70 versions 3
and 4. A specific command line and GUI option can be used to generate code for either
target. See Section 1.2.2: Compiling code for STxP70-3 or STxP70-4 on page 14.
The stxp70cc compiler uses the GNU C language parser, and implements state-of-the-art
compiler optimizations. Thanks to this GNU C language parser, the stxp70cc compiler is
closely compatible with the GNU C compiler, both at the driver level, and on C language
extensions (GNU Compiler Collection project; see
http://www.gnu.org/software/gcc/gcc.html). The processor-independent compiler
optimizations available in the stxp70cc compiler are mostly inherited from the Open64
project hosted on SourceForge; see http://open64.sourceforge.net. Other compiler
optimizations that are specific to the STxP70 family of processors have been developed by
STMicroelectronics.
These include:
• use of hardware loop mechanisms of the STxP70 core (hardware loops and
JRGTUDEC instructions)
• use of the special addressing modes of the STxP70 core
• use of the memory space defined in the STxP70 ABI in order to increase memory
access efficiency
• aggressive instruction selection including mapping of the user boolean variables to the
branch registers
• instruction scheduling
• aggressive transformation of loops
• compiler intrinsics and built-ins support
• compiler support for the X3, FPx and MPx extensions
The binary image can be executed on an STxP70 hardware target or by using the sxrun
simulator or the sxgdb debugger. The binary format used for the image is ELF and the
debug format is DWARF2.
Where applicable, the available options are accessible through a command-line interface
similar to the UNIX style. This will be familiar to most gcc and cc users. The toolset is
installed in a directory structure which also follows the UNIX structure, that is bin and lib.
Wherever possible, compatibility with the options of the former sxcc compiler has been
preserved.
The compiler supports the ANSI C89 standard and partially supports the ANSI C99
standard, see Section 2.4: C99 support on page 41.
a. For usage information see the GNU linker document “Using ld” that is supplied with the toolset.
[Figure: compilation flow from .c source files through the STxP70 C compiler, STxP70
assembler files (.s), the STxP70 assembler (stxp70-as) and the STxP70 linker (stxp70-ld),
with target board boot and sysconf files]
This assumes that the user has sourced the appropriate shell file in the
<tools-dir>/bin folder. In most cases, the one needed is STxP70.csh. This ensures that all
needed configuration environment variables are properly set.
Command [1] causes the following steps to be executed:
<tools-dir>/bin/stxp70cc # stxp70cc driver
<tools-dir>/lib/cpp <cpp_flags> file1.c file1.i # C preprocessor
<tools-dir>/lib/cmplrs/<C compiler> <C Compiler flags> file1.i file1.s # C compiler
<tools-dir>/bin/stxp70-as <stxp70-as_flags> file1.s file1.o # STxP70 Assembler
Command [3] causes the link stage to be executed. Please refer to the STxP70 linker user
manual for further details.
Once steps [1] to [3] are completed, an STxP70 executable binary a.elf is generated. This
can be executed using the stand-alone driver for the load/run tool (available as sxrun) in the
following way:
$[4] sxrun a.elf
This causes the a.elf STxP70 binary to be “interpreted” by the sxrun command. The
simulator also provides some minimal tracing, cycle counting and statistics facilities.
2 stxp70cc
The stxp70cc compiler is similar to any command-line compiler. It is either invoked from a
command line interpreter or from a Makefile and implicitly recognizes files by their
extension.
where:
<argument> = <option> | <input_file>
Examples:
stxp70cc -S file.c # produces file.s
stxp70cc -c file.c # produces file.o
Conflicting options are resolved by using the last option on the command line.
The final executable file does not need to have a specific file extension. If no output file
name is specified through the -o option, the executable generated is named a.out.
Examples:
stxp70cc file.c # generates the executable a.out
stxp70cc file.c -o file.u # generates the executable file.u
stxp70v3 Assembly, object and binary files are generated for single issue, fixed
length encoding STxP70-3
stxp70v4 Assembly, object and binary files are generated for single/dual issue,
variable length encoding STxP70-4
Note: The set of options that must/can be set is strongly dependent on the core selected. This is
especially true for the configuration and code generation options presented in the tables of
the next section. Namely, the STxP70-4 can be configured for single or dual issue, as well
as single or dual ALU. Each of those choices corresponds to specific compiler options.
config[=context:<n>|regbank:<n>|mult:<n>|bypass:<n>|bhb:<n>|efuif:<n>|mfuif:<n>|
extmemif:<n>|itcnodes:<n>|noevc|evcglobal:<n>|evclocal:<n>|hwloop:<n>|dcache:<n>|
dmsize:<n>|pcache:<n>|pmsize:<n>|pixel:<n>|pixelsize:<n>|rompatch:<n>|maxszmis:<n>|
minadmis:<n>|vliw:<n>]
Defines the processor configuration. Further information on these controls can be found in
Code generation and configuration controls on page 19. The assembler performs some
consistency checking based on this configuration option.
The last option (vliw) is only available on STxP70-4.
It is possible to combine several suboptions in a single -Mconfig option bundle. In this
case, suboptions must be separated by a “,”. For instance: -Mconfig=vliw:no,noevc
da[={<n>|all}] Places certain data items in the data area (GP-based on 32 Kbytes)
sda[={<n>|all}] Places certain data items in the small data area (GP-based on 4 K-objects)
tda[={<n>|all}] Places certain data items in the low memory or tiny data area (32-Kbyte
size)
enablefractgen Deprecated, and replaced by extoption. Allows the compiler to
generate fractional instructions of the MPx. Refer to Chapter 8: MPx
native support on page 122.
The control for the options listed in Table 6 can be found in Code generation and
configuration controls and Environment controls on page 25.
By default, -Mconfig enables four contexts, two register banks, multiplier, no memory
bypass, no branch history buffer (BHB), 32-bit EFU interface, 32-bit MFU interface,
32-bit external memory interface, eight ITC nodes, EVC with 16 global and 16 local
events, two hardware loops for all contexts, 4 Mbytes data memory, no data cache,
4 Mbytes program memory, no pcache, no pixel support, no ROM patch support,
no misaligned memory access and single issue architecture.
-Mda[={ <n> | all }]
Place data objects of aggregate alignment <= n bytes in the region of memory called
the medium data area (DA). It is possible to generate optimized (that is, shorter)
addresses for data in the medium data area. (GP-based addugp is used instead of
make and more.)
The parameter n can be one of (1, 2, 4, 8). Specifying all eliminates the size
constraint. -Mda is equivalent to -Mda=all.
Notice that -Mda options are ignored if IPA memory placement is enabled. Refer to
Section 4.8: Interprocedural analysis optimization (IPA) on page 76 for further details.
-Msda[={ <n> | all }]
Place data objects of aggregate alignment <= n bytes in the region of memory called
the small data area (SDA). It is possible to generate optimized (that is, shorter)
addresses for data in the small data area. (GP-based addressing mode can be used,
thus constructing the address and performing the access itself in the same instruction.)
The parameter n can be one of (1, 2, 4, 8). Specifying all eliminates the size
constraint. -Msda is equivalent to -Msda=all.
In the case of a structure that contains fields of different types, the decision of where to
place the variable depends on the alignment of the largest data type, whereas the
choice of the section to be used depends on the size of the smallest field. This means
that a structure with both int and char fields is placed if the option is either -Msda=all
or -Msda=4. If placement is achieved, then the structure is placed in SDA1.
Notice that -Msda options are ignored if IPA memory placement is enabled. Please
refer to Section 4.8: Interprocedural analysis optimization (IPA) on page 76 for further
details.
-Mextrcdir=directory_path
Specifies where to find a particular extension package, which may be a location outside
the user workspace. The -Mextrcdir option enables the user to switch between
different extensions, stored in different locations. Full directory paths are recommended
but are not mandatory.
The directory path specified to -Mextrcdir must include the sub-directory
_STxP70-Extension_ where the stxp70extrc file is located. (This is the directory/file
structure used by sximport when the extension is imported. sximport
creates/updates an extension configuration file called stxp70extrc and puts it in the
subdirectory _STxP70-Extension_. stxp70extrc indicates where different files
relating to the extension are located, for example header files, libraries.)
For example:
stxp70cc -Mextension=MP1x -Mextrcdir=My_Extrcdir/_STxP70-Extension_
This command sets the directory path to find the extension package in
My_Extrcdir/_STxP70-Extension_.
The compiler checks that the location specified by -Mextrcdir contains the file
stxp70extrc.
If the -Mextrcdir option is not specified, the {SX}/sxext/_STxP70-Extension_
directory is used by default.
The STxP70 Utilities manual (8210925) documents several utilities that interact with
the extension package, for example sximport, stxp70-elfdump, stxp70-objcopy.
The STxP70 User-defined extension methodology guide (8175272), “How to integrate
an Extension in an application” chapter, gives further information about extension
libraries.
-Mfarcall
Specify that all calls are far. The compiler generates a calling sequence composed of a
make/more/calla sequence instead of callr.
-Mhwloop[=option]
Controls hardware loop code generation. The default, (-Mhwloop specified with no
suboptions), is equivalent to:
• -Mhwloop=all if core configuration includes hardware loops
• -Mhwloop=jrgtudeconly if core configuration does not include hardware loops
option can be any of the values listed in Table 8.
-Mnoextgen[=ext1,ext2,...]
Disables code generation for the specified extensions. This option only has an effect when
MPx extensions are used. It has no effect with fpx.
Environment controls
The environment controls are listed below.
-Mlib16 Instructs the compiler to link with a version of the C library that uses
16 registers of the core. This is the default behavior when using
16-register contexts.
-Mlib32 Instructs the compiler to link with a version of the C library that uses
the 32-register set of the core.
-Mnostartup Instructs the linker not to use the standard boot.o file at link time. It is
then the user’s responsibility to provide a boot object file at link time.
All the options in Table 12 give the positive form of the option. The negative form of each
option can be constructed by replacing the -W prefix with a -Wno prefix, for example
-Wnoformat disables the printing of warning messages associated with calls to the printf
and scanf family of library functions.
Note: The online help and “man” page of the stxp70cc driver lists the full set of possible warning
options.
-O0 No optimization.
-O1 Minimal optimization.
-Os Optimize for code size.
-O2 Global optimization, speed orientated.
-O3 Aggressive optimization, speed orientated.
-O4 Aggressive optimization, speed orientated. Enables aggressive loop
unrolling when compiling code for STxP70-4 in dual issue/dual ALU core configuration.
-falign-jumps
-falign-jumps=n
Align the target address of jumps to the next power of two greater than n (if n is
specified), skipping up to n bytes.
-falign-labels
-falign-labels=n
Align the labels to the next power of two greater than n (if n is specified), skipping up
to n bytes.
-falign-instructions
-falign-instructions=n
Align the instructions to the next power of two greater than n (if n is specified),
skipping up to n bytes.
-ffast-math
Defines the preprocessor macro __FAST_MATH__ and invokes -f[no-]math-errno.
-f[no-]math-errno
-fmath-errno causes the compiler to generate code to set the mathematical error flag in
floating point code. The compiler also makes use of the slower libm from Newlib, which
sets errno. This is the default behavior when the FPx floating point extension is not used.
-fno-math-errno causes the compiler not to generate code to set the mathematical error
flag in floating point code. The compiler also makes use of fast libm overrides, for
example sqrtf from the FLIP library with no errno setting. This is the default behavior
when the FPx floating point extension is used.
-mreassoc=0
No re-associations, folding or simplifications. This is the default.
-mreassoc=1
Accurate simplifications that are correct for finite arithmetic are allowed, for instance,
a/a -> 1.0, recip(recip(a)) -> a.
For example, the transformation a/a -> 1.0 is not valid when a is 0.0 because in this
case 0.0/0.0 -> NaN.
-mreassoc=2
Aggressive re-association of expressions is performed to favor the selection of fused
multiply-add routines. Such changes in the evaluation order can lead to slightly
different results, compared to the original evaluation order.
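The a/a case mentioned for -mreassoc=1 can be checked with a few lines of standard C. This is a minimal sketch, not compiler output: the function names are hypothetical, and ratio_simplified only models by hand what the simplification a/a -> 1.0 would produce.

```c
#include <math.h>

/* Evaluates a/a as written in the source. */
double ratio(double a)
{
    return a / a;
}

/* Hand-written model of the simplified form a/a -> 1.0. */
double ratio_simplified(double a)
{
    (void)a; /* the operand no longer takes part in the result */
    return 1.0;
}
```

For finite, non-zero operands both functions return 1.0. For a equal to 0.0 the first function yields NaN under IEEE 754 arithmetic while the second still returns 1.0, which is the difference the table entry warns about.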
-fpic
Generate position independent code (data accesses only). See Chapter 9: Relocatable
loader library on page 136.
--rlib
Build a relocatable library that can be loaded by RL_LIB. See Chapter 9: Relocatable
loader library on page 136.
--rmain
Build a main program suitable for loading relocatable libraries. See Chapter 9:
Relocatable loader library on page 136.
-maggressive_unroll=n
Modify the aggressiveness of the default unrolling policy. n is a value in the range
[0, 6]. The higher it is, the more aggressive the unrolling. Refer to Section 4.2: Loop
unrolling on page 63 for details about this option and the values of n.
p Preprocessor cpp
f Compiler front-end
a Assembler stxp70-as
l Linker stxp70-ld
o Binary optimizer tool binopt - not yet used by stxp70cc
There must be a comma between the option -W<phase> and the argument and no spaces.
Anything occurring after a space is treated as the next option to stxp70cc. Also the
argument is only passed to <phase> if <phase> is normally run from the specified
command.
For example:
stxp70cc -O3 -Wl,-strict_warn a.out
This command causes the linker to emit strict warnings regarding link files.
-Idirectory Add directory to the beginning of the search list for include files.
-nostdinc No predefined include search path.
-l<library> Search the library named lib<library>.a when linking. The linker looks
for the library in the directories specified by the -L options and then in a
standard list of directories.
The position of this option on the command line makes a difference. The
linker processes object files and libraries in the order that they are specified
on the command line. For example, if the following is specified:
stxp70cc file1.o file2.o -lmylib
then the files are processed in the order file1.o, file2.o, libmylib.a.
However, if the following is specified:
stxp70cc file1.o -lmylib file2.o
then the files are processed in the order file1.o, libmylib.a, file2.o.
In this case, file2.o should not refer to any symbols defined in
libmylib.a.
-L<directory> Add directory to the beginning of the search list for library files.
-nostdlib No predefined libraries search path.
The search path for the various phases of the compiler can be overridden by using the
option: -Y<phase>,<path> where <phase> can take the values listed in Table 16 and
<path> is the path of the required tool. There must be a comma and no spaces separating
-Y<phase> and <path>.
Macro | Value | Description | Related option
__open64__ | Defined | Compiler technology identification |
__GNUC__ | 3 | Front end major release identification | -no-gcc
__GNUC_MINOR__ | 3 | Front end minor release identification | -no-gcc
__stxp70cc__ | Defined, value depends on major compiler version | Compiler identification |
__STXP70CC_MINOR__ | Defined, value depends on minor compiler version | Compiler identification |
__STXP70CC_PATCHLEVEL__ | Defined, value depends on compiler patch level | Compiler identification |
__STXP70CC_DATE__ | Defined, value depends on compiler release date | Compiler identification |
__STXP70CC_VERSION__ | Defined, value is an identification string | Compiler identification |
__LITTLE_ENDIAN__ | Defined by default | Endianness identification |
_LANGUAGE_C | Defined for C source | Language currently processed is C language. |
_LANGUAGE_ASSEMBLY | Defined for ASM source | Language currently processed is assembly language. |
__STRICT_ANSI__ | Defined when -std=c89 or -std=c99 or -ansi | Compiler is in strict ansi mode. | -std
__STDC_VERSION__ | Defined when -std=c99 with value 199901L | Compiler is in C99 ansi mode | -std
__OPTIMIZE__ | Defined as soon as optimization is on. | Optimization mode detection. | -O
__OPTIMIZE_SIZE__ | Defined under -Os | Optimization size detection | -Os
__INLINE_INTRINSICS | Defined | Intrinsics inlining mode detection. | -OPT:inline_intrinsics
__STDC_HOSTED__ | Defined by default. | Hosting mode. | -f[no-]hosted -f[no-]freestanding
__FAST_MATH__ | Defined when -ffast-math option is used. | Libraries or user code can take advantage of this definition to define alternative sequences of floating point code. | -ffast-math
Note: The C standard guarantees that the __cplusplus symbol is never defined when compiling
C source code.
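Predefined macros such as these are normally tested with #ifdef so that the same source adapts to the compilation mode. The following is a minimal sketch using only macro names listed in this section; the function names are hypothetical and the returned values are an arbitrary choice made for illustration.

```c
/* Returns 1 when the file was compiled with optimization on. */
int built_with_optimization(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}

/* Returns the C standard version, or 0 when __STDC_VERSION__ is not defined. */
long language_level(void)
{
#ifdef __STDC_VERSION__
    return __STDC_VERSION__;
#else
    return 0L;
#endif
}

/* Identifies the compiler through its identification macro. */
const char *compiler_name(void)
{
#ifdef __stxp70cc__
    return "stxp70cc";
#else
    return "other";
#endif
}
```

With another compiler the last function simply reports "other", so the file remains portable.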
3 Pragmas
This chapter provides details of the #pragma directives that are recognized by stxp70cc.
If the loop that this pragma immediately precedes is an outer loop that contains only an
inner loop, then the compiler attempts to unroll the outer loop and perform loop fusion on the
resulting inner loops. This transformation, known as “unroll-and-jam”, is especially useful to
create parallel execution opportunities when the innermost loop alone does not present
such opportunities. See Figure 3.
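The transformation can be illustrated by hand in plain C. The sketch below is not compiler output: the function names, the array sizes and the unroll factor of 2 are assumptions chosen for illustration. matvec has an outer loop that contains only an inner loop; matvec_jammed shows the shape the loop nest might take once the outer loop is unrolled twice and the two resulting inner loops are fused.

```c
#include <assert.h>

#define ROWS 4 /* hypothetical sizes, chosen for illustration */
#define COLS 8

/* Original form: the outer loop contains only the inner loop.
   The caller is expected to clear c[] beforehand. */
void matvec(int a[ROWS][COLS], const int b[COLS], int c[ROWS])
{
    int i, j;

    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            c[i] += a[i][j] * b[j];
}

/* Hand-written unroll-and-jam with an assumed factor of 2. */
void matvec_jammed(int a[ROWS][COLS], const int b[COLS], int c[ROWS])
{
    int i, j;

    assert(ROWS % 2 == 0); /* no residual iteration is handled here */
    for (i = 0; i < ROWS; i += 2)
        for (j = 0; j < COLS; j++) {
            /* two independent accumulations per inner iteration */
            c[i]     += a[i][j]     * b[j];
            c[i + 1] += a[i + 1][j] * b[j];
        }
}
```

The two statements in the fused inner loop do not depend on each other, which is where the additional parallel execution opportunities come from.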
The following tips provide information on how to control the desired inner loop unrolling with
the pragma unroll value.
• A counted loop with a compile-time constant trip count is always fully unrolled if a
pragma unroll with a value greater than or equal to the loop trip count is specified.
• When a counted loop is not fully unrolled, the pragma unroll value is rounded to the
greatest power of two lower than the specified unrolling value.
• The maximum size of a loop after unrolling is controlled by the command line option
-OPT:unroll_size=<n>.
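The first two tips can be illustrated with a short sketch. The function names and unroll values are hypothetical, and the #ifdef guard on __stxp70cc__ is an assumption made here so that other compilers do not see the pragma.

```c
#define N 8 /* compile-time constant trip count (hypothetical) */

/* The trip count is the constant 8 and the requested value is 8, so by
   the first tip this loop would be fully unrolled. */
int dot_fixed(const int *x, const int *y)
{
    int i, acc = 0;

#ifdef __stxp70cc__
#pragma unroll (8)
#endif
    for (i = 0; i < N; i++)
        acc += x[i] * y[i];
    return acc;
}

/* The trip count n is unknown at compile time, so the loop is not fully
   unrolled; by the second tip a requested value of 6 would be rounded
   down to 4. */
int dot_var(const int *x, const int *y, int n)
{
    int i, acc = 0;

#ifdef __stxp70cc__
#pragma unroll (6)
#endif
    for (i = 0; i < n; i++)
        acc += x[i] * y[i];
    return acc;
}
```

The pragma changes only how the loop is compiled; both functions return the same value with or without it.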
For example:
#pragma ivdep
for (i = 0; i < n; i++) {
a[b[i]] = a[b[i]]+3; // These dependencies cannot be computed by
// the compiler
}
In this example, the compiler cannot tell that either the load or store of a[b[i]] in the
current loop iteration does not depend on the load or store of a[b[i]] in a following loop
iteration. This is in fact the case if b[i] != b[j] for all i != j. The compiler could
rewrite the loop as:
for (i = 0; i < n; i+=2) {
t1 = a[b[i+1]] + 3;
t0 = a[b[i]] + 3;
a[b[i+1]] = t1;
a[b[i]] = t0;
}
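Why the assertion matters can be seen by running both forms of the loop by hand. In this sketch the function names are hypothetical, add3_reordered is a hand-written copy of the rewritten loop (it assumes n is even), and the #ifdef guard on __stxp70cc__ is an assumption so that other compilers do not see the pragma.

```c
/* Loop as written in the source, asserting independent iterations. */
void add3(int *a, const int *b, int n)
{
    int i;

#ifdef __stxp70cc__
#pragma ivdep
#endif
    for (i = 0; i < n; i++)
        a[b[i]] = a[b[i]] + 3;
}

/* Hand-written copy of the reordered loop; n is assumed to be even. */
void add3_reordered(int *a, const int *b, int n)
{
    int i, t0, t1;

    for (i = 0; i < n; i += 2) {
        t1 = a[b[i + 1]] + 3; /* both loads happen before any store */
        t0 = a[b[i]] + 3;
        a[b[i + 1]] = t1;
        a[b[i]] = t0;
    }
}
```

When b holds distinct indices, both functions produce the same array. When two entries of b are equal, for example b = {0, 0}, the original loop adds 3 twice to a[0] while the reordered loop adds it only once, so the pragma must not be used in that case.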
statically determine the trip count. When it cannot determine the trip count, the compiler
must also create residual code in case the unrolling factor is not a divisor of the loop trip
count.
However, it is possible for application writers to know the modular properties of some of the
loops in their own code. Bringing this accurate information to the compiler, the residual code
can be largely removed or better optimized.
Note: Providing incorrect information on the trip count may lead to incorrect code. Be careful that the
property asserted is valid in all cases.
The following example shows the use of the #pragma loopmod.
#include <assert.h>

void copychar(unsigned char* __restrict p, unsigned char * q,
              unsigned int sz)
{
    unsigned int i ; /* unsigned, to match the type of sz */
    assert(sz % 4 == 0) ;
#pragma loopmod(4,0)
    for(i=0; i<sz; i++)
        p[i] = q[i];
}
The function copychar duplicates a byte stream, whose size must be a multiple of 4.
During unrolling, and without the pragma, the compiler would create a residual loop. This is
totally removed when the pragma information is asserted. In this example, the pragma does
not provide the compiler with any information about the memory alignment of p or q, which
the compiler would need to generate word accesses after unrolling.
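The residual loop mentioned above can be pictured with a hand-unrolled version of the copy. This is only a sketch of the general shape, not the code that stxp70cc generates; the function names and the unroll factor of 4 are assumptions.

```c
#include <assert.h>

/* Unrolled by 4 with a residual loop: valid for any sz. */
void copy_with_residual(unsigned char *p, const unsigned char *q,
                        unsigned int sz)
{
    unsigned int i;

    for (i = 0; i + 4 <= sz; i += 4) {
        p[i]     = q[i];
        p[i + 1] = q[i + 1];
        p[i + 2] = q[i + 2];
        p[i + 3] = q[i + 3];
    }
    for (; i < sz; i++) /* residual iterations: 0 to 3 of them */
        p[i] = q[i];
}

/* Unrolled by 4 without residual code: valid only if sz % 4 == 0. */
void copy_mod4(unsigned char *p, const unsigned char *q, unsigned int sz)
{
    unsigned int i;

    assert(sz % 4 == 0); /* the property asserted by loopmod(4,0) */
    for (i = 0; i < sz; i += 4) {
        p[i]     = q[i];
        p[i + 1] = q[i + 1];
        p[i + 2] = q[i + 2];
        p[i + 3] = q[i + 3];
    }
}
```

Both versions still copy byte by byte, which matches the remark that the pragma says nothing about the alignment of p or q.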
A second scenario is for ‘while’ loops where the user knows that the approximate
effective trip count is high:
#pragma looptrip(100)
while (*p++ = *s++) ;
This example gives a better approximation of the weight of the loop. Generally the compiler
trip count estimate for a while loop is very low.
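For context, the fragment above could sit in a complete function such as the following sketch. The function name and the returned length are hypothetical additions, and the #ifdef guard on __stxp70cc__ is an assumption so that other compilers do not see the pragma.

```c
#include <stddef.h>

/* Copies the string s into p, including the terminating null character,
   and returns the number of characters copied before the terminator. */
size_t copy_string(char *p, const char *s)
{
    const char *start = s;

#ifdef __stxp70cc__
#pragma looptrip(100)
#endif
    while ((*p++ = *s++))
        ; /* all the work is done in the loop condition */

    /* s now points one past the terminator */
    return (size_t)(s - start) - 1;
}
```

The value 100 is only a hint about the expected number of iterations; the function behaves the same for shorter or longer strings.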
Possible error messages are:
• Warning : pragma ‘LOOPTRIP’ : inconsistent with computed value,
ignored
• Warning : pragma ‘LOOPTRIP’ : not followed by a loop, ignored
• Warning : malformed ‘#pragma looptrip (n)’
The hardware loop pragmas must be placed before the loop statement:
#pragma hwloop forcejrgtudec
for(i=0; i<n; i++) {
a[i] = ...;
}
The following scenario can occur when the user wants to keep memory writes in order to
take advantage of a combining write buffer:
#pragma loopseq WRITE
for(i=0; i<n; i++) {
a[i] = ...;
a[i+1] = ... ;
a[i+2] = ... ;
a[i+4] = ... ;
}
The pragma hints that the compiler should keep writes to the array in order. If the loop is
unrolled, generating a large number of stores, this improves locality and may take
advantage of combining write buffers. By default the compiler does not put restrictions on
the ordering of non-overlapping store operations.
A second scenario is when the user has scheduled prefetch and load operations by hand,
and wants to ensure that the compiler does not reorder them.
#pragma loopseq READ
for(i=0; i<n; i+=S) {
... = a[i] ;
__builtin_prefetch(&a[i+S]) ;
}
The pragma hints that the compiler should keep the load and prefetch in order. In this
example, the prefetch is not placed before it is effectively used in the next iteration by the
load.
Example:
if (debug) {
#pragma frequency_hint NEVER
trace();
}
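Placed in a complete function, the hint might be used as in the sketch below. The names process, trace and trace_calls are hypothetical, and the #ifdef guard on __stxp70cc__ is an assumption so that other compilers do not see the pragma.

```c
static int trace_calls; /* counts calls to trace(), for illustration */

static void trace(void)
{
    trace_calls++;
}

/* Doubles x; the debug branch is marked as never taken in normal use. */
int process(int x, int debug)
{
    if (debug) {
#ifdef __stxp70cc__
#pragma frequency_hint NEVER
#endif
        trace();
    }
    return x * 2;
}
```

The hint is advice for code generation only. The branch is still executed whenever debug is non-zero.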
4 Optimization guide
This chapter describes specific compiler options and techniques that can be used to gain
maximum performance in your application.
4.1 Inlining
Inline function expansion is performed for function calls that the compiler estimates to be
frequently executed. These estimations are based on a set of heuristics. The compiler might
decide to replace the instructions of the call with code for the function itself (inline the call).
The current version of the compiler only supports the single file inlining mode as described
in Section 4.1.1. The compiler supports both the single file inlining mode as described in
Section 5.2.1: Placement and layout on page 101 and cross file inlining through the IPA
optimization described in Section 4.8: Interprocedural analysis optimization (IPA) on
page 76.
Inlining criteria
Each candidate function is checked against inlining-exclusion cases which include:
• the user has requested no inlining (-INLINE:never=fn, -INLINE:off command line
options)
• recursive function
• vararg function
• exception handler
After this preliminary test, each candidate function is inlined regardless of cost if it is marked
must-inline, or if the -INLINE:all option has been specified by the user.
Otherwise, cost evaluation is used to decide whether to inline or not, and the candidate
function is rejected if its estimated cost is above a given threshold set by the compiler. The
-INLINE:list=on option can be used to list what is inlined. Changing the compiler limits is
not recommended, since this can lead to longer compilation times or increased memory
usage or both, with no noticeable performance benefit.
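The order of these checks can be summarized as a small decision function. This is a simplified model written for illustration, not the compiler's implementation: the structure, its field names, the function name and the integer cost scale are all hypothetical.

```c
/* Hypothetical description of a candidate function. */
struct candidate {
    int user_forbids;         /* -INLINE:never=fn or -INLINE:off applies */
    int is_recursive;
    int is_vararg;
    int is_exception_handler;
    int must_inline;          /* marked must-inline */
    int estimated_cost;       /* heuristic cost, arbitrary unit */
};

/* Returns 1 when the model would inline the call, 0 otherwise. */
int would_inline(const struct candidate *c, int inline_all, int threshold)
{
    /* Step 1: exclusion cases reject the candidate outright. */
    if (c->user_forbids || c->is_recursive ||
        c->is_vararg || c->is_exception_handler)
        return 0;

    /* Step 2: must-inline or -INLINE:all bypasses the cost check. */
    if (c->must_inline || inline_all)
        return 1;

    /* Step 3: otherwise reject when the cost is above the threshold. */
    return c->estimated_cost <= threshold;
}
```

In this model a recursive function is rejected even when it is marked must-inline, because the exclusion test runs first.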
Finally:
• the function to be inlined must be defined and visible in the same source file as the
function using it
• a static function that is inlined can, in specific circumstances, be considered “dead” and
removed from the final object file(b)
b. Note that this dead code removal was not performed in earlier versions of the stxp70cc compiler (that is, the
compiler provided in toolset 3.1.0 and earlier). With those versions, inlining usually causes an increase in size,
because the original (not inlined) instance is preserved in the final executable code, even if it is never
called.
In addition to these options, the option given in Table 22 may be of interest when building a
large body of inline functions (which is not recommended and may adversely affect
performance).
Inlining any further calls is suspended when the size increase becomes greater than the
threshold.
Pragmas
To force inlining or non-inlining of a function in the scope of a call site, the following two
pragmas are introduced:
• #pragma inline_next (foo,...) forces inlining of function foo in the next
statement
• #pragma noinline_next (foo,...) prevents inlining of function foo in the next
statement
The ... denotes that it is possible to provide several function names with the same
pragma. It is equivalent to several pragma lines.
Two similar pragmas are provided that can be used within the scope of a function:
• #pragma inline_function (foo,...) forces inlining of function foo every time
it is called until the end of the current function
• #pragma noinline_function (foo,...) prevents inlining of function foo every
time it is called until the end of the current function
The two call site scope pragmas take precedence over these two function scope pragmas.
Two lower priority pragmas are provided, with file scope:
• #pragma inline_file (foo,...) to force inlining of function foo every time it is
called until the end of the current source file
• #pragma noinline_file (foo,...) to prevent inlining of function foo every
time it is called until the end of the current file
Finally, to revert inlining policy to the default one (that is, rely on the inliner’s evaluation of
callee weight), the following pragma is introduced:
#pragma defaultinline (foo,...)
Function naming
As a special case, if the user does not provide any function name, the corresponding
pragma applies to all functions called in the scope of the pragma. In this case, parentheses
around the function names are optional.
User diagnostics
Several warning messages are provided to the user to help track errors.
If two conflicting pragmas are provided, only the later one is taken into account. For instance:
#pragma inline_next (foo)
#pragma noinline_next (foo)
foo();
This generates the following warning:
warning: #pragma noinline_next (foo) overrides previous #pragma
inline_next (foo)
If pragmas are provided at an invalid scope (that is, outside of a function), the following
message is displayed:
warning: #pragma noinline_function (foo) ignored (incorrect scope)
To help track misspelling, a warning is also displayed if a pragma could not be applied to
any function call.
#pragma noinline_next (bar)
foo(i);
This generates the following warning:
warning: #pragma noinline_next (bar) matched no call
Precedence
Command-line options -INLINE:must=foo and -INLINE:never=foo take precedence
over both pragmas and attributes.
Attributes take precedence over pragmas. That is, a function declared with
__attribute__((noinline)) is never inlined, regardless of pragma inline_xxx
statements. However, the user can override this behavior with the -INLINE:must=foo
command-line option.
If several contradictory pragmas with the same scope apply to the same function, the last
one overrides the earlier ones.
Examples
Example one (Figure 5) illustrates the use of the #pragma noinline_next directive. All
calls to f1() are candidates for inlining, except the one directly following #pragma
noinline_next.
Example two (Figure 6) illustrates the use of the #pragma inline_function directive.
All calls to f1() following the #pragma inline_function (f1) directive are forced to
be inlined, except the one directly following #pragma noinline_next (f1). The call to
f2() following the #pragma inline_next (f2) is also forced to be inlined, while the
first call to f2() is only a candidate for inlining (inlining depends on the respective weights
of f2() and its caller).
Example three (Figure 7) illustrates the use of the #pragma defaultinline directive.
Example four (Figure 8) illustrates the use of several function names or an empty name list
with #pragma directives.
Example five (Figure 9) illustrates the use of the noinline attribute and shows how the
attribute has precedence over #pragma.
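The following is a minimal sketch in the spirit of examples one and two, not a reproduction of the figures. The functions f1(), f2() and caller() are hypothetical, and the placement of the function scope pragma inside the body of the caller is an assumption based on the description above. Compilers that do not know these pragmas ignore them, so the code still builds elsewhere.

```c
/* Hypothetical callees; names and bodies are for illustration only. */
static int f1(int x) { return x + 1; }
static int f2(int x) { return x * 2; }

int caller(int x)
{
    int r;

    r = f2(x);          /* only a candidate for inlining (default policy) */

#pragma inline_function (f1)
    r += f1(x);         /* forced inline by the function scope pragma */

#pragma noinline_next (f1)
    r += f1(x);         /* not inlined: the call site pragma takes precedence */

#pragma inline_next (f2)
    r += f2(x);         /* forced inline for this statement only */

    r += f1(x);         /* forced inline again by inline_function */
    return r;
}
```

The pragmas only influence the inlining decision; the value computed by caller() is the same whether or not each call is inlined.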
-O2   All                                  2   32
-O3   All                                  2   64
-O4   STxP70-3, STxP70-4-single issue      2   64
-O4   STxP70-4-dual issue, 16 GPRs         4   64
-O4   STxP70-4-dual issue, 32 GPRs         4   128
Note: 1 Depending on the internal analysis, the compiler is free to apply an actual unrolling factor
which is smaller than the maximum specified for the optimization level and core. This is
especially the case if a smaller unrolling factor enables the compiler to avoid the generation
of a remainder loop.
2 The #pragma unroll directive takes precedence over the default behavior of the loop
unroller.
0 No effect No effect
1 2 64
2 2 128
3 4 64
4 4 128
5 8 64
6 8 128
Although the compiler is able to compute precise memory dependences in many cases, this
is not possible when complex memory accesses are involved, such as in the following
example:
for (i = 1; i < n; i ++) {
a[i-1] = a[i] + b[i];
}
for (i = 1; i < n; i ++) {
c[d[i]] = c[i] + 1;
}
On the first loop, the compiler can fully determine the dependences between memory
accesses, provided that it knows that a and b point to distinct memory locations (see the C
language restrict qualifier). On the second loop, however, without information on values
in d, the compiler assumes that all memory accesses in the loop are dependent. In
particular, the sequence of load and store memory accesses in the iterations of the loop
must be strictly respected, resulting in a poor instruction schedule if the loop is unrolled or
software pipelined.
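As an illustration of the first loop, the sketch below uses the C99 restrict qualifier to state that a and b point to distinct memory. The function name shift_add and its parameter list are hypothetical.

```c
#include <stddef.h>

/* restrict promises the compiler that the memory reached through a and
   the memory reached through b do not overlap, so the only dependences
   left in the loop are those between accesses to a. */
void shift_add(int *restrict a, const int *restrict b, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        a[i - 1] = a[i] + b[i];
    }
}
```

The promise is the responsibility of the caller: passing overlapping arrays to such a function results in undefined behavior.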
A useful property for loop optimizations is when a loop is vectorizable. This property can be
enforced on a loop by using the #pragma loopdep VECTOR. A vectorizable loop is such
that it can be decomposed into a sequence of loops, one per statement of the original loop,
without changing the program results. Moreover, for each loop resulting from that
decomposition (that contains only one statement), all load memory accesses can be
performed before all store memory accesses, which means that a vector version of the loop
can be written. In practice, unless the target processor is a real vector processor, the
compiler does not decompose vectorizable loops as described. Rather, it uses the
diverges when the default aliasing option is used. This is often caused by a violation of
aliasing rules, which are part of the ISO C/C++ standard. These rules say that a program is
invalid if you try to access a variable through a pointer of an incompatible type.
The example shown in Figure 10 demonstrates this violation, where a float is accessed
through a pointer to integer.
The aliasing rules were designed to allow compilers to perform more aggressive
optimization. Basically, a compiler can assume that all changes to variables happen through
pointers or references to variables of a type compatible with the accessed variable.
Dereferencing a pointer that violates the aliasing rules results in undefined behavior.
In the case above, the compiler may assume that no access through an integer pointer can
change the float a. Therefore, the actual value of a may be unaffected by the writing through
pa. What really happens is up to the compiler and may change with architecture and
optimization level.
To disable optimizations based on alias-analysis for ‘faulty legacy code’, the option
-fno-strict-aliasing must be used as a work-around.
Note: Because the practice of reading from a union member other than the one most
recently written to (called “type-punning”) is common, type-punning is allowed even with
-fstrict-aliasing, provided the memory is accessed through the union type.
To fix the code in Figure 10 above, you can use a union instead of a cast, as shown in
Figure 11.
Note: This is a GCC extension which might not work with other compilers.
Finally, to fully respect the ANSI C/C++ aliasing rules, it is necessary to write the data
through a character type before reading it again. See Figure 12. The drawback of this
standard conforming solution is that it has to account for endianness, and that it is less
efficient than simply writing through an integer.
In this case, the program always prints “ANSI ALIASING BEHAVIOR” regardless of the
compiler and its optimization options.
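The two approaches described above can be sketched as follows. This is not a reproduction of Figure 11 or Figure 12: the function names are hypothetical, memcpy is used as a stand-in for a byte-by-byte copy through a character type, and the sketch assumes that float and uint32_t are both 32 bits wide.

```c
#include <stdint.h>
#include <string.h>

/* Union-based type-punning: the memory is accessed through the union
   type, which GCC accepts even with -fstrict-aliasing. */
uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}

/* Character-wise copy: memcpy moves the object representation byte by
   byte, which respects the aliasing rules with any compiler. */
uint32_t float_bits_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Both functions return the same bit pattern for a given input. Neither reads the float through an int pointer, which is the construct that the aliasing rules forbid.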
4.5 Profiling
Before optimizing any application, we recommend that you analyze the critical areas of your
code to identify where optimization will have the most effect.
Profiling creates an instrumented program from your source code. Whenever this
instrumented code is executed, the program generates an information file that can be
displayed using the stxp70-gprof utility, supplied with the toolset.
This section is not a complete guide to profiling, but a brief refresher on how to proceed with
the compiler.
The small code sample below illustrates how to use this macro to avoid a conflict between
profiling and user instrumentation involving cycle counters:
#ifndef __LIBGPROF_CYCLE_PROFILING
clrcc();
startcc();
#endif
To overcome this problem, edit the link script file associated with your application and increase
the padding of the .heap section. By default, the .heap section contribution line is:
.heap ALIGN(16) PAD(64K) NOINIT : { } > EXTSM
This means that the .heap section base is aligned on an address that is a multiple of 16, is
64 Kbytes in size and is not zero-initialized at startup. Moreover, this section is located in the
EXTSM memory region. To increase the padding of this contribution, replace
64K with a larger value depending on the XXX amount required, as shown in the error
message above.
Please note that if you do not specify a link script on your link command-line, the
sx_valid.ld file used by default is the one located in the folder:
<Toolset_Root>/arch_v3/stxp70cc/<stxp70cc_version>/lib/ldscript
Copy this file into your application project, modify its content according to statements above
and add it to your link command.
-fbranch-probabilities Re-compile a program that has already been compiled with the
-fprofile-arcs option. The -fbranch-probabilities option
instructs the compiler to optimize using estimated branch
probabilities generated by -fprofile-arcs.
-fcoverage-counter64 Instruct the compiler to use a 64-bit edge counter instead of the
default 32-bit counter. Each counter is saved as 64 bits and so the
output can still be used with any gcov utility. Use this option if you
think a statement is executed more than 2^32 times.
-fprofile-arcs Instrument the "arcs" of the program flow during compilation. For
each function of your program, stxp70cc creates a program flow
graph, then finds a spanning tree for the graph. Only arcs that are not
on the spanning tree have to be instrumented; the compiler adds
code to count the number of times that these arcs are executed.
-fprofile-arcs also makes it possible to estimate branch
probabilities, and to calculate basic block execution counts. In
general, basic block execution counts alone do not give enough
information to estimate all branch probabilities.
When the program exits, -fprofile-arcs saves a list of arcs in
the program flow graph to a file called sourcename.gcda. gcov
can reconstruct the program flow graph and compute all basic block
and arc execution counts from the information in this file.
Use the compiler option -fbranch-probabilities when
recompiling to apply further optimizations.
-ftest-coverage Create a data file for the GNU gcov code coverage utility. The name
of the data file begins with the name of your source file:
sourcename.gcno. It contains a mapping from basic blocks to line
numbers, which gcov uses to associate basic block execution counts
with line numbers.
When recompiling, you must use the same code generation and optimization options for
both compilations. The only difference allowed is to replace -fprofile-arcs with
-fbranch-probabilities.
When running interprocedural analysis, all the sources are merged into a single file (or
several files for large programs). Therefore, the compiler is unable to know which procedure
belongs to which .c or .cxx file, and the correspondence between a .c or .cxx file and a .gcno
or .gcda file is no longer possible. The name of the .gcda and .gcno files is the name of the
final executable, plus “_”, plus the number of the .s file that IPA has created. Since all the
original .c or .cxx filenames are saved in the .gcno file, gcov is able to associate each
procedure with a source file.
Note: You will need a copy of gcov with a version number higher than or equal to 3.4.4.
The following profiling function is called with the address of the caller function and the
address of the callee function:
void __profile_cal(void *caller_fn,
void *callee_fn,
const char *caller_name,
const char *callee_name,
int event);
The arguments to this function are as follows:
caller_fn This is the address of the start of the current function (the caller
function), which can be looked up specifically in the symbol table.
callee_fn This is the address of the start of the called function (the callee
function), which can be looked up specifically in the symbol table.
caller_name This is the name of the caller function.
callee_name This is the name of the callee function, or NULL if the call is an
indirect call.
The function names passed in the third and fourth arguments are
pointers to static strings that have the lifetime of the instrumented
executable or shared object.
The function names are the mangled names in C++.
event This is 0 when this function is invoked just before a call,
instrumenting a function entry. It is 1 when this function is invoked
just after a call, instrumenting a function exit.
Function calls that are inlined by the compiler are not instrumented.
To force instrumentation of all functions use the -fno-inline option to disable inlining.
A function may be given the attribute no_instrument_function. A call is not
instrumented if either the caller or the callee function has the attribute
no_instrument_function.
The program must be linked with an object file that implements the function above to link
correctly.
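A minimal sketch of such an object file is shown below. The signature is the one given above; the three counters and their names are hypothetical, and a real implementation would more likely record the address pairs than simply count events.

```c
#include <stddef.h>

/* Hypothetical counters maintained by this sketch. */
unsigned long profile_entries;        /* events seen just before a call */
unsigned long profile_exits;          /* events seen just after a call  */
unsigned long profile_indirect_calls; /* calls with no callee name      */

void __profile_cal(void *caller_fn,
                   void *callee_fn,
                   const char *caller_name,
                   const char *callee_name,
                   int event)
{
    /* The addresses and the caller name are not used by this sketch. */
    (void)caller_fn;
    (void)callee_fn;
    (void)caller_name;

    if (event == 0) {
        profile_entries++;
        if (callee_name == NULL) {
            /* callee_name is NULL when the call is an indirect call. */
            profile_indirect_calls++;
        }
    } else if (event == 1) {
        profile_exits++;
    }
}
```

Because the name strings have the lifetime of the executable, an implementation may store the pointers directly without copying the strings.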
The main differences with the -finstrument-functions option are listed below.
• This instrumentation tracks (caller, callee) address pairs instead of (call_site, callee)
address pairs. If the call site information is required, use the -finstrument-functions
option.
• This instrumentation provides the caller and callee name when available, which avoids
a specific post processing pass to retrieve the function names.
• This instrumentation is at the call site and not in the callee. Therefore, for instance, calls
to top level library functions (which are not instrumented) are seen, while the option
-finstrument-functions does not see them. To disable the instrumentation of the
call to a particular library routine you must declare it with the
no_instrument_function attribute.
• This instrumentation is not standard GCC functionality.
Because the default compiler behavior is 64-bit floating-point, the constant is considered
64-bit, and the whole calculation is promoted to 64-bit. As a consequence, the multiplication is
performed by the 64-bit runtime. The FPx cannot be used although this was specified on
the command line.
In the example in Figure 13, notice the definition of two possible configurations "c1" and
"c2".
• If configuration "c1" is applied, then all files are compiled with the -Os option, except
file "f1", which is compiled with the option -O3. Furthermore, function "foo" in file "f1"
is compiled with the option -O2, and if-conversion is disabled.
• If configuration "c2" is applied, then all files are compiled with option -O3, without any
exception.
• By default, configuration "c1" is applied as the active configuration. The configuration
"c2" can be activated by a dedicated compiler option (see Section 4.10.4: Using the
ACF on page 85).
In this case, all files whose name is prefixed by an "f" are compiled with the option -Os.
Functions "foo1", "foo2", "foo3" are compiled with the option -O3.
Summary
There are three ways to handle an ACF, demonstrated by the following examples:
• stxp70cc -macf-decl acf_filename.acf
Reads acf_filename.acf as an ACF, using the default configuration declared in the
file as the active configuration.
• stxp70cc -macf-decl acf_filename.acf -macf-active c1
Reads acf_filename.acf as an ACF file, and uses the command line option to
define the active configuration as c1. Configuration "c1" must be defined in the ACF
acf_filename.acf.
• stxp70cc -macf-template source_file1.c source_file2.c
source_file3.c source_main.c
Generates the ACF template for the application implemented by the source files
specified. The source files must be linkable, and the compilation includes a link stage to
ensure that the template is complete. For example:
stxp70cc -macf-template source_file1.o source_file2.o
source_file3.o source_main.o
3. Finally the following command is used to close the template and link it:
stxp70cc -macf-template foo1.o foo2.o
– this last command only invokes the link stage. The file template.acf is closed,
with “c1” declared as the default configuration
Steps 1. to 2. above generate the same file template.acf as the equivalent unique
command:
stxp70cc -macf-template foo1.c foo2.c
Makefiles
Compilation through a makefile performs independent calls to the compiler to generate object
files before linking. In this context, the generation of an ACF template requires an
incremental behavior. The mechanism of the template generation tests if the template file
template.acf exists in the compilation directory. If it exists, it opens it in append mode.
Otherwise, it creates it. At the linker or archive creation stage, the following actions are
performed:
• the template file is closed from a syntactical point of view (close of last '}', and the
active configuration lines are written)
• buffer and file are closed from a file system point of view
If the compilation does not end with a linker or archive creation stage (only use of the -S or
-c option), then the buffer is flushed, the file is closed, but the file is not closed from a
syntactical point of view. Since it does not end with the expected pattern, the corresponding
template is not usable.
global and file levels, but not at function level. This is linked to technical reasons in
relation with ABI handling (register saving at entry and exit of functions), which must be
consistent over the whole application.
Inliner
The inliner operates on a full compilation unit and then takes into consideration the
optimization level specified at global or file level, but not at function level. As a result, when
using ACFs, we can get different assembly code for a given function. Depending on the
scenario used, the function can apparently be compiled twice at the same optimization level.
For instance, consider the file f1.c:
int foo1() { return 1; }
int foo2() { return 2; }
int foo3() { return foo1() + foo2(); }
1. First scenario
With this first scenario, the file is compiled by the following command line, based on a
global -Os option:
stxp70cc -Os -c f1.c
Here foo3() is compiled using -Os.
Assembly code for foo3() contains calls to foo1() and foo2(), which are not
inlined because of -Os.
2. Second scenario
An ACF acf1.acf is defined with the following directives:
file "f1" {
function "foo3" { -Os }
}
Code is compiled using this ACF:
stxp70cc -O3 -c -macf-decl acf1.acf f1.c
Here foo3() is compiled using -Os.
Assembly code for foo3() does not contain calls to foo1() and foo2(), which are
inlined because of -O3, which is visible to the inliner.
3. Third scenario
Code is compiled with option -O3:
stxp70cc -O3 -c f1.c
Here foo3() is compiled using -O3.
Assembly code for foo3() does not contain calls to foo1() and foo2().
4. Fourth scenario
An ACF acf1.acf is defined with the following directives:
file "f1" {
function "foo3" { -O3 }
}
Code is compiled using this ACF:
stxp70cc -Os -c -macf-decl acf1.acf f1.c
Here foo3() is compiled using -O3.
Assembly code for foo3() contains calls to foo1() and foo2(), which are not
inlined because -Os is visible to the inliner.
Intuitively, the user might expect to have the same code for scenario 1 and 2, as well as for
scenario 3 and 4, but this will not be the case because of the implementation of inlining.
GNU cc provides a large set of extensions that are widely used in the GNU Linux
community. These extensions can be used to:
• describe embedded features, for example, data section placement
• provide guidance to the compiler for optimization, for example, the noreturn function
• provide language extensions, for example, conditional lvalue or C99 features
The GNU extensions are sometimes the only way to access ELF features that are not
directly available in the C language; for example, to declare a symbol as weak.
Note: GNU C supports two types of “variable number of arguments” syntax: the ISO C99 format,
which uses __VA_ARGS__, and the GNU format, which uses ##args. The ISO C99 format
does not support the case where the number of parameters passed as part of the ellipsis is
zero. GNU C reuses the ## trick to absorb the comma in this case. See Figure 19.
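The two formats can be sketched as follows. The macro names FORMAT_C99 and FORMAT_GNU are hypothetical, and the sketch is not a reproduction of Figure 19.

```c
#include <stdio.h>

/* ISO C99 format: at least one argument must follow the format string. */
#define FORMAT_C99(buf, size, fmt, ...) \
    snprintf((buf), (size), (fmt), __VA_ARGS__)

/* GNU format: "##args" removes the preceding comma when args is empty,
   so the macro can be used with a format string alone. */
#define FORMAT_GNU(buf, size, fmt, args...) \
    snprintf((buf), (size), (fmt), ##args)
```

With these definitions, FORMAT_GNU(buf, sizeof buf, "hello") is accepted, whereas the same call written with FORMAT_C99 leaves a trailing comma in the expansion.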
GNU C allows initialization of objects with static storage duration by compound literals,
whereas ISO C99 does not.
With GNU C the = character can be omitted after the [index] indication.
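Both extensions can be sketched in a few lines. The type struct point, the objects origin and widths, and the two accessor functions are hypothetical names chosen for illustration.

```c
struct point { int x; int y; };

/* GNU C: an object with static storage duration initialized by a
   compound literal. */
static struct point origin = (struct point){ 3, 4 };

/* GNU C: designator written without '='. Element 2 is set to 10 and
   the remaining elements are zero. */
static int widths[4] = { [2] 10 };

int origin_sum(void) { return origin.x + origin.y; }
int width_at(int i)  { return widths[i]; }
```

Code that must also build with other compilers can write the portable forms instead, that is a plain brace initializer for origin and [2] = 10 for widths.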
Warning: The STxP70 ABI states that the stack is aligned to a 64-bit
boundary. However, for wider extension data types, it is
necessary to increase this value. A dedicated attribute
aligned_stack is defined for this purpose.
5.2 Attributes
Attributes are generally a much better design than a #pragma directive for several reasons.
Firstly, an attribute specification is a piece of C language that can be generated by use of a
cpp macro definition, whereas a #pragma directive generation is generally not supported
by non-GNU C preprocessors. Secondly, it avoids the scoping issues of the #pragma
directive.
Several attributes can be applied to the same object by using a comma to separate them.
For example, to declare a symbol that is both weak and aliased:
void useful (void) __attribute__ ((weak, alias("useful_func")));
memory
The STxP70 processor provides several special memory spaces that allow less costly
accesses.
• Tiny Data Area (TDA)
Data in the TDA is accessed using a single instruction of the form
baseaddress+offset, where offset is expressed in elements. The TDA is based
at address 0 (which is byte 4 as accessing address 0 is not possible in C). Due to the
way it is accessed, only 32 Kbytes can be placed in the TDA.
• Small Data Area (SDA)
Data in the SDA is accessed using a single instruction of the form
baseaddress+offset, where offset is expressed in elements. An element can be
a byte, 16-bit word, or 32-bit word depending on the type of the data object. An
aggregate of 4,096 elements can be placed in SDA. This can be a mixture of scalars,
arrays, and structures of various sizes and with element sizes of byte, 16-bit word, or
32-bit word, but the aggregate number of elements over all entries cannot exceed
4,096.
• Data Area (DA)
The addresses of data in the DA are built using a single instruction of the form
addugp Ri, offset, where offset is expressed in bytes. An aggregate of 32,768
bytes can be placed in the DA. This can be a mixture of scalars, arrays, and structures
of various sizes and with element sizes of byte, 16-bit word or 32-bit word.
Three attributes are defined to instruct the compiler to place a variable in these spaces:
int __attribute__ ((memory ("tda"))) x; // x is placed in TDA
int __attribute__ ((memory ("sda"))) y; // y is placed in SDA
int __attribute__ ((memory ("da"))) z; // z is placed in DA
aligned
When applied to a variable or a structure field, specifies a minimum alignment for that
variable or field, measured in bytes. The aligned attribute can only increase the
alignment; it can be decreased by specifying packed as well.
int x __attribute__ ((aligned (16))) = 0;
struct _s { int x[2] __attribute__ ((aligned (8))); };
short array [3] __attribute__ ((aligned));
When applied to a type:
typedef int more_aligned_int __attribute__ ((aligned(8)));
aligned_stack
When applied to a function, this attribute specifies that the head of the stack must be aligned
to a given boundary. The value provided as an argument corresponds to the number of
bytes to which the stack must be aligned. The argument must be a power of 2, strictly
greater than 8 and lower than or equal to 256.
For instance the attribute below specifies that the stack of function fct() must be aligned
to a 128-bit boundary:
void fct() __attribute__ ((aligned_stack(16)));
void fct()
{
...
}
weak
When applied to a function, causes the function to be emitted as a weak symbol. The address
of the symbol is 0 if the symbol is not defined at link time. This is primarily of use in defining
library functions that can be overridden in user code:
void d_stub (void) __attribute__ ((weak));
if (d_stub) {
d_stub();
}
When applied to data, causes the declaration to be emitted as a weak symbol rather than a
global symbol. This is primarily of use in defining variables that can be overridden in user
code:
int debug __attribute__ ((weak)) = 0;
alias
Applies only to functions: The required functionality is to provide an alias name for a given
function. It is often used in conjunction with the weak requirement to define an alternate
weak name for a given function.
void useful_func (void) {
/* ... Do something ... */
}
void useful (void) __attribute__ ((alias("useful_func")));
packed
Applies only to data: Specifies that a variable or structure should have the smallest possible
alignment - one byte for a variable, and one bit for a field, unless a larger value with the
aligned attribute is specified.
The specified data alignment is applied during data layout, and the code generator emits
a safe sequence of instructions to avoid causing a misalign trap.
struct foo { char a; int x __attribute__ ((packed)); };
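The effect on layout can be observed with offsetof. In the sketch below, the structure and function names are hypothetical; the natural (unpacked) offset depends on the target ABI, so only the packed offset is stated.

```c
#include <stddef.h>

struct foo_natural { char a; int x; };
struct foo_packed  { char a; int x __attribute__ ((packed)); };

/* Offset of x with the default alignment rules of the target. */
size_t natural_offset(void) { return offsetof(struct foo_natural, x); }

/* Offset of x when the field is packed: x directly follows a. */
size_t packed_offset(void)  { return offsetof(struct foo_packed, x); }
```

packed_offset() returns 1 because the packed field is placed immediately after the one-byte field a, without padding.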
used
The GCC manual specifies that the used attribute may only apply to functions. For
stxp70cc it may also apply to variables.
• The used attribute, attached to a function, means that the code must be emitted for this
function, even if this function appears never to be referenced.
• This attribute, attached to a variable, means that the definition must be emitted for the
variable even if it appears that the variable is not referenced.
The used attribute follows the same syntax as any GCC attribute.
For a procedure:
static int Foo() __attribute__ ((used)) ;
For uninitialized data:
static int foo __attribute__((used)) ;
For initialized data:
static int foo __attribute__((used)) = 2 ;
Note: The assembly has been specifically extended to support this attribute:
.type Foo, @function, used
.type foo, @object, used
A motivation for using this attribute is to avoid the deletion of an unreferenced symbol by the
dead code, dead data or IPA optimization. This can be useful for debugging purposes (for
instance, a function dumping a specific data structure that is only called interactively from
debugging sessions is removed if not marked as ‘used’, since the compiler does not find
any reference to it).
5.2.2 Optimization
This section only applies to functions.
noreturn
Enables a function to be declared that cannot return, such as abort or exit. It is a useful
indication to optimizers.
void byebye () __attribute__ ((noreturn));
malloc
Used to tell the compiler that a function returns a pointer that cannot alias anything. It is a
useful indication to optimizers.
void * get_block (int) __attribute__ ((malloc));
Note: Hidden symbols cannot be referenced directly by other modules but they can be referenced
indirectly by function pointers. By indicating that a symbol cannot be called from outside the
module, the compiler may for instance omit the load of a PIC register since it is known that
the calling function has already defined the correct value.
format_arg
The format_arg attribute specifies that a function takes a format string for a printf,
scanf, strftime or strfmon style function and modifies it, so that the result can be
passed to a printf, scanf, strftime or strfmon style function.
extern char * my_dgettextprintf (void *my_domaint,
const char *my_format) __attribute__ ((format_arg(2)));
mode
This attribute specifies the data type for the declaration: the type used is whichever type
corresponds to the given mode. Refer to the GNU Compiler Collection Internals document for
the definitions of modes, http://gcc.gnu.org/onlinedocs/gccint.
Use the keywords __byte__, __word__ and __pointer__ to indicate the mode
corresponding to these quantities.
unsigned int qi __attribute__ ((mode (QI)));
unsigned int w __attribute__ ((mode (__word__)));
5.2.5 Built-ins
A built-in is used in the same way as a function call, but is expanded by the compiler very early
in the intermediate representation, instead of doing a function call. On STxP70, most
machine and extension instructions can also be addressed using built-ins. Please refer to
Chapter 7: Built-in functions on page 115 for further information.
__builtin_constant_p
This built-in tests if a value is a constant at compile time.
#include <stdio.h>
int x;
#define C 1
int main () {
  if (__builtin_constant_p (C) == 1)
    printf ("C is proved to be a constant\n");
  if (__builtin_constant_p (x) == 0)
    printf ("x is not proved to be a constant\n");
  return 0;
}
__builtin_return_address
__builtin_return_address gets the return address of the currently executing function.
void bar () {
printf ("RA = 0x%08x\n", (int)__builtin_return_address (0));
}
__builtin_expect
long __builtin_expect (long exp, long c)
__builtin_expect provides the compiler with branch prediction information.
The return value is the value of exp, which should be an integral expression. The value of c
must be a compile-time constant. The semantics of the built-in are that it is expected that
exp == c.
For example:
if (__builtin_expect (exp, 0))
foo ();
indicates that a call to foo() is not expected as exp should be 0.
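A common way to use this built-in is through a pair of wrapper macros. The macros likely and unlikely and the function checked_div below are hypothetical names used for illustration only.

```c
/* !!(x) normalizes any non-zero value to 1 so that the comparison with
   the expected constant is meaningful. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Returns 0 and stores the quotient in *out, or returns -1 when the
   divisor is zero. The error path is marked as not expected. */
int checked_div(int num, int den, int *out)
{
    if (unlikely(den == 0)) {
        return -1;
    }
    *out = num / den;
    return 0;
}
```

The hint affects only code layout and scheduling decisions; the result of the function is the same with or without it.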
__builtin_classify_type
__builtin_classify_type(object) ignores the value of the object and considers
only its data type. It returns an enum describing what kind of type object is. See Figure 24.
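Since the numeric values of the type classes are not listed here, the sketch below only compares the results for different kinds of object. The function names are hypothetical, and the sketch is not a reproduction of Figure 24.

```c
/* int and long are both integer types, so they share a class. */
int int_and_long_same_class(void)
{
    return __builtin_classify_type(0) == __builtin_classify_type(0L);
}

/* An integer and a floating-point value fall into different classes. */
int int_and_double_differ(void)
{
    return __builtin_classify_type(0) != __builtin_classify_type(0.0);
}

/* An integer and a pointer fall into different classes. */
int int_and_pointer_differ(void)
{
    return __builtin_classify_type(0) != __builtin_classify_type((void *)0);
}
```

Each function returns 1 when the stated relation holds. Only the type of the argument matters, so the constant values used are arbitrary.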
6 GNU ASM
The stxp70cc compiler accepts “extended inline assembly” asm, as part of C programs.
This chapter only summarizes the main features of the asm implementation and describes
its limitations. It is not a substitute for the GNU documentation.
6.1 Syntax
General syntax
asm(template : output operands : input operands : clobber list);
or
__asm__(template : output operands : input operands : clobber list);
Where:
• template is the assembler instruction, defined as a string constant
• output operands is a list of comma separated output operands
• input operands is a list of comma separated input operands
• clobber list is a list of comma separated clobbered operands
The template section contains plain assembler, and uses ordinary STxP70 assembler
syntax, with the notable exception of the %i (i is a positive integer) notation that refers to
the ith output or input operand.
Note: Multiple consecutive strings are automatically concatenated to enable a readable and
correct template input. Multiple assembler instructions can be put together in a single asm
template, separated by explicit newline characters ‘\n’.
If there are no output operands but there are input operands, two consecutive colons must
be used in place of the output operands.
In the output and input list:
• each operand is described by an “operand constraint string” followed by a C expression
in parentheses
• the available constraints are the following:
– r general purpose register operand
– b boolean register operand
– i immediate integer operand, including symbolic constants only known at
assembly time
– n immediate integer operand, known at compile time
– g guard register
– fpx_FX FPx register (STxP70-4 only)
– the type attached to a scalar or SIMD audio extension (for instance, MP2x_VP or
MP2x_VX)
c. If the configuration only includes 1 bank (16 registers), then the range is only [0,15]
For example, when considering the register file V of the MP2x extension, with a two level
hierarchy, registers are referenced as "MP2x_V0_P0", "MP2x_V0_P1",
"MP2x_V1_P0", "MP2x_V1_P1" and so forth.
Note: Registers are always specified at the smallest hierarchy level. Therefore, to disable the full
V0 register, both subparts "V0_P0" and "V0_P1" must be specified in the clobber list.
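The general syntax can be illustrated without any target-specific instruction by using an empty template. The sketch below assumes a GNU-compatible compiler; the function names are illustrative. The "=" modifier and the matching constraint "0" are standard GNU features that are not described in this chapter, so their availability in stxp70cc is an assumption here.

```c
/* Forces 'value' to be available in a general purpose register at this
 * point. The template is empty, so no instruction is emitted. The two
 * consecutive colons stand for the empty output operand list. */
int keep_in_register(int value)
{
    __asm__ volatile ("" : : "r" (value));
    return value;
}

/* One output and one input operand. "=r" marks a write-only register
 * output; "0" asks for the input to be placed in the same register as
 * operand 0, so the empty template leaves the value in 'result'. */
int pass_through(int value)
{
    int result;

    __asm__ ("" : "=r" (result) : "0" (value));
    return result;
}
```

A real use replaces the empty string with one or more STxP70 instructions that refer to the operands as %0, %1 and so on.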
6.2 Assumptions
The following assumptions apply.
• Output operand expressions must be lvalues.
• The compiler assumes that the input is consumed before the outputs are produced,
unless an output operand has the ‘&’ constraint modifier (also called “early clobber”).
The compiler does not assign the same register to an input operand and an early
clobber operand. However, the compiler may assign the same register to an input
operand and to a non-early clobber output operand.
6.3 Volatile
The volatile syntax is either:
asm volatile (template : output operands : input operands : clobber list);
or:
__asm__ volatile (template : output operands : input operands : clobber list);
The volatile keyword indicates that an instruction has side effects. A volatile
statement is not deleted if it is reachable. The order of volatile asm statements and/or
other volatile accesses is preserved. A consecutive sequence of volatile asm
statements may not stay perfectly consecutive, since other instructions may be
scheduled in between. To keep instructions perfectly consecutive, use a single asm
statement.
An asm statement without any operands or clobbers is treated identically to a volatile
asm statement, as is an asm statement without an output operand.
6.4 Restrictions
The following restrictions apply.
• The compiler does not parse the assembler instruction template; this means that it
does not check if it is valid assembler input.
• Up to 10 operands, results and clobbered registers are allowed.
• Multiple alternative constraints are not supported.
• At -O3 and -O4 optimization levels, the loop nest optimizer is disabled for loops
containing asm statements.
6.7 Example
The code example in Figure 25 illustrates a typical use of an asm statement on the STxP70
core. It produces the assembly code given in Figure 26.
7 Built-in functions
The stxp70cc compiler recognizes a number of built-ins. These are used to generate
assembly language statements that cannot otherwise be expressed through standard ANSI
C/C++.
The built-ins are specified and called just like standard ANSI C/C++ functions and
procedures, using standard types. However, they are treated in a special way by the
compiler. The built-ins apply to the STxP70 core instructions, X3 instructions, floating point
FPx extension instructions, as well as scalar and SIMD audio extension (MPx) instructions.
On the core and on the FPx and MPx extensions, built-ins may be needed to make use of
instructions that the compiler cannot capture automatically, or to work around a missing
optimization.
For technical reasons the set of core/X3 built-ins does not currently cover the full set of
instructions. For instance, the load/store instructions are not available as built-ins. This also
includes specific load/store instructions such as the lsetub instruction. Instructions that do
not exist as built-ins can still be mapped by using the GNU assembly statements, see
Chapter 6: GNU ASM on page 109.
Example:
The core instruction addbp exists with a second operand that is either a register or a literal.
The corresponding built-ins are named as follows:
• int __builtin_sx_addbp_r(int, unsigned short) for register operand
• int __builtin_sx_addbp_i8(int, unsigned short) for u8 operand
The C-models have similar names:
• int __cmodel_sx_addbp_r(int, unsigned short) for register operand
• int __cmodel_sx_addbp_i8(int, unsigned short) for u8 operand
Finally, the unified macros for these built-ins and C-models are:
• sx_addbp_r when used for a register operand
• sx_addbp_i8 when used for a u8 operand
Note: The presence of the two leading underscores on each name denotes (according to the
ISO/IEC 9899 C Standard) that no such name should be defined by the user. More
specifically:
“All identifiers that begin with an underscore and either an upper case letter or another
underscore are always reserved for any use.”
Data types
Scalar and SIMD audio extensions include two register banks at most. Each bank may have
up to three consecutive “levels”, numbered from 0 to 2:
• level 0 corresponds to the full width of the register bank
• level 1 corresponds to the two halves of the register
• level 2 corresponds to the four quarters of the register
Furthermore, the register width is 2^n bits, ranging from 8 bits to 512 bits inclusive.
The names of the data types that can be allocated to such banks take this structure into
account. They are built using the following template:
<extension>_<registerfile_name><register_level>
Where:
• <extension> is the alias of the SIMD extension
• <registerfile_name> is the name of the SIMD extension register file
• <register_level> is a letter denoting the type that can be allocated to this level:
– X stands for the full register width at level 0
– P stands for the sub-parts at level 1 (two halves)
– Q stands for the sub-parts at level 2 (four quarters). It is not instantiated on the
current MPx
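The naming template and the level structure can be sketched in portable C. This is only an illustration of the rules above: the helper names build_type_name and subpart_width are hypothetical and are not part of the toolset.

```c
#include <stdio.h>

/* Builds "<extension>_<registerfile_name><register_level>" into 'out'.
 * 'level' is 0, 1 or 2 and is mapped to the letters X, P and Q.
 * Returns 0 on success, -1 on an invalid level or a too-small buffer. */
int build_type_name(char *out, size_t size, const char *extension,
                    const char *register_file, int level)
{
    static const char letters[] = { 'X', 'P', 'Q' };
    int written;

    if (level < 0 || level > 2)
        return -1;
    written = snprintf(out, size, "%s_%s%c", extension, register_file,
                       letters[level]);
    return (written < 0 || (size_t)written >= size) ? -1 : 0;
}

/* Width in bits of one sub-part: the full width at level 0, one half at
 * level 1 and one quarter at level 2. Returns 0 for invalid arguments. */
unsigned subpart_width(unsigned register_width, int level)
{
    /* The register width must be a power of two between 8 and 512 bits. */
    if (register_width < 8 || register_width > 512 ||
        (register_width & (register_width - 1)) != 0)
        return 0;
    if (level < 0 || level > 2)
        return 0;
    return register_width >> level;
}
```

For example, the extension alias MP2x, the register file V and level 1 give the type name MP2x_VP, and a 128-bit register has 64-bit sub-parts at that level.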
Special macros
The MP1x and MP2x extensions are both provided with a set of dedicated memory access
and register move instructions. The latter can be invoked using dedicated macros that allow
easy access to the register bank of the extension.
Example:
In the lines below, __part__ denotes the subpart of the wider register that can be
represented by either a literal or a variable. _word_i_ denotes a 32-bit word to be assigned
to the subpart i of the corresponding register.
• Make macro builds a constant in extension register:
– MP2x_make_VX(_VX_, _word_3_, _word_2_, _word_1_, _word_0_);
– MP2x_make_VP(_VP_, _word_1_, _word_0_);
– MP2x_make_VQ -> not instantiated
• Compose macro composes register subparts into a wider one:
– MP2x_compose_2xVP(_VX_, _VP_1_, _VP_0_);
– MP2x_compose_4xVQ -> not instantiated
– MP2x_compose_2xVQ -> not instantiated
• Split macro decomposes a register subpart into narrower ones:
– MP2x_split_2xVP(_VX_, _VP_1_, _VP_0_);
– MP2x_split_4xVQ -> not instantiated
– MP2x_split_2xVQ -> not instantiated
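The behavior of the make, compose and split macros can be modeled on a host machine as follows. This is a hypothetical model and not the toolset implementation: the model_ names and types are invented for illustration, and it assumes that a VP value is 64 bits wide, that a VX value is made of two VP halves, and that the operand with the higher index is the more significant one.

```c
#include <stdint.h>

/* Host-side stand-ins for the extension types (hypothetical model). */
typedef uint64_t model_vp;
typedef struct { model_vp half[2]; } model_vx;   /* half[0] = lowest half */

/* Models MP2x_make_VP: builds a 64-bit value from two 32-bit words. */
#define model_make_VP(vp, word_1, word_0) \
    ((vp) = ((uint64_t)(uint32_t)(word_1) << 32) | (uint32_t)(word_0))

/* Models MP2x_compose_2xVP: assembles two halves into a wider value. */
#define model_compose_2xVP(vx, vp_1, vp_0) \
    ((vx).half[1] = (vp_1), (vx).half[0] = (vp_0))

/* Models MP2x_split_2xVP: decomposes a wide value into its two halves. */
#define model_split_2xVP(vx, vp_1, vp_0) \
    ((vp_1) = (vx).half[1], (vp_0) = (vx).half[0])

/* Round trip: composes two halves, splits them again, and reports
 * whether both halves come back unchanged (1) or not (0). */
int model_round_trip(uint32_t w3, uint32_t w2, uint32_t w1, uint32_t w0)
{
    model_vp hi, lo, out_hi, out_lo;
    model_vx wide;

    model_make_VP(hi, w3, w2);
    model_make_VP(lo, w1, w0);
    model_compose_2xVP(wide, hi, lo);
    model_split_2xVP(wide, out_hi, out_lo);
    return out_hi == hi && out_lo == lo;
}
```

In the model, the destination is always the first macro argument for make and compose, and the source is the first argument for split, which mirrors the argument order of the macros listed above.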
Specialized macros
Specialized versions of the insertion and extraction macros are provided to handle cases
where the subpart of the wider register can be hard coded in the built-in name itself.
In the lines below, the macros do not accept an explicit __part__ parameter. The syntax of
the name implicitly corresponds to a given subpart (for instance
MP2x_insert_VP_into_VX0 takes the complete 64-bit register _VP_ and inserts it in the
lowest half of the 128-bit register _VX_).
• Insert macro inserts a register subpart into a wider one:
– MP2x_insert_VP_into_VX0(_VP_, _VX_);
– MP2x_insert_VP_into_VX1(_VP_, _VX_);
– MP2x_insert_VQ_into_VX0 -> not instantiated
– MP2x_insert_VQ_into_VX1 -> not instantiated
– MP2x_insert_VQ_into_VX2 -> not instantiated
– MP2x_insert_VQ_into_VX3 -> not instantiated
– MP2x_insert_VQ_into_VP0 -> not instantiated
– MP2x_insert_VQ_into_VP1 -> not instantiated
• Extract macro extracts a register subpart from a wider one:
– MP2x_extract_VP_from_VX0(_VP_, _VX_);
– MP2x_extract_VP_from_VX1(_VP_, _VX_);
– MP2x_extract_VQ_from_VX0 -> not instantiated
– MP2x_extract_VQ_from_VX1 -> not instantiated
– MP2x_extract_VQ_from_VX2 -> not instantiated
– MP2x_extract_VQ_from_VX3 -> not instantiated
– MP2x_extract_VQ_from_VP0 -> not instantiated
– MP2x_extract_VQ_from_VP1 -> not instantiated
This code places three 64-bit variables, a, b and c, in the MPx_Vx register set. It uses the
MPx addition instruction to add a and b, storing the result in c. Since it uses built-ins and
specific data types, this code is neither generic nor portable to another processor.
The compiler also maps the following instructions when dealing with either an assignment to
zero or a copy operation:
long long a = 0; // mapped to a MPx register clear instruction
long long b = c; // mapped to a MPx register copy instruction
mafw     ll1+((long long)i1*i2)<<1             Requires -Mextoption=MP1x:enablefractgen
msfw     ll1-((long long)i1*i2)<<1             Requires -Mextoption=MP1x:enablefractgen
mpfw     ((long long)i1*i2)<<1                 Requires -Mextoption=MP1x:enablefractgen
mahll    (long long)((int)ll1+(int)ss1*ss2)    32b MAC with 16b multiplicands
mshll    (long long)((int)ll1-(int)ss1*ss2)    32b MAC with 16b multiplicands
shlrr2x  (long long)i1<<i2                     -
shrr2x   (int)(ll1<<i2)                        -
andcd    (ll1 & (!ll2))                        -
mph      (long long)ss1*ss2                    -
mpw      i1*i2                                 32b multiplier when no X3/FPx
maw      ll1+(long long)i1*i2                  -
msw      ll1-(long long)i1*i2                  -
Note: The first three rows correspond to fractional instructions, which are subject to specific
limitations (see Section 8.6.6: Limitations regarding mapping of fractional instructions on
page 131). Their mapping is therefore only performed if the dedicated flag
-Mextoption=MP1x:enablefractgen is set.
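The non-fractional patterns are ordinary C expressions, as the following sketch shows for mph, maw and msw. The function names are illustrative. Whether a given instruction is actually selected depends on the configuration and on the compiler options in use, so the comments only name the pattern each expression corresponds to.

```c
/* mph pattern: two 16-bit operands, product computed on 64 bits. */
long long mul_16x16(short ss1, short ss2)
{
    return (long long)ss1 * ss2;
}

/* maw pattern: 64-bit accumulator plus the 64-bit product of two ints.
 * The cast applies to i1 before the multiplication, so the product is
 * computed on 64 bits and cannot overflow a 32-bit int. */
long long mac_32x32(long long ll1, int i1, int i2)
{
    return ll1 + (long long)i1 * i2;
}

/* msw pattern: same as above with a subtraction. */
long long msub_32x32(long long ll1, int i1, int i2)
{
    return ll1 - (long long)i1 * i2;
}
```

Writing the cast on one multiplicand, rather than on the result of the multiplication, is what keeps the expression in the 64-bit form shown in the list of patterns.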
// Unary/binary operator (not planned to be supported, use long long var instead)
gvx = gvx + gvx; // Not supported (error msg from front-end)
The result of instructions and built-ins in their functional form is always considered unsigned
by convention. However, the actual type might be signed and not explicitly visible to the
compiler. This must be taken into account, especially when writing comparisons.
For example, the following code is incorrect:
if (MP1x_SUBS_f(a, b) < 0) {
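The pitfall can be reproduced in portable C. In the sketch below, model_subs is a hypothetical stand-in for a built-in in functional form: it is not a toolset function, and it only mimics the convention that the result is typed as unsigned. An explicit cast to the signed type is shown as one possible correction.

```c
/* Hypothetical stand-in for a built-in in functional form: the result
 * is typed as unsigned although it carries a signed difference. */
unsigned int model_subs(int a, int b)
{
    return (unsigned int)a - (unsigned int)b;
}

/* Incorrect test: an unsigned value is never below zero, so this
 * function returns 0 for every pair of arguments. */
int is_negative_wrong(int a, int b)
{
    return model_subs(a, b) < 0;
}

/* Possible correction: convert to the signed type before comparing.
 * The conversion of an out-of-range value is implementation-defined in
 * C; two's complement wrap-around is assumed here. */
int is_negative_right(int a, int b)
{
    return (int)model_subs(a, b) < 0;
}
```

Some compilers report the first comparison as always false when extra warnings are enabled, which can help to locate this kind of error.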
d. This mapping allows 32-bit multiplications to be mapped to the MPx multiplier in case the X3 or FPx 32-bit
multiplier is not present in the configuration. Note however that in this case the resulting code is less efficient
than with the 32-bit multiplier, since it requires one more instruction to extract the lower 32-bit part of the result.
e. “Spilled” means that the contents of the register are temporarily stored in memory and then restored when
needed.
MPx code. Note that the cost is neither assessed nor handled by the compiler, so it is the
developer’s responsibility to use the most efficient placement.
On the other hand, the maw instruction is always recognized in the code below:
long long mac;
int a, b;
mac+= (long long)a*b;
f. The name of this option has changed: it was formerly named -Menablefractgen or -Mfractsupport,
which was not accurate enough. The former name is still recognized, but its use is strongly discouraged.
8.7 Examples
tmp = a + b;
tmp = tmp << 2;
return tmp;
}
No MPx support
When MPx is not present and MPx support is not enabled (stxp70cc -O3 test.c), then
the code generated relies solely on core instructions and runtimes:
.global fct
fct:
L_BB1_fct:
make R4, 0 ;;
addcu R4, R4, R4 ;;
addcu R0, R0, R2 ;;
make R2, 2 ;;
addcu R1, R1, R3 ;;
.global __shll
.type __shll, @function
jr __shll ;;
MPx support
When MPx is present and MPx support is enabled (stxp70cc -O3 -Mextension=MP1x
test.c), then MPx instructions are mapped where needed:
.global fct
fct:
L_BB1_fct:
XRF0RR2X V0, R1, R0 ;;
XRF0RR2X V1, R3, R2 ;;
ADDD V0, V0, V1 ;;
SHLID V0, V0, 2 ;;
XRF0CSX2R R0, V0, V0 ;;
XRF0CSX2R R1, V0, V0 ;;
rts ;;
Note: 1 The moves between the core and the MPx registers are introduced to deal with ABI
constraints. Those instructions are necessary only because the addition is isolated in a
function. They are not present in successive long long arithmetic operations, and do not
represent any extra cost. (Consequently, they are shown here in italic.)
2 The MPx instructions are mapped automatically (ADDD, SHLID) to perform long long
operations.
No MPx support
When MPx is not present and MPx support is not enabled (stxp70cc -Os test.c), the
code generated relies only on core instructions and runtimes:
.global fct
fct:
L_BB1_fct:
cmpeq G0, R1, R3 ;;
cmpgtu G1, R0, R2 ;;
andg G0, G0, G1 ;;
cmpgt G1, R1, R3 ;;
org G0, G0, G1 ;;
G4? or R4, R2, 0 ;;
G0? or R4, R0, 0 ;;
G0? or R3, R1, 0 ;;
or R1, R3, 0 ;;
or R0, R4, 0 ;;
rts ;;
The core of the computation consists of the instructions that are not in italic. The sequence
contains three comparisons and two boolean operations (GMI).
MPx support
When MPx is present and MPx support is enabled (stxp70cc -Os -Mextension=MP1x
test.c), only two comparisons are needed. (The instructions in italic are not taken into
account, as they are mainly needed because of the encapsulation of the code in a function.)
.global fct
fct:
L_BB1_fct:
XRF0RR2X V3, R1, R0 ;;
XRF0RR2X V2, R3, R2 ;;
cmpgtx2r R0, V3, V2 ;;
cmpne G0, R0, 0 ;;
L__0_4:
G4? XRF0CSX2R R0, V0, V2 ;;
G0? XRF0CSX2R R2, V1, V3 ;;
G4? XRF0CSX2R R1, V0, V0 ;;
G0? or R0, R2, 0 ;;
G0? XRF0CSX2R R2, V1, V1 ;;
G0? or R1, R2, 0 ;;
rts ;;
This chapter describes how dynamic loading is implemented using the relocatable loader
library RL_LIB for the STxP70.
Table 29 and Table 30 list a number of acronyms and definitions used within this chapter.
Preemption
Sometimes you may need to use some of the functions or data items from
a shareable object, but may wish to replace others with your own
definitions. For example, you may want to use the standard C runtime
library shareable object, libc.so, but to use your own definitions for the
heap management routines malloc() and free(). In this case it is
important that calls to malloc() and free() within libc.so call your
definition of the routines and not the definitions present in libc.so. Your
definition should override, or preempt, the definition within the shareable
object. This feature of shareable objects is called symbol preemption.
Relocation
Relocation is the process of connecting symbolic references with symbolic
definitions. For example, when a program calls a function, the associated
call instruction must transfer control to the proper destination address at
execution. In other words, relocatable files must have information that
describes how to modify their section contents, thus allowing executable
and shared object files to hold the right information for a process's program
image.
• Statically allocated variables of local scope, or global variables whose definitions are
not subject to pre-emption, may be accessed directly with GP-relative addressing
mode.
9.1.4 Rationale
Code in main programs may be absolute or position independent. If an absolute program
imports data from a DLL, the linker is forced to allocate the data in the main program’s data
segment statically (this is commonly called the “.dynbss hack”). When data imported from
DLLs is allocated in the main program’s data segment, the program may be subject to future
compatibility problems when the DLL is replaced with a newer version. This issue may be
avoided by requiring main programs to be position independent, at the cost of some
efficiency in the main program. This compatibility/performance trade-off is not made in the
common run-time architecture; it is left to the specific ABI.
4. Procedure entry. The prologue code in the target procedure is responsible for
allocating a frame on the memory stack, if necessary.
If it is a non-leaf procedure, it must save the link register in the memory stack frame.
The prologue must also save any preserved registers that will be used in this
procedure.
If it is a position-independent procedure that makes calls or accesses global data, then
it must establish the GP value in the GP register. The GP register (R13) is a preserved
register, and therefore must be saved before being modified. A position-independent
internal function may assume that the GP register already contains the correct value.
A position-independent leaf procedure that accesses global data is not required to put
the GP value in R13, it may use a scratch register instead, thus avoiding the need for
saving and restoring register R13.
5. Procedure exit. The epilogue code is responsible for restoring the link register and any
preserved registers that were saved.
If a memory stack frame was allocated, the epilogue code must deallocate it. Finally,
the procedure exits by branching through the link register with the return instruction.
6. After the call. Any saved values should be restored.
[Figure: calling sequence between the caller (caller's load module) and the callee (callee's
load module). Caller: call, then restore registers after the call. Callee: procedure body, then
exit (restore return link, restore registers, destroy memory frame, return).]
2. Procedure call. All indirect calls are made with the call indirect instruction, which reads
and writes the link register. The call instruction saves the return link address in the link
register.
3. Procedure entry, exit, and return. The remainder of the calling sequence is the same
as for direct calls.
[Figure 30: the main module and the load modules A, B, C and D, with their symbols
(printf, malloc, exec_A, exec_B, exec_C, exec_D) and the arrows described below.]
In Figure 30, curved arrows (from load modules to parent module) represent load time
symbol-binding performed while the load module loads. Straight arrows (from loader module
to loaded module) represent explicit symbol address resolution performed through the
loader library API.
The following describes a possible scenario.
1. At run-time, the main module loads the module A into memory through the
rl_load_file() function.
2. The loader, in the process of loading A into memory, binds the symbol printf
(undefined in A) to the printf function defined in main.
3. The main program uses the rl_sym() function to retrieve a pointer to the function
symbol exec_A in A.
4. As for A, the main program loads the module D, and references to printf in D are
resolved to the printf in main. In addition, references to malloc in D are also resolved
to the malloc in main.
5. The main program retrieves a pointer to exec_D in D using the rl_sym() function.
6. The main program (at some point) invokes the function exec_A.
7. The exec_A function loads the two modules B and C.
8. The undefined reference to printf in B is resolved to the printf in main (the loader
searches first in A, and then in main).
9. The undefined reference to malloc in C is resolved to the malloc in A (the loader
searches for and finds it in A). Note that the malloc function called from D (malloc of
main) is then different from the malloc function called from B (or C, or A) which is the
malloc of A.
10. After retrieving symbol addresses using the rl_sym() function, module A can
indirectly call functions or reference data in B and C.
Note: At any time, the main module or the module A can unload one of the loaded modules.
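The load-time binding order used in this scenario can be modeled in a few lines of portable C. The sketch below is a host-side illustration only and does not use the loader library: the model_module type, the module_ objects and the model_bind function are hypothetical, and the symbol lists are taken from the scenario above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a load module: a name, the symbols it defines,
 * and the module that loaded it (NULL for the main program). */
struct model_module {
    const char *name;
    const char *const *symbols;     /* NULL-terminated list */
    const struct model_module *parent;
};

static const char *const main_symbols[] = { "printf", "malloc", NULL };
static const char *const a_symbols[]    = { "exec_A", "malloc", NULL };
static const char *const b_symbols[]    = { "exec_B", NULL };
static const char *const c_symbols[]    = { "exec_C", NULL };
static const char *const d_symbols[]    = { "exec_D", NULL };

/* main loads A and D; A loads B and C. */
const struct model_module module_main = { "main", main_symbols, NULL };
const struct model_module module_a = { "A", a_symbols, &module_main };
const struct model_module module_b = { "B", b_symbols, &module_a };
const struct model_module module_c = { "C", c_symbols, &module_a };
const struct model_module module_d = { "D", d_symbols, &module_main };

/* Returns 1 if 'module' itself defines 'symbol', 0 otherwise. */
static int model_defines(const struct model_module *module,
                         const char *symbol)
{
    const char *const *entry;

    for (entry = module->symbols; *entry != NULL; entry++)
        if (strcmp(*entry, symbol) == 0)
            return 1;
    return 0;
}

/* Resolves an undefined reference made by 'module': the search starts
 * in its parent and walks up towards the main program. Returns the
 * defining module, or NULL if no ancestor defines the symbol. */
const struct model_module *model_bind(const struct model_module *module,
                                      const char *symbol)
{
    const struct model_module *scope;

    for (scope = module->parent; scope != NULL; scope = scope->parent)
        if (model_defines(scope, symbol))
            return scope;
    return NULL;
}
```

With this model, a reference to printf in B is bound to main after an unsuccessful search in A, a reference to malloc in C is bound to A, and a reference to malloc in D is bound to main, which matches steps 4, 8 and 9.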
rl_parent Return the handle for the parent of the current handle
Definition: rl_handle_t *rl_parent(void);
Arguments: None.
Returns: The handle for the parent of the current handle.
Description: The rl_parent() function returns the handle for the parent of the current handle
(as returned by rl_this()).
It may be used, for example, to find a symbol in one of the parent modules:
void *symbol_in_parents = rl_sym_rec(rl_parent(), "symbol");
rl_file_name Return the filename associated with the loaded module handle
Definition: const char *rl_file_name(
rl_handle_t *handle);
Arguments:
handle The handle for the loaded module.
Returns: The filename associated with the loaded module handle, or NULL.
Description: The rl_file_name() function returns the filename associated with the loaded
module handle. It returns NULL if no filename is associated with the current loaded
module, if the handle does not hold a loaded module or if the handle passed is the
main program handle.
int rl_load_stream(
rl_handle_t *handle,
rl_stream_func_t *stream_func,
void *stream_cookie);
Arguments:
handle The handle for the module.
stream_func The user specified callback function.
stream_cookie The user specified state.
int rl_foreach_segment(
rl_handle_t *handle,
rl_segment_func_t *callback_fn,
void *callback_cookie);
Arguments:
handle The handle for the module.
callback_fn The user specified callback function.
callback_cookie The argument to pass to the function.
int rl_add_action_callback(
rl_action_t action_mask,
rl_action_func_t *callback_fn,
void *callback_cookie);
Arguments:
action_mask The set of actions for which the callback function must be
called.
callback_fn The user specified callback function.
callback_cookie The argument to pass to the function.
The type for the user action callback function is rl_action_func_t. The
parameters passed to the callback function when it is called are:
handle The handle that performed the action.
action The action performed.
cookie The callback_cookie parameter passed to
rl_add_action_callback().
The callback function returns 0 on success and -1 on failure. In the case of failure, the
loading (or unloading) of the module is undone and the error code returned by
rl_errno() is set to RL_ERR_ACTIONF.
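A user action callback might look like the following sketch. The two typedefs are stand-ins that only allow the example to compile on its own; in a real program these types come from the loader library header, whose name is not given here. The parameter order (handle, action, cookie) is assumed from the parameter list above, and the action_log structure and the log_action function are hypothetical.

```c
#include <stddef.h>

/* Stand-in declarations (assumptions), to be replaced by the loader
 * library header in a real program. */
typedef struct rl_handle_t_ rl_handle_t;
typedef unsigned int rl_action_t;

/* Hypothetical user state passed through the callback cookie. */
struct action_log {
    unsigned int calls;         /* number of actions observed */
    rl_action_t last_action;    /* most recent action value */
};

/* User action callback: records the action and reports success. */
int log_action(rl_handle_t *handle, rl_action_t action, void *cookie)
{
    struct action_log *log = cookie;

    (void)handle;               /* not needed by this callback */
    if (log == NULL)
        return -1;              /* failure: the load or unload is undone */
    log->calls++;
    log->last_action = action;
    return 0;                   /* success */
}
```

Such a callback would be registered by passing it as callback_fn to rl_add_action_callback(), with a pointer to an action_log object as callback_cookie.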
Returns: Returns 0 for success, -1 if the callback was not present in the callback list.
Description: The rl_delete_action_callback() function removes the specified callback
function from the action callback list. This function returns 0 if the callback was
removed, or -1 if it was not present in the callback list. No error code is set.
rl_errno Return the error code for the last failed function
Definition: int rl_errno(
rl_handle_t *handle);
Arguments:
handle The handle for the module.
RL_ERR_DYN
The load module is not a dynamic library.
Functions: rl_load_buffer(), rl_load_file(), rl_load_stream(), rl_set_file_name()
RL_ERR_SEG
The load module has invalid segment information.
Functions: rl_load_buffer(), rl_load_file(), rl_load_stream(), rl_set_file_name()
RL_ERR_REL
The load module contains invalid relocations.
Functions: rl_load_buffer(), rl_load_file(), rl_load_stream(), rl_set_file_name()
RL_ERR_RELSYM
A symbol was not found at load time. rl_errarg() returns the symbol name.
Functions: rl_load_buffer(), rl_load_file(), rl_load_stream(), rl_set_file_name()
RL_ERR_SYM
The symbol is not defined in the module. rl_errarg() returns the symbol name.
Functions: rl_sym(), rl_sym_rec()
RL_ERR_FOPEN
The file cannot be opened by rl_fopen().
Functions: rl_load_file()
RL_ERR_FREAD
Error while reading the file in rl_fread().
Functions: rl_load_file()
RL_ERR_STREAM
Error while loading the file from a stream.
Functions: rl_load_stream()
RL_ERR_LINKED
Module handle is already linked.
Functions: rl_load_file(), rl_load_buffer(), rl_load_stream(), rl_handle_delete()
RL_ERR_NLINKED
Module handle is not linked.
Functions: rl_unload(), rl_sym(), rl_sym_rec(), rl_foreach_segment()
RL_ERR_SEGMENTF
Error in segment function callback.
Functions: rl_foreach_segment()
RL_ERR_ACTIONF
Error in action function callback.
Functions: rl_load_file(), rl_load_buffer(), rl_load_stream()
rl_errarg Return the name of the symbol that could not be resolved
Definition: const char *rl_errarg(
rl_handle_t *handle);
Arguments:
handle The handle for the module.
9.5 Customization
The relocatable loader library defines a number of functions that it uses internally for
providing services such as heap memory management and file access. To provide custom
implementation of these functions, the application in the main module can override these
functions.
The following command generates an import script that the main module can load from a list
of load modules, liba.rl and libb.rl:
stxp70-rltool -mcore=[stxp70v3|stxp70v4] -i -o prog_import.ld liba.rl libb.rl
Use the import script to link the main module, for example:
stxp70cc -o prog.exe --rmain object_files.o prog_import.ld
To generate an export script for a load module with a well defined interface specified in the
file liba_export.lst (one symbol per line):
stxp70-rltool -mcore=[stxp70v3|stxp70v4] -e -s -o liba_export.ld liba_export.lst
stxp70cc -o liba.rl --rlib *.o liba_export.ld
the functions that are external have an associated overhead. The other internal functions
have a very reduced overhead.
For a full inter-procedural optimization of the relocatable library, use the -ipa option. In this
case, when combined with the declaration of external functions, the library is generated with
a minimal overhead for the dynamic linking support.
For detailed information on the visibility specification, refer to the compiler options
documentation and to the ELF System V Dynamic Linking ABI.
int main()
{
...
if (rl_add_action_callback(RL_ACTION_ALL, module_profile,
NULL)==-1){
fprintf(stderr, "rl_add_action_callback failed\n");
exit(1);
}
...
status = rl_load_file(handle, file_name);
...
return 0;
}
STxP70 v3 MUTEX implementation    <RL_LIB_root>/librl/config/stxp70v3/sys_mutex.[c|h]
STxP70 v3 Cache management        <RL_LIB_root>/librl/config/stxp70v3/targ_elf.[c|h]
STxP70 v4 MUTEX implementation    <RL_LIB_root>/librl/config/stxp70v4/sys_mutex.[c|h]
STxP70 v4 Cache management        <RL_LIB_root>/librl/config/stxp70v4/targ_elf.[c|h]
10 Compiler bugs
This chapter describes the different categories of compiler bugs and how they should be
reported to STMicroelectronics.
10.1.1 Category 1
The following cases are compiler or toolset bugs:
• the compilation phase ends with an assertion message
• the compilation phase ends with a system error message (core dump, bus error)
• the compilation phase produces an output that cannot be assembled
• the compilation phase never ends, or at least does not end in a reasonable amount of
time
• the compiler produces an error message for code that is valid input
• the compiler produces code that does not compute the expected results (but see
Section 10.1.2)
10.1.2 Category 2
The following case is possibly not a compiler or toolset bug.
• The code is functional under a specific optimization level, but not under another. This
may be due to an existing code bug that is only exposed by aggressive optimization.
10.3 Workaround
The following steps can be carried out to temporarily work around a compiler bug.
1. Demote the optimization level to -O1 or -O0 when compiling the specific file creating
the problem, either in category 1 or 2. (See Section 10.1.1 and Section 10.1.2.)
2. Remove the optimization pragmas or restrict annotations.
3. Finally, check that you have an up-to-date compiler release.
11 Revision history
Information in this document is provided solely in connection with ST products. STMicroelectronics NV and its subsidiaries (“ST”) reserve the
right to make changes, corrections, modifications or improvements, to this document, and the products and services described herein at any
time, without notice.
All ST products are sold pursuant to ST’s terms and conditions of sale.
Purchasers are solely responsible for the choice, selection and use of the ST products and services described herein, and ST assumes no
liability whatsoever relating to the choice, selection or use of the ST products and services described herein.
No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted under this document. If any part of this
document refers to any third party products or services it shall not be deemed a license grant by ST for the use of such third party products
or services, or any intellectual property contained therein or considered as a warranty covering the use in any manner whatsoever of such
third party products or services or any intellectual property contained therein.
UNLESS OTHERWISE SET FORTH IN ST’S TERMS AND CONDITIONS OF SALE ST DISCLAIMS ANY EXPRESS OR IMPLIED
WARRANTY WITH RESPECT TO THE USE AND/OR SALE OF ST PRODUCTS INCLUDING WITHOUT LIMITATION IMPLIED
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE (AND THEIR EQUIVALENTS UNDER THE LAWS
OF ANY JURISDICTION), OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
ST PRODUCTS ARE NOT AUTHORIZED FOR USE IN WEAPONS. NOR ARE ST PRODUCTS DESIGNED OR AUTHORIZED FOR USE
IN: (A) SAFETY CRITICAL APPLICATIONS SUCH AS LIFE SUPPORTING, ACTIVE IMPLANTED DEVICES OR SYSTEMS WITH
PRODUCT FUNCTIONAL SAFETY REQUIREMENTS; (B) AERONAUTIC APPLICATIONS; (C) AUTOMOTIVE APPLICATIONS OR
ENVIRONMENTS, AND/OR (D) AEROSPACE APPLICATIONS OR ENVIRONMENTS. WHERE ST PRODUCTS ARE NOT DESIGNED
FOR SUCH USE, THE PURCHASER SHALL USE PRODUCTS AT PURCHASER’S SOLE RISK, EVEN IF ST HAS BEEN INFORMED IN
WRITING OF SUCH USAGE, UNLESS A PRODUCT IS EXPRESSLY DESIGNATED BY ST AS BEING INTENDED FOR “AUTOMOTIVE,
AUTOMOTIVE SAFETY OR MEDICAL” INDUSTRY DOMAINS ACCORDING TO ST PRODUCT DESIGN SPECIFICATIONS.
PRODUCTS FORMALLY ESCC, QML OR JAN QUALIFIED ARE DEEMED SUITABLE FOR USE IN AEROSPACE BY THE
CORRESPONDING GOVERNMENTAL AGENCY.
Resale of ST products with provisions different from the statements and/or technical features set forth in this document shall immediately void
any warranty granted by ST for the ST product or service described herein and shall not create or extend in any manner whatsoever, any
liability of ST.
ST and the ST logo are trademarks or registered trademarks of ST in various countries.
Information in this document supersedes and replaces all information previously supplied.
The ST logo is a registered trademark of STMicroelectronics. All other names are the property of their respective owners.