UM1237

User manual
STxP70 compiler

Overview
The purpose of the STxP70 compilation driver (stxp70cc) is to manage the stages of the
compilation process: preprocessing, compiling into assembly language, assembling and
linking. The assembler file is compiled using stxp70-as and linked using stxp70-ld to
provide an STxP70 binary image. All these phases are hidden using the driver tool
stxp70cc.
This user manual provides detailed information to enable users to write efficient code
optimized to run on the STxP70 processors and to compile and link it ready for execution by
sxrun. The manual covers:
• stxp70cc driver options
• pragmas supported by stxp70cc
• compiler optimization techniques
• GNU C language extensions
• GNU asm construct
• built-in functions
The load/run tool sxrun and the STxP70 debugger sxgdb are described in the STxP70
Professional Toolset user manual (7833754).

May 2013 8027948 Rev 15 1/166


www.st.com

Contents

Overview

Preface
Documentation suite
Conventions used in this guide

1 STxP70 development system
1.1 Toolset overview
1.2 Toolset software requirements
1.2.1 Example command-lines
1.2.2 Compiling code for STxP70-3 or STxP70-4

2 stxp70cc
2.1 Invoking the compiler
2.1.1 Input and output
2.2 Command-line options
2.2.1 Getting help
2.2.2 Overall options
2.2.3 stxp70cc core selection option
2.2.4 stxp70cc compiler generic options
2.2.5 C preprocessor options
2.2.6 C dialect options
2.2.7 Warning options
2.2.8 Debugging options
2.2.9 Profiling options
2.2.10 Code coverage options
2.2.11 Call trace instrumentation options
2.2.12 Optimization options
2.2.13 Code generation options
2.2.14 -OPT options
2.2.15 Inlining options
2.2.16 Interprocedural analysis
2.2.17 Position independent code generation (PIC)
2.2.18 Sending options to a specific phase
2.2.19 Directory and library options
2.2.20 Environment variables
2.3 Predefined macros
2.4 C99 support

3 Pragmas
3.1 Pragmas short description and syntax
3.2 Loop optimization pragmas
3.2.1 #pragma unroll (n)
3.2.2 #pragma ivdep
3.2.3 #pragma loopdep
3.2.4 #pragma loopmod
3.2.5 #pragma looptrip (n)
3.2.6 #pragma hwloop
3.2.7 #pragma loopmin<itercount> (minc)
3.2.8 #pragma loopmax<itercount> (maxc)
3.2.9 Code generation pragmas
3.2.10 Heuristic pragmas
3.3 Miscellaneous pragmas
3.3.1 #pragma ident “string”
3.3.2 #pragma weak
3.3.3 #pragma disable_extgen ( fct1, fct2, ... )
3.3.4 #pragma force_extgen ( fct1, fct2, ... )
3.3.5 #pragma disable_specific_extgen ( extname[, fct1, fct2, ... ] )
3.3.6 #pragma force_specific_extgen ( extname[, fct1, fct2, ... ] )

4 Optimization guide
4.1 Inlining
4.1.1 Single file inlining
4.1.2 stxp70cc inlining options
4.1.3 Extern inline functions
4.1.4 Inlining pragmas
4.2 Loop unrolling
4.2.1 Default unrolling policy
4.2.2 Advanced control of the unroller
4.2.3 Precedence rules
4.2.4 Built-in assume and pragma loopmod
4.3 Memory dependences in C programs
4.4 Aliasing rules in C/C++ programs
4.5 Profiling
4.5.1 Profiling data generation
4.5.2 Using profiling data
4.5.3 Special case of programs that never exit
4.5.4 Amount of heap required for profiling
4.6 Code coverage
4.7 Call trace
4.7.1 Instrumenting functions: -finstrument-functions
4.7.2 Instrumenting calls to functions: -minstrument-calls
4.8 Interprocedural analysis optimization (IPA)
4.8.1 Using IPA
4.8.2 IPA command line options
4.8.3 Limitations and special cautions
4.9 Floating-point code generation
4.9.1 Precision of floating-point arithmetic in programs
4.9.2 Controlling the precision of floating-point
4.9.3 Use of STxP70 with FPx
4.9.4 Examples of floating-point arithmetic on the STxP70
4.10 Application configuration files
4.10.1 General description and purpose
4.10.2 Description and syntax of an ACF
4.10.3 ACF grammar description
4.10.4 Using the ACF
4.10.5 Behavior of -macf-template option
4.10.6 Scope and known limitations

5 GNU C extensions supported by stxp70cc
5.1 Extensions to the C language family
5.1.1 Statements and declarations in expressions
5.1.2 Locally declared labels
5.1.3 Labels as values
5.1.4 Naming an expression's type
5.1.5 Referring to a type with typeof
5.1.6 Generalized Lvalues
5.1.7 Conditionals with omitted operands
5.1.8 Double-word integers
5.1.9 Hexadecimal floats
5.1.10 Specifying a register for a local variable
5.1.11 Array of length zero
5.1.12 Array of variable length
5.1.13 Macro with variable number of arguments
5.1.14 Strings literals with embedded newlines
5.1.15 Non-Lvalue arrays may have subscripts
5.1.16 Arithmetic on void and function pointers
5.1.17 Non-constant initializers
5.1.18 Compound literals
5.1.19 Designated initializers
5.1.20 Case ranges
5.1.21 Cast to a union type
5.1.22 Dollar signs in identifier names
5.1.23 Prototypes and old-style function definitions
5.1.24 C++ comments
5.1.25 Character ESC in constants
5.1.26 Inquiring on alignment of types or variables
5.1.27 Incomplete enum type
5.1.28 Function names as strings
5.2 Attributes
5.2.1 Placement and layout
5.2.2 Optimization
5.2.3 Visibility attributes
5.2.4 Miscellaneous attributes
5.2.5 Built-ins

6 GNU ASM
6.1 Syntax
6.2 Assumptions
6.3 Volatile
6.4 Restrictions
6.5 Differences between the STxP70 core versions
6.6 GNU ASM optimization
6.7 Example
6.8 Parsing and optimization of GNU assembly statement

7 Built-in functions
7.1 Header files and C-models files
7.2 Naming built-ins
7.2.1 General naming scheme, relevant files
7.2.2 Types and special built-ins for audio scalar/SIMD extensions
7.3 Using built-ins from C
7.3.1 Using built-ins on an STxP70 platform
7.3.2 Standard use of built-in C-models
7.3.3 Use of built-in C-models on STxP70 target

8 MPx native support
8.1 Goal of the MPx scalar support
8.2 Control of the MPx native support
8.2.1 Compiler options
8.2.2 Function pragmas
8.3 Scope of the MPx native support
8.3.1 Built-in based support with MPx_Vx type
8.3.2 Support of type equivalence between long long and MPx_Vx
8.3.3 Automatic MPx code generation on long long arithmetic
8.3.4 Pattern recognition for integer and fractional data types
8.4 Type equivalence
8.5 Automatic code generation
8.5.1 Scope and principle
8.5.2 Operations mapped to single MPx instructions
8.5.3 Operations mapped to meta-instructions
8.6 Important remarks and known limitations
8.6.1 Avoid mixing MPx and long long
8.6.2 Long long passed as function parameters
8.6.3 Long long life span crossing function call
8.6.4 Efficiency of code in meta-instructions
8.6.5 Mapping exact conversions and single statement expressions
8.6.6 Limitations regarding mapping of fractional instructions
8.6.7 Unsupported mapping
8.7 Examples
8.7.1 Direct mapping of long long arithmetic
8.7.2 Meta-instruction, case of a long long max
8.7.3 Case of the 32-bit multiplication

9 Relocatable loader library
9.1 Introduction to dynamic linking
9.1.1 Position-independent code
9.1.2 Import stubs
9.1.3 The dynamic loader
9.1.4 Rationale
9.2 Calling sequence
9.2.1 Direct calls
9.2.2 Indirect calls
9.3 Introduction to the relocatable loader library
9.3.1 Run-time model overview
9.3.2 Relocatable run-time model
9.4 Relocatable loader library API
9.4.1 rl_handle_t type
9.4.2 Function descriptions
9.5 Customization
9.5.1 Memory allocation
9.5.2 File management
9.6 Building a relocatable library or main module
9.6.1 Importing and exporting symbols
9.6.2 Optimization options
9.7 Debugging support
9.8 Profiling support
9.9 Memory protection support
9.10 STxP70 targeting of RL_LIB

10 Compiler bugs
10.1 Identifying a compiler bug
10.1.1 Category 1
10.1.2 Category 2
10.2 Checks performed by user
10.3 Workaround
10.4 Reporting a compiler bug
10.5 Known bugs and limitations

11 Revision history


Preface

This document is part of the documentation suite detailed below. Comments on this or other
manuals in the documentation suite should be made by contacting your local
STMicroelectronics sales office or distributor.

Documentation suite
STxP70 compiler user manual (8027948)
This manual describes the C compiler for STMicroelectronics STxP70 cores.

STxP70 Professional Toolset user manual (7833754)
This document explains the toolset architecture and provides information about how to
develop and debug applications running on STxP70 systems.

Advanced debugging with the STxP70-4 instruction-accurate simulator (Doc ID 024404)
This document describes the commands implemented in the instruction-accurate simulator
for debugging applications.

STxP70 utilities reference manual (8210925)
This document provides, in a single volume, a command-line reference for each of the generic,
STxP70-3 and STxP70-4 utilities provided with the STxP70 toolset that are not documented
elsewhere. For each utility, the manual provides a command-line synopsis, a brief
description of the utility, the complete list of available options, and its return value.

Building STxP70 libraries application note (8226669)
This document explains how to produce a set of standard libraries for the STxP70
compilation tools optimized for the user’s specific purposes.

Conventions used in this guide


General notation
The notation in this document uses the following conventions:
• sample code, keyboard input and file names,
• variables, code variables and code comments,
• equations and math,
• screens, windows, dialog boxes and tool names,
• instructions.


Software notation
Syntax definitions are presented in a modified Backus-Naur Form (BNF).
• Terminal strings of the language, that is those not built up by rules of the language, are
printed in teletype font. For example, void.
• Non-terminal strings of the language, that is those built up by rules of the language, are
printed in italic teletype font. For example, name.
• If a non-terminal string of the language starts with a non-italicized part, it is equivalent
to the same non-terminal string without that non-italicized part. For example,
vspace-name.
• Each phrase definition is built up using a double colon and an equals sign to separate
the two sides (‘::=’).
• Alternatives are separated by vertical bars (‘|’).
• Optional sequences are enclosed in square brackets (‘[’ and ‘]’).
• Items which may be repeated appear in braces (‘{’ and ‘}’).


1 STxP70 development system

The purpose of the stxp70cc compilation driver is to translate a program written in the C
language into the STxP70 assembly language so that it is suitable for assembly, linking, and
execution. The assembler file is assembled using stxp70-as and linked using stxp70-ld(a) to
provide an STxP70 binary image. All these phases are hidden behind the driver tool
stxp70cc.
Note: The stxp70cc compilation driver and core compiler are common to both STxP70 versions 3
and 4. A specific command line and GUI option can be used to generate code for either
target. See Section 1.2.2: Compiling code for STxP70-3 or STxP70-4.
The stxp70cc compiler uses the GNU C language parser, and implements state-of-the-art
compiler optimizations. Thanks to this GNU C language parser, the stxp70cc compiler is
closely compatible with the GNU C compiler, both at the driver level, and on C language
extensions (GNU Compiler Collection project; see
http://www.gnu.org/software/gcc/gcc.html). The processor-independent compiler
optimizations available in the stxp70cc compiler are mostly inherited from the Open64
project hosted on SourceForge; see http://open64.sourceforge.net. Other compiler
optimizations that are specific to the STxP70 family of processors have been developed by
STMicroelectronics.
These include:
• use of hardware loop mechanisms of the STxP70 core (hardware loops and
JRGTUDEC instructions)
• use of the special addressing modes of the STxP70 core
• use of the memory space defined in the STxP70 ABI in order to increase memory
access efficiency
• aggressive instruction selection including mapping of the user boolean variables to the
branch registers
• instruction scheduling
• aggressive transformation of loops
• compiler intrinsics and built-ins support
• compiler support for the X3, FPx and MPx extensions
The binary image can be executed on an STxP70 hardware target or by using the sxrun
simulator or the sxgdb debugger. The binary format used for the image is ELF and the
debug format is DWARF2.
Where applicable, the available options are accessible through a command-line interface
similar to the UNIX style. This will be familiar to most gcc and cc users. The toolset is
installed in a directory structure which also follows the UNIX structure, that is bin and lib.
Wherever possible, compatibility with the options of the former sxcc compiler has been
preserved.
The compiler supports the ANSI C89 standard and partially supports the ANSI C99
standard; see Section 2.4: C99 support.

a. For usage information see the GNU linker document “Using ld” that is supplied with the toolset.


1.1 Toolset overview


The STxP70 Professional Toolset is a set of tools that allow C programs compiled for an
STxP70 target to be simulated on a host workstation or executed on an STxP70 target.
The STxP70 Professional Toolset is mainly intended for tool developers, for operating
system development and for applications that require modeling interrupts and real-time
behavior. It includes the whole set of tools that manipulate STxP70 object files, including the
STxP70 assembler, compiler, linker, load/run tool, debugger and archiver. Here, STxP70
assembler files are translated to STxP70 object files that the linker merges to produce an
STxP70 executable image. This image file does not run natively on the host workstation and
requires an interpreter to be executed. See Section 1.2.1: Example command-lines
for details. Figure 1 shows the main components of the STxP70 Professional
Toolset (when IPA is not used).

Figure 1. Components of the STxP70 Professional Toolset interfaces

[Figure: tool flow. .c source files -> STxP70 C compiler -> STxP70 assembler files (.s) ->
STxP70 assembler (stxp70-as) -> STxP70 object files. STxP70 object files, STxP70 libraries,
and target board boot and sysconf files -> STxP70 linker (stxp70-ld) -> STxP70 binary (.elf) ->
STxP70 load/run tool (sxrun) or STxP70 debugger (sxgdb).]


1.2 Toolset software requirements

1.2.1 Example command-lines


The stxp70cc compiler produces an STxP70 object file in the STxP70 object file format (ELF).
See the relevant chapter in the STxP70 ABI manual (7937486) for details.
Assuming that we want to compile two files file1.c and file2.c into an STxP70
executable a.elf, the set of commands to issue is:
$[1] stxp70cc –c file1.c
$[2] stxp70cc –c file2.c
$[3] stxp70cc –o a.elf file1.o file2.o

This assumes that the user has sourced the appropriate shell file in the
<tools-dir>/bin folder. In most cases, the one needed is STxP70.csh. This ensures that
all the configuration environment variables needed are properly set.
Command [1] causes the following steps to be executed:
<tools-dir>/bin/stxp70cc # stxp70cc driver
<tools-dir>/lib/cpp <cpp_flags> file1.c file1.i # C preprocessor
<tools-dir>/lib/cmplrs/<C compiler> <C Compiler flags> file1.i file1.s
# C compiler
<tools-dir>/bin/stxp70-as <stxp70-as_flags> file1.s file1.o # STxP70 Assembler

Command [2] causes the following steps to be executed:


<tools-dir>/bin/stxp70cc # stxp70cc driver
<tools-dir>/lib/cpp <cpp_flags> file2.c file2.i # C preprocessor
<tools-dir>/lib/cmplrs/<C compiler> <C Compiler flags> file2.i file2.s
# C compiler
<tools-dir>/bin/stxp70-as <stxp70-as_flags> file2.s file2.o # STxP70 Assembler

Command [3] causes the link stage to be executed. Please refer to the STxP70 linker user
manual for further details.
Once steps [1] to [3] are completed, an STxP70 executable binary a.elf is generated. This
can be executed using the stand-alone driver for the load/run tool (available as sxrun) in the
following way:
$[4] sxrun a.elf

This causes the a.elf STxP70 binary to be “interpreted” by the sxrun command. The
simulator also provides some minimal tracing, cycle counting and statistics facilities.
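The file names that appear in the steps for commands [1] and [2] follow a simple pattern: each phase keeps the stem of the source file and changes the extension (.c, .i, .s, .o). The sketch below illustrates that naming pattern only. derive_name is a hypothetical helper, not part of the toolset, and it assumes plain file names without directory components.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of the STxP70 toolset): replace the
 * extension of `src` with `ext` and store the result in `out`.
 * Returns 0 on success, -1 if `src` has no extension or `out` is too small.
 * Assumes `src` is a plain file name with no directory component. */
int derive_name(const char *src, const char *ext, char *out, size_t out_size)
{
    const char *dot;
    size_t stem_len;

    assert(src != NULL && ext != NULL && out != NULL);

    dot = strrchr(src, '.');
    if (dot == NULL)
        return -1;

    stem_len = (size_t)(dot - src);
    /* stem + '.' + extension + terminating NUL must fit in the buffer */
    if (stem_len + 1 + strlen(ext) + 1 > out_size)
        return -1;

    memcpy(out, src, stem_len);
    out[stem_len] = '.';
    strcpy(out + stem_len + 1, ext);
    return 0;
}
```

Under these assumptions, calling derive_name on file1.c with the extensions i, s and o gives file1.i, file1.s and file1.o, the intermediate and output files listed for command [1].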


1.2.2 Compiling code for STxP70-3 or STxP70-4


By default, the code is compiled for STxP70-3. However a dedicated command line option
can be used to compile code for STxP70-4. In the example below, lines [1] and [2] generate
a version 3 executable and line [3] generates an executable for version 4:
$[1] stxp70cc file1.c
$[2] stxp70cc -mcore=stxp70v3 file2.c
$[3] stxp70cc -mcore=stxp70v4 file3.c
Except for a few instructions, the STxP70-3 and STxP70-4 are assembly compatible. They
are not binary compatible. More details are provided in following sections.

Warning: The assembly code examples in this document use the STxP70-3 assembly
syntax. On STxP70-4, it is possible to form bundles of one or two
instructions. Two successive bundles must be separated by a “;;” pattern.
Two successive lines not separated by “;;” are considered a single bundle,
meaning that the two instructions are issued in the same cycle.


2 stxp70cc

The stxp70cc compiler is similar to any command-line compiler. It is either invoked from a
command line interpreter or from a Makefile and implicitly recognizes files by their
extension.

2.1 Invoking the compiler


The C compiler is invoked using the stxp70cc command:
stxp70cc {<argument>}

where:
<argument> = <option> | <input_file>

Examples:
stxp70cc -S file.c # produces file.s
stxp70cc -c file.c # produces file.o

Conflicting options are resolved by using the last option on the command line.

2.1.1 Input and output


File extension naming conventions are summarized in Table 1 and Table 2.

Table 1. Input names conventions


Extension  Convention

.c         C language source file to be pre-processed and compiled
.h         C language header file
.i         C language source file already pre-processed
.s         Assembly language source file to be assembled
.S         Assembly language source file to be pre-processed and assembled

Table 2. Output names conventions


Extension  Convention                     Produced by option(s)

.s         Assembly language output file  -S
.o         Object file                    -c

The final executable file does not need to have a specific file extension. If no output file
name is specified through the -o option, the executable generated is named a.out.
Examples:
stxp70cc file.c # generates the executable a.out
stxp70cc file.c -o file.u # generates the executable file.u
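The input conventions of Table 1 can be read as a lookup from file extension to the processing applied. The sketch below is a hypothetical model of that lookup, not the driver's actual code. The enumerator and function names are illustrative. The comparison is case-sensitive because Table 1 distinguishes .s from .S.

```c
#include <assert.h>
#include <string.h>

/* Processing implied by an input file extension, following Table 1. */
enum input_kind {
    INPUT_UNKNOWN,        /* extension not listed in Table 1 */
    INPUT_C_SOURCE,       /* .c: pre-process, then compile */
    INPUT_C_HEADER,       /* .h: C language header file */
    INPUT_C_PREPROCESSED, /* .i: already pre-processed C source */
    INPUT_ASM,            /* .s: assemble */
    INPUT_ASM_CPP         /* .S: pre-process, then assemble */
};

/* Hypothetical model of the extension lookup (case-sensitive). */
enum input_kind classify_input(const char *path)
{
    const char *dot;

    assert(path != NULL);

    dot = strrchr(path, '.');
    if (dot == NULL)
        return INPUT_UNKNOWN;
    if (strcmp(dot, ".c") == 0)
        return INPUT_C_SOURCE;
    if (strcmp(dot, ".h") == 0)
        return INPUT_C_HEADER;
    if (strcmp(dot, ".i") == 0)
        return INPUT_C_PREPROCESSED;
    if (strcmp(dot, ".s") == 0)
        return INPUT_ASM;
    if (strcmp(dot, ".S") == 0)
        return INPUT_ASM_CPP;
    return INPUT_UNKNOWN;
}
```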


2.2 Command-line options


This section provides information on the command line options of stxp70cc.

2.2.1 Getting help


If the compiler driver is given the -help option, it displays the list of available options, and
then terminates.
Additionally, the -help option can be followed by an additional keyword separated from the
help option by a colon. All entries matching the keyword are displayed on the standard
output, for example:
stxp70cc -help:W
This command displays all options containing the -W string. In this example, all options
related to the emission of compiler warnings are listed.

2.2.2 Overall options


The options in Table 3 control the type of processing performed by stxp70cc and the output
it generates, for example: an executable, an object file, an assembler file, a pre-processed
file, an archive or a dependency list.
Output files produced by these options default to
<original_file_name>.<output_extension> and can be renamed using the -o
option.

Table 3. Overall options


Option         Description

-c             Compile or assemble the source file, but do not link.
-S             Stop after the compilation phase.
-E             Stop after the preprocessing phase. Output is sent to stdout.
-v             Print on stderr the commands executed to run the compilation
               phases. The message generated indicates the release identity.
--version      Display the version numbers of the invoked compiler and stop.
-dumpversion   Print the compiler front-end version (for example, 3.3.3) and
               stop.
-keep          Keep intermediary files produced by the compilation phases in
               the current folder.
-keep_dir      Used in combination with -keep or -Mkeepasm, this option
               specifies the location to be used to store intermediate files.


2.2.3 stxp70cc core selection option


The STxP70 tools delivered in the STxP70 toolset R4.0.0 and higher support both
STxP70-3 and STxP70-4. The STxP70-4 differs from the STxP70-3 in three ways:
• it implements a variable length encoding of the instruction set (VLIS)
• it implements dual issue
• it supports dual arithmetic and logic unit (ALU) configuration
The -mcore option must be used to select the version of the core. By default, the code is
compiled for STxP70-3. The STxP70-3 and STxP70-4 are assembly compatible, except for
a few instructions.
In the examples below, line [1] and [2] generate a version 3 executable and line [3]
generates an executable for version 4:
$[1] stxp70cc file1.c
$[2] stxp70cc -mcore=stxp70v3 file2.c
$[3] stxp70cc -mcore=stxp70v4 file3.c

Table 4. The core selection -mcore option


Option     Description

stxp70v3   Assembly, object and binary files are generated for single issue,
           fixed length encoding STxP70-3
stxp70v4   Assembly, object and binary files are generated for single/dual
           issue, variable length encoding STxP70-4

Note: The set of options that must/can be set is strongly dependent on the core selected. This is
especially true for the configuration and code generation options presented in the tables of
the next section. Namely, the STxP70-4 can be configured for single or dual issue, as well
as single or dual ALU. Each of those choices corresponds to specific compiler options.

2.2.4 stxp70cc compiler generic options


Prefixes of generic options
The options in Table 5 provide generic means to pass fine grain options to either phase of
the compiler.

Table 5. Generic options


Option             Description

-Msxflag           stxp70cc interprets the -Msxflag option as an extra code
                   generation or environment option. The possible sxflags are
                   listed in Table 6. Note that, due to the GNU front-end, the
                   -M prefix is also used for dependency handling options.
-W<phase>,<arg>    This option is used to pass arguments to a specific phase.
                   The phase names are p, f, b, a, l for pre-processor,
                   front-end, back-end, assembler and linker respectively.
-Y<phase>,<path>   This option is used to change the path to one of the
                   phases. The phase names are p, f, b, a, l, I, S, L for
                   pre-processor, front-end, back-end, assembler, linker,
                   include, startup, libraries respectively.


Code generation/configuration and environment options with -M prefix


Table 6 lists the options that can be used with the -M flag. These options have a special
status, as they ensure backward compatibility with the sxcc compiler. Due to the differences
in compiler internals, some options have been adapted or removed.
The options that accept further controls are described in the following pages.
Several of the options are able to place certain data items into specific areas of memory
called special data areas. See Section 5.2.1: Placement and layout on page 101 for
information about the special data areas.
The options in Table 6 correspond to code generation and environment options that can be
set using the generic -M flag.

Table 6. Generic options with -M flag


Option                      Description

config[=context:<n>|        Defines the processor configuration. Further
regbank:<n>|                information on these controls can be found in Code
mult:<n>|                   generation and configuration controls on page 19.
bypass:<n>|                 The assembler performs some consistency checking
bhb:<n>|                    based on this configuration option.
efuif:<n>|                  The last suboption (vliw) is only available on
mfuif:<n>|                  STxP70-4.
extmemif:<n>|               It is possible to combine several suboptions in a
itcnodes:<n>|               single -Mconfig option. In this case, suboptions
noevc|                      must be separated by a “,”. For instance:
evcglobal:<n>|              -Mconfig=vliw:no,noevc
evclocal:<n>|
hwloop:<n>|
dcache:<n>|
dmsize:<n>|
pcache:<n>|
pmsize:<n>|
pixel:<n>|
pixelsize:<n>|
rompatch:<n>|
maxszmis:<n>|
minadmis:<n>|
vliw:<n>]
da[={<n>|all}]              Places certain data items in the data area
                            (GP-based on 32 Kbytes)
sda[={<n>|all}]             Places certain data items in the small data area
                            (GP-based on 4 K-objects)
tda[={<n>|all}]             Places certain data items in the low memory or
                            tiny data area (32-Kbyte size)
enablefractgen              Deprecated, and replaced by extoption. Allows the
                            compiler to generate fractional instructions of
                            the MPx. Refer to Chapter 8: MPx native support on
                            page 122.
extension[=fpx|MP1x]        Connects extension (MP1x, fpx), using the
[:novliw|single|dual]       specified VLIW configuration. When STxP70-3 is
                            used, only the novliw suboption can be specified.
                            MP1x has been supported through intrinsics and
                            specific types since compiler version 3.2.0.
                            Version 3.3.0 introduces so-called “native
                            support”, which provides automatic code generation
                            from pure C language. Refer to Chapter 8: MPx
                            native support on page 122.
extoption=extension:option  Pass a given option to the extension. Refer to
                            Chapter 8: MPx native support on page 122.
extrcdir=directory_path     Use the stxp70extrc user-defined extension
                            definition file from the directory found on
                            directory_path.
farcall                     Specifies that all calls and jumps are far (with
                            absolute addresses).
got[=small|standard|large]  Select the global offset table (GOT) model for
                            position independent code and data (PIC and PID)
                            generation. See Chapter 9: Relocatable loader
                            library on page 136.
hwloop[=option]             Controls use of hardware loops feature. Further
                            information can be found in Code generation and
                            configuration controls on page 19.
itstackalign=<n>            This option instructs the compiler to align the
                            stack of interruption routines to the specified
                            boundary, as a number of bytes. Default is 8 (that
                            is, 64 bits).
keepasm                     Preserve intermediate files. Files are located in
                            the local folder by default. The -keep_dir option
                            can be used in combination to specify a different
                            folder where intermediate files must be stored.
mode16                      Instructs the compiler to use the 16 register set
                            (instead of the default 32 register set). Note
                            that, contrary to the -Mconfig option, this option
                            is not a configuration option. No checking is made
                            at assembler level regarding the register indices.
lib16                       Instructs the compiler to link in a version of the
                            C library that uses the 16 register set.
lib32                       Instructs the compiler to link in a version of the
                            C library that uses the 32 register set.
nostartup                   Instructs the linker not to use the standard boot
                            sequence but one provided by the user.

The control for the options listed in Table 6 can be found in Code generation and
configuration controls and Environment controls on page 25.

Code generation and configuration controls


The code generation and configuration controls are listed below.
-Mconfig[=context:<n>|regbank:<n>|mult:<n>|efuif:<n>|mfuif:<n>|
extmemif:<n>|itcnodes:<n>|noevc|evcglobal:<n>|evclocal:<n>|
hwloop:<n>|dcache:<n>|dmsize:<n>|pcache:<n>|pmsize:<n>|pixel:<n>|
pixelsize:<n>|rompatch:<n>|maxszmis:<n>|minadmis:<n>|vliw:<n>]
Use -Mconfig to specify the configuration of the STxP70 core IP. The subflags to this
option are listed in Table 7.


Table 7. Subflags allowed in the -Mconfig option


Subflag         Description

context:<n>     Defines context number. Where n can be: 1 | 2 | 4 | 8.
regbank:<n>     Defines register bank number. Where n can be: 1 | 2.
mult:<n>        Defines multiplier implementation. Where n can be: yes | no.
                Note that using the FPX enables the multiplier as well.
bypass:<n>      Defines memory bypass configuration. Where n can be:
                no | mem2_exe. (mem2_exe indicates a bypass is implemented
                between the memory2 and execution stages of the pipeline.
                When a bypass is present, the load-use penalty is one cycle
                instead of two cycles when the bypass is not implemented.)
bhb:<n>         Defines branch history buffer configuration. Where n can be:
                yes | no.
efuif:<n>       Defines extension functional unit interface width. Where n
                can be: no | 32 | 64 | 128 | 256 | 512.
mfuif:<n>       Defines MFU interface width. Where n can be:
                no | 32 | 64 | 128 | 256 | 512.
extmemif:<n>    Defines external memory interface width. Where n can be:
                no | 32 | 64.
itcnodes:<n>    Defines ITC number of nodes. Where n can be:
                no | 8 | 16 | 32.
noevc           Defines EVC implementation.
evcglobal:<n>   Defines EVC number of global events. Where n can be:
                4 | 8 | 16 | 32.
evclocal:<n>    Defines EVC number of local events. Where n can be:
                4 | 8 | 16 | 32.
hwloop:<n>      Defines hardware loop implementation. Where n can be:
                no | bycxt | forall.
dmsize:<n>      Defines data memory size. Where n can be:
                no | 512 | 1k | 2k | 4k | 8k | 16k | 32k | 64k |
                128k | 256k | 512k | 1M | 2M | 4M.
dcache:<n>      Defines data cache implementation. Where n can be: yes | no.
pmsize:<n>      Defines program memory size. Where n can be:
                no | 512 | 1k | 2k | 4k | 8k | 16k | 32k | 64k |
                128k | 256k | 512k | 1M | 2M | 4M.
pcache:<n>      Defines program cache implementation. Where n can be:
                yes | no.
pixel:<n>       Defines the pixel mode implementation. Where n can be:
                yes | no.
pixelsize:<n>   Defines the pixel data size. Where n can be:
                8 | 10 | 12 | 14.
rompatch:<n>    Defines the ROM patch controller implementation. Where n can
                be: yes | no.
maxszmis:<n>    Defines the size of the largest memory access supporting
                misalignment. Where n can be:
                no | 2 | 4 | 8 | 16 | 32 | 64.
minadmis:<n>    Defines the minimal address alignment at which misaligned
                memory accesses are supported. Where n can be:
                no | 2 | 4 | 8 | 16 | 32.
vliw:<n>        This STxP70-4 specific option indicates the number of issues
                and ALUs available on the core. The value of n can be:
                no | singlecoreALU | dualcoreALU. These values must be
                interpreted as follows:
                – no: the core is single issue, single ALU,
                – singlecoreALU: the core is dual issue, single ALU,
                – dualcoreALU: the core is dual issue, dual ALU.
                If the vliw option is not set and code is compiled for
                STxP70-4, then the default behavior corresponds to
                -Mconfig=vliw:no.

By default, -Mconfig enables four contexts, two register banks, multiplier, no memory
bypass, no branch history buffer (BHB), 32-bit EFU interface, 32-bit MFU interface,
32-bit external memory interface, eight ITC nodes, EVC with 16 global and 16 local
events, two hardware loops for all contexts, 4 Mbytes data memory, no data cache,
4 Mbytes program memory, no pcache, no pixel support, no ROM patch support,
no misaligned memory access and single issue architecture.
-Mda[={ <n> | all }]
Place data objects of aggregate alignment <= n bytes in the region of memory called
the medium data area (DA). It is possible to generate optimized (that is, shorter)
addresses for data in the medium data area. (GP-based addugp is used instead of
make and more.)
The parameter n can be one of (1, 2, 4, 8). Specifying all eliminates the size
constraint. -Mda is equivalent to -Mda=all.
Notice that -Mda options are ignored if IPA memory placement is enabled. Refer to
Section 4.8: Interprocedural analysis optimization (IPA) on page 76 for further details.
-Msda[={ <n> | all }]
Place data objects of aggregate alignment <= n bytes in the region of memory called
the small data area (SDA). It is possible to generate optimized (that is, shorter)
addresses for data in the small data area. (GP-based addressing mode can be used,
thus constructing the address and performing the access itself in the same instruction.)
The parameter n can be one of (1, 2, 4, 8). Specifying all eliminates the size
constraint. -Msda is equivalent to -Msda=all.
In the case of a structure that contains fields of different types, the decision whether to
place the variable depends on the alignment of its largest data type, whereas the
choice of the section to be used depends on the size of its smallest field. For example,
a structure with both int and char fields is placed if the option is either -Msda=all
or -Msda=4. If placement is achieved, the structure is placed in SDA1.
Notice that -Msda options are ignored if IPA memory placement is enabled. Please
refer to Section 4.8: Interprocedural analysis optimization (IPA) on page 76 for further
details.
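The placement rule above depends on the alignment of the aggregate, which for a structure is driven by its most strictly aligned field. The sketch below shows one way to observe that alignment in portable C and to compare it with the <n> threshold. sda_candidate is a hypothetical model of the "alignment <= n" test, not the compiler's implementation, and the value 0 is used here as an illustrative stand-in for all.

```c
#include <assert.h>
#include <stddef.h>

/* A structure with both int and char fields, as in the text above. */
struct mixed {
    int  count;
    char tag;
};

/* Probe structures: the offset of `value` equals the alignment of its type. */
struct probe_mixed { char pad; struct mixed value; };
struct probe_int   { char pad; int value; };

size_t mixed_alignment(void) { return offsetof(struct probe_mixed, value); }
size_t int_alignment(void)   { return offsetof(struct probe_int, value); }

/* Hypothetical model of the threshold test: n is 1, 2, 4 or 8,
 * or 0 to stand for "all" (no constraint). */
int sda_candidate(size_t alignment, size_t n)
{
    assert(n == 0 || n == 1 || n == 2 || n == 4 || n == 8);
    return n == 0 || alignment <= n;
}
```

The alignment of struct mixed equals that of int, its largest field. Assuming a 4-byte int, the model accepts the structure for the threshold 4 or for all, and rejects it for the threshold 2, which matches the -Msda=4 and -Msda=all cases described above.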


-Mtda[={ <n> | all }]


Place data objects of aggregate alignment <= n bytes in the region of memory called
the low memory data area (TDA). It is possible to generate optimized (that is, shorter)
addresses for data in the low memory area. Addresses in the TDA area are encoded
using a maximum of 15 bits and therefore may be constructed using a single make
instruction.
The parameter n can be one of (1, 2, 4, 8). Specifying all eliminates the size
constraint. -Mtda is equivalent to -Mtda=all.
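The 15-bit address encoding corresponds to the 32-Kbyte size quoted for the TDA, since 2^15 = 32768 bytes. The sketch below checks whether an object would fit in such a range. It is an illustration only: fits_in_tda and the macro names are hypothetical, and the code assumes that the area starts at address 0 and that an object must lie entirely inside it.

```c
#include <assert.h>

#define TDA_ADDRESS_BITS 15u
#define TDA_LIMIT (1ul << TDA_ADDRESS_BITS) /* 32768 bytes = 32 Kbytes */

/* Hypothetical check: returns 1 if an object of `size` bytes located at
 * `address` lies entirely below the 15-bit limit, 0 otherwise.
 * Assumes the area starts at address 0. */
int fits_in_tda(unsigned long address, unsigned long size)
{
    assert(size > 0);
    return address < TDA_LIMIT && size <= TDA_LIMIT - address;
}
```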
-Mdarange=[minSize],maxSize
Use data area (DA) addressing mode on selected variables with a size between
minSize and maxSize bytes.
-Msdarange=[minSize],maxSize
Use small data area (SDA) addressing mode on selected variables with a size between
minSize and maxSize bytes.
-Mtdarange=[minSize],maxSize
Use tiny data area (TDA) addressing mode on selected variables with a size between
minSize and maxSize bytes.
-Menablefractgen
Enables generation of the fractional instructions when MP1x is present. This option was
formerly named -Mfractsupport. These two options are now deprecated, and
replaced by the suboption -Mextoption. Refer to Chapter 8: MPx native support on
page 122 for further details on this option.
-Mextension[=fpx|MP1x][:novliw|single|dual]
Only the X3 extension is connected by default. (This means that the corresponding
option x3 is no longer available.)
Connect extension fpx to the core to enable floating point arithmetic. Activating fpx
allows the compiler to generate floating point extension specific instructions, which
include native floating point (32-bit) arithmetic instructions and some integer
instructions (such as multiply, divide) that complete core integer support.
MP1x has been supported in the compiler since version 3.2.0 using built-in functions
and specific data types. Version 3.3.0 introduces the so-called “native support” of the
MPx extension. This means that the compiler can generate code that makes use of
MPx registers and instructions from pure C code (that is, even if no MPx built-in
functions and types are present). More details can be found in Chapter 8: MPx native
support on page 122.
The vliw configuration can be specified for the extension. On extension for STxP70-3,
only the novliw configuration can be used.
-Mextoption
Used to pass different options to the extensions. Refer to Chapter 8: MPx native
support on page 122 for further details on this option.


-Mextrcdir=directory_path
Specifies where to find a particular extension package, which may be a location outside
the user workspace. The -Mextrcdir option enables the user to switch between
different extensions, stored in different locations. Full directory paths are recommended
but are not mandatory.
The directory path specified to -Mextrcdir must include the sub-directory _STxP70-
Extension_ where the stxp70extrc file is located. (This is the directory/file
structure used by sximport when the extension is imported. sximport
creates/updates an extension configuration file called stxp70extrc and puts it in the
subdirectory _STxP70-Extension_. stxp70extrc indicates where different files
relating to the extension are located, for example header files, libraries).
For example:
stxp70cc -Mextension=MP1x -Mextrcdir=My_Extrcdir/_STxP70-Extension_
This command sets the directory path to find the extension package in
My_Extrcdir/_STxP70-Extension_.
The compiler checks that the location specified by -Mextrcdir contains the file
stxp70extrc.
If the -Mextrcdir option is not specified, the {SX}/sxext/_STxP70-Extension_
directory is used by default.
The STxP70 Utilities manual (8210925) documents several utilities that interact with
the extension package, for example sximport, stxp70-elfdump, stxp70-
objcopy.
The STxP70 User-defined extension methodology guide (8175272), “How to integrate
an Extension in an application” chapter, gives further information about extension
libraries.
-Mfarcall
Specify that all calls are far. The compiler generates a calling sequence composed of a
make/more/calla sequence instead of callr.
-Mhwloop[=option]
Controls hardware loop code generation. The default, (-Mhwloop specified with no
suboptions), is equivalent to:
• -Mhwloop=all if core configuration includes hardware loops
• -Mhwloop=jrgtudeconly if core configuration does not include hardware loops
option can be any of the values listed in Table 8.


Table 8. List of options for -Mhwloop


Option         Description

none           Disables hardware loop and special jump code generation. By
               hardware loop, we mean setle/ls/lc structures; by special
               jump, we mean jrgtudec special jumps. However, hardware loops
               forced by means of pragmas are still generated if supported by
               the core configuration.
jrgtudeconly   Disables only setle/ls/lc hardware loop code generation.
               However, hardware loops forced by means of pragmas are still
               generated if supported by the core configuration.
hwlooponly     Disables only jrgtudec special jumps loop code generation. A
               warning is generated if the core configuration does not have
               hardware loops.
all            Enables hardware loops for all loops wherever possible. A
               warning is generated if the core configuration does not have
               hardware loops.

Hardware loops are discarded at -O0 and -O1.
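The default selection and the warning conditions of Table 8 can be summarized as a small decision model. The sketch below is a hypothetical restatement of the rules in this section, not compiler source code. The type and function names are illustrative, and core_has_hwloops stands for whatever the core configuration declares.

```c
/* Values accepted by -Mhwloop, following Table 8. */
enum hwloop_mode {
    HWLOOP_NONE,
    HWLOOP_JRGTUDECONLY,
    HWLOOP_HWLOOPONLY,
    HWLOOP_ALL
};

/* Mode selected when -Mhwloop is given with no suboption. */
enum hwloop_mode default_hwloop_mode(int core_has_hwloops)
{
    return core_has_hwloops ? HWLOOP_ALL : HWLOOP_JRGTUDECONLY;
}

/* Per Table 8, hwlooponly and all produce a warning when the core
 * configuration does not have hardware loops. Returns 1 in that case. */
int hwloop_mode_warns(enum hwloop_mode mode, int core_has_hwloops)
{
    return !core_has_hwloops
        && (mode == HWLOOP_HWLOOPONLY || mode == HWLOOP_ALL);
}
```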


-Mitstackalign=<n>
By default, the stack of interruption routines (IT) is aligned to an 8-byte/64-bit boundary.
As a consequence, extra instructions are added to IT prolog and epilog to handle this
realignment. Since IT are often speed-critical parts of code, this may be a severe
drawback.
This option instructs the compiler to align the stack of IT to a smaller boundary
(typically: 4 bytes/32 bits) to avoid the overhead in prolog and epilog of those routines.
Several methods are provided for controlling the alignment of the stack. For interruption
routines, the precedence is as follows, in decreasing order:
– aligned_stack attribute, which specifies the alignment of the stack of a given
interruption routine
– interrupt_nostackalign attribute, which indicates that the stack of a given
interruption routine is to be aligned on a 4-byte/32-bit boundary
– -Mitstackalign option
– default (8 bytes/64 bits)
For any other function (not an interruption routine), the precedence is as follows, in
decreasing order:
– aligned_stack attribute
– default (8 bytes/64 bits)
You may want to refer to Section 5.2: Attributes on page 101 for further details on the
attributes which control the alignment of the stack of functions and interruption routines.
-Mmode16
The STxP70 compiler generates code for a context with 32 registers. Selecting the
-Mmode16 option switches to a context with 16 registers. Note that the impact of this
option is slightly different from that of -Mconfig=regbank:1. Namely, no assumption
is made on the core configuration regarding register banks, and no checking is
performed at assembly level to ensure that only the lower bank is used.


-Mnoextgen[=ext1,ext2,...]
Disables the code generation for the specified extensions. This option only has an
effect when MPx extensions are used. It has no effect with fpx.

Environment controls
The environment controls are listed below.

-Mlib16 Instructs the compiler to link with a version of the C library that uses
16 registers of the core. This is the default behavior when using
16 registers contexts.
-Mlib32 Instructs the compiler to link with a version of the C library that uses
32 register set of the core.
-Mnostartup Instructs the linker not to use standard boot.o file at link time. It is
then the user’s responsibility to provide a boot object file at link time.

2.2.5 C preprocessor options


The preprocessor is run on each C source file before actual compilation. The options in
Table 9 control how the sources are preprocessed.

Table 9. Preprocessor options


Option           Description

-E               Only the preprocessor is run.
-C               The preprocessor does not discard comments.
-CC              The preprocessor copies comments inside macros to the output
                 file when the macro is expanded. This is intended for use by
                 applications which place metadata or directives inside
                 comments. Use with the -E option.
-P               The preprocessor discards #line information. Use with the -E
                 option.
-Ddef            Define the macro def with the string 1 as its definition.
-Ddef=defn       Define the macro def as defn.
-M               Generates a list of object file dependencies suitable for a
                 makefile.
-MM              Similar to -M, but ignores system header files, that is,
                 header files included by <header.h>.
-MG              Along with -M or -MM, treat missing files as generated in the
                 local directory.
-H               Display the name and path of the header in use.
-dM              Print a list of macro definitions in use after preprocessing.
                 Use with the -E option.
-dD              Print a list of macro definitions in use while preprocessing.
                 Use with the -E option.
-dN              Same as -dD, except that the macro arguments are not shown.
                 Use with the -E option.
-fpreprocessed   Indicate to the preprocessor that the input file has already
                 been preprocessed.
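The two forms of the -D option are typically used to switch code paths or to override a default value at compile time. The sketch below shows that pattern from the source side. The macro names TRACE and BUFFER_SIZE and the two functions are illustrative assumptions. A hypothetical build might pass -DTRACE -DBUFFER_SIZE=128 on the stxp70cc command line.

```c
#include <assert.h>

/* BUFFER_SIZE can be overridden on the command line with
 * -DBUFFER_SIZE=<value>; otherwise this default applies. */
#ifndef BUFFER_SIZE
#define BUFFER_SIZE 64
#endif

/* -DTRACE defines TRACE with the string 1 as its definition. */
#ifdef TRACE
#define TRACE_ENABLED TRACE
#else
#define TRACE_ENABLED 0
#endif

/* Returns the buffer size selected at compile time. */
int buffer_size(void)
{
    assert(BUFFER_SIZE > 0);
    return BUFFER_SIZE;
}

/* Returns non-zero if tracing was enabled at compile time. */
int trace_enabled(void)
{
    return TRACE_ENABLED;
}
```

Compiled without any -D option, buffer_size() returns the default of 64 and trace_enabled() returns 0.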


2.2.6 C dialect options


The option -std=value instructs the compiler front-end to select the C language
dialect to use. For instance, the C99 restrict keyword is only recognized with
the -std=c99 option. However, this keyword also exists as the GNU extension keywords
__restrict and __restrict__, which are recognized by default. Possible values for
-std are listed in Table 10.

Table 10. C dialect options


Option Description

-std=iso9899:1990 Same as -ansi


-std=iso9899:199409 ISO C as modified in amendment 1
-std=iso9899:1999 ISO C 99
-std=c89 Same as -std=iso9899:1990
-std=c99 Same as -std=iso9899:1999
-std=gnu89 This is the default, iso9899:1990 + gnu extensions
-std=gnu99 iso9899:1999 + gnu extensions
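Because the GNU spellings of restrict do not depend on the selected dialect, they are a convenient way to write code that builds under the default -std=gnu89 as well as under -std=c99. The sketch below uses __restrict__ for that purpose. The function add_arrays is an illustrative example, not taken from the toolset.

```c
#include <assert.h>
#include <stddef.h>

/* Adds src[i] to dst[i] for i in [0, n).
 * __restrict__ is the GNU spelling of the C99 restrict qualifier. It tells
 * the compiler that dst and src do not overlap, so the caller must
 * guarantee that they refer to distinct arrays. */
void add_arrays(int *__restrict__ dst, const int *__restrict__ src, size_t n)
{
    size_t i;

    assert(dst != NULL && src != NULL);

    for (i = 0; i < n; i++)
        dst[i] += src[i];
}
```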

2.2.7 Warning options


Diagnostic messages can be requested from the compiler to notify potentially erroneous or
dangerous C program constructions. Table 11 lists a subset of the GCC options.

Table 11. General warning options


Option Description

-Wall Enables all warnings.


-w Disables all warnings.
-Werror Turns warnings into errors.
-pedantic Issues all warnings needed for strict ANSI C compliance.
-pedantic-error Turn all pedantic warnings into errors.

Table 12 gives the positive form of each option. The negative form of each
option can be constructed by replacing the -W prefix with a -Wno- prefix, for example
-Wno-format disables the printing of warning messages associated with calls to the
printf and scanf family of library functions.
Note: The online help and “man” page of the stxp70cc driver list the full set of possible
warning options.


Table 12. Detailed warning options


Option Description

-mwarn-packstruct this option enables the emission of


warnings/errors when option -fpack-struct is set (see
Table 15 on page 31). The warnings emitted are the most
conservative ones, and based on the evaluation of a risk that
-m[no-]warn-packstruct a misalignment occurs.
-mno-warn-packstruct this option disables the emission
of warnings/errors when option -fpack-struct is set. This
is the default behavior.
-mwarn-smart-packstruct this option enables only the
emission of smarter warnings/errors when option
-fpack-struct is set (see Table 15 on page 31). The
warnings are more accurate ones: some of them are filtered if
-m[no-]warn-smart- the compiler can assess that a misalignment cannot occur,
packstruct due to the layout of the structure.
-mno-warn-smart-packstruct this option disables the
emission of smarter warnings/errors when option
-fpack-struct is set. This is the default behavior.
Warn if any functions that return structures or unions are
-Waggregate-return
defined or called.
-Wbad-function-cast Warn whenever a function call is cast to a non-matching type.
Warn whenever a pointer is cast such that the required
-Wcast-align
alignment of the target is increased.
Warn whenever a pointer is cast so as to remove a type
-Wcast-qual
qualifier from the target type.
-Wchar-subscripts Warn if an array subscript has type char.
-Wcomment Warn if nested comments are detected.
Warn if a prototype causes a type conversion that is different
-Wconversion
from what would happen in the absence of a prototype.
-Werror-implicit-function-
Output error when a function is used but not declared.
declaration
Check calls to the printf and scanf family of library
-Wformat
functions.
-Wimplicit-int and -Wimplicit-function-
-Wimplicit
declaration.
-Wimplicit-function-
Warn when a function is used but not declared.
declaration
Check that all declarations specify a type, which is int by
-Wimplicit-int
default in C89.
-Wlarger-than-number Warn if an object is larger than number bytes.
Warn if long long type is used. Only active along with -
-Wlong-long
pedantic.
-Wmissing-braces Warn if an aggregate or union initializer is not fully bracketed.
Warn if a global function is defined without a previous
-Wmissing-declarations
declaration.

8027948 Rev 15 27/166


stxp70cc UM1237

Table 12. Detailed warning options (continued)


Option Description

-Wmissing-noreturn Warn about functions which might be candidates for attribute noreturn.
-Wmissing-prototypes Warn if a global function is defined without a previous prototype declaration.
-Wmultichar Warn if a multi-character constant is used.
-Wnested-externs Warn if an extern declaration is encountered within a function.
-Wpacked Warn if a structure is given the packed attribute, but the packed attribute has no
effect on the layout or size of the structure.
-Wpadded Warn if padding is included in a structure, either to align an element of the
structure or to align the whole structure.
-Wparentheses Warn if parentheses are omitted in certain contexts.
-Wpointer-arith Warn about anything that depends on the “size of” a function type or of void.
-Wredundant-decls Warn if anything is declared more than once in the same scope.
-Wreturn-type Warn when a function is defined with a return-type that defaults to int.
-Wshadow Warn whenever a local variable shadows another variable.
-Wsign-compare Warn when a comparison between signed and unsigned values could produce an
incorrect result.
-Wstrict-prototypes Warn if a function is declared or defined without specifying the argument types.
-Wswitch Warn whenever a switch statement may be incomplete.
-Wtrigraph Warn if any trigraphs are encountered that might change the meaning of the program.
-W[no-]uninitialized The -Wuninitialized option warns if an uninitialized automatic variable is
detected. Optimization must be enabled (see Section 2.2.12 on page 29) in order for
-Wuninitialized or -Wall to report uninitialized variables. See also the entries for the -trapuv
and -zerouv options in Section 2.2.13 on page 31.
The -Wno-uninitialized option instructs the compiler not to warn about uninitialized variables.
-Wunknown-pragmas Warn when a #pragma is encountered which is not understood by stxp70cc.
-Wunused Warn whenever a static function, a label, a parameter or a value is not used.
-Wwrite-strings Warn when trying to write to a string constant.
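
The -Wsign-compare entry can be illustrated with a short sketch. The function names below are hypothetical and serve only as an example. The cast in the first function spells out the conversion that C applies implicitly when an int is compared with an unsigned int, which is the kind of comparison the warning is meant to flag.

```c
/* Hypothetical helper: compares a signed index with an unsigned limit the
 * naive way. The cast makes explicit what "index < limit" does implicitly:
 * the int operand is converted to unsigned int before the comparison, so a
 * negative index is treated as a very large value. */
int below_limit_naive(int index, unsigned int limit)
{
    return (unsigned int)index < limit;
}

/* Hypothetical helper: the negative case is handled first, so the remaining
 * comparison only involves non-negative values. */
int below_limit_checked(int index, unsigned int limit)
{
    if (index < 0)
        return 1;   /* a negative index is below any unsigned limit */
    return (unsigned int)index < limit;
}
```

With an index of -1 and a limit of 10, the naive version reports that the index is not below the limit, whereas the checked version reports that it is.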

2.2.8 Debugging options


The -g option instructs stxp70cc to generate symbolic information for debugging. DWARF2
format is used.
Note: The -g option may be used with optimization up to level -O2 and with -Os (see
Section 2.2.12: Optimization options).
Minimal debug information (that is, call frames) is generated whatever options are
selected.

2.2.9 Profiling options


The STxP70 compiler (version 3.4.0 and higher) supports profiling options. The dedicated
-pg option instructs the compiler to generate gprof profiling information. See Section 4.5:
Profiling on page 70 for more information on this topic.

2.2.10 Code coverage options


The stxp70cc compiler (version 3.4.0 and higher) supports code coverage options. Two
options are provided.
• The -ftest-coverage option instructs the compiler to generate a code coverage file
for the GNU gcov code coverage utility.
• The -fprofile-arcs option instructs the compiler to generate information that
allows gcov to reconstruct the program flow graph.
See Section 4.6: Code coverage on page 72 for further details on this topic.

2.2.11 Call trace instrumentation options


The options -finstrument-functions and -minstrument-calls instruct stxp70cc
to generate instrumentation calls. See Section 4.7: Call trace on page 74 for further details
on call trace instrumentation.

2.2.12 Optimization options


The options in Table 13 control optimization levels.

Table 13. Optimize options


Option Description

-O0 No optimization.
-O1 Minimal optimization.
-Os Optimize for code size.
-O2 Global optimization, speed orientated.
-O3 Aggressive optimization, speed orientated.
-O4 Aggressive optimization, speed orientated. Enables aggressive loop unrolling when compiling
code for STxP70-4 in dual issue/dual ALU core configuration.

Note: 1 -O optimization is equivalent to -O2.
2 -Os optimization applies the optimizations of -O2, except for those that increase the code
size (such as unrolling).
The options in Table 14 enable finer control of the optimization level.

Table 14. Advanced options


Option Description

--deadcode This option forces the binary optimizations (binopt) performed after the link stage.
For instance, enabling this option removes non-static functions that are never called in the
executable binary file. This is the default behavior when the highest optimization levels are
set (-Os and -O4).
--no-deadcode This option disables the binary optimizations (binopt) performed after the link stage.
-f[no-]strict-aliasing The -fstrict-aliasing option enables the compiler to assume the strictest
aliasing rules applicable to the language being compiled. For C and C++ this activates
optimizations based on the type of expressions. In particular, an object of one type is assumed
never to reside at the same address as an object of a different type, unless the types are
almost the same (the aliasing rules are stated in the ANSI C standard, in clause 6.5 (7)
Expressions). For example, an unsigned int can alias an int, but not a void * or a double. The
type char and types with the may_alias attribute can alias any other type.
The default is -fstrict-aliasing. If this causes problems in legacy code, use
-fno-strict-aliasing to disable it.
-f[no-]unroll-loops The -funroll-loops option forces loop unrolling. This is the default at -O2,
-O3 and -O4.
The -fno-unroll-loops option disables loop unrolling. This is the default at -Os.
Loops with a #pragma unroll directive are not affected by these two options.
See Section 4.2: Loop unrolling on page 63 for details of the unrolling policy.
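
As an illustration of the aliasing rules summarized for -fstrict-aliasing, the following sketch reads the bits of a float without accessing it through an incompatible pointer type. The function names are hypothetical, and the code assumes a 32-bit float in IEEE 754 single-precision format.

```c
#include <stdint.h>
#include <string.h>

/* Returns the object representation of a float as a 32-bit integer.
 * memcpy is used instead of *(uint32_t *)&value: the pointer cast would
 * read a float object through an incompatible type, which the aliasing
 * rules allow the compiler to assume never happens. */
uint32_t float_bits(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Character types may alias any object, so a byte-wise read is valid. */
unsigned char first_byte(const float *value)
{
    return *(const unsigned char *)value;
}
```

Code written in this style is expected to behave identically with -fstrict-aliasing and -fno-strict-aliasing, since it does not rely on accesses that the aliasing rules exclude.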

2.2.13 Code generation options


The options in Table 15 control various aspects of the code generation.

Table 15. Code generation options


Option Description

-gnu3 The GCC front-end version 3.3.3 is used.
-gnu4 The GCC front-end version 4.2.0 is used. This is the default for toolset 2011.1 and higher.
-macf-decl <acf_filename.acf> Reads acf_filename.acf as an ACF, using the default configuration
declared in the file as the active configuration. See Section 4.10: Application configuration
files on page 82 for details.
-macf-active "string" Use in conjunction with -macf-decl <acf_filename.acf>. Enables a
configuration named string to be specified as the active configuration. string must be defined
in <acf_filename.acf>. See Section 4.10: Application configuration files on page 82 for details.
-macf-template {source_filename1 ...} Generates the ACF template for the application implemented
by the source files specified. The source files must be linkable, and the compilation must
include a link stage to ensure that the template is complete. See Section 4.10: Application
configuration files on page 82 for details.
-fb <name> Not yet supported.
-fb_create <name> Not yet supported.
-fsigned-char, -funsigned-char The -fsigned-char option implements type char as signed.
The -funsigned-char option implements char as unsigned.
Note that when the -funsigned-char option is used, the __CHAR_UNSIGNED__ preprocessor symbol is
defined.
The compiler default is signed.
-fsigned-bitfields, -funsigned-bitfields, -fno-signed-bitfields, -fno-unsigned-bitfields These
options control whether a bitfield is signed or unsigned, when the declaration does not use
either ‘signed’ or ‘unsigned’.
The compiler default is signed.
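
Code that must behave the same under -fsigned-char and -funsigned-char can avoid depending on the signedness of plain char. The sketch below is only an illustration. The function names are hypothetical, and an 8-bit char is assumed.

```c
#include <limits.h>

/* Returns 1 if plain char is signed in this build, 0 otherwise.
 * CHAR_MIN is 0 when char is unsigned and negative when it is signed. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Converts a byte held in a plain char to a value in the range 0..255,
 * whichever signedness is in effect. */
int byte_value(char c)
{
    return (int)(unsigned char)c;
}

/* Reports whether the __CHAR_UNSIGNED__ preprocessor symbol is defined. */
int char_unsigned_macro_defined(void)
{
#ifdef __CHAR_UNSIGNED__
    return 1;
#else
    return 0;
#endif
}
```

Going through unsigned char, as in byte_value(), gives the same result with either option, whereas a direct conversion from char to int would give a negative value for bytes above 0x7f when char is signed.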

Table 15. Code generation options (continued)


Option Description

-ffixed-reg=<register-list> <register-list> is a list of one or several comma-separated register
names or dash-separated register ranges, either general purpose registers or boolean registers.
The syntax used for registers is:
– rn for core GPR, where n can be 0 to 31,
– gn for core guard registers, where n can be 0 to 7,
– fn for fpx extension registers, where n can be 0 to 15.
This option makes the given registers fixed registers; that is, the code generated by the
compiler never uses them.
There are, however, some registers that are used by the compiler for ABI register conventions.
See the table of general registers in the STxP70 ABI manual. The registers with a specified use
must not be reserved with this option.
Note that specific care must be taken when using this option since low-level library and
run-time support code are not specifically built to support non-ABI register usage. For
instance, reserving the r5 register does not prevent already compiled library code from using
it. Using this option generally requires rebuilding a set of libraries either with the same
option (for C/C++ code) or to take into account that this option has been used.
Examples:
stxp70cc -ffixed-reg=r6,g0
stxp70cc -ffixed-reg=f12-f15
-mdisabled-reg=<register-list> This option is similar to -ffixed-reg described above, except that
the corresponding registers cannot be used by the register GNU extension or asm statement
clobber list. The syntax of the <register-list> is the same as for option -ffixed-reg above.
Note that the -MMode16 configuration option is based on this option.

Table 15. Code generation options (continued)


Option Description

-fshort-double By default, the compiler assumes double precision floating point. This means that
floating point constants with implicit type declaration are promoted to double precision. This
promotion is propagated in the expression where the constant is used. For example, the
expression used to compute C is performed in double precision because of the implicit constant
type:
float A;
float B;
float C=A*B*3.45;
If the constant is explicitly declared as single precision, the expression remains in single
precision:
float A;
float B;
float C=A*B*3.45F;
The option -fshort-double instructs the compiler to assume single precision instead.
When the FPx floating point extension is used, this option is required to ensure efficient code
generation. A warning is emitted if FPx is used without this option.
More details can be found in Section 4.9: Floating-point code generation on page 79.
-mlib-short-double This option instructs the compiler to use single precision floating point
libraries.
This option is forced as soon as the -fshort-double option is set. On the STxP70, this option is
deprecated, since it is forced to fit the default code generation setting. It is preserved
mainly for legacy reasons.
-mlib-nofloat Instructs the compiler to use the C-library without floating-point support. Leads
to a much smaller C-library (nearly half the size of the default library).
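
The effect of the constant suffix described for -fshort-double can be observed with sizeof, which reports the size of the type in which an expression is evaluated. This is a minimal sketch with hypothetical function names. It relies only on the standard C conversion rules.

```c
#include <stddef.h>

/* Size in bytes of the type of a * b * 3.45.
 * The unsuffixed constant has type double, so the float operands are
 * converted and the whole product has type double. */
size_t product_size_unsuffixed(float a, float b)
{
    return sizeof(a * b * 3.45);
}

/* Size in bytes of the type of a * b * 3.45F.
 * With the F suffix every operand is a float, and so is the product. */
size_t product_size_suffixed(float a, float b)
{
    return sizeof(a * b * 3.45F);
}
```

In a build where double is wider than float, the two functions return different values. If double is given the same size as float, which is what -fshort-double is described as doing, they return the same value.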

Table 15. Code generation options (continued)


Option Description

-fpack-struct Instructs the compiler to pack structures. The goal of this option is to reduce
the memory footprint of the data sections of the objects and binary files. Note that this may
induce a need for misaligned accesses, which usually increases the size of the code in the text
section. Gains in size will be more significant if large arrays of structures are used.
This option should be used by advanced users only. It may conflict with the assumptions or
semantics of the source code. For instance:
– if the source code performs some verifications based on the size of a structure, then enabling
this option may cause the check to fail
– in some cases, some alignment constraints may no longer hold when the option is set
Some warnings and errors are emitted to prevent the compiler from silently performing
non-conservative code generation. See the options -m[no-]warn-packstruct and
-m[no-]warn-smart-packstruct in Table 12 on page 27 for controlling warnings.
If you encounter a problem with this option, it is advised to disable it, and check if the issue
is still present.
-fshort-enums Instructs the compiler to use the shortest integer type required to represent the
values of an enumeration. The goal of this option is to reduce the memory footprint of the data
sections of the objects and binary files. This option is more likely to have a real impact if it
is used in combination with -fpack-struct.
This option should be used by advanced users only. It may conflict with the assumptions or
semantics of the source code. For instance:
– if the source code performs some verifications based on the size of a structure, then enabling
this option may cause the check to fail
– in some cases, some alignment constraints may no longer hold when the option is set
Some warnings and errors are emitted to prevent the compiler from silently performing
non-conservative code generation. If you encounter a problem with this option, it is advised to
disable it, and check if the issue is still present.
-fverbose-asm, -fno-verbose-asm The -fno-verbose-asm option removes extra commentary information
in the generated assembly code.
The default is to have verbose asm output.
-falign-functions, -falign-functions=n Align the start of functions to the next power of two
greater than n (if n is specified), skipping up to n bytes.
-falign-loops, -falign-loops=n Align the first address of loops to the next power of two greater
than n (if n is specified), skipping up to n bytes.
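
The kind of layout assumption that -fpack-struct can invalidate may be made explicit in the source. The sketch below uses the GNU packed attribute on a single type to show the difference between a natural and a packed layout. The type and function names are hypothetical, and the comparison with -fpack-struct is an assumption based on the description of the option.

```c
#include <stddef.h>

/* Natural layout: the compiler may insert padding after 'tag' so that
 * 'value' is aligned. */
struct record {
    char tag;
    int  value;
};

/* Same members with the GNU packed attribute, which removes the padding
 * for this one type. */
struct packed_record {
    char tag;
    int  value;
} __attribute__((packed));

/* Number of padding bytes in the natural layout. */
size_t record_padding(void)
{
    return sizeof(struct record) - (sizeof(char) + sizeof(int));
}

/* Compile-time check of a layout assumption: a negative array size is not
 * allowed, so this typedef only compiles when 'value' starts right after
 * 'tag' in the packed type. */
typedef char packed_record_layout_check[
    (offsetof(struct packed_record, value) == sizeof(char)) ? 1 : -1];
```

A check of this form turns a silent layout change into a compilation error, which is easier to diagnose than a failing run-time verification.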

Table 15. Code generation options (continued)


Option Description

-falign-jumps, -falign-jumps=n Align the target address of jumps to the next power of two greater
than n (if n is specified), skipping up to n bytes.
-falign-labels, -falign-labels=n Align the labels to the next power of two greater than n (if n
is specified), skipping up to n bytes.
-falign-instructions, -falign-instructions=n Align the instructions to the next power of two
greater than n (if n is specified), skipping up to n bytes.
-ffast-math Defines the preprocessor macro __FAST_MATH__ and invokes -fno-math-errno.
-f[no-]math-errno The -fmath-errno option causes the compiler to generate code to set the
mathematical error flag in floating point code. The compiler also makes use of the slower libm
from Newlib, with errno setting. This is the default behavior when the FPx floating point
extension is not used.
The -fno-math-errno option causes the compiler not to generate code to set the mathematical
error flag in floating point code. The compiler also makes use of fast libm overrides, for
example sqrtf from the FLIP library with no errno setting. This is the default behavior when the
FPx floating point extension is used.
-mreassoc=0 No re-associations, folding or simplifications. This is the default.
-mreassoc=1 Accurate simplifications that are correct for finite arithmetic are allowed, for
instance, a/a -> 1.0, recip(recip(a)) -> a.
These simplifications are not valid in every case. For example, the transformation a/a -> 1.0
is not valid when a is 0.0 because in this case 0.0/0.0 -> NaN.
-mreassoc=2 Aggressive re-association of expressions is performed to favor the selection of fused
multiply-add routines. Such changes in the evaluation order can lead to slightly different
results, compared to the original evaluation order.
-fpic Generate position independent code (data accesses only). See Chapter 9: Relocatable loader
library on page 136.
--rlib Build a relocatable library that can be loaded by RL_LIB. See Chapter 9: Relocatable
loader library on page 136.
--rmain Build a main program suitable for loading relocatable libraries. See Chapter 9:
Relocatable loader library on page 136.
-maggressive_unroll=n Modify the aggressiveness of the default unrolling policy. n is a value in
the range [0, 6]. The higher it is, the more aggressive the unrolling. Refer to Section 4.2:
Loop unrolling on page 63 for details about this option and the values of n.
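
The two numerical effects mentioned for -mreassoc can be reproduced in plain C. The sketch below is an illustration only. The function names are hypothetical, and IEEE 754 double-precision arithmetic is assumed.

```c
/* Returns 1 if x is a NaN. A NaN is the only value that compares unequal
 * to itself, so no library call is needed. */
int is_nan(double x)
{
    return x != x;
}

/* a/a as written in the source. Rewriting it to 1.0 changes the result
 * when a is 0.0. */
double self_ratio(double a)
{
    return a / a;
}

/* The two evaluation orders of a three-term sum. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

double sum_right(double a, double b, double c)
{
    return a + (b + c);
}
```

With the operands 1e20, -1e20 and 1.0, sum_left() yields 1.0 while sum_right() yields 0.0, because 1.0 is lost when it is added to -1e20 first. This is the kind of difference that a change in evaluation order can introduce.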

Table 15. Code generation options (continued)


Option Description

-trapuv Initialize uninitialized local variables to pre-defined values. -trapuv helps to find
issues that are due to uninitialized variables. This option has a slight performance impact. It
affects local scalar and array variables and memory returned by alloca. It does not affect the
behavior of globals or memory allocated with malloc.
Integer variables are initialized to 0xdeaddead.
Float variables are initialized to 0xfffa5a5a (a floating-point NaN).
Pointer variables are initialized to 0x0.
A sub-type is given a sub part of the pattern of its original type:
char is initialized to 0xad.
short is initialized to 0xdead.
long long is initialized to 0xdeaddeaddeaddeadLL.
double is initialized to 0xfffa5ffffa5a5a5a (NaN).
Default values of patterns can be controlled as follows:
-DEBUG:trapuv_int_value=0xffffffff to change integer pattern to 0xffffffff.
-DEBUG:trapuv_float_value=0xeeeeeeee to change float pattern to 0xeeeeeeee.
-DEBUG:trapuv_pointer_value=0xdddddddd to change pointer pattern to 0xdddddddd.
Note: Using -trapuv removes the possibility of using -Wuninitialized, see Section 2.2.7 on
page 26.
-zerouv Sets uninitialized variables to zero at runtime. This option has a slight performance
impact. It affects local scalar and array variables and memory returned by alloca. It does not
affect the behavior of globals or memory allocated with malloc.
Note: Using -zerouv removes the possibility of using -Wuninitialized, see Section 2.2.7 on
page 26.
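
The default -trapuv patterns listed above can be related to one another with a few lines of C, which may help when recognizing them in a debugger. This is a sketch only. The macro and function names are hypothetical, the derivation of the narrower patterns from the low-order part of the int pattern is an interpretation of the listed values, and a 32-bit IEEE 754 float is assumed.

```c
#include <stdint.h>
#include <string.h>

/* Default 32-bit integer pattern listed for -trapuv. */
#define TRAPUV_INT_PATTERN   UINT32_C(0xdeaddead)
/* Default 32-bit float pattern listed for -trapuv. */
#define TRAPUV_FLOAT_PATTERN UINT32_C(0xfffa5a5a)

/* Narrower integer types: low-order part of the int pattern. */
uint8_t trapuv_char_pattern(void)
{
    return (uint8_t)TRAPUV_INT_PATTERN;
}

uint16_t trapuv_short_pattern(void)
{
    return (uint16_t)TRAPUV_INT_PATTERN;
}

/* long long: the int pattern repeated in both halves. */
uint64_t trapuv_long_long_pattern(void)
{
    return ((uint64_t)TRAPUV_INT_PATTERN << 32) | TRAPUV_INT_PATTERN;
}

/* Returns 1 if the 32-bit pattern, viewed as a single-precision value,
 * is a NaN (a NaN compares unequal to itself). */
int pattern_is_float_nan(uint32_t pattern)
{
    float f;
    memcpy(&f, &pattern, sizeof f);
    return f != f;
}
```

Because a NaN propagates through most arithmetic operations, a result computed from a trapped float variable generally remains recognizable as a NaN.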

Table 15. Code generation options (continued)


Option Description

-m[no-]parse-asmstmts -mparse-asmstmts causes the compiler to parse and optimize user defined GNU
assembly statements. When set, the compiler analyzes the content of GNU assembly statement, and
optimizes it if possible.
-mno-parse-asmstmts causes the compiler not to parse and optimize user defined GNU assembly
statements. The compiler leaves the instructions of the GNU assembly statement unchanged, except
regarding register allocation. This is the default behavior.
See Section 6.8: Parsing and optimization of GNU assembly statement on page 114 for details.
-m[no-]parse-meta-asmstmts -mparse-meta-asmstmts is similar to -mparse-asmstmts, but applies only
to the GNU assembly statements used internally by the compiler to automatically map the
instructions of the extensions. This is the default behavior.
-mno-parse-meta-asmstmts is similar to -mno-parse-asmstmts, but applies only to the GNU assembly
statements used internally by the compiler to automatically map the instructions of the
extensions.
See Section 6.8: Parsing and optimization of GNU assembly statement on page 114 for details.

2.2.14 -OPT options


The options -OPT:unroll_size, -OPT:cray_ivdep and -OPT:liberal_ivdep
modify the behavior of pragmas and are documented in Section 3.2.1: #pragma unroll (n) on
page 45 and Section 3.2.2: #pragma ivdep on page 46.
The -OPT:alias option is documented in Section 4.3: Memory dependences in C
programs on page 65.

2.2.15 Inlining options


The -inline, -noinline and -INLINE options are provided to control inlining of
functions. They are listed in Table 21 on page 56 and Table 22 on page 57 and described in
Section 4.1.1: Single file inlining on page 55.
Only functions marked with the inline keyword are subject to inlining unless specified
otherwise.

2.2.16 Interprocedural analysis


The -ipa option enables interprocedural analysis, and is described in Section 4.8:
Interprocedural analysis optimization (IPA) on page 76. That section also documents a range of
advanced -IPA options that provide control over the optimizations performed.

2.2.17 Position independent code generation (PIC)


The STxP70 compiler now provides some support for position independent code (PIC)
generation and dynamic loading of shared components.
Note: This support is partial, since only data accesses are position independent.
This feature is described in Chapter 9: Relocatable loader library on page 136.

2.2.18 Sending options to a specific phase


The -W<phase>,<arg> option passes the specified argument <arg> to a specific
processing phase <phase> of stxp70cc.
Table 16 lists the different values of <phase>.

Table 16. Possible values for phase


Value of phase Description

p Preprocessor cpp
f Compiler front-end
a Assembler stxp70-as
l Linker stxp70-ld
o Binary optimizer tool binopt - not yet used by stxp70cc

There must be a comma between the option -W<phase> and the argument and no spaces.
Anything occurring after a space is treated as the next option to stxp70cc. Also the
argument is only passed to <phase> if <phase> is normally run from the specified
command.
For example:
stxp70cc -O3 -Wl,-strict_warn a.out
This command causes the linker to emit strict warnings regarding link files.

2.2.19 Directory and library options


Table 17 lists the options that select header files, libraries and compiler executables.

Table 17. Directory options


Option Description

-Idirectory Add directory to the beginning of the search list for include files.
-nostdinc No predefined include search path.
-l<library> Search the library named lib<library>.a when linking. The linker looks for the
library in the directories specified by the -L options and then in a standard list of
directories.
The position of this option on the command line makes a difference. The linker processes object
files and libraries in the order that they are specified on the command line. For example, if
the following is specified:
stxp70cc file1.o file2.o -lmylib
then the files are processed in the order file1.o, file2.o, libmylib.a.
However, if the following is specified:
stxp70cc file1.o -lmylib file2.o
then the files are processed in the order file1.o, libmylib.a, file2.o.
In this case, file2.o should not refer to any symbols defined in libmylib.a.
-L<directory> Add directory to the beginning of the search list for library files.
-nostdlib No predefined libraries search path.

The search path for the various phases of the compiler can be overridden by using the
option: -Y<phase>,<path> where <phase> can take the values listed in Table 16 and
<path> is the path of the required tool. There must be a comma and no spaces separating
-Y<phase> and <path>.

2.2.20 Environment variables


Currently there are no special environment variables that affect stxp70cc.

2.3 Predefined macros


Predefined macros are described in Table 18.
Note: 1 The list of macros currently defined can be obtained by typing:
stxp70cc -E -dM filename.c
where filename.c can be any .c file including an empty file.
2 Do not rely on a macro that is not documented, even if it is currently defined.
3 Some macro values are subject to change because of evolution of compiler design. This
may affect, for instance, front-end identification values.

Table 18. Predefined macros


Name                      Default definition                                 Purpose                                              See also

__open64__                Defined                                            Compiler technology identification
__GNUC__                  3                                                  Front end major release identification               -no-gcc
__GNUC_MINOR__            3                                                  Front end minor release identification               -no-gcc
__stxp70cc__              Defined, value depends on major compiler version   Compiler identification
__STXP70CC_MINOR__        Defined, value depends on minor compiler version   Compiler identification
__STXP70CC_PATCHLEVEL__   Defined, value depends on compiler patch level     Compiler identification
__STXP70CC_DATE__         Defined, value depends on compiler release date    Compiler identification
__STXP70CC_VERSION__      Defined, value is an identification string         Compiler identification
__LITTLE_ENDIAN__         Defined by default                                 Endianness identification
_LANGUAGE_C               Defined for C source                               Language currently processed is C language.

Table 18. Predefined macros (continued)


Name                      Default definition                                 Purpose                                              See also

_LANGUAGE_ASSEMBLY        Defined for ASM source                             Language currently processed is assembly language.
__STRICT_ANSI__           Defined when -std=c89 or -std=c99 or -ansi         Compiler is in strict ansi mode.                     -std
__STDC_VERSION__          Defined when -std=c99 with value 199901L           Compiler is in C99 ansi mode                         -std
__OPTIMIZE__              Defined as soon as optimization is on.             Optimization mode detection.                         -O
__OPTIMIZE_SIZE__         Defined under -Os                                  Optimization size detection                          -Os
__INLINE_INTRINSICS       Defined                                            Intrinsics inlining mode detection.                  -OPT:inline_intrinsics
__STDC_HOSTED__           Defined by default.                                Hosting mode.                                        -f[no-]hosted -f[no-]freestanding
__FAST_MATH__             Defined when -ffast-math option is used.           Libraries or user code can take advantage of this    -ffast-math
                                                                             definition to define alternative sequences of
                                                                             floating point code.

Note: The C standard guarantees that the __cplusplus symbol is never defined when compiling
C source code.
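
As an illustration of how the macros in Table 18 might be used, the following sketch selects behavior at compile time from the optimization and compiler identification macros. The function names are hypothetical. __OPTIMIZE_SIZE__ is tested first on the assumption that __OPTIMIZE__ is also defined under -Os.

```c
/* Returns a short description of the optimization mode, derived from the
 * predefined macros __OPTIMIZE__ and __OPTIMIZE_SIZE__. */
const char *build_optimization_mode(void)
{
#if defined(__OPTIMIZE_SIZE__)
    return "size";
#elif defined(__OPTIMIZE__)
    return "speed";
#else
    return "none";
#endif
}

/* Returns 1 when the translation unit is compiled by stxp70cc, based on
 * the __stxp70cc__ identification macro, and 0 otherwise. */
int built_with_stxp70cc(void)
{
#if defined(__stxp70cc__)
    return 1;
#else
    return 0;
#endif
}
```

Only documented macros are tested here, in line with the recommendation not to rely on undocumented ones.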

2.4 C99 support


The stxp70cc compiler supports a subset of the C99 standard. Most features are implicitly
available through default compiler command line options, with the notable exception of the
restrict keyword that requires the -std=c99 command line option to be specified.
It is recommended that any code fragment that depends upon C99 specific behavior be
guarded by the following preprocessing definitions, which are correctly triggered when the
-std=c99 command line option is used:
#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)
// Your C99 dependent code here
#else
#error "This source file depends upon C99 features not available with this compiler."
#endif

Table 19 summarizes the status of the stxp70cc compiler C99 support.

Table 19. C99 support in stxp70cc


Feature as described in the C99 standard Status

Restricted character set support via digraphs YES
iso646.h included YES
Wide character library support NO: type not supported and library not provided.
More precise aliasing rules via effective type YES: provided that the -fno-strict-aliasing option is not used
Restricted pointer YES: provided that the -fno-strict-aliasing option is not used
Variable length arrays PARTIAL: only local allocation, but no other features
Flexible array members YES
Static and type qualifiers in parameter array declarators YES
Complex support (<complex.h>) NO
Type generic math macros (<tgmath.h>) NO: include file not provided
The long long int type and library functions YES
Increased minimum translation unit YES
Additional floating-point characteristics (<float.h>) NO
Remove implicit int YES
Reliable integer division YES
Universal character names NO
Extended identifiers NO
Hexadecimal floating-point constants YES
Compound literals YES
Designated initializers YES
// comments YES
Extended integer type and library functions in <inttypes.h> and <stdint.h> YES
Remove implicit function declaration NO: can get warning
Preprocessor arithmetic done in intmax_t/uintmax_t YES
Mixed declaration and code YES
New block scope for selection and iteration statements YES
Integer constant type rules YES
Integer promotion rules YES
vararg macro YES

Table 19. C99 support in stxp70cc (continued)


Feature as described in the C99 standard Status

The vscanf family of functions in <stdio.h> YES
Additional math library functions in <math.h> NO
Floating point environment access in <fenv.h> NO
ISO 60559 Arithmetic support NO
Trailing comma allowed in enum declaration YES
%lf conversion allowed in printf NO
Inline functions YES: but not fully ansi compliant in the extern inline case
The snprintf family of functions in <stdio.h> YES
Boolean type in <stdbool.h> NO bool native type but <stdbool.h> header provided
Idempotent type qualifiers YES: but still emits warnings
Empty macro arguments YES
New struct type compatibility rules (tag compatibility) YES
Additional predefined macro names MOST
_Pragma preprocessing operator YES
Standard pragmas NO
__func__ predefined identifier YES
VA_COPY macro NO
Additional strftime conversion specifiers NO: library support not provided
LIA compatibility annex NO
Deprecate ungetc at the beginning of a binary file NO
Remove deprecation of aliased array parameters YES
Conversion of array to pointer not limited to lvalues YES
Relaxed constraints on aggregate and union initialization YES
Relaxed restrictions on portable header names YES
Return without expression not permitted in function that returns a value (and vice versa) YES
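
Several of the features marked YES in Table 19 are shown together in the following sketch. It is an illustration only: all type, macro and function names are hypothetical, and only features listed as supported are used (flexible array members, compound literals, designated initializers, mixed declarations and code, // comments, vararg macros, __func__ and snprintf).

```c
#include <stdio.h>
#include <stdlib.h>

/* Flexible array member: 'data' has no declared size and occupies the
 * storage allocated after the fixed part of the structure. */
struct packet {
    int length;
    unsigned char data[];
};

struct point { int x; int y; };

/* Vararg macro forwarding its arguments to snprintf. */
#define FORMAT_TO(buf, size, ...) snprintf((buf), (size), __VA_ARGS__)

/* Allocates a packet with 'length' payload bytes, all set to 'fill'. */
struct packet *packet_create(int length, unsigned char fill)
{
    struct packet *p = malloc(sizeof *p + (size_t)length);
    if (p == NULL)
        return NULL;
    p->length = length;
    for (int i = 0; i < length; i++)   // declaration in the for statement
        p->data[i] = fill;
    return p;
}

/* Compound literal combined with designated initializers. */
int point_sum(void)
{
    struct point p = (struct point){ .y = 4, .x = 3 };
    return p.x + p.y;
}

/* Writes the name of this function into 'out' and returns its length. */
int describe(char *out, size_t size)
{
    return FORMAT_TO(out, size, "%s", __func__);
}
```

A source file written this way would normally be protected by the __STDC_VERSION__ guard shown in Section 2.4.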

3 Pragmas

This chapter provides details of the #pragma directives that are recognized by stxp70cc.

3.1 Pragmas short description and syntax


Table 20. stxp70cc pragmas
Syntax                                                      Scope                     Description                                    Optimization level(1)

#pragma unroll (unroll_amount)                              Start of a loop body      Unrolls the loop unroll_amount times           -O2
#pragma ivdep                                               Start of a loop body      Liberalizes dependence analysis                -O3
#pragma loopdep PARALLEL | VECTOR | LIBERAL                 Start of a loop body      Liberalizes dependence analysis                -O3
#pragma loopmod(q,r)                                        Start of a loop body      Provides trip count modularity information     -O2
#pragma looptrip(n)                                         Start of a loop body      Provides trip count estimation information     -O2
#pragma loopseq READ | WRITE                                Start of a loop body      Ordering of the READ (or WRITE) accesses       -O2
#pragma hwloop none | forcehwloop<loopid> | forcejrgtudec   Start of a loop body      Controls mapping of HW loops and JRGTUDEC      -O2
#pragma loopmin<itercount> (minc)                           Start of a loop body      Controls the guards to be placed around loops  -O2
#pragma loopmax<itercount> (maxc)                           Start of a loop body      Controls specific cases of HW loop mapping     -O2
#pragma frequency_hint NEVER|FREQUENT                       Applies to the function   Execution frequency hint                       -O1
                                                            or statement that
                                                            follows the pragma
#pragma ident "string"                                      -                         Adds a .comment section to an assembly file.   -O0
#pragma weak symbol                                         -                         Marks a symbol as weak                         -O0
#pragma disable_extgen (fct1,fct2,...)                      -                         Disables native code generation for all        -O2
                                                                                      extensions in the given functions.
#pragma force_extgen (fct1,fct2,...)                        -                         Enables native code generation for all         -O2
                                                                                      extensions in the given functions.
#pragma disable_specific_extgen (extname[,fct1,fct2,...])   -                         Disables native code generation for specified  -O2
                                                                                      extensions in the given functions.

Table 20. stxp70cc pragmas (continued)


Syntax                                                      Scope                     Description                                    Optimization level(1)

#pragma force_specific_extgen (extname[,fct1,fct2,...])     -                         Enables native code generation for specified   -O2
                                                                                      extensions in the given functions.
#pragma inline_next (function)                              Function call site        Inlining (2)                                   -O1
#pragma noinline_next (function)                            Function call site        Inlining (2)                                   -O1
#pragma inline_function (function)                          Function                  Inlining (2)                                   -O1
#pragma noinline_function (function)                        Function                  Inlining (2)                                   -O1
#pragma inline_file (function)                              File                      Inlining (2)                                   -O1
#pragma noinline_file (function)                            File                      Inlining (2)                                   -O1
#pragma defaultinline (function)                            -                         Inlining (2)                                   -O1
1. This column denotes the lowest optimization level for which the pragma has an effect. For example -O0 means the pragma
is applicable even when optimization is switched off. A list of optimization levels is given in Section 2.2.12: Optimization
options on page 29.
2. All inlining pragmas are described in Section 4.1.4: Inlining pragmas on page 58.

3.2 Loop optimization pragmas

3.2.1 #pragma unroll (n)


This pragma suggests to the compiler the type of loop unrolling that should be done. The
pragma is a recommendation to the compiler to add n-1 copies of the loop body to the inner
loop. The value of n must be at least 1. If it is 1, then unrolling is not performed.
If the loop that this pragma immediately precedes is an inner loop, then it implies standard
inner loop unrolling. See Figure 2.

Figure 2. Inner loop unrolling example


for (i=0; i < 10; i++)
#pragma unroll (2)
    for (j=0; j < 10; j++)
        a[i][j] = a[i][j]+b[i][j];
becomes:
for (i=0; i < 10; i++)
    for (j=0; j < 10; j+=2) {
        a[i][j] = a[i][j] +b[i][j];
        a[i][j+1] = a[i][j+1]+b[i][j+1];
    }

If the loop that this pragma immediately precedes is an outer loop that contains only an
inner loop, then the compiler attempts to unroll the outer loop and perform loop fusion on the
resulting inner loops. This transformation, known as “unroll-and-jam”, is especially useful to
create parallel execution opportunities when the innermost loop alone does not present
such opportunities. See Figure 3.

Figure 3. Unroll-and-jam example


// Ensure ad[] and sd[] do not alias.
#pragma unroll(2)
for (i=0; i<16; i++) {
    int sum = 0;
    for (k=M; k<8+M; k++) {
        sum += sd[k]*sd[k-i];
    }
    ad[i] = sum;
}
becomes:
for (i=0; i<16; i+=2) {
    int sum0 = 0;
    int sum1 = 0;
    for (k=M; k<8+M; k++) {
        sum0 += sd[k]*sd[k-i];
        sum1 += sd[k]*sd[k-i-1];
    }
    ad[i] = sum0;
    ad[i+1] = sum1;
}

The following tips provide information on how to control the desired inner loop unrolling with
the pragma unroll value.
• A counted loop with a compile-time constant trip count is always fully unrolled if a
pragma unroll with a value greater than or equal to the loop trip count is specified.
• When a counted loop is not fully unrolled, the pragma unroll value is rounded to the
greatest power of two lower than the specified unrolling value.
• The maximum size of a loop after unrolling is controlled by the command line option
-OPT:unroll_size=<n>.
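
The figures above use trip counts that are multiples of the unrolling factor. When this is not known, an unrolled loop also needs code for the leftover iterations. The following hand-written sketch shows an unrolling by 2 for an arbitrary trip count. The function names are hypothetical, and the code actually generated by the compiler may differ.

```c
/* Reference loop: adds b[] to a[] element by element. */
void add_arrays(int *a, const int *b, int n)
{
    int i;
    for (i = 0; i < n; i++)
        a[i] = a[i] + b[i];
}

/* Hand-unrolled by 2. The main loop handles pairs of elements; the
 * trailing statement handles the last element when n is odd. */
void add_arrays_unrolled2(int *a, const int *b, int n)
{
    int i;
    for (i = 0; i + 1 < n; i += 2) {
        a[i]     = a[i]     + b[i];
        a[i + 1] = a[i + 1] + b[i + 1];
    }
    if (i < n)
        a[i] = a[i] + b[i];
}
```

Both functions produce the same result for any non-negative n. The unrolled version executes about half as many loop branches.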

3.2.2 #pragma ivdep


This pragma instructs the compiler to liberalize dependence analysis between memory
accesses. The #pragma ivdep applies only to the innermost loops in a set of nested loops;
therefore, if it is used on a loop that has an inner loop, the compiler ignores it. By default,
this pragma allows the compiler to assume there are no memory dependences between
loop iterations.
The following command line options modify the ivdep semantics.
• -OPT:cray_ivdep=TRUE
Only ignore backward memory dependences (Cray semantics).
• -OPT:liberal_ivdep=TRUE
Also ignore all memory dependences in the same loop iteration.


For example:
#pragma ivdep
for (i = 0; i < n; i++) {
    a[b[i]] = a[b[i]]+3; // These dependencies cannot be computed by
                         // the compiler
}

3.2.3 #pragma loopdep


This pragma instructs the compiler to liberalize dependence analysis between memory
accesses, based on the specified type of loop dependences. Contrary to the pragma ivdep
described above, the semantics cannot be modified by command line options.
The loopdep pragma takes an argument to tell the compiler which kind of loop
dependencies it can ignore, VECTOR, PARALLEL or LIBERAL.

#pragma loopdep VECTOR


#pragma loopdep VECTOR allows the compiler to assume there are no backward
memory dependences between loop iterations. This pragma is equivalent to #pragma
ivdep used with -OPT:cray_ivdep=TRUE.
Example:
#pragma loopdep VECTOR
for (i = 0; i < n; i++) {
    a[i] = a[i+k]+3;
}
In this example, the compiler cannot prove that a[i+k] does not depend on a[i], but this is
in fact the case if k is always > 0 in the program. The pragma allows the compiler to
assume there are no dependences between the read of a[i+k] in the current loop
iteration, and the write of a[i] in the following loop iterations. The compiler could rewrite
the loop as:
for (i = 0; i < n; i+=2) {
    t0 = a[i+k]+3;
    t1 = a[i+1+k]+3;
    a[i] = t0;
    a[i+1] = t1;
}
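
The following host-side sketch illustrates why the k > 0 condition matters. It compares the original loop with a hand-written copy of the rewritten loop; the function names are hypothetical, and n is assumed to be even to keep the sketch short.

```c
/* Original loop: a[i] = a[i+k] + 3. */
void shift_add_ref(int *a, int n, int k)
{
    int i;
    for (i = 0; i < n; i++)
        a[i] = a[i + k] + 3;
}

/* Rewritten loop from the text: both loads are hoisted above both stores.
 * For brevity n is assumed to be even. */
void shift_add_unrolled(int *a, int n, int k)
{
    int i;
    for (i = 0; i < n; i += 2) {
        int t0 = a[i + k] + 3;
        int t1 = a[i + 1 + k] + 3;
        a[i]     = t0;
        a[i + 1] = t1;
    }
}
```

With k = 1 the two functions leave the array in the same state. With k = -1 (a backward dependence, called on a pointer to the second element of a buffer) they diverge, because the second load then reads a value that the original loop would already have overwritten. The pragma must therefore not be used in that case.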

#pragma loopdep PARALLEL


#pragma loopdep PARALLEL allows the compiler to assume there are no dependences
between any two memory accesses that are in different loop iterations. This pragma is
equivalent to:
#pragma ivdep, -OPT:cray_ivdep=FALSE -OPT:liberal_ivdep=FALSE
For example:
#pragma loopdep PARALLEL
for (i = 0; i < n; i++)
    a[b[i]] = a[b[i]] + 3;


In this example, the compiler cannot tell that either the load or store of a[b[i]] in the
current loop iteration does not depend on the load or store of a[b[i]] in a following loop
iteration. This is in fact the case if b[i] != b[j] for all i != j. The compiler could
rewrite the loop as:
for (i = 0; i < n; i+=2) {
    t1 = a[b[i+1]] + 3;
    t0 = a[b[i]] + 3;
    a[b[i+1]] = t1;
    a[b[i]] = t0;
}
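
As an illustration of the condition b[i] != b[j], the sketch below runs the original loop and a hand-written copy of the rewritten loop on the same data. The function names are hypothetical and n is assumed to be even.

```c
/* Original loop: iterations run strictly in order. */
void bump_ref(int *a, const int *b, int n)
{
    int i;
    for (i = 0; i < n; i++)
        a[b[i]] = a[b[i]] + 3;
}

/* Rewritten loop from the text: iteration i+1 is interleaved with iteration i.
 * For brevity n is assumed to be even. */
void bump_interleaved(int *a, const int *b, int n)
{
    int i;
    for (i = 0; i < n; i += 2) {
        int t1 = a[b[i + 1]] + 3;
        int t0 = a[b[i]] + 3;
        a[b[i + 1]] = t1;
        a[b[i]]     = t0;
    }
}
```

When b[] holds distinct indices (for example a permutation), both functions produce the same array. When b[] contains a repeated index in two adjacent iterations, the interleaved version loses one of the two increments, which is why the property must really hold before the pragma is used.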

#pragma loopdep LIBERAL


#pragma loopdep LIBERAL allows the compiler to assume there are no dependences
between any two memory accesses that are either in the same, or different, loop iterations.
This pragma is equivalent to:
#pragma ivdep, -OPT:liberal_ivdep=TRUE
Example:
#pragma loopdep LIBERAL
for (i = 0; i < n; i++) {
    a[j] = b[i];
    c[i] = a[i] + 3;
}
In this example, the compiler cannot tell that the load of a[i] does not depend on the store
of a[j]. This is in fact the case if i != j for all values of i and j in the loop iterations.

3.2.4 #pragma loopmod


This pragma tells the compiler the number of times a loop is taken in terms of a multiple q
and a residual r.
The syntax of this pragma is:
#pragma loopmod(q,r)
where q is a strictly positive integer and r is a non-negative integer with r < q.
For example:
#pragma loopmod (4,0)
This tells the compiler that the loop is taken 0, 4, 8, 12 .... times.
#pragma loopmod (4,1)
This tells the compiler that the loop is taken 1, 5, 9, 13 .... times.
When applied to an inner loop, this pragma indicates that the trip count tc, that is, the
number of iterations executed by any execution of the loop, can be written as:
tc = p * q + r with q > 0, 0 <= r < q
where p is a non-negative integer. This information helps the compiler in loop unrolling
optimization and in software pipelining.
When unrolling loops, the compiler creates multiple loop bodies (the unrolling factor
specifies the number of loop bodies created). However, the compiler cannot always
statically determine the trip count. When it cannot determine the trip count, the compiler
must also create residual code in case the unrolling factor is not a divisor of the loop trip
count.
However, application writers may know the modular properties of some of the loops in their
own code. When this accurate information is passed to the compiler, the residual code
can be largely removed or better optimized.
Note: Giving inexact information about the trip count may lead to incorrect code. Make sure
that the asserted property is valid in all cases.
The following example shows the use of the #pragma loopmod.
#include <assert.h>

void copychar(unsigned char* __restrict p, unsigned char * q,
              unsigned int sz)
{
    unsigned int i;   /* unsigned, to match the type of sz */
    assert(sz % 4 == 0);
#pragma loopmod(4,0)
    for (i=0; i<sz; i++)
        p[i] = q[i];
}
The function copychar duplicates a byte stream, whose size must be a multiple of 4.
During unrolling, and without the pragma, the compiler would create a residual loop. This is
totally removed when the pragma information is asserted. In this example, the pragma does
not provide the compiler with any information about the memory alignment of p or q, which
the compiler would need to generate word accesses after unrolling.
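
To make the role of the residual loop concrete, the sketch below shows, in plain C, the two shapes an unroll-by-4 of this copy loop can take. Both functions are hand-written illustrations with hypothetical names; they are not the code that stxp70cc actually generates.

```c
/* Unrolled by 4 with a residual loop: the general shape needed when
 * nothing is known about sz. */
void copy_with_residual(unsigned char *p, const unsigned char *q,
                        unsigned int sz)
{
    unsigned int i = 0;
    for (; sz - i >= 4; i += 4) {      /* main unrolled body */
        p[i]     = q[i];
        p[i + 1] = q[i + 1];
        p[i + 2] = q[i + 2];
        p[i + 3] = q[i + 3];
    }
    for (; i < sz; i++)                /* residual: sz % 4 iterations */
        p[i] = q[i];
}

/* Unrolled by 4 without a residual loop: only valid when sz % 4 == 0,
 * which is the property that loopmod(4,0) asserts. */
void copy_mod4(unsigned char *p, const unsigned char *q, unsigned int sz)
{
    unsigned int i;
    for (i = 0; i < sz; i += 4) {
        p[i]     = q[i];
        p[i + 1] = q[i + 1];
        p[i + 2] = q[i + 2];
        p[i + 3] = q[i + 3];
    }
}
```

The second form is smaller and has one loop less, but it would write past the intended range if it were called with a size that is not a multiple of 4. This is the kind of incorrect code that a wrong loopmod assertion can lead to.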

3.2.5 #pragma looptrip (n)


This pragma instructs the compiler that the estimate of the number of iterations of the loop
(the loop trip count estimate) is n. This is not an assertion that the loop effectively iterates n
times.
A number of optimizations are affected by the #pragma looptrip (n), when the
compiler has not already determined the exact trip count:
• basic block frequency estimation uses this information as an approximation of the loop
trip count
• unrolling and cross-iteration optimizations are reduced if the given loop trip count
estimate is low
• software pipelining is limited if the estimate is low
• automatic data prefetch generation is limited if the estimate is low
One scenario of usage is for ‘for’ loops with trip counts of unknown values where the user
knows that the approximate effective value is low:
#pragma looptrip(4)
for (i=0; i<n; i++)
a[i] = b[i] ;
This example avoids non-beneficial optimizations. On such loops the compiler trip count
estimate without the pragma is 100.


A second scenario is for ‘while’ loops where the user knows that the approximate
effective trip count is high:
#pragma looptrip(100)
while (*p++ = *s++)
    ;
This example gives a better approximation of the weight of the loop. Generally the compiler
trip count estimate for a while loop is very low.
Possible warning messages are:
• Warning : pragma ‘LOOPTRIP’ : inconsistent with computed value, ignored
• Warning : pragma ‘LOOPTRIP’ : not followed by a loop, ignored
• Warning : malformed ‘#pragma looptrip (n)’

3.2.6 #pragma hwloop


#pragma hwloop none
#pragma hwloop forcehwloop <loopid>
#pragma hwloop forcejrgtudec
The hwloop pragmas allow fine control of the special looping mechanisms available on
the STxP70 processor. They must all be placed before the loop statement. Their
respective effects are:
hwloop none
        Block the mapping of both hardware loops and JRGTUDEC special
        instructions.
hwloop forcehwloop <loopid>
        Force a given loop to use a hardware loop. Note that the
        mapping is performed by the compiler only if it is legal to do so. The
        loopid argument is optional. It allows the user to force the use of
        either of the two hardware loop registers, so the possible values are 0
        and 1. Its main use is to force the use of the saved loop register
        L0 when a call is present in the loop body but the callee is known to
        have no side effect on the hardware loop registers (that is, it is
        hardware loop free), which avoids saving and restoring the loop
        register. It is the user's responsibility to ensure that using the
        specified register is legal.
hwloop forcejrgtudec
        Force a given loop to use the JRGTUDEC special instruction.

The hardware loop pragmas must be placed before the loop statement:
#pragma hwloop forcejrgtudec
for(i=0; i<n; i++) {
a[i] = ...;
}


3.2.7 #pragma loopmin<itercount> (minc)


The hardware loop register of the STxP70 core that holds the trip count is named LC and
has a 32-bit range. The zero value, however, is not legal from a hardware standpoint.
Furthermore, no special instruction is available to indicate that the hardware loop must
be skipped. Therefore, if the value used to set the LC register may be less than or equal
to zero, a guard is needed.
Use the loopmin pragma to instruct the compiler that the loop trip count is at least minc. If
minc is 1 or more, the compiler is allowed to remove the guard that is otherwise needed.
This saves both cycles and bytes because the comparison and branching instructions are
removed.
The loopmin and loopminitercount syntaxes are equivalent. The second one is
for legacy code that formerly used the sxcc compiler.
Use this pragma as follows:
#pragma loopmin (1) // loopminitercount can be used as well
for(i=0; i<n; i++) {
a[i] = ...;
}
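
The guard mentioned above can be pictured with a plain C model in which a down-counting do-while loop stands in for the hardware loop and the local variable lc stands in for the LC register. This is only a conceptual sketch with hypothetical function names; the instructions actually emitted by stxp70cc are not shown here.

```c
/* No lower bound known on n: a count of zero must never reach the
 * counter, so a compare-and-branch guard protects the loop. */
void fill_guarded(int *a, int n, int value)
{
    if (n > 0) {                 /* guard */
        int lc = n;              /* stands in for the LC register */
        int *p = a;
        do {
            *p++ = value;
        } while (--lc != 0);
    }
}

/* Trip count known to be at least 1 (the property stated by loopmin (1)):
 * the guard can be dropped. Calling this with n <= 0 is invalid. */
void fill_unguarded(int *a, int n, int value)
{
    int lc = n;
    int *p = a;
    do {
        *p++ = value;
    } while (--lc != 0);
}
```

For any n >= 1 the two functions behave identically; the unguarded one simply omits the comparison and the branch.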

3.2.8 #pragma loopmax<itercount> (maxc)


Use the loopmax pragma to instruct the compiler that a loop trip count is at most maxc. This
pragma is not generally useful on an STxP70 core. In a few cases, it is useful as a
workaround for hardware issues that appear when the actual trip count exceeds
a given range (for instance, a 16-bit integer).
Use this pragma as follows:
#pragma loopmaxitercount (1)
for(i=0; i<n; i++) {
a[i] = ...;
}

3.2.9 Code generation pragmas


#pragma loopseq READ
#pragma loopseq WRITE
This pragma tells the compiler that the memory READ accesses (or, respectively, the
memory WRITE accesses) should be sequenced as they appear in the loop. This is a hint,
not an assertion that the accesses must be kept in sequence. In particular, it is not a
replacement for volatile accesses, where keeping the order is mandatory.
The effect of this pragma is that the scheduler serializes all load and prefetch operations
(or, respectively, all stores) in the loop. Therefore the memory read (or write) accesses, as
written in the C code, are kept in order as long as no aggressive transformation occurs in
the loop.


The following scenario can occur when the user wants to keep memory writes in order, so
as to take advantage of a combining write buffer:
#pragma loopseq WRITE
for(i=0; i<n; i++) {
    a[i] = ...;
    a[i+1] = ... ;
    a[i+2] = ... ;
    a[i+4] = ... ;
}
The pragma hints that the compiler should keep writes to the array in order. If the loop is
unrolled, generating a large number of stores, this improves locality and may take
advantage of combining write buffers. By default the compiler does not put restrictions on
the ordering of non-overlapping store operations.
A second scenario is when the user has scheduled prefetch and load operations by hand,
and wants to ensure that the compiler does not reorder them.
#pragma loopseq READ
for(i=0; i<n; i+=S) {
    ... = a[i] ;
    __builtin_prefetch(&a[i+S]) ;
}
The pragma hints that the compiler should keep the load and prefetch in order. In this
example, the prefetch of a[i+S] is not moved ahead of the load of a[i]; the prefetched
data is used by the load in the next iteration.

3.2.10 Heuristic pragmas


#pragma frequency_hint
This pragma allows the user to specify information about the execution frequency of certain
regions of code, with the following frequency specifications:
NEVER       This region of code is never or rarely executed. The compiler might
            move this region of code away from the normal path, either to the
            end of the procedure or to an entirely separate section.
FREQUENT    This region of code is frequently executed. The compiler might try to
            put this region in the fall-through path.

Example:
if (debug) {
#pragma frequency_hint NEVER
    trace();
}


3.3 Miscellaneous pragmas

3.3.1 #pragma ident “string”


Adds a .comment section in an assembly file.

3.3.2 #pragma weak


Marks a symbol as weak.
This pragma instructs the link editor not to issue a warning if it does not find a defining
declaration of the specified weak symbol, in which case the symbol is set to 0.
It also allows the current definition to be overridden by a non-weak definition. See Figure 4.

Figure 4. #pragma weak example


#pragma weak opt_handler
extern void opt_handler (void);
int main(int argc, char *argv[])
{
    /* If opt_handler has not been defined, the linker does not
       complain and the condition is false. */
    /* If opt_handler has been defined, opt_handler is
       invoked. */
    if (opt_handler)
        opt_handler();
    return 0;
}

3.3.3 #pragma disable_extgen ( fct1, fct2, ... )


This pragma can be used only when MPx extension is used. It disables the native code
generation for all extensions. Refer to Chapter 8: MPx native support on page 122 for
further details.

3.3.4 #pragma force_extgen ( fct1, fct2, ... )


This pragma can be used only when MPx extension is used. It forces the native code
generation for all extensions. Refer to Chapter 8: MPx native support on page 122 for
further details.

3.3.5 #pragma disable_specific_extgen ( extname[, fct1, fct2, ... ] )


This pragma can be used only when MPx extension is used. It disables the native code
generation for the specified extension. Refer to Chapter 8: MPx native support on page 122
for further details. The typical use is:
#pragma disable_specific_extgen ( MP1x, fct1, fct2)


3.3.6 #pragma force_specific_extgen ( extname[, fct1, fct2, ... ] )


This pragma can be used only when MPx extension is used. It forces the native code
generation for the specified extension. The typical use is:
#pragma force_specific_extgen ( MP1x, fct1, fct2)
Refer to Chapter 8: MPx native support on page 122 for further details.


4 Optimization guide

This chapter describes specific compiler options and techniques that can be used to gain
maximum performance in your application.

4.1 Inlining
Inline function expansion is performed for function calls that the compiler estimates to be
frequently executed. These estimates are based on a set of heuristics. The compiler might
decide to replace the call instructions with the code of the function itself (inline the call).
The compiler supports both the single file inlining mode, described in Section 4.1.1: Single
file inlining, and cross file inlining through the IPA optimization described in Section 4.8:
Interprocedural analysis optimization (IPA) on page 76.

4.1.1 Single file inlining


The purpose of this section is to make users aware of the underlying algorithms used to
select functions to inline. First, it describes how possible candidates are selected for inlining,
and how the selection is finalized, taking size conditions into account. Then, user-level
compiler switches are listed, to show how the inlining process can be controlled.
The inlining decisions of the compiler can be observed with the -INLINE:list option. We
recommend that this option should be used when tuning inlining decisions. The exact scope
and syntax of the -INLINE option are described throughout this section.
There are two kinds of candidates for inlining: may-inline and must-inline functions.
May-inline functions are selected by the compiler according to the following conditions:
• functions declared with the inline C keyword are may-inline candidates
• functions not declared inline are may-inline candidates only if the
-INLINE:only_inline=off option is specified. In this case, a function is a
may-inline candidate if:
– it is declared with the static C keyword
– its name is not weak
– its address is neither passed nor saved
Must-inline functions are specified by the user, through the command line option:
-INLINE:must=fn1,fn2,...
May-inline and must-inline functions are then checked against several criteria to decide
whether to inline them or not.


Inlining criteria
Each candidate function is checked against inlining-exclusion cases, which include:
• the user has requested no inlining (-INLINE:never=fn or -INLINE:off command line
options)
• recursive function
• vararg function
• exception handler
After this preliminary test, each candidate function is inlined regardless of cost if it is marked
must-inline, or if the -INLINE:all option has been specified by the user.
Otherwise, cost evaluation is used to decide whether to inline or not, and the candidate
function is rejected if its estimated cost is above a given threshold set by the compiler. The
-INLINE:list=on option can be used to list what is inlined. Changing the compiler limits is
not recommended, since this can lead to longer compilation times or increased memory
usage or both, with no noticeable performance benefit.
Finally:
• the function to be inlined must be defined and visible in the same source file as the
function using it
• a static function that is inlined can be in specific circumstances considered “dead”, and
removed from the final object file(b)

4.1.2 stxp70cc inlining options


Table 21 specifies the options that control stand-alone inlining.
More than one sub-option can be specified to the -INLINE: option, either by using colons
to separate each sub-option or by specifying multiple options on the command line. Some
-INLINE: options are specified with a setting that either enables or disables the feature. To
disable a feature, specify the sub-option with either =OFF, =FALSE or =0 (all these strings
are case insensitive, for example -INLINE:list=OFF). To enable a feature, either use the
option name alone (for example -INLINE:list) or put any other string on the
right of the “=” sign (as in -INLINE:list=all). It is generally recommended to use =ON,
=TRUE or =1 for the sake of clarity (for example -INLINE:list=ON).

Table 21. Standalone inlining options

Option                            Description

-inline                           Enable inlining on inline functions. This is activated by
                                  default at optimization levels > 1.
-noinline                         Disable inlining.
-INLINE:(on|off)                  Enable/disable inlining. Use of other -INLINE options
                                  implicitly sets this to on.
-INLINE:aggressive=(on|off)       Inline even non-leaf, out-of-loop calls. Default is off.
-INLINE:all                       Forces may-inline functions to be inlined, bypassing cost
                                  evaluation. This option conflicts with -INLINE:off, and
                                  takes precedence if both are specified. Default is off.
-INLINE:all_inline                Inline all functions marked by the C language inline
                                  keyword.
-INLINE:dfe                       Allow dead function elimination. Default is on.
-INLINE:list=(on|off)             List compiler actions. Default is off.
-INLINE:must=name1[,name2...]     Always attempt to inline the named subroutines in
                                  addition to the default heuristic.
-INLINE:never=name1[,name2...]    Never attempt to inline the named subroutines.
-INLINE:only_inline=(on|off)      Default is on. Inline only functions marked by the C
                                  language inline keyword. The
                                  -INLINE:only_inline=off option is mandatory to
                                  allow inlining of non-inline functions.
-INLINE:size_static=(on|off)      Set to on, this option limits the inlining of static functions.
                                  Set to off, this option allows more aggressive inlining of
                                  static functions. See Inlining static functions. When code
                                  is optimized for size (-Os) and for optimization levels
                                  -O0, -O1 and -O2, the default is on; when code is
                                  optimized for speed (-O3, -O4), the default is off.
-INLINE:specfile=filename         Specifies a filename containing inlining options. Default is
                                  none.
-INLINE:static=(on|off)           Default is off. Allow static functions to be candidates for
                                  inlining.

b. Note that this dead code removal was not performed in earlier versions of the stxp70cc compiler (that is, the
compiler provided in toolset 3.1.0 and earlier). With those versions, inlining usually causes an increase in size,
because the original (not inlined) instance is also preserved in the final executable code, even if it is never
called.

In addition to these options, the option given in Table 22 may be of interest when building a
large body of inline functions (which is not recommended and may adversely affect
performance).

Table 22. Option changing inlining behavior

Option                Description

-OPT:Olimit=[0..n]    Functions larger than size n are not optimized. Default is
                      3000. Specifying 0 removes any limit but may lead to a
                      very long compile time.

Inlining static functions


When the option -INLINE:size_static=on is set, the compiler assesses the total size
increase that would result from inlining all the calls to the static callee function in the
current caller. If this increase is above a given threshold, none of the calls to this callee
function in the current caller are inlined.
When the option -INLINE:size_static=off is set, the compiler assesses the size increase
that would result from inlining the calls to the static callee function incrementally. The
first calls to the callee are inlined until the size increase becomes greater than the
threshold; inlining of any further calls is then suspended.
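
The difference between the two settings can be pictured with a toy model. The sketch below assumes that every inlined call adds the same number of size units to the caller and that, in the incremental mode, the call that crosses the threshold is still inlined. Both assumptions, the function names and the units are hypothetical; the real cost model of the inliner is not described here.

```c
/* size_static=on: judge the total increase up front, all or nothing.
 * Returns how many of the n_calls call sites get inlined. */
int inlined_calls_size_static_on(int n_calls, int growth, int threshold)
{
    return (n_calls * growth > threshold) ? 0 : n_calls;
}

/* size_static=off: inline call after call until the accumulated
 * increase has become greater than the threshold (assumed stop rule). */
int inlined_calls_size_static_off(int n_calls, int growth, int threshold)
{
    int inlined = 0;
    int increase = 0;
    while (inlined < n_calls && increase <= threshold) {
        increase += growth;
        inlined++;
    }
    return inlined;
}
```

In this model, five calls that each add 10 units against a threshold of 25 give no inlining at all with size_static=on, but three inlined calls with size_static=off. With only two such calls, both settings inline both of them.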

4.1.3 Extern inline functions


If both inline and extern are specified in a function definition, then the definition is used
only for inlining. The function is never compiled on its own, not even if its address is referred
to explicitly. The address becomes an external reference, as if the function had only been
declared but not defined.
This combination of inline and extern has almost the same effect as a macro. The way
to use it is to put a function definition in a header file with these keywords, and put another
copy of the definition (lacking inline and extern) in a library file. The definition in the
header file will cause most calls to the function to be inlined. If any instances of the function
remain, they will refer to the single copy in the library.

4.1.4 Inlining pragmas


The inlining process can be controlled within the C source code using #pragmas.
The stxp70cc compiler already supports several command-line options to configure
inlining, but these apply to every call of a function and are therefore not flexible enough.
For instance, with the option -INLINE:never=foo the user can disable the inlining of foo
everywhere it is called; conversely, with -INLINE:must=foo the user can force inlining of
foo everywhere.
Pragmas give the user the ability to force inlining or non-inlining at individual call sites.
In addition, the noinline and always_inline attributes can be used at
function declaration.

Pragmas
To force inlining or non-inlining of a function in the scope of a call site, the following two
pragmas are introduced:
• #pragma inline_next (foo,...) forces inlining of function foo in the next
statement
• #pragma noinline_next (foo,...) prevents inlining of function foo in the next
statement
The ... denotes that it is possible to provide several function names with the same
pragma. It is equivalent to several pragma lines.
Two similar pragmas are provided that can be used within the scope of a function:
• #pragma inline_function (foo,...) forces inlining of function foo every time
it is called until the end of the current function
• #pragma noinline_function (foo,...) prevents inlining of function foo every
time it is called until the end of the current function
The two call site scope pragmas take precedence over these two function scope pragmas.
Two lower priority pragmas are provided, with file scope:
• #pragma inline_file (foo,...) to force inlining of function foo every time it is
called until the end of the current source file
• #pragma noinline_file (foo,...) to prevent inlining of function foo every
time it is called until the end of the current file


Finally, to revert inlining policy to the default one (that is, rely on the inliner’s evaluation of
callee weight), the following pragma is introduced:
#pragma defaultinline (foo,...)

Function naming
As a special case, if the user does not provide any function name, the corresponding
pragma applies to all functions called in the scope of the pragma. In this case, parentheses
around the function names are optional.

User diagnostics
Several warning messages are provided to the user to help track errors.
If two conflicting pragmas are provided, only the later one is taken into account. For instance:
#pragma inline_next (foo)
#pragma noinline_next (foo)
foo();
This generates the following warning:
warning: #pragma noinline_next (foo) overrides previous #pragma
inline_next (foo)
If pragmas are provided at an invalid scope (that is, outside of a function), the following
message is displayed:
warning: #pragma noinline_function (foo) ignored (incorrect scope)
To help track misspelling, a warning is also displayed if a pragma could not be applied to
any function call.
#pragma noinline_next (bar)
foo(i);
This generates the following warning:
warning: #pragma noinline_next (bar) matched no call

noinline and always_inline attributes


In order to enable the user to inhibit inlining of one function wherever it is called, the
noinline attribute is introduced, and is used at the function declaration level.
Conversely, to enable the user to force inlining of one function wherever it is called, the
always_inline attribute is introduced.
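
As a minimal sketch, the two attributes can be written with the GNU attribute syntax as follows. The function names and bodies are placeholders chosen for illustration.

```c
/* Always expanded at its call sites. */
static inline __attribute__((always_inline)) int scale(int x)
{
    return 4 * x;
}

/* Never expanded inline: every call remains a real call. */
__attribute__((noinline)) int log_event(int code)
{
    return code + 1000;   /* placeholder body */
}

/* scale() is inlined here; log_event() is called. */
int process(int x)
{
    return log_event(scale(x));
}
```

The attributes affect only how the calls are compiled, not the result: process(3) still evaluates to 1012.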

Precedence
Command-line options -INLINE:must=foo and -INLINE:never=foo take precedence
over both pragmas and attributes.
Attributes take precedence over pragmas. That is, a function declared with
__attribute__((noinline)) is never inlined, regardless of pragma inline_xxx
statements. However, the user can override this behavior with the -INLINE:must=foo
command-line option.
If several contradictory pragmas with the same scope apply to the same function, the last
one overrides the earlier ones.


Examples
Example one (Figure 5) illustrates the use of the #pragma noinline_next directive. All
calls to f1() are candidates for inlining, except the one directly following #pragma
noinline_next.

Figure 5. #pragma noinline_next example


#include <stdio.h>

int ig = 0;
inline void f1(int i) {ig += i;}
void main()
{
    f1(1); // f1 is candidate for inlining
#pragma noinline_next (f1)
    f1(2); // f1 is not marked for inlining
    f1(3); // f1 is candidate for inlining
    printf("result is %d\n", ig);
}

Example two (Figure 6) illustrates the use of the #pragma inline_function directive.
All calls to f1() following the #pragma inline_function (f1) directive are forced to
be inlined, except the one directly following #pragma noinline_next (f1). The call to
f2() following the #pragma inline_next (f2) is also forced to be inlined, while the
first call to f2() is only a candidate for inlining (inlining depends on the respective weights
of f2() and its caller).

Figure 6. #pragma inline_function example


#include <stdio.h>

int ig = 0;
int jg = 0;
inline void f1(int i) {ig += i ;}
inline void f2(int i) {jg += i ;}
void main()
{
#pragma inline_function (f1)
    f1(1); // f1 is forced to be inlined
    f2(1); // f2 is candidate for inlining
#pragma noinline_next (f1)
    f1(2); // f1 is not marked for inlining
#pragma inline_next (f2)
    f2(3); // f2 is forced to be inlined
    f1(3); // f1 is forced to be inlined
    printf("result is %d %d\n", ig, jg);
}


Example three (Figure 7) illustrates the use of the #pragma defaultinline directive.

Figure 7. #pragma defaultinline example


#include <stdio.h>

int ig = 0;
int jg = 0;
inline void f1(int i) {ig += i ;}
inline void f2(int i) {jg += i ;}
void main()
{
#pragma noinline_function (f1)
    f1(1); // f1 is not marked for inlining
    f2(1); // f2 is candidate for inlining
#pragma inline_next (f1)
    f1(2); // f1 is forced to be inlined
#pragma noinline_next (f2)
    f2(3); // f2 is not marked for inlining
#pragma defaultinline (f1)
    f1(4); // f1 is candidate for inlining
    printf("result is %d %d\n", ig, jg );
}

Example four (Figure 8) illustrates the use of several function names or an empty name list
with #pragma directives.

Figure 8. Empty or multiple function name example


#include <stdio.h>

#pragma noinline_file ()
int f(int i) { return i+1; }
int g(int i) {
    int j=i+f(i);            // f is not marked for inlining
#pragma inline_next (f,g)
    j += f(i);               // f is forced to be inlined, g is ignored
    j += f(i) + f(i);        // f is not marked for inlining
    return j;
}
int h(int i) {
#pragma noinline_next ()
    int j=i + f(i) + g(i);   // f and g are not marked for inlining
#pragma inline_next (f,g)
    j+=i + f(i) + g(i);      // f and g are forced to be inlined
    return j;
}
void main()
{
    // g and h are not marked for inlining
    printf("result is %d %d\n", g(0), h(0));
}


Example five (Figure 9) illustrates the use of the noinline attribute and shows how the
attribute has precedence over #pragma.

Figure 9. noinline attribute example


#include <stdio.h>

#pragma inline_file(f3)
int ig = 0;
void __attribute__ ((noinline)) f3(int i) { ig += i ; }
int main()
{
    f3(1); // f3 is not marked for inlining
#pragma inline_next(f3)
    f3(2); // f3 is not marked for inlining
#pragma defaultinline (f3)
    f3(3); // f3 is not marked for inlining
    printf("result is %d\n", ig);
    return 0;
}


4.2 Loop unrolling


This section describes how the stxp70cc compiler implements loop unrolling.

4.2.1 Default unrolling policy


The way loops are unrolled depends on the optimization level and on the version and
configuration of the core (single or dual ALU/dual issue, and the number of general purpose
registers (GPRs)).
Two main parameters are controlled:
• the maximum unrolling factor to be applied
• the maximum size of the loop after unrolling (this size corresponds to the number of
instructions in the internal representation rendered by the compiler when unrolling is
applied)
The exact parameters used to control unrolling are listed in Table 23.

Table 23. Loop unrolling parameters

Optimization level   Core                              Maximum unrolling factor   Maximum unroll size

-O0, -O1, -Os        All                               No unrolling               No unrolling
-O2                  All                               2                          32
-O3                  All                               2                          64
-O4                  STxP70-3,                         2                          64
                     STxP70-4-single issue
-O4                  STxP70-4-dual issue, 16 GPRs      4                          64
-O4                  STxP70-4-dual issue, 32 GPRs      4                          128

Note: 1 Depending on the internal analysis, the compiler is free to apply an actual unrolling factor
which is smaller than the maximum specified for the optimization level and core. This is
especially the case if a smaller unrolling factor enables the compiler to avoid the generation
of a remainder loop.
2 The #pragma unroll directive takes precedence over the default behavior of the loop
unroller.


4.2.2 Advanced control of the unroller


The following facilities are provided to fine tune loop unrolling:
• the loop unroll pragma #pragma unroll
This pragma can be used to apply a precise unrolling factor to a given loop. This
pragma is described in Section 3.2.1: #pragma unroll (n) on page 45.
• the stxp70cc -maggressive_unroll=n option
This option enables the aggressiveness of the unroller to be set. This option takes an
integer in the range [0, 6] as an argument. It applies the unrolling parameters
described in Table 24.

Table 24. -maggressive_unroll option: values of n


Level Maximum unroll factor Maximum unroll size

0 No effect No effect
1 2 64
2 2 128
3 4 64
4 4 128
5 8 64
6 8 128

4.2.3 Precedence rules


The precedence order is as follows:
• #pragma unroll takes precedence over both the default unroller behavior and the
-maggressive_unroll option
• the -maggressive_unroll option takes precedence over the default unroller
behavior

4.2.4 Built-in assume and pragma loopmod


The built-in __builtin_assume can be used to tell the compiler that the loop count is a
multiple of a given integer. This allows the compiler to apply an unrolling factor that
does not require a remainder loop, which saves code size and often makes the final
code more efficient.
The following example states that the loop count is a multiple of 4:
    __builtin_assume((lcount & 3) == 0);
    for (i = 0; i < lcount; i++) {
        *dest = *src;
        dest++; src++;
    }
The built-in can be easier to integrate into the code than #pragma loopmod (described
in Section 3.2.4: #pragma loopmod on page 48) and may be more precise.

4.3 Memory dependences in C programs


Precise analysis of memory dependences is key to optimization, since it enables the
compiler to schedule load instructions above store instructions more freely. By
default, a C compiler assumes that any pair of memory accesses that reference distinct
types are not aliased (that is, not memory dependent). However, real-world cases almost
always involve pointers to the same type that are in fact not aliased. The compiler
cannot generally deduce this property and must rely on additional information. This
information can be supplied either through the C language restrict keyword, or with the
compiler option:
-OPT:alias=value
where possible values are listed in Table 25.

Table 25. Possible values for the -OPT:alias option

Value      Description
any        The default. Any pair of memory accesses may be aliased.
typed      Any pair of memory accesses that reference distinct types are not aliased.
unnamed    Assume pointers never point to global objects.
restrict   Assume that different pointers never point to the same area.
disjoint   Assume multiple pointer indirections never overlap.

Although the compiler is able to compute precise memory dependences in many cases, this
is not possible when complex memory accesses are involved, such as in the following
example:
    for (i = 1; i < n; i++) {
        a[i-1] = a[i] + b[i];
    }
    for (i = 1; i < n; i++) {
        c[d[i]] = c[i] + 1;
    }
On the first loop, the compiler can fully determine the dependences between memory
accesses, provided that it knows that a and b point to distinct memory locations (see the C
language restrict qualifier). On the second loop, however, without information on values
in d, the compiler assumes that all memory accesses in the loop are dependent. In
particular, the sequence of load and store memory accesses in the iterations of the loop
must be strictly respected, resulting in a poor instruction schedule if the loop is unrolled or
software pipelined.
A useful property for loop optimizations is when a loop is vectorizable. This property can be
enforced on a loop by using the #pragma loopdep VECTOR. A vectorizable loop is such
that it can be decomposed into a sequence of loops, one per statement of the original loop,
without changing the program results. Moreover, for each loop resulting from that
decomposition (that contains only one statement), all load memory accesses can be
performed before all store memory accesses, which means that a vector version of the loop
can be written. In practice, unless the target processor is a real vector processor, the
compiler does not decompose vectorizable loops as described. Rather, it uses the
vectorizable property of the original loop to remove dependences between memory
accesses.
In the example above, the first loop is vectorizable, provided that a and b do not overlap.
The second loop is also vectorizable if the assertion (d[i]<=i) holds for all i.
Another useful property for loop optimizations is when a loop can be parallelized. This
property can be enforced on a loop by using the #pragma loopdep PARALLEL. A
parallelized loop is one where memory accesses that reference a given memory location
may occur only in the same iteration of the loop. As a result, the sequence of memory
accesses of the original loop can be changed in any way that preserves the relative order of
memory accesses originating from the same loop iteration. Note that a parallelized loop is
always vectorizable, so the #pragma loopdep PARALLEL is stronger (but less generally
applicable) than the #pragma loopdep VECTOR.
In the example above, the first loop cannot be parallelized. The second loop can be
parallelized if the assertion (d[i]==i) holds for all values of i.
The last useful property for loop optimizations is when a loop is liberal. This property can be
enforced on a loop by using the #pragma loopdep LIBERAL. A liberal loop is one where
all its memory accesses reference unique memory locations. As a result, all the memory
accesses in the loop can be freely reordered. Note that a liberal loop can always be
parallelized, so the #pragma loopdep LIBERAL is stronger (but less generally applicable)
than the #pragma loopdep PARALLEL.
In the example above, the second loop is liberal if the assertion: (d[i]<1 || d[i]>=n)
holds for all i. (For clarity, we omitted this case for the VECTOR and PARALLEL pragmas.)
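As a sketch of how such a property might be stated in source code, the function below wraps the second loop of the example and marks it with #pragma loopdep PARALLEL. The function name is hypothetical, and writing the pragma immediately before the loop is an assumption. The pragma is a promise made by the programmer: here the caller must guarantee that d[i] == i for every i in the loop range, and nothing in this sketch checks that.

```c
/* Increment c[1..n-1] through the index table d.
 * Contract (caller's responsibility): d[i] == i for every i in [1, n). */
void bump_indexed(int *c, const int *d, int n)
{
    int i;

    /* Assumption: the pragma applies to the loop that follows it. */
#pragma loopdep PARALLEL
    for (i = 1; i < n; i++)
        c[d[i]] = c[i] + 1;
}
```

If the contract cannot be guaranteed but d[i] <= i still holds, the weaker VECTOR form is the one that matches the assertions given above.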
The restrict qualifier, which applies to pointers or arrays in a C program, is also highly
useful to remove dependences between memory accesses inside and outside loops. The
restrict property states that two memory accesses originating from different pointers or
arrays cannot reference the same memory location, when at least one of the pointers or
array has the restrict qualifier. Please note that all memory accesses based on a given
restrict pointer or array are still assumed dependent, unless it is obvious to the compiler that
they are not, or there is a #pragma loopdep on the loop that applies to these
dependences.
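The following minimal sketch applies the restrict qualifier to the first loop of the earlier example. The function name and parameter order are hypothetical. The qualifier is a promise by the caller that the two arrays do not overlap, which is the information the compiler needs to treat loads from b as independent of stores to a.

```c
/* a and b are restrict-qualified: the caller promises that the two arrays
 * do not overlap, so a store to a[] cannot change a value read from b[].
 * Accesses made through a itself (a[i-1] and a[i]) are still analyzed by
 * the compiler in the usual way. */
void shift_add(int n, int *restrict a, const int *restrict b)
{
    int i;

    for (i = 1; i < n; i++)
        a[i - 1] = a[i] + b[i];
}
```

Calling this function with overlapping arrays breaks the promise and gives undefined behavior, so the qualifier should only be added where the non-overlap property is known to hold.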

4.4 Aliasing rules in C/C++ programs


The -fstrict-aliasing option is enabled by default and allows the compiler to assume
the strictest aliasing rules applicable to the language being compiled (the aliasing rules are
stated in clause 6.5 (7) of the ISO/IEC Standard (Expressions)).
For C and C++, this activates optimizations based on the type of expressions. In particular,
an object of one type is assumed never to reside at the same address as an object of a
different type, unless the types are almost the same. For example, an unsigned int can
alias an int, but not a void* or a double. A character type may alias any other type.
Note: The type attribute may_alias is also available so that accesses to objects with types with
this attribute are not subject to type-based alias analysis. Instead they are assumed to be
able to alias any other type of object.
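A minimal sketch of the may_alias attribute is shown below, using the GCC attribute syntax. The type and function names are hypothetical, and the code assumes that float is a 32-bit type on the target. Because the store goes through a type that carries the attribute, it is not subject to type-based alias analysis.

```c
#include <stdint.h>

/* A 32-bit integer type that is exempt from type-based alias analysis. */
typedef uint32_t __attribute__((__may_alias__)) aliasing_u32;

/* Overwrite the bit pattern of *value with bits.
 * Assumption: float occupies 32 bits on the target. */
void set_float_bits(float *value, uint32_t bits)
{
    aliasing_u32 *p = (aliasing_u32 *)value;

    *p = bits;
}
```

With an IEEE 754 single-precision float, writing 0x40000000 in this way stores the value 2.0f.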
The -fno-strict-aliasing option can be used to disable the default option if required.
Particular attention is required before reporting a compiler issue related to aliasing,
specifically when code runs correctly with the -fno-strict-aliasing option, but
diverges when the default aliasing option is used. This is often caused by a violation of
aliasing rules, which are part of the ISO C/C++ standard. These rules say that a program is
invalid if you try to access a variable through a pointer of an incompatible type.
The example shown in Figure 10 demonstrates this violation, where a float is accessed
through a pointer to integer.

Figure 10. Aliasing example, using a cast


#include <stdio.h>
int main(int argc, char *argv[])
{
    float a = 0.0f;
    int *pa = (int *)&a;
    *pa = 0x40000000; /* violation of aliasing rules */
    if (a != 0.0f)
        puts("LEGACY BEHAVIOR");
    else
        puts("STRICT ALIASING BEHAVIOR");
    return 0;
}

The aliasing rules were designed to allow compilers to perform more aggressive
optimization. Basically, a compiler can assume that all changes to variables happen through
pointers or references to variables of a type compatible with the accessed variable.
Dereferencing a pointer that violates the aliasing rules results in undefined behavior.
In the case above, the compiler may assume that no access through an integer pointer can
change the float a. Therefore, the actual value of a may be unaffected by the write through
pa. What really happens is up to the compiler and may change with architecture and
optimization level.
To disable optimizations based on alias analysis for ‘faulty legacy code’, the option
-fno-strict-aliasing must be used as a work-around.
Note: Because the practice of reading from a union member other than the one most
recently written to (called “type-punning”) is common, type-punning is allowed even with
-fstrict-aliasing, provided the memory is accessed through the union type.
To fix the code in Figure 10 above, you can use a union instead of a cast, as shown in
Figure 11.
Note: This is a GCC extension which might not work with other compilers.

Figure 11. Aliasing example, using a union


#include <stdio.h>
/*
   According to GNU documentation, this code should work in
   both strict and non-strict aliasing rules
*/
int main(int argc, char *argv[])
{
    union {
        float f;
        int i;
    } u;
    u.f = 0.0f;
    u.i = 0x40000000; /* is 2.0f */
    if (u.f != 2.0f)
        puts("NON-GNU BEHAVIOR");
    else
        puts("GNU ALIASING BEHAVIOR");
    return 0;
}

Now the result is always GNU ALIASING BEHAVIOR.

Finally, to fully respect the ANSI C/C++ aliasing rules, it is necessary to write the data
through a character type before reading it again. See Figure 12. The drawbacks of this
standard-conforming solution are that it has to account for endianness and that it is less
efficient than simply writing through an integer.

Figure 12. Aliasing example, writing through a character type


#include <stdio.h>
/*
   According to ANSI standard, this code should work in
   both strict and non-strict aliasing rules
*/
/* Extract byte number pos (0 = least significant) from val. */
#define EXTRACTBYTE(val, pos) (((val) >> ((pos) * 8)) & 0xff)
int main(int argc, char *argv[])
{
    union
    {
        float f;
        unsigned char c[4]; /* unsigned, so that byte values above 0x7f are stored unchanged */
    } u;
    const unsigned int twoasint = 0x40000000;
    u.f = 0.0f;
#if defined(__BIG_ENDIAN__)
    /* Most significant byte is stored first. */
    u.c[0] = EXTRACTBYTE(twoasint, 3);
    u.c[1] = EXTRACTBYTE(twoasint, 2);
    u.c[2] = EXTRACTBYTE(twoasint, 1);
    u.c[3] = EXTRACTBYTE(twoasint, 0);
#elif defined(__LITTLE_ENDIAN__)
    /* Least significant byte is stored first. */
    u.c[0] = EXTRACTBYTE(twoasint, 0);
    u.c[1] = EXTRACTBYTE(twoasint, 1);
    u.c[2] = EXTRACTBYTE(twoasint, 2);
    u.c[3] = EXTRACTBYTE(twoasint, 3);
#else
#error "Unknown endianness: please define either __BIG_ENDIAN__ or __LITTLE_ENDIAN__"
#endif
    if (u.f != 2.0f)
        puts("UNEXPECTED BEHAVIOR");
    else
        puts("ANSI ALIASING BEHAVIOR");
    return 0;
}

In this case, the program always prints “ANSI ALIASING BEHAVIOR” regardless of the
compiler and its optimization options.
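A commonly used alternative, which is not described in this manual, is to copy the bytes with memcpy. The sketch below is given under the assumptions that float and uint32_t have the same size and the same byte order on the target; the function name is hypothetical. Since memcpy copies the object representation byte by byte, it does not access the float through an incompatible type.

```c
#include <stdint.h>
#include <string.h>

/* Build a float from a 32-bit pattern by copying its bytes.
 * Assumptions: sizeof(float) == sizeof(uint32_t), and both types use the
 * same byte order, so no manual byte swapping is done here. */
float float_from_bits(uint32_t bits)
{
    float f;

    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Whether the compiler turns the memcpy call into a plain register move depends on the compiler and optimization level, so the efficiency of this form should be checked on the target.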


4.5 Profiling
Before optimizing any application, we recommend that you analyze the critical areas of your
code to identify where optimization will have the most effect.
Profiling creates an instrumented program from your source code. Whenever this
instrumented code is executed, the program generates an information file that can be
displayed using the stxp70-gprof utility, supplied with the toolset.

Warning: The functions in the toolset libraries (in particular, the standard C library)
are not instrumented for intrusive profiling. Therefore, the time and cycles spent in
library functions are assigned to the calling functions in the application.

This section is not a complete guide to profiling, but a brief refresher on how to proceed with
the compiler.

4.5.1 Profiling data generation


Profiling is enabled by the -pg compiler option. For example:
stxp70cc -O2 -pg *.c -o myexe

4.5.2 Using profiling data


The first run of a program compiled using the -pg option generates a file called
gmon.out.000. This file can be viewed with the stxp70-gprof utility.
After each run in the same directory, the numerical suffix of gmon.out.000 is incremented.
The profile information for the next run is therefore gmon.out.001, and so on.
Note that a second file named stprof.out.xxx is also created. This file provides timing
measurements related to the call tree. The following data are available:
• basic_time: only the time spent within a function
• callcost_time: time spent in the function and its children
• count: number of function calls
The symbolic information available in the profile information can be augmented by using the
-g option when compiling the source code.
Users who are familiar with the standard gprof tool may use gprof to read the profiling
output file. In this case, it is necessary to pass the option --graph to the tool:
gprof --graph myexe gmon.out.000

4.5.3 Special case of programs that never exit


Usually, profiling data are generated at program exit. Many embedded applications,
however, are built as infinite loops and thus never exit. To enable profiling of such
applications, the toolset provides a dedicated function named UserProfilingWrite().
When this function is called, it updates the following profiling output files:
• call-graph file gmon.out.xxx
• time profiling file stprof.out.xxx
In those file names, xxx stands for a magic number that is incremented each time this
profiling function is called. It is only possible to use the function UserProfilingWrite()
if the correct toolset header file gprof.h is included in the source code:
#include <gprof.h>

Warning: We recommend that you use UserProfilingWrite() outside critical or very
often executed loops. It should be called only a few times in a program. Be aware
that a call to this function may have side effects on compiler optimizations, and may
therefore bias results if placed in critical parts of the code.

The profiling functions make use of the 64-bit cycle counters of the STxP70 core, and
the value of the counter is read each time a function is entered and exited.
Therefore, using those counters must be avoided when profiling is enabled. The
predefined profiling macro __LIBGPROF_CYCLE_PROFILING (which is automatically
defined when the -pg option is set) can be used to protect user-defined
instrumentation code based on cycle counters.

This small code sample below illustrates how to use this macro to avoid conflict between
profiling and user instrumentation involving cycle counters:
#ifndef __LIBGPROF_CYCLE_PROFILING
clrcc();
startcc();
#endif
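The same macro can also guard the call to UserProfilingWrite() itself, so that one source file builds with and without -pg. The sketch below rests on several assumptions: that gprof.h declares UserProfilingWrite() as callable with no arguments, as written in the text above, and that __LIBGPROF_CYCLE_PROFILING is defined only when -pg is used. PROFILE_FLUSH, process_frame and run_service_loop are hypothetical names. The loop is bounded here so that the sketch terminates; a real application would loop forever.

```c
/* Assumption: gprof.h declares UserProfilingWrite(), callable with no
 * arguments, and __LIBGPROF_CYCLE_PROFILING is defined only under -pg. */
#ifdef __LIBGPROF_CYCLE_PROFILING
#include <gprof.h>
#define PROFILE_FLUSH() UserProfilingWrite()
#else
#define PROFILE_FLUSH() ((void)0)   /* profiling disabled: no-op */
#endif

/* Hypothetical unit of work done on each pass of the main loop. */
static unsigned long frames_done;

static void process_frame(void)
{
    frames_done++;
}

/* Run passes iterations of the service loop and request a profiling
 * flush once every flush_period passes, not on every pass.
 * Returns the number of flushes requested. */
unsigned run_service_loop(unsigned passes, unsigned flush_period)
{
    unsigned pass;
    unsigned flushes = 0;

    for (pass = 1; pass <= passes; pass++) {
        process_frame();
        if (flush_period != 0 && pass % flush_period == 0) {
            PROFILE_FLUSH();
            flushes++;
        }
    }
    return flushes;
}
```

Choosing a large flush_period keeps the number of calls low, in line with the recommendation in the warning above.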

4.5.4 Amount of heap required for profiling


Usually, program instrumentation dedicated to profiling does not require any more heap
bytes than specified in the standard link script. However, in some specific applications – in
particular when involving a large number of routines – the standard heap size may be too
small. If this happens, the following message can appear at application run-time:
ERROR : profiling : cannot malloc profiling stack of XXX bytes: please increase heap!

To overcome this problem, edit the link script file associated with your application and
increase the padding of the .heap section. By default, the .heap section contribution line is:
.heap ALIGN(16) PAD(64K) NOINIT : { } > EXTSM
This means that the base of the .heap section is aligned on an address that is a multiple
of 16, that the section is 64 Kbytes in size and that it is not zero-initialized at startup.
Moreover, this section is located in the EXTSM memory region. To increase the padding of
this contribution, replace 64K with a larger value, depending on the XXX amount reported
in the error message above.
Please note that if you do not specify a link script on your link command-line, the
sx_valid.ld file used by default is the one located in the folder:
<Toolset_Root>/arch_v3/stxp70cc/<stxp70cc_version>/lib/ldscript
Copy this file into your application project, modify its content according to statements above
and add it to your link command.

4.6 Code coverage


The toolset provides several options to generate test coverage data that can be used with
the GNU gcov test coverage program. Both the -ftest-coverage and -fprofile-arcs
options produce data files that can then be input to gcov. See the Using the GNU
Compiler Collection (GCC) manual provided with this product for a description of how to
apply code coverage techniques.

Table 26. Standalone inlining options


Option Description

-fbranch-probabilities Re-compile a program that has already been compiled with the
-fprofile-arcs option. The -fbranch-probabilities option
instructs the compiler to optimize using estimated branch
probabilities generated by -fprofile-arcs.
-fcoverage-counter64 Instruct the compiler to use a 64-bit edge counter instead of the
default 32-bit counter. Each counter is saved as 64 bits and so the
output can still be used with any gcov utility. Use this option if you
think a statement is executed more than 2^32 times.

-fprofile-arcs Instrument the "arcs" of the program flow during compilation. For
each function of your program, stxp70cc creates a program flow
graph, then finds a spanning tree for the graph. Only arcs that are not
on the spanning tree have to be instrumented; the compiler adds
code to count the number of times that these arcs are executed.
-fprofile-arcs also makes it possible to estimate branch
probabilities, and to calculate basic block execution counts. In
general, basic block execution counts alone do not give enough
information to estimate all branch probabilities.
When the program exits, -fprofile-arcs saves a list of arcs in
the program flow graph to a file called sourcename.gcda. gcov
can reconstruct the program flow graph and compute all basic block
and arc execution counts from the information in this file.
Use the compiler option -fbranch-probabilities when
recompiling to apply further optimizations.
-ftest-coverage Create a data file for the GNU gcov code coverage utility. The name
of the data file begins with the name of your source file:
sourcename.gcno. It contains a mapping from basic blocks to line
numbers, which gcov uses to associate basic block execution counts
with line numbers.
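The reason for the -fcoverage-counter64 option listed in Table 26 can be illustrated with plain unsigned arithmetic. The two functions below are hypothetical models of incrementing an edge counter of each width; they are not the compiler's instrumentation code.

```c
#include <stdint.h>

/* Model of a 32-bit edge counter: wraps to 0 after 2^32 - 1. */
uint32_t bump32(uint32_t counter)
{
    return counter + 1u;
}

/* Model of a 64-bit edge counter: keeps counting past 2^32. */
uint64_t bump64(uint64_t counter)
{
    return counter + 1u;
}
```

Once a 32-bit counter has wrapped, the execution count reported for that arc is too small by a multiple of 2^32, which is the situation the 64-bit counter avoids.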

When recompiling, you must use the same code generation and optimization options for
both compilations. The only difference allowed is to replace -fprofile-arcs with
-fbranch-probabilities.
When running interprocedural analysis, all the sources are merged into a unique file (or
several files for large programs). Therefore, the compiler is unable to know which procedure
belongs to which .c or .cxx file, and a one-to-one correspondence between a .c or .cxx file
and a .gcno or .gcda file is no longer possible. The name of the .gcda and .gcno files is the
name of the final executable, plus “_”, plus the number of the .s file that IPA has created.
Since all the original .c or .cxx filenames are saved in the .gcno file, gcov is able to
associate each procedure with a source file.
Note: You will need a copy of gcov with a version number higher than or equal to 3.4.4.

4.7 Call trace


This section describes the use of the options -finstrument-functions and
-minstrument-calls.

4.7.1 Instrumenting functions: -finstrument-functions


The -finstrument-functions option provides standard GCC functionality. Using this
option generates instrumentation calls for entry and exit to functions.
Just after function entry and just before function exit, the following profiling functions are
called with the address of the current function and its call site:
void __cyg_profile_func_enter (void *this_fn, void *call_site);
void __cyg_profile_func_exit (void *this_fn, void *call_site);
The first argument is the address of the start of the current function. This may be looked up
specifically in the symbol table.
The second argument is the address of the call site from where the current function was
invoked. It corresponds to an address in the range of the caller function addresses that may
be found in the symbol table of the executable.
Functions that are inlined by the compiler are not instrumented. To force instrumentation of
all functions, use the -fno-inline option to disable inlining.
A function may be given the attribute no_instrument_function, in which case
instrumentation is not done for this function. This can be used, for example, for the profiling
functions listed above, high-priority interrupt routines, and any functions from which the
profiling functions cannot safely be called (perhaps signal handlers, if the profiling routines
generate output or allocate memory).
The program must be linked with an object file that implements the two functions above to
link correctly.
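A minimal sketch of such an object file is shown below. It records events in a small fixed-size buffer instead of printing them, so that the hooks do not call library routines. The buffer, its size and the trace_record helper are hypothetical; only the two hook prototypes and the no_instrument_function attribute come from the description above.

```c
#include <stddef.h>

#define TRACE_CAPACITY 64   /* arbitrary size chosen for this sketch */

struct trace_event {
    void *fn;         /* start address of the instrumented function */
    void *call_site;  /* address the function was called from */
    int   is_exit;    /* 0 = function entry, 1 = function exit */
};

static struct trace_event trace_log[TRACE_CAPACITY];
static size_t trace_count;

/* The hooks and their helper must not be instrumented themselves,
 * otherwise each hook would trigger further hook calls. */
void __cyg_profile_func_enter(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));
void __cyg_profile_func_exit(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));
static void trace_record(void *fn, void *call_site, int is_exit)
    __attribute__((no_instrument_function));

/* Append one event; events are dropped once the buffer is full. */
static void trace_record(void *fn, void *call_site, int is_exit)
{
    if (trace_count < TRACE_CAPACITY) {
        trace_log[trace_count].fn = fn;
        trace_log[trace_count].call_site = call_site;
        trace_log[trace_count].is_exit = is_exit;
        trace_count++;
    }
}

void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    trace_record(this_fn, call_site, 0);
}

void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    trace_record(this_fn, call_site, 1);
}
```

The recorded addresses can then be resolved against the symbol table of the executable in a post-processing step, as described above.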

4.7.2 Instrumenting calls to functions: -minstrument-calls


Note: The option -minstrument-calls is not a standard GCC option.
Use this option to generate instrumentation calls just before, and just after each function
call.

The following profiling function is called with the address of the caller function and the
address of the callee function:
void __profile_cal(void *caller_fn,
void *callee_fn,
const char *caller_name,
const char *callee_name,
int event);
The arguments to this function are as follows:
caller_fn This is the address of the start of the current function (the caller
function), which can be looked up specifically in the symbol table.
callee_fn This is the address of the start of the called function (the callee
function), which can be looked up specifically in the symbol table.
caller_name This is the name of the caller function.
callee_name This is the name of the callee function, or NULL if the call is an
indirect call.
The function names passed in the third and fourth arguments are
pointers to static strings that have the lifetime of the instrumented
executable or shared object.
The function names are the mangled names in C++.
event This is 0 when this function is invoked just before a call,
instrumenting a function entry. It is 1 when this function is invoked
just after a call, instrumenting a function exit.

Function calls that are inlined by the compiler are not instrumented.
To force instrumentation of all functions, use the -fno-inline option to disable inlining.
A call is not instrumented if either the caller or the callee function has the
no_instrument_function attribute.
The program must be linked with an object file that implements the function above to link
correctly.
The main differences from the -finstrument-functions option are listed below.
• This instrumentation tracks (caller, callee) address pairs instead of (call_site, callee)
address pairs. If the call site information is required, use the
-finstrument-functions option.
• This instrumentation provides the caller and callee names when available, which avoids
a specific post-processing pass to retrieve the function names.
• This instrumentation is at the call site and not in the callee. Therefore calls to
top-level library functions (which are not instrumented) are seen, whereas the option
-finstrument-functions does not see them. To disable the instrumentation of the
call to a particular library routine, you must declare it with the
no_instrument_function attribute.
• This instrumentation is not standard GCC functionality.

4.8 Interprocedural analysis optimization (IPA)


The -ipa option enables interprocedural analysis. With this option enabled, the compiler
identifies opportunities for optimization across module boundaries. It does this by extending
its scope for optimization and inlining from a single module to multiple modules.

Warning: The -ipa option, in addition to the required optimization level, must be
included in both the compiler and linker phases.

The major benefits of IPA are:


• interprocedural constant propagation
• interprocedural alias analysis
• inter-module inlining
• interprocedural placement of data in specific memory spaces. On the STxP70, the
possible spaces are DA and SDA. These can be controlled manually using options and
attributes already described (see also Table 6: Generic options with -M flag on page 18
and the section entitled memory on page 102). The command line options that control
manual memory placement (such as -Mda and -Msda) are ignored when automatic
placement is enabled.
A more advanced use of IPA is function specialization (also known as cloning).

4.8.1 Using IPA


The only mandatory option to trigger IPA compilation is -ipa.
The compilation and link time is longer because much of the optimization work is driven
from the linker. This can be observed by using the -v compiler option.
The following steps are performed when building an executable in IPA mode:
• the .c files are translated into special .o files
• the .o files are merged together (code, symbol table)
• the .o files are analyzed and optimized
• the final link is performed
Because IPA mode optimizations are carried out by the linker as well as the compiler, the
optimization is carried out only if the appropriate command line options are passed to both
the linker and the compiler. It may, therefore, be necessary to modify the Makefile
accordingly.

4.8.2 IPA command line options


Table 27 describes advanced IPA options.

Table 27. Advanced IPA options


Option Description

-dryipa The -dryipa option replaces the -dryrun option, which is no
longer relevant for IPA. The -dryipa option dumps details of
the different steps invoked by the driver.
-IPA:aggr_cprop=ON|OFF Enable or disable aggressive inter-procedural constant
propagation. This option attempts to avoid passing constant
parameters, replacing formal parameters by their
corresponding constant values. The default is ON.
-IPA:cgi=ON|OFF Enable or disable constant global variable identification. This
option marks non-scalar global variables that are never
modified as constants, and propagates their constant values to
all files. The default is ON.
-IPA:cprop=ON|OFF Enable or disable inter-procedural constant propagation. This
option identifies formal parameters which always have a
specific constant value. The default is ON. See also
-IPA:aggr_cprop.
-IPA:depth=n This option is identical to -IPA:maxdepth=n
-IPA:dfe=ON|OFF Enable or disable dead function elimination. This option
removes subprograms which are never called from the
program. The default is ON.
-IPA:dve=ON|OFF Enable or disable dead variable elimination. This option
removes variables which are never referenced from the
program. The default is ON.
-IPA:forcedepth=n Set inline depths. Instead of the default inlining heuristics, this
option directs IPA to attempt to inline all functions at a depth of
(at most) n in the call graph, where functions which make no
call are at depth 0, those which call only depth 0 function are at
depth 1, and so on. This ignores the default heuristic limits on
inlining.
-IPA:inline=ON|OFF Perform inter-file subprogram inlining during main IPA
processing. The default is ON.
-IPA:keeplight=ON|OFF Direct IPA not to send -keep to the compiler, in order to save
disk space. The default is ON. Setting it to OFF leaves
intermediate files in a directory which has the name of the final
executable but suffixed with .ipakeep.
-IPA:maxdepth=n Direct IPA not to attempt to inline functions at a depth of more
than n in the call graph, where functions which make no call are
at depth 0, those which call only depth 0 functions are at depth
1, and so on. Inlining remains subject to overriding limits on
code expansion. See also forcedepth, space and plimit.

-IPA:mem_placement=ON|OFF Enable or disable automatic placement of variables into the
special SDA and DA memory spaces. This STxP70 specific
optimization results in a more efficient address construction in
the use of GP-based instructions. Default is ON when
optimization level is O2 or higher (O2, O3, O4 and Os), OFF
otherwise. Command line options that control the manual
memory placement are ignored when automatic memory
placement is enabled.
-IPA:mem_array=ON|OFF Enable or disable automatic placement of array variables into
special memory spaces.
-IPA:mem_struct=ON|OFF Enable or disable automatic placement of structure variables
into special memory spaces.
-IPA::SDAspace=n Set the size of the SDA memory space to n bytes (the default is
4096).
-IPA::DAspace=n Set the size of the DA memory space to n bytes (the default is
32768).
-IPA:multi_clone=n Specify the maximum number of clones that can be created
from a single procedure. By default, this value is 0. Aggressive
procedure cloning may provide opportunities for inter-
procedural optimization, but it also may significantly increase
the code size.
-IPA:node_bloat=n When used in conjunction with -IPA:multi_clone, this
option specifies the maximum percentage growth (n) of the total
number of procedures relative to the original program.
-IPA:plimit=n Stop inlining in a particular subprogram when it reaches a size
of n bytes in the intermediate representation. The default is
2500.
-IPA:space=n Stop inlining when the program size has increased by n%. For
example, space=20 limits code expansion due to inlining to
approximately 20%. The default is 100%.
-IPA:specfile=filename Open filename to read more options. A spec file contains
zero or more IPA options.

4.8.3 Limitations and special cautions


IPA and debug options
IPA optimization is not compatible with the -g compiler option. If both options are passed to
stxp70cc, then the -ipa option is automatically disabled by the driver, and debugging
information is generated.

IPA and compilation stages
The full benefit of IPA optimization is obtained only if both the compilation and the link
stages receive the -ipa option and the optimization level on the command line. This is
particularly important when existing makefiles have separate stages and flags for the
compilation and link stages.

IPA memory placement versus options and attributes


Manual placement of variables in the special memory spaces by means of attributes takes
precedence over automatic placement. Automatic placement in turn takes precedence over
the command line options that control manual memory placement. For instance:
• the automatic placement does not operate on a variable if an attribute instructs the
compiler to place it manually in a specific memory space
• if the memory spaces are already filled with variables placed manually as a
consequence of attributes, the automatic placement has no effect
• if manual memory placement and automatic memory placement options are passed to
the compiler, then the options that control the manual memory placement are ignored

4.9 Floating-point code generation


This section describes the stxp70cc command line options for controlling floating-point.

4.9.1 Precision of floating-point arithmetic in programs


The IEEE754 standard defines two types of floating-point representation:
• The "single precision" is a 32-bit representation. It corresponds to the float data type in
C.
• The "double precision" is a 64-bit representation. It corresponds to the double data type
in C.
By default, a C compiler considers that floating-point calculations must be performed with
double precision, unless explicitly specified by the programmer. Furthermore, if any 32-bit
floating-point data is encountered in a floating-point calculation, it is promoted to 64-bit
precision. This aims at ensuring that the maximum precision is preserved.

4.9.2 Controlling the precision of floating-point


Syntax
In a program which must only use 32-bit floating-point arithmetic, a programmer should:
• declare all floating-point variables as 32-bit variables, that is "float"
• use only 32-bit floating-point constants, that is, use the "F" suffix (for example, "5.3F"
is interpreted as a 32-bit constant, whereas "5.3" is considered as a 64-bit constant).
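As a minimal sketch of the two rules above, sizeof can be used to observe the type in which a product is evaluated. The function names are illustrative and are not part of the toolset. When the code is compiled without any option that changes the size of double, the first function is expected to return the size of a double and the second the size of a float:

```c
#include <stddef.h>

/* Size in bytes of the type in which "a * 5.3" is evaluated.
   The unsuffixed constant is a double, so the product is a double. */
size_t size_with_double_constant(float a)
{
    return sizeof(a * 5.3);
}

/* Size in bytes of the type in which "a * 5.3F" is evaluated.
   Both operands are float, so the product stays a float. */
size_t size_with_float_constant(float a)
{
    return sizeof(a * 5.3F);
}
```

Comparing the two results is a quick way to find expressions that are silently promoted because of a missing F suffix.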

Limitation and options


Controlling floating-point precision by syntax alone can cause problems:
• many programmers are not aware that floating-point constants without the F suffix are
interpreted as 64-bit constants
• if the precision of a whole program needs to be modified, then all types and constants
may have to be changed, which may be tedious
The option -fshort-double changes the default behavior of the compiler, so that
floating-point arithmetic is carried out in 32-bit arithmetic, even if "double" types or
constants without the F suffix are used.

The option -mlib-short-double is to be used when specific libraries are provided to
support short double code generation. On the STxP70, this option is deprecated, since it is
forced to fit the default code generation setting. It is preserved mainly for legacy reasons.

4.9.3 Use of STxP70 with FPx


On any core without specific floating-point support, performing floating-point calculations in
32-bit or 64-bit arithmetic mainly results in calling different runtimes, or in different
expansion of floating-point operations. This has a limited impact on performance.
On cores with 32-bit floating-point support, the problem is different. A program with 64-bit
floating-point arithmetic cannot use the floating-point support of the core, which means that
it will call the runtime instead. This is the case for the STxP70 with the FPx floating-point
extension.
In other words, the FPx can be used efficiently only when floating-point arithmetic is 32-bit.
This is why it is highly recommended to use the option -fshort-double when the FPx is
used, because it ensures that all floating-point computations are performed using 32-bit
precision.
From the STxP70 toolset 4.1.0 onwards, a warning is emitted if the FPx is used without this
option.
On the STxP70, -mlib-short-double is deprecated and no longer has any effect. It is still
recognized for legacy reasons.

4.9.4 Examples of floating-point arithmetic on the STxP70


Example 1: effect
Consider the following function:
float fct (float A)
{
return A * 5.3;
}
If this code is compiled with the options -O3 -Mextension=fpx, then the compiler
generates the following code:
.global fct
fct:
pushrl LK ;;
subu R15, R15, 4 ;;
.LEH_post_adjust_sp_fct_1:
callr __stod ;;
L_BB2_fct:
make R2, 13107 ;;
more R2, 13107 ;;
make R3, 16405 ;;
more R3, 13107 ;;
callr __muld ;;
L_BB3_fct:
addu R15, R15, 4 ;;
poprl LK ;;
jr __dtos ;;

Because the default compiler behavior is 64-bit floating-point, the constant is considered
64-bit, and the whole calculation is promoted to 64-bit. As a consequence, the multiplication
is performed by the 64-bit runtime, as shown by the calls to __stod, __muld and __dtos in the
listing. The FPx cannot be used although it was specified on the command line.

Example 2: adding -fshort-double


Adding the option -fshort-double to the command line modifies the default behavior of
the compiler and the floating-point calculations are all performed in 32-bit. The resulting
code now makes use of the FPx:
.global fct
fct:
L_BB1_fct:
make R0, 16553 ;;
more R0, 39322 ;;
fmvr2f F1, R0 ;;
fmul F0, F0, F1 ;;
rts ;;

Example 3: specifying 32-bit floating-point using only syntax


Alternatively, the same result could be reached by modifying the source code as follows,
and compiling without the -fshort-double option:
float fct_float (float A)
{
return A * 5.3F;
}
Note: When -fshort-double is used, "double" data types are interpreted as 32-bit
floating-point. This means that the following function, compiled with
-O3 -Mextension=fpx -fshort-double, leads to the same result as in Example 2: adding
-fshort-double, and thus effectively makes use of the FPx:
double fct (double A)
{
return A * 5.3;
}

4.10 Application configuration files


This section introduces and describes application configuration files (ACF), which facilitate
the fine tuning of compiler options in files and functions.

4.10.1 General description and purpose


Open64-based compilers do not allow fine-grained option settings. This means that, except for
pragmas and attributes (such as inlining) that are already implemented, compiler options
apply to all functions in a file, and to all files on a command line.
When IPA is not enabled, this limitation can be partly worked around:
• by using different command lines to generate object files
• by splitting code into different files if particular functions must be compiled with different
options
Note: On STxP70-v4, some optimizations are performed at linker or post-link level. Those
optimizations can depend on compilation options. Applying different options at the compiler
level and at the linker or post-link level must be done with caution.
In any case, when IPA is enabled, this workaround cannot be applied. This may be
problematic for the debugging and fine tuning of large applications. Nor is the workaround
easy to implement in the context of the STWorkbench.
The application configuration files (ACF) have thus been implemented to apply specific
compiler options to the different files and functions. The full set of options to be applied to
the files and functions of the same application is called a configuration. An application
configuration file can define several configurations, corresponding to different tuning
scenarios. Those configurations can then be selected by a dedicated compiler option.

Principles and overview of the implementation


The implementation of application configuration files takes place directly at compiler driver
level. It allows fine-grained control of options at global level, file level and function level.
An application configuration file contains structured information to be attached to the
corresponding functions or files.
It is read by the compiler if specified by a dedicated option. Then it is parsed by the driver,
which applies the options at the requested level.

4.10.2 Description and syntax of an ACF


An ACF reproduces part of, or the whole of the application it is designed for, by listing files
and functions names in a configuration. It can contain several configurations, and only one
will be active during a compilation phase.
Figure 13 shows an example of an application configuration file.

Figure 13. Example application configuration file


configuration "c1" {            // Starts the definition of a configuration called c1
  -Os                           // Option defined for all the application
  file "f1" {                   // Configuration specification for file f1
    -O3                         // In file f1, use speed optimization level
    function "foo" {            // Configuration specification for function foo
      -O2
      -CG:if_conv=false         // In function foo, disable if-conversion
    }
  }
}
configuration "c2" {            // Other configuration
  -O3
}
active configuration "c1"

In the example in Figure 13, notice the definition of two possible configurations "c1" and
"c2".
• If configuration "c1" is applied, then all files are compiled with the -Os option, except
file "f1", which is compiled with the option -O3. Furthermore, function "foo" in file "f1"
is compiled with the option -O2, and if-conversion is disabled.
• If configuration "c2" is applied, then all files are compiled with option -O3, without any
exception.
• By default, configuration "c1" is applied as the active configuration. The configuration
"c2" can be activated by a dedicated compiler option (see Section 4.10.4: Using the
ACF on page 85).

Listing files or functions


It is possible to use a list of files or functions in a configuration, if several files (or functions)
have to be compiled with the same set of options. The wildcard character asterisk "*" can be
used in the names of files (or functions) to match several names with a single pattern. For
example, an ACF could contain a section such as the one shown in Figure 14.

Figure 14. Listing files and functions


file "f*" {//Configuration specification for all files with a name starting with 'f'
-Os // In those files, use size optimization level
function "foo1" "foo2" "foo3" {// Configuration specification for function
// foo1, foo2 and foo3
-O3
}
}

In this case, all files whose name is prefixed by an "f" are compiled with the option -Os.
Functions "foo1", "foo2", "foo3" are compiled with the option -O3.
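The effect of the "*" wildcard can be sketched in C as follows. This is only an illustration of the matching rule described above, in which "*" stands for any sequence of characters. The function name is hypothetical, and the matcher actually used by the driver is not described in this manual and may behave differently in corner cases:

```c
/* Returns 1 if name matches pattern, where '*' in the pattern stands for
   any sequence of characters (possibly empty); returns 0 otherwise. */
int acf_name_matches(const char *pattern, const char *name)
{
    if (*pattern == '\0')
        return *name == '\0';
    if (*pattern == '*') {
        /* Let '*' absorb 0, 1, 2, ... characters of name in turn. */
        for (;;) {
            if (acf_name_matches(pattern + 1, name))
                return 1;
            if (*name == '\0')
                return 0;
            name++;
        }
    }
    /* Ordinary character: it must be identical in pattern and name. */
    return (*name == *pattern) && acf_name_matches(pattern + 1, name + 1);
}
```

With this sketch, the pattern "f*" matches "f1" and "foo" but not "g1", which mirrors the selection made by the file "f*" entry of Figure 14.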

4.10.3 ACF grammar description

configuration_file   ::= configuration_file configuration |
                         configuration_file active_configuration

active_configuration ::= active configuration string

configuration        ::= configuration string { one_configuration } |
                         configuration string { }

one_configuration    ::= one_configuration file_conf |
                         one_configuration global_option |
                         file_conf |
                         global_option

global_option        ::= options

options              ::= <list of compiler options>

file_conf            ::= file files_name { one_file_conf } |
                         file files_name { }

files_name           ::= files_name string |
                         string |
                         <nothing>

one_file_conf        ::= one_file_conf func_conf |
                         one_file_conf file_option |
                         func_conf |
                         file_option

file_option          ::= options

func_conf            ::= function files_name { one_func_conf }

one_func_conf        ::= one_func_conf option |
                         <nothing>

string               ::= " <characters> "

4.10.4 Using the ACF


Compiling with an ACF
The option -macf-decl can be used to instruct the compiler to read and use an ACF:
stxp70cc -macf-decl my_acf.acf
The driver then parses the given file and applies defined options at the requested level,
provided that a default configuration is defined in the file.
Note: Options defined in a configuration file take precedence over options defined on the
command line (or in an STWorkbench session).
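The precedence between levels can be modeled by a small C function. This is a sketch under stated assumptions, not the driver implementation: it considers a single kind of option (for example the optimization level), it represents "not set at this level" by NULL, and the function name is hypothetical. It follows the rule illustrated by Figure 13 (function level over file level over global level) and the note above (ACF over command line):

```c
#include <stddef.h>

/* Returns the option that applies to a function: the most specific level
   that defines one wins. A NULL argument means "not set at this level". */
const char *effective_option(const char *command_line,
                             const char *acf_global,
                             const char *acf_file,
                             const char *acf_function)
{
    if (acf_function != NULL)
        return acf_function;
    if (acf_file != NULL)
        return acf_file;
    if (acf_global != NULL)
        return acf_global;
    return command_line;
}
```

For instance, effective_option("-O3", NULL, NULL, "-Os") yields "-Os". The inliner is a known exception to this simple model, as described in Section 4.10.6.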

Specifying the active configuration


The active configuration can be specified by two different means:
• Using the dedicated keyword in the ACF:
active configuration "string"
For example: active configuration "c1"
• Using the compiler option:
-macf-active string
For example:
stxp70cc -macf-decl my_acf.acf -macf-active c1
Note: The -macf-active option takes precedence over the active configuration keyword
in the ACF.
Warnings are emitted if no active configuration can actually be selected and applied.
In this case the ACF is ignored.

Creation of the ACF template


Even though the syntax is quite simple, writing the ACF for a large application can be
tedious. It is therefore possible to automatically create a template of the ACF for a
given application by using the dedicated option -macf-template.
For example, the following command creates the template of the ACF needed to compile an
application implemented in four source files; the template is always created under the fixed
name template.acf:
stxp70cc -macf-template file1.c file2.c file3.c main.c
This file lists all files and functions present in the application in a single configuration, with
no specific option. It also defines this configuration as the default one and names it "c1".
The file template.acf is created locally, in the compilation directory. If a file with this
name already exists in this folder, the new content is appended to it.
The template file remains incomplete until the link stage is run, so that subsequent
compilation steps can append to it. Only when the link stage is run is the template
closed, after which it cannot be appended to. The mechanism for appending to and closing
the template file is described further in Section 4.10.5 on page 86.

Summary
There are three ways to handle an ACF, demonstrated by the following examples:
• stxp70cc -macf-decl acf_filename.acf
Reads acf_filename.acf as an ACF, using the default configuration declared in the
file as the active configuration.
• stxp70cc -macf-decl acf_filename.acf -macf-active c1
Reads acf_filename.acf as an ACF file, and uses the command line option to
define the active configuration as c1. Configuration "c1" must be defined in the ACF
acf_filename.acf.
• stxp70cc -macf-template source_file1.c source_file2.c
source_file3.c source_main.c
Generates the ACF template for the application implemented by the specified source
files. The source files must be linkable, and the compilation must include a link stage to
ensure that the template is complete. For example:
stxp70cc -macf-template source_file1.o source_file2.o
source_file3.o source_main.o

4.10.5 Behavior of -macf-template option


The use of the -macf-template option is introduced in Creation of the ACF template on
page 85.
Note: The configuration defined and considered as the default in the template file is always named
"c1".
The behavior of the -macf-template option depends on whether a template file already
exists and also on whether it is considered complete and closed.
If the template.acf file is generated by one or more compilations without a link stage, the
template file remains incomplete (and unusable) until the link stage is run.

Case 1: template.acf does not exist


1. The following command is issued to create a file template.acf:
stxp70cc -c -macf-template foo1.c
– this template contains the definition of a configuration "c1" for file foo1.c and all
the functions it contains
– the closing bracket for "c1" is missing, and the default configuration is not declared
2. The following command is now used to create a template for the file foo2.c:
stxp70cc -c -macf-template foo2.c
– the file template.acf created by the command in step 1 already exists
– this new command appends the information related to file foo2.c and all the functions
it contains to the configuration "c1" of the pre-existing file template.acf
– the closing bracket for "c1" and the declaration of the default configuration are still
missing

3. Finally, the following command is used to link the application, which closes the template:
stxp70cc -macf-template foo1.o foo2.o
– this last command only invokes the link stage. The file template.acf is closed,
with "c1" declared as the default configuration
Steps 1 to 3 above generate the same file template.acf as the equivalent single
command:
stxp70cc -macf-template foo1.c foo2.c

Case 2: template.acf exists and is closed


If the creation of a template is run with an existing, complete and closed template.acf file
in the current folder, the new content is appended after the closing lines of the existing file.
The syntax of the resulting file is then invalid, and the parser rejects the resulting
configuration file with an error message.

Makefiles
Compilation through a makefile performs independent calls to the compiler to generate object
files before linking. In this context, the generation of an ACF template requires an
incremental behavior. The template generation mechanism tests whether the template file
template.acf exists in the compilation directory. If it exists, it is opened in append mode.
Otherwise, it is created. At the linker or archive creation stage, the following actions are
performed:
• the template file is closed from a syntactical point of view (the last '}' and the
active configuration line are written)
• the buffer and the file are closed from a file system point of view
If the compilation does not end with a linker or archive creation stage (only the -S or
-c option is used), then the buffer is flushed and the file is closed from a file system
point of view, but not from a syntactical point of view. Since the file does not end with
the expected pattern, the corresponding template is not usable.

4.10.6 Scope and known limitations


Compiler options
Most stxp70cc compiler options, both external and internal, can be used in the ACF.
Nevertheless, it would not make any sense to apply some of the options to only a subset of
the files or functions. This is especially true for the compiler options which describe the
hardware configuration.
The following options are not taken into account at file or function level:
• -Mconfig options: These options describe the hardware setup used to run the binary
file to be generated by the compiler. Since this hardware is the same for all the parts of
the code, those options should be the same in all files and functions. They are taken
into account if they are defined at the global level of an ACF. They are ignored if they
are defined only for some files or functions.
• -Mextension options: These options describe which extensions are available on the
hardware, and can be used to generate the code. Like the -Mconfig options, they are
accepted at global level, but discarded at file or function level.
• -Mmode16 or -Mmode32: This option does not describe the hardware configuration,
but rather the registers to be used during code generation. This option is accepted at
global and file levels, but not at function level. The reason is technical and relates to
ABI handling (register saving at entry and exit of functions), which must be
consistent over the whole application.

Inliner
The inliner operates on a full compilation unit and therefore takes into consideration the
optimization level specified at global or file level, but not at function level. As a result,
when using ACFs, different assembly code can be obtained for a given function depending on
the scenario used, even though the function is apparently compiled at the same optimization
level in both cases.
For instance, consider the file f1.c:
int foo1() { return 1; }
int foo2() { return 2; }
int foo3() { return foo1() + foo2(); }

1. First scenario
With this first scenario, the file is compiled by the following command line, based on a
global -Os option:
stxp70cc -Os -c f1.c
Here foo3() is compiled using -Os.
Assembly code for foo3() contains calls to foo1() and foo2(), which are not
inlined because of -Os.

2. Second scenario
An ACF acf1.acf is defined with the following directives:
file "f1" {
function "foo3" { -Os }
}
Code is compiled using this ACF:
stxp70cc -O3 -c -macf-decl acf1.acf f1.c
Here foo3() is compiled using -Os.
Assembly code for foo3() does not contain calls to foo1() and foo2(), which are
inlined because of -O3, which is visible to the inliner.
3. Third scenario
Code is compiled with option -O3:
stxp70cc -O3 -c f1.c
Here foo3() is compiled using -O3.
Assembly code for foo3() does not contain calls to foo1() and foo2().

4. Fourth scenario
An ACF acf1.acf is defined with the following directives:
file "f1" {
function "foo3" { -O3 }
}
Code is compiled using this ACF:
stxp70cc -Os -c -macf-decl acf1.acf f1.c
Here foo3() is compiled using -O3.
Assembly code for foo3() contains calls to foo1() and foo2(), which are not
inlined because -Os is visible to the inliner.
Intuitively, the user might expect to have the same code for scenarios 1 and 2, as well as for
scenarios 3 and 4, but this will not be the case because of the implementation of inlining.

5 GNU C extensions supported by stxp70cc

GNU cc provides a large set of extensions that are widely used in the GNU/Linux
community. These extensions can be used to:
• describe embedded features, for example, data section placement
• provide guidance to the compiler for optimization, for example, the noreturn function
• provide language extensions, for example, conditional lvalue or C99 features
The GNU extensions are sometimes the only way to access ELF features that are not
directly available in the C language; for example, to declare a symbol as weak.

5.1 Extensions to the C language family


stxp70cc provides several language features not found in ANSI standard C. (The
-pedantic option directs stxp70cc to print a warning message if any of these features are
used.) To test for the availability of these features in conditional compilation, check for the
predefined macro __GNUC__, which is always defined under stxp70cc.
It is recommended to always guard code containing stxp70cc extensions with the C
preprocessor macro __GNUC__.
#if __GNUC__
/* Original GNU code */
#else
/* Work-around code */
#endif
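As a sketch of this recommendation, the following code selects a GNU variant or a portable work-around for the same macro. The names MIN_INT, min_int_fallback and smallest_of_three are illustrative only; the GNU branch uses the statement expression extension described in Section 5.1.1:

```c
#if defined(__GNUC__)
/* GNU path: a statement expression evaluates each argument once. */
#define MIN_INT(a, b) \
    ({ int _min_a = (a); int _min_b = (b); \
       _min_a < _min_b ? _min_a : _min_b; })
#else
/* Work-around path: an ordinary function gives the same guarantee. */
static int min_int_fallback(int a, int b)
{
    return a < b ? a : b;
}
#define MIN_INT(a, b) min_int_fallback((a), (b))
#endif

/* Returns the smallest of three values using whichever variant was selected. */
int smallest_of_three(int a, int b, int c)
{
    int m = MIN_INT(a, b);
    return MIN_INT(m, c);
}
```

Callers only see MIN_INT, so the code still builds with a compiler that does not define __GNUC__.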

5.1.1 Statements and declarations in expressions


Statements and declarations in expressions allow complicated C statements to be written
and used as if they were a simple C expression, optionally returning a result value. Local
declarations and labels may be embedded.
This provides a way to construct a safe preprocessor macro that comprises several
statements, without using the do { } while(0) trick that swallows the semicolon.
#define cfoo() \
( { int y = foo (); int z; \
if (y > 0) z = y; \
else z = - y; \
z; })
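The following sketch shows two properties of such a macro: it can be used wherever an expression is expected, and its argument is evaluated exactly once. The names ABS_ONCE, next_sample and sample_calls are hypothetical and serve only the demonstration:

```c
/* Counts how many times next_sample() has been called. */
static int sample_calls = 0;

/* Hypothetical data source: returns -3, then -2, then -1, and so on. */
static int next_sample(void)
{
    return -3 + sample_calls++;
}

/* Statement expression: evaluates its argument exactly once and yields
   the absolute value as the value of the whole construct. */
#define ABS_ONCE(expr) \
    ({ int _abs_once_v = (expr); \
       if (_abs_once_v < 0) _abs_once_v = -_abs_once_v; \
       _abs_once_v; })

/* The macro is used like any other expression, here inside a sum. */
int sum_of_two_samples(void)
{
    return ABS_ONCE(next_sample()) + ABS_ONCE(next_sample());
}
```

A classic macro written as ((x) < 0 ? -(x) : (x)) would call next_sample() twice per use; here sample_calls is 2 after one call to sum_of_two_samples().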

5.1.2 Locally declared labels


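Used in conjunction with statement expressions and macros, locally declared labels
(introduced by __label__) provide service labels, that is, labels whose scope is limited to
the block in which they are declared. A macro that contains such a label can therefore be
expanded several times in the same function. See Figure 15.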

Figure 15. Locally declared labels example


#define SEARCH(array, max, target) \
({ \
__label__ found; \
typeof (target) _SEARCH_target = (target); \
typeof (*(array)) *_SEARCH_array = (array); \
int i, j; \
int value; \
for (i = 0; i < max; i++) \
for (j = 0; j < max; j++) \
if (_SEARCH_array[i][j] == _SEARCH_target) \
{ value = i; goto found; } \
value = -1; \
found: \
value; \
})
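The benefit of a local label appears when a macro is expanded more than once in the same function. The sketch below uses a simplified one-dimensional variant of the search macro; FIND_INDEX, samples and sum_of_indexes are illustrative names, not part of the toolset:

```c
/* Yields the index of the first element of arr[0..n-1] equal to target,
   or -1 if there is none. The label "done" is local to each expansion. */
#define FIND_INDEX(arr, n, target)                          \
    ({                                                      \
        __label__ done;                                     \
        int _fi_i;                                          \
        int _fi_result = -1;                                \
        for (_fi_i = 0; _fi_i < (n); _fi_i++)               \
            if ((arr)[_fi_i] == (target)) {                 \
                _fi_result = _fi_i;                         \
                goto done;                                  \
            }                                               \
    done: ;                                                 \
        _fi_result;                                         \
    })

/* Hypothetical sample data used by the demonstration below. */
static const int samples[5] = { 4, 8, 15, 16, 23 };

/* Expands FIND_INDEX twice in the same function; without __label__ the
   second expansion would redefine the label "done". */
int sum_of_indexes(int x, int y)
{
    int ix = FIND_INDEX(samples, 5, x);
    int iy = FIND_INDEX(samples, 5, y);
    return ix + iy;
}
```

Removing the __label__ declaration from this sketch is expected to produce a duplicate label error in sum_of_indexes().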

5.1.3 Labels as values


The address of a label defined in the current function, or a containing function, can be
obtained with the extended && unary operator that has type void*. See Figure 16.

Figure 16. Labels as values example


const char * cgoto(int i)
{
void *ptr = &&foo;
static void *array[] = { &&foo, &&bar, &&hack };
goto *array[i];
foo:
return "foo" ;
bar:
return "bar" ;
hack:
return "hack" ;
}
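A common use of label addresses is a dispatch table for a small interpreter. The sketch below assumes a hypothetical three-instruction accumulator machine; the opcodes and function names are invented for the illustration:

```c
/* Opcodes of a hypothetical three-instruction accumulator machine. */
enum { OP_INC = 0, OP_DOUBLE = 1, OP_HALT = 2 };

/* Runs the program and returns the final accumulator value.
   The program must end with OP_HALT. */
int run_program(const unsigned char *program)
{
    /* Table of label addresses, indexed by opcode. */
    static void *const dispatch[] = { &&do_inc, &&do_double, &&do_halt };
    int acc = 0;

    goto *dispatch[*program++];

do_inc:
    acc += 1;
    goto *dispatch[*program++];
do_double:
    acc *= 2;
    goto *dispatch[*program++];
do_halt:
    return acc;
}

/* Runs a fixed demonstration program: inc, inc, double, inc, halt. */
int run_demo(void)
{
    static const unsigned char demo[] = {
        OP_INC, OP_INC, OP_DOUBLE, OP_INC, OP_HALT
    };
    return run_program(demo);
}
```

Each handler jumps directly to the next one through the table, so no central switch statement is needed. The opcodes are not range checked in this sketch.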

5.1.4 Naming an expression's type


A name can be given to the type of an expression using a typedef declaration with an
initializer. To define name as a type name for the type of expression, do:
typedef name = expression;

This can be used in conjunction with the statements-within-expressions feature described in
Section 5.1.1. For example, to define a safe “maximum” macro that operates on any
arithmetic type:
#define max(a,b) \
({typedef _ta = (a), _tb = (b); \
_ta _a = (a); _tb _b = (b); \
_a > _b ? _a : _b; })
The reason for using names that start with underscores for the local variables is to avoid
conflicts with variable names that occur within the expressions that are substituted for a and
b.
Note: In the future the GNU C language may include a new form of declaration syntax that allows
the declaration of variables whose scopes start only after their initializers; this will be a more
reliable way to prevent such conflicts.

5.1.5 Referring to a type with typeof


typeof allows you to refer to an object data type by referring to an object of that type. It is
particularly useful to write generic and safe macro-definitions, which can then be applied to
various primitive types or user-defined data types. Without this extension, it is necessary to
define as many specific macros as the number of different types used in calls to the generic
macro.
#define max(a,b) ({ \
typeof (a) _a = (a); \
typeof (b) _b = (b); \
_a > _b? _a: _b; \
})
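The following sketch applies one such macro to two different types and to an argument with a side effect. It uses the alternate spelling __typeof__, which GNU C accepts alongside typeof; it is assumed here that stxp70cc accepts it as well. MAX_OF and the function names are illustrative:

```c
/* Generic maximum: each argument is evaluated once, in its own type. */
#define MAX_OF(a, b)                        \
    ({ __typeof__(a) _max_a = (a);          \
       __typeof__(b) _max_b = (b);          \
       _max_a > _max_b ? _max_a : _max_b; })

/* The same macro serves int and double without a per-type definition. */
int larger_int(int x, int y)
{
    return MAX_OF(x, y);
}

double larger_double(double x, double y)
{
    return MAX_OF(x, y);
}

/* The argument i++ is evaluated once, so i is incremented only once. */
int max_with_side_effect(void)
{
    int i = 5;
    int m = MAX_OF(i++, 3);   /* m is 5, i becomes 6 */
    return m * 10 + i;        /* 5 * 10 + 6 */
}
```

The operand of __typeof__ is not evaluated here, which is why i++ takes effect only in the initializer.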

5.1.6 Generalized Lvalues


Compound expressions, conditional expressions and casts are allowed as lvalues
provided their operands are lvalues. For example:
(a, b) += 5;

5.1.7 Conditionals with omitted operands


The middle operand in a conditional expression may be omitted, for example:
z = x? : y;
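In GNU C, x ? : y has the value of x if x is non-zero and the value of y otherwise, with x evaluated only once. A minimal sketch, in which the function and variable names are hypothetical:

```c
/* Counts the calls to lookup_timeout(). */
static int lookups = 0;

/* Hypothetical lookup with a side effect; returns the configured value. */
static int lookup_timeout(int configured)
{
    lookups++;
    return configured;
}

/* Returns the configured timeout if it is non-zero, otherwise fallback.
   The first operand is evaluated once, so one lookup is made per call. */
int timeout_or_default(int configured, int fallback)
{
    return lookup_timeout(configured) ? : fallback;
}
```

Writing lookup_timeout(configured) ? lookup_timeout(configured) : fallback instead would perform the lookup twice whenever the value is non-zero.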

5.1.8 Double-word integers


The long long type (64-bit integer) is supported by the stxp70cc compiler. It is now also
an ISO C99 feature.
long long x;
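A typical use is to keep the full result of a multiplication that would overflow a 32-bit integer. The sketch below assumes that int is 32 bits wide; the function name is illustrative:

```c
/* Multiplies two int values in 64-bit arithmetic. The cast applies to the
   first operand, so the multiplication itself is performed as long long. */
long long multiply_wide(int a, int b)
{
    return (long long) a * b;
}
```

Without the cast, the product would be computed in int and could overflow before being converted to long long.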

5.1.9 Hexadecimal floats


Floating-point numbers can be written in hexadecimal format:
float f = 0x1.fp3;
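In this notation the digits before the p form a hexadecimal significand, and the number after the p is a decimal power of two. The sketch below works through the constant used above and one more value; the function names are illustrative:

```c
/* 0x1.fp3 = (1 + 15/16) * 2^3 = 1.9375 * 8 = 15.5 */
float hex_float_example(void)
{
    return 0x1.fp3;
}

/* 0x1p-1 = 1 * 2^-1 = 0.5 */
float hex_half(void)
{
    return 0x1p-1;
}
```

Both values are exactly representable in single precision, so no rounding is involved.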

5.1.10 Specifying a register for a local variable


A register in either the core or an extension may be specified for a local variable, for
example:
// R6 core register allocated to the myvar long variable
register long myvar asm ("r6") = name;
// The part number 1 of 128-bit width in the register 2
// of the register class D of the user defined extension MP2x
// is allocated to the variable myvarext
register MP2x_DP myvarext asm ("D2_P1");
The syntax for extension register specification is described in detail in Syntax of
scalar/SIMD audio extension register lists on page 110.
Note: An extension multi-level register must always be specified using the smallest subpart
syntax. It is however possible to allocate a top-level register. In this case, the specified
sub-register must be the first one of the group composing the full register. For instance:
// declare a variable at level P allocated to D2_P1
register MP2x_DP var64 asm ("D2_P1");
// declare a variable at level X allocated to D1_P0 and D1_P1
register MP2x_DX var128 asm ("D1_P0");

5.1.11 Array of length zero


Zero length arrays are allowed in GNU C. They are very useful as the last element of a
structure which is really a header for a variable length object. See Figure 17.

Figure 17. Zero length array example


#include <stdio.h>
#include <stdlib.h>
struct line {
int length;
char contents[0];
};
struct line *newline( unsigned int this_length)
{
struct line *thisline = (struct line *)
malloc (sizeof (struct line) + this_length);
thisline->length = this_length;
return thisline ;
}
void delline(struct line *thisline)
{
free(thisline) ;
}
int main(int argc, char *argv[])
{
enum { __MAXL = 128 } ;
enum { __L = 16 } ;
struct line *lines[__MAXL] ;
int i ;
printf("sizeof(line) : %d\n", (int) sizeof(struct line)) ;
for(i=0; i< __MAXL; i++) {
lines[i] = newline(__L) ;
}
for(i=0; i< __MAXL; i++) {
delline(lines[i]) ;
}
puts("Done.") ;
return 0 ;
}

5.1.12 Array of variable length


An array of variable length is an automatic array defined with a length that is not a constant
expression. This type of array is also known as a VLA. See Figure 18.

Figure 18. Variable length array example


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void sadcat(char *s1, char *s2)
{
char str[strlen (s1) + strlen (s2) + 1];
strcpy (str, s1);
strcat (str, s2);
printf("%s + %s == %s\n", s1, s2, str) ;
printf ("sizeof(str) = %d\n", (int) sizeof(str));
}
void tester (int len, char buffer[len][len]) {
int i=0, j=0;
char tt[len][len];
for (i=0; i<len; i++)
for (j=0; j<len; j++)
buffer [i][j] = i*j;
printf ("sizeof(tt) = %d\n", (int) sizeof(tt));
printf ("sizeof(buffer) = %d\n", (int) sizeof(buffer));
}
char data[10][10];
int main(int argc, char *argv[])
{
sadcat("Foo", "Bar") ;
tester (4, data);
tester (10, data);
return 0 ;
}

5.1.13 Macro with variable number of arguments


This extension enables a macro to be defined that can safely be expanded into a function
with a variable number of arguments. These macros are also called CPP vararg macros.
For example, the following C program:
#define eprintf(format, args...) fprintf (stderr, format, ##args)
eprintf ("success!\n");
eprintf ("%s%d: ", input_file_name, line_number);
is expanded to:
fprintf ((&__iob[2]), "success!\n");
fprintf ((&__iob[2]), "%s%d: ", input_file_name, line_number);

Note: GNU C supports two types of “variable number of arguments” syntax. The ISO C99 format,
which uses __VA_ARGS__ and the GNU format that uses ##args. The ISO C99 format
does not support the case where the number of parameters passed as part of the ellipsis is
zero. GNU C reuses the ## trick to absorb the comma in this case. See Figure 19.

Figure 19. Variable number of arguments example


#include <stdio.h>
#define gnu_eprintf(format, args...) \
fprintf (stdout, "gnu_eprintf " format, ## args)
#define isoc99_eprintf(format, ...) \
fprintf (stdout, "isoc99_eprintf " format, __VA_ARGS__)
#define extended_isoc99_eprintf(format, ...) \
fprintf (stdout, "extended_isoc99_eprintf " format, \
## __VA_ARGS__)
#define errprintf(args...) \
gnu_eprintf ("errprintf " "%s\n", ## args)

int main(int argc, char *argv[]) {


/* Try 1, 2, 3 arguments */
gnu_eprintf ("One argument: %s. Done.\n", __FILE__);
gnu_eprintf ("Two arguments: %s:%d. Done.\n", __FILE__, \
__LINE__);

isoc99_eprintf ("One argument: %s. Done.\n", __FILE__);


isoc99_eprintf ("Two arguments: %s:%d. Done.\n", __FILE__, \
__LINE__);

extended_isoc99_eprintf ("One argument: %s. Done.\n", __FILE__);


extended_isoc99_eprintf ("Two arguments: %s:%d. Done.\n", \
__FILE__, __LINE__);
extended_isoc99_eprintf ("Three arguments: %s:%s:%d. Done.\n", \
__FUNCTION__, __FILE__, __LINE__);

/* The case with no arguments ... */


gnu_eprintf ("No arguments. Done.\n");
/* The line below causes a syntax error */
isoc99_eprintf ("No arguments. Done.\n");
extended_isoc99_eprintf ("No arguments. Done.\n");

/* Cascade of macros with variable number of arguments */


errprintf (__FILE__);
return 0 ;
}

5.1.14 String literals with embedded newlines


GNU cpp permits string literals to cross multiple lines without escaping the embedded
newlines. Each embedded newline is replaced with a single newline character in the
resulting string literal, regardless of what form the newline took originally.
The macro definition:
#define MESSAGE \
"Hello,
good brave new World!
"
would be written under ISO:
#define MESSAGE \
"Hello,\n" \
"good brave new World!\n"

5.1.15 Non-Lvalue arrays may have subscripts


In ISO C99, arrays that are not lvalues still decay to pointers, and may be subscripted.
However, they may not be modified or used after the next sequence point and the unary
operator “&” may not be applied to them. See Figure 20.

Figure 20. Non-lvalue arrays example


struct foo {int a[4];};
struct foo f() {
static const struct foo f = { 2, 4, 8, 16 };
return f ;
}
void bar (void)
{
int i;
for (i=0; i<4; i++)
printf ("f().a[%d] == %d\n", i, f().a[i]) ;
}
int main(int argc, char *argv[])
{
bar ();
f().a[0] = 15;
bar ();
return 0 ;
}

5.1.16 Arithmetic on void and function pointers


In GNU C, addition and subtraction are supported on pointers to void and on pointers to
functions. This is done by treating the size of a void or of a function as 1. A consequence
is that sizeof is also allowed on void and on function types, and returns 1. See Figure 21.

Figure 21. Arithmetic on void and function pointers example


#include <stdio.h>
void f0(void) {}
void *p = 0;
void (*pf)(void) = 0;
void bar (void) {
p++;
pf++;
printf ("sizeof(void) = %d\n", (int) sizeof(void));
printf ("sizeof(func) = %d\n", (int) sizeof(f0));
}
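A practical use of this extension is to compute byte offsets into a raw buffer without casting to char * first. The sketch below relies on the GNU C rule that arithmetic on void * is done in units of 1 byte; the function names are illustrative:

```c
/* Returns the address located offset bytes after base.
   Relies on GNU C treating the size of void as 1. */
void *byte_offset(void *base, unsigned int offset)
{
    return base + offset;
}

/* Returns 1 if byte_offset() advances by exactly 3 bytes. */
int offset_is_in_bytes(void)
{
    char buffer[8];
    return byte_offset(buffer, 3) == (void *) &buffer[3];
}
```

In ISO C the same computation requires an explicit cast, for example (char *) base + offset.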

5.1.17 Non-constant initializers


As in standard C++ and ISO C99, the elements of an aggregate initializer for an automatic
variable are not required to be constant expressions. For example:
int foo (int f, int g)
{
int beat_freqs[2] = { f-g, f+g };
return beat_freqs[0] * beat_freqs[1] ;
}

5.1.18 Compound literals


Compound literals used to be called “Constructor Expressions” before ISO C99 standardized
them under the term “Compound Literals”. A compound literal looks like a cast containing an
initializer. See Figure 22.

Figure 22. Compound literal example


#include <stdio.h>
#include <stdlib.h>
struct foo {int a; char b[2];} ;
struct foo * givefoo(int x, int y, char a, char b) {
struct foo * sfoo = (struct foo *) malloc(sizeof (struct foo));
/* Fill in the anonymous struct at once with a Compound Literal */
*sfoo = (struct foo) {x + y, a, b};
return sfoo;
}

GNU C allows initialization of objects with static storage duration by compound literals,
whereas ISO C99 does not.

5.1.19 Designated initializers


This extension was called “GNU Style Labeled Elements in Initializers”. It is now an ISO C99
feature. It allows the initialization of particular elements of an aggregate, a structure or an
array, by specifying the member name or the indices of the elements to initialize, in any
order. See Figure 23.

Figure 23. Designated initializers example


const int widths[] = { [0 ... 9] = 1, [10 ... 99] = 2, [100] = 3 };
int a[6] = { [4] 29, [2] = 15 } ;
enum { v1 = 1, v2 = 2 , v4 = 4 } ;
int b[6] = { [1] = v1, v2, [4] = v4 } ;
struct point { int x, y; };
struct point makep(int xvalue, int yvalue )
{
struct point p = { y: yvalue, x: xvalue };
return p ;
}
struct point makepp(int xvalue, int yvalue )
{
struct point p = { .y = yvalue, .x = xvalue };
return p ;
}

With GNU C the = character can be omitted after the [index] indication.
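
To make the effect of these initializers concrete, the sketch below spells out which elements end up with which values. A range designator [first ... last] (a GNU extension) sets every element of the range, a value without a designator continues at the position following the previous one, and elements that are not mentioned are zero. The names digits_table, digits and sparse are illustrative.

```c
/* Number of decimal digits for values 0..100, built with range designators. */
static const unsigned char digits_table[101] = {
  [0 ... 9] = 1, [10 ... 99] = 2, [100] = 3
};

int digits (int n)          /* valid for 0 <= n <= 100 */
{
  return digits_table[n];
}

/* [1] = 1 is followed by a positional value, which lands in element 2.
   The resulting array is { 0, 1, 2, 0, 4, 0 }. */
const int sparse[6] = { [1] = 1, 2, [4] = 4 };
```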

5.1.20 Case ranges


Case ranges may be specified with integer value intervals in switch statements.
const char * which (int v) {
switch (v) {
case 0 ... 31: return "Control";
case 'A' ... 'Z': return "Upper";
case 'a' ... 'z': return "Lower";
default: return "None";
}
}
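
One practical point when the bounds are integer constants: write spaces around the ..., since a form such as 1...5 may be parsed incorrectly. A minimal sketch with a hypothetical grading function:

```c
/* Hypothetical example: map a score to a letter using case ranges.
   Both bounds of each range are inclusive. */
char grade (int score)
{
  switch (score) {
    case 90 ... 100: return 'A';   /* note the spaces around ... */
    case 75 ... 89:  return 'B';
    case 50 ... 74:  return 'C';
    default:         return 'F';
  }
}
```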

5.1.21 Cast to a union type


A cast to union type is similar to other casts, except that the type specified is a union type.
The type is specified either with the union tag or with a typedef name.
union foo { int i; double d; } u, v;
void makefoo (int i, double f) {
u = (union foo) i;
v = (union foo) f;
}
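
A sketch of the effect, assuming the same union foo layout as above: the cast builds a union value whose member of the matching type holds the operand, so it is equivalent to assigning that member by hand. The function names are illustrative.

```c
union foo { int i; double d; };

int int_via_union (int i)
{
  union foo u = (union foo) i;   /* same as: union foo u; u.i = i; */
  return u.i;
}

double double_via_union (double d)
{
  union foo u = (union foo) d;   /* same as: union foo u; u.d = d; */
  return u.d;
}
```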


5.1.22 Dollar signs in identifier names


Dollar signs are allowed in identifier names.
int $a;

5.1.23 Prototypes and old-style function definitions


GNU C extends ISO C to allow a function prototype to override a later old-style non-
prototype definition.
int isroot (uid_t);
int isroot (x) /* ??? lossage here ??? */
uid_t x;
{
return x == 0;
}

5.1.24 C++ comments


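C++ style comments, which start with // and run to the end of the line, are accepted:
// C++ comment
C++ comments are not recognized when the stxp70cc option -ansi is used. This avoids
changing the meaning of valid ISO C90 constructs that contain the character sequence “//”. For example: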
x = a //**/b;

5.1.25 Character ESC in constants


The sequence “\e” is recognized in string or character constants as an ASCII <escape>
character.
char escape = '\e';
char s[] = "\e\e";
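
Since \e is not an ISO C escape sequence, code that must also build with other compilers can use the octal form \033, which denotes the same character (ASCII code 27). A small check, with an illustrative function name:

```c
/* '\e' is the GNU spelling of the ASCII escape character, code 27 (0x1B). */
int is_escape (char c)
{
  return c == '\e';
}
```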

5.1.26 Inquiring on alignment of types or variables


__alignof__ allows enquiries about how an object is aligned, or the minimum alignment
required by a type or variable.
struct foo { int x; char y; } f;
int x = __alignof__ (double);
int b = __alignof__ (f.y);
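
The sketch below wraps the same queries in functions so that the results can be inspected. Actual values depend on the target ABI; the only result that can be relied on everywhere is that a char member has alignment 1. The function names are illustrative.

```c
struct foo { int x; char y; } f;

/* Alignment of a member lvalue: that required by its type (char, so 1). */
unsigned align_of_member (void)
{
  return __alignof__ (f.y);
}

/* Alignment of the whole structure: at least that of its int member. */
unsigned align_of_struct (void)
{
  return __alignof__ (struct foo);
}
```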

Warning: The STxP70 ABI states that the stack is aligned to a 64-bit
boundary. However, for wider extension data types, it is
necessary to increase this value. A dedicated attribute
aligned_stack is defined for this purpose.


5.1.27 Incomplete enum type


An enum type can be defined without specifying its possible values.
typedef enum _e e;
struct _s {
e* p;
} s;
enum _e { red, green, blue, black };
e x;

5.1.28 Function names as strings


GNU cc predefines two magic identifiers to hold the name of the current function. The
identifier __FUNCTION__ holds the name of the function as it appears in the source. The
identifier __PRETTY_FUNCTION__ holds the name of the function printed in a language
specific fashion.
char here[] = "Function " __FUNCTION__ " in file " __FILE__;
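
Note that __FUNCTION__ is only meaningful inside a function body, and that GCC releases from 3.4 onwards treat it as a variable (like the C99 __func__) rather than as a string literal, in which case the concatenation shown above is rejected. Whether a given stxp70cc release behaves this way is not stated here, so the sketch below takes the conservative route and formats the name at run time. The helper name where_am_i is illustrative.

```c
#include <stdio.h>

/* Writes "Function <name> in file <file>" into buf and returns buf.
   __FUNCTION__ is passed as an ordinary string argument, so this works
   whether the compiler treats it as a literal or as a variable. */
const char *where_am_i (char *buf, size_t len)
{
  snprintf (buf, len, "Function %s in file %s", __FUNCTION__, __FILE__);
  return buf;
}
```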

5.2 Attributes
Attributes are generally a much better design than a #pragma directive for several reasons.
First, an attribute specification is ordinary C syntax and can therefore be generated by a
cpp macro definition, whereas generating a #pragma directive from a macro is generally not supported
by non-GNU C preprocessors. Second, an attribute is attached to the declaration it modifies,
which avoids the scoping issues of the #pragma directive.
Several attributes can be applied to the same object by using a comma to separate them.
For example, to declare a symbol that is both weak and aliased:
void useful (void) __attribute__ ((weak, alias("useful_func")));

5.2.1 Placement and layout


section
When applied to a function, places the function in a user-defined section.
void myfunc (void) __attribute__ ((section(".mytext")));
void myfunc (void) {
printf ("From myfunc in .mytext section.\n");
}
When applied to a data object, places the data in a user-defined section.
struct duart a __attribute__ ((section ("DUART_A"))) = { 0 };
Support must be explicitly added in the startup file or system loader to load the newly
created section.


memory
The STxP70 processor provides several special memory spaces that allow less costly
accesses.
• Tiny Data Area (TDA)
Data in the TDA is accessed using a single instruction of the form
baseaddress+offset, where offset is expressed in elements. The TDA is based
at address 0 (which is byte 4 as accessing address 0 is not possible in C). Due to the
way it is accessed, only 32 Kbytes can be placed in the TDA.
• Small Data Area (SDA)
Data in the SDA is accessed using a single instruction of the form
baseaddress+offset, where offset is expressed in elements. An element can be
a byte, 16-bit word, or 32-bit word depending on the type of the data object. An
aggregate of 4,096 elements can be placed in SDA. This can be a mixture of scalars,
arrays, and structures of various sizes and with element sizes of byte, 16-bit word, or
32-bit word, but the aggregate number of elements over all entries cannot exceed
4,096.
• Data Area (DA)
The addresses of data in the DA are built using a single instruction of the form
addugp Ri, offset, where offset is expressed in bytes. An aggregate of 32,768
bytes can be placed in the DA. This can be a mixture of scalars, arrays, and structures
of various sizes and with element sizes of byte, 16-bit word or 32-bit word.
The memory attribute takes one of three values to instruct the compiler to place a variable in these spaces:
int __attribute__ ((memory ("tda"))) x; // x is placed in TDA
int __attribute__ ((memory ("sda"))) y; // y is placed in SDA
int __attribute__ ((memory ("da"))) z; // z is placed in DA
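
Because the SDA limit is counted in elements rather than in bytes, it can help to add up the element counts of the candidate objects before tagging them. The following workstation-side sketch does this for three arrays. All names are hypothetical and not part of the toolset, and the sketch assumes, following the description above, that each array element counts as one element whatever its width.

```c
enum { SDA_MAX_ELEMENTS = 4096 };

/* Element count of an array, independent of the element width. */
#define ELEMS(a) (sizeof (a) / sizeof ((a)[0]))

int   samples[1000];   /* 1000 elements */
short coeffs[2000];    /* 2000 elements */
char  flags[1096];     /* 1096 elements */

/* Total number of elements these three arrays would occupy. */
unsigned sda_elements_used (void)
{
  return ELEMS (samples) + ELEMS (coeffs) + ELEMS (flags);
}
```

Here the total is exactly 4,096 elements, so under that assumption the three arrays would just fit once declared with the memory ("sda") attribute on the target.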

aligned
When applied to a variable or a structure field, specifies a minimum alignment, measured
in bytes. The aligned attribute can only increase the alignment; to decrease it,
specify packed as well. When no value is given, as in the third example below, the
compiler chooses the alignment itself.
int x __attribute__ ((aligned (16))) = 0;
struct _s { int x[2] __attribute__ ((aligned (8))); };
short array [3] __attribute__ ((aligned));
When applied to a type:
typedef int more_aligned_int __attribute__ ((aligned(8)));
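
The effect on layout can be observed with offsetof and __alignof__. The sketch below is illustrative: struct s and the function names are assumptions, and the leading char member is added only to make the inserted padding visible.

```c
#include <stddef.h>

typedef int more_aligned_int __attribute__ ((aligned (8)));

struct s {
  char tag;
  int  x[2] __attribute__ ((aligned (8)));   /* padding is inserted before x */
};

/* Offset of the aligned field inside the structure. */
size_t offset_of_x (void)
{
  return offsetof (struct s, x);
}

/* Alignment of the over-aligned type. */
size_t align_of_type (void)
{
  return __alignof__ (more_aligned_int);
}
```

The field x starts at offset 8 rather than directly after tag, and the size of the structure is rounded up to a multiple of 8.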

Warning: It is also possible to make use of a specific syntax for aligned
data types. This is based on the addition of the _aligned suffix
to the type name. This syntax can be applied to any data type,
but is especially recommended with the SIMD audio extension (see
also Aligned data types on page 118).


aligned_stack
When applied to a function, this attribute specifies that the head of the stack must be aligned
to a given boundary. The value provided as an argument corresponds to the number of
bytes to which the stack must be aligned. The argument must be a power of 2, strictly
greater than 8 and lower than or equal to 256.
For instance the attribute below specifies that the stack of function fct() must be aligned
to a 128-bit boundary:
void fct() __attribute__ ((aligned_stack(16)));
void fct()
{
...
}

Warning: Several means are provided to control the alignment of the
stack. It is recommended to refer to Table 6: Generic options
with -M flag on page 18 for the description of the related
option and precedence rules. Please note that the compiler is
also able to perform self-alignment of the stack on many
occasions, taking the size of local variables into account.

weak
When applied to a function, causes the function to be emitted as a weak symbol. The address
of the symbol evaluates to 0 if the symbol is not defined at link time. This is primarily of use in
defining library functions that can be overridden in user code:
void d_stub (void) __attribute__ ((weak));
if (d_stub) {
d_stub();
}
When applied to data, causes the declaration to be emitted as a weak symbol rather than a
global symbol. This is primarily of use in defining variables that can be overridden in user
code:
int debug __attribute__ ((weak)) = 0;
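
The function fragment above needs an enclosing function body to compile. A minimal sketch, in which d_stub is deliberately never defined and run_stub_if_present is a hypothetical wrapper:

```c
void d_stub (void) __attribute__ ((weak));   /* declared, not defined here */

/* Returns 1 if a definition of d_stub was linked in and called,
   0 if the weak symbol resolved to address 0. */
int run_stub_if_present (void)
{
  if (d_stub) {
    d_stub ();
    return 1;
  }
  return 0;
}
```

Linking in an object file that defines d_stub changes the result to 1 without any change to this code.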

alias
Applies only to functions: Provides an alias name for a given function. It is often used
in conjunction with the weak attribute to define an alternate weak name for a given
function.
void useful_func (void) {
/* ... Do something ... */
}
void useful (void) __attribute__ ((alias("useful_func")));
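
A sketch combining both attributes, with a return value added so that the aliasing can be observed. The value 42 is arbitrary, and the target function must be defined in the same translation unit as the alias.

```c
int useful_func (void)
{
  return 42;
}

/* useful is a second, weak name for the code of useful_func. */
int useful (void) __attribute__ ((weak, alias ("useful_func")));
```

A call to useful() executes the body of useful_func unless user code supplies its own non-weak definition of useful.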


packed
Applies only to data: Specifies that a variable or structure field should have the smallest possible
alignment (one byte for a variable, and one bit for a field) unless a larger value is specified
with the aligned attribute.
The specified data alignment is applied during data layout, and the code generator emits
a safe sequence of instructions to avoid causing a misalignment trap.
struct foo { char a; int x __attribute__ ((packed)); };
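
The layout difference can be checked with offsetof. In the sketch below the structure and function names are illustrative, and the comparison with the unpacked layout assumes that int normally has an alignment greater than 1.

```c
#include <stddef.h>

struct plain    { char a; int x; };
struct packed_x { char a; int x __attribute__ ((packed)); };

/* Offset of x with the default layout (padding after a). */
size_t plain_offset (void)
{
  return offsetof (struct plain, x);
}

/* Offset of x when packed: x immediately follows a. */
size_t packed_offset (void)
{
  return offsetof (struct packed_x, x);
}
```

With the packed field, x sits at offset 1 and the structure occupies one byte plus the size of an int.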

used
The GCC manual specifies that the used attribute may only apply to functions. For
stxp70cc it may also apply to variables.
• The used attribute, attached to a function, means that the code must be emitted for this
function, even if this function appears never to be referenced.
• This attribute, attached to a variable, means that the definition must be emitted for the
variable even if it appears that the variable is not referenced.
The used attribute follows the same syntax as any GCC attribute.
For a procedure:
static int Foo() __attribute__ ((used)) ;
For uninitialized data:
static int foo __attribute__((used)) ;
For initialized data:
static int foo __attribute__((used)) = 2 ;
Note: The assembly has been specifically extended to support this attribute:
.type Foo, @function, used
.type foo, @object, used
A motivation for using this attribute is to avoid the deletion of an unreferenced symbol by the
dead code, dead data or IPA optimization. This can be useful for debugging purposes (for
instance, a function dumping a specific data structure that is only called interactively from
debugging sessions is removed if not marked as ‘used’, since the compiler does not find
any reference to it).

constructor and destructor


Applies only to functions: The constructor attribute causes the function to be called
automatically before execution enters main(). Similarly, the destructor attribute causes
the function to be called automatically after exit().
void initdata (void) __attribute__ ((constructor));
void terminatedata (void) __attribute__ ((destructor));
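
A minimal sketch of a constructor that prepares state before main() runs. The names table_ready and is_table_ready are illustrative.

```c
static int table_ready = 0;

static void initdata (void) __attribute__ ((constructor));

static void initdata (void)
{
  table_ready = 1;     /* runs before main() is entered */
}

/* Reports whether the constructor has already run. */
int is_table_ready (void)
{
  return table_ready;
}
```

By the time any code reachable from main() calls is_table_ready(), the flag has already been set.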


5.2.2 Optimization
This section only applies to functions.

noreturn
Declares that a function cannot return, such as abort or exit. It is a useful
indication to optimizers.
void byebye () __attribute__ ((noreturn));
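
A sketch of how the attribute is typically used. The function checked_div is a hypothetical caller; since byebye is known not to return, the compiler can treat the code after the call as unreachable on that path.

```c
#include <stdlib.h>

void byebye (void) __attribute__ ((noreturn));

void byebye (void)
{
  exit (EXIT_FAILURE);   /* exit itself never returns */
}

int checked_div (int a, int b)
{
  if (b == 0)
    byebye ();           /* control never continues past this call */
  return a / b;
}
```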

malloc
Used to tell the compiler that a function returns a pointer that cannot alias anything. It is a
useful indication to optimizers.
void * get_block (int) __attribute__ ((malloc));

5.2.3 Visibility attributes


The visibility attributes are supported as follows:
__attribute__((__visibility__("visibility-type")))
__attribute__((visibility("visibility-type")))
where visibility-type can be default, hidden, protected, internal.
default Default visibility is the normal case for ELF. This value is available for
the visibility attribute to override other options that may change the
assumed visibility of symbols.
hidden Hidden visibility indicates that the symbol is not placed into the
dynamic symbol table. This means that no other module (executable
or shared library) can reference it directly.
protected Protected visibility indicates that the symbol is placed in the dynamic
symbol table, but that references within the defining module bind to the
local symbol. This means that the symbol cannot be overridden by
another module.
internal Internal visibility is similar to hidden visibility, but has additional
processor-specific semantics. For the STxP70, this means that the
function is never called from another module.

Note: Hidden symbols cannot be referenced directly by other modules but they can be referenced
indirectly by function pointers. By indicating that a symbol cannot be called from outside the
module, the compiler may for instance omit the load of a PIC register since it is known that
the calling function has already defined the correct value.


5.2.4 Miscellaneous attributes


interrupt and interrupt_nostkaln
The interrupt attribute specifies that a function is an interrupt routine. This imposes:
• a save/restore of all registers at entry/exit of the function
• an rte instruction is used to return from the routine (instead of an rts)
• a proper stack alignment at entry/exit of the routine
The interrupt_nostkaln attribute has the same effect, except that it does not perform
any stack realignment.
void __attribute__ ((interrupt)) it_routine_1(...)
{
...
}

format_arg
The format_arg attribute specifies that a function takes a format string for a printf,
scanf, strftime or strfmon style function and modifies it, so that the result can be
passed to a printf, scanf, strftime or strfmon style function.
extern char * my_dgettextprintf (void *my_domaint,
const char *my_format) __attribute__ ((format_arg(2)));

mode
This attribute specifies the data type of the declaration: the type used is whichever type
corresponds to the given mode. Refer to the GNU Compiler Collection Internals document for the
definitions of modes, http://gcc.gnu.org/onlinedocs/gccint.
Use the keywords __byte__, __word__ and __pointer__ to indicate the mode
corresponding to these quantities.
unsigned int qi __attribute__ ((mode (QI)));
unsigned int w __attribute__ ((mode (__word__)));
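
The resulting widths can be checked with sizeof. The sketch below uses the QI mode (a one-byte integer) and the __pointer__ keyword; the width of __word__ is target dependent and is therefore not checked here. The variable and function names are illustrative.

```c
unsigned int qi __attribute__ ((mode (QI)));            /* one byte wide */
unsigned int p  __attribute__ ((mode (__pointer__)));   /* as wide as a pointer */

unsigned size_of_qi (void)
{
  return sizeof (qi);
}

unsigned size_of_p (void)
{
  return sizeof (p);
}
```

Although both variables are written as unsigned int, their sizes follow the mode rather than the declared type.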


5.2.5 Built-ins
A built-in is used in the same way as a function call, but is expanded by the compiler very early
in the intermediate representation instead of generating a function call. On STxP70, most
machine and extension instructions can also be addressed using built-ins. Please refer to
Chapter 7: Built-in functions on page 115 for further information.

__builtin_constant_p
This built-in tests if a value is a constant at compile time.
#include <stdio.h>
int x;
#define C 1
int main () {
if (__builtin_constant_p (C) == 1)
printf ("C is proved to be a constant\n");
if (__builtin_constant_p (x) == 0)
printf ("x is not proved to be a constant\n");
return 0;
}

__builtin_return_address
__builtin_return_address gets the return address of the currently executing function.
void bar () {
printf ("RA = 0x%08x\n", (int)__builtin_return_address (0));
}

__builtin_expect
long __builtin_expect (long exp, long c)
__builtin_expect provides the compiler with branch prediction information.
The return value is the value of exp, which should be an integral expression. The value of c
must be a compile-time constant. The semantics of the built-in are that it is expected that
exp == c.
For example:
if (__builtin_expect (exp, 0))
foo ();
indicates that a call to foo() is not expected as exp should be 0.
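
A common way to use this built-in is through a pair of wrapper macros. The sketch below is illustrative: likely, unlikely and safe_div are not defined by the toolset. The double negation turns any non-zero condition into exactly 1, so that it can be compared with the expected constant.

```c
#define likely(x)   __builtin_expect (!!(x), 1)
#define unlikely(x) __builtin_expect (!!(x), 0)

/* Hypothetical example: the error path is marked as the unexpected one. */
int safe_div (int a, int b, int *quotient)
{
  if (unlikely (b == 0))
    return -1;                 /* rare: division by zero */
  *quotient = a / b;
  return 0;
}
```

The hint affects only code layout and scheduling decisions; the function returns the same results with or without it.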

__builtin_classify_type
__builtin_classify_type(object) ignores the value of the object and considers
only its data type. It returns an enum describing what kind of type object is. See Figure 24.


Figure 24. __builtin_classify_type example


enum type_class __builtin_classify_type(object)
enum type_class
{
no_type_class = -1,
void_type_class, integer_type_class, char_type_class,
enumeral_type_class, boolean_type_class,
pointer_type_class, reference_type_class, offset_type_class,
real_type_class, complex_type_class,
function_type_class, method_type_class,
record_type_class, union_type_class,
array_type_class, string_type_class, set_type_class,
file_type_class, lang_type_class
};
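
Counting from no_type_class = -1 in Figure 24, integer_type_class has the value 1, pointer_type_class the value 5 and real_type_class the value 8. The sketch below returns the classification of three simple expressions; the function names are illustrative.

```c
/* The argument is not evaluated; only its type matters. */
int class_of_int (void)
{
  return __builtin_classify_type (0);
}

int class_of_double (void)
{
  return __builtin_classify_type (0.0);
}

int class_of_pointer (void)
{
  return __builtin_classify_type ((void *) 0);
}
```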


6 GNU ASM

The stxp70cc compiler accepts “extended inline assembly” asm, as part of C programs.
This chapter only summarizes the main features of the asm implementation and describes
its limitations. It is not a substitute for the GNU documentation.

6.1 Syntax
General syntax
asm(template : output operands : input operands : clobber list);
or
__asm__(template : output operands : input operands : clobber list);
Where:
• template is the assembler instruction, defined as a string constant
• output operands is a list of comma separated output operands
• input operands is a list of comma separated input operands
• clobber list is a list of comma separated clobbered operands
The template section contains plain assembler, and uses ordinary STxP70 assembler
syntax, with the notable exception of the %i notation (where i is a non-negative integer), which
refers to operand number i. Operands are numbered from 0, with the output operands first,
followed by the input operands.
Note: Multiple consecutive strings are automatically concatenated to enable a readable and
correct template input. Multiple assembler instructions can be put together in a single asm
template, separated by explicit newline characters ‘\n’.
If there are no output operands but there are input operands, two consecutive colons must
be used in place of the output operands.
In the output and input list:
• each operand is described by an “operand constraint string” followed by a C expression
in parentheses
• the available constraints are the following:
– r general purpose register operand
– b boolean register operand
– i immediate integer operand, including symbolic constants only known at
assembly time
– n immediate integer operand, known at compile time
– g guard register
– fpx_FX FPx register (STxP70-4 only)
– the type attached to a scalar or SIMD audio extension (for instance, MP2x_VP or
MP2x_VX)


• an operand constraint can be prefixed by the following modifiers:


– = write-only operand, used for output operands
– & early clobber operand, does not prevent the use of =
– + operand is used for both input and output
In the clobber list:
• general registers are referred to by ri (where i has the range [0,31]), they map to the
corresponding Ri hardware registers [0,31](c)
• FPx extension registers are referred to by fi (where i has range [0,15]), they map to
the corresponding Fi hardware registers
• guard registers are referred to by gi (where i has the range [0,7]), they map to the
corresponding Gi hardware guard registers
• scalar or SIMD audio extension registers are referenced by a name determined by the
extension and level

Syntax of scalar/SIMD audio extension register lists


The STxP70 core accepts scalar and SIMD audio extensions with multi-level register files.
The syntax has been extended to support such extension registers.
For non-SIMD registers (that is, registers with level “X” only), a register name is constructed
using the following template:
<registerfile_name><register_id>
Where:
• <registerfile_name> is the name of the extension register file
• <register_id> is the number of the register
For example, when considering a register file T with a single level hierarchy, the registers
are referenced as "T0", "T1", "T2" and so forth.
For SIMD register files, register names are constructed according to the following template:
<regfile_name><reg_id_max_level>_<regfile_min_level><reg_subid_min_level>
Where:
• <regfile_name> is the name of the scalar or SIMD audio extension register file
• <reg_id_max_level> is the number of the register at the highest hierarchy level
(level “X”)
• <regfile_min_level> is a letter specifying the smallest level accessible for the
register file:
– "X" for a single level register file
– "P" for a 2-level register file
– "Q" for a 4-level register file
• <reg_subid_min_level> is the offset of the register at the smallest hierarchy level.

c. If the configuration only includes 1 bank (16 registers), then the range is only [0,15]


For example, when considering the register file V of the MP2x extension, with a two level
hierarchy, registers are referenced as "MP2x_V0_P0", "MP2x_V0_P1",
"MP2x_V1_P0", "MP2x_V1_P1" and so forth.
Note: Registers are always specified at the smallest hierarchy level. Therefore, to disable the full
V0 register, both subparts "V0_P0" and "V0_P1" must be specified in the clobber list.
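
The naming template above can be sketched as a small workstation-side formatting routine. The helper simd_clobber_name is hypothetical and not part of the toolset; it only assembles the four fields of the template in order.

```c
#include <stdio.h>

/* Hypothetical helper: builds the clobber-list name of one sub-register
   following the template
   <regfile_name><reg_id_max_level>_<regfile_min_level><reg_subid_min_level>.
   Returns the value of snprintf (number of characters needed). */
int simd_clobber_name (char *buf, size_t len, const char *regfile,
                       unsigned reg_id, char min_level, unsigned sub_id)
{
  return snprintf (buf, len, "%s%u_%c%u", regfile, reg_id, min_level, sub_id);
}
```

For a two-level register file V, calling the helper for register 0 with sub-identifiers 0 and 1 gives the two names that must both appear in the clobber list to cover the whole of V0.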

Register file disambiguation


Due to the limited length of register file names, different register files may have similar
names. To distinguish between the different register files, the register file name can be
prefixed by an optional string, if necessary. The prefix has the following syntax:
%<registerfile_name><registerfile_smallest_level>%
Where:
• <registerfile_name> is the name of the scalar or SIMD audio extension register
file
• <registerfile_smallest_level> is a letter specifying the smallest level
accessible for the register file:
– “X” for a single level register file
– “P” for a 2-level register file
– “Q” for a 4-level register file.

6.2 Assumptions
The following assumptions apply.
• Output operand expressions must be lvalues.
• The compiler assumes that the input is consumed before the outputs are produced,
unless an output operand has the ‘&’ constraint modifier (also called “early clobber”).
The compiler does not assign the same register to an input operand and an early
clobber operand. However, the compiler may assign the same register to an input
operand and to a non-early clobber output operand.

6.3 Volatile
The volatile syntax is either:
asm volatile (template : output operands : input operands : clobber list);

or:
__asm__ volatile (template : output operands : input operands : clobber list);

The volatile keyword indicates that an instruction has side effects. A volatile asm
statement is not deleted if it is reachable. The order of volatile asm statements and/or
other volatile accesses is preserved. A consecutive sequence of volatile asm
statements may not stay perfectly consecutive, since other instructions may be
scheduled in between. To keep instructions perfectly consecutive,
place them in a single asm statement.
An asm statement without any operands or clobbers is treated identically to a volatile
asm statement, as is an asm statement without output operands.


6.4 Restrictions
The following restrictions apply.
• The compiler does not parse the assembler instruction template; this means that it
does not check if it is valid assembler input.
• Up to 10 operands, results and clobbered registers are allowed.
• Multiple alternative constraints are not supported.
• At -O3 and -O4 optimization levels, the loop nest optimizer is disabled for loops
containing asm statements.

6.5 Differences between the STxP70 core versions


The VLIW/VLIS STxP70-4 is designed to be assembly compatible with STxP70-3, except
for a few instructions. This means that assembly statements written for STxP70-3 should
work on STxP70-4. The main exceptions relate to:
• the MAKE and MORE instructions, which should be replaced by a single MAKE32 instruction on
STxP70-4
• the SIMD comparisons, which are no longer supported on STxP70-4
• the “;;” pattern used to separate bundles of instructions. For compatibility
reasons, this pattern is now the required syntax on both STxP70-3 and STxP70-4. Code
without “;;” is still accepted on STxP70-3, but this deprecated syntax is strongly
discouraged.

6.6 GNU ASM optimization


The compiler unrolls loops containing GNU asm statements. The compiler is not aware of
the resource requirements introduced by the opaque asm statement, therefore the unrolling
decision may be less precise compared with other situations.
It is possible to prevent the compiler from unrolling by using either an option or a #pragma.
If the asm statement contains any control-flow, it must be contained completely within the
asm statement.
See Section 3.2.1: #pragma unroll (n) on page 45 for information on #pragma unroll.


6.7 Example
The code example in Figure 25 illustrates a typical use of an asm statement on the STxP70 core.

Figure 25. Example of an asm statement


unsigned int foo(unsigned int * ptr)
{
unsigned int res;
unsigned int count;
unsigned int val;
asm (
" setls L1, L_2 ;;\n\t"
" setlc L1, 8 ;;\n\t"
" setle L1, L_3+-4 ;;\n\t"
" make %0, 0 ;;\n\t"
" make %1, 0 ;;\n\t"
"L_2:\n\t"
" lw %2, @(%3 !+ 4) ;;\n\t"
" cmpneu g0, %2, 0 ;;\n\t"
"g0? bset %0, %0, %1 ;;\n\t"
" add %1, %1, 1 ;;\n\t"
"L_3: \n\t"
: "=&r" (res), "=&r" (count), "=&r" (val), "+r" (ptr)
:
: "g0", "L1"
);
return res;
}

The example in Figure 25 produces the assembly code given in Figure 26.

Figure 26. Example output of an asm statement


.entry
.global foo
foo:
L_BB1_foo:
or R4, R0, 0 ;;
setls L1, L_2 ;;
setlc L1, 8 ;;
setle L1, L_3+-4 ;;
make R1, 0 ;;
make R2, 0 ;;
L_2:
lw R3, @(R4 !+ 4) ;;
cmpneu g0, R3, 0 ;;
g0? bset R1, R1, R2 ;;
add R2, R2, 1 ;;
L_3:
L_BB2_foo:
or R0, R1, 0 ;;
rts


6.8 Parsing and optimization of GNU assembly statements


The STxP70 compiler is capable of parsing, analyzing and optimizing the content of the
GNU assembly statements. The main optimizations it can achieve are those carried out at
the lowest level of the compiler, for example scheduling, removal of useless instructions,
constant propagation.
By default, the compiler does not perform any parsing and optimization of user defined
assembly statements. This parsing and optimization feature can be enabled with the option
-mparse-asmstmts.
Some GNU assembly statements are used internally by the compiler to map extension
instructions from C code. By default, those specific internal assembly statements are parsed
and optimized by the compiler. This parsing and optimization feature can be disabled with
the option -mparse-meta-asmstmts.


7 Built-in functions

The stxp70cc compiler recognizes a number of built-ins. These are used to generate
assembly language statements that cannot otherwise be expressed through standard ANSI
C/C++.
The built-ins are specified and called just like standard ANSI C/C++ functions and
procedures, using standard types. However, they are treated in a special way by the
compiler. The built-ins apply to the STxP70 core instructions, X3 instructions, floating point
FPx extension instructions, as well as scalar and SIMD audio extension (MPx) instructions.
On the core and on the FPx and MPx extensions, built-ins may be needed to make use of instructions
that the compiler cannot capture automatically, or to work around a missing optimization.
For technical reasons the set of core/X3 built-ins does not currently cover the full set of
instructions. For instance, the load/store instructions are not available as built-ins. This also
includes specific load/store instructions such as the lsetub instruction. Instructions that do
not exist as built-ins can still be mapped by using the GNU assembly statements, see
Chapter 6: GNU ASM on page 109.

7.1 Header files and C-models files


Several header and source files are provided to use built-ins for the core and for the X3, FPx
and MPx extensions.
• A header file named builtins_<extension>.h contains the definitions of the built-
ins themselves, as described in Section 7.2: Naming built-ins on page 116.
• A header file and a source file, named builtins_model_<extension>.h and
builtins_model_<extension>.c respectively, contain the declarations
and the definitions of the STxP70 built-ins, modelled as C functions and acting as
executable specifications. The benefit is that the models can be used to develop
specialized algorithms (DSP, video, and so on) on a workstation, and these can then be
immediately and safely ported to the STxP70 core and extensions.
• Finally, a generic header file named <extension>.h facilitates the use of built-ins or
C-models, as explained in Section 7.3: Using built-ins from C on page 120. It includes
the two headers mentioned above, plus the definition of some macros providing a
unified view of built-ins and C-models. Only the generic header file for a given
extension needs to be included in the application source code (see Section 7.3.1 on
page 120).
The <extension> suffix is one of:
• sx for STxP70 core
• x3 for STxP70 X3 extension
• fpx for FPx floating point and integer arithmetic extension
• the alias of the audio scalar or SIMD extension, for instance MP2x
The header and source files mentioned above are delivered with the current compiler
distribution (except for the audio scalar and SIMD extensions).


7.2 Naming built-ins

7.2.1 General naming scheme, relevant files


The STxP70 built-ins make use of a flexible common naming scheme. The names of
built-ins and of the corresponding C-models follow the same pattern, and either one is
invoked (depending upon context) by using a dedicated simplified macro.
• The basic built-ins defined in file builtins_<extension>.h all have names in the
form:
__builtin_<extension>_<mnemonic>[_<operand_type>].
• Similarly, the names of the C-models found in the files
builtins_model_<extension>.[c|h] are:
__cmodel_<extension>_<mnemonic>[_<operand_type>].
• Finally, the generic macro defined in file <extension>.h gives a unified view of built-
ins and C-models. Its simplified name is built as:
<extension>_<mnemonic>[_<operand_type>].
<extension> is the alias of the core or the extension and is one of the following:
• sx for core
• x3 for X3 extension
• fpx for floating point extension
• the alias of a SIMD extension, for instance MP1x
<mnemonic> is the actual mnemonic of the instruction as it appears in the instruction set of
either the core or the extension.
<operand_type> is optional. It appears only in built-ins for the core, and for X3 and FPx
extensions. It is necessary when the given instruction may accept different types of
operands; for instance, either a register or a literal. In such cases, this part of the name
denotes the type of the operand, and may be one of the following:
• this element is absent if the instruction exists with only one type of operand
• r denotes an operand in a general purpose register
• iN denotes a literal operand of size N bits
• g denotes the instruction is guarded (used in X3 built-in names)
Note: The operand types may appear in the name of the built-in in an order that differs from the
order of the corresponding operands in the assembly instruction. For instance, writing the
following built-in:
x3_cancelg_i8_i2_g(0x1, 0x5);
leads to the emission of the following assembly code:
cancelg b1, 5 ;;
The header files are located in the directory
<toolsdir>/stxp70cc/4.1/include/models. This directory is pointed to by default
when the code is compiled using stxp70cc. The <toolsdir> denotes the root folder of the
toolset.
The C-models source files are located in the directory
<toolsdir>/stxp70cc/4.1/src/models.


Example:
The core instruction addbp exists with a second operand that is either a register or a literal.
The corresponding built-ins are named as follows:
• int __builtin_sx_addbp_r(int, unsigned short) for register operand
• int __builtin_sx_addbp_i8(int, unsigned short) for u8 operand
The C-models have similar names:
• int __cmodel_sx_addbp_r(int, unsigned short) for register operand
• int __cmodel_sx_addbp_i8(int, unsigned short) for u8 operand
Finally, the unified macros for these built-ins and C-models are:
• sx_addbp_r when used for a register operand
• sx_addbp_i8 when used for a u8 operand
Note: The presence of the two leading underscores on each name denotes (according to the
ISO/IEC 9899 C Standard) that no such name should be defined by the user. More
specifically:
“All identifiers that begin with an underscore and either an upper case letter or another
underscore are always reserved for any use.”

7.2.2 Types and special built-ins for audio scalar/SIMD extensions


The built-ins for audio scalar or SIMD extensions may require data types that cannot be
mapped to C native types. Vector operations may also be present on those extensions. This
means that the naming scheme is slightly different from the scheme used for the core or on
the other extensions.
The naming convention for data type names reflects this scheme. The naming convention
uses an alias for the MPx that is dedicated to audio applications, which is currently either a
scalar (MP1x) or an SIMD (MP2x) extension.
Note: The instructions for these extensions are not currently mapped automatically by the
compiler. They can only be invoked by using built-ins.

Data types
Scalar and SIMD audio extensions include two register banks at most. Each bank may have
up to three consecutive “levels”, numbered from 0 to 2:
• level 0 corresponds to the full width of the register bank
• level 1 corresponds to the two halves of the register
• level 2 corresponds to the four quarters of the register
Furthermore, the register width is 2^n bits, ranging from 8 bits to 512 bits inclusive.
The names of the data types that can be allocated to such banks take this structure into
account. They are built using the following template:
<extension>_<registerfile_name><register_level>

Where:
• <extension> is the alias of the SIMD extension
• <registerfile_name> is the name of the SIMD extension register file
• <register_level> is a letter denoting the type that can be allocated to this level:
– X stands for the full register width at level 0
– P stands for the sub-parts at level 1 (two halves)
– Q stands for the sub-parts at level 2 (four quarters). It is not instantiated on the
current MPx

Aligned data types


Since the data types of those extensions are likely to be larger than the default alignment of
the stack (64 bits), some variants are also provided which impose a consistent alignment.
Those aligned types have the special suffix _aligned appended to their names.
Example:
The MP2x extension contains a register bank called V with data accesses of 128 bits or
64 bits that supports two vector data types:
• MP2x_VX is a 128-bit data type
• MP2x_VP is a 64-bit data type
• MP2x_VQ is not instantiated
• MP2x_VX_aligned is a 128-bit data type, aligned to a 128-bit boundary
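The effect of an aligned variant can be illustrated on a host machine with standard C11. The following is only a sketch: the type names demo_VX, demo_VP and demo_VX_aligned and the helper demo_is_128bit_aligned are hypothetical stand-ins, not the definitions shipped with the toolset.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side stand-ins, not the toolset's own types. */
typedef struct { uint32_t word[4]; } demo_VX;   /* 128-bit, natural alignment */
typedef struct { uint32_t word[2]; } demo_VP;   /* 64-bit */
typedef struct { alignas(16) uint32_t word[4]; } demo_VX_aligned; /* 128-bit boundary */

/* Returns 1 when the object at p starts on a 16-byte (128-bit) boundary. */
static int demo_is_128bit_aligned(const void *p)
{
    assert(p != NULL);
    return ((uintptr_t)p % 16u) == 0u;
}
```

A local variable of type demo_VX_aligned is placed on a 128-bit boundary whatever the default stack alignment is, whereas demo_VX only receives the alignment of its 32-bit members.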

Special macros
The MP1x and MP2x extensions are all provided with a set of dedicated memory access
and register move instructions. The latter can be invoked using dedicated macros that allow
easy accesses to the register bank of the extension.
Example:
In the lines below, _part_ denotes the subpart of the wider register; it can be given as
either a literal or a variable. _word_i_ denotes a 32-bit word to be assigned
to the subpart i of the corresponding register.
• Make macro builds a constant in extension register:
– MP2x_make_VX(_VX_, _word_3_, _word_2_, _word_1_, _word_0_);
– MP2x_make_VP(_VP_, _word_1_, _word_0_);
– MP2x_make_VQ -> not instantiated
• Compose macro composes register subparts into a wider one:
– MP2x_compose_2xVP(_VX_, _VP_1_, _VP_0_);
– MP2x_compose_4xVQ -> not instantiated
– MP2x_compose_2xVQ -> not instantiated
• Split macro decomposes a register subpart into narrower ones:
– MP2x_split_2xVP(_VX_, _VP_1_, _VP_0_);
– MP2x_split_4xVQ -> not instantiated
– MP2x_split_2xVQ -> not instantiated

• Insert macro inserts a register subpart into a wider one:
– MP2x_insert_VP_into_VX(_VP_, _VX_, _part_);
– MP2x_insert_VQ_into_VX -> not instantiated
– MP2x_insert_VQ_into_VP -> not instantiated
• Extract macro extracts a register subpart from a wider one:
– MP2x_extract_VP_from_VX(_VP_, _VX_, _part_);
– MP2x_extract_VQ_from_VX -> not instantiated
– MP2x_extract_VQ_from_VP -> not instantiated
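The data movement performed by these macros can be modeled on a host with plain C. The sketch below is an assumption-laden illustration: the demo_ names are hypothetical, results are returned by value rather than written to a first argument, and subpart 0 is taken to be the lowest part of the register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host model: word[0] is subpart 0 (the lowest 32 bits). */
typedef struct { uint32_t word[4]; } demo_VX;   /* 128-bit register */
typedef struct { uint32_t word[2]; } demo_VP;   /* 64-bit half */

static demo_VX demo_make_VX(uint32_t w3, uint32_t w2, uint32_t w1, uint32_t w0)
{
    demo_VX r = {{w0, w1, w2, w3}};
    return r;
}

static demo_VP demo_make_VP(uint32_t w1, uint32_t w0)
{
    demo_VP r = {{w0, w1}};
    return r;
}

/* Compose two halves into a full register (p1 is the upper half). */
static demo_VX demo_compose_2xVP(demo_VP p1, demo_VP p0)
{
    return demo_make_VX(p1.word[1], p1.word[0], p0.word[1], p0.word[0]);
}

/* Split a full register into its two halves. */
static void demo_split_2xVP(demo_VX x, demo_VP *p1, demo_VP *p0)
{
    assert(p1 != 0 && p0 != 0);
    *p0 = demo_make_VP(x.word[1], x.word[0]);
    *p1 = demo_make_VP(x.word[3], x.word[2]);
}

/* Insert a half into subpart 'part' (0 or 1) of a full register. */
static demo_VX demo_insert_VP_into_VX(demo_VP p, demo_VX x, unsigned part)
{
    assert(part < 2u);
    x.word[2u * part]      = p.word[0];
    x.word[2u * part + 1u] = p.word[1];
    return x;
}

/* Extract subpart 'part' (0 or 1) of a full register. */
static demo_VP demo_extract_VP_from_VX(demo_VX x, unsigned part)
{
    assert(part < 2u);
    return demo_make_VP(x.word[2u * part + 1u], x.word[2u * part]);
}
```

In this model, splitting a register and composing the two halves again gives back the original value, and an insert followed by an extract of the same subpart returns the inserted half.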

Specialized macros
Specialized versions of the insertion and extraction macros are provided to handle cases
where the subpart of the wider register can be hard coded in the built-in name itself.
In the lines below, the macros do not accept an explicit _part_ parameter. The syntax of
the name implicitly corresponds to a given subpart (for instance
MP2x_insert_VP_into_VX0 takes the complete 64-bit register _VP_ and inserts it in the
lowest half of the 128-bit register _VX_).
• Insert macro inserts a register subpart into a wider one:
– MP2x_insert_VP_into_VX0(_VP_, _VX_);
– MP2x_insert_VP_into_VX1(_VP_, _VX_);
– MP2x_insert_VQ_into_VX0 -> not instantiated
– MP2x_insert_VQ_into_VX1 -> not instantiated
– MP2x_insert_VQ_into_VX2 -> not instantiated
– MP2x_insert_VQ_into_VX3 -> not instantiated
– MP2x_insert_VQ_into_VP0 -> not instantiated
– MP2x_insert_VQ_into_VP1 -> not instantiated
• Extract macro extracts a register subpart from a wider one:
– MP2x_extract_VP_from_VX0(_VP_, _VX_);
– MP2x_extract_VP_from_VX1(_VP_, _VX_);
– MP2x_extract_VQ_from_VX0 -> not instantiated
– MP2x_extract_VQ_from_VX1 -> not instantiated
– MP2x_extract_VQ_from_VX2 -> not instantiated
– MP2x_extract_VQ_from_VX3 -> not instantiated
– MP2x_extract_VQ_from_VP0 -> not instantiated
– MP2x_extract_VQ_from_VP1 -> not instantiated

7.3 Using built-ins from C


This section explains the usage of the include files that are particular to built-ins and C-
models.

7.3.1 Using built-ins on an STxP70 platform


All STxP70 built-in prototypes are available in the include files presented in Section 7.2:
Naming built-ins on page 116.
To make use of the built-ins of the core, X3 extension, FPx extension or SIMD extensions in
an application, the relevant header files (as listed below) must be included in the application
sources.
#include <sx.h> // for the core,
#include <x3.h> // for the X3 extension,
#include <fpx.h> // for the FPx arithmetic extension,
#include <MP2x.h> // for the MP2x SIMD audio extension.
By default, the stxp70cc compiler generates machine instructions corresponding to the
built-in functions found in the source code.
Example:
#include <sx.h>
...
int fct(int a, int b)
{
int c;
c=sx_lzc(a); // leading zero count
return c;
}
The above code produces the following assembly code, where the lzc instruction of the
core has been properly mapped.
.global fct
fct: // 0x0
L_BB1_fct: // 0x0
lzc R0, R0
rts
In this case, it is equivalent to write the source code as:
#include <builtins_sx.h>
...
int fct(int a, int b)
{
int c;
c=__builtin_sx_lzc(a);
return c;
}
This is because the macro sx_lzc is just mapped on the full built-in __builtin_sx_lzc
by default as soon as the code is compiled for an STxP70 target.
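The mapping described above can be pictured as a macro that selects either the built-in or a C-model depending on the target. The sketch below is not the content of sx.h: demo_lzc and demo_model_lzc are hypothetical names, and the model assumes that a zero input yields 32, which this manual does not state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host model of a 32-bit leading zero count.
   Assumption: an input of 0 yields 32. */
static int demo_model_lzc(uint32_t x)
{
    int n = 0;
    if (x == 0u)
        return 32;
    while ((x & 0x80000000u) == 0u) {
        x <<= 1;
        ++n;
    }
    assert(n >= 0 && n <= 31);
    return n;
}

#if defined(__SX__)
/* STxP70 target: use the full built-in, as the text above describes. */
#include <builtins_sx.h>
#define demo_lzc(x) __builtin_sx_lzc(x)
#else
/* Any other host: fall back to the C model. */
#define demo_lzc(x) demo_model_lzc((uint32_t)(x))
#endif
```

With such a macro, the application source calls demo_lzc() in both environments and only the preprocessor decides which implementation is used.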

7.3.2 Standard use of built-in C-models


By default, the C-model files are designed to permit the use of the C-model on any host
machine except the STxP70. There is no need to modify the source code. However, it is
necessary to:
• add the path of the include/models directory of the compiler in the toolset installation
to the list of include paths
• add the file containing the source of the C-models to the list of source files to be
compiled
Example:
Assuming that the toolset is installed in a directory denoted <tools_dir> (for example,
/home/myfolder), and a small file containing calls to core built-ins is to be compiled with
C-models using a GCC compiler, the command line should contain the -I directive and the
C-model source file, as follows:
gcc -I<tools_dir>/stxp70cc/4.1/include/models \
<tools_dir>/stxp70cc/4.1/src/models/builtins_model_sx.c ...

7.3.3 Use of built-in C-models on STxP70 target


In a few cases, it may be necessary to compile application code using C-models, rather than
actual machine instructions, even on the STxP70 target. This may be useful, for example,
for testing or debugging purposes.
This can be done either by calling the C-model explicitly, or by continuing to use the unified
macro (thereby avoiding any change to the function bodies). For the example given in
Section 7.3.1, the following lines should appear:
#ifdef __SX__ // code is compiled for an STxP70 target
#undef __SX__ // hide the target and use the non-STxP70 settings
#include <sx.h>
#define __SX__ // return to the regular settings for STxP70
#else
#include <sx.h> // not an STxP70 target: the C-models are used by default
#endif
...
int fct(int a, int b)
{
int c;
c=sx_lzc(a); // leading zero count C model is used
return c;
}

8 MPx native support

8.1 Goal of the MPx scalar support


The goal of the MPx native support is to generate MPx code automatically from standard C
code. The compiler:
• detects variables that can beneficially be allocated to the MPx register file
• inserts required type conversions in the internal representation (also called “alien type
conversion”)
• detects some patterns of instructions that can beneficially be replaced by MPx integer
or fractional instructions.
Legacy source code that already contains variables explicitly allocated to the MPx register
file and calls to MPx built-ins is not affected by these changes. It is compiled as before and
the generated assembly remains the same.
Note: The SIMD variants of the MPx benefit from the same level of (scalar) support as the scalar
variant. This means that the SIMD aspects of those variants are not dealt with by the
compiler.
These new features allow the porting of applications to the MPx with less effort than
previously, because:
• the extension type is no longer required, except in specific cases
• the use of intrinsics is more limited
In addition to pure audio applications, long long arithmetic also benefits from this support.
This chapter describes the scope of the MPx support, and explains how it can be used.
Examples are provided to help with comprehension.

8.2 Control of the MPx native support

8.2.1 Compiler options


By default, native support of the MPx is enabled in the compiler when:
• the code is compiled for the MPx, that is, the option -Mextension=MP1x is set
• optimization level is equal to either -O2, -O3, -O4 or -Os
• the mapping of fractional instructions is enabled using the option
-Mextoption=MP1x:enablefractgen (formerly called -Menablefractgen or
-Mfractsupport, see Section 8.3.4: Pattern recognition for integer and fractional
data types on page 125).
It is possible to disable this native support by using the option: -Mnoextgen

8.2.2 Function pragmas


Pragmas provide fine-grain control of MPx support. They allow the
developer to enable or disable MPx support in a given set of functions, declared as
arguments to the pragmas, overriding the option passed to the compiler.
The syntax is as follows:
• #pragma disable_extgen (foo1, foo2) disables MPx scalar support in
functions foo1 and foo2 in the file where it is placed, even if option
-Mextension=MP1x is set and optimization level is higher than -O1.
• #pragma force_extgen (foo1, foo2) forces MPx scalar support in functions
foo1 and foo2 in the file where it is placed, even if option -Mnoextgen is set.
Those file scope pragmas must be placed at the beginning of a file. They affect all variants
of the MPx (that is, both the scalar and SIMD variants).
A more focused version is also provided:
• #pragma disable_specific_extgen (extname, foo1, foo2) disables
scalar support on specified extension in functions foo1 and foo2 in the file where it is
placed, even if option -Mextension=MP1x is set and optimization level is higher than
-O1
• #pragma force_specific_extgen (extname, foo1, foo2) forces scalar
support on specified extensions in functions foo1 and foo2 in the file where it is
placed, even if option -Mnoextgen is set
Those file scope pragmas must be placed at the beginning of a file. They affect all variants
of the MPx (that is, both the scalar and SIMD variants).
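As an illustration, a source file using the first pair of pragmas might be laid out as follows. This is only a sketch of the placement rule: demo_accumulate and demo_checksum are placeholder function names, and on a compiler other than stxp70cc the pragmas are simply unknown and ignored.

```c
/* File-scope pragmas come first, as required above.
   The function names are placeholders chosen for this sketch. */
#pragma disable_extgen (demo_checksum)
#pragma force_extgen (demo_accumulate)

#include <assert.h>

/* 64-bit multiply-accumulate: MPx scalar support is forced here. */
long long demo_accumulate(const int *x, int n)
{
    long long acc = 0;
    assert(x != 0 && n >= 0);
    for (int i = 0; i < n; ++i)
        acc += (long long)x[i] * x[i];
    return acc;
}

/* 64-bit sum: MPx scalar support is disabled here. */
long long demo_checksum(const int *x, int n)
{
    long long sum = 0;
    assert(x != 0 && n >= 0);
    for (int i = 0; i < n; ++i)
        sum += x[i];
    return sum;
}
```

The pragmas change only how the functions are compiled; the values they return are the same with or without MPx support.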

8.3 Scope of the MPx native support


This section presents an overview of the features available in MPx native support. It
consists of three main levels:
• built-in based support (already present in toolset 3.2.0)
• support of type equivalence between long long integer and MPx data types (new in
toolset 3.3.0)
• automatic MPx code generation on MPx instructions and long long integer
arithmetic (new in toolset 3.3.0)
Besides the overview presented in this section, the latter two levels are documented in
detail in Section 8.4: Type equivalence and Section 8.5: Automatic code generation.
Note: The native support now includes a limited pattern recognition facility, which can detect more
complex patterns like mac for both integer and fractional data types.

8.3.1 Built-in based support with MPx_Vx type


This feature has been available since toolset release 3.2.0. With this level of support, the
developer explicitly uses MPx built-ins and MPx types to write an application for the MPx, as
in the following example C code:
MPx_Vx a, b, c;
MPx_ADDD(c, a, b);

This code places three 64-bit variables, a, b and c, in the MPx_Vx register set. It uses the
MPx addition instruction to add a and b, storing the result in c. Since it uses built-ins and
specific data types, this code is neither generic nor portable to another processor.

8.3.2 Support of type equivalence between long long and MPx_Vx


The MPx_Vx type matches the MPx registers, and is therefore semantically equivalent to
the long long native type of the C language. In order to limit the work needed to port
applications to the MPx, the compiler handles the semantic equivalence between MPx_Vx
and long long. This means that the user can declare variables as long long type
instead of MPx_Vx. The compiler is responsible for placing them in the MPx registers, if
there is a benefit to be gained.
With this support, the C code in the example above can be simplified as follows:
long long a, b, c;
MPx_ADDD(c, a, b);
This C code is more portable, as it does not involve any specific type. Only the intrinsic
(MPx_ADDD) is still specific. The code generated by the compiler is the same as the code
generated with MPx types.

Warning: The heuristics currently used to place variables into MPx registers are based on a
quite systematic behavior: as soon as a variable appears as an MPx_Vx parameter in an MPx
built-in, it is placed in an MPx register. The explicit use of the MPx_Vx type in new code
should be avoided and the long long data type used instead. More details can be found in
Section 8.6: Important remarks and known limitations on page 129.

8.3.3 Automatic MPx code generation on long long arithmetic


The MPx instruction set includes long long integer arithmetic instructions (add, sub, shift,
and so forth). In previous versions of the toolset, it was necessary to use built-in functions to
map those instructions. In order to limit the effort when porting applications to the MPx, the
current version of the compiler automatically maps these operations to MPx instructions.
The above example (Section 8.3.2) can now be written in standard C:
long long a, b, c;
c = a + b;
The compiler now ensures both:
• the placement of the variables a, b and c in the MPx registers
• the mapping of the MPx_ADDD instructions
In addition to pure arithmetic operations, the MPx also provides instructions that:
• clear the contents of an MPx register
• copy the contents of one MPx register into another

The compiler also maps the following instructions when dealing with either an assignment to
zero or a copy operation:
long long a = 0; // mapped to a MPx register clear instruction
long long b = c; // mapped to a MPx register copy instruction

8.3.4 Pattern recognition for integer and fractional data types


The compiler provides pattern recognition capabilities to detect a set of complex patterns
and map them to their equivalent MPx instructions. These capabilities address both integer
and fractional instructions.
The list of recognized instructions is provided in Table 28.

Table 28. Pattern recognition


Mnemonic   Equivalent source code                  Comment

mafw       ll1+((long long)i1*i2)<<1               Requires -Mextoption=MP1x:enablefractgen
msfw       ll1-((long long)i1*i2)<<1               Requires -Mextoption=MP1x:enablefractgen
mpfw       ((long long)i1*i2)<<1                   Requires -Mextoption=MP1x:enablefractgen
mahll      (long long)((int)ll1+(int)ss1*ss2)      32b MAC with 16b multiplicands
mshll      (long long)((int)ll1-(int)ss1*ss2)      32b MAC with 16b multiplicands
shlrr2x    (long long)i1<<i2                       -
shrr2x     (int)(ll1<<i2)                          -
andcd      (ll1 & (!ll2))                          -
mph        (long long)ss1*ss2                      -
mpw        i1*i2                                   32b multiplier when no X3/FPx
maw        ll1+(long long)i1*i2                    -
msw        ll1-(long long)i1*i2                    -

Note: The first three rows correspond to fractional instructions, which are subject to specific
limitations (Section 8.6.6: Limitations regarding mapping of fractional instructions on
page 131). Their mapping is therefore only performed if the dedicated flag
-Mextoption=MP1x:enablefractgen is set.
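Three of the integer patterns from Table 28 are written out below as complete C functions. This is a sketch: the demo_ function names are hypothetical, and whether a given function body is actually mapped depends on the compiler options described in Section 8.2.

```c
#include <assert.h>

/* maw pattern: ll1 + (long long)i1 * i2 */
static long long demo_maw(long long ll1, int i1, int i2)
{
    return ll1 + (long long)i1 * i2;
}

/* msw pattern: ll1 - (long long)i1 * i2 */
static long long demo_msw(long long ll1, int i1, int i2)
{
    return ll1 - (long long)i1 * i2;
}

/* mph pattern: (long long)ss1 * ss2, with 16-bit multiplicands */
static long long demo_mph(short ss1, short ss2)
{
    return (long long)ss1 * ss2;
}

/* Small usage check: the casts make each product a 64-bit value. */
static void demo_patterns_usage(void)
{
    assert(demo_maw(10, 100000, 100000) == 10000000010LL);
    assert(demo_msw(0, 3, 4) == -12);
    assert(demo_mph(-300, 200) == -60000);
}
```

Each function keeps the whole pattern in one return expression with the cast spelled out, which is the form listed in the table.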

8.4 Type equivalence


The example code listed here summarizes the equivalences that are accepted or rejected
by the compiler front-end when MPx support is enabled.

Figure 27. Summary of type equivalence with MPx support


// Declaration of variables
MPx_VX gvx; // forced to MPx
long long gll; // candidate for placement in MPx registers
int gi; // to be placed in GPR

// Initialisation of global variables
MPx_VX gvx_2 = 1234LL; // Accepted
MPx_Vx gvx_3 = (long long) 11.3f; // Accepted
MPx_Vx gvx_array[4] = {1, 10, -1, -10};

void foo(long long In) {
...
// Assignment of local variable using function parameters
MPx_Vx A = In;

// Assignment of local variable using a constant*
MPx_Vx B = 12LL;

// Constant assignment of global variables
gvx = 0LL; // Accepted
gvx = 1234LL; // Accepted
gvx = 0x12LL; // Accepted
gvx = 0; // Accepted
gvx = 1234; // Accepted
gvx = 0x12; // Accepted

// Variable assignment of global variables
gvx = gll; // Accepted
gll = gvx; // Accepted
gvx = (unsigned long long)gi; // Accepted
gvx = (long long)gi; // Accepted
gi = (int)gvx; // Accepted
gi = (unsigned int)gvx; // Accepted

// Unary/binary operator (not planned to be supported, use long long var instead)
gvx = gvx + gvx; // Not supported (error msg from front-end)

// Usage of long long variable in builtin calls
MPx_ADDD(gll, gll, gll); // Accepted

// Usage of long long variable in builtin calls (in/out param)
MPx_MAFW(gvx, 1, 2); // Accepted

// Usage of long long constant in builtin calls
MPx_ADDD(gll, 1234LL, 123LL); // Accepted
}

By convention, the result of instructions and built-ins in their functional form is always
considered unsigned. However, the actual type might be signed, and this is not explicitly
visible to the compiler. This must be taken into account, especially when writing comparisons.
For example, the following code is incorrect:
if (MP1x_SUBS_f(a, b) < 0) {

Because the MP1x_SUBS_f() result is unsigned, the compiler considers the comparison to be
always false, and the corresponding block is therefore removed as dead code.
The main recommendation for built-in usage is to avoid the functional form and use only
the procedural version in which the type of the result is given explicitly by the developer, for
example:
int res = MP1x_SUBS_f(a, b);
if (res < 0) {
Alternatively, it is also possible to explicitly cast the built-in result to the proper type:
if ((int)MP1x_SUBS_f(a, b) < 0) {
However, the first method described using the procedural version is the preferred method.
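The same effect can be reproduced on any host with a function that returns an unsigned value. In the sketch below, demo_sub_f is a hypothetical stand-in for a functional-form built-in (it is not MP1x_SUBS_f and does not saturate), and the conversions back to a signed type assume a two's complement representation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a functional-form built-in: the 32-bit
   difference is returned as an unsigned value. */
static uint32_t demo_sub_f(int32_t a, int32_t b)
{
    return (uint32_t)a - (uint32_t)b;
}

/* Preferred style: the developer gives the result an explicit signed type. */
static int demo_is_negative_typed(int32_t a, int32_t b)
{
    int32_t res = (int32_t)demo_sub_f(a, b); /* assumes two's complement */
    return res < 0;
}

/* Alternative style: cast the result at the point of comparison. */
static int demo_is_negative_cast(int32_t a, int32_t b)
{
    return (int32_t)demo_sub_f(a, b) < 0;
}

/* Usage check: 1 - 2 is 0xFFFFFFFF when read as unsigned, which is never
   below zero, but it is -1 once the signed type is restored. */
static void demo_unsigned_result_usage(void)
{
    assert(demo_sub_f(1, 2) == 0xFFFFFFFFu);
    assert(demo_is_negative_typed(1, 2) == 1);
    assert(demo_is_negative_cast(1, 2) == 1);
}
```

Both helper functions give the expected answer because the signed interpretation is restored before the comparison with zero is made.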

8.5 Automatic code generation

8.5.1 Scope and principle


Some operations in C code can be mapped directly to instructions available on the MPx. This
limits the need for intrinsics, and contributes to performance enhancements. Two cases are
possible.
• The operation derived from the C code matches one of the instructions of the MPx. For
instance, this is the case with 64-bit addition, which can be mapped on the MPx ADDD
instruction.
• The operation derived from the C code fits a sequence of instructions which may
belong to either the core or the MPx instruction set. For instance, a 64-bit “min”
operation does not exist on the MPx, but it can be emulated using a sequence of
instructions involving both core and MPx instructions (MPx and core comparisons).
These sequences are called “meta-instructions”.
The second case is especially useful, because it makes more extensive use of the MPx
instructions with lower effort at developer level. In addition to the pure audio applications for
which it is designed, MPx support can also bring significant gains in applications that handle
long long arithmetic.

8.5.2 Operations mapped to single MPx instructions


In the current release of the compiler, the following C operations are directly mapped to
individual MPx instructions:
• 64-bit signed and unsigned addition mapped to ADDD
• 64-bit signed and unsigned subtraction mapped to SUBD
• 64-bit left shift signed and unsigned mapped to SHLRD
• 64-bit arithmetic right shift signed mapped to SHRRD
• 64-bit arithmetic right shift unsigned mapped to SHRURD
• 64-bit logical right shift signed and unsigned mapped to SHRURD
• 64-bit negate signed and unsigned mapped to NEGD
• 64-bit bitwise NOT signed and unsigned mapped to NOTD
• 64-bit bitwise OR signed and unsigned mapped to ORD
• 64-bit bitwise AND signed and unsigned mapped to ANDD
• 64-bit bitwise exclusive OR (XOR) signed and unsigned mapped to XORD
• 64-bit bitwise negate OR (NOR) signed and unsigned mapped to NORD

8.5.3 Operations mapped to meta-instructions


The following operations of the C language are mapped to or emulated by
meta-instructions:
• the ten 64-bit signed and unsigned comparisons (equal to, not equal to, greater than,
less than, greater than or equal, less than or equal)
• the 64-bit signed and unsigned min
• the 64-bit signed and unsigned max
• the 64-bit absolute value
• the 64-bit signed and unsigned multiplication (takes two 64-bit operands and returns a
64-bit result)
• the 32-bit signed and unsigned multiplication (takes two 32-bit operands and returns a
32-bit result)(d)
• the 32-bit to 64-bit conversions
The number of actual instructions present in each meta-instruction depends on the
complexity of the computation: for instance, comparisons are implemented in two
instructions at most, whereas the 64-bit multiplication takes about 25 instructions.

d. This mapping allows 32-bit multiplications to be mapped to the MPx multiplier in case the X3 or FPx 32-bit
multiplier is not present in the configuration. Note however that in this case the resulting code is less efficient
than with the 32-bit multiplier, since it requires one more instruction to extract the lower 32-bit part of the result.

8.6 Important remarks and known limitations

8.6.1 Avoid mixing MPx and long long


As already mentioned in the warning in Section 8.3.2, the MPx_Vx type should be avoided
when writing new code. The following combinations are especially discouraged:
• simultaneous use of long long and MPx_Vx types in the same function
• C long long arithmetic applied to variables declared as MPx_Vx
For instance, the compiler considers the following code illegal:
MPx_Vx a;
long long b, c;
c = a + b;
Note: These restrictions do not affect legacy code, as this is only based on a combination of MPx
types and built-ins.

8.6.2 Long long passed as function parameters


The ABI of the STxP70 core specifies that function arguments are passed in the core
registers. This applies to long long variables as well, and this must be taken into account
when making the choice to declare a variable as either MPx or long long type.
Consider the following code:
extern int bar ( long long );
int foo ( long long a ) {
return ( bar ( a ) );
}
In this example, it makes no sense to store the long long variables in MPx_Vx registers,
as the core registers are used for the function call in any case.

8.6.3 Long long life span crossing function call


The STxP70 ABI states that MPx registers are all considered to be scratch registers. This
means that they do not retain their values across a function call.
Consider the following code:
int foo() {
long long a;
a = 0L;
bar();
a = a + b;
[...]
}
In this example, if a is promoted to MPx_Vx for its full life span, it may be spilled(e) by the
register allocator, which is extremely costly. A developer must bear this in mind when writing
MPx code. Note that the cost is neither assessed nor handled by the compiler, so it is the
developer’s responsibility to use the most efficient placement.

e. “Spilled” means that the contents of the register are temporarily stored in memory and then restored when
needed.

8.6.4 Efficiency of code in meta-instructions


Currently, the compiler does not optimize the code in the meta-instructions. In those parts of
code, the compiler performs register allocation, but it does not schedule the instructions, nor
does it perform any advanced optimizations. Even if the code has been designed for
efficiency, it is possible that sub-optimal patterns may exist in the final code if MPx native
support is enabled.
This limitation might be overcome in future versions of the tools.

8.6.5 Mapping exact conversions and single statement expressions


The current pattern recognition algorithm is limited. It recognizes an expression only if:
• the conversions are made explicit by casts, and correspond to the exact model of the
instruction to be recognized
• the expression is located in a single C statement

Exact type conversions


For example, in the following code, the maw instruction is not recognized because of implicit
type conversions:
long long mac;
int a, b;
mac+= a*b; // multiplication result is 32bit

However, the maw instruction is recognized in the following code:


long long mac;
int a, b;
mac+= (long long)a*b; // multiplication result is 64bit
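The difference between the two forms is also a difference in the computed value, which can be checked on a host. The sketch below uses unsigned operands so that the 32-bit wrap-around is well defined in C; the function names are hypothetical and the manual's own example uses signed int.

```c
#include <assert.h>
#include <stdint.h>

/* Product computed at 32-bit width and widened afterwards:
   the upper 32 bits are already lost. */
static uint64_t demo_mul_then_widen(uint32_t a, uint32_t b)
{
    uint32_t p = a * b;          /* wraps modulo 2^32 */
    return (uint64_t)p;
}

/* Operand widened first: the full 64-bit product is kept. */
static uint64_t demo_widen_then_mul(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* Usage check with a product that does not fit in 32 bits. */
static void demo_widening_usage(void)
{
    assert(demo_mul_then_widen(100000u, 100000u) == 1410065408u);
    assert(demo_widen_then_mul(100000u, 100000u) == 10000000000ull);
}
```

Only the second form has the 64-bit product that a multiply-accumulate on a long long accumulator needs, which is why the explicit cast matters.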

Single statement expressions


A pattern is more likely to be recognized if it occurs within a single statement. For example,
avoid code that resembles the following, as it may result in missed opportunities to map the
maw instruction:
long long mac;
int a, b;
long long tmp;
tmp= (long long)a*b;
mac+= tmp;

On the other hand, the maw instruction is always recognized in the code below:
long long mac;
int a, b;
mac+= (long long)a*b;

8.6.6 Limitations regarding mapping of fractional instructions


The automatic mapping of fractional instructions is disabled by default. It is enabled only if
the flag -Mextoption=MP1x:enablefractgen(f) is set.
Take care when enabling the automatic mapping of fractional instructions. It may induce two
changes to the behavior:
1. The fractional instructions of the MPx are likely to modify the value of the saturation
flag. Consequently it is not safe to enable these instructions if the code contains
built-ins that use saturation. This change is clearly a non-conservative one.
2. The use of fractional instructions modifies the behavior of overflow. The wrap-around
performed in the scope of integer arithmetic is changed into clamping. Notice that this
change is still conservative, as it remains compliant with the C standard. However, it
introduces discrepancies between the core and the MPx with regard to the result of
arithmetic overflow. For example, the multiplication of 0x7FFFFFFFFFFFFFFF with
0x7FFFFFFFFFFFFFFF provides the following results:
– without mapping fractional instructions: 0x0000000000000001,
– with mapping of fractional instructions: 0x7FFFFFFC00000001.

Warning: The automatic recognition and mapping of fractional instructions should be enabled
only if the following conditions are met:
- source code does not already contain built-ins that may read the saturation flag
(otherwise, the semantics may not be preserved)
- clamping is acceptable for handling arithmetic overflow
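The two overflow policies mentioned above, wrap-around and clamping, can be contrasted with a generic 64-bit addition on a host. This is a plain C illustration of the policies only; it does not model any MPx instruction or the saturation flag, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Wrap-around: the sum is reduced modulo 2^64.
   The conversion back to int64_t assumes two's complement. */
static int64_t demo_add_wrap(int64_t a, int64_t b)
{
    return (int64_t)((uint64_t)a + (uint64_t)b);
}

/* Clamping: the sum is limited to the representable range. */
static int64_t demo_add_clamp(int64_t a, int64_t b)
{
    if (b > 0 && a > INT64_MAX - b)
        return INT64_MAX;
    if (b < 0 && a < INT64_MIN - b)
        return INT64_MIN;
    return a + b;
}

/* Usage check: the two policies differ only when the sum overflows. */
static void demo_overflow_usage(void)
{
    assert(demo_add_wrap(INT64_MAX, 1) == INT64_MIN);
    assert(demo_add_clamp(INT64_MAX, 1) == INT64_MAX);
    assert(demo_add_wrap(1, 2) == demo_add_clamp(1, 2));
}
```

Code that relies on the wrapped value after an overflow would therefore observe a different result once clamping applies.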

8.6.7 Unsupported mapping


The mapping of saturated arithmetic and the mapping of the cross register left shift
instructions are not supported by the compiler.

f. The name of this option has changed: it was formerly named -Menablefractgen or -Mfractsupport,
which was not accurate enough. The former name is still recognized, but its use is strongly discouraged.

8.7 Examples

8.7.1 Direct mapping of long long arithmetic


Consider a simple function that performs the addition and shift of two long long input
parameters, and returns the result as a long long integer:
long long fct(long long a, long long b)
{
long long tmp;

tmp = a + b;
tmp = tmp << 2;
return tmp;
}

No MPx support
When MPx is not present and MPx support is not enabled (stxp70cc -O3 test.c), then
the code generated relies solely on core instructions and runtimes:
.global fct
fct:
L_BB1_fct:
make R4, 0 ;;
addcu R4, R4, R4 ;;
addcu R0, R0, R2 ;;
make R2, 2 ;;
addcu R1, R1, R3 ;;
.global __shll
.type __shll, @function
jr __shll ;;

MPx support
When MPx is present and MPx support is enabled (stxp70cc -O3 -Mextension=MP1x
test.c), then MPx instructions are mapped where needed:
.global fct
fct:
L_BB1_fct:
XRF0RR2X V0, R1, R0 ;;
XRF0RR2X V1, R3, R2 ;;
ADDD V0, V0, V1 ;;
SHLID V0, V0, 2 ;;
XRF0CSX2R R0, V0, V0 ;;
XRF0CSX2R R1, V0, V0 ;;
rts ;;
Note: 1 The moves between the core and the MPx registers are introduced to deal with ABI
constraints. Those instructions are necessary only because the addition is isolated in a
function. They are not present in successive long long arithmetic operations, and do not
represent any extra cost. (Consequently, they are shown here in italic.)
2 The MPx instructions are mapped automatically (ADDD, SHLID) to perform long long
operations.

8.7.2 Meta-instruction, case of a long long max


Consider a piece of code that involves long long operations that do not fit a single MPx
instruction. The following example is a function to find the maximum value between two
alternatives, a and b.
long long fct(long long a, long long b)
{
long long tmp;
if(a>b) tmp=a;
else tmp=b;
return(tmp);
}

No MPx support
When MPx is not present and MPx support is not enabled (stxp70cc -Os test.c), the
code generated relies only on core instructions and runtimes:
.global fct
fct:
L_BB1_fct:
cmpeq G0, R1, R3 ;;
cmpgtu G1, R0, R2 ;;
andg G0, G0, G1 ;;
cmpgt G1, R1, R3 ;;
org G0, G0, G1 ;;
G4? or R4, R2, 0 ;;
G0? or R4, R0, 0 ;;
G0? or R3, R1, 0 ;;
or R1, R3, 0 ;;
or R0, R4, 0 ;;
rts ;;
The core of the computation consists of those instructions that are not in italic. The sequence
contains three comparisons and two boolean operations (GMI).
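The three comparisons and two boolean operations can be written out in C on the 32-bit halves of the operands. This is a sketch for illustration: the function names are hypothetical, the comments pair each step with the mnemonic it is assumed to correspond to in the listing above, and the extraction of the upper half assumes two's complement.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit signed "greater than" built from 32-bit comparisons. */
static int demo_gt64_from_halves(int64_t a, int64_t b)
{
    uint32_t a_lo = (uint32_t)(uint64_t)a;
    uint32_t b_lo = (uint32_t)(uint64_t)b;
    int32_t  a_hi = (int32_t)(uint32_t)((uint64_t)a >> 32);
    int32_t  b_hi = (int32_t)(uint32_t)((uint64_t)b >> 32);

    int hi_equal = (a_hi == b_hi);   /* cmpeq:  upper words are equal     */
    int lo_above = (a_lo > b_lo);    /* cmpgtu: unsigned test, low words  */
    int hi_above = (a_hi > b_hi);    /* cmpgt:  signed test, upper words  */

    return (hi_equal && lo_above) || hi_above;   /* andg, then org */
}

/* 64-bit max selected with the comparison above. */
static int64_t demo_max64(int64_t a, int64_t b)
{
    return demo_gt64_from_halves(a, b) ? a : b;
}

/* Usage check against the native 64-bit comparison. */
static void demo_gt64_usage(void)
{
    assert(demo_gt64_from_halves(0x100000000LL, 0xFFFFFFFFLL) == 1);
    assert(demo_gt64_from_halves(-1, 0) == 0);
    assert(demo_max64(-5, 3) == 3);
}
```

The low words are compared as unsigned values because only the upper word carries the sign of the 64-bit operand.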

MPx support
When MPx is present and MPx support is enabled (stxp70cc -Os -Mextension=MP1x
test.c), only two comparisons are needed. (The instructions in italic are not taken into
account, as they are mainly needed because of the encapsulation of the code in a function.)
.global fct
fct:
L_BB1_fct:
XRF0RR2X V3, R1, R0 ;;
XRF0RR2X V2, R3, R2 ;;
cmpgtx2r R0, V3, V2 ;;
cmpne G0, R0, 0 ;;
L__0_4:
G4? XRF0CSX2R R0, V0, V2 ;;
G0? XRF0CSX2R R2, V1, V3 ;;
G4? XRF0CSX2R R1, V0, V0 ;;
G0? or R0, R2, 0 ;;
G0? XRF0CSX2R R2, V1, V1 ;;
G0? or R1, R2, 0 ;;
rts ;;

8.7.3 Case of the 32-bit multiplication


Consider the function below, which performs the multiplication of two 32-bit integers and
returns the result as a 32-bit integer:
int fct(int a, int b)
{
return (a*b);
}
The resulting assembly code depends on compiler options and core configuration.

No X3 multiplier, no MPx support


If code is compiled without the X3 32-bit multiplier and without the MPx native support
(stxp70cc -O3 -Mconfig=mult:no test.c), then a runtime is called:
.global fct
fct:
L_BB1_fct:
.global __mulw
.type __mulw, @function
jr __mulw ;;


X3 multiplier, no MPx support


If code is compiled with the X3 32-bit multiplier, and without the MP1x support (stxp70cc
-O3 -Mconfig=mult:yes test.c), then the 32-bit multiplication available in X3 is
mapped:
.global fct
fct:
L_BB1_fct:
mp R0, R0, R1 ;;
rts ;;

No X3 multiplier, MPx support


If code is compiled without the X3 32-bit multiplier, but with the MPx support enabled
(stxp70cc -O3 -Mextension=MP1x test.c), then the MPx 64-bit multiplier emulates
a 32-bit multiplication. This requires one more instruction to extract the proper 32-bit result:
.global fct
fct:
L_BB1_fct:
mpw V2, R0, R1 ;;
xrf0csx2r R5, V2, V2 ;;
L__0_2:
or R0, R5, 0 ;;
rts ;;
Note: If both the X3 32-bit multiplier and the 64-bit MPx multiplier can be used to map a 32-bit
multiplication, then the X3 multiplier is preferred.
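
The emulation described in this section can be pictured in C as a widening multiply followed by extraction of the low word. This is a minimal sketch under that assumption, not the compiler's actual lowering; mul32_via_64 is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit multiply obtained from a 64-bit product (illustrative). */
int32_t mul32_via_64(int32_t a, int32_t b)
{
    /* Widening multiply: the full product always fits in 64 bits. */
    int64_t wide = (int64_t)a * (int64_t)b;
    /* Keep only the low 32 bits, the role of the extra extract instruction. */
    uint32_t low = (uint32_t)(uint64_t)wide;
    /* Cross-check: identical to a wrap-around unsigned 32-bit multiply. */
    assert(low == (uint32_t)a * (uint32_t)b);
    /* The conversion back to int32_t assumes two's complement. */
    return (int32_t)low;
}
```

The low word of the wide product is the same value a native 32-bit multiplier would return, which is why only one extra instruction is needed.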


9 Relocatable loader library

This chapter describes how dynamic loading is implemented using the relocatable loader
library RL_LIB for the STxP70.
Table 29 and Table 30 list a number of acronyms and definitions used within this chapter.

Table 29. Acronyms

Acronym   Term

DLL       Dynamic link library
DSO       Dynamic shared object
GOT       Global offset table
GP        Global pointer (alias of the R13 register in the STxP70 ABI)
PC        Program counter register
PIC       Position-independent code
PID       Position-independent data

Table 30. Definitions

Term         Definition

Preemption   Sometimes you may need to use some of the functions or data items from
             a shareable object, but may wish to replace others with your own
             definitions. For example, you may want to use the standard C runtime
             library shareable object, libc.so, but to use your own definitions for the
             heap management routines malloc() and free(). In this case it is
             important that calls to malloc() and free() within libc.so call your
             definition of the routines and not the definitions present in libc.so. Your
             definition should override, or preempt, the definition within the shareable
             object. This feature of shareable objects is called symbol preemption.

Relocation   Relocation is the process of connecting symbolic references with symbolic
             definitions. For example, when a program calls a function, the associated
             call instruction must transfer control to the proper destination address at
             execution. In other words, relocatable files must have information that
             describes how to modify their section contents, thus allowing executable
             and shared object files to hold the right information for a process's program
             image.


9.1 Introduction to dynamic linking


This section provides an introduction to the concepts used for dynamic linking.

9.1.1 Position-independent code


All code within a dynamic link library (DLL) should be position independent (PIC). This
allows the text segment of the DLL to remain pure so that it can be shared among many
processes. Position-independence imposes two requirements on generated code:
• Code that forms an absolute address referring to any address in the DLL’s text or data
segments is not allowed, because the code would have to be relocated at load time,
making it non-sharable. All branches must be PC-relative, and references to
the data segment and to constants and literals in the text segment must be relative to a
base pointer (typically GP).
• Code that references symbols that are or may be imported from other loaded modules
must use indirect addressing through a global offset table (GOT). The linker is
expected to resolve procedure calls by creating import stubs, and the compilers must
generate indirect loads and stores for data items that may be dynamically bound. In
both cases, the indirection is made through the global offset table, which is allocated
by the linker and initialized by the dynamic loader. The use of the global offset table is
described in the subsections from Procedure calls and long branch stubs through to
Materializing function pointers.

Procedure calls and long branch stubs


Normal procedure calls can be prepared with the call instructions, which use PC-relative
addressing. There are three possible cases at link time:
• If the target is not within the same module, or if it is subject to preemption by an earlier
definition from another loaded module, the linker must allocate an import stub and
resolve the relocation of the call instruction to the stub.
• If the target is known to be within the same module and the displacement is small
enough, the call instruction can be statically resolved to the call target.
• If the target is within the same load module, but the displacement is too far for the call
instruction, the linker must allocate a long branch stub. The long branch stub itself must
satisfy the PIC requirements. If the target is within range of the stub, the stub may use
a PC-relative goto instruction; otherwise, it must load the address of the target from the
global offset table.
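
The three link-time cases above can be summarized as a small decision function. The sketch below is only a model of the rule as stated; the names and the symmetric max_reach parameter are assumptions, since the actual reach of the call instruction is not given in this section.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    CALL_IMPORT_STUB,       /* target external or subject to preemption */
    CALL_DIRECT,            /* same module, displacement within reach   */
    CALL_LONG_BRANCH_STUB   /* same module, displacement out of reach   */
} call_resolution_t;

/* Chooses how the linker resolves one call relocation (illustrative).
   max_reach is a hypothetical, symmetric limit on the displacement. */
call_resolution_t resolve_call(bool same_module, bool preemptible,
                               long displacement, long max_reach)
{
    assert(max_reach > 0);
    if (!same_module || preemptible)
        return CALL_IMPORT_STUB;
    if (displacement >= -max_reach && displacement <= max_reach)
        return CALL_DIRECT;
    return CALL_LONG_BRANCH_STUB;
}
```

Preemption is tested first because a preemptible target must go through a stub even when it is nearby in the same module.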

Access to the data segment


The DLL’s data segment must be accessed through the GP value that must be set by a DLL
procedure before any use. The GP value is used to access both global offset tables and
statically allocated data.
There are several cases:
• Global variables that are imported from another load module, or that are subject to
preemption by an earlier definition in another load module, must be accessed indirectly
through the global offset table. The compiler must generate code to load a pointer from
the global offset table, using GP-relative addressing mode, and then access the data
item using that pointer. The compiler does not have to allocate the global offset table;
there are relocations defined in the object file format that instruct the linker to allocate a
global offset table slot and to supply the GP-relative address of that slot.


• Statically allocated variables of local scope, or global variables whose definitions are
not subject to preemption, may be accessed directly with GP-relative addressing
mode.
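
The difference between the two access paths can be modeled in plain C, with a structure standing in for the data segment and a pointer standing in for GP. This is a conceptual sketch only: module_data_t, its layout and the two-slot table are hypothetical and do not describe the real STxP70 data layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout of one module's data segment. */
typedef struct {
    int *got[2];       /* GOT slots, filled in by the dynamic loader      */
    int  local_count;  /* non-preemptible variable, GP-relative access    */
} module_data_t;

/* Direct GP-relative access: one load at a fixed offset from gp. */
int read_local(const module_data_t *gp)
{
    return gp->local_count;
}

/* GOT access: load the pointer from the slot, then load the data item. */
int read_imported(const module_data_t *gp, size_t slot)
{
    assert(slot < 2 && gp->got[slot] != NULL);
    const int *address = gp->got[slot];
    return *address;
}
```

The extra load in read_imported() is the cost of allowing the variable to be bound, or preempted, at load time.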

Access to constants and literals in the text segment


Constants and literals allocated in the text segment may be accessed with GP-relative
addressing, or with indirect addressing through the global offset table.

Materializing function pointers


Function pointers may be materialized by indirect addressing through the global offset table.
Pointers to functions that are not subject to preemption may be materialized using
GP-relative addressing. Function pointers may not be materialized from immediate
operands.

9.1.2 Import stubs


When the linker determines that a procedure call refers to an entry point in a different load
module, it resolves the reference locally by building an import stub with the same name as
the intended target. The import stub contains code that obtains the address of the entry point
from the global offset table and transfers control to it, as described in Section 9.2: Calling
sequence.
If the compiler has enough information to know that a particular entry point is in a different
load module, it may generate a calling sequence that obviates the need for the linker to build
an import stub. However, this calling sequence is ABI specific, and is not specified in this
document.
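
Conceptually, an import stub behaves like a small local function that forwards to an address held in a table slot. The C model below illustrates that behavior only; the real stub is generated by the linker in assembly, and all names here (got_slot_target, target_import_stub, external_target) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*entry_point_t)(int);

/* Hypothetical GOT slot holding the address of the external target. */
entry_point_t got_slot_target = NULL;

/* Import stub: from the caller's point of view it stands for the target. */
int target_import_stub(int argument)
{
    /* The slot must have been initialized by the dynamic loader. */
    assert(got_slot_target != NULL);
    entry_point_t entry = got_slot_target;  /* load the entry address */
    return entry(argument);                 /* transfer control       */
}

/* Stand-in for the procedure defined in another load module. */
int external_target(int argument)
{
    return argument + 1;
}
```

Because the caller binds statically to the stub, only the table slot has to be written at load time and the text segment stays unmodified.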

9.1.3 The dynamic loader


The dynamic loader is a component of the operating system software that locates all load
modules belonging to an application, loads them into memory, and binds the symbolic
references among them. Most of the operations of the dynamic loader are specific to the
particular operating system environment, and are further described in the ABIs for those
environments. The common run-time architecture has been designed to minimize the
amount of work involved in the binding process, by concentrating most of the relocation
required in the global offset tables, and by prohibiting any items in the text segment that may
require dynamic relocation.

9.1.4 Rationale
Code in main programs may be absolute or position independent. If an absolute program
imports data from a DLL, the linker is forced to allocate the data in the main program’s data
segment statically (this is commonly called the “.dynbss hack”). When data imported from
DLLs is allocated in the main program’s data segment, the program may be subject to future
compatibility problems when the DLL is replaced with a newer version. This issue may be
avoided by requiring main programs to be position independent, at the cost of some
efficiency in the main program. This compatibility/performance trade-off is not made in the
common run-time architecture; it is left to the specific ABI.


9.2 Calling sequence


Direct and indirect procedure calls are described in the following sections.

9.2.1 Direct calls


Direct procedure calls follow the sequence of steps shown in Figure 28. The following
paragraphs describe these steps in detail.
1. Preparation for call. Values in scratch registers that must be kept alive across the call
must be saved. They can be saved by either copying them into preserved registers or
by saving them onto the memory stack.
The parameters must be set up in registers and memory as described in the Subroutine
linkage and parameter passing chapter of the STxP70 Application binary interface
manual (7937486).
2. Procedure call. All direct calls are made with a call relative instruction, which writes
the link register (also known as LK) for the return link.
For direct local calls the PC-relative displacement to the target is computed at link time.
Compilers may assume that the standard displacement field in the call instruction is
sufficiently wide to reach the target of the call. If the displacement is too large, the linker
must supply a branch stub at some convenient point in the code; compilers must
guarantee the existence of such a point by ensuring that code sections in the
relocatable object files are no larger than the maximum reach of the call instruction.
Direct calls to other load modules cannot be statically bound at link time, so the linker
must supply an import stub for the target procedure; this import stub obtains the
address of the target procedure from the global offset table. The call instruction can
then be statically bound using the PC-relative displacement to the import stub.
The call instruction saves the return link address in the link register, which is aliased to
general purpose register R14.
3. Import stub (direct external calls only). The import stub is allocated in the load
module of the caller, so that the call instruction may be statically bound to the address
of the import stub. The import stub obtains the address of the target procedure’s entry
point from the global offset table. In position-independent code (PIC), it must access
the global offset table using the current GP (which means that the GP must be valid at
the point of call). In absolute code, it can access the global offset table using an
absolute reference, so the GP does not need to be valid at the point of call. The import
stub then branches to the target entry point.
The detailed operation of an import stub is ABI specific.
When the target of a call is in the same load module, an import stub is not used.
However, for position-independent code, the GP value must still be valid for the caller
at the point of call, so that if the target is an internal function, it can assume that the GP
value is already correctly set.
The compiler may choose to generate calling code that performs the functions of the
import stub. This saves a branch compared to using the import stub, but is less efficient
than a direct call within the same load module. Therefore, the compiler should only do
this if it deduces that the call target is in a separate load module, or that there is a high
probability of this.


4. Procedure entry. The prologue code in the target procedure is responsible for
allocating a frame on the memory stack, if necessary.
If it is a non-leaf procedure, it must save the link register in the memory stack frame.
The prologue must also save any preserved registers that will be used in this
procedure.
If it is a position-independent procedure that makes calls or accesses global data, then
it must establish the GP value in the GP register. The GP register (R13) is a preserved
register, and therefore must be saved before being modified. A position-independent
internal function may assume that the GP register already contains the correct value.
A position-independent leaf procedure that accesses global data is not required to put
the GP value in R13; it may use a scratch register instead, thus avoiding the need for
saving and restoring register R13.
5. Procedure exit. The epilogue code is responsible for restoring the link register and any
preserved registers that were saved.
If a memory stack frame was allocated, the epilogue code must deallocate it. Finally,
the procedure exits by branching through the link register with the return instruction.
6. After the call. Any saved values should be restored.

Figure 28. Direct procedure calls

[Figure: flow between the caller's load module and the callee's load module.
Caller, prepare the call: set up arguments, save registers. Call: call.
Import stub: load entry address, goto.
Callee, entry: allocate memory frame, save return link, save registers. Procedure body.
Callee, exit: restore registers, restore return link, destroy memory frame, return.
Caller, after the call: restore registers.]

9.2.2 Indirect calls


Indirect procedure calls follow nearly the same sequence, except that the branch target is
set indirectly. This sequence is best shown in Figure 29.
1. Preparation for call. Indirect calls are built by loading the entry point address into the
link register. Values in scratch registers that must be kept alive across the call must be
saved, which can be done by either copying them into preserved registers or by saving
them on the memory stack. The parameters must be set up in registers and memory as
described in the Subroutine linkage and parameter passing chapter of the STxP70
Application binary interface manual (7937486).


2. Procedure call. All indirect calls are made with the call indirect instruction, which reads
and writes the link register. The call instruction saves the return link address in the link
register.
3. Procedure entry, exit, and return. The remainder of the calling sequence is the same
as for direct calls.

Figure 29. Indirect procedure calls

[Figure: flow between the caller's load module and the callee's load module.
Caller, prepare the call: load entry address, set up arguments, save registers. Call: call.
Callee, entry: allocate memory frame, save return link, save registers. Procedure body.
Callee, exit: restore registers, restore return link, destroy memory frame, return.
Caller, after the call: restore registers.]


9.3 Introduction to the relocatable loader library


The relocatable loader library (RL_LIB) supports the creation and loading of DSOs
(dynamic shared objects, also known as load modules) in an embedded environment.
RL_LIB implements DSOs as defined in the standard for supporting ELF System V Dynamic
Linking.
Note: For applications that do not rely on advanced OS features (such as file systems, virtual
memory management and multi-process segment sharing), use RL_LIB as an alternative to
the standard ELF System V Dynamic Loader (libdl.so).

9.3.1 Run-time model overview


The ELF System V ABI supports several run-time models. Only some run-time models are
suitable for embedded systems without the support of traditional operating system services.
The run-time model for an application dictates the method used for linking and loading.
RL_LIB implements the R_Relocatable run-time model. The application has a main module
and several load modules. The main module is statically linked and loaded. The load
modules are loaded on demand (by explicit calls to the loader) at run-time. The load
modules are loaded at an arbitrary address and dynamic symbol binding is applied by the
loader for symbols undefined in the load modules. In the hierarchy of loaded modules, the
dynamic symbol binding traverses the modules from the bottom up.

9.3.2 Relocatable run-time model


The R_Relocatable run-time model, as implemented by RL_LIB, has the following features:
• one main module loaded at application startup by the system
• several load modules that can load at run-time and unload after use
• several modules can be resident at the same time
• a loaded module can load and unload other load modules (as for the main module)
• load modules can be loaded anywhere
• a module accesses symbols in the modules it has loaded through calls to the loader library
• the loader performs dynamic symbol binding when loading a module and symbols are
searched in the load modules hierarchy bottom-up (to the main module)
• sharing of code and data objects between modules is achieved by linking to the objects
in a common ancestor
• the loader library is statically linked with the main module
• the system support archive library should be linked with the main module
Figure 30 shows an example of an application that has four load modules A, B, C and D.


Figure 30. Example of an application with four load modules

[Figure: the main module (printf, malloc, *exec_A, *exec_D) with load modules A (printf,
malloc, *exec_B, *exec_C), B, C and D. Modules B, C and D are labeled with the printf
and malloc symbols they use.]

In Figure 30, curved arrows (from load modules to parent module) represent load time
symbol-binding performed while the load module loads. Straight arrows (from loader module
to loaded module) represent explicit symbol address resolution performed through the
loader library API.
The following describes a possible scenario.
1. At run-time, the main module loads the module A into memory through the
rl_load_file() function.
2. The loader, in the process of loading A into memory, binds the symbol printf
(undefined in A) to the printf function defined in main.
3. The main program uses the rl_sym() function to retrieve a pointer to the function
symbol exec_A in A.
4. As for A, the main program loads the module D, and references to printf in D are
resolved to the printf in main. In addition, references to malloc in D are also resolved
to the malloc in main.
5. The main program retrieves a pointer to exec_D in D using the rl_sym() function.
6. The main program (at some point) invokes the function exec_A.
7. The exec_A function loads the two modules B and C.
8. The undefined reference to printf in B is resolved to the printf in main (the loader
searches first in A, and then in main).
9. The undefined reference to malloc in C is resolved to the malloc in A (the loader
searches for and finds it in A). Note that the malloc function called from D (malloc of
main) is then different from the malloc function called from B (or C, or A) which is the
malloc of A.
10. After retrieving symbol addresses using the rl_sym() function, module A can
indirectly call functions or reference data in B and C.
Note: At any time, the main module or the module A can unload one of the loaded modules.
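
The bottom-up binding used in this scenario can be modeled with a parent pointer per module. The sketch below is a simplified model, not RL_LIB code: module_t, module_defines() and bind_symbol() are hypothetical names, and symbols are represented by their names only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMBOLS 4

/* Hypothetical model of a module: its definitions and its parent. */
typedef struct module {
    const char          *name;
    const struct module *parent;                /* NULL for main        */
    const char          *defined[MAX_SYMBOLS];  /* unused slots are NULL */
} module_t;

/* Returns 1 if the module itself defines the symbol, 0 otherwise. */
int module_defines(const module_t *module, const char *symbol)
{
    for (size_t i = 0; i < MAX_SYMBOLS && module->defined[i] != NULL; i++) {
        if (strcmp(module->defined[i], symbol) == 0)
            return 1;
    }
    return 0;
}

/* Binds an undefined reference in 'module' by walking up from its parent
   towards main. Returns the defining module, or NULL if none is found. */
const module_t *bind_symbol(const module_t *module, const char *symbol)
{
    assert(module != NULL && symbol != NULL);
    for (const module_t *m = module->parent; m != NULL; m = m->parent) {
        if (module_defines(m, symbol))
            return m;
    }
    return NULL;
}
```

With main defining printf and malloc, A (child of main) defining malloc, and C a child of A, a malloc reference in C binds to A while a malloc reference in D (child of main) binds to main, as in steps 4 and 9.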


The relocatable code generation model


The relocatable code generation model is the same as the code generation model for the
System V model with the following differences.
• No symbol can be preempted. Dynamic symbol binding always searches the current
module first. This has the effect that a module containing a symbol definition can be
sure that it will use this definition. For example, this enables inlining in load modules.
• Weak references are treated the same way as undefined references in load modules.
Therefore, when traversing the module tree bottom-up, the first definition found is
taken.

9.4 Relocatable loader library API


The relocatable loader library provides functions for loading and unloading a module and for
accessing a symbol address in a module by name. It is provided as the library
librl.a with its associated header file rl_lib.h.
The functions defined in this API are explained in the following sections.

9.4.1 rl_handle_t type


All the functions manipulating a load module use a pointer to the rl_handle_t type. This
is an abstract type for a load module handle.
A load module handle is allocated by the rl_handle_new() function and deallocated by
the rl_handle_delete() function.
The main module handle is statically allocated and initialized in the startup code of the main
module.
A module handle references one loaded module at a time. To load another module from the
same handle, the previous module must first be unloaded.


9.4.2 Function descriptions

rl_handle_new Allocate and initialize a new handle


Definition: rl_handle_t *rl_handle_new(
const rl_handle_t *parent,
int mode);
Arguments:
parent The handle of the parent module.
mode Determines the RL_LIB chunk mode. Valid values for mode
are:
RL_ONE_CHUNK_MODE (defined to be 0)
RL_MULTIPLE_CHUNK_MODE (defined to be 1)

Returns: The newly initialized handle.


Description: The rl_handle_new() function allocates and initializes a new handle that can be
used for loading and unloading a load module.
The handle of the parent module to which the loaded module will be connected is
specified by the parent argument.
In RL_MULTIPLE_CHUNK_MODE, the mode argument activates two separate memory
allocators: rl_text_memalign for text segments and rl_data_memalign for
data segments. In RL_ONE_CHUNK_MODE, the mode argument activates one global
memory allocator rl_memalign, for any segment type.
Generally, a load module is attached to the module that calls this function, so
a handle is typically allocated as follows:
rl_handle_t *new_handle = rl_handle_new(rl_this(),
RL_ONE_CHUNK_MODE);

rl_handle_delete Finalize and deallocate a module handle


Definition: int rl_handle_delete(
rl_handle_t *handle);
Arguments:
handle The handle to deallocate.

Returns: Returns 0 for success, -1 for failure.


Description: The rl_handle_delete() function finalizes and deallocates a module handle.
The handle must not hold a loaded module. The loaded module must have been first
unloaded by rl_unload() before calling this function. If successful, the value
returned is 0. Otherwise the value returned is -1 and the error code returned by
rl_errno() is set accordingly.


rl_this Return the handle for the current module


Definition: rl_handle_t *rl_this(void);
Arguments: None.
Returns: The handle for the current module.
Description: The rl_this() function returns the handle for the current module. If called from the
main module, it returns the handle of the main module. If called from a loaded
module, it returns the handle that holds the loaded module.
This function is used when allocating a handle with rl_handle_new(). It can also
be used, for example, to retrieve a symbol in the current module:
void *symbol_ptr = rl_sym(rl_this(), "symbol");

rl_parent Return the handle for the parent of the current handle
Definition: rl_handle_t *rl_parent(void);
Arguments: None.
Returns: The handle for the parent of the current handle.
Description: The rl_parent() function returns the handle for the parent of the current handle
(as returned by rl_this()).
It may be used, for example, to find a symbol in one of the parent modules:
void *symbol_in_parents = rl_sym_rec(rl_parent(), "symbol");

rl_load_addr Return the memory load address of a loaded module


Definition: const char *rl_load_addr(
rl_handle_t *handle);
Arguments:
handle The handle for the loaded module.

Returns: The memory load address of the loaded module, or NULL.


Description: The rl_load_addr() function returns the memory load address of a loaded
module. It returns NULL if the handle does not hold a loaded module or if the handle
passed is the main program handle.

rl_load_size Return the memory load size of a loaded module


Definition: unsigned int rl_load_size(
rl_handle_t *handle);
Arguments:
handle The handle for the loaded module.

Returns: The memory load size of the loaded module, or 0.


Description: The rl_load_size() function returns the memory load size of a loaded module. It
returns 0 if the handle does not hold a loaded module or if the handle passed is the
main program handle.


rl_file_name Return the filename associated with the loaded module handle
Definition: const char *rl_file_name(
rl_handle_t *handle);
Arguments:
handle The handle for the loaded module.

Returns: The filename associated with the loaded module handle, or NULL.
Description: The rl_file_name() function returns the filename associated with the loaded
module handle. It returns NULL if no filename is associated with the current loaded
module, if the handle does not hold a loaded module or if the handle passed is the
main program handle.

rl_set_file_name Specify a filename for the handle


Definition: int rl_set_file_name(
rl_handle_t *handle,
const char *f_name);
Arguments:
handle The handle for the module.
f_name The filename to specify for the handle.

Returns: Returns 0 for success, -1 for failure.


Description: The rl_set_file_name() function is used to specify a filename for a handle. This
filename is attached to the next module that will be loaded. It can be used to specify a
filename for modules loaded from memory or to force a different filename for a
module loaded from a file.
This function returns 0 if the filename was successfully set. It returns -1, and the error
code returned by rl_errno() is set accordingly, if a module is already loaded or if the
application runs out of memory.


rl_load_buffer Load a relocatable module into memory


Definition: int rl_load_buffer(
rl_handle_t *handle,
const char *image);
Arguments:
handle The handle for the module.
image The image of the load module.

Returns: Returns 0 for success, -1 for failure.


Description: The rl_load_buffer() function loads a relocatable module into memory from the
image referenced by image.
It allocates the space for the loaded module in the heap, loads the segments from the
memory image of the loadable module, links the module to the parent module of the
handle and relocates and initializes the loaded module.
This function calls the action callback functions for RL_ACTION_LOAD after loading
and before executing any code in the loaded module.
The value 0 is returned if the loading was successful. The value -1 is returned on
failure and the error code returned by rl_errno() is set accordingly.

rl_load_file Load a relocatable module into memory from a file


Definition: int rl_load_file(
rl_handle_t *handle,
const char *f_name);
Arguments:
handle The handle for the module.
f_name The file from which to load the relocatable module.

Returns: Returns 0 for success, -1 for failure.


Description: The rl_load_file() function loads a relocatable module into memory from the file
specified by f_name.
It opens the specified file with an fopen() call, allocates the space for the loaded
module in the heap, loads the segments from the file, links the module to the parent
module of the handle, relocates and initializes the loaded module. The file is closed
with fclose() before returning. This function calls the action callback functions for
the RL_ACTION_LOAD after loading and before executing any code in the loaded
module.
The value 0 is returned if the load was successful. The value -1 is returned on failure
and the error code returned by rl_errno() is set accordingly.


rl_load_stream Load a relocatable module into memory from a byte stream


Definition: typedef int rl_stream_func_t (
void *cookie,
char *buffer,
int length);

int rl_load_stream(
rl_handle_t *handle,
rl_stream_func_t *stream_func,
void *stream_cookie);
Arguments:
handle The handle for the module.
stream_func The user specified callback function.
stream_cookie The user specified state.

Returns: Returns 0 for success, -1 for failure.


Description: The rl_load_stream() function loads a relocatable module into memory from a
byte stream provided through a user specified callback function stream_func and
the user specified state stream_cookie.
The callback function must be of type rl_stream_func_t. It is called multiple times
by the loader to retrieve the load module data in the buffer buffer of length length
until the module is loaded into memory. The loader always calls the callback function
with a buffer length strictly greater than 0. The stream_cookie argument passed to
rl_load_stream is passed to the callback function in its cookie parameter. The
cookie parameter is intended to be used by the callback function to update a private
state.
The callback function must return the number of bytes transferred. If the returned
value is less than the given buffer length or is -1, rl_load_stream() will in turn
return an error and the error code returned by rl_errno() is set accordingly.
The rl_load_stream() function allocates the space for the loaded module from
the heap, loads the segments by calling the callback function, links the module to the
parent module of the handle, relocates and initializes the loaded module. This
function calls the action callback functions for RL_ACTION_LOAD after loading and
before executing any code in the loaded module.
The value 0 is returned if the load was successful. The value -1 is returned on failure
and the error code returned by rl_errno() is set accordingly.
This function can be used as an alternative to rl_load_buffer() or
rl_load_file() to allow any loading method to be implemented.


The following example illustrates how the rl_load_file() function may be
implemented using the rl_load_stream() function:
#include <stdio.h>
#include "rl_lib.h"

/* User implementation of the callback function that reads from
   a file. The signature matches rl_stream_func_t, so no cast is
   needed when passing it to rl_load_stream(). */
static int rl_stream_read(void *cookie, char *buffer, int length)
{
    FILE *file = (FILE *)cookie;
    /* fread() returns the number of bytes actually read. */
    return (int)fread(buffer, 1, (size_t)length, file);
}
...
{
    /* Loads the module from a file. */
    FILE *file;
    int status;
    file = fopen(f_name, "rb");
    if (file == NULL) { /*... error... */ }
    status = rl_load_stream(handle, rl_stream_read, file);
    if (status == -1) { /*... error... */ }
    fclose(file);
}
...
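
As a further illustration of the cookie mechanism, a callback can also stream a module image that is already in memory. The sketch below is an assumption-based example: mem_stream_t and mem_stream_read are hypothetical names, and the typedef is repeated from the rl_load_stream() definition only so that the fragment compiles on its own (on target it comes from rl_lib.h).

```c
#include <assert.h>
#include <string.h>

/* Callback type as given in the rl_load_stream() definition. */
typedef int rl_stream_func_t(void *cookie, char *buffer, int length);

/* Hypothetical private state: an image in memory and a read position. */
typedef struct {
    const char *image;
    int         size;
    int         position;
} mem_stream_t;

/* Copies up to 'length' bytes from the image and advances the position.
   Returning fewer bytes than requested makes the loader report an error. */
int mem_stream_read(void *cookie, char *buffer, int length)
{
    mem_stream_t *stream = (mem_stream_t *)cookie;
    assert(stream != NULL && buffer != NULL && length > 0);
    int remaining = stream->size - stream->position;
    int count = (length < remaining) ? length : remaining;
    memcpy(buffer, stream->image + stream->position, (size_t)count);
    stream->position += count;
    return count;
}
```

A pointer to a mem_stream_t would be passed as the stream_cookie argument of rl_load_stream(), with mem_stream_read as stream_func.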

rl_unload Unload a previously loaded relocatable module


Definition: int rl_unload(
rl_handle_t *handle);
Arguments:
handle The handle for the module.

Returns: Returns 0 for success, -1 for failure.


Description: The rl_unload() function unloads a previously loaded relocatable module. It
finalizes, unlinks, and frees allocated memory for the loaded module. This function
calls the action callback functions for RL_ACTION_UNLOAD before unloading and
after having executed finalization code in the module.
The return value is 0 if the unloading is successful, otherwise the return value is -1
and the error code returned by rl_errno() is set accordingly.


rl_sym Return a pointer reference to the symbol in the loaded module


Definition: void *rl_sym(
rl_handle_t *handle,
const char *name);
Arguments:
handle The handle for the loaded module.
name The symbol in the loaded module.

Returns: The pointer reference to the symbol.


Description: The rl_sym() function returns a pointer reference to the symbol named name in the
loaded module specified by handle. It searches the dynamic symbol table of the
loaded module and returns a pointer to the symbol. The handle parameter can be
the handle of any currently loaded module, or the handle of the main module.
If the symbol is not defined in the loaded module, NULL is returned. It is not generally
an error for this function to return NULL. For example, the user may conditionally call
a specific function only if it is defined in the module.
In this function, as well as in the rl_sym_rec() function, the name parameter must
be the mangled symbol name. For instance, on some targets, C names are mangled
by prefixing the name with an underscore (_). For example, to return a reference to
the printf() function on such a target, the symbol name passed to rl_sym() is "_printf".
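The mangling rule described above can be sketched as a small helper that builds the name to pass to rl_sym(). This is only an illustration: the helper name mangle_c_name() is hypothetical and not part of the loader library, and whether a given target actually applies the underscore prefix is an assumption to check against the toolset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the loader library: writes "_" followed
 * by name into out. Returns 0 on success, -1 if out is too small. */
static int mangle_c_name(const char *name, char *out, size_t out_size)
{
    size_t len;

    assert(name != NULL && out != NULL);
    len = strlen(name);
    if (len + 2 > out_size)           /* prefix + name + terminating NUL */
        return -1;
    out[0] = '_';
    memcpy(out + 1, name, len + 1);   /* copies the terminating NUL too */
    return 0;
}
```

The resulting string would then be passed as the name argument of rl_sym(), and the call to the optional function skipped when rl_sym() returns NULL.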

rl_sym_rec Return a pointer reference to the symbol in the loaded module or one of its ancestors

Definition: void *rl_sym_rec(
rl_handle_t *handle,
const char *name);
Arguments:
handle The handle for the loaded module.
name The symbol in the loaded module.

Returns: The pointer reference to the symbol.


Description: The rl_sym_rec() function returns a pointer reference to the symbol named name
in the loaded module specified by handle or one of its ancestors.
This function searches the dynamic symbol table of the loaded module and returns a
pointer to the symbol if found. If the symbol is not found, the function iteratively
searches in the dynamic symbol table of the parent module until the symbol is found.
The handle parameter can be the handle of any currently loaded module, or the
handle of the main module.
If the symbol is not defined in the loaded module or one of its ancestors, NULL is
returned. It is not generally an error for this function to return NULL.
The name parameter must be the mangled symbol name as for the rl_sym()
function.
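The difference between the two lookups can be illustrated with a small model. The types and functions below (model_symbol, model_module, model_sym(), model_sym_rec()) are stand-ins invented for this sketch; the real loader uses its own opaque handle and dynamic symbol table, so this shows only the search order described above, not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in types for illustration only. */
struct model_symbol {
    const char *name;
    void *addr;
};

struct model_module {
    const struct model_module *parent;   /* NULL for the main module */
    const struct model_symbol *symbols;
    size_t nsymbols;
};

/* Search one module only, in the manner described for rl_sym(). */
static void *model_sym(const struct model_module *mod, const char *name)
{
    size_t i;

    assert(mod != NULL && name != NULL);
    for (i = 0; i < mod->nsymbols; i++) {
        if (strcmp(mod->symbols[i].name, name) == 0)
            return mod->symbols[i].addr;
    }
    return NULL;
}

/* Search the module, then each ancestor in turn, in the manner described
 * for rl_sym_rec(). Returns NULL if no module in the chain defines name. */
static void *model_sym_rec(const struct model_module *mod, const char *name)
{
    for (; mod != NULL; mod = mod->parent) {
        void *addr = model_sym(mod, name);
        if (addr != NULL)
            return addr;
    }
    return NULL;
}
```

In this model, a symbol defined only in the parent is invisible to model_sym() on the child but is found by model_sym_rec().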


rl_foreach_segment Iterate over all the segments of the loaded module and call the supplied function

Definition: typedef struct rl_segment_info_t_ rl_segment_info_t;
typedef int rl_segment_func_t (
rl_handle_t *handle,
rl_segment_info_t *seg_info,
void *cookie);

int rl_foreach_segment(
rl_handle_t *handle,
rl_segment_func_t *callback_fn,
void *callback_cookie);
Arguments:
handle The handle for the module.
callback_fn The user specified callback function.
callback_cookie The argument to pass to the function.

Returns: Returns 0 for success, -1 for failure.


Description: The rl_foreach_segment() function iterates over all the segments of the loaded
module handle and calls back the user supplied function. For each segment, the
function callback_fn is called with the following parameters.
handle The handle passed to the function.
seg_info The segment information pointer filled with the current
segment information.
cookie The callback_cookie argument passed to the function.

The segment information returned in seg_info is a pointer to the following structure:


typedef unsigned int rl_segment_flag_t;
#define RL_SEG_EXEC 1
#define RL_SEG_WRITE 2
#define RL_SEG_READ 4
struct rl_segment_info_t_ {
const char *seg_addr;
unsigned int seg_size;
rl_segment_flag_t seg_flags;
};
The user callback function must return 0 on success or -1 on error.
If the callback function returns an error, the rl_foreach_segment() function
returns -1 and the error code returned by rl_errno() is set to RL_ERR_SEGMENTF.
Otherwise the function returns 0.
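As an illustration of the callback contract described above, the following sketch adds up the size of the executable segments. The type and flag declarations are copied from this section as stand-ins so that the fragment compiles without the loader header; the opaque rl_handle_t declaration and the callback name sum_exec_bytes() are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in declarations mirroring the definitions shown above. */
typedef struct rl_handle_t_ rl_handle_t;      /* assumed opaque */
typedef unsigned int rl_segment_flag_t;
#define RL_SEG_EXEC  1
#define RL_SEG_WRITE 2
#define RL_SEG_READ  4
typedef struct rl_segment_info_t_ {
    const char *seg_addr;
    unsigned int seg_size;
    rl_segment_flag_t seg_flags;
} rl_segment_info_t;

/* Callback with the rl_segment_func_t signature: adds the size of each
 * executable segment to the unsigned int that the cookie points to. */
static int sum_exec_bytes(rl_handle_t *handle, rl_segment_info_t *seg_info,
                          void *cookie)
{
    unsigned int *total = cookie;

    (void)handle;                 /* not needed by this callback */
    assert(cookie != NULL);       /* the caller must supply a counter */
    if (seg_info == NULL)
        return -1;                /* would surface as RL_ERR_SEGMENTF */
    if (seg_info->seg_flags & RL_SEG_EXEC)
        *total += seg_info->seg_size;
    return 0;
}
```

With the real library, the callback would be passed as rl_foreach_segment(handle, sum_exec_bytes, &total), with total initialized to 0 beforehand.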


rl_add_action_callback Add a user action callback function to the user action callback list

Definition: typedef unsigned int rl_action_t;
#define RL_ACTION_LOAD 1
#define RL_ACTION_UNLOAD 2
#define RL_ACTION_ALL ((rl_action_t)-1)

typedef int rl_action_func_t (
rl_handle_t *handle,
rl_action_t action,
void *cookie);

int rl_add_action_callback(
rl_action_t action_mask,
rl_action_func_t *callback_fn,
void *callback_cookie);
Arguments:
action_mask The set of actions for which the callback function must be
called.
callback_fn The user specified callback function.
callback_cookie The argument to pass to the function.

Returns: Returns 0 for success, -1 for failure.


Description: The rl_add_action_callback() function adds a user action callback function to
the user action callback list. It can be called multiple times with different callback
functions. The same callback function cannot be added more than once.
For each defined action, each callback function is called in the order it was added into
the callback list. The callback functions are not attached to a particular module and
are called for any further loaded/unloaded modules.
This function returns 0 on success and -1 on failure. It does not set any error codes.
This function can fail if the callback function is already in the callback list or if the
program runs out of memory.
The rl_action_t type defines the action flags for module loading/unloading and is
passed to the action function callback. The action flags can be OR-ed to create an
action mask that can be passed to the function rl_add_action_callback(). The
actions defined are:
RL_ACTION_LOAD The callback is called just after the module has been loaded in
memory and cache has been synchronized. No module code
has been executed.
RL_ACTION_UNLOAD The callback is called just before the module is unloaded from
memory. No module code will be executed after this point.
RL_ACTION_ALL The callback will be called for any action.


The type for the user action callback function is rl_action_func_t. The
parameters passed to the callback function when it is called are:
handle The handle that performed the action.
action The action performed.
cookie The callback_cookie parameter passed to
rl_add_action_callback().

The callback function returns 0 on success and -1 on failure. In the case of failure, the
loading (or unloading) of the module is undone and the error code returned by
rl_errno() is set to RL_ERR_ACTIONF.
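A minimal sketch of an action callback is given below. It keeps a count of loaded modules in a user structure passed as the cookie. The declarations are stand-ins copied from the definition above; the module_counter type and the count_modules() name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in declarations mirroring the definitions shown above. */
typedef struct rl_handle_t_ rl_handle_t;      /* assumed opaque */
typedef unsigned int rl_action_t;
#define RL_ACTION_LOAD   1
#define RL_ACTION_UNLOAD 2
#define RL_ACTION_ALL    ((rl_action_t)-1)

/* Hypothetical cookie type: counts the modules currently loaded. */
struct module_counter {
    int loaded;
};

/* Callback with the rl_action_func_t signature. */
static int count_modules(rl_handle_t *handle, rl_action_t action, void *cookie)
{
    struct module_counter *counter = cookie;

    (void)handle;
    assert(counter != NULL);
    if (action == RL_ACTION_LOAD) {
        counter->loaded++;
    } else if (action == RL_ACTION_UNLOAD) {
        if (counter->loaded == 0)
            return -1;            /* unbalanced unload: report failure */
        counter->loaded--;
    }
    return 0;
}
```

Registration would then take the form rl_add_action_callback(RL_ACTION_LOAD | RL_ACTION_UNLOAD, count_modules, &counter). Because a -1 return undoes the load or unload in progress, a callback of this kind should report failure only when the operation really must not proceed.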

rl_delete_action_callback Remove the given function from the action callback list

Definition: int rl_delete_action_callback(
rl_action_func_t *callback_fn);
Arguments:
callback_fn The user specified callback function.

Returns: Returns 0 for success, -1 if the callback was not present in the callback list.
Description: The rl_delete_action_callback() function removes the specified callback
function from the action callback list. This function returns 0 if the callback was
removed, or -1 if it was not present in the callback list. No error code is set.

rl_errno Return the error code for the last failed function
Definition: int rl_errno(
rl_handle_t *handle);
Arguments:
handle The handle for the module.

Returns: The error code for the last failed function.


Description: The rl_errno() function returns the error code for the last failed function. Table 31
lists the possible codes.

Table 31. Errors returned by rl_errno()

Error code       Diagnostic                                  Possible error causing function

RL_ERR_NONE      No previous call has failed.

RL_ERR_MEM       Ran out of memory (rl_memalign(),           rl_load_buffer(),
                 rl_text_memalign() or                       rl_load_file(),
                 rl_data_memalign() failed).                 rl_load_stream(),
                                                             rl_set_file_name()

RL_ERR_ELF       The load module is not a valid ELF file.    rl_load_buffer(),
                                                             rl_load_file(),
                                                             rl_load_stream(),
                                                             rl_set_file_name()

RL_ERR_DYN       The load module is not a dynamic library.   rl_load_buffer(),
                                                             rl_load_file(),
                                                             rl_load_stream(),
                                                             rl_set_file_name()

RL_ERR_SEG       The load module has invalid segment         rl_load_buffer(),
                 information.                                rl_load_file(),
                                                             rl_load_stream(),
                                                             rl_set_file_name()

RL_ERR_REL       The load module contains invalid            rl_load_buffer(),
                 relocations.                                rl_load_file(),
                                                             rl_load_stream(),
                                                             rl_set_file_name()

RL_ERR_RELSYM    A symbol was not found at load time.        rl_load_buffer(),
                 rl_errarg() returns the symbol name.        rl_load_file(),
                                                             rl_load_stream(),
                                                             rl_set_file_name()

RL_ERR_SYM       The symbol is not defined in the module.    rl_sym(),
                 rl_errarg() returns the symbol name.        rl_sym_rec()

RL_ERR_FOPEN     The file cannot be opened by rl_fopen().    rl_load_file()

RL_ERR_FREAD     Error while reading the file in             rl_load_file()
                 rl_fread().

RL_ERR_STREAM    Error while loading the file from a         rl_load_stream()
                 stream.

RL_ERR_LINKED    Module handle is already linked.            rl_load_file(),
                                                             rl_load_buffer(),
                                                             rl_load_stream(),
                                                             rl_handle_delete()

RL_ERR_NLINKED   Module handle is not linked.                rl_unload(),
                                                             rl_sym(),
                                                             rl_sym_rec(),
                                                             rl_foreach_segment()

RL_ERR_SEGMENTF  Error in segment function callback.         rl_foreach_segment()

RL_ERR_ACTIONF   Error in action function callback.          rl_load_file(),
                                                             rl_load_buffer(),
                                                             rl_load_stream()


rl_errarg Return the name of the symbol that could not be resolved
Definition: const char *rl_errarg(
rl_handle_t *handle);
Arguments:
handle The handle for the module.

Returns: The name of the symbol that could not be resolved.


Description: If rl_errno() returns either RL_ERR_RELSYM or RL_ERR_SYM, the rl_errarg()
function returns the name of the symbol that could not be resolved.

rl_errstr Return a string for an error code


Definition: const char *rl_errstr(
rl_handle_t *handle);
Arguments:
handle The handle for the module.

Returns: A string for the error code.


Description: The rl_errstr() function returns a readable string for the error code reported by
rl_errno(). For example:
...
void *sym = rl_sym(handle, "symbol");
if (sym == NULL) fprintf(stderr, "failed: %s\n",
rl_errstr(handle));
...
If symbol is not defined in the module referenced by handle then the following
message is displayed:
failed: symbol not found: symbol


9.5 Customization
The relocatable loader library defines a number of functions that it uses internally to
provide services such as heap memory management and file access. The application in the
main module can override these functions to provide a custom implementation.

9.5.1 Memory allocation


These functions allocate free space for the load module image and for the handle objects:
void *rl_malloc(int size);
void *rl_memalign(int align, int size);
void *rl_text_memalign(int align, int size);
void *rl_data_memalign(int align, int size);
void rl_free(void *ptr);
Where:
• rl_memalign is valid only in RL_ONE_CHUNK_MODE
• rl_text_memalign is valid only in RL_MULTIPLE_CHUNK_MODE for text segments
• rl_data_memalign is valid only in RL_MULTIPLE_CHUNK_MODE for data segments
The default behavior for these functions is to call the standard C library functions
malloc(), memalign() and free() respectively.
Note: If providing a custom implementation, override all three functions.
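The following sketch shows one possible shape for such an override, using the signatures listed above and serving every request from a single static pool. The pool size, the arena_alloc() helper, the choice of 8-byte alignment for rl_malloc() and the no-op rl_free() are all assumptions made for illustration; a real override may need separate pools for text and data segments and a proper release strategy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE 4096           /* hypothetical pool size */

static unsigned char arena[ARENA_SIZE];
static size_t arena_used;

/* Bump allocation from the static pool; align must be a power of two. */
static void *arena_alloc(int align, int size)
{
    uintptr_t base, start;
    size_t offset;

    assert(arena_used <= ARENA_SIZE);
    if (align <= 0 || size < 0 || (align & (align - 1)) != 0)
        return NULL;
    base = (uintptr_t)arena;
    start = (base + arena_used + (uintptr_t)align - 1)
            & ~((uintptr_t)align - 1);
    offset = (size_t)(start - base);
    if (offset > ARENA_SIZE || (size_t)size > ARENA_SIZE - offset)
        return NULL;              /* pool exhausted */
    arena_used = offset + (size_t)size;
    return arena + offset;
}

void *rl_malloc(int size)                   { return arena_alloc(8, size); }
void *rl_memalign(int align, int size)      { return arena_alloc(align, size); }
void *rl_text_memalign(int align, int size) { return arena_alloc(align, size); }
void *rl_data_memalign(int align, int size) { return arena_alloc(align, size); }

/* A bump allocator cannot release individual blocks, so this is a no-op. */
void rl_free(void *ptr)
{
    (void)ptr;
}
```

Returning NULL when the pool is exhausted matches the failure that Table 31 associates with RL_ERR_MEM.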

9.5.2 File management


The rl_load_file() function uses these functions to open, read and close a file handle:
void *rl_fopen(const char *f_name, const char *mode);
int rl_fclose(void *file);
int rl_fread(char *buffer, int eltsize, int nelts, void *file);
The default behavior for these functions is to call the standard C library functions fopen(),
fread() and fclose() respectively.
Note: If providing a custom implementation, override all three functions and link them with the
main program.
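As a sketch of such an override, the functions below serve rl_load_file() from a module image that is already in memory rather than from a file system. The mem_file structure and the mem_file_register() helper are hypothetical. The return conventions (NULL from rl_fopen() on failure, an element count from rl_fread(), 0 or -1 from rl_fclose()) are assumptions modelled on the standard C library functions that the defaults call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory "file": a module image already present in memory. */
struct mem_file {
    const char *name;
    const char *data;
    size_t size;
    size_t pos;
    int is_open;
};

static struct mem_file the_image;   /* a single image, for brevity */

/* Hypothetical registration helper called by the application at start-up. */
void mem_file_register(const char *name, const void *data, size_t size)
{
    assert(name != NULL && data != NULL);
    the_image.name = name;
    the_image.data = data;
    the_image.size = size;
    the_image.pos = 0;
    the_image.is_open = 0;
}

void *rl_fopen(const char *f_name, const char *mode)
{
    (void)mode;                     /* the image is read-only */
    if (the_image.name == NULL || f_name == NULL ||
        strcmp(f_name, the_image.name) != 0 || the_image.is_open)
        return NULL;
    the_image.pos = 0;
    the_image.is_open = 1;
    return &the_image;
}

/* Returns the number of complete elements read, like fread(). */
int rl_fread(char *buffer, int eltsize, int nelts, void *file)
{
    struct mem_file *f = file;
    size_t avail, want;

    if (f == NULL || !f->is_open || eltsize <= 0 || nelts <= 0)
        return 0;
    avail = (f->size - f->pos) / (size_t)eltsize;
    want = (size_t)nelts < avail ? (size_t)nelts : avail;
    memcpy(buffer, f->data + f->pos, want * (size_t)eltsize);
    f->pos += want * (size_t)eltsize;
    return (int)want;
}

int rl_fclose(void *file)
{
    struct mem_file *f = file;

    if (f == NULL || !f->is_open)
        return -1;
    f->is_open = 0;
    return 0;
}
```

All three functions share the same handle representation, which is one reason why they are overridden together.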

9.6 Building a relocatable library or main module


To build a relocatable library that can be loaded by the RL_LIB loader, additional compile
time and link time options must be used.
The following is a simple example of building a hello world loadable module:
stxp70cc -o rl_hello.o -fpic -Mgot=small -c rl_hello.c
stxp70cc -o rl_hello.rl --rlib rl_hello.o
Alternatively, the compile and link phases can be carried out with a single command:
stxp70cc -o rl_hello.rl -fpic -Mgot=small --rlib rl_hello.c
To build a main module suitable for loading a relocatable library, specific link time options
are required. No specific compile time options are required for the main module.


The following is an example of building a main module:


stxp70cc -o prog.o prog.c
stxp70cc -o prog.exe --rmain prog.o
The compile and link phases can be carried out with a single command:
stxp70cc -o prog.exe --rmain prog.c

9.6.1 Importing and exporting symbols


For the relocatable loader system to function, the main module (or a loaded module) must
provide services to the other load modules. To avoid a load error when loading a module, it
is usual for the referenced symbols to be linked into the main module.
When the services are present in a library, the main module imports the corresponding
symbols at link time. However, to import symbols, the linker requires an import script.
stxp70-rltool generates a list of symbols in the form of an import or export script from the
specified input files. The input files are either load modules (relocatable libraries) or a
text file containing a list of symbols:
• An import script is generated from a list of symbols specified in the file symbol_list
(which must have only one symbol on each line), or from one or more
load module files. In the latter case, the stxp70-rltool utility generates an import script
from the set of symbols that the load modules require.
• An export script can be generated to reduce the size of the dynamic symbol table in the
main module or load modules. An export script is not mandatory as all global symbols
are exported by default.
The export script defines the set of symbols (and only these) that must be exported to
the other modules through the dynamic symbol table. These symbols are then
accessible by the load time symbol binding process and by the calls to rl_sym() and
rl_sym_rec().
This utility has both a generic driver stxp70-rltool as well as version specific commands to
invoke it: stxp70v3-rltool and stxp70v4-rltool. All versions of the utility are documented in
the STxP70 utilities reference manual (8210925).
Note: stxp70v3-rltool and stxp70v4-rltool are identical in terms of options and arguments.

Using the relocatable loader import/export utility


This section provides some examples of using the relocatable loader import/export utility.
Two common scenarios where an import script might be generated are:
• When the required services are well defined and a list of symbols can be passed to the
stxp70-rltool utility.
• When the list of services is not defined but the load modules are available and can be
passed to the stxp70-rltool utility. The stxp70-rltool utility generates an import script
from the set of symbols that the load modules require.
The following command generates an import script from a list of symbols specified in the file
prog_import.lst (one symbol per line):
stxp70-rltool -mcore=[stxp70v3|stxp70v4] -i -s -o prog_import.ld prog_import.lst


The following command generates an import script from a list of load modules that the
main module can load, liba.rl and libb.rl:
stxp70-rltool -mcore=[stxp70v3|stxp70v4] -i -o prog_import.ld liba.rl libb.rl

Use the import script to link the main module, for example:
stxp70cc -o prog.exe --rmain object_files.o prog_import.ld

Two common scenarios where an export script might be generated are:


• When an import script is required for the module, the export script can be generated at
the same time. This is because the symbols to export are generally those that are
imported.
• For a load module that has a well known external interface, the export script can be
generated from a list of symbols to export.
The following example shows how to generate an export script and import script for a list of
modules that is then used when linking the main module. Only the symbols from liba.rl
and libb.rl are imported into the main module and exported by it.
stxp70-rltool -mcore=[stxp70v3|stxp70v4] -i -e -o prog_import_export.ld liba.rl libb.rl
stxp70cc -o prog.exe --rmain object_files.o prog_import_export.ld

To generate an export script for a load module with a well defined interface specified in the
file liba_export.lst (one symbol per line):
stxp70-rltool -mcore=[stxp70v3|stxp70v4] -e -s -o liba_export.ld liba_export.lst
stxp70cc -o liba.rl --rlib *.o liba_export.ld

9.6.2 Optimization options


When compiling a load module with the -fpic -Mgot=small options, some overhead
occurs in the generated code to access functions and data objects. Compiler options and
C language extensions can be used to reduce this overhead.
Relocatable libraries are not subject to symbol preemption, therefore, when generating
position independent code, the -fvisibility=protected option can be used in addition
to -fpic -Mgot. The -fvisibility=protected option enables the inlining of global
functions and can be used as a default option for compiling relocatable libraries. For
example:
stxp70cc -o a.o -fpic -Mgot=small -fvisibility=protected a.c
In addition to this option, fine grain visibility can be specified with the
__attribute__((visibility(...))) GNU C extension at the source code level.
For example, if the external interface of a load module is well defined in a header file, the
__attribute__((visibility("protected"))) attribute can be attached to each function of
the external interface. To specify that all other defined functions are internal to the load
module, use the -fvisibility=hidden option on the command line. This combination
of options optimizes references from the same file to global objects that are not part of the
interface.
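The source-level annotation can be sketched as follows. The function names mod_scale() and mod_clamp() are hypothetical; the attribute syntax is the GNU C visibility attribute, and the sketch assumes a compiler and object format that accept both the protected and hidden values.

```c
#include <assert.h>

/* Part of the module's external interface: protected visibility keeps the
 * symbol exported while references from inside the module bind locally. */
__attribute__((visibility("protected")))
int mod_scale(int value, int factor);

/* Internal helper: hidden visibility keeps it out of the dynamic symbol
 * table. It could equally be left unannotated when the file is compiled
 * with -fvisibility=hidden. */
__attribute__((visibility("hidden")))
int mod_clamp(int value, int lo, int hi)
{
    assert(lo <= hi);
    if (value < lo)
        return lo;
    if (value > hi)
        return hi;
    return value;
}

int mod_scale(int value, int factor)
{
    return mod_clamp(value * factor, -1000, 1000);
}
```

In this arrangement only mod_scale() would need to appear in the interface header of the load module.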
To specify the visibility of each symbol externally, in a given <file>, use the
-mvisibility-decl=<file> option. In the case where the external services required by
a module (default visibility) and the external services provided by the module (protected
visibility) are known, all other functions or data objects can be declared as internal (hidden
visibility). This option can be used to specify these visibility declarations. In this case, only
the functions that are external have an associated overhead. The other internal functions
have a very reduced overhead.
For a full inter-procedural optimization of the relocatable library, use the -ipa option. In this
case, when combined with the declaration of external functions, the library is generated with
a minimal overhead for the dynamic linking support.
For detailed information on the visibility specification, refer to the compiler options
documentation and to the ELF System V Dynamic Linking ABI.

9.7 Debugging support


The debugging of dynamically loaded modules is possible in the same way as for System V
dynamic shared objects. The main module debugging information loads at load time of the
application. The load modules debugging information loads at load time of the load
modules.
To update debugging information, the loader maintains a list of loaded modules together
with their filenames (the file contains the debugging information) and the load address of the
module. Each time a new module loads, the loader calls a specific function. The debugger
has to set a breakpoint on this specific function and, when the breakpoint is hit, traverse the
list to find new loaded modules and load the debugging information.
For the STxP70 toolset, the debugger implements the required mechanism for the
automatic debugging of loaded modules.
To find the file that contains the debug information, the loader must know the path to the
load module. This is automatic in the case of rl_load_file() as the filename is specified
in the interface. For the rl_load_buffer() and rl_load_stream() functions, the user
must set the filename with a call to the rl_set_file_name() function.
For example, the following code enables automatic debugging of a load module loaded with
rl_load_buffer():
{
    int status;
    rl_handle_t *handle = rl_handle_new(rl_this(), 0);
    if (handle == NULL) { /* error */ }
#ifdef DEBUG_ENABLED
    rl_set_file_name(handle, "path_to_the_file_for_the_module");
#endif
    status = rl_load_buffer(handle, module_image);
    if (status == -1) { /* error */ }
    ...
}


9.8 Profiling support


The action callbacks may be used with a profiling support library, or alternatively, a user
defined package can be informed that a segment has just been loaded or is on the point of
being unloaded by using the user action callback interface.
Below is an example that iterates over the segment list and declares the executable
segments to a profiling support library on the loading/unloading of a module.
static int segment_profile(rl_handle_t *handle, rl_segment_info_t *info,
                           void *cookie)
{
    rl_action_t action = *((rl_action_t *)cookie);
    const char *file_name = handle_file_name(handle);
    if (file_name != NULL && (info->seg_flags & RL_SEG_EXEC)) {
        if (action == RL_ACTION_LOAD) {
            /* Call profiling interface for adding a code region. */
            profiler_add_region(file_name, info->seg_addr, info->seg_size);
        }
        if (action == RL_ACTION_UNLOAD) {
            /* Call profiling interface for removing a code region. */
            profiler_remove_region(file_name, info->seg_addr, info->seg_size);
        }
    }
    return 0;
}

static int module_profile(rl_handle_t *handle, rl_action_t action,
                          void *cookie)
{
    /* Pass the action to the segment callback through its cookie. */
    rl_foreach_segment(handle, segment_profile, (void *)&action);
    return 0;
}

int main()
{
    ...
    if (rl_add_action_callback(RL_ACTION_ALL, module_profile, NULL) == -1) {
        fprintf(stderr, "rl_add_action_callback failed\n");
        exit(1);
    }
    ...
    status = rl_load_file(handle, file_name);
    ...
    return 0;
}


9.9 Memory protection support


When a new library segment has loaded into memory or is on the point of being unloaded
from memory, a system library (or the user) can use the user-action callback interface to
install a memory protection scheme.
To set up user protection support, use the user-action callback interface (see
Section 9.8: Profiling support).

9.10 STxP70 targeting of RL_LIB


A basic MUTEX implementation is provided in the STxP70 targeting of the pre-compiled
RL_LIB, delivered with the toolset. In addition, because there is no cache activated on the
STxP70, specific functions such as bsp_cache_purge_data and
bsp_cache_invalidate_instruction (which respectively purge the data cache and
handle instruction cache invalidation) are not implemented.
It is the programmer’s responsibility to implement those functions depending on the platform
and STxP70 architecture used. Table 32 provides details of the files’ location in the toolset
distribution.

Table 32. RL_LIB source file location

Functionality                    Source file

STxP70 v3 MUTEX implementation   <RL_LIB_root>/librl/config/stxp70v3/sys_mutex.[c|h]
STxP70 v3 Cache management       <RL_LIB_root>/librl/config/stxp70v3/targ_elf.[c|h]
STxP70 v4 MUTEX implementation   <RL_LIB_root>/librl/config/stxp70v4/sys_mutex.[c|h]
STxP70 v4 Cache management       <RL_LIB_root>/librl/config/stxp70v4/targ_elf.[c|h]


10 Compiler bugs

This chapter describes the different categories of compiler bugs and how they should be
reported to STMicroelectronics.

10.1 Identifying a compiler bug

10.1.1 Category 1
The following cases are compiler or toolset bugs:
• the compilation phase ends with an assertion message
• the compilation phase ends with a system error message (core dump, bus error)
• the compilation phase produces an output that cannot be assembled
• the compilation phase never ends, or at least does not end in a reasonable amount of
time
• the compiler produces an error message for code that is valid input
• the compiler produces code that does not compute the expected results (but see
Section 10.1.2)

10.1.2 Category 2
The following case is possibly not a compiler or toolset bug.
• The code is functional under a specific optimization level, but not under another. This
may be due to an existing code bug that is only exposed by aggressive optimization.

10.2 Checks performed by user


The following checks should be performed on your code before reporting a bug:
• check that the code works correctly on at least one other compiler, on another host
• check that the code does not access out-of-bound memory
• check that the source code does not raise any warning when compiled with the -Wall
option
• check that the source code does not make assumptions that may be false: specifically
check restrict annotations, and optimization pragmas
• check that the code does not exercise language edges or does not violate language
standards: an example of undefined behavior is to assume a specific behavior of shift
operators when the shift amount is negative or bigger than the size of the type shifted
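The shift case mentioned in the last check can be made well defined with an explicit range test, as in the following sketch. The helper name shl32_checked() and the choice of returning 0 for an out-of-range shift amount are assumptions for this example; the appropriate result depends on the application.

```c
#include <assert.h>
#include <stdint.h>

/* Left shift with a defined result for every shift amount: amounts outside
 * 0..31 yield 0 instead of relying on undefined behavior. */
static uint32_t shl32_checked(uint32_t value, int amount)
{
    if (amount < 0 || amount >= 32)
        return 0;
    return (uint32_t)(value << amount);
}

/* Quick self-check of the helper. */
static void shl32_checked_selftest(void)
{
    assert(shl32_checked(1u, 4) == 16u);
    assert(shl32_checked(1u, 40) == 0u);
}
```

Code written this way behaves identically at every optimization level, which removes one common source of category 2 reports.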


10.3 Workaround
The following steps can be carried out to temporarily work around a compiler bug.
1. Demote the optimization level to -O1 or -O0 when compiling the specific file creating
the problem, either in category 1 or 2. (See Section 10.1.1 and Section 10.1.2.)
2. Remove the optimization pragmas or restrict annotations.
3. Finally, check that you have an up-to-date compiler release.

10.4 Reporting a compiler bug


Carry out the following if a compiler bug is encountered.
1. Obtain your compiler version by running the command stxp70cc -version.
2. If the compiler bug is in category 1 (see Section 10.1.1), prepare a pre-processed input
file that can reproduce the problem.
3. If the compiler bug is in category 2 (see Section 10.1.2), prepare a source set and
Makefile that can reproduce the problem.
4. Supply the full command line that generates the problem.
5. Report the result of the following command in the shell that you use: uname -a.
6. Prepare a description of the expected result and the actual result.
7. Report all the above information through your local ST Field Applications Engineer
(FAE).
Finally, when in doubt, it is preferable to report a possible bug than to ignore it.

10.5 Known bugs and limitations


Please refer to the Release note supplied with the toolset for an up-to-date list of bugs and
limitations.


11 Revision history

Table 33. Document revision history

Date         Revision  Changes

Earlier revision history entries deleted as they are no longer pertinent.

05-Mar-2012  11        Update for STxP70 toolset 2012.1.
                       Updated Documentation suite on page 9 to remove references to STxP70
                       assembler documents. The assembler as is documented in the GNU
                       documents, supplied with the toolset.
                       Updated Inlining criteria on page 56 to change -INLINE:none to
                       -INLINE:off.
                       Added the option -INLINE:size_static and updated the description of
                       -INLINE:all in Table 21 on page 56.
                       Added Inlining static functions on page 57.

17-May-2012  12        Update for STxP70 toolset 2012.1 patch 001.
                       Table 15: Code generation options on page 31 updated
                       -mlib-short-double and added -mlib-nofloat.
                       Table 19: C99 support in stxp70cc on page 42 updated throughout.

19-Sep-2012  13        Update for STxP70 toolset 2012.2.
                       Updated Table 6 and Table 7 to add config options bypass and bhb.
                       Updated Table 13 to add -o4 optimization option.
                       Updated Table 14, --deadcode and -f[no]unroll-loops options.
                       Updated Table 15 to add -maggressive_unroll option.
                       Updated Table 20 optimization levels.
                       Updated Table 21, -INLINE:size_static to add -o4 optimization level.
                       Added Section 4.2: Loop unrolling on page 63.
                       Updated Table 27, -IPA:mem_placement to include -o4 optimization level.
                       Updated Section 6.4: Restrictions on page 112.
                       Updated Section 8.2.1: Compiler options on page 122.

28-Jan-2013  14        Update for STxP70 toolset 2012.2. Update 01.
                       Updated rl_handle_new on page 145 to add mode argument.
                       Updated rl_errno on page 154 to expand description of RL_ERR_MEM
                       error code.
                       Updated Section 9.5.1: Memory allocation on page 157.

08-May-2013  15        Update for STxP70 toolset 2013.1.
                       Corrected syntax for FPx registers in Chapter 6: GNU ASM on page 109.
                       Added options to control warnings generated for -fpack-struct in
                       Table 12 on page 27 and updated description of -fpack-struct in
                       Table 15 on page 31.
                       Updated the description of -f[no-]math-errno in Table 15 on page 31
                       to reflect its changed behavior in this toolset release.
                       Added GNU assembly parsing options at the end of Table 15 on page 31.
                       Added Section 6.8: Parsing and optimization of GNU assembly statement
                       on page 114.


Please Read Carefully:

Information in this document is provided solely in connection with ST products. STMicroelectronics NV and its subsidiaries (“ST”) reserve the
right to make changes, corrections, modifications or improvements, to this document, and the products and services described herein at any
time, without notice.
All ST products are sold pursuant to ST’s terms and conditions of sale.
Purchasers are solely responsible for the choice, selection and use of the ST products and services described herein, and ST assumes no
liability whatsoever relating to the choice, selection or use of the ST products and services described herein.
No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted under this document. If any part of this
document refers to any third party products or services it shall not be deemed a license grant by ST for the use of such third party products
or services, or any intellectual property contained therein or considered as a warranty covering the use in any manner whatsoever of such
third party products or services or any intellectual property contained therein.

UNLESS OTHERWISE SET FORTH IN ST’S TERMS AND CONDITIONS OF SALE ST DISCLAIMS ANY EXPRESS OR IMPLIED
WARRANTY WITH RESPECT TO THE USE AND/OR SALE OF ST PRODUCTS INCLUDING WITHOUT LIMITATION IMPLIED
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE (AND THEIR EQUIVALENTS UNDER THE LAWS
OF ANY JURISDICTION), OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
ST PRODUCTS ARE NOT AUTHORIZED FOR USE IN WEAPONS. NOR ARE ST PRODUCTS DESIGNED OR AUTHORIZED FOR USE
IN: (A) SAFETY CRITICAL APPLICATIONS SUCH AS LIFE SUPPORTING, ACTIVE IMPLANTED DEVICES OR SYSTEMS WITH
PRODUCT FUNCTIONAL SAFETY REQUIREMENTS; (B) AERONAUTIC APPLICATIONS; (C) AUTOMOTIVE APPLICATIONS OR
ENVIRONMENTS, AND/OR (D) AEROSPACE APPLICATIONS OR ENVIRONMENTS. WHERE ST PRODUCTS ARE NOT DESIGNED
FOR SUCH USE, THE PURCHASER SHALL USE PRODUCTS AT PURCHASER’S SOLE RISK, EVEN IF ST HAS BEEN INFORMED IN
WRITING OF SUCH USAGE, UNLESS A PRODUCT IS EXPRESSLY DESIGNATED BY ST AS BEING INTENDED FOR “AUTOMOTIVE,
AUTOMOTIVE SAFETY OR MEDICAL” INDUSTRY DOMAINS ACCORDING TO ST PRODUCT DESIGN SPECIFICATIONS.
PRODUCTS FORMALLY ESCC, QML OR JAN QUALIFIED ARE DEEMED SUITABLE FOR USE IN AEROSPACE BY THE
CORRESPONDING GOVERNMENTAL AGENCY.
Resale of ST products with provisions different from the statements and/or technical features set forth in this document shall immediately void
any warranty granted by ST for the ST product or service described herein and shall not create or extend in any manner whatsoever, any
liability of ST.
ST and the ST logo are trademarks or registered trademarks of ST in various countries.
Information in this document supersedes and replaces all information previously supplied.
The ST logo is a registered trademark of STMicroelectronics. All other names are the property of their respective owners.

© 2013 STMicroelectronics - All rights reserved

STMicroelectronics group of companies


Australia - Belgium - Brazil - Canada - China - Czech Republic - Finland - France - Germany - Hong Kong - India - Israel - Italy - Japan -
Malaysia - Malta - Morocco - Philippines - Singapore - Spain - Sweden - Switzerland - United Kingdom - United States of America
www.st.com
