C-Programming-Optimization Techniques Class 01
Optimization Techniques
Profiling Tools
Contents
Program optimization: introduction
Optimization techniques for embedded systems development
C for embedded systems
Session Objectives
To learn why program optimization is important
To know the different optimization techniques for embedded systems design
To understand why C is used for embedded system development
Introduction
The Problem
PC speed has increased 500 times since 1981, but today’s software is more complex and still hungry for more resources
How can software run faster on the same hardware and OS architecture?
Highly optimized applications can run tens of times faster than poorly written ones
Using efficient algorithms and well-designed implementations leads to high-performance applications
Writing Fast Programs
Use a fast algorithm
It does not make sense to optimize a bad algorithm
Implement it efficiently
Detect hotspots using a profiler and fix them
Understanding the target system architecture, such as its cache structure, is often required
Use platform-specific compiler extensions:
memory prefetching
cache-control instructions
branch prediction hints
SIMD instructions
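As a hedged illustration of the platform-specific extensions listed above, the sketch below uses two GCC/Clang builtins, __builtin_prefetch and __builtin_expect. These are compiler extensions, not ISO C, so other compilers may need different intrinsics. The LIKELY/UNLIKELY macros, the function names and the prefetch distance of 16 elements are assumptions chosen for the example; a real prefetch distance has to be tuned per target.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* GCC/Clang extensions (not ISO C): hint the expected value of a condition. */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Illustrative only: the useful distance depends on cache and memory latency. */
#define PREFETCH_DISTANCE 16

/* Sum an array while asking the CPU to fetch data a few elements ahead. */
long sum_with_prefetch(const int *data, size_t n)
{
    assert(data != NULL || n == 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (LIKELY(i + PREFETCH_DISTANCE < n)) {
            /* arguments: address, 0 = prefetch for reading, locality hint 0..3 */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 1);
        }
        sum += data[i];
    }
    return sum;
}

/* Mark the error path as unlikely so the common path is laid out as the
 * fall-through branch. Returns 0 on success, -1 for an invalid division. */
int checked_div(int a, int b, int *out)
{
    assert(out != NULL);
    if (UNLIKELY(b == 0 || (a == INT_MIN && b == -1)))
        return -1;
    *out = a / b;
    return 0;
}
```

Whether prefetching helps depends on the target: GCC documents that if the target does not support data prefetch, no prefetch instruction is generated, so the hint costs nothing on such processors.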
Writing Fast Programs
Use good coding practices
Use good data structures
Apply appropriate optimization techniques
Optimizing code takes time and reduces source code readability
Optimizing Embedded Software
Embedded software often runs on processors with limited computation power, so optimizing the code becomes a necessity
A program can often be made either faster or smaller, but rarely both
An improvement in one of these areas can have a negative impact on the other
It is up to the programmer to decide which of these improvements matters most
Recommendation: reduce the size of your program
Optimizing For Program Size
Goal:
Reduce hardware cost of memory
Reduce power consumption of memory units
Two opportunities:
Data
Reuse constants, variables, and data buffers in different parts of the code
Requires careful verification of correctness
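A minimal sketch of the data-reuse idea, under stated assumptions: two hypothetical processing phases that are never active at the same time share one scratch buffer through a union, so the buffer costs as much RAM as the larger member rather than the sum of both. The names scratch, phase_a_moving_sum, phase_b_format and SAMPLES are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SAMPLES 64

/* One RAM area shared by two phases that never run at the same time.
 * sizeof scratch is the size of the larger member, not the sum of both. */
static union {
    short filter_taps[SAMPLES];    /* used only by phase A */
    char  text_line[2 * SAMPLES];  /* used only by phase B */
} scratch;

/* Phase A: copy the samples in, then sum the last (up to) 4 of them. */
int phase_a_moving_sum(const short *in, size_t n)
{
    assert(in != NULL && n <= SAMPLES);
    memcpy(scratch.filter_taps, in, n * sizeof in[0]);
    int sum = 0;
    for (size_t i = (n > 4 ? n - 4 : 0); i < n; i++)
        sum += scratch.filter_taps[i];
    return sum;
}

/* Phase B: reuse the same bytes as a text buffer. This overwrites phase A's
 * data, which is correct only because phase A's results are no longer needed. */
size_t phase_b_format(const char *msg)
{
    assert(msg != NULL);
    size_t len = strlen(msg);
    assert(len < sizeof scratch.text_line);
    memcpy(scratch.text_line, msg, len + 1);
    return strlen(scratch.text_line);
}
```

The saving is only safe if the phases truly never overlap in time, which is exactly the correctness check the slide calls for.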
Cost Of High Performance
Performance: Where To Look
“Maximize performance - who knows where to optimize and where not to optimize”
Spend your time optimizing the portions of code where the most time is spent
Run the compiled program to learn where it spends its time
Profiling can also measure other computational resource usage: space, power, I/O
This resource usage is hard to estimate by static analysis; it requires dynamic analysis
Performance: Where To Look
Problem: You're given a program's source code (which someone else wrote) and asked to improve its performance by at least 20%
Where do you begin?
Look at the source code and try to find inefficient C code
Try rewriting some of it in assembly
Performance: Where To Look
How can you figure out where a program spends its time?
Count every static instruction to know which routines (functions) are the biggest
Big deal: large functions that aren't executed often don't really matter
Count every dynamic instruction to know which routines executed the most instructions
Excellent! It tells the “relative importance” of each function
But it doesn't account for the memory system (e.g. cache misses)
The Software Optimization Process
Hotspots are areas in your code that take a long time to execute
Create benchmark
Find hotspots
Investigate causes
Modify application
Retest using benchmark
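The process above starts with a benchmark. A minimal sketch, assuming only ISO C: clock() from <time.h> measures the processor time used by the program, and repeating the workload many times averages out its coarse resolution. benchmark_seconds and sample_workload are hypothetical names; on an embedded target a hardware timer or cycle counter would usually replace clock().

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef void (*workload_fn)(void);

/* Average processor time, in seconds, of one call to fn. */
double benchmark_seconds(workload_fn fn, unsigned repetitions)
{
    assert(fn != NULL && repetitions > 0);
    clock_t start = clock();
    for (unsigned r = 0; r < repetitions; r++)
        fn();
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / repetitions;
}

/* Hypothetical workload standing in for a suspected hotspot. Writing the
 * result to a volatile variable keeps the optimizer from deleting the loop. */
static volatile unsigned long sink;

void sample_workload(void)
{
    unsigned long acc = 0;
    for (unsigned long i = 0; i < 100000UL; i++)
        acc += i * i;
    sink = acc;
}
```

Running the same benchmark before and after each modification keeps the retest step a like-for-like comparison.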
Extreme Optimization Pitfalls
A large application’s performance cannot be improved before it runs
Build the application, then see what machine it runs on
“Runs great on my computer…”
Debug versus release builds
Performance requires assembly language programming
Code features first, then optimize if there is time left over
Key Point:
90/10 Rule
90% of execution time is spent in 10% of code
So the ‘hot’ 10% is the code that must be optimized
Optimization takes time but gives efficient code, so use it only for the hot 10%
Simple interpretation is quick but gives slow code, so use it for the other 90%
Tradeoff: you need to get the balance right!
How To Find Performance
Bottlenecks
Determine how the system resources are being utilized to
identify system-level bottlenecks
Measure the execution time for each module and function
in the application
Determine how the various modules running on the
system affect the performance of each other
Identify the most time-consuming function calls and call
sequences within the application
Determine how the application is executing at the
processor level to identify microarchitecture-level
performance problems
Improving Program Performance
Compiler writers try to apply several standard optimizations, but do not always succeed
Compiler writers sometimes apply aggressive optimizations
The compiler is often not “informed” enough to know that a change will help rather than hurt
Optimizations based on specific architecture/implementation characteristics can be very helpful
These are much harder for compiler writers because they require multiple, generally very different, “back-end” implementations
Improving Program Performance
How can one help?
Better code, algorithms, and data structures (of course)
Reorganize code to help the compiler find opportunities for improvement
Replace poorly optimized code with assembly code (i.e., bypass the compiler)
Writing Efficient C code
To write efficient C code, you must be aware of the areas where the C compiler has to be conservative (for example, when pointers may alias)
Performance Tools Overview
Timing mechanisms
Stopwatch: the UNIX time command
Software profiler
gprof, Intel VTune, Visual C++ Profiler, IBM Quantify
Memory debugger/profiler
Valgrind, IBM Purify, Parasoft Insure++
Optimization Techniques
• Bad memory management has serious impacts
• Poor data locality causes high power dissipation
• Poor memory throughput leads to poor performance
• Optimization techniques
• Platform independent
• Loop transformation
• Data reuse
• Processor partitioning
Optimization Techniques
Architecture specific
Memory modeling optimization
periodic
Embedded systems: the address sequence might not be periodic
Optimization Techniques
The "scope" of the optimization:
Local optimizations - Performed in a part of one procedure.
Common sub-expression elimination (e.g. those occurring when translating array indices to memory addresses)
Using registers for temporary results and, if possible, for variables
Replacing multiplication and division by shift and add operations
Global optimizations - Performed with the help of data flow analysis and split-lifetime analysis.
Code motion (hoisting) outside of loops
Value propagation
Strength reductions
Inter-procedural optimizations
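Code motion (hoisting) can be illustrated with a loop-invariant call. In the sketch below, strlen(s) does not change inside the loop, so computing it once before the loop gives the same result and turns a quadratic loop into a linear one. A compiler can only do this itself when it can prove the loop never modifies the string, which is one reason writing the hoisted form explicitly can help. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Before: strlen(s) is re-evaluated on every iteration (O(n^2) overall). */
size_t count_char_naive(const char *s, char c)
{
    size_t count = 0;
    for (size_t i = 0; i < strlen(s); i++)
        if (s[i] == c)
            count++;
    return count;
}

/* After: the loop-invariant computation is hoisted out of the loop. */
size_t count_char_hoisted(const char *s, char c)
{
    assert(s != NULL);
    const size_t len = strlen(s);   /* computed once */
    size_t count = 0;
    for (size_t i = 0; i < len; i++)
        if (s[i] == c)
            count++;
    return count;
}
```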
Optimization Techniques
What is improved in the optimization:
Space optimizations - Reduce the size of the executable/object:
Constant pooling
Dead-code elimination
Optimization Techniques
There are important optimizations not covered above,
e.g. the various loop transformations:
Loop unrolling - Full or partial transformation of a
loop into straight code
Loop blocking (tiling) - Minimizes cache misses by splitting each array-processing loop into two nested loops, dividing the "iteration space" into smaller "blocks"
Loop interchange - Change the nesting order of
loops, may make it possible to perform other
transformations
Loop distribution - Replace a loop with two (or more) loops that together perform the same work
Loop fusion - Make one loop out of two (or more)
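A sketch of three of the loop transformations listed above (unrolling, interchange and fusion), written out by hand in C. The unroll factor of 4, the 4x4 matrix size MAT_N and all function names are assumptions for illustration. Compilers can perform these transformations themselves, and whether a hand transformation pays off on a given target should be confirmed by measurement.

```c
#include <assert.h>
#include <stddef.h>

#define MAT_N 4

/* Loop unrolling by 4: one loop test per four elements, with independent
 * partial sums; a remainder loop handles n not divisible by 4. */
long sum_unrolled(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)   /* remainder */
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}

/* Before interchange: the inner loop strides down a column, jumping
 * MAT_N elements in memory each step (C arrays are row-major). */
long sum_matrix_column_order(int m[MAT_N][MAT_N])
{
    long s = 0;
    for (size_t j = 0; j < MAT_N; j++)
        for (size_t i = 0; i < MAT_N; i++)
            s += m[i][j];
    return s;
}

/* After interchange: the inner loop walks consecutive addresses. */
long sum_matrix_row_order(int m[MAT_N][MAT_N])
{
    long s = 0;
    for (size_t i = 0; i < MAT_N; i++)
        for (size_t j = 0; j < MAT_N; j++)
            s += m[i][j];
    return s;
}

/* Loop fusion: "scale every element" and "sum every element" were two loops
 * over the same range; fused, each a[i] is visited once. */
long scale_and_sum_fused(int *a, size_t n, int k)
{
    assert(a != NULL || n == 0);
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        a[i] *= k;   /* body of the first original loop */
        s += a[i];   /* body of the second original loop */
    }
    return s;
}
```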
C Language In Embedded Systems
C Language In Embedded Systems
Several factors explain the increased popularity of C in the embedded systems area:
The ever-increasing complexity of applications drives programmers from assembly to high-level languages
The high-level programming language C offers good support for high-speed, low-level I/O operations
Programmers of embedded applications particularly appreciate this mixed high/low-level approach
In comparison to other high-level language compilers, C compilers tend to produce more compact code
C Language In Embedded Systems
Many mathematical modeling tools can generate C source code
C offers significant productivity gains, with opportunities for:
Code reuse
Improved code maintenance
Ongoing development over the life of the application
C can be written in a structured manner that reduces the chance of producing errors
C can also be written in a very condensed manner that is hard to comprehend and dramatically increases the likelihood of introducing errors
C Language In Embedded Systems
The compiler does not necessarily detect small typing errors
Consider the operators &&, &, ||, |, +=, =, and ==, and how easily a typo among them still produces perfectly valid C code
Not every programmer is fully aware of the effects of all the possible constructs in the C language
Casts (implicit or explicit) can cause both confusion and errors
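The sketch below shows how single-character differences between the operators above still compile, and how an implicit conversion silently changes a comparison. Each function is a hypothetical illustration; compiling with warnings enabled (for example GCC/Clang -Wall -Wextra) catches some, but not all, of these mistakes.

```c
#include <assert.h>

/* Pairs that differ by one character; every version is valid C. */
int logical_and(int a, int b) { return a && b; }
int bitwise_and(int a, int b) { return a & b; }
int logical_or(int a, int b)  { return a || b; }
int bitwise_or(int a, int b)  { return a | b; }

/* In a comparison between int and unsigned int, the int operand is converted
 * to unsigned, so -1 becomes UINT_MAX and the result is 0 for every u. */
int minus_one_is_less(unsigned int u)
{
    int i = -1;
    return i < u;
}

/* Documents the behavior above; aborts if any expectation is wrong. */
void check_operator_pitfalls(void)
{
    assert(logical_and(2, 1) == 1);   /* both operands non-zero */
    assert(bitwise_and(2, 1) == 0);   /* binary 10 & 01 share no bit */
    assert(logical_or(1, 2) == 1);
    assert(bitwise_or(1, 2) == 3);    /* binary 01 | 10 */
    assert(minus_one_is_less(1u) == 0);
}
```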
C Language In Embedded Systems
One of the main reasons C compilers do a great job of generating compact, efficient code is the limited run-time checking in C
C has no built-in provisions to prevent arithmetic exceptions (such as divide by zero or overflow), invalid addresses or pointers, or out-of-bounds array accesses from causing a runtime software failure
It is therefore easy to understand why programmers with a special interest in writing robust, consistent code have concerns about the C programming language
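Because C performs none of these checks at run time, robust code adds them explicitly where failure matters. A minimal sketch with hypothetical helper names: each function validates its inputs and reports failure through its return value instead of triggering undefined behavior. assert() is reserved for the caller-side programming error of passing a null output pointer.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Array read with an explicit bounds check; false on a bad index. */
bool safe_read(const int *arr, size_t len, size_t idx, int *out)
{
    assert(out != NULL);
    if (arr == NULL || idx >= len)
        return false;
    *out = arr[idx];
    return true;
}

/* Integer division that rejects the two undefined cases:
 * division by zero and INT_MIN / -1 (overflow). */
bool safe_div(int a, int b, int *out)
{
    assert(out != NULL);
    if (b == 0 || (a == INT_MIN && b == -1))
        return false;
    *out = a / b;
    return true;
}

/* Addition that detects signed overflow before it happens. */
bool safe_add(int a, int b, int *out)
{
    assert(out != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}
```

These checks cost a few instructions each, which is the run-time price C normally avoids; guidelines for safety-related code typically decide where that price is worth paying.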
C Language In Embedded Systems
Many of the companies developing safety-related embedded applications have written guidelines that restrict the use of error-prone C constructs, with the intention of reducing the probability of errors
The goal of these standards is to increase portability, reduce maintenance, and above all improve clarity
A mixed coding style is harder to maintain than a bad coding style
C Language In Embedded Systems
These standards recognize that individual programmers have the right to make judgments about how best to achieve the goal of code clarity
All code should conform to the ANSI C standard and should compile without warnings under at least its principal compiler
Any warnings that cannot be eliminated should be explained in a comment in the code
Optimizing C Code
Help From The Compiler
Always enable compiler optimization settings when building an application for use with performance tools
Understanding and using all the features of an optimizing compiler is required to get maximum performance with the least effort
Use a compiler that supports your CPU
Avoid compiler optimization when debugging
Compiler optimization may:
Cause certain variables to vanish
Prevent stepping through each line of the code
Make it impossible to place breakpoints freely
Identify your machine to the compiler
gcc -march=athlon
Help From The Compiler
Ask the compiler to unroll loops
gcc -funroll-loops
gcc -funroll-all-loops
Gcc Optimization Levels
-O0
don’t optimize
reduce the cost of compilation
make debugging possible
-O1
basic optimizations for execution time and space reduction
mainly functions declared as inline are expanded inline (plus static functions called only once)
variables are allocated to registers where possible
-O2
nearly all supported optimizations that do not involve a space-speed trade-off are turned on
the compiler optimizes variable register usage
-O3
turns on all -O2 optimizations plus more aggressive ones
the compiler will attempt to inline all compact functions
generated code can be much larger than with -O2 but only slightly faster
Optimizing Compiler: Choosing an Optimization Flag Combination
Optimizing Compiler’s Effect
Helping The Compiler
Variables
Avoid complicated pointer arithmetic; use array
indexes
Use aliases
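“Use aliases” is open to interpretation; one common reading, assumed here, is to copy a value reached through a pointer into a local variable so the compiler does not have to reload it. In the first function the compiler must assume that a store to dst[i] could change *scale, so it reloads *scale on every iteration. The second keeps a local copy, uses plain array indexing, and adds the C99 restrict qualifier to promise that dst and src do not overlap. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* dst[i] might be the same object as *scale, so the compiler must reload
 * *scale after every store. */
void scale_may_alias(int *dst, const int *src, size_t n, const int *scale)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * *scale;
}

/* Local copy ("alias") of *scale: read once, can stay in a register.
 * restrict (C99) promises that dst and src do not overlap. */
void scale_local_copy(int *restrict dst, const int *restrict src,
                      size_t n, const int *scale)
{
    assert(scale != NULL);
    const int k = *scale;
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}
```

The two versions behave differently if dst really does overlap scale or src, so the second form is only valid when the caller guarantees there is no overlap.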
Helping The Compiler
Functions
Declare compact functions as inline
Helping The Compiler
Control flow
Simple design will often prevent extra branches
if...else
switch
Loop breaking
Helping The Compiler
Files
Keep closely related functions together
Libraries
Use functions best suited for the task
Session Summary
Software optimization doesn’t begin where coding ends: it is an ongoing process that starts at the design stage and continues all the way through development
• Optimization techniques
• Platform independent
• Loop transformation
• Data reuse
• Processor partitioning