Embedded C Programming
Outline
- Operations in C
- Variables and storage in C
- Multithreading
Vu.HoangAnh@vn.bosch.com
Embedded Programming
Embedded programming is basically about optimizing the use of resources:
- Execution time
- Memory
- Energy/power
- Development/maintenance time
Time-critical sections of the program should run fast.
Processor- and memory-sensitive instructions may be written in assembly.
Most of the code is written in a high-level language (HLL): C, C++, or Java.
Use of an HLL
- Short development cycle
- Can use modular building blocks for code reusability
- Can use standard library functions, e.g. delay(), wait(), sleep()
- Support for basic data types, control structures, conditions
- Support for type checking: type checking during compilation makes the program less prone to errors, e.g. type checking on a char does not permit subtraction, multiplication, or division
Arithmetic
- Integer arithmetic: fastest
- Floating-point arithmetic in hardware: slower
- Floating-point arithmetic in software (+, sqrt, sin, log, etc.): very slow
Arithmetic Lessons
- Try to use integer addition/subtraction
- Avoid multiplication unless you have hardware support for it
- Avoid division
- Avoid floating point, unless you have hardware support for it
- Really avoid math library functions
Bit Manipulation
C has many bit-manipulation operators, used often in embedded systems (a small sketch follows):
- &   bit-wise AND
- |   bit-wise OR
- ^   bit-wise XOR
- ~   negation (one's complement)
- >>  right shift
- <<  left shift
Plus assignment versions of each (&=, |=, ^=, >>=, <<=).
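A minimal sketch of typical register bit manipulation; the register name, bit positions, and masks below are invented for illustration only:

#include <stdint.h>

/* Hypothetical 8-bit control register image (names are illustrative). */
static uint8_t ctrl;

#define LED_BIT    (1u << 3)   /* assume bit 3 drives an LED       */
#define MODE_MASK  0x07u       /* assume bits 0..2 select a mode   */

void bit_demo(uint8_t mode)
{
    ctrl |= LED_BIT;                      /* set bit 3    */
    ctrl &= (uint8_t)~LED_BIT;            /* clear bit 3  */
    ctrl ^= LED_BIT;                      /* toggle bit 3 */

    if (ctrl & LED_BIT) {                 /* test bit 3   */
        /* LED bit is currently set */
    }

    /* replace the 3-bit mode field without disturbing the other bits */
    ctrl = (uint8_t)((ctrl & ~MODE_MASK) | (mode & MODE_MASK));
}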
Faking Multiplication
Addition, subtraction, and shifting are fast and can sometimes supplant multiplication.
Like floating point, not all processors have a dedicated hardware multiplier.
Multiplication by addition and shifting, e.g. 43 x 13 (1101 binary):

  43 x 13 = 43 + (43 << 2) + (43 << 3) = 559

       101011      (43)
     x   1101      (13)
     --------
       101011      (43)
     10101100      (43 << 2)
  + 101011000      (43 << 3)
  -----------
   1000101111      (559)
Faking Multiplication
Even more clever if you include subtraction, e.g. 43 x 14 (1110 binary):

  43 x 14 = (43 << 1) + (43 << 2) + (43 << 3) = 602
  43 x 14 = (43 << 4) - (43 << 1) = 602      (since 14 = 16 - 2)

       101011      (43)
     x   1110      (14)
     --------
      1010110      (43 << 1)
     10101100      (43 << 2)
  + 101011000      (43 << 3)
  -----------
   1001011010      (602)

Only useful for multiplication by simple constants when a hardware multiplier is not available.
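A small C sketch of the same idea, hard-coding the constants 13 and 14 from the examples above; on a target with a hardware multiplier the compiler usually does this (or better) by itself:

/* 13 * x = x + 4*x + 8*x   (13 = 1101 binary) */
static unsigned mul13(unsigned x)
{
    return x + (x << 2) + (x << 3);
}

/* 14 * x = 16*x - 2*x   (14 = 10000 - 10 binary) */
static unsigned mul14(unsigned x)
{
    return (x << 4) - (x << 1);
}

/* mul13(43) == 559, mul14(43) == 602 */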
Faking Division
Division is a much more complicated algorithm that generally involves decisions.
But division by a power of two is just a shift:
  a / 2 = a >> 1
  a / 4 = a >> 2
There is no general shift-and-add replacement for division, but sometimes multiplication can be used instead:
  a / 1.33333333 = a * 0.75 = a * 0.5 + a * 0.25 = (a >> 1) + (a >> 2)
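A minimal sketch of both tricks for unsigned operands; note that the shifts truncate, so the result of the second function is an approximation:

/* a / 4 for unsigned a: just a shift */
static unsigned div_by_4(unsigned a)
{
    return a >> 2;
}

/* a / 1.333... == a * 0.75 == a * 0.5 + a * 0.25, approximated with shifts */
static unsigned div_by_4_thirds(unsigned a)
{
    return (a >> 1) + (a >> 2);
}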
Multi-way branches
if (a == 1)
    foo();
else if (a == 2)
    bar();
else if (a == 3)
    baz();
else if (a == 4)
    qux();
else if (a == 5)
    quux();
else if (a == 6)
    corge();

switch (a) {
case 1: foo();   break;
case 2: bar();   break;
case 3: baz();   break;
case 4: qux();   break;
case 5: quux();  break;
case 6: corge(); break;
}
Strength Reduction
Why multiply when you can add?

/* Array indexing: each access multiplies i by the size of the struct */
struct {
    int  a;
    char b;
    int  c;
} foo[10];

int i;
for (i = 0; i < 10; ++i) {
    foo[i].a = 77;
    foo[i].b = 88;
    foo[i].c = 99;
}

/* Pointer iteration: the multiplication is reduced to pointer additions */
struct {
    int  a;
    char b;
    int  c;
} *fp, *fe, foo[10];

fe = foo + 10;
for (fp = foo; fp != fe; ++fp) {
    fp->a = 77;
    fp->b = 88;
    fp->c = 99;
}
Function Calls
Modern processors, especially RISC, strive to make function calls cheap: arguments are passed through registers.
There is still noticeable overhead in calling, entering, and returning:

int foo(int a, int b)
{
    int c = bar(b, a);
    return c;
}

The compiled code still has to allocate stack space, store the return address, swap the arguments (e.g. r4 and r5 using r2 as a temporary), make the call (return value in r2), restore the return address, and release the stack space.
Macro
- A macro is a named collection of code
- A function is compiled only once; on calling the function the processor has to save the context, and on return restore the context
- The preprocessor puts the macro code at every place where the macro name appears, and the compiler compiles that code at every place where it appears
- Function versus macro (a small illustration follows):
  - Time: use a function when Toverheads << Texec, and a macro when Toverheads is comparable to or greater than Texec, where Toverheads is the function overhead (context save and return) and Texec is the execution time of the code within the function
  - Space: a similar argument applies
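A minimal illustration of the trade-off; the names are invented for this example. The macro body is pasted at every call site, so it avoids call overhead but duplicates code:

/* Function: one copy of the code, each call pays call/return overhead. */
static int square_fn(int x)
{
    return x * x;
}

/* Macro: the preprocessor pastes the body at every use, so there is no
 * call overhead, but the code is duplicated at each call site.
 * Arguments are parenthesized to avoid precedence surprises.           */
#define SQUARE(x) ((x) * (x))

int example(int a)
{
    return square_fn(a) + SQUARE(a + 1);   /* expands to ((a + 1) * (a + 1)) */
}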
Outline
- Operations in C
- Variables and storage in C
- Multithreading
Variables
- The type of a variable determines what kinds of values it may take on
- The greatest savings in code size and execution time come from choosing the most appropriate data type for variables, e.g.:
  - The natural data size for an 8-bit MCU is an 8-bit variable
  - While C's preferred data type is int, on 16-bit and 32-bit architectures there is still a need to address 8- or 16-bit data efficiently
  - Double precision and floating point should be avoided wherever efficiency is important
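A short sketch using the fixed-width types from <stdint.h>, which make the intended width explicit; the variable names are only illustrative:

#include <stdint.h>

uint8_t  loop_count;    /* 0..255 is enough for a small loop counter   */
uint16_t adc_reading;   /* e.g. a 10- or 12-bit ADC result             */
uint32_t uptime_ms;     /* use a wide counter only where really needed */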
Storage Classes in C
#include <stdlib.h>   /* for malloc() */

/* fixed address: visible to other files */
int global_static;

/* fixed address: visible within this file only */
static int file_static;

/* parameters always stacked */
int foo(int auto_param)
{
    /* fixed address: only visible to this function */
    static int func_static;

    /* stacked: only visible to this function */
    int auto_i, auto_a[10];

    /* array explicitly allocated on the heap */
    double *auto_d = malloc(sizeof(double) * 5);

    /* return value in a register or stacked */
    return auto_i;
}
Static Variables
When applied to variables, static means:
- A variable declared static within the body of a function maintains its value between function invocations
- A variable declared static within a module, but outside the body of a function, is accessible by all functions within that module
For embedded systems (see the sketch below):
- Encapsulation of persistent data
- Modular coding (data hiding)
- Hiding of internal processing within each module
Note that static variables are stored globally, not on the stack.
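A minimal sketch of both uses; the names are invented for illustration:

#include <stdint.h>

/* Module-scope static: visible only inside this file (data hiding). */
static uint16_t error_count;

/* Function-scope static: keeps its value between invocations. */
uint16_t next_sequence_number(void)
{
    static uint16_t seq;   /* initialized to 0 once, not on every call */
    return ++seq;
}

void log_error(void)
{
    ++error_count;         /* persistent, but hidden from other modules */
}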
Volatile Variables
- A volatile variable is one whose value may change outside the normal program flow
- In embedded systems, there are two ways this can happen:
  - Via an interrupt service routine
  - As a consequence of hardware action
- It is considered very good practice to declare all peripheral registers in embedded devices as volatile
- Accesses to volatile variables are never optimized away by the compiler
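A minimal sketch of the interrupt-service-routine case; the ISR name and the flag are invented for illustration. Without volatile, the compiler could keep data_ready in a register and the main loop would never see the ISR's update:

#include <stdint.h>

static volatile uint8_t data_ready;   /* written by the ISR, read by main */

void rx_complete_isr(void)            /* hypothetical interrupt handler */
{
    data_ready = 1;
}

void wait_for_data(void)
{
    while (!data_ready) {
        /* volatile forces a fresh read of data_ready on every iteration */
    }
    data_ready = 0;
    /* ... process the received data ... */
}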
C Unions
Like structs, but all fields share the same storage space, and only the most recently written field holds a valid value:

union {
    int   ival;
    float fval;
    char *sval;
} u;

Useful for arrays of dissimilar objects, to save space.
Potentially very dangerous: not type-safe.
A good example of C's philosophy: provide powerful mechanisms that can be abused.
Layout of Storage
- Modern processors have byte-addressable memory
- But many data types (integers, addresses, floating-point values) are wider than a byte
- Modern memory systems read data in 32-, 64-, or 128-bit chunks
- Reading an aligned 32-bit value is fast: a single operation
Layout of Storage
- Reading an unaligned value is slower: two reads plus a shift
- Most languages pad the layout of records to satisfy alignment restrictions:

struct padded {
    int   x;  /* 4 bytes */
    char  z;  /* 1 byte  */
    short y;  /* 2 bytes */
    char  w;  /* 1 byte  */
};
Memory Alignment
Memory alignment can be simplified by declaring 32-bit variables first, then 16-bit, then 8-bit. Porting such code to a 32-bit architecture ensures that there is no misaligned access to variables, thereby saving processor time. Organizing structures like this makes us less dependent upon tools that may do this reordering automatically, and may actually help those tools. The sketch below compares the two layouts.
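A small comparison of the padded struct above with a largest-to-smallest ordering; the exact sizes depend on the compiler and target, but on a typical 32-bit machine the ordered version avoids all padding:

struct padded {          /* as above: typically 12 bytes on a 32-bit target */
    int   x;             /* 4 bytes                      */
    char  z;             /* 1 byte  + 1 byte of padding  */
    short y;             /* 2 bytes                      */
    char  w;             /* 1 byte  + 3 bytes of padding */
};

struct ordered {         /* typically 8 bytes: no padding needed */
    int   x;             /* 4 bytes */
    short y;             /* 2 bytes */
    char  z;             /* 1 byte  */
    char  w;             /* 1 byte  */
};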
Single-size-object pool:
- Fit, allocation, etc. are much faster
- Good for object-oriented programs
(a minimal pool sketch follows)
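A minimal sketch of a single-size-object pool (not from the original slides): because every block has the same size, allocation and freeing are just free-list operations. The block count and payload size are arbitrary:

#include <stddef.h>

#define POOL_BLOCKS 16

typedef struct block {
    struct block *next;            /* free-list link            */
    unsigned char payload[32];     /* fixed-size object storage */
} block_t;

static block_t  pool[POOL_BLOCKS];
static block_t *free_list;

void pool_init(void)
{
    size_t i;
    free_list = NULL;
    for (i = 0; i < POOL_BLOCKS; ++i) {   /* push every block onto the free list */
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

void *pool_alloc(void)
{
    block_t *b = free_list;               /* pop the first free block, if any */
    if (b)
        free_list = b->next;
    return b ? b->payload : NULL;
}

void pool_free(void *p)
{
    /* recover the block header from the payload pointer and push it back */
    block_t *b = (block_t *)((unsigned char *)p - offsetof(block_t, payload));
    b->next = free_list;
    free_list = b;
}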
Memory-Mapped I/O
- "Magical" memory locations that, when written or read, send data to or receive data from hardware
- Hardware that looks like memory to the processor: addressable, bidirectional data transfer, read and write operations
- But it does not always behave like memory:
  - The act of reading or writing can itself be a trigger (the data may be irrelevant)
  - Registers are often read-only or write-only
  - Data read back is often different from what was last written
  - Operations can have latency
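A minimal sketch of accessing memory-mapped registers through volatile pointers; the peripheral, addresses, and bit layout are invented here, since the real values come from the device datasheet:

#include <stdint.h>

/* Hypothetical UART registers (addresses and layout are illustrative only). */
#define UART_DATA    (*(volatile uint32_t *)0x4000C000u)
#define UART_STATUS  (*(volatile uint32_t *)0x4000C004u)
#define TX_READY     (1u << 0)

void uart_putc(char c)
{
    while (!(UART_STATUS & TX_READY)) {
        /* each iteration re-reads the hardware status register */
    }
    UART_DATA = (uint32_t)c;   /* the write itself triggers the transmit */
}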
Outline
- Operations in C
- Variables and storage in C
- Multithreading
Thread/Task Safety
- Since every thread/task has access to virtually all the memory of every other thread/task, the flow of control and the sequence of accesses to data often do not match what would be expected by looking at the program text
- We need to establish the correspondence between the actual flow of control and the program text
- The goal is to make the collective behavior of threads/tasks deterministic, or at least more disciplined
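The original slides show a small code example at this point that is not reproduced in this extract; the following is only a reconstruction of the kind of unprotected read-modify-write on a shared counter that the question and the interleavings below refer to (the count == 4 test is an assumption):

static int count = 1;      /* shared by both threads, no locking */

/* Thread 1: add 1 to the shared counter, report whether it reached 4 */
int thread1(void)
{
    int tmp1 = count;      /* read       */
    tmp1 = tmp1 + 1;       /* modify     */
    count = tmp1;          /* write back */
    return (count == 4);   /* hypothetical TRUE/FALSE test */
}

/* Thread 2: add 2 to the shared counter */
void thread2(void)
{
    int tmp2 = count;
    tmp2 = tmp2 + 2;
    count = tmp2;
}

Depending on how the two threads interleave, the same starting value of count can produce different final values, as the next slides show.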
If count was 3 before these run, does Thread 1 return TRUE or FALSE?
Interleaving 1
Thread 1: tmp1 = count       (= 1)
Thread 2: tmp2 = count       (= 1)
Thread 2: tmp2 = tmp2 + 2    (= 3)
Thread 2: count = tmp2       (= 3)
Thread 1: tmp1 = tmp1 + 1    (= 2)
Thread 1: count = tmp1       (= 2)
Final value of count: 2
Interleaving 2
Thread 1: tmp1 = count       (= 1)
Thread 2: tmp2 = count       (= 1)
Thread 1: tmp1 = tmp1 + 1    (= 2)
Thread 1: count = tmp1       (= 2)
Thread 2: tmp2 = tmp2 + 2    (= 3)
Thread 2: count = tmp2       (= 3)
Final value of count: 3
Interleaving 3
Thread 1: tmp1 = count       (= 1)
Thread 1: tmp1 = tmp1 + 1    (= 2)
Thread 1: count = tmp1       (= 2)
Thread 2: tmp2 = count       (= 2)
Thread 2: tmp2 = tmp2 + 2    (= 4)
Thread 2: count = tmp2       (= 4)
Final value of count: 4
Thread Safety
- A piece of code is thread-safe if it functions correctly during simultaneous execution by multiple threads
- It must satisfy the need for multiple threads to access the same shared data
- It must satisfy the need for a shared piece of data to be accessed by only one thread at any given time
- Potentially thread-unsafe code:
  - Accessing global variables or the heap
  - Allocating/freeing resources that have global limits (files, sub-processes, etc.)
  - Indirect accesses through handles or pointers
Critical Section
- Ensures that only one task/thread accesses a particular resource at a time
- Sections of code involving the resource are marked as critical sections
- The first thread to reach a critical section enters it and executes its section of code
- That thread prevents all other threads from entering their critical sections for the same resource, even after it is context-switched out
- Once the thread has finished, another thread is allowed to enter a critical section for the resource
- This mechanism is called mutual exclusion
Mutex Strategy
- The locking strategy may affect performance
- Each mutex lock and unlock takes a small amount of time
- If the function is called frequently, the locking overhead may consume more CPU time than the work inside the critical section
Lock Strategy A
- Put the mutex outside the loop
- Use when plenty of the loop code involves the shared data
- Use when the execution time inside the critical section is short
#include <pthread.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;   /* protects shared_data */

void *thread_function(void *arg)
{
    pthread_mutex_lock(&lock);              /* lock once, outside the loop */
    while (condition_is_true()) {           /* placeholder loop condition  */
        /* access shared_data: code strongly associated with the shared
           data, or whose execution time in the loop is short */
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}
Lock Strategy B
- Put the mutex inside the loop
- Use when the loop body is large
- Use when only a small part of the loop involves the shared variable
void *thread_function(void *arg)
{
    while (condition_is_true()) {           /* placeholder loop condition */
        /* work that does not involve the shared data */

        pthread_mutex_lock(&lock);          /* lock only around the access */
        /* access shared_data */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}