Unit-1 - Embedded System - Dr. M. R. Arun
UNIT I INTRODUCTION TO EMBEDDED COMPUTING AND ARM PROCESSORS
Designers in many fields must be able to identify where microprocessors can be used,
design a hardware platform with I/O devices that can support the required tasks, and
implement software that performs the required processing.
Embedding Computers
2. Low power and low cost also drive us away from PC architectures
and toward multiprocessors. Personal computers are designed to
satisfy a broad mix of computing requirements and to be very
flexible. Those features increase the complexity and price of the
components. They also cause the processor and other components
to use more energy to perform a given function.
COMPLEX SYSTEMS AND MICROPROCESSORS contd…
■ Platform: The platform includes the bus and I/O devices. The
platform components that surround the CPU are responsible for
feeding the CPU and can dramatically affect its performance.
■ Program: Programs are very large and the CPU sees only a small
window of the program at a time. We must consider the structure
of the entire program to determine its overall behavior.
1. Requirements
Example: GPS moving map
The architecture is a plan for the overall structure of the system that
will be used later to design the components that make up the
architecture.
(Figure: hardware and software architecture block diagrams.)
THE EMBEDDED SYSTEM DESIGN PROCESS Contd..
4. Designing Hardware and Software Components
The components will in general include both hardware (FPGAs, boards, and so on) and
software modules.
By building up the system in phases and running properly chosen tests, we can often find
bugs more easily.
If we debug only a few modules at a time, we are more likely to uncover the simple bugs
and to recognize them easily.
Careful attention to inserting appropriate debugging facilities during design can help
ease system integration problems, but the nature of embedded computing means that
this phase will always be a challenge.
FORMALISMS FOR SYSTEM DESIGN
(STRUCTURAL DESCRIPTION & BEHAVIORAL DESCRIPTION)
An object describing a display (such as a CRT screen) is shown in UML notation in the figure.
A class is a form of type definition: all objects derived from the same class have the
same characteristics, although their attributes may have different values.
A class defines the attributes that an object may have.
It also defines the operations that determine how the object interacts with the rest of
the world.
There are several types of relationships that can exist between objects and classes:
■ Association occurs between objects that communicate with each other but have no
ownership relationship between them.
■ Composition is a type of aggregation in which the owner does not allow access to the
component objects.
1. A signal is an asynchronous
occurrence. It is defined in UML by
an object that is labeled as a
<<signal>>.
2. A call event follows the model of a
procedure call in a programming
language.
(Figure: model train system with the console, power supply, receiver (rcvr), and motor.)
The user sends messages to the train with a control box attached to the tracks.
The control box may have familiar controls such as a throttle, emergency stop button, and so
on.
Since the train receives its electrical power from the two rails of the track, the control box
can send signals to the train over the tracks by modulating the power supply voltage.
The control panel sends packets over the tracks to the receiver on the train.
The train includes analog electronics to sense the bits being transmitted and a control system
to set the train motor’s speed and direction based on those commands.
Each packet includes an address so that the console can control several trains on the same
track; the packet also includes an error correction code (ECC) to guard against transmission
errors.
This is a one-way communication system: the model train cannot send commands back to the
user.
DCC was created to provide a standard that could be built by any manufacturer so that
hobbyists could mix and match components from multiple vendors.
■ Standard S-9.1, the DCC Electrical Standard, defines how bits are encoded on
the rails for transmission.
■ Standard S-9.2, the DCC Communication Standard, defines the packets that
carry information.
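The packet idea in S-9.2 can be illustrated with a minimal C sketch. It assumes the DCC baseline packet layout, in which an address byte and an instruction byte are followed by a check byte equal to their bitwise XOR (S-9.2 calls this the error detection byte; it is the check the text above calls an ECC). The preamble, the start and end bits, and the actual speed-instruction encoding are left out, and the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Data bytes of a DCC baseline packet (hypothetical representation).
   Preamble, start bits, and end bit are omitted. */
typedef struct {
    uint8_t address;     /* selects which train's decoder should act */
    uint8_t instruction; /* command byte; its encoding is not modeled */
    uint8_t check;       /* error detection byte: address XOR instruction */
} dcc_packet;

/* Sender side: fill in the packet and compute its check byte. */
dcc_packet dcc_make_packet(uint8_t address, uint8_t instruction) {
    dcc_packet p;
    p.address = address;
    p.instruction = instruction;
    p.check = (uint8_t)(address ^ instruction);
    return p;
}

/* Receiver side: the XOR of all three bytes is zero for an intact packet. */
bool dcc_packet_ok(const dcc_packet *p) {
    return (uint8_t)(p->address ^ p->instruction ^ p->check) == 0;
}
```

A single flipped bit makes the XOR non-zero, but the check byte cannot say which bit flipped, so a receiver would typically just ignore a packet that fails the test.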
The DCC standard does not specify many aspects of a DCC train system. It doesn’t
define the control panel, the type of microprocessor used, the programming language
to be used, or many other aspects of a real model train system. The standard
concentrates on those aspects of system design that are necessary for interoperability.
Basic system commands
command       parameters
set-speed     speed (positive/negative)
set-inertia   inertia-value (non-negative)
estop         none
Typical control sequence (messages from :console to :train_rcvr)
set-inertia
set-speed
set-speed
estop
set-speed
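The command set and the typical sequence can be modeled as a small state update. This is a hypothetical sketch: the enum, struct, and function names are invented here, and it assumes that estop forces the speed to zero and that a later set-speed resumes motion, which the specification above does not spell out.

```c
#include <stdbool.h>

/* Hypothetical encoding of the three basic commands. */
typedef enum { CMD_SET_SPEED, CMD_SET_INERTIA, CMD_ESTOP } cmd_type;

typedef struct {
    cmd_type type;
    int value; /* speed (sign gives direction) or inertia; unused for estop */
} command;

/* State kept by the train's receiver. */
typedef struct {
    int speed;
    int inertia;
    bool stopped;
} train_state;

/* Apply one command. Returns false if the parameter is out of range. */
bool apply_command(train_state *t, command c) {
    switch (c.type) {
    case CMD_SET_SPEED:
        t->speed = c.value;   /* positive and negative are both legal */
        t->stopped = false;   /* assumed: a new speed ends an emergency stop */
        return true;
    case CMD_SET_INERTIA:
        if (c.value < 0)
            return false;     /* inertia must be non-negative */
        t->inertia = c.value;
        return true;
    case CMD_ESTOP:
        t->speed = 0;         /* assumed: estop halts the train immediately */
        t->stopped = true;
        return true;
    }
    return false;
}
```

Feeding it set-inertia, set-speed, set-speed, estop, set-speed in that order leaves the train moving at the last commanded speed with the inertia set by the first command.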
Conceptual Specification
Digital Command Control specifies some important aspects of the system,
particularly those that allow equipment to interoperate. But DCC deliberately does
not specify everything about a model train control system.
Fig2: UML collaboration diagram for major subsystems of the train controller system
Fig: A UML class diagram for the train controller showing the composition of the subsystems
Console physical object classes
knobs*   pulser*   sender*   detector*
panel
  train-number() : integer
  speed() : integer
  inertia() : integer
  estop() : boolean
  new-settings()
motor-interface
  speed: integer
Transmitter and receiver classes
transmitter
  send-speed(adrs: integer, speed: integer)
  send-inertia(adrs: integer, val: integer)
  set-estop(adrs: integer)
receiver
  current: command
  new: boolean
  read-cmd()
  new-cmd() : boolean
  rcv-type(msg-type: command)
  rcv-speed(val: integer)
  rcv-inertia(val: integer)
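A rough C rendering of the receiver class follows. It assumes that the bit decoder calls rcv_type() first and then, for speed and inertia messages, rcv_speed() or rcv_inertia(), and that read_cmd() consumes the command by clearing the new flag. The mapping of UML operations to C functions, and the field name is_new (standing in for the attribute new), are assumptions for illustration.

```c
#include <stdbool.h>

typedef enum { MSG_SPEED, MSG_INERTIA, MSG_ESTOP } msg_type;

typedef struct {
    msg_type type;
    int value; /* meaningful for speed and inertia only */
} rcv_command;

/* C counterpart of the receiver class: 'current' and 'new' attributes. */
typedef struct {
    rcv_command current;
    bool is_new;
} receiver;

/* rcv-type(): record the message type; estop has no value, so it is complete now. */
void rcv_type(receiver *r, msg_type t) {
    r->current.type = t;
    if (t == MSG_ESTOP)
        r->is_new = true;
}

/* rcv-speed() and rcv-inertia(): store the value and flag a complete command. */
void rcv_speed(receiver *r, int val) {
    r->current.value = val;
    r->is_new = true;
}

void rcv_inertia(receiver *r, int val) {
    r->current.value = val;
    r->is_new = true;
}

/* new-cmd(): has a command arrived since the last read? */
bool new_cmd(const receiver *r) {
    return r->is_new;
}

/* read-cmd(): hand the command to the controller and mark it consumed. */
rcv_command read_cmd(receiver *r) {
    r->is_new = false;
    return r->current;
}
```

Polling new_cmd() before read_cmd() keeps the train's control loop from acting on the same command twice.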
Formatter class
formatter
  current-train: integer
  current-speed[ntrains]: integer
  current-inertia[ntrains]: unsigned-integer
  current-estop[ntrains]: boolean
  send-command()
  panel-active() : boolean
  operate()
The formatter class holds state for each train and the settings for the current train.
The operate() operation performs the basic formatting task.
Control input sequence diagram
(Figure: sequence diagram with objects :knobs, :panel, :formatter, and :transmitter; messages include change in control, set-knobs, read panel, panel-active, speed/inertia/estop, number, new-settings, and change in settings.)
Formatter operate behavior
(Figure: state diagram for operate(), with an idle state, update-panel() and send-command() actions, and an "other" transition.)
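Panel-active behavior
panel*:read-train(): if the train knob has changed (T), set current-train = train-knob, update-screen, and set changed = true; otherwise (F), continue.
panel*:read-speed(): if the throttle has changed (T), set current-speed = throttle and set changed = true; otherwise (F), continue.
... ...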
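The panel-active behavior amounts to comparing the knob readings with the formatter's stored state. The sketch below assumes four trains (the NTRAINS value is arbitrary) and a hypothetical panel_reading snapshot struct, and it omits the screen update. It checks inertia and estop the same way as speed, which is how the elided part of the flowchart is assumed to continue.

```c
#include <stdbool.h>

#define NTRAINS 4 /* assumed number of trains on the layout */

/* Hypothetical snapshot of the console controls. */
typedef struct {
    int train_knob;
    int throttle;
    unsigned inertia_knob;
    bool estop_button;
} panel_reading;

/* Formatter state, following the attributes of the formatter class. */
typedef struct {
    int current_train;
    int current_speed[NTRAINS];
    unsigned current_inertia[NTRAINS];
    bool current_estop[NTRAINS];
} formatter;

/* Copy any changed control into the formatter state and report whether
   anything changed. The screen update is omitted. */
bool panel_active(formatter *f, const panel_reading *p) {
    bool changed = false;
    if (p->train_knob < 0 || p->train_knob >= NTRAINS)
        return false;                      /* ignore an out-of-range knob */
    if (p->train_knob != f->current_train) {
        f->current_train = p->train_knob;
        changed = true;
    }
    int t = f->current_train;
    if (p->throttle != f->current_speed[t]) {
        f->current_speed[t] = p->throttle;
        changed = true;
    }
    if (p->inertia_knob != f->current_inertia[t]) {
        f->current_inertia[t] = p->inertia_knob;
        changed = true;
    }
    if (p->estop_button != f->current_estop[t]) {
        f->current_estop[t] = p->estop_button;
        changed = true;
    }
    return changed;
}
```

In operate(), a true result is what would trigger send-command(); an unchanged panel leaves the formatter idle.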
Instruction sets preliminaries
In this topic, we begin our study of microprocessors with instruction sets, "the
programmer’s interface to the hardware".
A Harvard architecture.
A von Neumann architecture computer.
Which architecture is best suited for µP and DSP?
Von Neumann architecture vs. Harvard architecture
• Von Neumann: stored-program concept (program code is stored along with data)
Computer Architecture Contd…
The CPU has several internal registers that store values used
internally. One of those registers is the program counter
(PC), which holds the address in memory of an instruction. The
CPU fetches the instruction from memory, decodes the
instruction, and executes it.
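The fetch, decode, execute cycle driven by the PC can be sketched with a toy interpreter. The 16-bit instruction format and the four opcodes are invented for illustration and are not ARM encodings; the local variable ir stands in for the instruction register.

```c
#include <stdint.h>

/* Toy instruction format (not ARM): high byte = opcode, low byte = operand. */
enum { OP_HALT = 0x00, OP_LOADI = 0x01, OP_ADDI = 0x02, OP_JMP = 0x03 };

typedef struct {
    uint16_t pc;  /* program counter: address of the next instruction */
    int32_t acc;  /* single accumulator */
} cpu;

/* Run until HALT, an unknown opcode, or max_steps instructions.
   Returns the number of instructions fetched. */
int run(cpu *c, const uint16_t *mem, int mem_words, int max_steps) {
    int steps = 0;
    while (steps < max_steps && c->pc < mem_words) {
        uint16_t ir = mem[c->pc];           /* fetch into the instruction register */
        uint8_t op = (uint8_t)(ir >> 8);    /* decode */
        uint8_t arg = (uint8_t)(ir & 0xFF);
        c->pc++;                            /* PC now points past this instruction */
        steps++;
        switch (op) {                       /* execute */
        case OP_LOADI: c->acc = arg;  break;
        case OP_ADDI:  c->acc += arg; break;
        case OP_JMP:   c->pc = arg;   break;
        default:       return steps;        /* OP_HALT or unknown opcode */
        }
    }
    return steps;
}
```

Running the program {0x0105, 0x0203, 0x0204, 0x0000} loads 5, adds 3 and 4, and halts with 12 in the accumulator after four fetches.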
Advantages:
• A smaller die size
– A simpler processor requires fewer transistors and less silicon area.
• A shorter development time
– Less design effort and therefore a lower cost
• A higher performance
– Simpler instructions are executed faster.
Disadvantages:
• Poor code density compared with CISCs
• Doesn't execute x86 code
Instruction set characteristics
• Fixed vs variable length.
• Addressing modes.
• Number of operands.
• Types of operations supported.
Programming model
• Programming model: Registers visible to the
programmer.
• Some registers are not visible (IR).
ARM – What is it?
• ARM stands for Advanced RISC Machines
Examples of ARM-based products: the Toshiba 46HM94 46-inch television, the iPod nano, and the Samsung S3FJ9SK smartcard IC.
History of ARM
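(Figure: a CPU connected to a device's data and status registers, which sit in front of the device mechanism.)
■ Data registers hold values that are treated as data by the device, such as the data read or
written by a disk.
■ Status registers provide information about the device’s operation, such as whether the
current transaction has completed.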
I/O Application: 8251 UART
• Universal asynchronous receiver transmitter
(UART) : provides serial communication.
• 8251 functions are integrated into a standard PC
interface chip.
• Allows many communication parameters to be
programmed.
Serial communication
(Figure: characters sent serially over time, with "no char" idle periods between them.)
Serial communication parameters
• Baud (bit) rate.
• Number of bits per character (5 to 8).
• Parity/no parity.
• Even/odd parity.
• Length of stop bit (1, 1.5, 2 bits).
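These parameters fix how long each character occupies the line. The sketch below, built around a hypothetical uart_config struct, counts one start bit, the data bits, an optional parity bit, and the stop bits (held in half-bit units so that 1.5 stop bits stays an integer), and computes the parity bit for even or odd parity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one UART configuration. */
typedef struct {
    unsigned baud;         /* bits per second */
    unsigned data_bits;    /* 5 to 8 */
    bool parity_enabled;
    bool even_parity;      /* ignored when parity is disabled */
    unsigned stop_halves;  /* stop length in half-bits: 2, 3, or 4 = 1, 1.5, 2 bits */
} uart_config;

/* Frame length in half-bit units: start bit + data bits + optional parity + stop. */
unsigned frame_half_bits(const uart_config *c) {
    return 2u * (1u + c->data_bits + (c->parity_enabled ? 1u : 0u)) + c->stop_halves;
}

/* Time on the line for one character, in microseconds (rounded down). */
unsigned long char_time_us(const uart_config *c) {
    return (unsigned long)frame_half_bits(c) * 1000000UL / (2UL * c->baud);
}

/* Parity bit for a character: even parity makes the total number of 1s
   (data plus parity) even; odd parity makes it odd. */
unsigned parity_bit(const uart_config *c, uint8_t ch) {
    unsigned ones = 0;
    for (unsigned i = 0; i < c->data_bits; i++)
        ones += (ch >> i) & 1u;
    return c->even_parity ? (ones & 1u) : ((ones & 1u) ^ 1u);
}
```

For 9600 baud, 8 data bits, no parity, and 1 stop bit, a frame is 10 bits long, so one character takes about 1.04 ms.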
8251 CPU interface
The UART includes one 8-bit register that buffers characters
between the UART and the CPU bus.
The Transmitter Ready output indicates that the transmitter is
ready to accept a data character; the Transmitter Empty signal
goes high when the UART has no characters to send.
On the receiver side, the Receiver Ready pin goes high when the
UART has a character ready to be read by the CPU.
(Figure: the CPU connects to the 8251 through an 8-bit status register and an 8-bit data port; the 8251 transmits and receives on the serial line.)
Programming I/O devices
• Two types of instructions can support I/O:
– special-purpose I/O instructions;
– memory-mapped load/store instructions.
• But which does ARM use? ARM has no special-purpose I/O instructions, so it relies on memory-mapped I/O.
Programming I/O devices contd…
1.ARM memory-mapped I/O
(Programs using normal R/W instructions to
communicate with the devices)
• Example
• Define location for device:
DEV1 EQU 0x1000
• Read/write code:
LDR r1,=DEV1 ; set up device address
LDR r0,[r1] ; read DEV1
MOV r0,#8 ; set up value to write
STR r0,[r1] ; write value to device
Programming I/O devices contd…
2. Peek and poke (reading and writing an absolute address)
• To access I/O devices from a high-level language
– Done through pointers, since the C compiler normally hides variables' addresses
from us
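A minimal peek and poke pair in C looks like this. The volatile qualifier makes the compiler perform every access instead of caching or removing it, which matters for device registers. The DEV1 macro mirrors the EQU in the assembly example and is only meaningful on a target that actually has a device at 0x1000; here the functions are exercised on an ordinary variable.

```c
#include <stdint.h>

/* Read one byte from a memory-mapped location. volatile forces a real
   access every time, which device registers require. */
static inline uint8_t peek(volatile uint8_t *location) {
    return *location;
}

/* Write one byte to a memory-mapped location. */
static inline void poke(volatile uint8_t *location, uint8_t newval) {
    *location = newval;
}

/* Device address from the assembly example (DEV1 EQU 0x1000). Only
   dereference it on a target where a device is actually mapped there. */
#define DEV1 ((volatile uint8_t *)0x1000)
```

On such a target, poke(DEV1, 8) writes 8 to the device and peek(DEV1) reads it, matching what the assembly example does with loads and stores.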
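(Figure: interrupt acknowledge flowchart: after the acknowledge (ack), if a vector is received the CPU calls table[vector]; if the acknowledge times out, a bus error results.)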
Supervisor mode
Complex systems are often implemented as several programs that communicate
with each other. These programs may run under the command of an operating
system. It may be desirable to provide hardware checks to ensure that the
programs do not interfere with each other.
For example, one program might erroneously write into a segment of memory used by another program.
In such cases it is often useful to have a supervisor mode provided by the CPU.
Normal programs run in user mode.
The supervisor mode has privileges that user modes do not.
For example, memory management unit (MMU) systems allow the addresses
of memory locations to be changed dynamically. Control of the MMU is
typically reserved for supervisor mode to avoid the obvious problems that
could occur when program bugs cause inadvertent changes in the memory
management registers.
The ARM instruction that puts the CPU in supervisor mode is called SWI, for example:
SWI CODE_1
Supervisor mode Contd….
SWI causes the CPU to go into supervisor mode and sets the PC to 0x08 or 08H.
The argument to SWI is a 24-bit immediate value that is passed on to the
supervisor mode code; it allows the program to request various services from the
supervisor mode.
In supervisor mode, the mode field (the bottom 5 bits) of the CPSR is set to 10011 (0x13)
to indicate that the CPU is in supervisor mode.
The old value of the CPSR just before the SWI is stored in a register called the
saved program status register (SPSR).
There are in fact several SPSRs for different modes; the supervisor mode SPSR is
referred to as SPSR_svc.
To return from supervisor mode, the supervisor restores the PC from register r14
and restores the CPSR from the SPSR_svc.
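The mode change on SWI entry and return can be modeled in C. This is a simplified sketch: the arm_regs struct and helper names are hypothetical, it tracks only the CPSR, SPSR_svc, r14 (LR_svc), and the PC, and it assumes the classic ARM mode encodings in CPSR bits 4 to 0 (user 10000, supervisor 10011, and so on) with the IRQ disable (I) bit at bit 7.

```c
#include <stdint.h>

/* Mode field encodings in CPSR bits 4..0 (classic ARM). */
enum {
    MODE_USR = 0x10, MODE_FIQ = 0x11, MODE_IRQ = 0x12, MODE_SVC = 0x13,
    MODE_ABT = 0x17, MODE_UND = 0x1B, MODE_SYS = 0x1F
};

#define CPSR_MODE_MASK 0x1Fu
#define CPSR_I_BIT     (1u << 7)  /* set = IRQs disabled */

/* Hypothetical register subset needed to model SWI. */
typedef struct {
    uint32_t cpsr;
    uint32_t spsr_svc;
    uint32_t lr_svc;  /* r14 in supervisor mode */
    uint32_t pc;      /* here: address of the instruction being executed */
} arm_regs;

static inline uint32_t cpsr_mode(uint32_t cpsr) { return cpsr & CPSR_MODE_MASK; }

/* SWI entry: save CPSR, save the return address, switch to SVC mode,
   mask IRQs, and jump to the SWI vector at 0x08. */
void swi_entry(arm_regs *r) {
    r->spsr_svc = r->cpsr;
    r->lr_svc = r->pc + 4;  /* address of the instruction after the SWI */
    r->cpsr = (r->cpsr & ~CPSR_MODE_MASK) | MODE_SVC;
    r->cpsr |= CPSR_I_BIT;
    r->pc = 0x08;
}

/* Return (MOVS pc, r14): restore PC from r14 and CPSR from SPSR_svc. */
void swi_return(arm_regs *r) {
    r->pc = r->lr_svc;
    r->cpsr = r->spsr_svc;
}
```

Note that the all-ones pattern 11111 is system mode, so a test for supervisor mode should compare the mode field with 10011 rather than check that all five bits are set.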
Exceptions
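An exception is an internally detected error. A simple example is division by zero:
rather than having every program test each divisor before dividing, which would add
code and CPU time, the CPU can check the divisor’s value more efficiently during execution.
Since the time at which a zero divisor will be found is not known in
advance, this event is similar to an interrupt except that it is generated
inside the CPU.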
Vectoring provides a way for the user to specify the handler for the
exception condition.
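ARM exception vector table
Address      Exception              Mode on entry
0x00000000   Reset                  Supervisor
0x00000004   Undefined instruction  Undefined
0x00000008   Software interrupt     Supervisor
0x0000000C   Abort (prefetch)       Abort
0x00000010   Abort (data)           Abort
0x00000014   Reserved               Reserved
0x00000018   IRQ                    IRQ
0x0000001C   FIQ                    FIQ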
ARM’s Exceptions (2/6)
• When handling an exception, the ARM7TDMI:
– Preserves the address of the next instruction in the appropriate Link
Register
– Copies the CPSR into the appropriate SPSR
– Forces the CPSR mode bits to a value which depends on the exception
– Forces the PC to fetch the next instruction from the relevant exception
vector
– It may also set the interrupt disable flags to prevent otherwise
unmanageable nestings of exceptions.
– If the processor is in THUMB state when an exception occurs, it will
automatically switch into ARM state when the PC is loaded with the
exception vector address.
ARM’s Exceptions (3/6)
• On completion, the exception handler:
– Moves the Link Register, minus an offset where appropriate, to the PC.
(The offset will vary depending on the type of exception.)
– Copies the SPSR back to the CPSR
– Clears the interrupt disable flags, if they were set on entry
ARM’s Exceptions (4/6)
• Reset
– When the processor’s Reset input is asserted
• CPSR ← Supervisor + I + F
• PC ← 0x00000000
• Undefined Instruction
– If an attempt is made to execute an instruction that is undefined
• LR_undef ← Undefined Instruction Address + #4
• PC ← 0x00000004, CPSR ← Undefined + I
• Return with: MOVS pc, lr
• Prefetch Abort
– Instruction fetch memory abort, invalid fetched instruction
• LR_abt ← Aborted Instruction Address + #4, SPSR_abt ← CPSR
• PC ← 0x0000000C, CPSR ← Abort + I
• Return with: SUBS pc, lr, #4
ARM’s Exceptions (5/6)
• Data Abort
– Data access memory abort, invalid data
• LR_abt ← Aborted Instruction + #8, SPSR_abt ← CPSR
• PC ← 0x00000010, CPSR ← Abort + I
• Return with: SUBS pc, lr, #8 to retry the aborted instruction, or SUBS pc, lr, #4 to skip it
• Software Interrupt
– Enters Supervisor mode
• LR_svc ← SWI Address + #4, SPSR_svc ← CPSR
• PC ← 0x00000008, CPSR ← Supervisor + I
• Return with: MOVS pc, lr
ARM’s Exceptions (6/6)
• Interrupt Request
– Externally generated by asserting the processor’s IRQ input
• LR_irq ← PC - #4, SPSR_irq ← CPSR
• PC ← 0x00000018, CPSR ← Interrupt + I
• Return with: SUBS pc, lr, #4
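The vector addresses and return instructions above can be summarized in code. The sketch assumes the eight-entry table at address 0 with one 4-byte word per entry; the FIQ return (SUBS pc, lr, #4) is standard ARM behavior but is not listed on these slides, and the function names are hypothetical. For a data abort it picks the retry form, SUBS pc, lr, #8.

```c
#include <stdint.h>

/* Exceptions in vector-table order (index 5 is the reserved slot). */
typedef enum {
    EXC_RESET, EXC_UNDEF, EXC_SWI, EXC_PREFETCH_ABORT,
    EXC_DATA_ABORT, EXC_RESERVED, EXC_IRQ, EXC_FIQ
} arm_exception;

/* Each vector occupies one 4-byte word starting at address 0. */
static inline uint32_t vector_address(arm_exception e) {
    return 4u * (uint32_t)e;
}

/* PC value after the handler returns, given the banked LR on entry. */
uint32_t return_address(arm_exception e, uint32_t lr) {
    switch (e) {
    case EXC_UNDEF:
    case EXC_SWI:
        return lr;        /* MOVS pc, lr */
    case EXC_PREFETCH_ABORT:
    case EXC_IRQ:
    case EXC_FIQ:
        return lr - 4;    /* SUBS pc, lr, #4 */
    case EXC_DATA_ABORT:
        return lr - 8;    /* SUBS pc, lr, #8: re-execute the faulting access */
    default:
        return 0;         /* reset and reserved: no return */
    }
}
```

FIQ sitting at the last entry, 0x1C, lets its handler start directly at the vector instead of branching elsewhere, which shortens fast-interrupt latency.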
The entry into supervisor mode must be controlled to maintain security: if the
interface between user and supervisor mode is improperly designed, a user
program may be able to sneak code into the supervisor mode that could be
executed to perform harmful operations.
The ARM provides the SWI interrupt for software interrupts. This instruction
causes the CPU to enter supervisor mode.
(Figure: a cache controller sits between the CPU, the cache, and main memory, passing addresses and data between them.)
Cache operation
• Many main memory locations are mapped
onto one cache entry.
• May have caches for:
– instructions;
– data;
– data + instructions (unified).
• Memory access time is no longer
deterministic.
Terms
• Cache hit: required location is in cache.
• Cache miss: required location is not in cache.
• Working set: set of locations used by program
in a time interval.
Types of misses
• Compulsory (cold): location has never been
accessed.
• Capacity: working set is too large.
• Conflict: multiple locations in working set map
to same cache entry.
Memory system performance
• h = cache hit rate.
• tcache = cache access time, tmain = main memory
access time.
• Average memory access time:
– $t_{av} = h\,t_{cache} + (1-h)\,t_{main}$
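The formula can be evaluated directly. The numbers used below (a 1-cycle cache, 20-cycle main memory, and a 90% hit rate) are assumed for illustration.

```c
/* Average memory access time: tav = h * tcache + (1 - h) * tmain.
   h is the hit rate in [0, 1]; times may be in ns or in cycles. */
double avg_access_time(double h, double t_cache, double t_main) {
    return h * t_cache + (1.0 - h) * t_main;
}
```

With h = 0.9, tav = 0.9 × 1 + 0.1 × 20 = 2.9 cycles, so the 10% of accesses that miss contribute 2 of the 2.9 cycles.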
Multiple levels of cache
(Figure: direct-mapped cache, with labels for cache block, hit, value, and byte.)
Write operations
• Write-through: immediately copy write to
main memory.
• Write-back: write to main memory only when
location is removed from cache.
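The difference between the two policies shows up in how many writes reach main memory. This sketch uses a hypothetical one-line cache and assumes write-allocate: write-through updates memory on every store, while write-back sets a dirty bit and writes the line out only when a store to another address evicts it.

```c
#include <stdbool.h>

/* One-line cache model (hypothetical) for comparing write policies. */
typedef struct {
    bool valid;
    bool dirty;           /* only used by write-back */
    unsigned addr;        /* address currently held in the line */
    int data;
    unsigned mem_writes;  /* number of writes that reached main memory */
} one_line_cache;

/* Store 'value' to 'addr'. A store to a different address replaces the
   line (write-allocate); a dirty line is written back first. */
void cache_store(one_line_cache *c, int *mem, unsigned addr, int value, bool write_back) {
    if (c->valid && c->addr != addr && c->dirty) {
        mem[c->addr] = c->data;   /* write the evicted dirty line back */
        c->mem_writes++;
    }
    c->valid = true;
    c->addr = addr;
    c->data = value;
    if (write_back) {
        c->dirty = true;          /* defer the main-memory update */
    } else {
        mem[addr] = value;        /* write-through: update main memory now */
        c->mem_writes++;
        c->dirty = false;
    }
}
```

Three stores to the same location cost three main-memory writes under write-through, but under write-back only one, paid when the line is finally evicted.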
Direct-mapped cache locations
• Many locations map onto the same cache
block.
• Conflict misses are easy to generate:
– Array a[] uses locations 0, 1, 2, …
– Array b[] uses locations 1024, 1025, 1026, …
– Operation a[i] + b[i] generates conflict misses.
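The a[i] + b[i] conflict can be reproduced with a small direct-mapped cache simulator. It assumes 1024 one-word blocks, so addresses i and 1024 + i land in the same block; the names are hypothetical.

```c
#include <stdbool.h>

#define NBLOCKS 1024u /* assumed cache size: 1024 one-word blocks */

typedef struct {
    bool valid[NBLOCKS];
    unsigned tag[NBLOCKS];
    unsigned misses;
} dm_cache;

/* Access one word address; returns true on a hit, false on a miss. */
bool dm_access(dm_cache *c, unsigned addr) {
    unsigned block = addr % NBLOCKS;  /* index: which block the address maps to */
    unsigned tag = addr / NBLOCKS;    /* tag: distinguishes addresses sharing a block */
    if (c->valid[block] && c->tag[block] == tag)
        return true;
    c->valid[block] = true;           /* miss: replace whatever was there */
    c->tag[block] = tag;
    c->misses++;
    return false;
}
```

Two passes over 100 elements of each array miss on every access (400 misses) when b starts at 1024, but only on the first pass (200 misses) if b starts at 1536, because the arrays then map to different blocks.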
Set-associative cache
(Figure: set-associative cache, with hit and data outputs.)
Example: direct-mapped vs. set-associative
Memory contents, accessed in the order 001, 010, 011, 100, 101, 111:
address data
000 0101
001 1111
010 0000
011 0110
100 1000
101 0001
110 1010
111 0100
Direct-mapped cache behavior
• After 001 access: • After 010 access:
block tag data block tag data
00 - - 00 - -
01 0 1111 01 0 1111
10 - - 10 0 0000
11 - - 11 - -
Direct-mapped cache behavior, cont’d.
• After 011 access: • After 100 access:
block tag data block tag data
00 - - 00 1 1000
01 0 1111 01 0 1111
10 0 0000 10 0 0000
11 0 0110 11 0 0110
Direct-mapped cache behavior, cont’d.
• After 101 access: • After 111 access:
block tag data block tag data
00 1 1000 00 1 1000
01 1 0001 01 1 0001
10 0 0000 10 0 0000
11 0 0110 11 1 0100
2-way set-associative cache behavior
• Final state of cache (twice as big as direct-
mapped):
set blk 0 tag blk 0 data blk 1 tag blk 1 data
00 1 1000 - -
01 0 1111 1 0001
10 0 0000 - -
11 0 0110 1 0100
2-way set-associative cache behavior
• Final state of cache (same size as direct-
mapped):
set blk 0 tag blk 0 data blk 1 tag blk 1 data
0 01 0000 10 1000
1 10 0001 11 0100
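The tables above can be checked with a small set-associative cache simulator. It assumes LRU replacement (the slides do not name a policy), uses the example's memory contents and access order 001, 010, 011, 100, 101, 111, and treats a direct-mapped cache as the one-way case; sizes are capped by the hypothetical MAX_SETS and MAX_WAYS constants.

```c
#include <stdbool.h>

#define MAX_SETS 4
#define MAX_WAYS 2

typedef struct {
    bool valid;
    unsigned tag;
    unsigned data;
    unsigned last_use;  /* access counter value, used for LRU */
} cache_line;

typedef struct {
    unsigned nsets;     /* at most MAX_SETS */
    unsigned nways;     /* at most MAX_WAYS; 1 gives a direct-mapped cache */
    cache_line set[MAX_SETS][MAX_WAYS];
    unsigned clock;
} sa_cache;

/* Look up 'addr'; on a miss, load mem[addr] into an empty or LRU way.
   Returns true on a hit. */
bool sa_access(sa_cache *c, const unsigned *mem, unsigned addr) {
    unsigned s = addr % c->nsets;     /* set index from the low address bits */
    unsigned tag = addr / c->nsets;   /* remaining high bits form the tag */
    cache_line *ways = c->set[s];
    c->clock++;
    for (unsigned w = 0; w < c->nways; w++) {
        if (ways[w].valid && ways[w].tag == tag) {
            ways[w].last_use = c->clock;
            return true;
        }
    }
    unsigned victim = 0;
    for (unsigned w = 0; w < c->nways; w++) {
        if (!ways[w].valid) {         /* prefer an empty way */
            victim = w;
            break;
        }
        if (ways[w].last_use < ways[victim].last_use)
            victim = w;               /* otherwise the least recently used */
    }
    ways[victim].valid = true;
    ways[victim].tag = tag;
    ways[victim].data = mem[addr];
    ways[victim].last_use = c->clock;
    return false;
}
```

Running the sequence with nsets = 2 and nways = 2 leaves set 1 holding tags 10 and 11 with data 0001 and 0100, the contents of addresses 101 and 111; nsets = 4 and nways = 1 gives the direct-mapped final state shown earlier.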
Example caches
• StrongARM:
– 16 Kbyte, 32-way, 32-byte block instruction cache.
– 16 Kbyte, 32-way, 32-byte block data cache (write-
back).
• C55x:
– Various models have 16KB, 24KB cache.
– Can be used as scratch pad memory.
Scratch pad memories
• Alternative to cache:
– Software determines what is stored in scratch
pad.
• Provides predictable behavior at the cost of
software control.
• C55x cache can be configured as scratch pad.
Memory management units
(Figure: the CPU sends logical addresses to the memory management unit, which sends physical addresses to main memory.)
Memory management tasks
• Allows programs to move in physical memory
during execution.
• Allows virtual memory:
– memory images kept in secondary storage;
– images returned to main memory on demand
during execution.
• Page fault: request for location not resident in
memory.
Address translation
• Requires some sort of register/table to allow
arbitrary mappings of logical to physical
addresses.
• Two basic schemes:
– segmented;
– paged.
• Segmentation and paging can be combined
(x86).
Segments and pages
(Figure: memory divided into segment 1 and segment 2, with pages (page 1, page 2) inside a segment.)
Segment address translation
(Figure: segment address translation producing a physical address.)
Page address translation
(Figure: the page number selects the page i base address; the base is concatenated with the page offset to form the physical address.)
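Paged translation can be sketched with a flat page table. The 4 KB page size, the 16-entry table, and the page_entry struct are assumptions for illustration; a missing entry is reported as a page fault.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BITS 12u  /* assumed 4 KB pages */
#define NPAGES    16u  /* assumed tiny logical address space */

typedef struct {
    bool present;      /* false: page is not resident (page fault) */
    uint32_t frame;    /* physical page number, i.e. page i base >> PAGE_BITS */
} page_entry;

/* Translate a logical address; returns false on a page fault. */
bool translate(const page_entry table[NPAGES], uint32_t logical, uint32_t *physical) {
    uint32_t page = logical >> PAGE_BITS;
    uint32_t offset = logical & ((1u << PAGE_BITS) - 1u);
    if (page >= NPAGES || !table[page].present)
        return false;
    /* Concatenate the page base with the unchanged offset. */
    *physical = (table[page].frame << PAGE_BITS) | offset;
    return true;
}
```

Because the offset passes through unchanged, only the page number needs a table lookup, which is also what makes a small TLB effective.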
Page table organizations
(Figure: flat and tree-structured page tables, both holding page descriptors.)
Caching address translations
• Large translation tables require main memory
access.
• TLB: cache for address translation.
– Typically small.
ARM memory management
• Memory region types:
– section: 1 Mbyte block;
– large page: 64 kbytes;
– small page: 4 kbytes.
• An address is marked as section-mapped or
page-mapped.
• Two-level translation scheme.
ARM address translation
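(Figure: the logical address is split into a 1st index, a 2nd index, and an offset. The translation table base register and the 1st index select a descriptor in the 1st-level table; that descriptor is concatenated with the 2nd index to select a descriptor in the 2nd-level table, which is concatenated with the offset to form the physical address.)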
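A simplified model of the two-level lookup follows. It assumes the common ARM layout in which address bits 31 to 20 index a 4096-entry first-level table, bits 19 to 12 index a 256-entry second-level table for 4 KB small pages, and bits 11 to 0 are the offset; a section uses bits 19 to 0 as its offset. Real descriptors pack these fields into 32-bit words with type and permission bits, and large pages are not modeled, so the structs here are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { DESC_FAULT = 0, DESC_SECTION, DESC_PAGE_TABLE } l1_kind;

typedef struct {
    bool valid;
    uint32_t page_base;    /* physical base of a 4 KB small page */
} l2_desc;

typedef struct {
    l1_kind kind;
    uint32_t section_base; /* physical base of a 1 MB section */
    const l2_desc *table;  /* 256-entry second-level table */
} l1_desc;

/* 'l1' plays the role of the table located by the translation table base register. */
bool arm_translate(const l1_desc l1[4096], uint32_t va, uint32_t *pa) {
    const l1_desc *d = &l1[va >> 20];                     /* 1st index: bits 31..20 */
    if (d->kind == DESC_SECTION) {
        *pa = d->section_base | (va & 0xFFFFFu);          /* 20-bit section offset */
        return true;
    }
    if (d->kind == DESC_PAGE_TABLE && d->table != NULL) {
        const l2_desc *p = &d->table[(va >> 12) & 0xFFu]; /* 2nd index: bits 19..12 */
        if (p->valid) {
            *pa = p->page_base | (va & 0xFFFu);           /* 12-bit page offset */
            return true;
        }
    }
    return false;                                         /* translation fault */
}
```

Section-mapped addresses finish after one table access, while page-mapped addresses need two, which is one reason large regions are often mapped as sections.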
CPUs
• CPU performance
• CPU power consumption.
Elements of CPU performance
• Cycle time.
• CPU pipeline.
• Memory system.
Pipelining
• Several instructions are executed
simultaneously at different stages of
completion.
• Various conditions can cause pipeline bubbles
that reduce utilization:
– branches;
– memory system delays;
– etc.
Performance measures
• Latency: time it takes for an instruction to get
through the pipeline.
• Throughput: number of instructions executed
per time period.
• Pipelining increases throughput without
reducing latency.
ARM7 pipeline
• ARM 7 has 3-stage pipe:
– fetch instruction from memory;
– decode opcode and operands;
– execute.
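The latency versus throughput distinction can be made concrete by counting cycles. This sketch assumes each stage takes one cycle and models stalls as a count of bubble cycles; the function names are hypothetical.

```c
/* Cycles for n instructions on a k-stage pipeline in which every stage
   takes one cycle; 'bubbles' counts extra stall cycles. */
unsigned long pipelined_cycles(unsigned k, unsigned long n, unsigned long bubbles) {
    if (n == 0)
        return 0;
    /* The first instruction needs k cycles; each later one completes one
       cycle after its predecessor unless a bubble intervenes. */
    return k + (n - 1) + bubbles;
}

/* Same work with no overlap: each instruction occupies all k stages in turn. */
unsigned long unpipelined_cycles(unsigned k, unsigned long n) {
    return (unsigned long)k * n;
}
```

On the 3-stage ARM7 pipeline, 10 instructions take 3 + 9 = 12 cycles instead of 30, while a single instruction still takes 3 cycles: throughput rises, latency does not fall.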
ARM pipeline execution
(Figure: successive instructions 1, 2, 3 moving through the fetch, decode, and execute stages over time.)
Pipeline stalls
• If a pipeline step cannot be completed in the usual amount of time, the
pipeline stalls.
• Bubbles introduced by stall increase latency,
reduce throughput.
ARM multi-cycle LDMIA instruction
(Figure: pipeline timing of a multi-cycle LDMIA instruction.)
Control stalls
• Branches often introduce stalls (branch
penalty).
– Stall time may depend on whether branch is
taken.
• May have to squash instructions that already
started executing.
• Don’t know what to fetch until condition is
evaluated.
ARM pipelined branch
(Figure: pipeline timing of a branch on the ARM.)
Delayed branch
• To increase pipeline efficiency, the delayed branch mechanism always
executes the n instructions that follow the branch, whether or not the
branch is taken.
Memory system performance
• Caches introduce indeterminacy in execution
time.
– Depends on order of execution.
• Cache miss penalty: added time due to a
cache miss.
Types of cache misses
• Compulsory miss: location has not been
referenced before.
• Conflict miss: two locations are fighting for
the same block.
• Capacity miss: working set is too large.
CPU power consumption
• Most modern CPUs are designed with power
consumption in mind to some degree.
• Power vs. energy:
– heat depends on power consumption;
– battery life depends on energy consumption.
CMOS power consumption
• Voltage drops: power consumption proportional to V².
• Toggling: more activity means more power.
• Leakage: basic circuit characteristics; can be
eliminated by disconnecting power.
CPU power-saving strategies
• Reduce power supply voltage.
• Run at lower clock frequency.
• Disable function units with control signals
when not in use.
• Disconnect parts from power supply when not
in use.
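Why lowering the supply voltage pays off can be seen from the usual first-order CMOS model, in which dynamic power is proportional to C·V²·f and the energy per clock cycle to C·V². The sketch folds the switching activity into an effective capacitance and ignores leakage; the function names are hypothetical.

```c
/* First-order dynamic CMOS model: P = C * V^2 * f, with the switching
   activity folded into the effective capacitance c_eff. Leakage is ignored. */
double dynamic_power(double c_eff, double vdd, double freq) {
    return c_eff * vdd * vdd * freq;
}

/* Energy for a fixed number of clock cycles: E = P * (cycles / f) = C * V^2 * cycles. */
double dynamic_energy(double c_eff, double vdd, double cycles) {
    return c_eff * vdd * vdd * cycles;
}
```

Halving both the voltage and the clock frequency cuts dynamic power to 1/8, while the energy to finish a fixed number of cycles drops to 1/4 (the job takes twice as long at half the clock). Lowering the frequency alone reduces power but not the energy for the job, which is what battery life depends on.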
Power management styles
• Static power management: does not depend
on CPU activity.
– Example: user-activated power-down mode.
• Dynamic power management: based on CPU
activity.
– Example: disabling idle function units.
Application: PowerPC 603 energy features
• Provides doze, nap, sleep modes.
• Dynamic power management features:
– Uses static logic.
– Can shut down unused execution units.
– Cache organized into subarrays to minimize
amount of active circuitry.
PowerPC 603 activity
• Percentage of time units are idle for SPEC integer/floating-point:
unit              SPECint92   SPECfp92
D cache           29%         28%
I cache           29%         17%
load/store        35%         17%
fixed-point       38%         76%
floating-point    99%         30%
system register   89%         97%
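SA-1100 power state machine
(Figure: states run, idle, and sleep, with Prun = 400 mW. Transition times: run to idle 10 µs, idle to run 10 µs, run to sleep 90 µs, idle to sleep 90 µs, sleep to run 160 ms.)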