Arm Cortex m3 365


The ARM Architecture

(with focus on Cortex-M3)

Joe Bungo
Applications Engineer
ARM University Program

1
Agenda
Introduction to ARM Ltd
ARM Architecture/Programmers Model
Data Path and Pipelines
System Design
Development Tools

2
ARM Ltd
Founded in November 1990
Spun out of Acorn Computers
Initial funding from Apple, Acorn and VLSI

Designs the ARM range of RISC processor cores


Licenses ARM core designs to semiconductor
partners who fabricate and sell to their
customers
ARM does not fabricate silicon itself

Also develops technologies to assist with the design-in of the ARM architecture
Software tools, boards, debug hardware
Application software
Bus architectures
Peripherals, etc

3
ARM's Activities

Connected Community
Development Tools
Software IP
Processors
System Level IP: Data Engines, Fabric, 3D Graphics
Physical IP
(Diagram labels: SoC, memory)

4
ARM Connected Community 700+

5
Huge Range of Applications

Equipment Adopting 32-bit ARM Microcontrollers:
IR Fire Detector
Utility Meters
Exercise Machines
Intelligent toys
Intelligent Vending
Energy Efficient Appliances
Tele-parking

6
World's Smallest ARM Computer?

Wireless Sensor Network
Battery, Solar Cells
Sensors, timers
Cortex-M0 + 16KB RAM, 65nm
UWB Radio antenna
10 kB Storage memory, ~3fW/bit
12 µAh Li-ion Battery
Processor, SRAM and PMU
Wirelessly networked into large scale sensor arrays
Cortex-M0; 65
University of Michigan

7
World's Largest ARM Computer?

4200 ARM powered Neutrino Detectors
70 bore holes 2.5km deep
60 detectors per string starting 1.5km down
1 km³ of active telescope

Work supported by the National Science Foundation and University of Wisconsin-Madison

8
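From 1 mm³ to 1 km³

(Figure: ARM-based systems ranging from 1 mm³ to 1 km³, with price labels 10 and $1000, across market segments: Embedded, Mobile, Consumer, Home, Enterprise, Mobile Computing, PC, Server, HPC)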

9
ARM Cortex Processors (v7)

ARM Cortex-A family (v7-A):
Applications processors for full OS and 3rd party applications
Cores shown: Cortex-A15, Cortex-A9, Cortex-A8, Cortex-A5 (multi-core labels: x1-4)

ARM Cortex-R family (v7-R):
Embedded processors for real-time signal processing, control applications
Cores shown: Heron, Cortex-R4 (multi-core label: 1-2)

ARM Cortex-M family (v7-M):
Microcontroller-oriented processors for MCU and SoC applications
Cores shown: Cortex-M4, Cortex-M3, SC300, Cortex-M1, Cortex-M0

(Diagram range labels: from 12k gates... up to ...2.5GHz)

11
Cortex family
Cortex-A8            Cortex-R4            Cortex-M3
Architecture v7A     Architecture v7R     Architecture v7M
MMU                  MPU (optional)       MPU (optional)
AXI                  AXI                  AHB Lite & APB
VFP & NEON support   Dual Issue

12
Relative Performance*

(Chart: Max Frequency (MHz) per core, y-axis from 0 to 2500)

Core                  Max Freq (MHz)  Min Power (mW/MHz)
Cortex-M0             50              0.012
Cortex-M3             150             0.06
ARM7                  184             0.35
ARM926                470             0.235
ARM1026               540             0.36
ARM1136               610             0.335
ARM1176               750             0.568
Cortex-A8             1100            0.43
Cortex-A9 Dual-core   2000            0.5

*Represents attainable speeds in 130, 90, 65, or 45nm processes

13
Data Sizes and Instruction Sets
The ARM is a 32-bit architecture.

When used in relation to the ARM:


Byte means 8 bits
Halfword means 16 bits (two bytes)
Word means 32 bits (four bytes)

Most ARMs implement two instruction sets


32-bit ARM Instruction Set
16-bit Thumb Instruction Set

Jazelle cores can also execute Java bytecode

14
ARM and Thumb Performance

(Chart: Dhrystone 2.1/sec @ 20MHz, y-axis from 0 to 30000, comparing ARM and Thumb code for three memory widths (zero wait state): 32-bit, 16-bit, and 16-bit with 32-bit stack)

15
The Thumb-2 instruction set
Variable-length instructions
ARM instructions are a fixed length of 32 bits
Thumb instructions are a fixed length of 16 bits
Thumb-2 instructions can be either 16-bit or 32-bit

Thumb-2 gives approximately 26% improvement in code density over ARM

Thumb-2 gives approximately 25% improvement in performance over Thumb
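The mixed 16-bit/32-bit encoding can be illustrated with a small decoder sketch. It assumes the rule from the Thumb-2 encoding that a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 is the first half of a 32-bit instruction; the function names are hypothetical and only for illustration.

```python
def thumb2_instruction_length(first_halfword: int) -> int:
    """Return the instruction length in bytes (2 or 4) from its first halfword."""
    if not 0 <= first_halfword <= 0xFFFF:
        raise ValueError("expected a 16-bit halfword")
    top5 = first_halfword >> 11
    # These three prefixes mark the first half of a 32-bit encoding.
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2


def split_instructions(halfwords):
    """Group a stream of halfwords into 16-bit and 32-bit instructions."""
    instructions = []
    i = 0
    while i < len(halfwords):
        count = thumb2_instruction_length(halfwords[i]) // 2
        instructions.append(tuple(halfwords[i:i + count]))
        i += count
    return instructions


if __name__ == "__main__":
    stream = [0x4608, 0xF04F, 0x0001, 0xBF00]
    for ins in split_instructions(stream):
        print(len(ins) * 16, "bit:", [hex(h) for h in ins])
```

Because the length is known from the first halfword alone, 16-bit and 32-bit instructions can be freely interleaved in one code stream.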

16
Cortex-M Programmers Model

Fully programmable in C
Stack-based exception model
Only two processor modes
Thread Mode for User tasks
Handler Mode for OS tasks and exceptions
Vector table contains addresses

(Register diagram: r0-r12, sp (two banked copies: Main and Process), lr, r15 (pc), xPSR)

17
Cortex-M3 Processor Privilege

(Diagram: the ARM Cortex-M3 has a Privileged (Supervisor) level running Handler Mode and the OS, reached through Aborts, Interrupts, Reset, System Call (SVCall) and Undefined Instruction, and a Non-Privileged (User) level running Thread Mode application code; both access Memory for Instructions & Data)

18
Cortex-M3 Interrupt Handling
One Non-Maskable Interrupt (INTNMI) supported
1-240 prioritizable interrupts supported
Interrupts can be masked
Implementation option selects number of interrupts supported
Nested Vectored Interrupt Controller (NVIC) is tightly coupled with processor core
Interrupt inputs are active HIGH

(Diagram: INTNMI and INTISR[239:0] (1-240 interrupts) enter the NVIC, which sits next to the Processor Core inside the Cortex-M3)
19
Cortex-M3 Exception Handling
Reset : power-on or system reset
NMI : cannot be stopped or preempted by any exception other than reset
Faults
Hard Fault : default Fault or any fault unable to activate
Memory Manage : MPU violations
Bus Fault : prefetch and memory access violations
Usage Fault : undefined instructions, divide by zero, etc.
SVCall : privileged OS requests
Debug Monitor : debug monitor program
PendSV : pendable request for system service (typically used for context switching)
SysTick Interrupt : internal system timer, e.g., used by an RTOS to periodically check resources or peripherals
External Interrupt : e.g., external peripherals
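The exception types listed above each have a fixed exception number, which also determines the position of the handler address in the vector table. The sketch below uses the numbering defined by the ARMv7-M architecture (Reset = 1 through SysTick = 15, external interrupts from 16); the helper names and the default table base of 0 are assumptions for illustration.

```python
SYSTEM_EXCEPTIONS = {
    1: "Reset", 2: "NMI", 3: "HardFault", 4: "MemManage", 5: "BusFault",
    6: "UsageFault", 11: "SVCall", 12: "DebugMonitor", 14: "PendSV", 15: "SysTick",
}

# Reset, NMI and HardFault have fixed negative priorities; the rest are programmable.
FIXED_PRIORITY = {1: -3, 2: -2, 3: -1}


def exception_name(number, num_irqs=240):
    """Map an exception number to a readable name."""
    if number in SYSTEM_EXCEPTIONS:
        return SYSTEM_EXCEPTIONS[number]
    if 16 <= number < 16 + num_irqs:
        return f"IRQ{number - 16}"
    raise ValueError(f"exception number {number} is reserved or out of range")


def vector_address(number, table_base=0):
    """Address of the vector table entry (entry 0 holds the initial stack pointer)."""
    return table_base + 4 * number


if __name__ == "__main__":
    for n in (1, 2, 3, 11, 14, 15, 16):
        print(n, exception_name(n), hex(vector_address(n)))
```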
20
Cortex-M3 Program Status Register
Bit:    31  30  29  28  27  26-25  24  23-16  15-10   7-0
Field:  N   Z   C   V   Q   IT     T          IT/ICI  ISR Number

One Status Register consisting of
APSR - Application Program Status Register: ALU flags
IPSR - Interrupt Program Status Register: Interrupt/Exception No.
EPSR - Execution Program Status Register
IT field: If/Then block information
ICI field: Interruptible-Continuable Instruction information
xPSR
Composite of the 3 PSRs
Stored on the stack on exception entry
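A stacked xPSR value can be split back into its fields with a few shifts and masks. The sketch below follows the ARMv7-M layout as an assumption: flags in bits 31-27, T in bit 24, the IT/ICI bits split across [26:25] and [15:10], and the exception number in bits [8:0]. The function name is hypothetical.

```python
def decode_xpsr(xpsr: int) -> dict:
    """Split a 32-bit xPSR value into APSR, EPSR and IPSR fields."""
    xpsr &= 0xFFFFFFFF
    return {
        # APSR: ALU flags
        "N": (xpsr >> 31) & 1,
        "Z": (xpsr >> 30) & 1,
        "C": (xpsr >> 29) & 1,
        "V": (xpsr >> 28) & 1,
        "Q": (xpsr >> 27) & 1,
        # EPSR: Thumb state bit and the combined 8-bit IT/ICI field
        "T": (xpsr >> 24) & 1,
        "ICI_IT": (((xpsr >> 10) & 0x3F) << 2) | ((xpsr >> 25) & 0x3),
        # IPSR: number of the exception being handled (0 in Thread Mode)
        "ISR": xpsr & 0x1FF,
    }


if __name__ == "__main__":
    print(decode_xpsr(0x01000000))  # typical value after reset: only T set
    print(decode_xpsr(0x6100000F))  # Z and C set, T set, inside SysTick handler
```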

21
Conditional Execution
If-Then (IT) instruction added (16-bit)
Up to 3 additional "then" or "else" conditions may be specified (T or E)
Makes up to 4 following instructions conditional

ITTET EQ
Inst 1    MOVEQ
Inst 2    ADDEQ
Inst 3    SUBNE
Inst 4    ORREQ

Any normal ARM condition code can be used
16-bit instructions in block do not affect condition code flags
Apart from comparison instructions
32-bit instructions may affect flags (normal rules apply)
Current if-then status stored in the EPSR, part of xPSR (in the CPSR on other ARM profiles)
Conditional block may be safely interrupted and returned to
Must NOT branch into or out of if-then block
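The way an IT mnemonic assigns conditions to the following instructions can be sketched as below. This is only a model of the assembler-level rule (the first instruction always takes the base condition, each extra T repeats it, each E inverts it), not of the binary encoding; the function name is hypothetical and the AL condition is deliberately left out.

```python
INVERSE = {
    "EQ": "NE", "NE": "EQ", "CS": "CC", "CC": "CS", "MI": "PL", "PL": "MI",
    "VS": "VC", "VC": "VS", "HI": "LS", "LS": "HI", "GE": "LT", "LT": "GE",
    "GT": "LE", "LE": "GT",
}


def expand_it(mnemonic: str, cond: str) -> list:
    """Return the condition applied to each instruction in the IT block."""
    mnemonic, cond = mnemonic.upper(), cond.upper()
    suffix = mnemonic[2:]
    if not mnemonic.startswith("IT") or len(suffix) > 3:
        raise ValueError("expected IT followed by up to three T/E letters")
    if any(letter not in "TE" for letter in suffix):
        raise ValueError("only T (then) and E (else) are allowed")
    if cond not in INVERSE:
        raise ValueError(f"unsupported condition {cond}")
    # The first instruction always uses the base condition.
    return [cond] + [cond if letter == "T" else INVERSE[cond] for letter in suffix]


if __name__ == "__main__":
    print(expand_it("ITTET", "EQ"))  # ['EQ', 'EQ', 'NE', 'EQ']
```

For `ITTET EQ` the result matches the slide's example block: MOVEQ, ADDEQ, SUBNE, ORREQ.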

22
Classes of Instructions (v4T)

Load/Store
Miscellaneous
Data Operations
Change of Flow: MOV PC, Rm; Bcc; BL; BLX

23
Data processing Instructions
Consist of :
Arithmetic: ADD ADC SUB SBC RSB RSC
Logical: AND ORR EOR BIC
Comparisons: CMP CMN TST TEQ
Data movement: MOV MVN

These instructions only work on registers, NOT memory.


Syntax:

<Operation>{<cond>}{S} Rd, Rn, Operand2

Comparisons set flags only - they do not specify Rd


Data movement does not specify Rn
Second operand is sent to the ALU via barrel shifter.
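The effect of the optional S suffix and of the comparison instructions can be illustrated by modelling how the N, Z, C and V flags come out of a 32-bit addition or subtraction. This is a simplified sketch of ADDS and CMP only; the function names are hypothetical, and the subtraction is modelled as addition of the inverted operand plus one, so C means "no borrow".

```python
MASK = 0xFFFFFFFF


def _flags(rn, op2, full, subtract):
    result = full & MASK
    # Signed overflow: operands had compatible signs but the result sign differs.
    if subtract:
        v = ((rn ^ op2) & (rn ^ result)) >> 31
    else:
        v = ((rn ^ result) & (op2 ^ result)) >> 31
    return result, {"N": result >> 31, "Z": int(result == 0),
                    "C": int(full > MASK), "V": v & 1}


def adds(rn, op2):
    """ADDS Rd, Rn, Operand2: return (Rd, flags)."""
    rn, op2 = rn & MASK, op2 & MASK
    return _flags(rn, op2, rn + op2, subtract=False)


def cmp(rn, op2):
    """CMP Rn, Operand2: computes Rn - Operand2, returns flags only (no Rd)."""
    rn, op2 = rn & MASK, op2 & MASK
    _, flags = _flags(rn, op2, rn + (~op2 & MASK) + 1, subtract=True)
    return flags


if __name__ == "__main__":
    print(adds(0x7FFFFFFF, 1))  # signed overflow sets V and N
    print(cmp(5, 5))            # equal operands set Z and C
```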

24
Using a Barrel Shifter: The 2nd Operand

Register, optionally with shift operation
Shift value can be either:
5-bit unsigned integer
Specified in bottom byte of another register
Used for multiplication by constant

Immediate value
8-bit number, with a range of 0-255
Rotated right through even number of positions
Allows increased range of 32-bit constants to be loaded directly into registers

(Diagram: Operand 1 goes straight to the ALU; Operand 2 passes through the Barrel Shifter on its way to the ALU; the ALU produces the Result)
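Both uses of the second operand can be sketched in a few lines. The first function checks whether a 32-bit constant fits the ARM-state immediate form described above (an 8-bit value rotated right by an even amount); the second models an instruction such as ADD r0, r1, r1, LSL #2, which multiplies by 5. The function names are hypothetical, and the check applies to the 32-bit ARM instruction set only, since Thumb-2 uses a different immediate scheme.

```python
MASK = 0xFFFFFFFF


def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    value &= MASK
    return ((value >> amount) | (value << (32 - amount))) & MASK


def encode_arm_immediate(constant):
    """Return (imm8, rotate) such that ror32(imm8, rotate) == constant, else None."""
    constant &= MASK
    for rotate in range(0, 32, 2):          # only even rotations are encodable
        imm8 = ror32(constant, 32 - rotate)  # rotating left undoes the rotate right
        if imm8 <= 0xFF:
            return imm8, rotate
    return None


def add_with_lsl(rn, rm, shift):
    """Model ADD Rd, Rn, Rm, LSL #shift with a 5-bit shift amount."""
    if not 0 <= shift <= 31:
        raise ValueError("shift amount must fit in 5 bits")
    return (rn + ((rm << shift) & MASK)) & MASK


if __name__ == "__main__":
    print(encode_arm_immediate(0xFF000000))  # encodable
    print(encode_arm_immediate(0x00000101))  # not encodable
    print(add_with_lsl(7, 7, 2))             # 7 * 5
```

Constants that fail the check are typically loaded from a literal pool or built from two instructions instead.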

25
Single register data transfer
LDR STR Word
LDRB STRB Byte
LDRH STRH Halfword
LDRSB Signed byte load
LDRSH Signed halfword load

Memory system must support all access sizes

Syntax:
LDR{<cond>}{<size>} Rd, <address>
STR{<cond>}{<size>} Rd, <address>

e.g. LDREQB
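The difference between the plain and the signed loads is whether the loaded value is zero-extended or sign-extended to 32 bits. The sketch below models this on a small byte array; it assumes little-endian memory, and the function names simply mirror the mnemonics for readability.

```python
MASK = 0xFFFFFFFF


def _sign_extend(value, bits):
    """Sign-extend a `bits`-wide value to 32 bits."""
    sign_bit = 1 << (bits - 1)
    return ((value ^ sign_bit) - sign_bit) & MASK


def ldr(mem, addr):
    return int.from_bytes(mem[addr:addr + 4], "little")


def ldrb(mem, addr):
    return mem[addr]                        # zero-extended byte


def ldrh(mem, addr):
    return int.from_bytes(mem[addr:addr + 2], "little")  # zero-extended halfword


def ldrsb(mem, addr):
    return _sign_extend(ldrb(mem, addr), 8)


def ldrsh(mem, addr):
    return _sign_extend(ldrh(mem, addr), 16)


if __name__ == "__main__":
    memory = bytes([0x80, 0xFF, 0x34, 0x12])
    print(hex(ldrb(memory, 0)), hex(ldrsb(memory, 0)))
    print(hex(ldrh(memory, 0)), hex(ldrsh(memory, 0)))
    print(hex(ldr(memory, 0)))
```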

26
Cortex-M3 Datapath
(Block diagram labels: Instruction Decode fed by I_HRDATA; Register Bank, Barrel Shifter, Mul/Div and ALU with Writeback; Address Registers and Address Incrementers for I_HADDR and D_HADDR; Read Data Register on D_HRDATA; Write Data Register on D_HWDATA; INTADDR)

28
Cortex-M3 Pipeline
Cortex-M3 has 3-stage fetch-decode-execute pipeline
Similar to ARM7
Cortex-M3 does more in each stage to increase overall
performance

1st Stage - Fetch, 2nd Stage - Decode, 3rd Stage - Execute

(Pipeline diagram blocks: Instruction Fetch (Prefetch); Instruction Decode & Register Read; AGU; Branch; Address Phase & Write Back; Data Phase Load/Store & Branch; Multiply & Divide; Shift; ALU & Branch; Write)

Branch forwarding & speculation
Execute stage branch (ALU branch & Load Store Branch)
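How a 3-stage fetch-decode-execute pipeline overlaps instructions can be shown with an idealized occupancy table. The sketch below assumes every instruction spends exactly one cycle per stage, with no stalls, branches or multi-cycle operations, so it is not a timing model of the real Cortex-M3; the function name is hypothetical.

```python
def pipeline_schedule(instructions, stages=("Fetch", "Decode", "Execute")):
    """Return one dict per cycle mapping each stage to the instruction in it."""
    if not instructions:
        return []
    total_cycles = len(instructions) + len(stages) - 1
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for depth, stage in enumerate(stages):
            index = cycle - depth  # instruction that entered the pipe `depth` cycles ago
            row[stage] = instructions[index] if 0 <= index < len(instructions) else None
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_schedule(["I1", "I2", "I3", "I4"]), start=1):
        print(cycle, row)
```

In this idealized model, N instructions complete in N + 2 cycles, so once the pipeline is full one instruction finishes per cycle.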

29
ARM10 vs. ARM11 Pipelines

ARM10
FETCH: Branch Prediction, Instruction Fetch
ISSUE: ARM or Thumb Instruction Decode
DECODE: Reg Read
EXECUTE: Shift + ALU, Multiply
MEMORY: Memory Access, Multiply Add
WRITE: Reg Write

ARM11
Fetch 1, Fetch 2, Decode, Issue, then one of three parallel paths:
Shift, ALU, Saturate
MAC 1, MAC 2, MAC 3
Address, Data Cache 1, Data Cache 2
Write back

30
Full Cortex-A8 Pipeline Diagram
13-Stage Integer Pipeline: F0 F1 F2 (Instruction Fetch), D0 D1 D2 D3 D4 (Instruction Decode), E0 E1 E2 E3 E4 E5 (Instruction Execute and Load/Store)
10-Stage NEON Pipeline: M0 M1 M2 M3 (NEON Instruction Decode), N1 N2 N3 N4 N5 N6

(Diagram labels)
Integer side: ALU pipe 0, MUL pipe 0, ALU pipe 1, LS pipe 0 or 1; Architectural register file; Integer register writeback
NEON side: Integer ALU pipe, Integer MUL pipe, Integer shift pipe, Non-IEEE FP ADD pipe, Non-IEEE FP MUL pipe, IEEE FP engine, LS permute pipe; Load queue; NEON store data; NEON register file; NEON register writeback
Penalties: Branch mispredict penalty, Replay penalty
Memory system: L1 instruction cache miss, L1 data cache miss, L1 data, L2 data, BIU pipeline, L2 Tag Array, L2 Data Array, L3 memory system
Trace: Embedded Trace Macrocell, External trace port
Additional stage labels: L1-L8, T0-T13

31
An Example AMBA System

AHB:
High Performance ARM processor
High Bandwidth External Memory Interface
High-bandwidth on-chip RAM
DMA Bus Master
APB Bridge

APB:
UART
Timer
Keypad
PIO

AHB: High Performance, Pipelined, Burst Support, Multiple Bus Masters
APB: Low Power, Non-pipelined, Simple Interface

33
ARM Debug Architecture
EmbeddedICE Logic
Provides breakpoints and processor/system access
JTAG interface (ICE)
Converts debugger commands to JTAG signals
Embedded Trace Macrocell (ETM)
Compresses real-time instruction and data access trace
Contains ICE features (trigger & filter logic)
Trace port analyzer (TPA)
Captures trace in a deep buffer

(Diagram: the Debugger (+ optional trace tools) connects over Ethernet to the JTAG port and Trace Port; on chip are the TAP controller, EmbeddedICE Logic, ETM and the ARM core)

35
Keil Development Tools for ARM

Includes ARM macro assembler, compilers (ARM RealView C/C++ Compiler, Keil CARM Compiler, or GNU compiler), ARM linker, Keil uVision Debugger and Keil uVision IDE
Keil uVision Debugger accurately simulates on-chip peripherals (I2C, CAN,
UART, SPI, Interrupts, I/O Ports, A/D and D/A converters, PWM, etc.)
Evaluation Limitations
16K byte object code + 16K data limitation
Some linker restrictions such as base addresses for code/constants
GNU tools provided are not restricted in any way
http://www.keil.com/demo/

36
Keil Development Tools for ARM

37
University Resources

http://www.arm.com/support/university/

[email protected]

38
Your Future at ARM
Graduate and Internship/Co-op Opportunities
Engineering: Memory, Validation, Performance, DFT, R&D, GPU and more!
Sales and Marketing: Corporate and Technical
Corporate: IT, Patents, Services (Training and Support), and Human
Resources

Incredible Culture and Comprehensive Benefit Package


Competitive Reward
Work/Life Balance
Personal Development
Brilliant Minds and Innovative Solutions

Keep in Touch!
www.arm.com/about/careers

39
TI Panda Board
OMAP4430 Processor
1 GHz Dual-core ARM
Cortex-A9 (NEON+VFP)
C64x+ DSP
PowerVR SGX 3D GPU
1080p Video Support

POP Memory
1 GB LPDDR2 RAM

USB Powered
< 4W max consumption
(OMAP small % of that)
Many adapter options
(Car, wall, battery, solar, ..)

40
Project Ideas Using Panda
OS Projects
OS porting to ARM/Cortex (TI OMAP)
MythTV system
Super-Panda stack of Pandas as compute engine and task
distribution
Linux applications

NEON Optimization Projects


Codec optimization in ffmpeg (pick your favorite codec)
Voice and image recognition
Open-source Flash player optimizations (swfdec)

41
Fin

42
Nokia N95 Multimedia Computer
OMAP 2420
Applications Processor
ARM1136 processor-based
SoC, developed using Magma
Blast family and winner of
2005 INSIGHT Award for Most
Innovative SoC

Symbian OS v9.2
Operating System supporting ARM
processor-based mobile devices,
developed using ARM RealView
Compilation Tools

S60 3rd Edition


S60 Platform supporting ARM
processor-based mobile devices

Mobiclip Video Codec


Software video codec for ARM
processor-based mobile devices

ST WLAN Solution
Ultra-low power 802.11b/g WLAN
chip with ARM9 processor-based
MAC

Connect. Collaborate. Create.


43
Beagle Board

44
Targeting community development
$149
Personally affordable
> 1000 participants and growing
Active & technical community
Wikis, blogs, promotion of community activity
Freedom to innovate
Addressing open source community needs
Open access to hardware documentation
Instant access to >10 million lines of code
Opportunity to tinker and learn
Free software

45
Fast, low power, flexible expansion
OMAP3530 Processor
600MHz Cortex-A8
NEON+VFPv3
16KB/16KB L1$
256KB L2$
430MHz C64x+ DSP
32K/32K L1$
48K L1D
32K L2
PowerVR SGX GPU
64K on-chip RAM

POP Memory
128MB LPDDR RAM
256MB NAND flash

Peripheral I/O
DVI-D video out
SD/MMC+
S-Video out
USB 2.0 HS OTG
I2C, I2S, SPI, MMC/SD
JTAG
Stereo in/out
Alternate power
RS-232 serial

USB Powered
2W maximum consumption
OMAP is small % of that
Many adapter options
Car, wall, battery, solar,
46
And more

On-going collaboration at BeagleBoard.org
Live chat via IRC for 24/7 community support
Links to software projects to download

Other Features
4 LEDs: USR0, USR1, PMU_STAT, PWR
2 buttons: USER, RESET
4 boot sources: SD/MMC, NAND flash, USB, Serial

Peripheral I/O
DVI-D video out
SD/MMC+
S-Video out
USB HS OTG
I2C, I2S, SPI, MMC/SD
JTAG
Stereo in/out
Alternate power
RS-232 serial

47
Project Ideas Using Beagle
OS Projects
OS porting to ARM/Cortex (TI OMAP)
MythTV system
Super-Beagle stack of Beagles as compute engine and task
distribution
Linux applications

NEON Optimization Projects


Codec optimization in ffmpeg (pick your favorite codec)
Voice and image recognition
Open-source Flash player optimizations (swfdec)

48
