Wei Tung


Andes RISC-V V5 CPUs

Strength of Andes RISC-V CPU IPs

• Rich experience in CPU design and the CPU market
• High-quality RISC-V CPU IPs: Risk Free
• Strong PPA: Power, Performance, Area
• Professional DSP support: DSP ISA, C libraries and compiler
• ACE & COPILOT: generate custom instructions automatically

• RISC-V + ACE
  • Compact + High Performance + Flexible + Low Power

Taking RISC-V® Mainstream 2


Highlights of RISC-V V5 CPU IPs
• Andes Custom Extension™ (ACE)
• P-extension (DSP) and V-extension (Vector)
• Caches
  • Complete cache management, controlled by CSRs
• Local memories
  • Boot code can be copied to LM when the CPU is reset
  • Up to 32 MB of local memory accessible via V5's AHB slave port
• Support for vectored interrupts and unaligned memory access
• Comprehensive debug solutions


Highlights of ACE Solution on TWS

Function               Insn cycles    ACE cycles    Speedup
FIR 128 (only CPU)     1600           128           12x
LMS 128 (only CPU)     1800           128           14x
BT DMA (only CPU)      bus traffic    background
ANC Hybrid Solution    10000          250           40x
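
The speedup column is consistent with dividing the CPU-only instruction cycle count by the ACE cycle count. A minimal sketch of that arithmetic for the three rows with numeric entries follows; the function name `speedup` is illustrative and not part of any Andes tool. The exact ratios are 12.5, about 14.06 and 40, so the slide appears to truncate to whole numbers.

```python
def speedup(cpu_cycles: int, ace_cycles: int) -> float:
    """Ratio of CPU-only cycles to ACE-accelerated cycles."""
    if ace_cycles <= 0:
        raise ValueError("ace_cycles must be positive")
    return cpu_cycles / ace_cycles


# Cycle counts copied from the table; the BT DMA row has no numeric entries.
rows = {
    "FIR 128": (1600, 128),
    "LMS 128": (1800, 128),
    "ANC Hybrid Solution": (10000, 250),
}

for name, (cpu, ace) in rows.items():
    print(f"{name}: {speedup(cpu, ace):.2f}x")
```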



Andes V5 Processor Lineup
[Figure: processor lineup chart with columns RV32, RV64, Vector Ext. and Superscalar]
• Cache-coherent, 1-4 cores: A25MP, AX25MP
• Linux with FPU/DSP: A25, AX25, A27/AX27, A45/AX45
• Fast/compact with FPU/DSP: N25F, NX25F, D25F
• Compact: N22
• 27-Series (MemBoost, Vector Ext.): A27/AX27, NX27V, and more
• 45-Series (MemBoost, Dual Issue): N45/NX45, D45/DX45, A45/AX45, and more
• Pipeline depth: N22 2-stage; 25-Series 5-stage; 27-Series 5-stage; 45-Series 8-stage
Note: Common features are RV*IMACN, Caches, LM, ECC, BrPred, CoDense™, PowerBrake, StackSafe™, ACE (Andes Custom Extension™); Frequencies at 28nm

V5 25-Series Performance
Features                      Base    FP      Linux   Linux
32KB I$/D$ + 256 BTB          Yes     Yes     Yes     Yes
SP/DP FPU                     --      Yes     Yes     Yes
MMU and S-Mode                --      --      Yes     Yes
RV-P ext. draft (DSP)         --      --      --      Yes
Worst-Case Freq. (GHz) [1]    1.2     1.2     1.2     1.1
Coremark/MHz [2]              3.58 (rv32), 3.53 (rv64)
DMIPS/MHz [2]                 1.96 (rv32), 2.13 (rv64)
[1] 28nm SVT 9T library and high-speed memory. Frequency at 0.81 V / -40 °C.
[2] AndeSight v320 toolchain; DMIPS/MHz follows the ground rule with the no-inline option.

• Linux support
  • RISC-V MMU and S-mode
  • SV{32,39,48}, all page sizes
  • 4-way 32~128-entry STLB
  • 4 or 8-entry ITLB and DTLB
• FPU (RV-F or RV-FD)
  • +, –, x, x+, x–: pipelined, 5 cycles
  • ÷, √: run in background
    • SP: 15 cycles, DP: 29 cycles

[Figure: Whetstone/MHz bar chart, SP and DP, for NX25F, N25F, CM7 and CA7. Bar labels as extracted: 1.56, 1.58, 1.38, 1.30, 1.09, 0.50, 0.54]

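As a small illustration of the SV{32,39,48} item, the sketch below splits an Sv39 virtual address into its three 9-bit virtual page number fields and 12-bit page offset, as defined by the RISC-V privileged specification. The function name `split_sv39` is hypothetical, and the code says nothing about how the Andes TLBs are organised internally.

```python
PAGE_OFFSET_BITS = 12   # 4 KiB base pages
VPN_BITS = 9            # 512 entries per page-table level in Sv39
LEVELS = 3              # Sv39 uses a three-level page table


def split_sv39(vaddr: int) -> dict:
    """Split the low 39 bits of a virtual address into VPN[2..0] and offset."""
    if not 0 <= vaddr < (1 << 39):
        raise ValueError("expects the low 39 bits of the virtual address")
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    vpn = [
        (vaddr >> (PAGE_OFFSET_BITS + VPN_BITS * level)) & ((1 << VPN_BITS) - 1)
        for level in range(LEVELS)
    ]
    return {"vpn2": vpn[2], "vpn1": vpn[1], "vpn0": vpn[0], "offset": offset}
```

A leaf entry found at level 0 maps a 4 KiB page, at level 1 a 2 MiB megapage, and at level 2 a 1 GiB gigapage, which is what "all page sizes" refers to for Sv39.
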
Andes RISC-V V5 in SoC
[Figure: example SoC configurations built on Andes V5 cores: single core, 2-8 cores, > 30 cores, > 100 cores, > 1000 cores]

Andes RISC-V 25-Series Core Overview
• AndeStar V5 architecture:
  • RV32/RV64-IMACN + Andes Extensions
  • Optional FPU: SP, DP
  • Optional DSP/SIMD: P Extension
  • Optional S-mode/MMU: SV32/39/48
• 5-stage pipeline, single-issue
  • Configurable multiplier
  • Optional branch prediction
• I/D caches and Local Memory
  • Optional parity or ECC protection
  • Hit-under-miss caches
  • HW unaligned load/store accesses
• Bus interface
  • Master ports (AXI64*2/AHB{64,32})
  • Optional AHB slave port accessing LM address space

[Figure: 25-Series core block diagram. Blocks shown: AHB/SRAM, vPLIC, PMU, Debug Module, Andes Custom Extension™ Interface, Interrupt, WFI Mode, HW Breakpoint, 25-Series uCore with PMP and MMU (A(X)25), Branch Prediction, Multiplier/Divider, FPU, DSP (D25F, A(X)25), ICache (ECC option), DCache (ECC option), ILM (ECC option), DLM (ECC option), BIU, AXI64*2/AHB master port, AHB Slave Port, SRAM/AHB-Lite interfaces for ILM and DLM]

25-Series Features Overview
Features          N25F            D25F              A25               NX25F           AX25
AndeStar™         V5+RV32-FD-N    V5+RV32-FD-P-N    V5+RV32-FD-P-N    V5+RV64-FD-N    V5+RV64-FD-P-N
DSP               --              RV32-P (draft),   RV32-P (draft),   --              RV64-P (draft),
                                  DSP/SIMD          DSP/SIMD                          DSP/SIMD
MMU               --              --                Sv32              --              Sv39 & Sv48
Privilege mode    M+U             M+U               M+U+S             M+U             M+U+S

Features          RV32 cores (N25F, D25F, A25)            RV64 cores (NX25F, AX25)
Pipeline & GPR#   5-stage, 32 32-bit GPRs                 5-stage, 32 64-bit GPRs
Master Bus        AXI64*2, AXI64, AHB64 or AHB32;         AXI64*2, AXI64 or AHB64;
                  32 (A25: 32-34) bit address             32-64 bit address
Slave Bus         AHB64 or AHB32                          AHB64

Common to all cores
FPU               Single/Double precision (IEEE754-compliant)
Branch Pred.      Static/Dynamic (BTB, BHT, RAS)
Multiplier        Radix2/Radix4/Radix16/Radix256/Fast
Memory system     I&D Local Memory, up to 16MiB; I&D cache, up to 64KiB
Unique Features   CoDense™, StackSafe™, PowerBrake, QuickNap™, ECC/Parity, misaligned access, Andes Custom Extension™
Debug Module      4-wire JTAG/2-wire Serial Debug Port; with Exception Redirection
PLIC              Vectored dispatch, priority-based preemption, up to 1023 sources, 255 priorities, 16 targets
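
To make the PLIC row more concrete, here is a minimal sketch of the arbitration rule in the standard RISC-V PLIC specification: a source is signalled to a target only if it is pending, enabled for that target and has a priority above the target's threshold; the highest priority wins and ties go to the lowest source ID. The function name `plic_select` is hypothetical. The Andes additions listed above (vectored dispatch and priority-based preemption) are not modelled here.

```python
def plic_select(pending: set, enabled: set, priority: dict, threshold: int) -> int:
    """Return the source ID that would be signalled to a target, or 0 for none.

    Priority 0 means "never interrupt", and source ID 0 is reserved
    to mean "no interrupt", as in the RISC-V PLIC specification.
    """
    best_id, best_prio = 0, threshold
    # Ascending ID order plus a strict '>' gives ties to the lowest ID.
    for src in sorted(pending & enabled):
        prio = priority.get(src, 0)
        if prio > best_prio:
            best_id, best_prio = src, prio
    return best_id
```

For example, with sources 3, 5 and 7 pending at priorities 2, 4 and 4, the function selects source 5 at threshold 0 and nothing at threshold 4.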



Andes CPU GUI Configuration Tool



Configurable CPU Subsystem
[Figure: CPU subsystem block diagram. Blocks shown: JTAG, JTAG Debug Xport, Debug Module, PLIC (interrupt requests), 25-series core, Inst. Memory, Data Memory, BIU with master and slave ports. Master bus: AXI64*2/AHB64,32. Slave bus: AHB64,32.]


Easy SoC Integration
[Figure: the CPU subsystem (JTAG Debug Xport, Debug Module, PLIC with interrupt requests, 25-series core, Inst. Memory, Data Memory, BIU master and slave ports) integrated with the AE350 platform. Platform blocks shown: AXI/AHB Bus Matrix, DMA, bus masters and bus slaves (customer's or partner's AXI/AHB IP), AXI/AHB to APB Bridge, Sys. Mgmt Unit, and APB IP: GPIO, I2C, PWM/PIT, RTC, SPI, UART, WDT.]

Speedup with DSP ISA on 25-Series
• Real-world speedup using the DSP extension

RV64P (64 bit)
  CIFAR (Image Classification)    14x
  PNET (90% of Face detection)    7.57x
RV32P (32 bit)
  Keyword Spotting (voice)        5.36x
  AMR voice codec                 3.67x
  MP3 decode                      1.95x

DSP Support
• DSP ISA
  • The basis of the RISC-V P-extension draft, which Andes contributed
  • 300 instructions derived and evolved from real use cases (over decades)
  • Supports 32 bits and 64 bits
  • Supports saturation and rounding
  • Covers SIMD, partial SIMD, bit manipulation, and more
• DSP intrinsic functions
  • Usable as C-like functions, with no need to program in assembly
• DSP library
  • >200 functions in 8 categories (basic, complex, controller, filtering, matrix, statistics, transform, utils)
• Some DSP instructions are auto-generated by the compiler to facilitate development
• Compatible with CMSIS-DSP library API
  • By including an API wrapper header file
  • CMSIS: Microcontroller Software Interface Standard

DSP Library Comparison with CPU A
• RV32-P: Speedups over CPU A (with 3% larger code size)

Speedup      Basic   Cmplx   Ctlr    Filter  Matrix  Ststcs  Xform   Utils   ALL
Q     AVG    1.80    1.26    1.73    1.31    1.19    2.20    1.08    1.40    1.50
Q     MAX    6.94    1.80    2.17    2.63    1.77    6.75    1.31    2.77    6.94
F32   AVG    1.31    1.33    2.31    1.08    1.42    1.23    1.14    1.24    1.38
F32   MAX    1.42    1.64    2.55    2.09    1.78    1.35    1.39    2.05    2.55

• RV32-P: Speedups over CPU A (with 32% smaller code size)

Speedup      Basic   Cmplx   Ctlr    Filter  Matrix  Ststcs  Xform   Utils   ALL
Q     AVG    1.45    1.11    1.54    1.28    1.03    1.93    1.07    1.30    1.34
Q     MAX    5.15    1.59    1.85    2.63    1.56    5.29    1.31    2.77    5.29
F32   AVG    1.01    1.13    1.74    1.03    1.16    1.12    1.13    1.02    1.17
F32   MAX    1.35    1.48    2.11    2.09    1.55    1.22    1.39    2.05    2.11


DSP Instruction Examples
Types           Operations                                               Cycles
SIMD            Four 8x8 multiplications:                                1
                  16= 8x8; 16= 8x8; 16= 8x8; 16= 8x8
                Two 16x16 multiplications:
                  32= 16x16; 32= 16x16
Partial SIMD    Four 8x8 multiplications with 32b accumulation:          2
                  32= 32 + 8x8 + 8x8 + 8x8 + 8x8;
                  32= 32 + 8x8 + 8x8 + 8x8 + 8x8 (2nd op: RV64 only)
                Two 16x16 multiplications with 32b accumulation:
                  32= 32 + 16x16 + 16x16;
                  32= 32 + 16x16 + 16x16 (2nd op: RV64 only)
RV64 Only       Two 32x32 multiplications with 64b accumulation:         3
                  64= 64 + 32x32 + 32x32

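The operation shapes in this table can be mimicked with a small reference model. The sketch below covers the "four 8x8 multiplications" SIMD case and the "four 8x8 multiplications with 32b accumulation" partial-SIMD case on one 32-bit word. It assumes unsigned lanes and wrap-around at 32 bits, and the function names are hypothetical. The actual mnemonics, signedness variants and saturation behaviour of the instructions are not given in the source.

```python
def lanes(word: int, width: int, count: int) -> list:
    """Split an integer into `count` unsigned lanes of `width` bits, lowest first."""
    mask = (1 << width) - 1
    return [(word >> (width * i)) & mask for i in range(count)]


def simd_mul8(a: int, b: int) -> list:
    """Four 8x8 multiplications on two 32-bit words: four 16-bit products."""
    return [x * y for x, y in zip(lanes(a, 8, 4), lanes(b, 8, 4))]


def partial_simd_mac8(acc: int, a: int, b: int) -> int:
    """32 = 32 + 8x8 + 8x8 + 8x8 + 8x8, wrapping at 32 bits (assumed)."""
    return (acc + sum(simd_mul8(a, b))) & 0xFFFFFFFF
```

Each unsigned 8x8 product is at most 255 * 255 = 65025, so it always fits in a 16-bit result lane.
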
D25F vs. CPU A
Features                   D25F                                               CPU A
Custom Instruction         Andes Custom Extension™                            No
Pipeline stages            5 stages                                           3 stages
Floating point             SP, DP, HP conv. at LD/ST, Background ÷√           SP only
DSP Extensions             SIMD instructions with 8/16/32-bit element size;   8/16-bit SIMD arithmetic
                           complex DSP instructions operating on
                           16/32/64-bit data
I/D Local Memory           4KB~16MB                                           Yes
L1 I/D Cache               8KB~64KB                                           No
SRAM Error Protection      ECC or Parity                                      No
Bus Interface              AHB32, AHB64, or AXI64                             AHB Lite, APB
I/D Local Memory DMA       With AHB slave port                                No
Pre-integrated Platform    AXI-based platform                                 No
Additional Features        CoDense™ code size reduction, StackSafe™ stack     -
                           protection, PowerBrake & QuickNap™ power
                           management
DMIPS/MHz                  1.96                                               1.25
CoreMark/MHz               3.58                                               3.42
Note: N25F uses AndeSight v3.1.0; DMIPS/MHz follows the ground rule with the no-inline option.


AndeShape™ and Comprehensive Kits
• AndeShape™ Development Boards
• Debugging Hardware
  • AICE-MINI+, AICE-MICRO
• Near-Cycle Accurate Simulator
• QEMU Virtual Board
• AndeSoft™ Software Stack
  • Bare metal demo projects
  • RTOSes: FreeRTOS, ThreadX, Contiki, more
  • Linux: RV32/RV64, UP and SMP
• Rich Support from 3rd Parties
  • IAR, Imperas, Lauterbach, Segger, UltraSoC, etc.

[Figure: photos labelled Corvette-F1, ADP-XC7K, AICE-MICRO and AICE-MINI+]

A25/AX25 Multi-Core Processor (1/2)
• Configurable L2 cache size of 0KiB, 128KiB, 256KiB, 512KiB, 1MiB and 2MiB

[Figure: AndesCore™ A*25 Multi-Core Processor block diagram. Blocks shown: PLIC with PLIC I/F; Debug Module with Debug I/F (e.g., to Debug Xport); Trace Port (x4); Core 0 to Core 3, each with ILM, DLM, I$ and D$; IO LM Slave Port x4 (AHB-32/64); L1-to-L2 64b links; L2 CSR; Cache Coherence (MESI)/L2 Cache Controller; IO Coherence Slave Port (AXI-64); Bus Master Interface: AXI-128/AXI-64.]


A25/AX25 Multi-Core Processor (2/2)
A25MP Multi-Core Benchmark on Linux
Processors    Single-core    Dual-core    Quad-core
CoreMark      3.50           6.92         13.84

AX25MP Multi-Core Benchmark on Linux
Processors    Single-core    Dual-core    Quad-core
CoreMark      3.45           6.90         13.80
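
One way to read these tables is as multi-core scaling: divide each multi-core score by the single-core score to get the speedup, then divide by the core count to get parallel efficiency. The sketch below does that with the figures above. The helper name `scaling` is illustrative and the derived percentages are not stated in the source.

```python
def scaling(single: float, multi: float, cores: int) -> tuple:
    """Return (speedup over one core, parallel efficiency) for a multi-core score."""
    if single <= 0 or cores < 1:
        raise ValueError("single must be positive and cores at least 1")
    s = multi / single
    return s, s / cores


# CoreMark figures copied from the two tables.
a25mp = {1: 3.50, 2: 6.92, 4: 13.84}
ax25mp = {1: 3.45, 2: 6.90, 4: 13.80}

for name, scores in (("A25MP", a25mp), ("AX25MP", ax25mp)):
    for cores, score in scores.items():
        s, eff = scaling(scores[1], score, cores)
        print(f"{name} x{cores}: speedup {s:.2f}, efficiency {eff:.1%}")
```

With these numbers the A25MP scales at roughly 99% efficiency and the AX25MP scores scale linearly with core count.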



Andes Custom Extension™ (ACE) Framework
[Figure: ACE framework flow. Inputs listed: scalar/vector, background, wide operands, attributes; C code; Verilog. These feed COPILOT (Custom-OPtimized Instruction deveLOpment Tools), which produces Extended Tools, Extended ISS and Extended RTL, along with an Automated Env. for Cross Checking and a Test Case Generator. Extensible Baseline Components: Compiler, Asm/Disasm, Debugger, IDE; CPU ISS (near-cycle accurate); CPU RTL.]


Inner Product of Vectors with 64 8-bit Data
ip64B.ace:

    reg CfReg {             //coef. Custom Register
        num= 4;
        width= 512;
    }
    ram VMEM {              //vector Custom Memory
        interface= sram;
        address_bits= 3;    //8 elements
        width= 512;
    }
    insn ip64B {
        operand= {out gpr IP,
                  in CfReg C, in VMEM V};
        csim= %{            //multi-precision lib. used
            IP= 0;
            for(uint i= 0; i<64; ++i)
                IP+= ((C >>(i*8)) & 0xff) *
                     ((V >>(i*8)) & 0xff);
        %};
        latency= 3;         //enable multi-cycle ctrl
    };

ip64B.v:

    //ACE_BEGIN: ip64B
    assign IP= C[ 7:0] * V[ 7:0]
             + C[15:8] * V[15:8]
             ...
             + C[511:504] * V[511:504];
    //ACE_END

Intrinsic: long ace_ip64B(CfReg_t, VMEM_t);
Speedup: 85x
[Figure: VMEM (512-bit) and CfReg (512-bit) feed the ACE Logic, which writes its result to a GPR]

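For readers who want to check the csim semantics outside the ACE toolchain, the following is a minimal Python reference model of the same loop: it treats each 512-bit operand as 64 unsigned 8-bit lanes and sums the lane-wise products. The helper names `pack` and `ip64b` are illustrative and not part of ACE or COPILOT. Python integers stand in for the multi-precision library mentioned in the csim comment.

```python
LANES = 64
LANE_BITS = 8


def pack(values: list) -> int:
    """Pack 64 unsigned 8-bit values into one 512-bit integer, element 0 in bits 7:0."""
    if len(values) != LANES or any(not 0 <= v <= 0xFF for v in values):
        raise ValueError("need exactly 64 values in the range 0..255")
    word = 0
    for i, v in enumerate(values):
        word |= v << (LANE_BITS * i)
    return word


def ip64b(c: int, v: int) -> int:
    """Mirror of the csim loop: sum of lane-wise products of two 512-bit operands."""
    total = 0
    for i in range(LANES):
        total += ((c >> (i * LANE_BITS)) & 0xFF) * ((v >> (i * LANE_BITS)) & 0xFF)
    return total
```

The largest possible result is 64 * 255 * 255 = 4161600, which needs 22 bits and therefore fits in a 32-bit GPR.
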
Custom Port (ACP) for Direct HW Engine Control
CPU controls 4 HW engines.

    port command {      //a 90-bit output port to
                        // all 4 HW engines,
        width= 90;      //including a valid bit and
                        // a HW engine ID field
        io_type= out;
    }
    port ready {        //4 ready signals
        num= 4;
        io_type= in;
    }
    port results {      //4 256-bit input ports
        num= 4;
        width= 256;
        io_type= in;
    }

[Figure: the CPU's ACE logic sends commands to 4 HW engines; each engine returns a "result ready?" signal and results]

App. code sequence:
    prepare command (say, thru ACR);
    send command;
    do other useful work;
    wait for results to be ready;
    get results;
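
The source states only that the 90-bit command word includes a valid bit and a HW engine ID field. As a sketch of the "prepare command" step, the code below assumes one possible layout: bit 89 is the valid bit, bits 88:87 hold a 2-bit engine ID (enough for 4 engines), and bits 86:0 carry the payload. This layout and the function names are assumptions made for illustration, not the layout of any real design.

```python
CMD_WIDTH = 90
ID_BITS = 2                               # assumed: 2 bits address 4 engines
PAYLOAD_BITS = CMD_WIDTH - 1 - ID_BITS    # assumed: remaining 87 bits are payload


def pack_command(engine_id: int, payload: int, valid: bool = True) -> int:
    """Build a 90-bit word: [89] valid, [88:87] engine ID, [86:0] payload (assumed)."""
    if not 0 <= engine_id < (1 << ID_BITS):
        raise ValueError("engine_id must be 0..3")
    if not 0 <= payload < (1 << PAYLOAD_BITS):
        raise ValueError("payload does not fit in 87 bits")
    return (int(valid) << (CMD_WIDTH - 1)) | (engine_id << PAYLOAD_BITS) | payload


def unpack_command(word: int) -> tuple:
    """Inverse of pack_command: returns (valid, engine_id, payload)."""
    valid = bool((word >> (CMD_WIDTH - 1)) & 1)
    engine_id = (word >> PAYLOAD_BITS) & ((1 << ID_BITS) - 1)
    payload = word & ((1 << PAYLOAD_BITS) - 1)
    return valid, engine_id, payload
```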



Design for Energy Efficiency
[Figure: instantaneous power versus time across operating states. Labels shown: Performance Efficient Microarchitecture Designs; Performance Critical Tasks (Fully Operational); Periodic Services; Idle Loop; PowerBrake (Clock Throttling w. Duty Ratio); Clock Gating RTL Designs; Core Logic Power Off; Logic and SRAM off; QuickNap™ (Cache-Intact Fast Core Power Down, Wakeup)]

AndeSight™: Professional IDE
• Eclipse-based, refined over 15 years of development

[Figure: IDE screenshots showing the FreeRTOS Task List and FreeRTOS Event List views]


AndeSoft™: Bare Metal Support
• Bare metal
  • Rich startup demo projects for Andes-specific features
    • PLIC, CLIC
    • MMU, PMP, cache, ECC, bus matrix slave port
    • PowerBrake, hibernate, WFI CPU standby/resume
    • StackSafe™, performance monitor
    • DSP, printf UART redirect, C++ programming
  • AMSI (Andes MCU Software Interface) driver APIs
    • UART, GPIO, RTC, PWM, QSPI, I2C and WDT
    • Easy to use and quick to learn


AndeSoft™: RTOS Support
• RTOS
  • FreeRTOS (open source): 32/64 bits
  • ThreadX (from Express Logic): 32/64 bits
  • RISC-V ready: Zephyr, RT-Thread, SylixOS, μC/OS-[II/III], MyNewt, LiteOS, AliOS Things
• FreeRTOS v10.0
  • FreeRTOS test suite verified
  • Support AE350 (AXI/AHB) platform
  • Tickless idle
    • Reduce power consumption by stopping periodic tick interrupt in the idle mode
    • Based on RISC-V standard mtime/mtimecmp
• RTOS-awareness debugging
  • AndeSight™
  • Lauterbach's Trace32®

[Figure: two timelines. Idle: the Tick ISR repeatedly interrupts the Idle Task. Tickless Idle: the Idle Task runs uninterrupted until the next Tick ISR.]

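The tickless idle idea can be sketched as simple timer arithmetic: instead of arming `mtimecmp` one tick ahead, the RTOS arms it for the whole expected idle period, then credits the skipped ticks after wake-up. The model below is a simplified illustration and is not FreeRTOS's actual port code. The function names are hypothetical, and it ignores details such as aligning to tick boundaries and early wake-up by other interrupts. The 64-bit wrap reflects the width of the RISC-V `mtime` register.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # mtime and mtimecmp are 64-bit registers


def next_mtimecmp(mtime: int, cycles_per_tick: int, idle_ticks: int) -> int:
    """Compare value that wakes the core after `idle_ticks` whole ticks."""
    if cycles_per_tick <= 0 or idle_ticks < 1:
        raise ValueError("cycles_per_tick must be positive and idle_ticks at least 1")
    return (mtime + cycles_per_tick * idle_ticks) & MASK64


def ticks_elapsed(mtime_sleep: int, mtime_wake: int, cycles_per_tick: int) -> int:
    """Whole ticks to add to the RTOS tick count after waking up."""
    return ((mtime_wake - mtime_sleep) & MASK64) // cycles_per_tick
```

With a conventional tick the core would wake `idle_ticks` times during the same interval; here it wakes once.
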
AndeSoft™: Linux Support
• Linux distribution:
  • Fedora port ready
  • OpenWRT port for networking
• Linux kernel tools
  • strace/ftrace for developers to debug
  • Perf to evaluate the bottleneck of the whole system
• Power management
  • Suspend2ram: suspend via sysfs, wake up by RTC and UART interrupt
  • PowerBrake: power throttling mechanism controlled by sysfs
• Kernel module support for all relocation types on RV32 and RV64
• Development tools:
  • Linux awareness debugging
  • Lauterbach Trace32®

Open Source Contributions
• Major contributor, some as maintainer
  • GCC/Binutils
    • RV32IE
    • Interrupt attribute
    • ELF attribute support
  • LLVM/LLD
    • RV[32|64]IMAFDC code gen
    • Hard-float calling convention
  • Debugging tools: GDB, OpenOCD
  • Linux
    • ftrace, Perf, kernel module, non-coherent support
    • Fedora port
  • U-Boot
  • C library
    • Glibc, RV32
    • Newlib


Why Andes for RISC-V
Your Trusted RISC-V CPU Vendor
• Leading commercial RISC-V CPU company
• CPU IPO company with 15-year history
• 350+ customers worldwide
• 5 Bn+ shipping record
• Professional FAE team and support system
• Better code size and performance, mature tools

Thank You
