Altera


DDR4: Designing for Power and Performance
Agenda

• Comparison between DDR3 and DDR4
• Designing for power
− DDR4 power savings
• Designing for performance
− Creating a data valid window
− Good layout practices for DDR4
− Board debug tools to minimize issues
• Looking ahead and conclusion

Comparison Between DDR3 and DDR4

DRAM Technology Comparison
| Feature | DDR3 | DDR4 | GDDR5 |
|---|---|---|---|
| Voltage | 1.5 V / 1.35 V | 1.2 V | 1.5 V / 1.35 V |
| Strobe | Bi-directional differential | Bi-directional differential | Free-running differential WRITE clock |
| Strobe configuration | Per byte | Per byte | Per word |
| READ data capture | Strobe based | Strobe based | Clock data recovery |
| Data termination | VDDQ/2 | VDDQ | VDDQ |
| Address/command termination | VDDQ/2 | VDDQ/2 | VDDQ |
| Burst length | BC4, 8 | BC4, 8 | 8 |
| Bank grouping | No | 4 | 4 |
| On-chip error detection | No | Command/address parity; CRC for data bus | CRC for data bus |
| Configuration | x4, x8, x16 | x4, x8, x16 | x16, x32 |
| Package | 78-ball / 96-ball FBGA | 78-ball / 96-ball FBGA | 170-ball FBGA |
| Data rate (Mbps/pin) | 800–2,133 | 1,600–3,200+ | 4,000–7,000 |
| Component density | 1 Gb–8 Gb | 2 Gb–16 Gb | 512 Mb–2 Gb |
| Stacking options | DDP, QDP | Up to 8H (128-Gb stack); single load | No |

DDR4 Power Savings

DDR4 Power Savings Features

• DDR4 voltage is 1.2 V (up to 40% savings)
− Lower voltage than DDR3 (1.5 V)
− On-die VREF
− Pseudo-open drain I/Os
• Manages refreshes (up to 20% savings)
− Based on temperature
• New DDR4 low-power auto self-refresh (LPASR) capability
− Changes refresh rate based on temperature
− Only refreshes the parts of the array that are in use
• Controller must allow fine-granularity refresh based on memory utilization
• Supports data bus inversion (DBI)
− Inverts a byte when more than four of its bits would be driven low, limiting the number of low-driven signals; with pseudo-open drain I/Os this reduces termination current and simultaneous switching output (SSO), saving power
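To make the DBI rule concrete, here is a minimal Python sketch of write-side DBI for one byte lane. It assumes the DDR4 convention that the active-low DBI_n signal is driven low when the byte is sent inverted, and that a byte is inverted when more than four of its bits are 0. The function names are illustrative, not part of any controller API.

```python
def dbi_encode(byte: int) -> tuple[int, int]:
    """Return (lane_data, dbi_n) for one 8-bit byte lane."""
    byte &= 0xFF
    zeros = 8 - bin(byte).count("1")  # bits that would be driven low
    if zeros > 4:
        # Invert so at most 3 data bits are low; DBI_n low flags the inversion.
        return (~byte) & 0xFF, 0
    return byte, 1  # send as-is; DBI_n stays high


def dbi_decode(lane_data: int, dbi_n: int) -> int:
    """Recover the original byte at the receiver."""
    return lane_data if dbi_n else (~lane_data) & 0xFF


def low_pins(lane_data: int, dbi_n: int) -> int:
    """Count pins (8 DQ + DBI_n) driven low; with POD termination these draw current."""
    return (8 - bin(lane_data).count("1")) + (0 if dbi_n else 1)


if __name__ == "__main__":
    worst = max(low_pins(*dbi_encode(b)) for b in range(256))
    print("worst-case low pins per byte lane with DBI:", worst)  # prints 4
```

Without DBI, a 0x00 byte drives all eight DQ pins low; with DBI it is sent as 0xFF and only DBI_n is driven low.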

Creating a Data Valid Window

Timing Margins Are Shrinking

[Chart: shrinking timing margins in picoseconds, from DDR1 at 400 Mbps to DDR4 at 3,200 Mbps]

| Generation | Data valid window (ps) | DRAM margin (ps) | Chip margin (ps) | Package/board margin (ps) |
|---|---|---|---|---|
| DDR1 | 2,500 | 900 | 800 | 800 |
| DDR2 | 938 | 425 | 256 | 256 |
| DDR3 | 469 | 188 | 140 | 140 |
| DDR4 | 313 | 125 | 93 | 93 |
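The data valid window values above are simply the unit interval (one bit time) at each generation's data rate. The Python sketch below recomputes the UI and checks that the three margin values add up to roughly one UI. Only the 400-Mbps and 3,200-Mbps endpoints come from the chart; the DDR2 (1,066 Mbps) and DDR3 (2,133 Mbps) rates are assumptions chosen because they reproduce the 938-ps and 469-ps values.

```python
def unit_interval_ps(rate_mbps: float) -> float:
    """Bit time in picoseconds for a per-pin data rate in Mbps."""
    return 1e6 / rate_mbps


# (data rate in Mbps, DRAM margin, chip margin, package/board margin); margins in ps.
# The DDR2 and DDR3 rates are assumed; the margins are the values from the table.
BUDGET = {
    "DDR1": (400, 900, 800, 800),
    "DDR2": (1066, 425, 256, 256),
    "DDR3": (2133, 188, 140, 140),
    "DDR4": (3200, 125, 93, 93),
}

if __name__ == "__main__":
    for gen, (rate, dram, chip, board) in BUDGET.items():
        ui = unit_interval_ps(rate)
        print(f"{gen}: UI = {ui:6.1f} ps, sum of margins = {dram + chip + board} ps")
```

At DDR4-3200 the whole bit time is 312.5 ps, so each 93-ps slice is roughly 30% of the UI.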

Shrinking the Window Even More:
DDR4 VREF Training (1/2)

• DDR4 VREF training
− Training: sweep the VREF setting and find the maximum passing window
• Lump sum of DCD, RX offset, etc.
• Resolution error is the combination of VREF, PI, or delay-chain resolution
− Margin loss calculation
• VREF step size: from 0.5% VDDQ to 0.8% VDDQ
• VREF set tolerance: ±1.625% or ±0.15% VDDQ
• Calibration error: 1 step size
− 0.8% × VDDQ = 0.8% × 1.2 V = 9.6 mV
• Margin loss (due to VREF calibration error)
− 9.6 mV × 2 / slew_rate = 4.8 ps (assuming slew rate = 4 V/ns)
• Calibration error = half step size

| Parameter | Symbol | Min | Typ | Max | Unit | Note |
|---|---|---|---|---|---|---|
| VREF step size | Vref_step | 0.50% | 0.65% | 0.80% | VDDQ | 2 |
| VREF set tolerance | Vref_set_tol | -1.625% | 0.00% | 1.625% | VDDQ | 3, 4, 6 |
| VREF set tolerance | Vref_set_tol | -0.15% | 0.00% | 0.15% | VDDQ | 3, 5, 7 |
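The margin-loss arithmetic above can be reproduced with a short Python helper. This sketch follows the slide's formula, including its factor of 2, and assumes the 1.2-V VDDQ and 4-V/ns slew rate used there. The function names are illustrative.

```python
def vref_step_mv(step_fraction: float, vddq_v: float) -> float:
    """One VREF step in millivolts, e.g. 0.8% of VDDQ."""
    return step_fraction * vddq_v * 1e3


def vref_margin_loss_ps(step_fraction: float, vddq_v: float, slew_v_per_ns: float) -> float:
    """Timing margin lost to a one-step VREF calibration error.

    The factor of 2 follows the slide's formula; dividing by the slew rate
    converts voltage to time (mV / (V/ns) = ps).
    """
    return vref_step_mv(step_fraction, vddq_v) * 2 / slew_v_per_ns


if __name__ == "__main__":
    for step in (0.005, 0.0065, 0.008):  # min, typ, max Vref_step from the table
        print(f"step {step:.2%}: {vref_step_mv(step, 1.2):.2f} mV, "
              f"loss {vref_margin_loss_ps(step, 1.2, 4.0):.2f} ps")
```

The largest step (0.8% VDDQ) gives the 9.6-mV and 4.8-ps figures on the slide.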

Shrinking the Window Even More:
DDR4 VREF Training (2/2)

• Discussion with JEDEC members
− DDR4 specification section 13.4: any DRAM component-level variation must be accounted for within the DRAM RX mask. This means that the VREF calibration error is included in VdIVW_total.
− The internal VREF_DQ aligns to VCENT_DQs through training. Because VCENT_DQs has variation, the VREF_DQ training error should grow with this variation and with internal voltage noise, etc.

Shrinking the Window Even More:
Duty Cycle Error

• DDR4 specification is ±2% tCK = ±0.04 UI
− IPD current budget ±3% tCK
• Margin loss is 4% tCK (±2%)
• With proper link timing calibration
− 2% tCK margin loss
• Assume same for read

[Figure: DQS and DQ waveforms with ±2% duty-cycle error]

Timing Parameters by Speed Bin for DDR4-2400 to DDR4-3200

| Parameter | Symbol | DDR4-2400 Min | DDR4-2400 Max | DDR4-2666 Min | DDR4-2666 Max | DDR4-3200 Min | DDR4-3200 Max | Units | Note |
|---|---|---|---|---|---|---|---|---|---|
| Clock timing | | | | | | | | | |
| Minimum clock cycle time (DLL off mode) | tCK (DLL_OFF) | 8 | - | 8 | - | 8 | - | ns | 22 |
| Average clock period | tCK (avg) | TBD | | | | | | ps | |
| Average high pulse width | tCH (avg) | 0.48 | 0.52 | 0.48 | 0.52 | 0.48 | 0.52 | tCK (avg) | |
| Average low pulse width | tCL (avg) | 0.48 | 0.52 | 0.48 | 0.52 | 0.48 | 0.52 | tCK (avg) | |
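The duty-cycle figures above can be converted to picoseconds for a given speed bin. This Python sketch assumes two data beats per clock (so 1 UI = tCK/2) and uses the slide's ±2% tCK budget; the calibrated value simply restates the slide's 2% tCK figure. Names are illustrative.

```python
def tck_ps(rate_mbps: float) -> float:
    """Clock period in ps; DDR transfers two data beats per clock."""
    return 2e6 / rate_mbps


def dcd_margin_loss(rate_mbps: float, dcd_fraction: float = 0.02) -> dict:
    """Timing lost to a +/-dcd_fraction duty-cycle error, in ps and in UI."""
    tck = tck_ps(rate_mbps)
    ui = tck / 2
    one_sided = dcd_fraction * tck   # e.g. 2% tCK
    total = 2 * one_sided            # +/-2% spans 4% tCK without calibration
    return {
        "tck_ps": tck,
        "one_sided_ps": one_sided,
        "one_sided_ui": one_sided / ui,
        "total_ps": total,
        "calibrated_ps": one_sided,  # per the slide: about 2% tCK after calibration
    }


if __name__ == "__main__":
    for rate in (2400, 2666, 3200):
        r = dcd_margin_loss(rate)
        print(f"DDR4-{rate}: tCK = {r['tck_ps']:.1f} ps, "
              f"uncalibrated loss = {r['total_ps']:.1f} ps, "
              f"calibrated loss = {r['calibrated_ps']:.1f} ps")
```

At DDR4-3200, 2% of the 625-ps clock is 12.5 ps, which is the 0.04 UI quoted in the specification bullet.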

Shrinking the Window Even More:
Calculating the PLL Jitter

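[Figure: the current profile I(f), PDN impedance Z(f), jitter sensitivity S(f), and PLL PSRR P(f), each plotted against frequency, multiply into the jitter spectrum J(f); an inverse FFT (iFFT) turns J(f) into the TIE jitter j(t), from which the peak-to-peak jitter is read.]

$$J(f) = I(f) \times Z(f) \times S(f) \times P(f) \xrightarrow{\text{iFFT}} j_{\mathrm{TIE}}(t)$$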
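The flow in the figure can be sketched in plain Python: multiply the four frequency-domain terms bin by bin, then apply an inverse DFT to get the TIE waveform and read off its peak-to-peak value. This is an illustrative sketch, not the authors' tool. It assumes all inputs are sampled on the same N-point, two-sided DFT grid with Hermitian symmetry, so the result is real, and it assumes units of mA, ohms, ps/mV, and a dimensionless PSRR, so that the product comes out in ps.

```python
import cmath


def inverse_dft(spectrum):
    """Naive O(N^2) inverse DFT; adequate for a short illustrative grid."""
    n_pts = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * n / n_pts) for k in range(n_pts)) / n_pts
        for n in range(n_pts)
    ]


def tie_jitter_ps(current_ma, pdn_ohms, sens_ps_per_mv, psrr):
    """J(f) = I(f) * Z(f) * S(f) * P(f) per bin, then iFFT to j_TIE(t)."""
    spectrum = [i * z * s * p for i, z, s, p in zip(current_ma, pdn_ohms, sens_ps_per_mv, psrr)]
    return [x.real for x in inverse_dft(spectrum)]


def peak_to_peak(samples):
    return max(samples) - min(samples)


if __name__ == "__main__":
    n = 64
    current = [0.0] * n
    # Hypothetical 10-mA supply-current tone in bin 4, split across the +/- frequency bins
    current[4] = current[n - 4] = 10.0 * n / 2
    jitter = tie_jitter_ps(current, [0.05] * n, [2.0] * n, [0.5] * n)
    print(f"p-p jitter: {peak_to_peak(jitter):.3f} ps")  # 10 mA * 0.05 ohm * 2 ps/mV * 0.5 -> 0.5 ps amplitude
```

A flat PSRR of 0.5 halves the supply-induced jitter here; in a real analysis Z(f), S(f), and P(f) all vary with frequency, which is why the product is taken per frequency bin.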

DDR4 Bank Group Timing

• Different timing within a group and between groups (tCCD, tWTR, tRRD)
− “Long” timing: bank-to-bank within a group
− “Short” timing: access to different bank groups
• Maintain array timing requirements within a bank group
• Maintain speed between different bank groups

[Figure: four bank groups (0 to 3), each containing banks 0 to 3; long timings apply between banks in the same group, short timings between banks in different groups.]
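The effect of long and short timings on throughput can be seen with a toy command scheduler. This Python sketch assumes a single rule pair: consecutive column commands must be at least tCCD_S apart, and commands to the same bank group at least tCCD_L apart. The 4- and 6-clock values used below are illustrative assumptions, not figures from the slide or from a specific speed bin.

```python
def schedule_column_commands(bank_groups, tccd_s, tccd_l):
    """Earliest issue clock for each column command, given its target bank group."""
    issue_times = []
    last_in_group = {}  # bank group -> clock of its most recent command
    last_any = None     # clock of the most recent command to any group
    for bg in bank_groups:
        t = 0
        if last_any is not None:
            t = max(t, last_any + tccd_s)          # short timing: any group
        if bg in last_in_group:
            t = max(t, last_in_group[bg] + tccd_l)  # long timing: same group
        issue_times.append(t)
        last_in_group[bg] = t
        last_any = t
    return issue_times


if __name__ == "__main__":
    print("same group:  ", schedule_column_commands([0, 0, 0, 0], 4, 6))
    print("alternating: ", schedule_column_commands([0, 1, 0, 1], 4, 6))
```

Four accesses to one bank group finish at clock 18, while alternating between groups finishes at clock 12, which is why a controller should interleave bank groups to maintain speed.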

Calibration Is Critical to Shrinking Margins

[Chart: interface margin (ns) across FPGA effects, external effects, calibration effects, and calibration uncertainty; without calibration there is no margin.]

What is Calibration?

Capture calibration (de-skew)
[Figure: DQ0 to DQ7 relative to DQS on a 0° to 180° scale; before de-skew the valid capture window is small, after de-skew it is maximized.]
Benefit: reduces skew within the data group, giving more capture margin.

Resync calibration
[Figure: valid data windows for DQ0 to DQ71 on a 0° to 360° scale.]
Benefit: accurate strobe placement, giving more resync margin.

VT compensation
[Figure: the data valid window shifts with voltage and temperature variations, and VT tracking follows it.]
Benefit: dynamic phase adjustment to match the shifting data valid window, making the interface robust over VT.
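The capture de-skew idea can be sketched as follows, assuming calibration has already swept each DQ bit and recorded its passing window in delay taps relative to DQS. This is a simplified illustration, not the calibration algorithm of any particular IP: it only adds delay and aligns each bit's left edge to the latest one, which makes the window common to all bits as wide as the narrowest bit's window.

```python
def common_width(windows):
    """Width of the window shared by all bits (0 if they do not overlap)."""
    return max(0, min(hi for _, hi in windows) - max(lo for lo, _ in windows))


def deskew(windows):
    """Per-bit capture deskew.

    windows: list of (left, right) passing-window edges per DQ bit, in delay taps.
    Delay chains can only add delay, so each bit is delayed until its left edge
    lines up with the latest left edge.
    Returns (per-bit delays, common width before, common width after).
    """
    latest_left = max(lo for lo, _ in windows)
    delays = [latest_left - lo for lo, _ in windows]
    aligned = [(lo + d, hi + d) for (lo, hi), d in zip(windows, delays)]
    return delays, common_width(windows), common_width(aligned)


if __name__ == "__main__":
    # Hypothetical sweep results for three DQ bits
    print(deskew([(10, 40), (14, 44), (6, 36)]))  # ([4, 0, 8], 22, 30)
```

Here de-skew grows the shared capture window from 22 to 30 taps; the strobe would then be centered in that window.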

High-Level Output Topology

[Figure: output path. A phase-tap (ptap) control selects the X and X+90 clock phases derived from CLK; DQS and DQ each pass through two delay stages (DQS-out1/DQS-out2 and DQ-out1/DQ-out2) under delay-tap (dtap) control before reaching the pins.]

• Calibration knobs
− DQ-out1 and DQ-out2 delay: control the delay applied to outgoing DQ pins
− DQS-out1 and DQS-out2 delay: control the delay applied to outgoing DQS pins
− Write leveling output: changes the delay on both DQ and DQS relative to the memory clock, in phase taps
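Write leveling is commonly implemented as a sweep: the controller steps the DQS output delay while the DRAM, in write-leveling mode, samples CK on each DQS rising edge and returns the sample on DQ. The delay at which the feedback first changes from 0 to 1 aligns DQS with CK. The Python sketch below uses a hypothetical `sample_ck` callback standing in for that hardware feedback.

```python
from typing import Callable, Optional


def write_level(sample_ck: Callable[[int], int], max_taps: int) -> Optional[int]:
    """Return the first DQS delay tap where the sampled CK goes from 0 to 1."""
    prev = sample_ck(0)
    for tap in range(1, max_taps):
        cur = sample_ck(tap)
        if prev == 0 and cur == 1:
            return tap  # DQS rising edge now lines up with a CK rising edge
        prev = cur
    return None  # no 0 -> 1 transition found in the sweep range


if __name__ == "__main__":
    # Hypothetical model: a CK period of 64 taps, CK high for half of it,
    # and 20 taps of skew between CK and DQS at the DRAM.
    def fake_sample(tap: int) -> int:
        return 1 if (tap + 20) % 64 < 32 else 0

    print("write-leveling tap:", write_level(fake_sample, 128))  # 44
```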

High-Level Input Topology
[Figure: input path. DQS passes through the DQS-in delay and the DQS delay chain and is gated by DQS enable; the DQS-enable path is built from the VFIFO, phase (ptap) control, and the DQS-En delay (dtap). DQ passes through the DQ-in delay (dtap) into the DDIO input register, and the captured data enters the LFIFO.]

• Calibration knobs
− DQ-in delay: controls the delay applied to incoming DQ pins
− DQS-in delay: controls the delay applied to incoming DQS pins
− LFIFO: controls the number of cycles after the read command at which data is read out of the LFIFO
− DQS-En phase: controls the delay on DQS enable, in phase taps
− DQS-En delay: controls the delay on DQS enable, in delay taps (dtaps)
− VFIFO: adjusts the delay, in cycles, applied to the controller-provided DQS burst signal to generate DQS enable
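The DQS-enable knobs form a coarse-to-fine chain: VFIFO cycles, then phase taps, then delay taps. A minimal Python sketch of how a target delay might be split across them is shown below. The step sizes (a 640-ps cycle, eight 80-ps phase taps, and 10-ps delay taps) are hypothetical and do not come from any specific device.

```python
def split_delay(target_ps: int, cycle_ps: int = 640, phase_ps: int = 80, dtap_ps: int = 10):
    """Split a target delay into (VFIFO cycles, phase taps, delay taps, residual ps)."""
    cycles, rem = divmod(target_ps, cycle_ps)   # coarsest: whole clock cycles
    phases, rem = divmod(rem, phase_ps)         # medium: phase taps within a cycle
    dtaps, residual = divmod(rem, dtap_ps)      # finest: delay-chain taps
    return cycles, phases, dtaps, residual


if __name__ == "__main__":
    print(split_delay(1234))  # (1, 7, 3, 4)
```

The residual (at most one delay tap minus 1 ps in this model) is the resolution error that the finest knob cannot remove.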

Calibration Stages

• DQS-enable calibration
− Calibrate DQS enable (delayed read data valid) relative to DQS
• Post-amble tracking
− Track DQS enable across temperature variation for all pins on this interface
• Read data deskew
− Calibrate DQS relative to the read command (read leveling)
− Calibrate DQ versus DQS (per-bit deskew) for reads
• LFIFO training
− Calibrate LFIFO delay cycles (read latency)
• Write leveling
− Calibrate DQS and DM to the write command (write leveling)
• Write data deskew
− Calibrate DQ versus DQS (per-bit deskew) for writes
• Address/command training (leveling and deskew)
− Calibrate CS, CAS, RAS, and ODT versus the memory clock
• VREF training (FPGA and memory)
− Calibrates the receiver voltage threshold (for DDR4 with pseudo-open drain DQs)

[Flowchart: Start → wait for PLL/DLL locking → initialize INST/AC ROM → initialize the memory (mode registers, etc.) → calibration loop: calibrate the memory interface, repeating until all memory interfaces are calibrated → user mode loop: if a user command is found in DPRIO, process the DPRIO user command; otherwise, if a user command is found in RAM, process the RAM user command; then keep looping.]
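The calibration loop in the flowchart can be sketched in Python as below. `run_stage` is a hypothetical callback standing in for the sequencer's hardware operations, and treating the stage list order as the execution order is an assumption made for this sketch. Returning the first failing stage mirrors the way a debug tool can report where calibration stopped.

```python
from typing import Callable, Dict, Optional

# Stage names follow the slide's list; the order is assumed to be the execution order.
STAGES = [
    "DQS-enable calibration",
    "Post-amble tracking",
    "Read data deskew",
    "LFIFO training",
    "Write leveling",
    "Write data deskew",
    "Address/command training",
    "VREF training",
]


def calibrate_interface(run_stage: Callable[[str], bool]) -> Optional[str]:
    """Run the stages in order; return the first failing stage, or None on success."""
    for stage in STAGES:
        if not run_stage(stage):
            return stage
    return None


def calibrate_all(interfaces: Dict[str, Callable[[str], bool]]) -> Dict[str, Optional[str]]:
    """Calibration loop: calibrate every interface before entering user mode."""
    return {name: calibrate_interface(run) for name, run in interfaces.items()}


if __name__ == "__main__":
    results = calibrate_all({
        "if0": lambda stage: True,
        "if1": lambda stage: stage != "Write leveling",  # hypothetical failure
    })
    print(results)  # {'if0': None, 'if1': 'Write leveling'}
```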

Good Layout Practices for DDR4

DDR4 Output Driver

[Figure: DDR3 push-pull output driver vs. DDR4 pseudo-open drain output driver]

Content Courtesy of Micron


Unadjusted, Non-Terminated Data Eye

[Figure: unterminated data eye with overshoot above VDD, undershoot below VSS, and jitter]

Content Courtesy of Micron


Terminated Data Eye

[Figure: terminated data eye with overshoot, undershoot, high and low ringback, and the VIH(ac), VIH(dc), VREF, VIL(dc), and VIL(ac) levels]

Content Courtesy of Micron


OCT from the Controller Standpoint

• DQ and CA pins are terminated differently in DDR4

| Category | Specification | DDR3 | DDR4 |
|---|---|---|---|
| | Density | 512 Mb to 8 Gb | 2 Gb to 16 Gb |
| | Speed | 1.6 to 2.1 Gbps | 1.6 to 3.2 Gbps |
| Interface | Voltage (VDD / VDDQ / VPP) | 1.5 V / 1.5 V / NA (1.35 V / 1.35 V / NA) | 1.2 V / 1.2 V / 2.5 V |
| Interface | VREF | External VREF (VDD/2) | Internal VREF (needs training) |
| Interface | Data I/Os | CTT (34 Ω) | POD (34 Ω) |
| Interface | CMD/ADDR I/Os | CTT | CTT |
| Interface | Strobe | Bi-directional / differential | Bi-directional / differential |
| Core architecture | Number of banks | 8 | 16 (4 Gb) |
| Core architecture | Page size (x4 / x8 / x16) | 1 KB / 1 KB / 2 KB | 512 B / 1 KB / 2 KB |
| Core architecture | Prefetch | 8 bits | 8 bits |
| Core architecture | Added functions | RESET / ZQ / Dynamic ODT | + CRC / DBI / Multi preamble |
| Physical | Package type / balls (x4, x8 / x16) | 78 / 96 BGA | 78 / 96 BGA |
| Physical | DIMM type | R, LR, U, SoDIMM | + ECC SoDIMM |
| Physical | DIMM pins | 240 (R, LR, U) / 204 (So) | 288 (R, LR, U) / 260 (So) |

OCT Calibration Scheme to Support DDR4
• OCT can calibrate twice, using two sets of pins (DQ and CA)
• In DDR4, DQ and CA pins have two different sets of calibration codes

[Figure: OCT calibration schemes for DDR4 and DDR3]
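One way to implement OCT calibration is a successive-approximation (binary) search on a digital code, comparing the on-chip termination against a reference. The sketch below uses hypothetical, monotonic resistance models for the DQ and CA legs to illustrate why the two pin groups end up with separate codes; the 240-Ω target and the models are illustrative assumptions, not values from the slide.

```python
from typing import Callable


def calibrate_oct_code(resistance_at: Callable[[int], float], target_ohms: float, max_code: int) -> int:
    """Smallest code whose resistance is at or below the target (binary search).

    Assumes resistance falls monotonically as the code rises (more legs enabled).
    Returns max_code if no code reaches the target.
    """
    lo, hi = 0, max_code
    while lo < hi:
        mid = (lo + hi) // 2
        if resistance_at(mid) <= target_ohms:
            hi = mid
        else:
            lo = mid + 1
    return lo


def dq_model(code: int) -> float:
    """Hypothetical DQ (POD) leg resistance versus code."""
    return 480 / (1 + code / 16)


def ca_model(code: int) -> float:
    """Hypothetical CA (CTT) leg resistance versus code."""
    return 520 / (1 + code / 16)


if __name__ == "__main__":
    print("DQ code:", calibrate_oct_code(dq_model, 240, 63))  # 16
    print("CA code:", calibrate_oct_code(ca_model, 240, 63))  # 19
```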

General Layout Concerns

• Avoid crossing splits in the power plane
• SSO on the controller collapses strobes/clocks
− Separate supplies and/or flip-chip packaging helps
• Low-pass VREF filtering on the controller helps
• Minimize VREF noise
• Minimize intersymbol interference (ISI)
• Minimize crosstalk

Content Courtesy of Micron


Layout and Termination (1/12)

• Signal integrity review
− Importance of transmission line theory
• Today’s clock rates are too fast to ignore it
− Matched-impedance lines are important for good signaling
• Mismatched impedance lines result in reflections
• Termination schemes are used to reduce or eliminate reflections
− Good power bussing is paramount to reducing SSO
• SSO reduces voltage and timing margins
− Decoupling capacitor needs and requirements

Content Courtesy of Micron


Layout and Termination (2/12)

• Signal integrity analysis is paramount to developing cost-effective high-speed memory systems
− Develop a timing budget for proof of concept
− Use models to simulate
− Board skews are important and should be accounted for
− ISI, crosstalk, VREF noise, path length matching, Cin and RTT mismatch: employ industry practices and assumptions
− Model vias too
− Eliminate return path discontinuities (RPDs)
− Minimize SSO effects
• Difficult to model

Content Courtesy of Micron


Layout and Termination (3/12)

• DRAM and controller package parasitics are fixed
− SSO effects are already contained in their specified timings
• However, these timings apply to test conditions with specific decoupling
• The power delivery networks (PDNs) for the controller and DRAM need to be properly designed
• Lowering power supply inductance minimizes signaling variations between devices
− Use power and ground planes wherever possible
− Make all power and ground traces as wide as possible
− Couple power and ground as much as possible
• Lowers inductance (mutual effects)
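A common first-order way to size a PDN is a target impedance: the allowed supply ripple divided by the transient current. The sketch below applies it with illustrative numbers (a 1.2-V rail, 5% ripple, and a 2-A current step); these are assumptions, not values from the slide.

```python
def target_impedance_ohms(vdd_v: float, ripple_fraction: float, transient_a: float) -> float:
    """Z_target = allowed ripple voltage / transient current."""
    return vdd_v * ripple_fraction / transient_a


if __name__ == "__main__":
    z = target_impedance_ohms(1.2, 0.05, 2.0)
    # Keep |Z(f)| of the PDN below this value across the frequency band of interest.
    print(f"PDN target impedance: {z * 1e3:.1f} mOhm")  # 30.0 mOhm
```

Lower supply inductance keeps |Z(f)| under this target to higher frequencies, which is the point of the plane and coupling guidance above.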

Content Courtesy of Micron


Layout and Termination (4/12)

• SSO
− Timing and noise issues generated by rapid changes in voltage and current when multiple circuits switch simultaneously in the same direction
• Problems caused by SSO
− False triggers due to power/ground bounce
− Reduced timing margin due to SSO-induced skew
− Reduced voltage margin due to power/ground noise
− Slew rate variation

Content Courtesy of Micron


Layout and Termination (5/12)

• Good power bussing is paramount to reducing SSO

$$\Delta V = L \cdot \frac{dI}{dt}$$

• Reduce L (power delivery effective inductance)
− Use planes for power and ground distribution
− Properly route power and ground traces to devices
− Properly use decoupling capacitance
• Locate it as close as possible to the component pins
• Reduce dI/dt (switching current slew rate)
− Use the slowest drive edge that will work
− Use reduced drive strength instead of full drive where possible
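Plugging numbers into ΔV = L · dI/dt shows why both levers matter. The values below (0.1 nH of shared inductance, eight drivers switching 10 mA each, and 100-ps or 200-ps edges) are illustrative assumptions.

```python
def inductive_noise_v(l_nh: float, delta_i_a: float, edge_ps: float) -> float:
    """Delta V = L * dI/dt, with L in nH, current step in A, and edge time in ps."""
    return (l_nh * 1e-9) * (delta_i_a / (edge_ps * 1e-12))


if __name__ == "__main__":
    # Hypothetical: 8 drivers switching 10 mA each through 0.1 nH of shared inductance
    for edge in (100, 200):
        print(f"{edge} ps edge: {inductive_noise_v(0.1, 8 * 0.010, edge) * 1e3:.0f} mV")
```

Doubling the edge time halves the noise (80 mV to 40 mV here), which is the "slowest drive edge that will work" advice; halving L has the same effect.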

Content Courtesy of Micron


Layout and Termination (6/12)

• RPDs induce board noise and are difficult to model
− Splits/holes in reference planes
− Connector discontinuities
− Layer changes
• Avoid RPDs if at all possible
− Avoid crossing holes/splits in the reference plane
− Route signals so they reference the proper domain
− Add power/ground vias to the board
• Especially in dense layer-change areas
− Place decoupling capacitors near connectors

[Figure: split return path vs. solid return path]

Content Courtesy of Micron


Layout and Termination (7/12)

• VREF noise
− Induces strobe-to-data skews and reduces voltage margins
− Power/ground plane noise
− Crosstalk
• Minimize VREF noise
− Use the widest practical trace to route VREF
• From chip to decoupling capacitor
− Use large spacing between VREF and neighboring traces

Content Courtesy of Micron


Layout and Termination (8/12)

• ISI
− Occurs when data is random
• Clocks do not have ISI
− Multiple bits are on the bus at the same time
• The bus cannot settle from bit #1 before bit #2, etc.
− Signal edges jitter due to the previous bit’s energy still on the bus
− Ringing due to impedance mismatches
− Low-pass structures can cause ISI
• Minimize ISI
− Optimize layout
− Keep board/DIMM impedances matched
• Drive impedance should be the same as the Zo of the transmission line
− Terminate nets
• Termination values should be the same as the Zo of the transmission line
− Select a high-quality connector
• Matched to board/DIMM impedance
• Low mutual coupling
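The link between ISI and data history can be illustrated with a single-pole low-pass channel: a 0-to-1 edge crosses the 0.5 threshold at a different time depending on what the preceding bits left on the line. The channel model, the 8 samples per bit, and the filter coefficient are assumptions chosen for illustration.

```python
def lowpass_response(bits, samples_per_bit=8, alpha=0.8):
    """First-order low-pass, y += (1 - alpha) * (x - y), driven by an NRZ bit stream."""
    y, out = 0.0, []
    for bit in bits:
        for _ in range(samples_per_bit):
            y = alpha * y + (1 - alpha) * bit
            out.append(y)
    return out


def crossing_offset(waveform, bit_index, samples_per_bit=8, threshold=0.5):
    """Samples from the start of bit_index until the waveform first reaches threshold."""
    start = bit_index * samples_per_bit
    for i in range(start, start + samples_per_bit):
        if waveform[i] >= threshold:
            return i - start
    return None


if __name__ == "__main__":
    settled = lowpass_response([0, 0, 0, 1])  # long run of 0s before the edge
    recent = lowpass_response([1, 0, 1])      # a 1 shortly before the edge
    print("edge after 000:", crossing_offset(settled, 3))  # 3
    print("edge after 10: ", crossing_offset(recent, 2))   # 2
```

The one-sample difference is data-dependent jitter. A clock, whose history is identical before every edge, always crosses at the same offset, which is why clocks do not have ISI.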

Content Courtesy of Micron


Layout and Termination (9/12)

• Crosstalk
− Coupling on the board, package, and connector from other signals, including RPDs
• Inductive coupling is typically stronger than capacitive coupling
− When aggressors fire at the same time as the victim (e.g., data-to-data coupling)
• The victim edge speeds up or slows down, causing jitter
− When aggressors do not fire at the same time as the victim (e.g., data-to-command/address coupling)
• Noise couples onto the victim at the time of aggressor switching

Content Courtesy of Micron


Layout and Termination (10/12)

• Minimize crosstalk
− Keep bits that switch on the same “clock” edge routed together
• Route data bits next to other data bits, never next to CMD/ADDR bits
− Isolate sensitive bits (strobes)
• If need be, route them next to signals that rarely switch
− Separate traces by at least two to three (preferred) conductor widths (more accurately, the spacing is defined by trace pitch and height above the reference plane)
• Example: a 5-mil trace located 5 mils from a reference plane should have a 15-mil gap to its nearest neighbors to minimize crosstalk
− Choose a high-quality connector
− Run traces as stripline (as opposed to microstrip)
• Not at the cost of additional vias
− Maintain good references for signals and their return paths
− Avoid RPDs
− Keep driver, board Zo, and ODT selections well matched
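The point that spacing is better defined by pitch and height can be made concrete with a frequently quoted rule of thumb for coupled traces over a plane: crosstalk ∝ 1 / (1 + (D/H)²), where D is the center-to-center pitch and H is the height above the reference plane. This approximation is assumed here for illustration only; it is not a field solver and is useful only for comparing spacings.

```python
def relative_coupling(width_mil: float, gap_mil: float, height_mil: float) -> float:
    """Rule-of-thumb coupling factor 1 / (1 + (D/H)^2), with D = width + gap (pitch)."""
    pitch = width_mil + gap_mil
    return 1.0 / (1.0 + (pitch / height_mil) ** 2)


if __name__ == "__main__":
    # 5-mil traces, 5 mils above the reference plane (the slide's example)
    for gap in (5, 10, 15):
        print(f"{gap:2d}-mil gap: relative coupling {relative_coupling(5, gap, 5):.3f}")
```

Under this approximation, going from a 5-mil gap to the recommended 15-mil gap cuts coupling by roughly 3.4 times.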

Content Courtesy of Micron


Layout and Termination (11/12)

• Cin mismatch
− Differing input capacitances on receiver pins
− Adds skew to input timings
• RTT mismatch
− Termination resistors not at nominal value
− Internal ODT on data pins has smaller variation than on DDR2
• It is calibrated (as is the DRAM’s Ron)
− External termination resistor variation must be accounted for
• Consider 1% resistors

Content Courtesy of Micron


Layout and Termination (12/12)

• High-speed signals must maintain a solid reference plane
− The reference plane may be either VDD or ground
− For DDR3 UDIMM systems, the DQ buses are referenced to ground while the ADDR/CMD and clock are referenced to VDD
− All signals may be referenced to ground if the layout allows
• The best signaling is obtained when a constant reference plane is maintained
− If this is not possible, try to make the transitions near decoupling capacitors

[Figure: a signal transitioning between power-plane and ground-plane references next to a decoupling capacitor]

Content Courtesy of Micron


Board Debug Tools to Minimize Issues

TimeQuest DDR Timing: Read Capture

[Chart: read-capture margin waterfall. The “before calibration” margin is the output of standard timing analysis; calibrating out the effects of process variation in the FPGA and the memory (deskew + pessimism removal) adds margin; errors in the calibration algorithm and the effects of temperature and voltage changes on the calibration take margin away. The final bar is the total margin after calibration.]

EMIF Debug Toolkit Features

• Reports the results of the last calibration to the user
− Reports interface details, margins observed before calibration, settings made during calibration, and post-calibration margins
− In the case of a calibration failure, the toolkit reports the stage and the group at which calibration failed
• Provides eye monitor support
• Provides loopback support
• Allows user interaction with the memory interface
− Send commands to the memory interface to recalibrate and to mask groups and ranks
− Eye monitor support of the data valid window
− Loopback support for bit error rate (BER) testing
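When using loopback for BER testing, it helps to know how many error-free bits must be transferred before a BER claim is statistically meaningful. Under the usual Poisson assumption, observing zero errors in N bits supports BER < target at confidence CL when N ≥ −ln(1 − CL) / target. This is standard test practice, not a feature of the toolkit itself, and the 72-lane DDR4-3200 interface in the example is hypothetical.

```python
import math


def bits_for_zero_error_ber(target_ber: float, confidence: float) -> float:
    """Bits to transfer error-free to claim BER < target_ber at the given confidence."""
    return -math.log(1.0 - confidence) / target_ber


def transfer_time_s(bits: float, rate_bps: float, lanes: int = 1) -> float:
    """Wall-clock time to move that many bits across `lanes` parallel pins."""
    return bits / (rate_bps * lanes)


if __name__ == "__main__":
    n = bits_for_zero_error_ber(1e-12, 0.95)
    # Hypothetical 72-bit DDR4-3200 interface used for loopback
    print(f"{n:.3e} bits, about {transfer_time_s(n, 3.2e9, 72):.1f} s at 3,200 Mbps x 72 lanes")
```

About 3 × 10¹² error-free bits are needed for BER < 10⁻¹² at 95% confidence, roughly 13 seconds of traffic on such an interface.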

TimeQuest-Like GUI

[Screenshot: TimeQuest-like GUI with a Reports section, a Tasks section, and the commands run shown in the console]

“On-Chip” EMIF Debug Toolkit

• Core access to calibration data
− Access the same calibration data as the EMIF toolkit, now from FPGA logic
• Via an Avalon® Memory-Mapped (Avalon-MM) interface

Looking Ahead and Conclusion

Will There Be a DDR5?

• Very unlikely
− SI for a parallel bus at 2 GHz and above would be very difficult
− The timing budget would be consumed in the package
• PDN noise
• Package skew
• Transition to stacked memory
− Hybrid Memory Cube and serialized memory
− 3D memories integrated into ASICs

Conclusion

• DDR4 has many ways to reduce overall system power
− ~50% lower power than DDR3 at 1.5 V
• DDR4-3200 is 50% faster than DDR3-2133
• But there are challenges...
− Shrinking data valid window
− Increased signal integrity and power integrity concerns
• These can be overcome by good controller design
− Innovative calibration
− Good ODT
− Careful board design
− Good board debug tools

Thank You
