Design of An Open-Source SATA Core: July 2015



Presentation · July 2015


DOI: 10.13140/RG.2.1.4163.8168


1 author:

Nikola Zlatanov
Applied Materials

All content following this page was uploaded by Nikola Zlatanov on 18 February 2016.



Design of an Open-Source SATA Core
Nikola Zlatanov*

For some applications, it may be desirable to store the acquired data for later analysis.
However, the amount of data would quickly exceed the RAM storage capacity, so it is necessary
to store the data on a dedicated storage device such as a hard drive. To do so, one of the
industry-standard hard drive interfaces must be used. Thus, SATA seems to be a suitable
choice, since it features high throughput for storage, as shown in Table 1.

SATA Version                    Maximum Speed
SATA Generation 1 (SATA I)      1.5Gb/s, 150MB/s
SATA Generation 2 (SATA II)     3.0Gb/s, 300MB/s
SATA Generation 3 (SATA III)    6.0Gb/s, 600MB/s
Table 1: SATA Generations and Speeds
The SATA protocol uses a layered approach, wherein each layer uses services of the layer
below it and presents services to the layer above it. At the highest level, a fairly simple
Read/Write interface is presented to applications wishing to store data, while the lower layers do
many complex transformations, synchronization, and hand-shaking. The architecture of SATA
will be discussed in more detail shortly.

SATA Overview

Serial ATA is a peripheral interface created in 2003 to replace Parallel ATA, also
known as IDE. Hard drive speeds were getting faster, and would soon outpace the
capabilities of the older standard—the fastest PATA speed achieved was 133MB/s, while
SATA began at 150MB/s and was designed with future performance in mind [2]. Also,
newer silicon technologies used lower voltages than PATA's 5V minimum. The ribbon
cables used for PATA were also a problem; they were wide and blocked air flow, had a
short maximum length restriction, and required many pins and signal lines [2].
SATA has a number of features that make it superior to Parallel ATA. The
signaling voltages are low and the cables and connectors are very small. SATA has
outpaced hard drive performance, so the interface is not a bottleneck in a system. It also
has a number of new features, including hot-plug support.
SATA is a point-to-point architecture, where each SATA link contains only two
devices: a SATA host (typically a computer) and the storage device. If a system requires
multiple storage devices, each SATA link is maintained separately. This simplifies the
protocol and allows each storage device to utilize the full capabilities of the bus
simultaneously, unlike in the PATA architecture where the bus is shared.
To ease the transition to the new standard, SATA maintains backward
compatibility with PATA. To do this, the Host Bus Adapter (HBA) maintains a set of
shadow registers that mimic the registers used by PATA. The disk also maintains a set of
these registers. When a register value is changed, the register set is sent across the serial
line to keep both sets of registers synchronized. This allows for the software drivers to be
agnostic about the interface being used.
SATA uses a layered architecture, depicted in Figure 3. The highest layer is the Application
Layer, which represents the software using the SATA device. Below that is the Command
Layer, which triggers a series of Transport Layer actions to implement a PATA command. Next
is the Transport Layer, which handles creating and formatting Frame Information Structures
(FISes), and the valid sequences of FISes. Beneath that is the Link Layer, which encodes the
FISes, handles control signals, and checks for FIS integrity. The lowest layer is the Physical
Layer, which handles the transmission and reception of the actual electrical signal and
maintains alignment. It also takes care of establishing the link, using what is known as Out-of-
band (OOB) signaling.

Figure 3: SATA Layer Architecture

Each layer provides services to the layer above it. This allows for each layer to “abstract away”
the details of the layers below it and simplify the design process. The layers will be discussed
in more depth shortly.

Notes on Terminology
When discussing SATA, there are multiple words that can refer to the same thing, and words
could have different meanings in other contexts. To avoid ambiguity, in this document, we will
try to be consistent in the use and meaning of the following terms.
Dword: Although this term is typically used in the context of a particular processor or processor
family, here it refers to 32 bits of data, or 4 bytes. This is consistent with other SATA literature.
However, note that a Dword is encoded as 40 bits while on the line. Despite the size change,
this is still referred to as a “Dword” because the encoded data is never manipulated directly, and
once decoded, will again be 32 bits.
Core, Host Bus Adapter (HBA): This refers to the SATA design being presented in this work.
That is, the hardware that interfaces with a disk and handles the SATA protocol.
Host: This refers to the system that is interfacing with the disk, and includes the HBA. An
example of a host would be a PC, or the SSD board. Since the SATA protocol is asymmetric,
“Host” can also refer to the host's side of the protocol.
Disk, device: This refers to the hard drive with which we are communicating. Although disk is
unambiguous, device could refer to any number of things. In this work, “device” refers to the
hard disk, unless context indicates otherwise.
Frame Information Structure (FIS): A Frame Information Structure, or FIS, is a single data
payload that is sent over the SATA link. These are analogous to “packets” in network
terminology. There are multiple types of FISes, and all of them are wrapped by Start of Frame
(SOF) and End of Frame (EOF) primitives. The protocol defines valid sequences of FISes for
data transfer. One or more of these FISes will be Data FISes, that actually contain the data to
be read or written. The maximum data payload of a single Data FIS is 8KB.
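
The 32-to-40-bit expansion in the Dword definition above also accounts for the two figures in each row of Table 1. The following is a minimal sketch of that arithmetic; the function name is hypothetical and 1 MB is assumed to mean 10^6 bytes.

```python
# Each byte travels as a 10-bit character, so a 32-bit Dword occupies 40 bits on the line.
BITS_PER_BYTE_ON_LINE = 10

def payload_mb_per_s(line_rate_gbps):
    """Convert a raw line rate in Gb/s to payload bandwidth in MB/s (1 MB = 10**6 bytes)."""
    bits_per_second = line_rate_gbps * 1e9
    return bits_per_second / BITS_PER_BYTE_ON_LINE / 1e6

for name, rate in [("SATA I", 1.5), ("SATA II", 3.0), ("SATA III", 6.0)]:
    print(f"{name}: {rate} Gb/s -> {payload_mb_per_s(rate):.0f} MB/s")
```

These figures are upper bounds on payload bandwidth; primitives, FIS headers, and CRC Dwords reduce what an application actually sees.
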
SATA Details
Physical Layer
The physical layer is the lowest layer of the SATA protocol stack. It handles the electrical signal
being sent across the cable. The physical layer also handles some other important aspects, such
as resets and speed negotiation.
SATA uses low-voltage differential signaling (LVDS). Instead of sending 1's and 0's relative to a
common ground, the data being sent is based on the difference in voltage between two conductors
sending data. In other words, there is a TX+ and a TX- signal. A logic 1 corresponds to a high
TX+ and a low TX-; and vice versa for a logic 0. SATA uses a ±125mV voltage swing.
This scheme was chosen for multiple reasons. For one, it improves resistance to noise. A
source of interference will likely affect both conductors in the same way, since they are parallel
to each other. However, a change in voltage on both conductors does not change the difference
between them, so the signal will still be easily recovered. Low-voltage differential signaling also
reduces electromagnetic interference (EMI), and the lower signaling voltages mean that less
power is used.
Out-of-Band Signaling
As stated earlier, the physical layer is also responsible for link initialization and resets. But how
can a host and a device communicate to initialize the link if they don't have a link with which to
communicate? The scheme that SATA uses is called out-of-band (or OOB) signaling.
Under this scheme, it is assumed that the host and the device can detect the presence or
absence of a signal, even if they cannot yet decode that signal. OOB signals are essentially
that—whether or not an in-band signal is there. By driving TX+ and TX- to the same common
voltage (so not a logic 1 or a logic 0), one party can transmit an OOB “lack of signal.”
Link initialization is performed by sending a sequence of OOB primitives, which are defined
patterns of signal/no-signal. There are three defined primitives: COMRESET, COMINIT, and
COMWAKE. Each primitive consists of six “bursts” of a present signal, with idle time in between.
The times of each burst are defined in terms of “Generation 1 Unit Intervals” (U), which is the
time to send 1 bit at the SATA I rate of 1.5Gb/s.
Table 2 shows the definitions of the primitives. There are also fairly loose tolerances defined for
each signal. Note also that COMRESET and COMINIT have the same definition—the only
difference is that COMRESET is sent by the host, and COMINIT is sent by the device.
OOB Signal    Burst Length     Inter-burst Idle Time
COMRESET      106ns (160U)     320ns (480U)
COMINIT       106ns            320ns
COMWAKE       106ns            106ns
Table 2: OOB Primitive Definitions
The COMRESET signal, sent by the host, is used to reset the link. Following a COMRESET, the
OOB initialization sequence is performed again. COMRESET can also be sent repeatedly to
hold the link in a reset state.
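
The timings in Table 2 follow directly from the unit interval. The sketch below recomputes them, assuming that the COMINIT and COMWAKE entries correspond to the same 160U and 480U counts given for COMRESET. The `classify_idle_gap` helper is purely illustrative: it picks the nearest nominal value and does not model the tolerance windows defined by the specification.

```python
GEN1_LINE_RATE_HZ = 1.5e9            # SATA I bit rate
UI_NS = 1e9 / GEN1_LINE_RATE_HZ      # one Generation 1 unit interval, about 0.667 ns

# (burst length, inter-burst idle time) in unit intervals, after Table 2
OOB_TIMING_UI = {
    "COMRESET": (160, 480),
    "COMINIT": (160, 480),
    "COMWAKE": (160, 160),
}

def oob_timing_ns(name):
    """Return (burst, idle) durations in nanoseconds for an OOB primitive."""
    burst_ui, idle_ui = OOB_TIMING_UI[name]
    return burst_ui * UI_NS, idle_ui * UI_NS

def classify_idle_gap(idle_ns):
    """Illustrative only: choose the nominal idle time closest to the measured gap."""
    reset_idle = oob_timing_ns("COMRESET")[1]
    wake_idle = oob_timing_ns("COMWAKE")[1]
    if abs(idle_ns - wake_idle) < abs(idle_ns - reset_idle):
        return "COMWAKE"
    return "COMRESET/COMINIT"

for name in OOB_TIMING_UI:
    burst, idle = oob_timing_ns(name)
    print(f"{name}: burst {burst:.1f} ns, idle {idle:.1f} ns")
```

Because COMRESET and COMINIT share one definition, a receiver can only tell them apart by which side of the link it is on.
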
The OOB Sequence
The initialization state machine for the host follows this sequence to establish communications
with the disk. This sequence is illustrated in Figure 4.
First, a COMRESET is sent. The host then waits for a COMINIT from the device.
If no COMINIT is received, the host can send more COMRESETs until it receives one, and
assume that no device is connected until it does. After receiving COMINIT, the host is given
time to optionally calibrate its receiver and transmitter. For example, it may be necessary to
adjust signal parameters or termination impedances. The host then sends a COMWAKE to the
device, and expects the same in return. After this, the host waits to receive an ALIGN primitive
(an in-band signal which will be explained shortly).
Meanwhile, it sends a “dial-tone” to the device: an alternating pattern of 1's and 0's. This was
intended as a cost-saving feature, so that disks with cheap oscillators could instead use the dial-
tone as a reference clock for locking.

Figure 4: OOB Initialization Sequence


It is also at this stage that speed negotiation is performed. The device will send ALIGN
primitives at the fastest speed it supports, and wait for the host to acknowledge them. If it does
not receive an acknowledgment, then it tries the next lower speed, and so on until an
agreement is found. Alternatively, if the host supports faster speeds than the device, then
ALIGN primitives it receives will appear “stretched”; the host can then slow down to
accommodate. When the host receives valid ALIGN primitives, it sends ALIGNs back to
acknowledge. Both parties then send SYNC or other non-ALIGN primitives, and the link is
ready.
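
The host side of this sequence can be sketched as a small table-driven state machine. The state and event names below are hypothetical labels for the steps described above, not identifiers from the core itself, and only the COMINIT timeout mentioned in the text is modeled.

```python
from enum import Enum, auto

class HostState(Enum):
    SEND_COMRESET = auto()
    WAIT_COMINIT = auto()
    SEND_COMWAKE = auto()   # optional calibration would take place just before this
    WAIT_COMWAKE = auto()
    WAIT_ALIGN = auto()     # host transmits the dial-tone while waiting
    SEND_ALIGN = auto()     # host acknowledges by sending ALIGNs back
    LINK_READY = auto()

# (current state, event) -> next state; "sent" means the host finished transmitting
TRANSITIONS = {
    (HostState.SEND_COMRESET, "sent"): HostState.WAIT_COMINIT,
    (HostState.WAIT_COMINIT, "COMINIT"): HostState.SEND_COMWAKE,
    (HostState.WAIT_COMINIT, "timeout"): HostState.SEND_COMRESET,
    (HostState.SEND_COMWAKE, "sent"): HostState.WAIT_COMWAKE,
    (HostState.WAIT_COMWAKE, "COMWAKE"): HostState.WAIT_ALIGN,
    (HostState.WAIT_ALIGN, "ALIGN"): HostState.SEND_ALIGN,
    (HostState.SEND_ALIGN, "NON_ALIGN"): HostState.LINK_READY,
}

def step(state, event):
    """Advance one step; events with no matching transition leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)

def run(events, state=HostState.SEND_COMRESET):
    for event in events:
        state = step(state, event)
    return state

print(run(["sent", "COMINIT", "sent", "COMWAKE", "ALIGN", "NON_ALIGN"]))
```

A real implementation would also need timeouts in the later wait states and a path back to SEND_COMRESET when speed negotiation fails; those are omitted here.
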

8b/10b Encoding
The Physical Layer also handles encoding the data before sending it. The scheme used in
SATA is 8b/10b encoding, which is also used in PCI Express, USB 3.0, and many other high
speed protocols. 8b/10b Encoding has a number of properties that make it useful for this
purpose.
One primary function of 8b/10b encoding is clock recovery. Under this scheme, there are never
more than five ones or zeros in a row. In other words, there are many bit transitions in the data
stream. This allows the receiver to recover the clock using a PLL or by oversampling the data.
This is important for serial data, as otherwise a stream of 12 ones in a row, for example, could
be interpreted as 11 or 13 ones instead.
The encoding maps each byte to a 10-bit character instead of an 8-bit one.
Only 10-bit characters that have enough transitions are used. The scheme also tries to maintain
DC balance by preferring 10-bit patterns with an equal number of ones and zeros. However,
there are not enough of these to accommodate the 256 possible values of a byte, so
patterns with 6 zeros and 4 ones (or vice versa) are also used.
The encoder keeps track of the running disparity to maintain DC Balance. The running disparity
changes each time an uneven pattern is sent. For example, if a 10-bit character with 6 zeros and
4 ones was just sent, the running disparity is now negative.
The next character therefore must have positive disparity (4 zeros and 6 ones) or neutral disparity
(5 and 5). Thus, many of the bytes actually have two encodings—one positive and the other
negative. The current running disparity determines which encoded value to use. Running disparity
also acts as a means to detect transmission errors.

Figure 5: 8b/10b Encoding Example


In this encoding example, note that the number of ones and zeros in the output is different
depending on the current disparity. Both of the encodings correspond to the same data byte
(0x06, or D6.0 in the encoding table).
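
The running-disparity rule can be sketched in a few lines. The two bit strings below are the D6.0 encodings as they appear in the standard 8b/10b code tables, written in transmission order; the function names are hypothetical, and a full encoder would of course need the complete tables.

```python
def disparity(code):
    """Ones minus zeros in a 10-bit character given as a bit string."""
    return code.count("1") - code.count("0")

# Encodings of data byte 0x06 (D6.0), keyed by the running disparity
# in effect before the character is sent (-1 = negative, +1 = positive).
D6_0 = {-1: "0110011011", +1: "0110010100"}

def send(running_disparity, encodings):
    """Pick the encoding for the current running disparity and return (code, new disparity)."""
    code = encodings[running_disparity]
    d = disparity(code)
    if d == 0:
        new_rd = running_disparity      # balanced characters leave the disparity unchanged
    else:
        new_rd = 1 if d > 0 else -1     # unbalanced characters flip it
    return code, new_rd

rd = -1
for _ in range(4):
    code, rd = send(rd, D6_0)
    print(code, "-> running disparity", rd)
```

Sending the same byte repeatedly alternates between the two encodings, which keeps the long-term count of ones and zeros equal.
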

The Comma and the ALIGN primitive


In addition to the 256 valid data encodings for each byte (referred to as Dx.x symbols), there are
also special control symbols that can be sent. These do not correspond to a data byte and are
referred to as Kx.x symbols, or K characters. SATA uses two K characters: K28.3 and K28.5.
The first is used to distinguish link layer primitives, and the other is the comma character.
The comma is a special character that is used to determine byte alignment in the data stream.
We've already discussed how 8b/10b encoding provides enough transitions in the stream to
recover a clock (essentially providing bit alignment), but it would not be possible to decode the
actual characters being sent if it is not known where they begin and end.
The comma is special because it contains a bit sequence that appears nowhere else in the data
stream: two bits of one value followed by five bits of the opposite value (0011111 or 1100000,
depending on disparity). Thus, the receiver can detect this unique pattern and know that it is a
comma, and therefore find the 10-bit boundaries between the symbols.

Figure 6: Comma Alignment Example

In SATA, the comma is used as part of the ALIGN primitive. Link Layer primitives, which will be
discussed shortly, are 4 bytes long and always begin with a K character. The ALIGN is the only
one to contain the comma, K28.5. That is why it is used as part of the link initialization
procedure, so that byte boundaries can be determined before attempting to send data.
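
A software model of comma alignment might look like the sketch below. It assumes the comma sequences 0011111 and 1100000 from the standard K28.5 encodings (0011111010 and 1100000101) and treats the received stream as a string of bits; a hardware receiver would do the same search with a shift register and comparators. The function names are hypothetical.

```python
COMMAS = ("0011111", "1100000")   # comma sequences, one per running disparity

def find_symbol_offset(bits):
    """Return the bit offset (0-9) of the 10-bit symbol boundaries, or None if no comma is seen."""
    positions = [bits.find(comma) for comma in COMMAS]
    positions = [p for p in positions if p >= 0]
    return min(positions) % 10 if positions else None

def split_symbols(bits):
    """Cut the stream into 10-bit symbols once the boundary is known."""
    offset = find_symbol_offset(bits)
    if offset is None:
        return []
    return [bits[i:i + 10] for i in range(offset, len(bits) - 9, 10)]

# Three stray bits, then K28.5 followed by D10.2 (the start of an ALIGN primitive)
stream = "101" + "0011111010" + "0101010101"
print(find_symbol_offset(stream))
print(split_symbols(stream))
```

The comma occupies the first seven bits of K28.5, so the position where it is found is itself a symbol boundary.
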
The SATA protocol also specifies that at least two ALIGNs must be sent every 256 Dwords, and
they must be sent in pairs. This happens even when data is being sent. This ensures that the
byte boundary is not lost, and both the host and the disk must send these ALIGNs. It also acts
as a way to manage small frequency differences between the sender and receiver. For
example, if the sender's clock is running a bit faster than the receiver's, the receiver's buffer may
eventually overflow. Since ALIGNs are sent periodically and carry no data, they
can be dropped to prevent this from occurring.
Spread-Spectrum Clocking
To further reduce EMI, the SATA specification requires that a receiver be able to lock to a
bitstream that uses spread-spectrum clocking (SSC). SSC is a scheme wherein the line rate
does not stay constant, but varies slightly over time. This spreads the emissions over a wider
frequency range. The transceivers on the Virtex-4 are able to receive SSC signals, but do not
use SSC when transmitting.
Link Layer
The link layer is the next layer and is directly above the physical layer. This layer is responsible
for encapsulating data payloads and manages the protocol for sending and receiving them. A
data payload that is sent is called a Frame Information Structure (FIS). The link layer also
provides some other services for ensuring data integrity, handling flow control, and reducing
EMI.
The host and the disk each have their own transmit pair in a SATA cable, and theoretically data
could be sent in both directions simultaneously. However, this does not occur. Instead, the
receiver sends “backchannel” information to the sender that indicates the status of the transfer
in progress. For instance, if an error were to be detected mid-transmission, such as a disparity
error, the receiver could notify the sender of this.
The link layer uses a set of defined Link Layer Primitives to perform these functions. Each
primitive is one Dword (4 bytes) long and starts with the control character K28.3 (except for
ALIGN, which starts with K28.5 as discussed above). The following table lists most of the
defined primitives and their values in hexadecimal before encoding. The usage of these will be
discussed in more detail.

Primitive     Hex Representation
ALIGN         0x7B4A4ABC
SYNC          0xB5B5957C
X_RDY         0x5757B57C
R_RDY         0x4A4A957C
SOF           0x3737B57C
R_IP          0x5555B57C
HOLD          0xD5D5AA7C
HOLD_ACK      0x9595AA7C
EOF           0xD5D5B57C
WTRM          0x5858B57C
R_OK          0x3535B57C
R_ERR         0x5656B57C
CONT          0x9999AA7C

Table 3: Link Layer Primitives
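
The hexadecimal values in Table 3 can be broken down into their individual characters. The sketch below assumes that the least significant byte of each Dword is the first character of the primitive, which is consistent with every entry ending in 0x7C (K28.3) or 0xBC (K28.5). The helper names are hypothetical.

```python
PRIMITIVES = {
    0x7B4A4ABC: "ALIGN", 0xB5B5957C: "SYNC", 0x5757B57C: "X_RDY",
    0x4A4A957C: "R_RDY", 0x3737B57C: "SOF", 0x5555B57C: "R_IP",
    0xD5D5AA7C: "HOLD", 0x9595AA7C: "HOLD_ACK", 0xD5D5B57C: "EOF",
    0x5858B57C: "WTRM", 0x3535B57C: "R_OK", 0x5656B57C: "R_ERR",
    0x9999AA7C: "CONT",
}

def symbol_name(byte, is_control):
    """Name a byte in Dx.y / Kx.y notation: x is the low 5 bits, y is the high 3 bits."""
    prefix = "K" if is_control else "D"
    return f"{prefix}{byte & 0x1F}.{byte >> 5}"

def describe_primitive(dword):
    """List the four characters of a primitive; only the first is a control character."""
    first, *rest = dword.to_bytes(4, "little")
    return [symbol_name(first, True)] + [symbol_name(b, False) for b in rest]

for value, name in PRIMITIVES.items():
    print(f"{name:9s} {' '.join(describe_primitive(value))}")
```

Running this shows ALIGN as K28.5 D10.2 D10.2 D27.3, while every other primitive begins with K28.3.
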


ALIGN: This primitive, as discussed in the previous section, allows the receiver to determine the
byte boundaries in the data stream. A pair of them is sent at least every 256 Dwords regardless
of what state the link layer is in.
SYNC: SYNC is used to indicate that the line is idle. When frames are not being sent, both the
host and the disk will send this primitive. This primitive also has a special function called the
“SYNC Escape.” If the host sends a SYNC, the line is forced to go idle, terminating all current
transfers. The disk must respond with SYNC. This way, if the host needs to issue a soft reset, it
can do so.
X_RDY: This primitive indicates that there is data that is ready to be sent. It will be sent
repeatedly by the disk or host until it is acknowledged. If both parties are simultaneously
sending X_RDY, it is expected that the host will back down.
R_RDY: Indicates that the party is ready to receive a FIS. This primitive is used to acknowledge
X_RDY, or can be sent preemptively if a transfer is expected.
SOF: A primitive that signals the start of a FIS (Start of Frame). The next Dwords sent after this
are data.
R_IP: Receive In Progress. This is a backchannel primitive that is used by a receiver to indicate
that it is currently receiving the FIS.
HOLD: The HOLD primitive is used for flow control management, which will be discussed in
more detail shortly.
HOLD_ACK: Acknowledges a HOLD.
EOF: A primitive that signals the end of a FIS. No more data will be sent until a new FIS transfer
is started. It also indicates that the previous Dword was the CRC.
WTRM: Waiting for Termination. This is sent repeatedly after EOF by the sender of a FIS. It
indicates that the sender is waiting for acknowledgment of the frame.
R_OK: This primitive is sent by the receiver to indicate that the FIS was received correctly, and
that the CRC was correct.
R_ERR: This primitive indicates that there was an error with the reception of the FIS. Most
likely, the CRC was incorrect. However, it could also indicate a disparity error.
CONT: The CONT primitive is used to reduce EMI created by primitives. There are many times
where the same primitive is sent repeatedly, and this would cause certain frequencies to have
more EMI noise. The CONT primitive eliminates that problem by using pseudo-randomly-
generated garbage data. If a sender would send many repeated primitives, instead it can send
CONT. The receiver should then treat the CONT, and all the following random data, as if the
original primitive was still being sent. This continues until a new valid primitive is received (none
of the junk data are K characters). For example, a sender may send SYNC, SYNC, CONT,
XXXX, XXXX, …., X_RDY. The CONT indicates that the receiver should “pretend” that SYNCs
are still being sent, up until the next valid primitive (X_RDY). By using garbage data instead of
repeated primitives, the EMI is distributed across a broader spectrum.

Figure 7: Link Layer Primitives in Action


In this screen capture from the ChipScope debugging tool, we see that the host is sending
WTRM while the disk sends R_IP. The disk then sends CONT followed by some garbage data,
which should be treated as a continued R_IP.
PMREQ_P/PMREQ_S/PMACK/PMNAK: These primitives facilitate power management.
However, they are not implemented in this work nor are they necessary for correct SATA
operation. Thus, they will not be discussed further. For more information regarding these
primitives..
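
The CONT substitution described in the list above can be modeled on the receive side as follows. This is a simplified sketch: the stream is a list of already decoded tokens, primitive names are plain strings, and the special handling of ALIGNs inside a CONT run is left out.

```python
PRIMITIVE_NAMES = {
    "ALIGN", "SYNC", "X_RDY", "R_RDY", "SOF", "R_IP",
    "HOLD", "HOLD_ACK", "EOF", "WTRM", "R_OK", "R_ERR",
}

def expand_cont(stream):
    """Replace CONT and the junk Dwords after it with repeats of the preceding primitive."""
    out, last, continuing = [], None, False
    for token in stream:
        if token in PRIMITIVE_NAMES:
            last, continuing = token, False   # a real primitive ends any CONT run
            out.append(token)
        elif token == "CONT" and last is not None:
            continuing = True
            out.append(last)                  # CONT itself counts as a repeat
        elif continuing:
            out.append(last)                  # junk Dword stands in for the repeated primitive
        else:
            out.append(token)                 # ordinary Dword outside a CONT run
    return out

received = ["SYNC", "SYNC", "CONT", "XXXX", "XXXX", "X_RDY"]
print(expand_cont(received))
# ['SYNC', 'SYNC', 'SYNC', 'SYNC', 'SYNC', 'X_RDY']
```

The layers above the link layer never see CONT; they observe the same sequence they would have seen had the primitive been repeated.
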
A typical FIS transfer happens as follows. The sender indicates that they have data to send
using X_RDY. The sender then waits for R_RDY from the receiver. The sender then sends a
(single) SOF, followed by the data to be sent. When the receiver sees the SOF, it will switch
from sending R_RDY to R_IP. Once all of the data in the FIS has been sent, the CRC is sent,
followed by EOF. The sender then starts sending WTRM until it gets R_OK, R_ERR, or SYNC.
The latter two indicate an error, with SYNC meaning a protocol or unknown error. The receiver,
upon getting EOF, checks the CRC, which it knows is the previous Dword. It then replies either
R_OK or R_ERR. The sender acknowledges the R_OK or R_ERR by sending SYNC. The
receiver then also sends SYNC, the line has returned to idle, and the transfer is complete. This
process is illustrated in Figure 8.
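
The handshake can also be written out as two parallel columns, one per side of the link. The sketch below is a lockstep simplification with a hypothetical function name: it ignores cable latency, repeated primitives, ALIGN insertion, and flow control, and it treats payload Dwords and the CRC as opaque values.

```python
def fis_transfer(payload, crc, crc_ok=True):
    """Return a list of (sender, receiver) pairs, one per step of a single FIS transfer."""
    verdict = "R_OK" if crc_ok else "R_ERR"
    steps = [
        ("X_RDY", "SYNC"),     # sender requests the link while the receiver is still idle
        ("X_RDY", "R_RDY"),    # receiver signals that it is ready
        ("SOF", "R_RDY"),      # a single SOF opens the frame
    ]
    steps += [(dword, "R_IP") for dword in payload]   # reception in progress
    steps += [
        (crc, "R_IP"),         # the CRC is the last Dword before EOF
        ("EOF", "R_IP"),
        ("WTRM", verdict),     # sender waits; receiver reports the CRC check result
        ("SYNC", verdict),     # sender acknowledges the result
        ("SYNC", "SYNC"),      # both sides are idle again
    ]
    return steps

for sender, receiver in fis_transfer([0x00000046, 0xDEADBEEF], crc=0x12345678):
    print(f"{str(sender):>12}   {receiver}")
```

Passing `crc_ok=False` yields the R_ERR variant of the same exchange.
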
Flow Control
As stated before, HOLD and HOLD_ACK are the primitives used for flow control. They are used
in two situations to temporarily pause the transmission of data in the middle of a FIS. The first
situation is if the receiver's buffer is getting full and can't accept any more data. For example,
this could happen if a hard drive cannot write data as fast as the protocol allows. The receiver
would then change from sending R_IP to HOLD. The sender would then pause the sending of
data and respond with HOLD_ACK. When there is once again enough room in the buffer, the
receiver sends R_IP again, and the sender can resume sending the data.
The second situation occurs when the transmitter is waiting for more data to send.
In this case, the sender sends HOLD until it is ready to continue to send data. Once again,
HOLD_ACK is sent in reply.
Of course, these primitives do not travel down the cable instantaneously. There is a delay
between the time that a HOLD is sent and the time that the HOLD is received. But this could
lead to data loss if the sending party was not yet aware of the requested HOLD and continued to
send data. Thus, the protocol specifies a maximum delay, referred to as the maximum signal
latency or the HOLD latency. This latency includes not only the time on the wire, but also the
time to decode, interpret, and react to the HOLD.
The HOLD latency is specified as the time to send 20 Dwords. Thus, a receiver can send a
HOLD when there are 20 Dwords of space left in its buffer and no data will be lost. Before 20
more Dwords of data arrive, the sender will have switched to HOLD_ACK.
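
The 20-Dword rule translates into a simple threshold on the receive buffer. The sketch below checks the worst case under two stated assumptions: the receiver does not drain its buffer at all, and the sender delivers exactly 20 more Dwords after HOLD is asserted. The function names and the buffer capacity are hypothetical.

```python
HOLD_LATENCY_DWORDS = 20   # maximum Dwords that may still arrive after HOLD is sent

def receiver_response(free_dwords):
    """Backchannel primitive the receiver sends for a given amount of free buffer space."""
    return "HOLD" if free_dwords <= HOLD_LATENCY_DWORDS else "R_IP"

def peak_occupancy(capacity, incoming):
    """Worst-case buffer fill when up to `incoming` Dwords are offered and nothing is drained."""
    level = 0
    in_flight = None                     # Dwords still on their way once HOLD is asserted
    for _ in range(incoming):
        if in_flight is None and receiver_response(capacity - level) == "HOLD":
            in_flight = HOLD_LATENCY_DWORDS
        if in_flight is not None:
            if in_flight == 0:
                break                    # sender has switched to HOLD_ACK
            in_flight -= 1
        level += 1
    return level

print(peak_occupancy(capacity=64, incoming=1000))
```

With a 64-Dword buffer the fill level stops exactly at capacity, so asserting HOLD any later than 20 Dwords before full would risk overflow.
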
CRC
SATA uses a Cyclic Redundancy Check (CRC) on each and every FIS to ensure data integrity.
The CRC used is CRC-32, the same that is used for Ethernet and some other protocols. This
CRC can reliably detect up to two bit errors on data blocks as large as 2064 Dwords. Thus, the
CRC places a limit on the maximum size of a FIS. The limit is defined to be 2049 Dwords for
SATA.
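
A bit-serial model of a Dword-wide CRC-32 is sketched below. The generator polynomial 0x04C11DB7 is the one used by Ethernet; the initial value 0x52325032 and the most-significant-bit-first processing order are assumptions based on commonly cited descriptions of the SATA CRC and should be checked against the specification before being relied upon. The function name is hypothetical.

```python
CRC_POLY = 0x04C11DB7   # CRC-32 generator polynomial (as used by Ethernet)
CRC_SEED = 0x52325032   # initial value; an assumption, see the note above

def crc32_dwords(dwords, seed=CRC_SEED):
    """Compute a CRC-32 over a sequence of 32-bit Dwords, most significant bit first."""
    crc = seed
    for dword in dwords:
        crc ^= dword & 0xFFFFFFFF
        for _ in range(32):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ CRC_POLY) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc

frame = [0x00000046, 0x01234567, 0x89ABCDEF]
crc = crc32_dwords(frame)

# The receiver recomputes the CRC over the Dwords before EOF and compares.
corrupted = [frame[0], frame[1] ^ 0x00000100, frame[2]]
print(crc32_dwords(frame) == crc, crc32_dwords(corrupted) == crc)
```

In hardware the 32 inner iterations are unrolled into a single combinational XOR network so that one Dword is processed per clock cycle.
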
Scrambling
As stated at the start of the Link Layer section, one of the functions that the link layer performs
is EMI reduction. The CONT primitive does this for primitives, but it is also done for FISes. The
contents of a frame, including the CRC, are scrambled before being sent. To do this, the data is
XORed (a bitwise exclusive OR operation) with the output of a pseudo-random number
generator. Specifically, the PRNG used is a Galois Linear Feedback Shift Register (LFSR).
At the start of each frame, the scrambler is reset. The receiver, using the same Galois LFSR,
can then descramble the data by again XORing the data with the output of the scrambler.
Primitives are never scrambled, even those sent in the middle of a frame (such as HOLD and
ALIGN).
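
The XOR symmetry that makes descrambling possible can be demonstrated with a small Galois LFSR. The polynomial x^16 + x^15 + x^13 + x^4 + 1 and the seed 0xFFFF used below are assumptions not stated in this document, and the bit ordering of the generated Dwords is not claimed to match what appears on a real SATA link. The sketch only illustrates the technique.

```python
LFSR_TAPS = 0xA011   # x^15, x^13, x^4 and 1; the x^16 term is implicit in the shift
LFSR_SEED = 0xFFFF   # the scrambler is reset to this value at the start of each frame

def keystream(n_dwords, seed=LFSR_SEED):
    """Generate n_dwords pseudo-random Dwords from a 16-bit Galois LFSR."""
    state = seed
    words = []
    for _ in range(n_dwords):
        word = 0
        for _ in range(32):
            msb = (state >> 15) & 1
            state = (state << 1) & 0xFFFF
            if msb:
                state ^= LFSR_TAPS       # Galois form: feed the output bit back into the taps
            word = (word << 1) | msb
        words.append(word)
    return words

def scramble(dwords):
    """XOR each Dword with the keystream; applying it twice restores the input."""
    return [d ^ k for d, k in zip(dwords, keystream(len(dwords)))]

frame = [0x00000046, 0x00000000, 0x00000000, 0xFFFFFFFF]
on_the_wire = scramble(frame)
print([hex(d) for d in on_the_wire])
print(scramble(on_the_wire) == frame)
```

Because both ends restart the LFSR at every frame, no scrambler state has to be exchanged or kept between frames.
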
Transport Layer
The transport layer is responsible for constructing, delivering, and receiving Frame Information
Structures. It defines the format of each FIS and the valid sequences of FISes that can be
exchanged.
The first byte of each FIS defines the type. The second byte contains type-dependent control
fields. The following table lists some of the types of FISes that are defined, and the value of their
type field.

FIS Type                     Type Value
Register – Host to Device    0x27
Register – Device to Host    0x34
Data                         0x46
DMA Activate                 0x39
Table 4: FIS Types
A number of other FIS types are defined, but they are not implemented in this work. For more
details on other FIS types.
The Register FIS types are used to transfer the contents of the shadow registers to the device,
and the device registers back to the host. These registers mirror those used for PATA, and are
the means by which commands are triggered. Some of the relevant fields are the Command
field, which holds the PATA command to be executed; the addressing fields; and the sector
count fields. A sector is 512 bytes.
The Data FIS is a very simple FIS. After the type field, the remainder of the first Dword is
reserved. Following that is the actual data to be delivered. The maximum length of the data for a
single FIS is 8KB. This is to ensure that the CRC is capable of checking the data.
The DMA Activate FIS is sent by the device to indicate that it is ready to receive data. After a
write request has been made, the disk may need to prepare itself before it can receive data. For
example, it may need to flush its buffer or move the head to the correct location. It is a very
short FIS, consisting only of a single Dword. It contains the FIS type and the rest of the bits are
reserved.
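
The two simplest FIS layouts can be assembled as lists of Dwords. The sketch below assumes that the "first byte" of a FIS is the least significant byte of its first Dword, that data is packed little-endian, and that the payload is a whole number of Dwords; the function names are hypothetical.

```python
import struct

FIS_TYPE = {"REG_H2D": 0x27, "REG_D2H": 0x34, "DATA": 0x46, "DMA_ACTIVATE": 0x39}
MAX_DATA_BYTES = 8 * 1024   # maximum data length of a single Data FIS

def build_data_fis(data):
    """Return a Data FIS as a list of Dwords: one header Dword followed by the payload."""
    if not data or len(data) > MAX_DATA_BYTES or len(data) % 4:
        raise ValueError("payload must be 4 to 8192 bytes and a multiple of 4")
    header = FIS_TYPE["DATA"]            # type in the first byte, remaining bits reserved (zero)
    payload = struct.unpack(f"<{len(data) // 4}I", data)
    return [header, *payload]

def build_dma_activate_fis():
    """Return a DMA Activate FIS: a single Dword holding only the type."""
    return [FIS_TYPE["DMA_ACTIVATE"]]

def fis_type(fis):
    """Read the type field back from the first byte of a FIS."""
    return fis[0] & 0xFF

print(len(build_data_fis(bytes(MAX_DATA_BYTES))))
```

A full 8KB Data FIS comes to 2049 Dwords before the CRC is appended, which matches the frame size limit given in the CRC discussion.
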
For read and write operations, the sequence of valid FISes is fairly simple. To perform a read,
the host sends a Register – Host to Device (H2D Register) FIS to the disk with the PATA read
command in the Command field. It then waits to receive one or more Data FISes (depending on
the length of the operation) from the disk. After that, the device will send a D2H Register FIS to
indicate its status.
Write operations are fairly similar. The host again sends an H2D Register FIS to the disk, but
now with the PATA write command. It then awaits a DMA Activate FIS, indicating that the disk is
ready. It then sends a Data FIS. If the operation is larger than 8KB, the host must wait for a new
DMA Activate before sending each Data FIS. After the operation is complete, the device will
again send a D2H Register FIS with status information.
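
The two FIS sequences can be generated for any transfer length. The sketch below uses hypothetical function names and descriptive strings rather than real FIS contents; it only captures who sends which FIS and in what order.

```python
MAX_DATA_FIS_BYTES = 8 * 1024

def data_fis_sizes(n_bytes):
    """Split a transfer into the payload sizes of the individual Data FISes."""
    sizes = []
    while n_bytes > 0:
        sizes.append(min(n_bytes, MAX_DATA_FIS_BYTES))
        n_bytes -= sizes[-1]
    return sizes

def read_sequence(n_bytes):
    """FIS exchange for a read: command, one or more Data FISes, then status."""
    seq = [("host", "Register H2D (read command)")]
    seq += [("device", f"Data ({size} bytes)") for size in data_fis_sizes(n_bytes)]
    seq.append(("device", "Register D2H (status)"))
    return seq

def write_sequence(n_bytes):
    """FIS exchange for a write: each Data FIS must be preceded by a DMA Activate."""
    seq = [("host", "Register H2D (write command)")]
    for size in data_fis_sizes(n_bytes):
        seq.append(("device", "DMA Activate"))
        seq.append(("host", f"Data ({size} bytes)"))
    seq.append(("device", "Register D2H (status)"))
    return seq

for sender, fis in write_sequence(20000):
    print(f"{sender:6s} -> {fis}")
```

For a 20000-byte write this produces three DMA Activate and Data pairs (8192, 8192, and 3616 bytes) between the command and the final status FIS.
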

SATA Conclusion
Overall, SATA is well suited to the mass storage design. It allows for high-speed
storage compatible with almost any hard drive or SSD available on the market. It is also a
robust protocol, which makes it suitable for a wide range of environments. Each and every frame has a
CRC to protect against bit errors. Low-voltage differential signaling adds noise immunity and
decreases power consumed. There are also numerous methods employed to reduce EMI.

Resources
[1] “Serial ATA: Meeting Storage Needs Today and Tomorrow”. SATA-IO.
http://www.serialata.org/documents/SATA-Rev-30-Presentation.pdf, Jun 2009.
[2] K. Grimsrud and H. Smith, Serial ATA Storage Architecture and Applications.
Hillsboro, OR: Intel Press, 2003.
[3] D. Anderson, SATA Storage Technology. Colorado Springs, CO: Mindshare Press, 2007.
[4] “Serial ATA I/II Host Controller (SATA_HI).” ASICS World Services.
http://www.xilinx.com/publications/3rd_party/products/ASICSWS_SATA_H1.pdf, May 2008.

* Mr. Nikola Zlatanov spent over 20 years working in the Capital Semiconductor Equipment Industry. His work at Gasonics, Novellus,
Lam and KLA-Tencor involved progressing electrical engineering and management roles in disruptive technologies. Nikola received his
Undergraduate degree in Electrical Engineering and Computer Systems from Technical University, Sofia, Bulgaria and completed a
Graduate Program in Engineering Management at Santa Clara University. He is currently consulting for Fortune 500 companies as well
as Startup ventures in Silicon Valley, California.
