Interrupt Moderation Using Intel® GbE Controllers
April 2007
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications.
Legal Lines and Disclaimers
Intel may make changes to specifications and product descriptions at any time, without notice. Intel Corporation may have patents or pending patent applications, trademarks, copyrights, or other intellectual property rights that relate to the presented subject matter. The furnishing of documents and other materials and information does not provide any license, express or implied, by estoppel or otherwise, to any such patents, trademarks, copyrights, or other intellectual property rights. IMPORTANT - PLEASE READ BEFORE INSTALLING OR USING INTEL PRE-RELEASE PRODUCTS. Please review the terms at http://www.intel.com/netcomms/prerelease_terms.htm carefully before using any Intel pre-release product, including any evaluation, development or reference hardware and/or software product (collectively, Pre-Release Product). By using the Pre-Release Product, you indicate your acceptance of these terms, which constitute the agreement (the Agreement) between you and Intel Corporation (Intel). In the event that you do not agree with any of these terms and conditions, do not use or install the Pre-Release Product and promptly return it unused to Intel. Designers must not rely on the absence or characteristics of any features or instructions marked reserved or undefined. Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. See http://www.intel.com/products/processor_number for details. The GbE controllers described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. 
Hyper-Threading Technology requires a computer system with an Intel Pentium 4 processor supporting HT Technology and a HT Technology enabled chipset, BIOS and operating system. Performance will vary depending on the specific hardware and software you use. See http://www.intel.com/products/ht/Hyperthreading_more.htm for additional information. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature may be obtained by calling 1-800-548-4725 or by visiting Intel's website at http://www.intel.com. Intel and Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. *Other names and brands may be claimed as the property of others. Copyright 2007, Intel Corporation. All Rights Reserved.
Contents
1.0 Introduction
    1.1 Reference Documents
    1.2 Glossary of Terms
    1.3 Supported GbE Devices
    1.4 Supported Operating Systems and Drivers
2.0 Background
    2.1 Basic Interrupt Processing
    2.2 Interrupt Handling with Interrupt Moderation
    2.3 Trade-Offs Inherent with Interrupt Moderation
3.0 GbE Controller Interrupt Moderation Features
    3.1 Absolute Timers
    3.2 Packet Timers
    3.3 Combining the Timers
    3.4 Interrupt Throttling
4.0 Sample Configuration
    4.1 Absolute Timers
    4.2 Packet Timers
    4.3 Interrupt Throttle Timer
    4.4 Additional Tuning Considerations
5.0 Interrupt Moderation Algorithm
    5.1 I/O Patterns
    5.2 Default Settings/Interrupt Rates
    5.3 Manual Settings (Windows)
    5.4 Manual Settings (Linux)
    5.5 Linux ITR Throughput Comparisons
    5.6 Linux ITR Latency Comparisons
Revision History
April 2007 - Major edit all sections.
Sept 2003 - The document was modified to be applicable to Intel Gigabit Ethernet Controllers except the 82542, 82543, and 82544. In other words, this document applies to the 82540, 82545, 82546, 82541, and 82547.
May 2003 - Initial release.
1.0 Introduction
This application note describes how to use the interrupt moderation features of the Intel GbE Controllers.
1.1 Reference Documents
- PCIe* Family of Gigabit Ethernet Controllers Software Developer's Manual
- PCI/PCI-X Family of Gigabit Ethernet Controllers Software Developer's Manual
1.2 Glossary of Terms
Windows* Adaptive Mode - Dynamically adjusts the interrupt rate (3,000 to 20,000 interrupts per second) to the I/O patterns received over the wire.
Linux* Mode 3 - Dynamically adjusts the interrupt rate (4,000 to 20,000 interrupts per second) to the I/O patterns received over the wire.
Linux* Mode 1 - Dynamically adjusts the interrupt rate (4,000 to 70,000 interrupts per second) to the I/O patterns received over the wire.
Latency - The amount of time it takes for a packet of data to get from one designated point to another.
Throughput - The amount of digital data delivered over a network per unit of time, usually measured in bits per second (bps).
CPU Utilization - The amount of CPU time required to transfer data over a network.
1.3 Supported GbE Devices
PCIe* devices: 82571/82572, 82573, 631xESB/632xESB, 82575
PCI-X devices: 82540, 82541, 82545, 82546, 82547
Use these links to help identify an adapter or GbE device:
http://www.intel.com/design/network/products/ethernet/linecard_ec.htm
http://www.intel.com/network/connectivity/products/server_adapters.htm
http://www.intel.com/network/connectivity/products/desktop_adapters.htm
http://support.intel.com/support/network/adapter/pro100/21397.htm

1.4 Supported Operating Systems and Drivers
Windows*
Linux*: Red Hat, Novell SUSE (kernel versions 2.4.x and 2.6.x)
Linux* driver versions: 7.3.15 and higher
Note: Linux* drivers must be downloaded because they are not currently included in distributions (releases).
Windows* driver versions: 9.6.31.0 and higher
For the latest Intel network drivers for Linux*, go to: http://downloadfinder.intel.com/scripts-df/support_intel.asp
For the latest Intel network drivers for Windows*, go to: http://support.intel.com/support/network/sb/CS-006120.htm
2.0 Background
Interrupt moderation reduces host processor interrupts, thereby enabling technologies such as Gigabit EtherChannel* to deliver more of their 16-Gb/s bandwidth potential (8 Gb/s in each direction, full duplex). As a result, the performance gains from load balancing in adapter teaming become a smaller part of overall performance than the gains achieved through the increased server CPU headroom that interrupt moderation provides.
The GbE controller generates host processor interrupts to request cycles for packet processing. These interrupts must be controlled to achieve optimum throughput: too few interrupts increase latency, and too many unduly burden the server's processor. By bundling an appropriate number of packets before issuing an interrupt to the host (Figure 1), the GbE controller tunes the interrupt frequency to match traffic conditions while maintaining packet flow.
[Figure 1: Low-volume traffic versus high-volume traffic (hardware, interrupts, PS stack). As volume increases, multiple packets are bundled for processing on a single interrupt.]
2.1 Basic Interrupt Processing
At low traffic rates, interrupting the host for each packet event is acceptable because such interrupts occur relatively infrequently. However, as traffic rates increase, the system spends more and more time servicing these interrupts. The overhead of processing them begins to degrade overall system performance as the CPU spends the majority of its time scheduling and executing the interrupt handler. If the traffic rate continues to increase, the traffic might overrun the GbE controller, causing it to drop packets, or the system itself might become temporarily unusable.
In addition to the packet events described, the GbE controller might also interrupt after certain external events, such as a change in link status. Because these other events occur relatively infrequently, they do not usually degrade system performance in the same manner as packet events.
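To make the scaling argument concrete, the following Python sketch estimates the share of one CPU consumed by interrupt entry and exit alone. The per-interrupt cost (5 µs) is a hypothetical figure chosen for illustration, not a measured value for any Intel controller or operating system; 81,000 packets per second is the Gigabit Ethernet rate for full-size frames.

```python
# Hypothetical per-interrupt overhead in microseconds (illustrative assumption,
# not a measured value); real costs depend on the CPU, OS, and driver.
ASSUMED_COST_PER_INTERRUPT_US = 5.0


def interrupt_cpu_fraction(packets_per_sec: float,
                           packets_per_interrupt: int = 1,
                           cost_us: float = ASSUMED_COST_PER_INTERRUPT_US) -> float:
    """Return the fraction of one CPU spent on interrupt overhead alone."""
    interrupts_per_sec = packets_per_sec / packets_per_interrupt
    return interrupts_per_sec * cost_us / 1_000_000


# One interrupt per packet at roughly the full-size-frame line rate
per_packet = interrupt_cpu_fraction(81_000)
# The same traffic with 50 packets bundled into each interrupt
bundled = interrupt_cpu_fraction(81_000, packets_per_interrupt=50)
print(f"per-packet: {per_packet:.1%}, bundled: {bundled:.2%}")
```

Under this assumption the overhead grows linearly with the packet rate, and bundling divides it by the number of packets serviced per interrupt.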
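2.2 Interrupt Handling with Interrupt Moderation
2.3 Trade-Offs Inherent with Interrupt Moderation
3.0 GbE Controller Interrupt Moderation Features
3.1 Absolute Timers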
Absolute timers delay the assertion of an interrupt to enable the GbE controller to collect additional interrupt events before delivering them to software. The absolute timers are particularly useful in high traffic environments. The receive absolute timer starts to count down upon receipt of the first packet (after software has enabled interrupts). Subsequent packets, if any, do not alter the countdown. Once the timer reaches zero, the controller generates a new interrupt. In the 82540EM/EP, 82545EM/GM and 82546EB/GB controllers, software controls the receive absolute timer using the Receive Interrupt Absolute Delay Timer (RADV) register (offset 282Ch).
Figure 2. Receive Absolute Timer
Note: The arrival of new packets after the timer has started does not affect the countdown.
The transmit absolute timer starts to count down upon transmission of the first packet (after software has enabled interrupts). Subsequent packets, if any, do not alter the countdown. Once the timer reaches zero, the GbE controller generates a new interrupt. Software controls the transmit absolute timer using the Transmit Absolute Interrupt Delay Value (TADV) register (offset 382Ch).
The delay values are intended to be relatively large (several packet times in duration). This enables the GbE controller to transmit or receive multiple packets in succession before generating an interrupt. The drawback of absolute timers is that a single packet incurs the full countdown latency even when traffic rates are low. Therefore, the absolute timers do not perform well in low traffic situations.
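The countdown rules for an absolute timer can be modeled in a few lines. This Python sketch takes packet event times in microseconds and returns the times at which interrupts would fire. It assumes that software re-enables interrupts immediately when each interrupt fires, so the next packet starts a fresh countdown; it is a behavioral model of the description above, not of the RADV/TADV register encoding.

```python
from typing import List, Optional


def absolute_timer_interrupts(events_us: List[float], delay_us: float) -> List[float]:
    """Return interrupt times for an absolute (RADV/TADV-style) timer model.

    The first packet event after interrupts are enabled starts the countdown;
    later events do not alter it. Assumes interrupts are re-enabled as soon
    as each interrupt fires.
    """
    interrupts: List[float] = []
    deadline: Optional[float] = None
    for t in sorted(events_us):
        if deadline is not None and t >= deadline:
            interrupts.append(deadline)  # countdown expired before this event
            deadline = None
        if deadline is None:
            deadline = t + delay_us      # first event starts the countdown
    if deadline is not None:
        interrupts.append(deadline)      # the pending countdown eventually expires
    return interrupts


# A lone packet waits out the full countdown, even on an idle link.
print(absolute_timer_interrupts([0.0], 100.0))
# Three packets share one interrupt; a later packet starts a new countdown.
print(absolute_timer_interrupts([0, 10, 20, 150], 100))
```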
3.2 Packet Timers
Packet timers are inactivity timers: they trigger an interrupt when the link has been idle for the programmed delay interval. Software can use these timers to minimize packet latency in low traffic environments. The receive packet timer starts to count down upon receipt of a new packet. If the GbE controller receives another packet before the timer expires, it resets the timer to its original value and restarts the countdown. If the timer reaches zero, the GbE controller generates a new interrupt. Software controls the receive packet timer using the Receive Interrupt Packet Delay Timer (RDTR) register (offset 108h).
Figure 3. Receive Packet Timer
Note: The arrival of new packets after the timer has started causes the timer to restart.
The transmit packet timer begins to count down upon transmission of a new packet. If the GbE controller transmits another packet before the timer expires, it resets the timer to its original value and restarts the countdown. If the timer reaches zero, the GbE controller generates an interrupt. Software controls the transmit packet timer using the Transmit Interrupt Delay Value (TIDV) register (offset 440h).
Unlike absolute timer delays, packet timer delays are intended to be short (possibly two or three packet times in duration). This minimizes the latency suffered by each packet. The drawback of the packet timers is that they might be chained indefinitely: under a sustained load, the packet timer might not expire until the GbE controller has completely exhausted all of its resources. Therefore, the packet timers do not perform well in high traffic situations.
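The restart behavior can be sketched the same way. In this Python model (event times in microseconds; a simplification of the RDTR/TIDV behavior described above), every packet event pushes the deadline out, which shows both the short latency at low load and the indefinite chaining under sustained load.

```python
from typing import List, Optional


def packet_timer_interrupts(events_us: List[float], delay_us: float) -> List[float]:
    """Return interrupt times for a packet (inactivity) timer model.

    Every packet event resets the countdown to its original value, so the
    timer expires only after delay_us passes with no further packet events.
    """
    interrupts: List[float] = []
    deadline: Optional[float] = None
    for t in sorted(events_us):
        if deadline is not None and t >= deadline:
            interrupts.append(deadline)  # link stayed idle long enough
        deadline = t + delay_us          # each event restarts the countdown
    if deadline is not None:
        interrupts.append(deadline)
    return interrupts


# Light traffic: each burst gets an interrupt shortly after it ends.
print(packet_timer_interrupts([0, 10, 20, 150], 25))
# Sustained traffic (a packet every 12 us): the timer chains and fires only
# once, 25 us after the last of 100 packets.
print(packet_timer_interrupts([12 * i for i in range(100)], 25))
```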
3.3 Combining the Timers
Figure 4. Absolute and Packet Timers at High Traffic Rate
Note: In these situations, the absolute timer is the source of most device interrupts.
Under light loads or brief bursts of traffic, the packet timers are the primary source of interrupts. Figure 5 shows this process. In these situations, the packet timers determine the latency suffered by most packets. The packet timers also determine the minimum traffic rate required to trigger the absolute timer interrupts. For example, if the traffic rate is high enough to prevent the packet timer from ever expiring, then the GbE controller does not interrupt until the absolute timer has expired.
Figure 5. Absolute and Packet Timers at Low Traffic Rate
Note: In these situations, the packet timer is the source of most device interrupts.
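The following Python sketch combines a packet timer and an absolute timer so that the high- and low-traffic cases in Figures 4 and 5 can be reproduced numerically. It assumes that whichever timer expires first generates one interrupt that services all pending events and clears both timers; consult the controller's Software Developer's Manual for the exact hardware semantics.

```python
from typing import List, Optional


def combined_interrupts(events_us: List[float], packet_delay_us: float,
                        absolute_delay_us: float) -> List[float]:
    """Return interrupt times when a packet timer and an absolute timer run together.

    An interrupt fires when either timer expires. Modeling assumption: that
    interrupt services all pending events and clears both timers.
    """
    interrupts: List[float] = []
    abs_deadline: Optional[float] = None
    pkt_deadline: Optional[float] = None
    for t in sorted(events_us):
        if abs_deadline is not None and pkt_deadline is not None:
            fire = min(abs_deadline, pkt_deadline)
            if t >= fire:
                interrupts.append(fire)
                abs_deadline = pkt_deadline = None
        if abs_deadline is None:
            abs_deadline = t + absolute_delay_us  # first event starts the absolute timer
        pkt_deadline = t + packet_delay_us        # every event restarts the packet timer
    if abs_deadline is not None and pkt_deadline is not None:
        interrupts.append(min(abs_deadline, pkt_deadline))
    return interrupts


# High traffic rate (a packet every 12 us): the absolute timer fires repeatedly.
heavy = combined_interrupts([12 * i for i in range(100)], 25, 100)
# Low traffic rate: the packet timer fires shortly after each isolated packet.
light = combined_interrupts([0, 200, 400], 25, 100)
print(heavy[:3], light)
```

With a packet every 12 µs, interrupts arrive roughly every 108 µs (the 100 µs absolute delay plus the wait for the next packet to restart it), and only the final interrupt, after traffic stops, comes from the packet timer. With packets 200 µs apart, every interrupt comes 25 µs after its packet, from the packet timer.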
3.4 Interrupt Throttling
A few additional factors make determining optimal timer settings more challenging. First, while software can configure each pair of timers for the expected workload, the transmit and receive timers operate independently, so transmit interrupts can disrupt the idealized behavior of receive interrupt processing and vice versa. Second, Ethernet traffic is inherently unpredictable, exhibiting both large surges in traffic and periods of relative inactivity. These fluctuations trigger corresponding storms and lulls in the interrupt rate. Lastly, advanced GbE controller features such as TCP segmentation can introduce additional considerations by altering the usual timing of transmit events.
To limit these effects, the GbE controller also provides an interrupt throttling mechanism that places an upper bound on the GbE controller's interrupt rate. The throttling mechanism operates independently of all interrupt sources, and network events do not affect it. Software can use the throttle timer to limit the GbE controller to a maximum interrupt rate, regardless of the traffic rate or other external factors. Software controls the throttle timer using the Interrupt Throttling Rate (ITR) register (offset C4h).
Figure 6. Effect of the Interrupt Throttle on a GbE Controller's Interrupt Rate (interrupt rate versus time)
Note: The interrupt throttle places a ceiling on the number of interrupts per second that the GbE controller can generate.
Similar to the absolute and packet timers, the throttle timer is a simple countdown timer. The GbE controller blocks all interrupt sources until the throttle timer expires. If interrupt events are pending when the throttle timer expires, the GbE controller then generates an interrupt. When the countdown reaches zero, the throttle timer resets and restarts its countdown.
Figure 7. Interrupt Throttling Used with Packet Timer
Note: The throttle timer gates all other device interrupt sources. The throttle timer runs continuously, regardless of packet events.
Under light loads, the throttle timer adds, on average, half of its countdown delay to the latency of each packet. Under heavy loads, it is functionally equivalent to the absolute timers. Therefore, software typically uses the interrupt throttle as a means of restricting the GbE controller's overall interrupt rate rather than as a third interrupt-generation mechanism. For optimal performance, software might configure the throttle mechanism for a slightly higher-than-desired interrupt rate to reduce the per-packet latency previously described.
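The claim that the throttle adds about half its interval to each packet's latency under light load can be checked with a small Python model. It treats the throttle timer as free-running from time zero and releases every pending event at the next expiry; the events arrive evenly, one per microsecond. The helper `itr_register_value` is a hypothetical name; it assumes the ITR interval is counted in 256 ns units, as described in the PCI/PCI-X Family SDM and used by the Linux e1000 driver, so verify the unit for your controller.

```python
import math
from typing import List, Tuple


def throttled_interrupts(events_us: List[float],
                         interval_us: float) -> Tuple[List[float], List[float]]:
    """Model a free-running throttle timer that expires every interval_us.

    Interrupt events are held until the next expiry; one interrupt then
    delivers everything pending. Returns (interrupt times, added latency per event).
    """
    release = [math.ceil(t / interval_us) * interval_us for t in events_us]
    interrupts = sorted(set(release))
    added_latency = [r - t for r, t in zip(release, events_us)]
    return interrupts, added_latency


def itr_register_value(max_interrupts_per_sec: int) -> int:
    """Convert an interrupt-rate ceiling to an ITR interval, assuming the
    register counts in 256 ns units (check your controller's manual)."""
    return 1_000_000_000 // (max_interrupts_per_sec * 256)


interval_us = 1_000_000 / 8000           # 8,000 interrupts/s ceiling -> 125 us
events = [k + 0.5 for k in range(1000)]  # one interrupt event per microsecond for 1 ms
fires, latency = throttled_interrupts(events, interval_us)
print(len(fires), sum(latency) / len(latency), itr_register_value(8000))
```

With a 125 µs interval (8,000 interrupts per second), the 1,000 events collapse into 8 interrupts, and each event waits 62.5 µs on average, half the interval.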
4.0 Sample Configuration
This section provides GbE controller example settings for illustration purposes only. For optimal performance, the exact GbE controller configuration is best determined through actual experimentation.
Note: It is assumed that software is optimizing for full-size frames of 1538 bytes.
The following calculations use these facts:
- Gigabit Ethernet operates at 1.0 Gb/s, or 1,000,000,000 bits per second. At this speed, the time required to transmit or receive a single bit (the bit-time) is 1.0 ns.
- A full-size Ethernet frame consumes 1538 bytes (12,304 bits) of bandwidth:
    - 8-byte preamble and start-of-frame delimiter
    - 14-byte Ethernet header
    - 1500-byte payload
    - 4-byte FCS
    - 12-byte inter-packet gap
- The GbE controller can therefore transmit or receive a full-size frame every 12.3 µs, or approximately 81,000 packets per second.
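The frame arithmetic above can be reproduced with a short Python calculation; the variable names are illustrative, and the inputs are exactly the frame components listed.

```python
LINE_RATE_BPS = 1_000_000_000  # Gigabit Ethernet: 1.0 Gb/s

# Bytes of wire bandwidth consumed by one full-size frame
FRAME_PARTS_BYTES = {
    "preamble and start-of-frame delimiter": 8,
    "Ethernet header": 14,
    "payload": 1500,
    "FCS": 4,
    "inter-packet gap": 12,
}

frame_bytes = sum(FRAME_PARTS_BYTES.values())   # 1538 bytes
frame_bits = frame_bytes * 8                    # 12,304 bits
bit_time_ns = 1e9 / LINE_RATE_BPS               # 1.0 ns per bit
frame_time_us = frame_bits * bit_time_ns / 1000 # about 12.3 us per frame
packets_per_sec = LINE_RATE_BPS / frame_bits    # about 81,000 frames/s

print(frame_bytes, frame_bits, round(frame_time_us, 1), round(packets_per_sec))
```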
4.1 Absolute Timers
Configuring the absolute timers is typically a matter of determining the desired interrupt rate or the desired number of packets per interrupt. To receive approximately 3000 interrupts per second, software would configure the absolute timers to interrupt every 333 µs (1 s / 3000). Alternatively, to receive approximately 50 packets per interrupt, the GbE controller must interrupt approximately 1620 times per second (81,000 packets per second at 50 packets per interrupt). Software would then configure the absolute timers to interrupt every 617 µs.
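Both sizing approaches reduce to simple formulas. The following Python sketch reproduces the 333 µs and 617 µs figures; the function names are illustrative. The conversion to register ticks assumes that the delay timers count in 1.024 µs units, as the PCI/PCI-X Family SDM describes for these registers; treat that unit as an assumption to verify for your controller.

```python
PACKETS_PER_SEC = 81_000       # full-size frames per second at 1 Gb/s (rounded)
ASSUMED_TIMER_UNIT_US = 1.024  # assumed RADV/TADV tick; verify in the SDM


def delay_for_interrupt_rate(interrupts_per_sec: float) -> float:
    """Absolute-timer delay (us) that yields the requested interrupt rate."""
    return 1_000_000 / interrupts_per_sec


def delay_for_packets_per_interrupt(packets_per_interrupt: float,
                                    packets_per_sec: float = PACKETS_PER_SEC) -> float:
    """Absolute-timer delay (us) that bundles roughly N full-size frames per interrupt."""
    return delay_for_interrupt_rate(packets_per_sec / packets_per_interrupt)


def to_timer_ticks(delay_us: float) -> int:
    """Convert a delay in microseconds to register ticks (assumed 1.024 us each)."""
    return round(delay_us / ASSUMED_TIMER_UNIT_US)


for label, delay in [("3000 interrupts/s", delay_for_interrupt_rate(3000)),
                     ("50 packets/interrupt", delay_for_packets_per_interrupt(50))]:
    print(f"{label}: {delay:.0f} us -> {to_timer_ticks(delay)} ticks")
```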
4.2 Packet Timers
Testing has shown that values between 20 and 40 µs work well for the packet timers. Software might set the packet timers to expire after two full-length packet times, or approximately 25 µs. The packet timers would then expire when throughput falls below about 333 Mb/s (two unused packet times follow every packet arrival, so approximately one-third of the total bandwidth is in use). At higher levels of use, the packet timers would likely chain repeatedly until one of the absolute timers expired.
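The 333 Mb/s figure follows from the idealized reasoning in the parenthesis above. This Python sketch generalizes it to any packet-timer delay; the function name is hypothetical and, like the text, it assumes evenly spaced full-size frames and compares the delay with the idle gap between frames.

```python
FRAME_TIME_US = 1538 * 8 / 1000  # 12.304 us per full-size frame at 1 Gb/s


def packet_timer_threshold_mbps(delay_us: float,
                                frame_time_us: float = FRAME_TIME_US,
                                line_rate_mbps: float = 1000.0) -> float:
    """Throughput below which the packet timer expires between frames.

    Idealized model following the text: full-size frames are evenly spaced,
    and the timer expires when the idle gap between frames is at least delay_us.
    Each frame then occupies frame_time_us out of every (frame_time_us + delay_us).
    """
    utilization = frame_time_us / (frame_time_us + delay_us)
    return utilization * line_rate_mbps


# Two packet times of delay -> one-third of the link in use
print(round(packet_timer_threshold_mbps(2 * FRAME_TIME_US)))
# The 20 to 40 us range that testing found effective
for delay in (20, 25, 40):
    print(delay, "us ->", round(packet_timer_threshold_mbps(delay)), "Mb/s")
```

Longer packet-timer delays lower the throughput at which the packet timer still expires, leaving more of the load range to the absolute timers.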
4.3 Interrupt Throttle Timer
4.4 Additional Tuning Considerations
5.0 Interrupt Moderation Algorithm
5.1 I/O Patterns
Bulk - This category is for large I/O and high-throughput patterns. Throughput essentially reaches wire speed, with the average packet size close to full-size frames. This category allows for lower interrupt rates and reduced CPU utilization.
Bulk Intermediate (Windows* Only) - This category is for 2 KB and 4 KB I/O sizes with moderate throughput. It balances CPU utilization, latency, and throughput for medium-size packets.
Low Latency - This category is for small-packet performance and targets I/O sizes of 64 bytes to 1024 bytes. Acceptable throughput can be achieved in this category (about the same throughput as with ITR turned off).
Lowest Latency - This category is for small ping-pong type messages. Other I/O patterns can transition to this mode when the number of packets and the total number of bytes processed are small.
5.2 Default Settings/Interrupt Rates
Linux* (Mode 3)
In either of these two modes, the current device drivers provide dynamic transitions from bulk/intermediate to low latency (when throughput starts to drop) and to lowest latency (when the number of packets and the total number of bytes processed are small).
5.3 Manual Settings (Windows)
Note: To achieve the optimum interrupt rate for a specific I/O pattern, experimentation is required.
Note: Adaptive is the default setting and should be used when variable I/O patterns are likely to occur.
5.4 Manual Settings (Linux)
1. Use Mode 1 when low latency is required and variable I/O patterns are likely to occur.
For dual-port GbE controllers, each port can be set independently. For example, one port can be set to adaptive (Mode 3 or Mode 1) and the other to a fixed interrupt rate. When using multiple NICs with multiple ports, each NIC or NIC port can also be set independently. For example, a single-port NIC can be set to adaptive (Mode 3 or Mode 1) and other NICs/NIC ports can be set to fixed interrupt rates.
Note: For multiple network ports, ITR must be set for each port (with comma-separated values):
    insmod e1000.ko InterruptThrottleRate=1,1,1,1
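Because ITR is set per port as a comma-separated list, a small helper can keep mixed configurations readable. This Python sketch is hypothetical (the function names are illustrative, not part of the driver); it only formats the `insmod` command shown above and labels each value using the meanings this document gives (0 for ITR off, 1 and 3 for the adaptive modes, other values as fixed interrupt rates). The driver enforces its own valid range for fixed rates, so check its README.

```python
from typing import Sequence

# Meaning of InterruptThrottleRate values as this document uses them
ITR_MODES = {
    0: "off",
    1: "Mode 1 (adaptive, 4,000 to 70,000 interrupts/s)",
    3: "Mode 3 (adaptive, 4,000 to 20,000 interrupts/s)",
}


def describe_itr(value: int) -> str:
    """Describe one port's setting; other positive values are fixed rates."""
    if value in ITR_MODES:
        return ITR_MODES[value]
    if value > 0:
        return f"fixed, {value} interrupts/s"
    raise ValueError(f"invalid InterruptThrottleRate value: {value}")


def itr_insmod_command(per_port: Sequence[int], module: str = "e1000.ko") -> str:
    """Build an insmod command with one comma-separated ITR value per port."""
    for value in per_port:
        describe_itr(value)  # validate each entry before building the command
    return f"insmod {module} InterruptThrottleRate=" + ",".join(str(v) for v in per_port)


settings = [3, 3, 8000, 8000]  # two adaptive ports, two fixed-rate ports
print(itr_insmod_command(settings))
for port, value in enumerate(settings):
    print(f"port {port}: {describe_itr(value)}")
```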
5.5 Linux ITR Throughput Comparisons
[Chart: throughput (Mbps, 0 to 1000) and CPU % utilization (0% to 100%) versus data size (1 to 98307) for ITR=3, ITR=1, ITR=0, and ITR=8000, with polynomial trend lines for ITR 0 and ITR 3 CPU utilization]
5.6 Linux ITR Latency Comparisons
[Chart: latency (µs; axis ticks at 10, 100, and 1000) versus data size for ITR=3, ITR=1, ITR=0, and ITR=8000]