Intel® Virtualization Technology for Directed I/O
Architecture Specification
September 2008
Revision: 1.2
Order Number: D51397-004
Copyright © 2008, Intel Corporation. All Rights Reserved.
Legal Lines and Disclaimers
Intel and Itanium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
*Other names and brands may be claimed as the property of others.
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR
OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS
OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING
TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE,
MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for
use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications.
Intel may make changes to specifications and product descriptions at any time, without notice.
Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for
future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.
The information here is subject to change without notice. Do not finalize a design with this information. The products described in this document may
contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are
available on request.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an order number and are referenced in this document, or other Intel literature may be obtained by calling
1-800-548-4725 or by visiting Intel's website at http://www.intel.com.
This document contains information on products in the design phase of development.
64-bit computing on Intel architecture requires a computer system with a processor, chipset, BIOS, operating system, device drivers and applications
enabled for Intel® 64 architecture. Performance will vary depending on your hardware and software configurations. Consult with your system vendor for
more information.
Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, virtual machine monitor (VMM) and, for some
uses, certain platform software enabled for it. Functionality, performance or other benefits will vary depending on hardware and software configurations
and may require a BIOS update. Software applications may not be compatible with all operating systems. Please check with your application vendor.
Intel Corporation may have patents or pending patent applications, trademarks, copyrights, or other intellectual property rights that relate to the
presented subject matter. The furnishing of documents and other materials and information does not provide any license, express or implied, by estoppel
or otherwise, to any such patents, trademarks, copyrights, or other intellectual property rights.
Contents
1 Introduction
1.1 Audience
1.2 Glossary
1.3 References
2 Overview
2.1 Intel® Virtualization Technology Overview
2.2 VMM and Virtual Machines
2.3 Hardware Support for Processor Virtualization
2.4 I/O Virtualization
2.5 Intel® Virtualization Technology For Directed I/O Overview
2.5.1 Hardware Support for DMA Remapping
2.5.1.1 OS Usages of DMA Remapping
2.5.1.2 VMM Usages of DMA Remapping
2.5.1.3 DMA Remapping Usages by Guests
2.5.1.4 Interaction with Processor Virtualization
2.5.2 Hardware Support for Interrupt Remapping
2.5.2.1 Interrupt Isolation
2.5.2.2 Interrupt Migration
2.5.2.3 x2APIC Support
3 DMA Remapping
3.1 Domains and Address Translation
3.2 Remapping Hardware - Software View
3.3 Mapping Devices to Domains
3.3.1 Source Identifier
3.3.2 Root-Entry
3.3.3 Context-Entry
3.3.3.1 Context Caching
3.4 Address Translation
3.4.1 Multi-Level Page Table
3.4.2 Adjusted Guest Address Width (AGAW)
3.4.3 Multi-level Page Table Translation
3.4.4 I/O Translation Lookaside Buffer (IOTLB)
3.5 DMA Remapping Fault Conditions
3.5.1 Hardware Handling of Faulting DMA Requests
3.6 DMA Remapping - Usage Considerations
3.6.1 Identifying Origination of DMA Requests
3.6.1.1 Devices Behind PCI Express to PCI/PCI-X Bridges
3.6.1.2 Devices Behind Conventional PCI Bridges
3.6.1.3 Root-Complex Integrated Devices
3.6.1.4 PCI Express Devices Using Phantom Functions
3.6.2 Handling DMA Requests Crossing Page Boundaries
3.6.3 Handling of Zero-Length Reads
3.6.4 Handling DMA to Reserved System Memory
3.6.5 Root-Complex Peer to Peer Considerations
3.6.6 Handling of Isochronous DMA
4 Support For Device-IOTLBs
4.1 Hardware Handling of ATS
4.1.1 Handling of ATS Protocol Errors
4.1.2 Root Port Handling of ATS
4.1.3 Handling of ATS When Remapping Hardware Disabled
Figures
Figure 1-1. General Platform Topology
Figure 2-2. Example OS Usage of DMA Remapping
Figure 2-3. Example Virtualization Usage of DMA Remapping
Figure 2-4. Interaction Between I/O and Processor Virtualization
Figure 3-5. DMA Address Translation
Figure 3-6. Requester Identifier Format
Figure 3-7. Device to Domain Mapping Structures
Figure 3-8. Example Multi-level Page Table
Figure 3-9. Example Multi-level Page Table (with 2MB Super Pages)
Figure 5-10. Compatibility Format Interrupt Request
Figure 5-11. Remappable Format Interrupt Request
Figure 5-12. Interrupt Requests on Itanium™ Platforms
Figure 5-13. I/OxAPIC RTE Programming
Figure 5-14. MSI-X Programming
Figure 5-15. Remapping Hardware Interrupt Programming in Intel®64 xAPIC Mode
Figure 5-16. Remapping Hardware Interrupt Programming in Intel®64 x2APIC Mode
Figure 5-17. Remapping Hardware Interrupt Programming on Itanium™
Figure 6-18. Context Cache Invalidate Descriptor
Figure 6-19. IOTLB Invalidate Descriptor
Figure 6-20. Device-IOTLB Invalidate Descriptor
Figure 6-21. Interrupt Entry Cache Invalidate Descriptor
Figure 6-22. Invalidation Wait Descriptor
Figure 8-23. Hypothetical Platform Configuration
Figure 9-24. Root-Entry Format
Figure 9-25. Context-Entry Format
Figure 9-26. Page-Table-Entry Format
Figure 9-27. Fault-Record Format
Figure 9-28. Interrupt Remapping Table Entry Format
Figure 10-29. Version Register
Figure 10-30. Capability Register
Figure 10-31. Extended Capability Register
Figure 10-32. Global Command Register
Figure 10-33. Global Status Register
Figure 10-34. Root-Entry Table Address Register
Figure 10-35. Context Command Register
Figure 10-36. IOTLB Invalidate Register
Figure 10-37. Invalidate Address Register
Tables
Table 1. Glossary
Table 2. References
Table 3. Unsuccessful DMA Requests
Table 4. Unsuccessful Translation Requests
Table 5. Successful Translation Requests
Table 6. Unsuccessful Translated Requests
Table 7. Address Fields in Remappable Interrupt Request Format
Table 8. Data Fields in Remappable Interrupt Request Format
Table 9. Interrupt Remapping Fault Conditions
Table 10. Index Mask Programming
Revision History
1 Introduction
This document describes the Intel® Virtualization Technology for Directed I/O (“Intel® VT for Directed
I/O”); specifically, it describes the components supporting I/O virtualization as it applies to platforms
that use Intel® processors and core logic chipsets complying with Intel® platform specifications.
[Figure 1-1. General Platform Topology: processors on a system bus connect to a North Bridge containing the DMA-remapping hardware and attached DRAM; integrated devices, PCI Express devices, and a South Bridge with PCI, LPC, and legacy devices attach below the North Bridge.]
The following topics are not covered (or are covered in a limited context):
• Intel® Virtualization Technology for Intel®64 Architecture. For more information, refer to the
“Intel® 64 Architecture Software Developer's Manual, Volume 3B: System Programming Guide”.
• Intel® Virtualization Technology for Intel® Itanium® Architecture. For more information, refer to
the “Intel® Itanium® Architecture software developer's manuals”.
1.1 Audience
This document is aimed at hardware designers developing Intel platforms or core-logic components that provide hardware support for virtualization. The document is also expected to be used by operating system and virtual machine monitor (VMM) developers utilizing the I/O virtualization hardware functions.
1.2 Glossary
The document uses the terms listed in the following table.
Table 1. Glossary
Chipset / Root-Complex: Refers to one or more hardware components that connect processor complexes to the I/O and memory subsystems. The chipset may include a variety of integrated devices.
Context: A hardware representation of state that identifies a device and the domain to which the device is assigned.
Context Cache: Remapping hardware cache that stores device to domain mappings.
DMA Remapping: The act of translating the address in a DMA request to a host physical address (HPA).
Domain: A collection of physical, logical, or virtual resources that are allocated to work together. Used as a generic term for virtual machines, partitions, etc.
DVA (DMA Virtual Address): A virtual address in a DMA request. For certain virtualization usages of remapping, DVA can be the Guest Physical Address (GPA).
GAW (Guest Address Width): The DMA virtual addressability limit for a Guest partition.
GPA (Guest Physical Address): The view of physical memory from software running in a partition. GPA is used in this document as an example of DVA.
HAW (Host Address Width): The DMA physical addressability limit for a platform.
IEC (Interrupt Entry Cache): A translation cache in a remapping hardware unit that caches frequently used interrupt-remapping table entries.
Interrupt Remapping: The act of translating an interrupt request before it is delivered to the CPU complex.
MGAW (Maximum Guest Address Width): The maximum DMA virtual addressability supported by a remapping hardware implementation.
PDE Cache (Page Directory Entry cache): Address translation caches in a remapping hardware unit that cache page directory entries at the various page-directory levels. Also referred to as non-leaf caches in this document.
Source ID: A 16-bit identification number used to identify the source of a DMA or interrupt request. For PCI family devices this is the 'Requester ID', which consists of PCI Bus number, Device number, and Function number.
VMM (Virtual Machine Monitor): A software layer that controls virtualization. Also referred to as hypervisor in this document.
1.3 References
For related information, see the documents listed in the following table.
Table 2. References
Intel® 64 Architecture Software Developer's Manuals
http://developer.intel.com/products/processor/manuals/index.htm
Intel® 64 x2APIC Architecture Specification
http://developer.intel.com/products/processor/manuals/index.htm
PCI Express Single-Root I/O Virtualization and Sharing (SR-IOV) Specification, Revision 1.0
http://www.pcisig.com/specifications/iov
ACPI Specification
http://www.acpi.info/
2 Overview
This chapter provides a brief overview of Intel® VT, the virtualization software ecosystem it enables,
and hardware support offered for processor and I/O virtualization.
The VMM is a key component of the platform infrastructure in virtualization usages. Intel® VT can
improve the reliability and supportability of virtualization infrastructure software with programming
interfaces to virtualize processor hardware. It also provides a foundation for additional virtualization
support for other hardware components in the platform.
Intel® VT provides hardware support for processor virtualization. For Intel®64 processors, this
support consists of a set of virtual-machine extensions (VMX) that support virtualization of processor
hardware for multiple software environments by using virtual machines. An equivalent architecture is
defined for processor virtualization of the Intel® Itanium® architecture.
Depending on the usage requirements, a VMM may support any of the above models for I/O
virtualization. For example, I/O emulation may be best suited for virtualizing legacy devices. I/O
assignment may provide the best performance when hosting I/O-intensive workloads in a guest.
Using new software interfaces makes a trade-off between compatibility and performance, and device
I/O sharing provides more virtual devices than the number of physical devices in the platform.
DMA remapping provides hardware support for isolation of device accesses to memory, and enables
each device in the system to be assigned to a specific domain through a distinct set of I/O page
tables. When the device attempts to access system memory, the DMA-remapping hardware intercepts
the access and utilizes the I/O page tables to determine whether the access can be permitted; it also
determines the actual location to access. Frequently used I/O page table structures can be cached in
hardware. The DMA remapping can be configured independently for each device, or collectively across
multiple devices.
[Figure 2-2. Example OS Usage of DMA Remapping: device domains (Domain 1, Domain 2) isolated by the DMA-remapping hardware.]
[Figure 2-3. Example Virtualization Usage of DMA Remapping: virtual machines VM (0) through VM (n), each running applications, with their assigned devices isolated by the DMA-remapping hardware.]
In this model, the VMM restricts itself to enabling direct assignment of devices to their partitions.
Rather than invoking the VMM for all I/O requests from a partition, the VMM is invoked only when
guest software accesses protected resources (such as configuration accesses, interrupt management,
etc.) that impact system functionality and isolation.
To support direct assignment of I/O devices, a VMM must enforce isolation of DMA requests. I/O
devices can be assigned to domains, and the DMA-remapping hardware can be used to restrict DMA
from an I/O device to the physical memory presently owned by its domain. For domains that may be
relocated in physical memory, the DMA-remapping hardware can be programmed to perform the
necessary translation.
I/O device assignment allows other I/O sharing usages — for example, assigning an I/O device to an
I/O partition that provides I/O services to other user partitions. DMA-remapping hardware enables
virtualization software to choose the right combination of device assignment and software-based
methods for I/O virtualization.
To keep the physical remapping hardware consistent with a guest's view, the VMM must track guest updates to the remapping structures and perform invalidations of remapping hardware on the guest's behalf. Due to the non-restartability of faulting DMA
transactions (unlike CPU memory management virtualization), a VMM cannot perform lazy updates to
its shadow DMA-remapping structures. To keep the shadow structures consistent with the guest
structures, the VMM may expose virtual remapping hardware with eager pre-fetching behavior
(including caching of not-present entries) or use processor memory management mechanisms to
write-protect the guest DMA-remapping structures.
[Figure 2-4. Interaction Between I/O and Processor Virtualization: virtual machines access physical memory through logical processors (CPU memory virtualization) and through I/O devices (DMA remapping), with both paths under VMM control.]
The VMM manages processor requests to access physical memory via the processor’s memory
management hardware. DMA requests to access physical memory use DMA-remapping hardware.
Both processor memory management and DMA memory management are under the control of the
VMM.
The interrupt-remapping hardware may be utilized by a Virtual Machine Monitor (VMM) to improve the
isolation of external interrupt requests across domains. For example, the VMM may utilize the
interrupt-remapping hardware to distinguish interrupt requests from specific devices and route them
to the appropriate VMs to which the respective devices are assigned. The VMM may also utilize the
interrupt-remapping hardware to control the attributes of these interrupt requests (such as
destination CPU, interrupt vector, delivery mode etc.).
Another example usage is for the VMM to use the interrupt-remapping hardware to disambiguate
external interrupts from the VMM owned inter-processor interrupts (IPIs). Software may enforce this
by ensuring none of the remapped external interrupts have attributes (such as vector number) that
matches the attributes of the VMM IPIs.
Interrupt remapping enables x2APICs to support the expanded APIC addressability for external
interrupts without requiring hardware changes to interrupt sources (such as I/OxAPICs and MSI/MSI-
X devices).
3 DMA Remapping
This chapter describes the hardware architecture for DMA remapping. The architecture envisions
DMA-remapping hardware to be implemented in Root-Complex components, such as the memory
controller hub (MCH) or I/O hub (IOH).
The isolation property of a domain is achieved by blocking access to its physical memory from
resources not assigned to it. Multiple isolated domains are supported in a system by ensuring that all
I/O devices are assigned to some domain (possibly a null domain), and that they can only access the
physical resources allocated to their domain. The DMA remapping architecture facilitates flexible
assignment of I/O devices to an arbitrary number of domains. Each domain has a view of physical
address space that may be different than the host physical address space. DMA remapping treats the
address specified in a DMA request as a DMA-Virtual Address (DVA). Depending on the software
usage model, the DMA virtual address space may be the same as the Guest-Physical Address (GPA)
space of the domain to which the I/O device is assigned, or a purely virtual address space defined by
software. In either case, DMA remapping transforms the address in a DMA request issued by an I/O
device to its corresponding Host-Physical Address (HPA).
For simplicity, this document refers to an address in a DMA request as a GPA and the translated
address as an HPA.
Figure 3-5 illustrates DMA address translation. I/O devices 1 and 2 are assigned to domains 1 and 2,
respectively. The software responsible for creating and managing the domains allocates system
physical memory for both domains and sets up the DMA address translation function. GPAs in DMA
requests initiated by devices 1 & 2 are translated to appropriate HPAs by the DMA-remapping
hardware.
[Figure 3-5. DMA Address Translation: Device 1 (assigned to Domain 1) and Device 2 (assigned to Domain 2) issue DMA requests with DVA/GPA addresses; the DMA memory management hardware translates them to the distinct host physical memory regions allocated to each domain.]
The host platform may support one or more remapping hardware units. Each hardware unit supports
remapping DMA requests originating within its hardware scope. For example, a desktop platform may
expose a single remapping hardware unit that translates all DMA transactions at the memory
controller hub (MCH) component. A server platform with one or more core chipset components may
support independent translation hardware units in each component, each translating DMA requests
originating within its I/O hierarchy (such as a PCI Express root port). The architecture supports
configurations in which these hardware units may either share the same translation data structures
(in system memory) or use independent structures, depending on software programming.
The remapping hardware translates the address in a DMA request to host physical address (HPA)
before further hardware processing (such as address decoding, snooping of processor caches, and/or
forwarding to the memory controllers).
For hardware implementations supporting multiple PCI segment groups, the remapping architecture
requires hardware to expose independent remapping hardware units (at least one per PCI segment
group) for processing requests originating within the I/O hierarchy of each segment group.
For PCI Express devices, the source-id is the requester identifier in the PCI Express transaction layer
header. The requester identifier of a device, which is composed of its PCI Bus/Device/Function
number, is assigned by configuration software and uniquely identifies the hardware function that
initiated the request. Figure 3-6 illustrates the requester-id1 as defined by the PCI Express
Specification.
[Figure 3-6. Requester Identifier Format: Bus number in bits 15:8, Device number in bits 7:3, Function number in bits 2:0.]
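As a minimal illustration of decomposing a source-id per this layout (the helper names are not from the specification):

#include <stdint.h>

/* Decompose a 16-bit source-id (PCI requester-id).
 * Layout per Figure 3-6: Bus[15:8], Device[7:3], Function[2:0]. */
static inline uint8_t sid_bus(uint16_t sid)      { return (uint8_t)(sid >> 8); }
static inline uint8_t sid_device(uint16_t sid)   { return (sid >> 3) & 0x1f; }
static inline uint8_t sid_function(uint16_t sid) { return sid & 0x07; }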
The following sections describe the data structures for mapping I/O devices to domains.
3.3.2 Root-Entry
The root-entry functions as the top level structure to map devices on a specific PCI bus to their
respective domains. Each root-entry structure contains the following fields:
• Present flag: The present field is used by software to indicate to hardware whether the root-
entry is present and initialized. Software may Clear the present field for root entries
corresponding to bus numbers that are either not present in the platform, or don’t have any
downstream devices attached. If the present field of a root-entry used to process a DMA request
is Clear, the DMA request is blocked, resulting in a translation fault.
• Context-entry table pointer: The context-entry table pointer references the context-entry
table for devices on the bus identified by the root-entry. Section 3.3.3 describes context entries in
further detail.
Section 9.1 illustrates the root-entry format. The root entries are programmed through the root-entry
table. The location of the root-entry table in system memory is programmed through the Root-entry
Table Address register. The root-entry table is 4KB in size and accommodates 256 root entries to
cover the PCI bus number space (0-255). In the case of a PCI device, the bus number (upper 8-bits)
encoded in a DMA transaction’s source-id field is used to index into the root-entry structure.
Figure 3-7 illustrates how these tables are used to map devices to domains.
1. For PCI Express devices supporting Alternative Routing-ID Interpretation (ARI), bits traditionally
used for the Device Number field in the Requester-id are used instead to expand the Function
Number field.
[Figure 3-7. Device to Domain Mapping Structures: the Root-entry Table (root entries 0 through 255, indexed by bus number) points to per-bus Context-entry Tables (context entries 0 through 255, indexed by device and function number), which in turn reference the address translation structures for the assigned domains (for example, Domain A and Domain B).]
3.3.3 Context-Entry
A context-entry maps a specific I/O device on a bus to the domain to which it is assigned, and, in
turn, to the address translation structures for the domain. The context entries are programmed
through the memory-resident context-entry tables. Each root-entry in the root-entry table contains
the pointer to the context-entry table for the corresponding bus number. Each context-entry table
contains 256 entries, with each entry representing a unique PCI device function on the bus. For a PCI
device, the device and function numbers (lower 8-bits) of a source-id are used to index into the
context-entry table.
• Address Space Root: The address-space-root field provides the host physical address of the
address translation structure in memory to be used for address-translating DMA requests
processed through the context-entry.
• Fault Processing Disable Flag: The fault-processing-disable field enables software to
selectively disable recording and reporting of remapping faults detected for DMA requests
processed through the context-entry.
Section 9.2 illustrates the exact context-entry format. Multiple devices may be assigned to the same
domain by programming the context entries for the devices to reference the same translation
structures, and programming them with the same domain identifier.
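A hedged sketch of the two-level device-to-domain lookup described in this section; the structure layouts are simplified stand-ins for the architectural 128-bit entry formats in Chapter 9, and the fault-reason comments refer to Table 3:

#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the architectural entries (see Sections 9.1, 9.2). */
typedef struct {
    bool     present;
    uint64_t context_table_ptr;   /* HPA of the context-entry table for this bus */
} root_entry_t;

typedef struct {
    bool     present;
    uint64_t address_space_root;  /* HPA of the domain's page-table root (ASR) */
    uint16_t domain_id;
} context_entry_t;

/* Walk the root-entry and context-entry tables for a DMA request's source-id.
 * Returns NULL if either entry is not present (the request would fault). */
const context_entry_t *lookup_context(const root_entry_t root_table[256],
                                      uint16_t source_id)
{
    const root_entry_t *re = &root_table[source_id >> 8];      /* bus number   */
    if (!re->present)
        return NULL;                                            /* fault reason 1h */
    const context_entry_t *ct =
        (const context_entry_t *)(uintptr_t)re->context_table_ptr;
    const context_entry_t *ce = &ct[source_id & 0xff];          /* device:function */
    if (!ce->present)
        return NULL;                                            /* fault reason 2h */
    return ce;
}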
The architecture defines the following features for the multi-level page table structure:
• Super Pages
— The super-page field in page-table entries enables larger page allocations. When a page-table
entry with the super-page field set is encountered by hardware on a page-table walk, the
translated address is formed immediately by combining the page base address in the page-
table entry with the unused guest-physical address bits.
— The architecture defines super-pages of size 2^21, 2^30, 2^39 and 2^48 bytes. Implementations indicate
support for specific super-page sizes through the Capability register. Hardware
implementations may optionally support these super-page sizes.
• DMA Access Controls
— DMA access controls make it possible to control DMA accesses to specific regions within a
domain’s address space. These controls are defined through the Read and Write permission
fields.
— If hardware encounters a page-table entry with the Read field Clear as part of address-translating a DMA read request, the request is blocked1.
— If hardware encounters a page-table entry with the Write field Clear as part of address-translating a DMA write request, the request is blocked.
— If hardware encounters a page-table entry with either the Read or Write field Clear as part of address-translating an Atomic Operation (AtomicOp) request, the request is blocked.
1. Refer to Section 3.6.3 for handling of Zero-Length DMA read requests to present pages without read permissions.
Figure 3-8 shows a multi-level (3-level) page-table structure with 4KB page mappings and 4KB page
tables. Figure 3-9 shows a 2-level page-table structure with 2MB super pages.
[Figure 3-8. Example Multi-level Page Table: a 3-level walk of a DMA address (bits 63:39 validated to be 0s) in which the context-entry ASR points to the root 4KB page table; each level indexes with 9 address bits (shifted left by 3 to form a byte offset into the table), and the final level (SP = 0) maps a 4KB page using the 12-bit page offset.]
[Figure 3-9. Example Multi-level Page Table (with 2MB Super Pages): the same walk terminating one level early at an entry with SP = 1, which maps a 2MB page using the low 21 address bits.]
The AGAW indicates the number of levels of page-walk. Hardware implementations report the
supported AGAWs through the Capability register. Software must ensure that it uses an AGAW
supported by the underlying hardware implementation when setting up page tables for a domain.
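As a rough illustration of this relationship (each level of the walk translates 9 address bits above the 12-bit page offset, as in the figures above); the GAW-to-AGAW rounding shown is an assumption for illustration, and software must still select an AGAW reported as supported in the Capability register:

/* Each page-walk level translates 9 address bits above the 12-bit page
 * offset, so an AGAW of 39 implies a 3-level walk and 48 a 4-level walk. */
static inline unsigned agaw_to_levels(unsigned agaw)
{
    return (agaw - 12) / 9;
}

/* Illustrative adjustment (assumed rule): round a guest address width up so
 * that (AGAW - 12) is a multiple of 9. */
static inline unsigned gaw_to_agaw(unsigned gaw)
{
    return 12 + 9 * ((gaw - 12 + 8) / 9);
}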
2. If the address-width (AW) field programmed in the context-entry is not one of the AGAWs
supported by hardware (as reported through the SGAW field in the Capability register), the DMA
request is blocked.
3. The address of the DMA request is validated to be within the adjusted address width of the
domain to which the device is assigned. DMA requests attempting to access memory locations
above address (2^X - 1) are blocked, where X is the AGAW corresponding to the address-width
programmed in the context-entry used to process this DMA request.
4. The adjusted address of the DMA request is translated through the multi-level page table
referenced by the context-entry. Based on the programming of the page-table entries1 (Super-
page, Read, Write attributes), either the adjusted address is successfully translated to a Host
Physical Address (HPA), or the DMA request is blocked.
5. For successful address translations, hardware performs the normal processing (address decoding,
etc.) of the DMA request as if it was targeting the translated HPA.
1. Software must ensure the DMA-remapping page tables are programmed not to remap regular
DMA requests to the interrupt address range (0xFEEx_xxxx). Hardware behavior is undefined for
DMA requests remapped to the interrupt address range.
2. When inserting a leaf page-table entry into the IOTLB, hardware caches the Read (R) and Write
(W) attributes as the logical AND of all the respective R and W fields encountered in the page walk
reaching up to this leaf entry.
3. For PCI Express, DMA requests with untranslated (default) address are identified by the Address
Type (AT) field value of 00b in the Transaction Layer Packet (TLP) header.
4. For PCI Express, a DMA read completion with an error status may cause hardware to generate a
PCI Express un-correctable, non-fatal (ERR_NONFATAL) error message.
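The translation steps above can be summarized in the following rough sketch of the multi-level walk, assuming 4KB page tables with 9 bits translated per level; the types, field names, and pointer casts are illustrative only and not an implementation of the hardware:

#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     read, write, super_page;
    uint64_t addr;            /* next-level table HPA, or page base HPA */
} pte_t;

/* Illustrative page walk: 'asr' is the context-entry Address Space Root,
 * 'agaw' the programmed adjusted guest address width, 'gpa' the DMA address.
 * Returns true and fills *hpa on success; false means the request is blocked. */
bool translate(uint64_t asr, unsigned agaw, uint64_t gpa,
               bool is_write, uint64_t *hpa)
{
    /* Bounds check: addresses above (2^AGAW - 1) are blocked. */
    if (agaw < 64 && (gpa >> agaw) != 0)
        return false;

    bool cum_r = true, cum_w = true;            /* cumulative R/W as a logical AND */
    unsigned levels = (agaw - 12) / 9;
    uint64_t table = asr;

    for (unsigned level = levels; level > 0; level--) {
        unsigned shift = 12 + 9 * (level - 1);
        const pte_t *pte = &((const pte_t *)(uintptr_t)table)[(gpa >> shift) & 0x1ff];

        cum_r &= pte->read;
        cum_w &= pte->write;
        if (!pte->read && !pte->write)          /* not-present entry */
            return false;

        if (level == 1 || pte->super_page) {
            /* Leaf: combine the page base with the remaining (untranslated) bits. */
            if (is_write ? !cum_w : !cum_r)
                return false;
            uint64_t page_mask = (1ULL << shift) - 1;
            *hpa = (pte->addr & ~page_mask) | (gpa & page_mask);
            return true;
        }
        table = pte->addr;                      /* descend to the next level */
    }
    return false;
}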
Fault conditions1, fault reason encodings, and qualification (Table 3, excerpt):
• Fault reason 1h: The Present (P) field in the root-entry used to process the DMA request is Clear. (Not qualified.)
• Fault reason 2h: The Present (P) field in the context-entry used to process the DMA request is Clear.
• Fault reason 7h: Hardware attempt to access the next level page table through the Address (ADDR) field of the page-table entry resulted in error.
1. Non-recoverable error conditions encountered by the remapping hardware are not treated as DMA-remapping
fault conditions. These are treated as platform hardware errors and reported through existing error reporting
methods such as NMI, MCA etc.
2. Requests to addresses beyond the maximum guest address width (MGAW) supported by hardware may be
reported through other means such as through PCI Express Advanced Error Reporting (AER) at a PCI Express
root port.
For remapping requests from devices behind PCI Express-to-PCI/PCI-X bridges, software must
consider the possibility of requests arriving with the source-id in the original PCI-X transaction or the
source-id provided by the bridge. Devices behind these bridges can only be collectively assigned to a
single domain. When setting up DMA-remapping structures for these devices, software must program
multiple context entries, each corresponding to the possible set of source-ids. Each of these context-
entries must be programmed identically to ensure the DMA requests with any of these source-ids are
processed identically.
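As a hedged sketch of programming identical context-entries for every source-id such requests may carry (the entry type and write helper are illustrative placeholders, not architectural interfaces):

#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t lo, hi; } ce_image_t;     /* 128-bit entry image (illustrative) */
extern void write_context_entry(uint8_t bus, uint8_t devfn, const ce_image_t *value);

/* Program the same context-entry contents (same domain-id, ASR and address
 * width) for each possible source-id behind the bridge. */
void map_bridge_source_ids(const uint16_t *source_ids, size_t count,
                           const ce_image_t *shared_value)
{
    for (size_t i = 0; i < count; i++)
        write_context_entry((uint8_t)(source_ids[i] >> 8),    /* bus      */
                            (uint8_t)(source_ids[i] & 0xff),  /* dev:func */
                            shared_value);
}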
Since the function number is part of the requester-id used to locate the context-entry for processing a
DMA request, when assigning PCI Express devices with phantom functions enabled, software must
program multiple context entries, each corresponding to a phantom function number (PhFN) enabled for use by the device
function. Each of these context-entries must be programmed identically to ensure the DMA requests
with any of these requester-ids are processed identically.
Platforms supporting DMA remapping are expected to check for violations of this rule (DMA requests that cross page boundaries) in one of the
following ways:
• The platform hardware checks for violations and explicitly blocks them. For PCI Express memory
requests, this may be implemented by hardware that checks for the condition at the PCI Express
receivers and handles violations as PCI Express errors. DMA requests from other devices (such as
Root-Complex integrated devices) that violate the rule (and hence are blocked by hardware) may
be handled in platform-specific ways. In this model, the remapping hardware units never receive
DMA requests that cross page boundaries.
• If the platform hardware cannot check for violations, the remapping hardware units must perform
these checks and re-map the requests as if they were multiple independent DMA requests.
DMA remapping hardware implementations are recommended to report the ZLR (Zero-Length Read) field as Set and support
the associated hardware behavior.
Platform implementations supporting reserved memory must carefully consider the system software
and security implications of its usages. These usages are beyond the scope of this specification.
Platform hardware may use implementation-specific methods to distinguish accesses to system
reserved memory. These methods must not depend on simple address-based decoding since DMA
virtual addresses can indeed overlap with the host physical addresses of reserved system memory.
For platforms that cannot distinguish between DMA to OS-visible system memory and DMA to
reserved system memory, the architecture defines a standard reporting method to inform system
software about the reserved system memory address ranges and the specific devices that require
DMA access to these ranges for proper operation. Refer to Section 8.4 for details on the reporting of
reserved memory regions.
For legacy compatibility, system software is expected to set up identity mapping (with read and write
privileges) for these reserved address ranges, for the specified devices1. For these devices, the
system software is also responsible for ensuring that the DMA virtual addresses generated by the
system software do not overlap with the reserved system memory address ranges.
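A hedged sketch of such an identity mapping using 4KB mappings; the page-table helper is an illustrative placeholder, not an interface defined by this specification:

#include <stdbool.h>
#include <stdint.h>

/* Illustrative helper (not defined by this specification): installs one 4KB
 * leaf mapping (GPA -> HPA with the given permissions) in a domain's
 * I/O page tables rooted at 'domain_asr'. */
extern void map_4k_page(uint64_t domain_asr, uint64_t gpa, uint64_t hpa,
                        bool read, bool write);

/* Identity-map a reserved system memory range (GPA == HPA, read + write)
 * for the domain of a device that requires DMA to that range. */
void identity_map_reserved_range(uint64_t domain_asr,
                                 uint64_t base, uint64_t limit)
{
    for (uint64_t addr = base & ~0xfffULL; addr <= limit; addr += 0x1000)
        map_4k_page(domain_asr, addr, addr, true, true);
}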
There may be other classes of differentiated DMA streams in the platform to support future usages.
Different classes of isochrony are typically implemented through traffic classification, Virtual Channels
(VC), and priority mechanisms. DMA remapping of critical isochronous traffic may require special
handling since the remapping process potentially adds additional latency to the isochronous paths.
Hardware recommendations and software requirements for efficient remapping of isochronous traffic
are described below. The classes of isochrony that need to be subject to these requirements are
platform-specific.
1. USB controllers and UMA integrated graphics devices are the only legacy device usages identified
that depend on DMA to reserved system memory.
• Root-Complex hardware may utilize dedicated resources to support remapping of DMA accesses
from isochronous devices. This may occur through dedicated remapping hardware units for
isochronous devices, or through reservation of hardware resources (such as entries in various
remapping caching structures) for remapping isochronous DMA.
• DMA-remapping units supporting isochronous traffic may pre-fetch and cache valid address
translations. Under normal operation, to maintain isochronous guarantees, software should avoid
invalidating mappings that are in-use by isochronous DMA requests active at the time of
invalidation. The Capability register of each DMA-remapping unit indicates to software if the
remapping unit manages critical isochronous DMA.
The DMA remapping architecture described in the previous chapter supports address translation of DMA
requests received by the Root-Complex. Section 3.4.4 describes the use of IOTLBs in these core logic
components to cache frequently used I/O page-table entries to improve the DMA address translation
performance. IOTLBs improve DMA remapping performance by avoiding the multiple memory
accesses required to access the I/O page-tables for DMA address translation. However, the efficiency
of IOTLBs in the core logic may depend on the address locality in DMA streams and the number of
simultaneously active DMA streams in the platform.
One approach to scaling IOTLBs is to enable I/O devices to participate in the DMA remapping with
IOTLBs implemented at the devices. The Device-IOTLBs alleviate pressure for IOTLB resources in the
core logic, and provide opportunities for devices to improve performance by pre-fetching address
translations before issuing DMA requests. This may be useful for devices with strict DMA latency
requirements (such as isochronous devices), and for devices that have large DMA working set or
multiple active DMA streams. Additionally, Device-IOTLBs may be utilized by devices to support
device-specific I/O page faults.
Refer to the Address Translation Services (ATS) specification from the PCI-SIG for extensions to PCI
Express to support Device-IOTLBs. DMA-remapping implementations report Device-IOTLB support
through the Extended Capability register.
— First and last DWORD byte enable (BE) fields not equal to 1111b.
• ATS ‘Invalidation Request’ message.
1. ATS specification requires Completer Abort (CA) to be returned for unsuccessful translation requests due
to remapping hardware error. Errors due to improper programming of remapping structures are treated
similarly and return CA.
Table 5. Successful Translation Requests (excerpt; condition followed by completion status):

• Condition: Hardware detected that the address in the translation request is to the interrupt address range (0xFEEx_xxxx). The special handling of the interrupt address range is to comprehend potential endpoint device ATS behavior of issuing translation requests for all of its memory transactions, including its message signalled interrupt (MSI) posted writes.
Completion: Success (R=0, W=1, U=1, S=0 in the translation completion data entry).

• Condition: Hardware could not find a translation for the address specified in the translation request, or the requested translation lacked both read and write permissions. This includes the following cases:
- The translation request had an address beyond (2^X - 1), where X is the minimum of the maximum guest address width (MGAW) reported through the Capability register and the value in the address-width (AW) field of the context-entry used to process the request.
- Hardware found a not-present (R=W=0b) page-directory (non-leaf) entry along the page-walk for the address specified in the translation request, and hence could not complete the page-walk.
- Hardware found the cumulative R and cumulative W bits to be both Clear along the page-walk on behalf of the translation request.
Completion: Success (R=W=U=S=0 in the translation completion data entry).

• Condition: Hardware successfully fetched the effective translation requested and it has at least one of Read and Write permissions.
- The R and W bits in the translation completion data entry are the effective (cumulative) permissions for this translation.
- The N and U bits are derived from the SNP and TM bits of the leaf page-table entry for the translation.
- Translations corresponding to 4KB pages indicate the S bit in the translation completion data entry as 0. Translations corresponding to super-pages indicate the S bit as Set, with low-order bits of the translated address field encoded to indicate the appropriate super-page size. See the ATS specification for details on the encoding.
- If the U field is Clear, the address in the translation completion data entry is the translated address. If the U field is Set, the address in the translation completion data entry is not specified. The R and W fields returned in the translation completion data entry are not affected by the value of the U field.
Completion: Success (with translation completion data entries per the ATS specification).
• Translated writes to the interrupt address range (0xFEEx_xxxx) are treated as UR. Translated
(and untranslated) reads to the interrupt address range always return UR.
• Table 6 illustrates the hardware handling of various error conditions leading to a failed Translated
request. In these cases the request is blocked (handled as UR) and treated as a remapping fault. A
fault condition is considered 'qualified' if it is reported to software only when the Fault Processing
Disable (FPD) field in the context-entry used to process the faulting request is Clear.
• When none of the error conditions in Table 6 is detected, the Translated request is processed as
pass-through (i.e. bypasses address translation).
• Fault reason 8h: Hardware attempt to access the root-entry table through the Root Table Address (RTA) field in the Root-entry Table Address register resulted in error. Behavior: Unsupported Request (UR).
• Hardware starts an invalidation completion timer for this ITag, and issues the invalidation request
message to the specified endpoint. The invalidation completion time-out value is recommended to
be sufficiently larger than the PCI Express read completion time-outs.
5 Interrupt Remapping
This chapter discuss architecture and hardware details for interrupt remapping. The interrupt-
remapping architecture defined in this chapter is common for both Intel®64 and ItaniumTM
architectures.
5.1 Overview
The interrupt-remapping architecture enables system software to control and censor external
interrupt requests generated by all sources including those from interrupt controllers (I/OxAPICs),
MSI/MSI-X capable devices including endpoints, root-ports and Root-Complex integrated end-points.
Interrupts generated by the remapping hardware itself (Fault Event and Invalidation Completion
Events) are not subject to interrupt remapping.
Interrupt requests appear to the Root-Complex as upstream memory write requests to the interrupt-
address-range 0xFEEX_XXXXh. Since interrupt requests arrive at the Root-Complex as write requests,
interrupt-remapping is co-located with the DMA-remapping hardware units. The interrupt-remapping
capability is reported through the Extended Capability register.
hardware does not isolate message-signaled interrupt requests from individual devices behind
such bridges.
• Legacy pin interrupts
— For devices that use legacy methods for interrupt routing (such as either through direct wiring
to the I/OxAPIC input pins, or through INTx messages), the I/OxAPIC hardware generates the
interrupt-request transaction. To identify the source of interrupt requests generated by
I/OxAPICs, the interrupt-remapping hardware requires each I/OxAPIC in the platform
(enumerated through the ACPI Multiple APIC Descriptor Tables (MADT)) to include a unique
16-bit source-id in its requests. BIOS reports the source-id for these I/OxAPICs via ACPI
structures to system software. Refer to Section 8.3.1.1 for more details on I/OxAPIC identity
reporting.
• Other Message Signaled Interrupts
— For any other platform devices that are not PCI discoverable and yet capable of generating
message-signaled interrupt requests (such as the integrated High Precision Event Timer -
HPET devices), the platform must assign unique source-ids that do not conflict with any other
source-ids on the platform. BIOS must report the 16-bit source-id for these via ACPI
structures described in Section 8.3.1.2.
[Figure 5-10. Compatibility Format Interrupt Request: address bits 31:20 = FEEh, followed by the Destination ID, Reserved bits, Interrupt Format (0b), Redirection Hint, Destination Mode, and don't-care bits; data fields include Trigger Mode, Trigger Mode Level, Delivery Mode, and Vector.]
[Figure 5-11. Remappable Format Interrupt Request: address bits 31:20 = FEEh, Handle[14:0] in bits 19:5, Interrupt Format (1b) in bit 4, SHV in bit 3, Handle[15] in bit 2, don't-care bits 1:0; data bits 31:16 are 0 and bits 15:0 carry the Subhandle.]
Table 7 describes the various address fields in the Remappable interrupt request format.
Bits 31:20 (Interrupt Identifier): DWORD DMA write requests with a value of FEEh in these bits are decoded as interrupt requests by the Root-Complex.
Bits 19:5 (Handle[14:0]): This field, along with bit 2, provides a 16-bit Handle. The Handle is used by interrupt-remapping hardware to identify the interrupt request. The 16-bit Handle provides 64K unique interrupt requests per interrupt-remapping hardware unit.
Bit 4 (Interrupt Format): This field must have a value of 1b for Remappable format interrupts.
Bit 2 (Handle[15]): This field carries the most significant bit of the 16-bit Handle.
Table 8 describes the various data fields in the Remappable interrupt request format.
Bits 31:16 (Reserved): When the SHV field in the interrupt request address is Set, this field is treated as reserved (0) by hardware. When the SHV field is Clear, this field is ignored by hardware.
Bits 15:0 (Subhandle): When the SHV field in the interrupt request address is Set, this field contains the 16-bit Subhandle. When the SHV field is Clear, this field is ignored by hardware.
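As a hedged illustration of extracting these fields from the request address and data per Tables 7 and 8 and Figure 5-11 (the helper names are not part of the specification):

#include <stdbool.h>
#include <stdint.h>

/* Field extraction for a Remappable-format interrupt request
 * (address bits: 31:20 = FEEh, 19:5 = Handle[14:0], 4 = Interrupt Format,
 *  3 = SHV, 2 = Handle[15]; data bits 15:0 = Subhandle when SHV is Set). */
static inline uint16_t msi_handle(uint32_t addr)
{
    return (uint16_t)(((addr >> 5) & 0x7fff) | (((addr >> 2) & 1u) << 15));
}
static inline bool msi_remappable(uint32_t addr)     { return (addr >> 4) & 1; }
static inline bool msi_shv(uint32_t addr)            { return (addr >> 3) & 1; }
static inline uint16_t msi_subhandle(uint32_t data)  { return (uint16_t)data; }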
For interrupt requests in Remappable format, the interrupt-remapping hardware computes the
‘interrupt_index’ as below. The Handle, SHV and Subhandle are respective fields from the interrupt
address and data per the Remappable interrupt format.
if (address.SHV == 0) {
    interrupt_index = address.handle;
} else {
    interrupt_index = (address.handle + data.subhandle);
}
The Interrupt Remap Table Address register is programmed by software to specify the number of
IRTEs in the Interrupt Remapping Table (maximum number of IRTEs in an Interrupt Remapping Table
is 64K). Remapping hardware units in the platform may be configured to share interrupt-remapping
table or use independent tables. The interrupt_index is used to index the appropriate IRTE in the
interrupt-remapping table. If the interrupt_index value computed is equal to or larger than the
number of IRTEs in the remapping table, hardware treats the interrupt request as an error.
Unlike the Compatibility interrupt format where all the interrupt attributes are encoded in the
interrupt request address/data, the Remappable interrupt format specifies only the fields needed to
compute the interrupt_index. The attributes of the remapped interrupt request are specified through
the IRTE referenced by the interrupt_index. The interrupt-remapping architecture defines support for
hardware to cache frequently used IRTEs for improved performance. For usages where software may
need to dynamically update the IRTE, architecture defines commands to invalidate the IEC. Chapter 6
describes the caching constructs and associated invalidation commands.
• If Extended Interrupt Mode is enabled (EIME field in Interrupt Remapping Table Address
register is Set), or if the Compatibility format interrupts are disabled (CFIS field in the
Global Status register is Clear), the Compatibility format interrupts are blocked.
• Else, Compatibility format interrupts are processed as pass-through (bypasses interrupt-
remapping).
— Interrupt requests in the Remappable format (i.e. request with Interrupt Format field Set) are
subject to interrupt-remapping as follows:
• The reserved fields in the Remappable interrupt requests are checked to be zero. If the
reserved field checking fails, the interrupt request is blocked. Else, the Source-id, Handle,
SHV, and Subhandle fields are retrieved from the interrupt request.
• Hardware computes the interrupt_index per the algorithm described in Section 5.3.2.1.
The computed interrupt_index is validated to be less than the interrupt-remapping table
size configured in the Interrupt Remap Table Address register. If the bounds check fails,
the interrupt request is blocked.
• If the above bounds check succeeds, the IRTE corresponding to the interrupt_index value
is either retrieved from the Interrupt Entry Cache, or fetched from the interrupt-
remapping table. If the Coherent (C) field is reported as Clear in the Extended Capability
register, the IRTE fetch from memory will not snoop the processor caches. If the Present
(P) field in the IRTE is Clear, the interrupt request is blocked and treated as fault. If the
IRTE is present, hardware checks if the IRTE is programmed correctly. If an invalid
programming of IRTE is detected, the interrupt request is blocked.
• If the above checks are successful, hardware performs verification of the interrupt
requester per the programming of the SVT, SID and SQ fields in the IRTE as described in
Section 9.5. If the source-id checking fails, the interrupt request is blocked.
• If all of the above checks succeed, a remapped interrupt request is generated per the
programming of the IRTE fields1.
• Any of the above checks that results in the interrupt request being blocked is treated as an interrupt-
remapping fault condition. The interrupt-remapping fault conditions are enumerated in the
following section.
1. When forwarding the remapped interrupt request to the system bus, the ‘Trigger Mode Level’ field
in the interrupt request on the system bus is always set to “asserted” (1b).
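The checks above can be outlined in the following hedged sketch; the IRTE type and helpers are illustrative placeholders, and reserved-field checks, IRTE programming validation, and fault recording are omitted:

#include <stdbool.h>
#include <stdint.h>

/* Illustrative IRTE stand-in; the architectural format also carries the
 * vector, destination, delivery attributes, and SVT/SID/SQ fields. */
typedef struct { bool present; /* ... */ } irte_t;

extern bool source_id_check(const irte_t *irte, uint16_t source_id);
extern void generate_remapped_interrupt(const irte_t *irte);

/* Outline of Remappable-format interrupt processing; returns false when the
 * request is blocked (an interrupt-remapping fault). */
bool remap_interrupt(uint16_t handle, bool shv, uint16_t subhandle,
                     uint16_t source_id, const irte_t *irt, uint32_t irt_size)
{
    uint32_t interrupt_index = shv ? (uint32_t)handle + subhandle
                                   : (uint32_t)handle;
    if (interrupt_index >= irt_size)              /* bounds check vs. table size */
        return false;
    const irte_t *irte = &irt[interrupt_index];   /* via IEC or memory fetch */
    if (!irte->present)                           /* fault reason 22h */
        return false;
    if (!source_id_check(irte, source_id))        /* SVT/SID/SQ verification */
        return false;
    generate_remapped_interrupt(irte);            /* attributes come from the IRTE */
    return true;
}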
• Fault reason 22h: The Present (P) field in the IRTE corresponding to the interrupt_index of the interrupt request is Clear. Qualified: Yes.
[Figure 5-12. Interrupt Requests on Itanium™ Platforms: address fields FEEh, Destination ID, Extended Destination ID, Redirection Hint, Destination Mode, and don't-care bits; data fields Vector, Delivery Mode, Trigger Mode, and Trigger Mode Level.]
• When interrupt-remapping hardware is enabled (IRES field Set in Global Status register), all
interrupt requests are remapped per the Remappable interrupt request format described in
Section 5.3.2.
The following sub-sections describe example programming for I/OxAPIC, MSI and MSI-X interrupt
sources to generate interrupts per the Remappable interrupt request format.
[Figure 5-13. I/OxAPIC RTE Programming: Interrupt_Index[14:0] in bits 63:49, Interrupt Format (1b) in bit 48, Reserved (0) in bits 47:17, Mask in bit 16, Trigger Mode in bit 15, Remote IRR in bit 14, Interrupt Polarity in bit 13, Delivery Status in bit 12, Interrupt_Index[15] in bit 11, 000b in bits 10:8, and Vector in bits 7:0.]
• The Interrupt_Index[14:0] is programmed in bits 63:49 of the I/OxAPIC RTE. The most
significant bit of the Interrupt_Index (Interrupt_Index[15]) is programmed in bit 11 of the
I/OxAPIC RTE.
• Bit 48 in the I/OxAPIC RTE is Set to indicate the Interrupt is in Remappable format.
• RTE bits 10:8 is programmed to 000b (Fixed) to force the SHV (SubHandle Valid) field as Clear in
the interrupt address generated.
• The Trigger Mode field (bit 15) in the I/OxAPIC RTE must match the Trigger Mode in the IRTE
referenced by the I/OxAPIC RTE. This is required for proper functioning of level-triggered
interrupts.
• For platforms using End-of-Interrupt (EOI) broadcasts, Vector field in the I/OxAPIC RTEs for level-
triggered interrupts (i.e. Trigger Mode field in I/OxAPIC RTE is Set, and Trigger Mode field in the
IRTE referenced by the I/OxAPIC RTE is Set), must match the Vector field programmed in the
referenced IRTE. This is required for proper processing of End-Of-Interrupt (EOI) broadcast by the
I/OxAPIC.
• Programming of all other fields in the I/OxAPIC RTE is not impacted by interrupt remapping.
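A hedged sketch of composing such an RTE value from an interrupt_index per the bullets above (the function is illustrative, not a defined interface):

#include <stdbool.h>
#include <stdint.h>

/* Compose a 64-bit I/OxAPIC RTE in Remappable format: index[14:0] in bits
 * 63:49, bit 48 = Interrupt Format (1b), index[15] in bit 11, delivery mode
 * bits 10:8 forced to 000b (Fixed) so the SHV field is generated as Clear. */
uint64_t ioapic_rte_remappable(uint16_t interrupt_index, uint8_t vector,
                               bool level_triggered, bool masked)
{
    uint64_t rte = 0;
    rte |= (uint64_t)(interrupt_index & 0x7fff) << 49;   /* Interrupt_Index[14:0] */
    rte |= 1ULL << 48;                                   /* Interrupt Format = 1b */
    rte |= (uint64_t)(masked ? 1 : 0) << 16;             /* Mask                  */
    rte |= (uint64_t)(level_triggered ? 1 : 0) << 15;    /* Trigger Mode          */
    rte |= (uint64_t)((interrupt_index >> 15) & 1) << 11;/* Interrupt_Index[15]   */
    rte |= vector;                                       /* Vector, bits 7:0      */
    return rte;
}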
[Figure 5-14. MSI-X Programming: address register bits 31:20 = FEEh, Interrupt_Index[14:0] in bits 19:5, Interrupt Format (1) in bit 4, SHV (1) in bit 3, Interrupt_Index[15] in bit 2; the upper address bits and the data register are programmed to 0.]
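A hedged sketch of the corresponding MSI/MSI-X address and data encoding, carrying the interrupt_index in the Handle with SHV Set and a Subhandle of 0 (the function is illustrative):

#include <stdint.h>

/* Encode an MSI/MSI-X address/data pair in Remappable format. */
void msi_remappable_encode(uint16_t interrupt_index,
                           uint32_t *address, uint32_t *data)
{
    *address = 0xfee00000u
             | ((uint32_t)(interrupt_index & 0x7fff) << 5)     /* Handle[14:0]     */
             | (1u << 4)                                       /* Interrupt Format */
             | (1u << 3)                                       /* SHV              */
             | ((uint32_t)((interrupt_index >> 15) & 1) << 2); /* Handle[15]       */
    *data = 0;                                                 /* Subhandle = 0    */
}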
[Figure 5-15. Remapping Hardware Interrupt Programming in Intel®64 xAPIC Mode: Data Register with Vector (bits 7:0) and Delivery Mode (0: Fixed, 1: Lowest Priority); Address Register with FEEh, Destination ID (APIC ID 7:0), Redirection Hint, and Destination Mode; Upper Address Register reserved (0).]
[Figure 5-16. Remapping Hardware Interrupt Programming in Intel®64 x2APIC Mode: Data Register with Vector and Delivery Mode; Address Register with FEEh, Destination ID (APIC ID 7:0), Redirection Hint, and Destination Mode; Upper Address Register carries the Destination ID (APIC ID 31:8).]
1. Hardware support for x2APIC mode is reported through the EIM field in the Extended Capability
Register. x2APIC mode is enabled through the Interrupt Remapping Table Address Register.
[Figure 5-17. Remapping Hardware Interrupt Programming on Itanium™: Data Register with Vector and Delivery Mode; Address Register with FEEh, Destination ID, Extended Destination ID, Redirection Hint, and Destination Mode; Upper Address Register reserved (0).]
Some existing platforms are known to use I/OxAPIC RTEs (Redirection Table Entries) to deliver SMI,
PMI and NMI events. There are at least two existing initialization approaches for such platform events
delivered through I/OxAPIC RTEs.
• Some existing platforms report to system software the I/OxAPIC RTEs connected to platform
event sources through ACPI, enabling system software to explicitly program/enable these RTEs.
Examples for this include, the 'NMI Source Reporting' structure in ACPI MADT (for reporting NMI
source), and 'Platform Interrupt Source' structure in ACPI MADT (for reporting PMI source in
Itanium™ platforms).
• Alternatively, some existing platforms program the I/OxAPIC RTEs connected to specific platform
event sources during BIOS initialization, and depend on system software to explicitly preserve
these RTEs in the BIOS initialized state. (For example, some platforms are known to program
specific I/OxAPIC RTE for SMI generation through BIOS before handing control to system
software, and depend on system software preserving the RTEs pre-programmed with SMI delivery
mode).
On platforms supporting interrupt-remapping, delivery of SMI, PMI and NMI events through I/OxAPIC
RTEs requires system software to program the respective RTEs so that they are properly remapped through the
Interrupt Remapping Table. To avoid this management burden on system software, platforms
supporting interrupt remapping are highly recommended to avoid delivering platform events through
I/OxAPIC RTEs, and instead deliver them through dedicated pins (such as the processor’s xAPIC
LINTn input) or through alternative platform-specific messages.
This chapter describes the architectural behavior associated with the following hardware caches and
the associated invalidation operations:
• Context-cache
• IOTLB and PDE (Page Directory Entry) Cache
• Interrupt Entry Cache (IEC)
• Device-IOTLB
For implementations reporting Caching Mode (CM) as Set in the Capability register, above conditions
may cause caching of the entry that resulted in the fault1.
Since information from present context-entries (such as the domain-id) may be utilized to tag the
IOTLB and the page-directory caches, software must invalidate the IOTLB (domain-selectively or
globally) after the context-cache invalidation is completed, to ensure the updates are visible to
hardware.
For implementations reporting Caching Mode (CM) as Clear in the Capability register, the IOTLB caches
only valid mappings (i.e. results of successful page-walks with effective translations that have at least
one of the cumulative Read and Write permissions from the page-walk Set). Specifically, if any
of the following conditions are encountered, the results are not cached in the IOTLB:
• Conditions listed in Section 6.1.1.
• Attempt to access the page directory/table through the ASR field in the context-entry or the
ADDR field of the previous page-directory entry in the page walk resulted in error.
• Read (R) and Write (W) fields of a page directory/table entry encountered in the page-walk are Clear
(not-present entry).
• Invalid programming of one or more fields in the present page directory/table entry.
• One or more non-zero reserved fields in the present page directory/table entry.
• The cumulative read and write permissions from the page-walk were both Clear (effectively a not-
present entry).
For implementations reporting Caching Mode (CM) as Set in the Capability register, above conditions
may cause caching of erroneous or not-present mappings in the IOTLB.
For implementations reporting Caching Mode (CM) as Clear in the Capability register, if any of the
following fault conditions are encountered as part of accessing a page-directory entry, the resulting
entry is not cached in the non-leaf caches.
• Conditions listed in Section 6.1.1.
• Attempt to access the page-directory through either the ASR field of context-entry (in case of root
page directory), or the ADDR field of the previous page directory entry in the page-walk resulted
in error.
• Read (R) and Write (W) fields of the page-directory entry are Clear (not-present entry).
• Invalid programming of one or more fields in the present page directory entry.
• One or more non-zero reserved fields in the present page directory entry.
• For implementations caching partial-cumulative permissions in the PDE-caches, the partial
cumulative read and write permissions from the page-walk till the relevant page-directory entry
are both Clear (effectively not-present).
For implementations reporting Caching Mode (CM) as Set in the Capability register, above conditions
may cause caching of the corresponding page-directory entries.
1. If a fault was detected without a present context-entry, the reserved domain-id value of 0 is used
to tag the cached faulting entry.
For implementations reporting Caching Mode (CM) as Clear in the Capability register, if any of the
interrupt-remapping fault conditions described in Section 5.3.3.1 is encountered, the resulting entry
is not cached in the IEC.
For implementations reporting Caching Mode (CM) as Set in the Capability register, interrupt-
remapping fault conditions may cause caching of the corresponding interrupt remapping entries.
All hardware implementations are required to support the register-based invalidation interface.
Implementations report queued invalidation support through the Extended Capability Register.
The following sub-sections describe the invalidation command registers. Hardware implementations
must handle commands through these registers irrespective of the remapping hardware enable status
(i.e., irrespective of the TES and IES status in the Global Status register).
When modifying root or context entries referenced by more than one remapping hardware unit in a
platform, software is responsible for explicitly invalidating the context-cache at each of these hardware
units.
• Global Invalidation: All entries in the IOTLB are invalidated through a global invalidate.
• Domain-Selective Invalidation: Entries in the IOTLB that correspond to a specific domain
(specified by the domain-id) are invalidated through a domain-selective invalidate.
• Page-Selective Invalidation: Entries in the IOTLB that correspond to the specified DMA
address(es) of a domain are invalidated through a page-selective invalidate.
When modifying page-table entries referenced by more than one remapping hardware unit in a
platform, software is responsible for explicitly invalidating the IOTLB at each of these hardware units.
The queued invalidation interface is expected to be most beneficial for the following software usages:
• Usages that frequently map and un-map I/O buffers in the remapping page-tables (causing
frequent invalidations).
• Usages where page-selective operation requests frequently span pages that are not virtually
contiguous.
• Usages where software can overlap processing with invalidations. (i.e., Usages where software
does not always block while an invalidation operation is pending in hardware).
• Invalidation operations that are latency prone (such as invalidating Device-IOTLBs on a device
across the PCI Express interconnect).
• Usages where a VMM virtualizes remapping hardware and may improve virtualization
performance by allowing guests to queue invalidation requests (instead of intercepting guest
MMIO accesses for each invalidation request, as required by the register-based interface).
The queued invalidation interface uses an Invalidation Queue (IQ), which is a circular buffer in system
memory. Software submits commands by writing Invalidation Descriptors to the IQ. The following
registers are defined to configure and manage the IQ:
• Invalidation Queue Address Register: Software programs this register to configure the base
address and size of the contiguous memory region in system memory hosting the Invalidation
Queue.
• Invalidation Queue Head Register: This register points to the invalidation descriptor in the IQ that
hardware will process next. The Invalidation Queue Head register is incremented by hardware
after fetching a valid descriptor from the IQ. Hardware interprets the IQ as empty when the head
and tail registers are equal.
• Invalidation Queue Tail Register: This register points to the invalidation descriptor in the IQ to be
written next by software. Software increments this register after writing one or more invalidation
descriptors to the IQ.
When the queued invalidation is enabled, software must submit invalidation commands only through
the IQ (and not through any invalidation command registers).
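As an illustration of this interface, the following sketch shows software appending one descriptor to the IQ and publishing the new tail. The register offset, the tail-index placement, and the helper and variable names are assumptions of this sketch, not definitions from this specification.

#include <stdint.h>

#define IQT_REG_OFF 0x88                 /* assumed offset of the Invalidation Queue Tail register */

struct inv_desc {                        /* all invalidation descriptors are 128 bits */
    uint64_t lo;
    uint64_t hi;
};

extern volatile uint8_t *remap_mmio;     /* base of this remapping unit's register set  */
extern struct inv_desc  *inv_queue;      /* software virtual address of the IQ          */
extern unsigned          inv_queue_size; /* number of descriptor slots in the IQ        */

static void mmio_write64(unsigned off, uint64_t val)
{
    *(volatile uint64_t *)(remap_mmio + off) = val;
}

/* Write one descriptor at the current tail slot and publish the new tail to hardware. */
void iq_submit(unsigned *tail, const struct inv_desc *d)
{
    inv_queue[*tail] = *d;                    /* descriptor becomes visible in system memory */
    *tail = (*tail + 1) % inv_queue_size;     /* circular buffer wrap-around                 */
    /* This sketch assumes the tail register takes the descriptor index as a
     * 16-byte-granular offset into the queue (index shifted left by 4). */
    mmio_write64(IQT_REG_OFF, (uint64_t)*tail << 4);
}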
Hardware fetches descriptors from the IQ in FIFO order starting from the Head register whenever all
of the following conditions are true. This is independent of the remap hardware enable status (state of
TES and IES fields in Global Status Register).
• QIES field in the Global Status register is Set (indicating queued invalidation is enabled)
• IQ is not empty (i.e. Head and Tail pointer registers are not equal)
• There is no pending Invalidation Queue Error or Invalidation Time-out Error (IQE and ITE fields in
the Fault Status Register are both Clear)
Hardware implementations may fetch one or more descriptors together. However, hardware must
increment the Invalidation Queue Head register only after verifying the fetched descriptor to be valid.
Hardware handling of invalidation queue errors is described in Section 6.2.2.7.
The following subsections describe the various Invalidation Descriptors. All descriptors are 128-bit
sized. Type field (bits 3:0) of each descriptor identifies the descriptor type. Software must program
the reserved fields in the descriptors as zero.
(Figure: Context Cache Invalidate Descriptor (cc_inv_dsc) format: bits 127:64 Rsvd; 63:50 Rsvd; 49:48 FM; 47:32 Source-ID; 31:16 Domain-ID; 15:6 Rsvd; 5:4 G; 3:0 Type (01h).)
The Context Cache Invalidate Descriptor (cc_inv_dsc) allows software to invalidate the context cache.
The descriptor includes the following parameters:
• Granularity (G): The G field indicates the requested invalidation granularity. The encoding of the
G field is same as the CIRG field in the Context Command register (described in Section 10.4.7).
Hardware implementations may perform coarser invalidation than the granularity requested.
• Domain-ID (DID): For domain-selective and device-selective invalidations, the DID field indicates
the target domain-id.
• Source-ID (SID): For device-selective invalidations, the SID field indicates the target device-id.
• Function Mask (FM): The Function Mask field indicates the bits of the SID field to be masked for
device-selective invalidations. The usage and encoding of the FM field is same as the FM field
encoding in the Context Command register.
Since information from the context-cache may be used to tag the IOTLB, software must always follow
a context-cache invalidation with an IOTLB invalidation.
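For illustration, a context-cache invalidate descriptor might be packed as in the following sketch; the bit positions are taken from the descriptor figure above and should be treated as illustrative, not normative.

#include <stdint.h>

struct inv_desc { uint64_t lo, hi; };    /* 128-bit invalidation descriptor */

/* Pack a Context Cache Invalidate Descriptor.  Field positions (Type in bits 3:0,
 * G in 5:4, DID in 31:16, SID in 47:32, FM in 49:48) follow the cc_inv_dsc figure
 * above; verify them against the implemented specification revision. */
static struct inv_desc build_cc_inv_dsc(uint64_t g, uint16_t did, uint16_t sid, uint8_t fm)
{
    struct inv_desc d = { 0, 0 };
    d.lo = 0x01ULL                           /* Type: context-cache invalidate descriptor */
         | ((g & 0x3ULL) << 4)               /* G:   requested invalidation granularity   */
         | ((uint64_t)did << 16)             /* DID: target domain-id                     */
         | ((uint64_t)sid << 32)             /* SID: target device-id                     */
         | ((uint64_t)(fm & 0x3) << 48);     /* FM:  function mask                        */
    return d;
}
/* After submitting this descriptor, an IOTLB invalidation (global or domain-selective)
 * must also be submitted, per the requirement stated above. */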
(Figure: IOTLB Invalidate Descriptor (iotlb_inv_dsc) format: low 64 bits carry Rsvd, DID, Rsvd, DR, DW, G and Type (02h); high 64 bits carry ADDR, Rsvd, IH and AM.)
The IOTLB Invalidate Descriptor (iotlb_inv_dsc) allows software to invalidate the IOTLB and PDE-
cache. The descriptor includes the following parameters:
• Granularity (G): The G field indicates the requested invalidation granularity (global, domain-
selective or page-selective). The encoding of the G field is same as the IIRG field in the IOTLB
Invalidate register (described in Section 10.4.8). Hardware implementations may perform coarser
invalidation than the granularity requested.
• Drain Reads (DR): Software sets this flag to indicate hardware must drain DMA read requests that
are already processed by the remapping hardware, but queued within the Root-Complex to be
completed. When this flag is Set, hardware must perform the DMA read drain before the next
Invalidation Wait Descriptor (described in Section 6.2.2.5) is completed. Section 6.3 describes
hardware support for DMA draining.
• Drain Writes (DW): Software sets this flag to indicate hardware must drain relevant DMA write
requests that are already processed by the remapping hardware, but queued within the Root-
Complex to be completed. When this flag is Set, hardware must drain the relevant DMA writes
before the next Invalidation Wait Descriptor is completed. Section 6.3 describes hardware support
for DMA draining.
• Domain-ID (DID): For domain-selective and page-selective invalidations, the DID field indicates
the target domain-id. Hardware ignores bits 31:(16+N), where N is the domain-id width reported
in the Capability register. This field is ignored by hardware for global invalidations.
• Invalidation Hint (IH): For page-selective invalidations, the Invalidation Hint specifies whether the
corresponding entries in the PDE-cache need to be invalidated or not. For software usages that
update only the leaf PTEs, the PDE-cache can be preserved by specifying the Invalidation Hint.
The encoding for the IH field is same as the IH field encoding in the IOTLB Address register(s).
This field is ignored by hardware for global and domain-selective invalidations.
• Address (ADDR): For page-selective invalidations, the Address field indicates the starting page
address of the mapping(s) that needs to be invalidated. Hardware ignores bits 127:(64+N),
where N is the maximum guest address width (MGAW) supported. This field is ignored by
hardware for global and domain-selective invalidations.
• Address Mask (AM): For page-selective invalidations, the Address Mask specifies the number of
contiguous pages that needs to be invalidated. The encoding for the AM field is same as the AM
field encoding in the Invalidate Address register (refer Section 10.4.8.2).
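For illustration, the AM and ADDR values for an invalidation covering a given number of 4KB pages can be derived as in the following sketch; the range covered is 2^AM naturally aligned pages, so the sketch rounds up to a power of two and aligns the base accordingly.

#include <stdint.h>

/* Smallest AM value such that 2^AM pages cover 'npages' contiguous 4KB pages. */
static unsigned iotlb_am_for_pages(uint64_t npages)
{
    unsigned am = 0;
    while ((1ULL << am) < npages)
        am++;
    return am;
}

/* ADDR value for the invalidation: the low AM bits of the page number are masked
 * by hardware, so the base is aligned down to the 2^AM page boundary here. */
static uint64_t iotlb_addr_for_range(uint64_t start_addr, unsigned am)
{
    uint64_t page = start_addr >> 12;         /* 4KB page number          */
    page &= ~((1ULL << am) - 1);              /* align to 2^AM pages      */
    return page << 12;                        /* value for the ADDR field */
}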
(Figure: Device-IOTLB Invalidate Descriptor (dev_iotlb_inv_dsc) format: low 64 bits carry Rsvd, SID, Rsvd, MaxInvsPend, Rsvd and Type (03h); high 64 bits carry ADDR and S.)
The Device-IOTLB Invalidate Descriptor (dev_iotlb_inv_dsc) allows software to invalidate the Device-
IOTLB on a PCI Express endpoint device. The descriptor includes the following parameters:
• Source-ID (SID): The SID field indicates the requestor-id of the PCI Express endpoint device
whose Device-IOTLB needs to be invalidated.
• Address (ADDR): The address field indicates the starting untranslated address for the mapping(s)
that needs to be invalidated. The Address field is qualified by the S field.
• Size (S): The size field indicates the number of consecutive pages targeted by this invalidation
request. If the S field is zero, a single page at the page address specified by Address [63:12] is
requested to be invalidated. If the S field is Set, the least significant bit in the Address field with
value 0b indicates the invalidation address range. For example, if the S field is Set and Address[12]
is Clear, it indicates an 8KB invalidation address range with base address in Address [63:13]. If the
S field and Address[12] are Set and bit 13 is Clear, it indicates a 16KB invalidation address range
with base address in Address [63:14], and so on (see the sketch after this list).
• Max Invalidations Pending (MaxInvsPend): This field is a hint to hardware to indicate the
maximum number of pending invalidation requests the specified PCI Express endpoint device can
handle optimally. All devices are required to support up to 32 pending invalidation requests, but
the device may put back pressure on the PCI Express link for multiple pending invalidations
beyond MaxInvsPend. A value of 0h in MaxInvsPend field indicates the device is capable of
handling maximum (32) pending invalidation requests without throttling the link. Hardware
implementations may utilize this field to throttle the number of pending invalidation requests
issued to the specified device.
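The size encoding for the S field described above can be produced as in the following sketch, which assumes a naturally aligned range of 2^k 4KB pages; the helper name is illustrative only.

#include <stdint.h>

/* Encode the ADDR field for a naturally aligned range of 2^k 4KB pages (k >= 1)
 * when the S field is Set: bits 12..(12+k-2) are 1 and bit (12+k-1) is 0, so the
 * least significant 0 bit marks the range size.  For k == 0, S is Clear and ADDR
 * is simply the 4KB page address. */
static uint64_t dev_iotlb_addr_encode(uint64_t base, unsigned k)
{
    uint64_t page = base >> 12;
    page &= ~((1ULL << k) - 1);               /* align the base to 2^k pages   */
    uint64_t low = ((1ULL << k) - 1) >> 1;    /* bits 12..(12+k-2) set to 1    */
    return (page << 12) | (low << 12);        /* bit (12+k-1) remains 0        */
}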
Since translation requests from a device may be serviced by hardware from the IOTLB, software must
always request IOTLB invalidation before requesting corresponding Device-IOTLB invalidation.
(Figure: Interrupt Entry Cache Invalidate Descriptor (iec_inv_dsc) format; the IIDX, IM, G and Type fields are described below, remaining bits Rsvd.)
The Interrupt Entry Cache Invalidate Descriptor (iec_inv_dsc) allows software to invalidate the
Interrupt Entry Cache. The descriptor includes the following parameters:
• Granularity (G): This field indicates the granularity of the invalidation request. If Clear, a global
invalidation of the interrupt-remapping cache is requested. If Set, an index-selective invalidation is
requested.
• Interrupt Index (IIDX): This field specifies the index of the interrupt remapping entry that needs
to be invalidated through an index-selective invalidation.
• Index Mask (IM): For index-selective invalidations, the index-mask specifies the number of
contiguous interrupt indexes that need to be invalidated. The encoding of the IM field is
described in the table below.

IM Value   Index bits masked   Interrupt entries invalidated
0          None                1
1          0                   2
2          1:0                 4
3          2:0                 8
4          3:0                 16
As part of IEC invalidation, hardware must drain interrupt requests that are already processed by the
remapping hardware, but queued within the Root-Complex to be delivered to the processor.
Section 6.4 describes hardware support for interrupt draining.
(Figure: Invalidation Wait Descriptor (inv_wait_dsc) format: Status Address in the high 64 bits; Status Data, Rsvd, FN, SW, IF and Type (05h) in the low 64 bits.)
The Invalidation Wait Descriptor (inv_wait_dsc) descriptor allows software to synchronize with
hardware for the invalidation request descriptors submitted before the wait descriptor. The descriptor
includes the following parameters:
• Status Write (SW): Indicate the invalidation wait descriptor completion by performing a coherent
DWORD write of the value in the Status Data field to the address specified in the Status Address
field (see the sketch after this list).
• Status Address and Data: Status address and data is used by hardware to perform wait descriptor
completion status write when the SW field is Set. Hardware behavior is undefined if the Status
Address specified is not an address route-able to memory (such as peer address, interrupt
address range of 0xFEEX_XXXX etc.). The Status Address and Data fields are ignored by
hardware when the Status Write (SW) field is Clear.
• Interrupt Flag (IF): Indicate the invalidation wait descriptor completion by generating an
invalidation completion event per the programming of the Invalidation Completion Event
registers. Section 6.2.2.6 describes details on invalidation event generation.
• Fence Flag (FN): Indicate descriptors following the invalidation wait descriptor must be processed
by hardware only after the invalidation wait descriptor completes.
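For illustration, software might synchronize with previously submitted descriptors as in the following sketch; build_inv_wait_dsc() and iq_submit() are hypothetical helpers, and the status location must be DMA-visible memory.

#include <stdint.h>

struct inv_desc { uint64_t lo, hi; };

/* Hypothetical helpers: pack an inv_wait_dsc and append a descriptor to the IQ. */
extern struct inv_desc build_inv_wait_dsc(uint32_t status_data, uint64_t status_addr,
                                          int sw, int iflag);
extern void iq_submit(unsigned *tail, const struct inv_desc *d);

static volatile uint32_t status_slot;         /* DMA-visible DWORD for the status write */

void iq_wait_all(unsigned *tail, uint64_t status_slot_pa)
{
    status_slot = 0;
    struct inv_desc w = build_inv_wait_dsc(1 /* Status Data */, status_slot_pa,
                                           1 /* SW */, 0 /* IF */);
    iq_submit(tail, &w);
    while (status_slot != 1)
        ;                                     /* hardware writes Status Data here when all
                                                 prior descriptors have completed */
}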
The following logic applies for interrupts held pending by hardware in the IP field:
• If IP field was Set when software clears the IM field, the invalidation completion event interrupt is
generated along with clearing the IP field.
• If IP field was Set when software services the pending interrupt condition (indicated by IWC field
in the Invalidation Completion Status register being Clear), the IP field is cleared.
The invalidation completion event interrupt must push any in-flight invalidation completion status
writes, including status writes that may have originated from the same inv_wait_dsc for which the
interrupt was generated. Similarly, read completions due to software reading any of the remapping
hardware registers must push (commit) any in-flight invalidation completion event interrupts and
status writes generated by the respective hardware unit.
The invalidation completion event interrupts are never subject to interrupt remapping.
A DMA write request to system memory is considered drained when the effects of the write are visible
to processor accesses to addresses targeted by the DMA write request. A DMA read request to system
memory is considered drained when the Root-Complex has finished fetching all of its read response
data from memory.
completed). For IOTLB invalidations submitted through the queued invalidation interface, DMA
draining must be completed before the next Invalidation Wait Descriptor (inv_wait_dsc) is
completed by hardware.
— For global IOTLB invalidation requests specifying DMA read/write draining, all non-committed
DMA read/write requests queued within the Root-Complex are drained.
— For domain-selective IOTLB invalidation requests specifying DMA read/write draining,
hardware only guarantees draining of non-committed DMA read/write requests to the domain
specified in the invalidation request.
— For page-selective IOTLB invalidation requests specifying DMA read/write draining, hardware
only guarantees draining of non-committed DMA read/write requests with untranslated
address overlapping the address range specified in the invalidation request and to the
specified domain.
Hardware maintains an internal index to reference the Fault Recording register in which the next fault
can be recorded. The index is reset to zero when both DMA and interrupt remapping are disabled (TES
and IES fields Clear in Global Status register), and increments whenever a fault is recorded in a Fault
Recording register. The index wraps around from N-1 to 0, where N is the number of fault recording
registers supported by the remapping hardware unit.
Hardware maintains the Primary Pending Fault (PPF) field in the Fault Status register as the logical
“OR” of the Fault (F) fields across all the Fault Recording registers. The PPF field is re-computed by
hardware whenever hardware or software updates the F field in any of the Fault Recording registers.
When primary fault recording is active, hardware functions as follows upon detecting a remapping
fault:
• Hardware checks the current value of the Primary Fault Overflow (PFO) field in the Fault Status
register. If it is already Set, the new fault is not recorded.
• If hardware supports compression1 of multiple faults from the same requester, it compares the
source-id (SID) field of each Fault Recording register with Fault (F) field Set, to the source-id of
the currently faulted request. If the check yields a match, the fault information is not recorded.
• If the above check does not yield a match (or if hardware does not support compression of
faults), hardware checks the Fault (F) field of the Fault Recording register referenced by the
internal index. If that field is already Set, hardware sets the Primary Fault Overflow (PFO) field in
the Fault Status register, and the fault information is not recorded.
• If the above check indicates there is no overflow condition, hardware records the current fault
information in the Fault Recording register referenced by the internal index. Depending on the
current value of the PPF field in the Fault Status register, hardware performs one of the following
steps:
— If the PPF field is currently Set (implying there are one or more pending faults), hardware sets
the F field of the current Fault Recording register and increments the internal index.
— Else, hardware records the internal index in the Fault Register Index (FRI) field of the Fault
Status register and sets the F field of the current Fault Recording register (causing the PPF
field also to be Set). Hardware increments the internal index, and an interrupt may be
generated based on the hardware interrupt generation logic described in Section 7.3.
Software is expected to process the faults reported through the fault recording registers in a circular
FIFO fashion starting from the Fault Recording register referenced by the Fault Recording Index (FRI)
field, until it finds a fault recording register with no faults (F field Clear).
To recover from a primary fault overflow condition, software must first process the pending faults in
each of the Fault Recording registers, Clear the Fault (F) field in all those registers, and Clear the
overflow status by writing a 1 to the Primary Fault Overflow (PFO) field. Once the PFO field is cleared
by software, hardware continues to record new faults starting from the Fault Recording register
referenced by the current internal index.
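For illustration, the processing and overflow-recovery sequence above might look like the following sketch; the accessor helpers are placeholders, and the position of the F field is an assumption of the sketch.

#include <stdint.h>

extern uint64_t frcd_read_hi(unsigned i);     /* high QWORD of Fault Recording register i */
extern void     frcd_clear_fault(unsigned i); /* write 1 to its F field (RW1C)            */
extern uint32_t fsts_read(void);
extern void     fsts_write(uint32_t v);
extern unsigned nfr;                          /* number of fault recording registers      */

#define FRCD_F      (1ULL << 63)              /* Fault field, assumed bit 63 of the high QWORD */
#define FSTS_PFO    (1u << 0)                 /* Primary Fault Overflow                        */
#define FSTS_FRI(v) (((v) >> 8) & 0xffu)      /* Fault Recording Index                         */

void process_primary_faults(void)
{
    unsigned i = FSTS_FRI(fsts_read());
    while (frcd_read_hi(i) & FRCD_F) {        /* stop at the first register with F Clear      */
        /* ... decode and log SID, FR, FI, AT from the recording register here ...            */
        frcd_clear_fault(i);                  /* clearing F lets hardware reuse this register */
        i = (i + 1) % nfr;
    }
    if (fsts_read() & FSTS_PFO)
        fsts_write(FSTS_PFO);                 /* RW1C: clear the overflow after draining      */
}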
Advanced fault logging uses a memory-resident fault log to record fault information. The base and
size of the memory-resident fault log region are programmed by software through the Advanced Fault
Log register. Advanced fault logging must be enabled by software through the Global Command
register before enabling the remapping hardware. Section 9.4 illustrates the format of the fault
record.
When advanced fault recording is active, hardware maintains an internal index into the memory-
resident fault log where the next fault can be recorded. The index is reset to zero whenever software
programs hardware with a new fault log region through the Global Command register, and increments
whenever a fault is logged in the fault log. Whenever the internal index increments, hardware checks
for internal index wrap-around condition based on the size of the current fault log. Any internal state
used to track the index wrap condition is reset whenever software programs hardware with a new
fault log region.
Hardware may compress multiple back-to-back faults from the same DMA requester by maintaining
internally the source-id of the last fault record written to the fault log. This internal “source-id from
previous fault” state is reset whenever software programs hardware with a new fault log region.
Read completions due to software reading the remapping hardware registers must push (commit) any
in-flight fault record writes to the fault log by the respective remapping hardware unit.
When a DMA-remapping fault is detected, the advanced fault logging hardware functions as follows:
• Hardware checks the current value of the Advanced Fault Overflow (AFO) field in the Fault Status
register. If it is already Set, the new fault is not recorded.
• If hardware supports compressing multiple back-to-back faults from same requester, it compares
the source-id of the currently faulted DMA request to the internally maintained “source-id from
previous fault”. If a match is detected, the fault information is not recorded.
• Otherwise, if the internal index wrap-around condition is Set (implying the fault log is full),
hardware sets the AFO field in the Advanced Fault Log register, and the fault information is not
recorded.
• If the above step indicates no overflow condition, hardware records the current fault information
to the fault record referenced by the internal index. Depending on the current value of the
Advanced Pending Fault (APF) field in the Fault Status register and the value of the internal index,
hardware performs one of the following steps:
— If APF field is currently Set, or if the current internal index value is not zero (implying there
are one or more pending faults in the current fault log), hardware simply increments the
internal index (along with the wrap-around condition check).
— Otherwise, hardware sets the APF field and increments the internal index. An interrupt may
be generated based on the hardware interrupt generation logic described in Section 7.3.
For these conditions, the Fault Event interrupt generation hardware logic functions as follows:
• Hardware checks if there are any previously reported interrupt conditions that are yet to be
serviced by software. Hardware performs this check by evaluating if any of the PPF1, PFO, (APF,
AFO if advanced fault logging is active), IQE, ICE and ITE fields in the Fault Status register is Set.
If hardware detects any interrupt condition yet to be serviced by software, the Fault Event
interrupt is not generated.
• If the above check indicates no interrupt condition yet to be serviced by software, the Interrupt
Pending (IP) field in the Fault Event Control register is Set. The Interrupt Mask (IM) field is then
checked and one of the following conditions is applied:
— If IM field is Clear, the fault event is generated along with clearing the IP field.
1. The PPF field is computed by hardware as the logical OR of Fault (F) fields across all the Fault
Recording Registers of a hardware unit.
The following logic applies for interrupts held pending by hardware in the IP field:
• If IP field was Set when software clears the IM field, the fault event interrupt is generated along
with clearing the IP field.
• If IP field was Set when software services all the pending interrupt conditions (indicated by all
status fields in the Fault Status register being Clear), the IP field is cleared.
Read completions due to software reading any of the remapping hardware registers must push
(commit) any in-flight interrupt messages generated by the respective hardware unit.
8 BIOS Considerations
The system BIOS is responsible for detecting the remapping hardware functions in the platform and
for locating the memory-mapped remapping hardware registers in the host system address space.
The BIOS reports the remapping hardware units in a platform to system software through the DMA
Remapping Reporting (DMAR) ACPI table described below.
Field          Byte Length   Byte Offset   Description
Revision       1             8             1
OEMID          6             10            OEM ID
OEM Revision   4             24            OEM Revision of DMAR Table for OEM Table ID.
Value   Description
>3      Reserved for future use. For forward compatibility, software skips structures it does not
        comprehend by skipping the appropriate number of bytes indicated by the Length field.
BIOS implementations must report these remapping structure types in numerical order, i.e., all
remapping structures of type 0 (DRHD) are enumerated before remapping structures of type 1 (RMRR),
and so forth.
Field   Byte Length   Byte Offset   Description
Flags   1             4             Bit 0: INCLUDE_PCI_ALL
• If Set, this remapping hardware unit has under its scope all PCI compatible devices in the
specified Segment, except devices reported under the scope of other remapping hardware units for
the same Segment. If a DRHD structure with INCLUDE_PCI_ALL flag Set is reported for a Segment,
it must be enumerated by BIOS after all other DRHD structures for the same Segment1. A DRHD
structure with INCLUDE_PCI_ALL flag Set may use the 'Device Scope' field to enumerate I/OxAPIC
and HPET devices under its scope.
• If Clear, this remapping hardware unit has under its scope only devices in the specified Segment
that are explicitly identified through the 'Device Scope' field.
Bits 1-7: Reserved.
1. On platforms with multiple PCI segments, any of the segments can have a DRHD structure with
INCLUDE_PCI_ALL flag Set.
In this section, the generic term ‘PCI’ is used to describe conventional PCI, PCI-X, and PCI-Express
devices. Similarly, the term ‘PCI-PCI bridge’ is used to refer to conventional PCI bridges, PCI-X
bridges, PCI Express root ports, or downstream ports of a PCI Express switch.
A PCI sub-hierarchy is defined as the collection of PCI controllers that are downstream to a specific
PCI-PCI bridge. To identify a PCI sub-hierarchy, the Device Scope Entry needs to identify only the
parent PCI-PCI bridge of the sub-hierarchy.
1. An HPET Timer Block is capable of MSI interrupt generation if any of the Timers in the Timer Block reports
FSB_INTERRUPT_DELIVERY capability in the Timer Configuration and Capability Registers. HPET Timer
Blocks not capable of MSI interrupt generation (which instead have their interrupts routed through the I/OxAPIC)
are not reported in the Device Scope.
The following pseudocode describes how to identify the device specified through a Device Scope
structure:
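A minimal C sketch of this computation is shown below, assuming a hypothetical pci_bridge_secondary_bus() configuration-space helper; the Device Scope 'Path' entries are taken as (device, function) pairs starting at the 'Start Bus Number'.

#include <stdint.h>

struct path_entry { uint8_t dev, func; };     /* one Path entry: (device, function) */

/* Hypothetical helper: read the Secondary Bus Number of the PCI-PCI bridge at
 * bus/dev/func from PCI configuration space. */
extern uint8_t pci_bridge_secondary_bus(uint8_t bus, uint8_t dev, uint8_t func);

uint16_t device_scope_to_source_id(uint8_t start_bus,
                                   const struct path_entry *path, unsigned n)
{
    uint8_t bus = start_bus;
    /* Every Path entry except the last names a PCI-PCI bridge; follow its
     * secondary bus.  The last entry names the target device itself. */
    for (unsigned i = 0; i + 1 < n; i++)
        bus = pci_bridge_secondary_bus(bus, path[i].dev, path[i].func);
    return ((uint16_t)bus << 8) | (uint16_t)(path[n - 1].dev << 3) | path[n - 1].func;
}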
For platforms reporting interrupt remapping capability (INTR_REMAP flag Set in the DMAR structure),
each I/OxAPIC in the platform reported through ACPI MADT must be explicitly enumerated under the
Device Scope of the appropriate remapping hardware units.
• For I/OxAPICs that are PCI-discoverable, the source-id for such I/OxAPICs (computed using the
above pseudocode from its Device Scope structure) must match its PCI requester-id effective at
the time of boot.
• For I/OxAPICs that are not PCI-discoverable:
— If the ‘Path’ field in Device Scope has a size of 2 bytes, the corresponding I/OxAPIC is a Root-
Complex integrated device. The ‘Start Bus Number’ and ‘Path’ fields in the Device Scope
structure together provide the unique 16-bit source-id allocated by the platform for the
I/OxAPIC. Examples are I/OxAPICs integrated in the IOH and south bridge (ICH)
components.
— If the ‘Path’ field in Device Scope has a size greater than 2 bytes, the corresponding I/OxAPIC
is behind some software-visible PCI-PCI bridge. In this case, the ‘Start Bus Number’ and ‘Path’
fields in the Device Scope structure together identify the PCI path to the I/OxAPIC device.
Bus rebalancing actions by system software that modify bus assignments of the device’s parent
bridge impact the bus number portion of the device’s source-id. Examples are I/OxAPICs in PCI-
Express-to-PCI-X bridge components in the platform.
Figure 8-23 illustrates a platform configuration with a single PCI segment and host bridge (with a
starting bus number of 0), and supporting four remapping hardware units as follows:
1. Remapping hardware unit #1 has under its scope all devices downstream to the PCI Express root
port located at (dev:func) of (14:0).
2. Remapping hardware unit #2 has under its scope all devices downstream to the PCI Express root
port located at (dev:func) of (14:1).
3. Remapping hardware unit #3 has under its scope a Root-Complex integrated endpoint device
located at (dev:func) of (29:0).
4. Remapping hardware unit #4 has under its scope all other PCI compatible devices in the platform
not explicitly under the scope of the other remapping hardware units. In this example, this
includes the integrated device at (dev:func) at (30:0), and all the devices attached to the south
bridge component. The I/OxAPIC in the platform (I/O APICID = 0) is under the scope of this
remapping hardware unit, and has a BIOS assigned bus/dev/function number of (0,12,0).
(Figure 8-23: Two processors on the system bus; platform configuration with a single PCI segment host bridge and four remapping hardware units, as described above.)
This platform requires 4 DRHD structures. The Device Scope fields in each DRHD structure are
described as below.
• Device Scope for remapping hardware unit #1 contains only one Device Scope Entry, identified as
[2, 8, 0, 0, 0, 14, 0].
— System Software uses the Entry Type field value of 0x02 to conclude that all devices
downstream of the PCI-PCI bridge device at PCI Segment 0, Bus 0, Device 14, and Function 0
are within the scope of this remapping hardware unit.
• Device Scope for remapping hardware unit #2 contains only one Device Scope Entry, identified as
[2, 8, 0, 0, 0, 14, 1].
— System Software uses the Entry Type field value of 0x02 to conclude that all devices
downstream of the PCI-PCI bridge device at PCI Segment 0, Bus 0, Device 14, and Function 1
are within the scope of this remapping hardware unit.
• Device Scope for remapping hardware unit #3 contains only one Device Scope Entry, identified as
[1, 8, 0, 0, 0, 29, 0].
— System software uses the Type field value of 0x1 to conclude that the scope of remapping
hardware unit #3 includes only the endpoint device at PCI Segment 0, Bus 0, Device 29 and
Function 0.
• Device Scope for remapping hardware unit #4 contains only one Device Scope Entry, identified as
[3, 8, 0, 1, 0, 12, 0]. Also, the DRHD structure for remapping hardware unit #4 indicates the
INCLUDE_PCI_ALL flag. This hardware unit must be the last in the list of hardware unit definition
structures reported.
— System software uses the INCLUDE_PCI_ALL flag to conclude that all PCI compatible devices
that are not explicitly enumerated under other remapping hardware units are in the scope of
remapping unit #4. Also, the Device Scope Entry with Type field value of 0x3 is used to
conclude that the I/OxAPIC (with I/O APICID=0 and source-id of [0,12,0]) is under the scope
of remapping hardware unit #4.
For platforms supporting remapping hardware, BIOS implementations should avoid allocating BARs of
otherwise independent devices/functions in the same system-base-page-sized region.
The RMRR regions are expected to be used only for USB and UMA Graphics legacy usages for reserved
memory. Platform designers must avoid or limit reserved memory regions since these require system
software to create holes in the DMA virtual address range available to system software and its drivers.
Field                                  Byte Length   Byte Offset   Description
Reserved                               2             4             Reserved.
Reserved Memory Region Base Address    8             8             Base address of 4KB-aligned reserved memory region.
The ACPI DMAR static tables and sub-tables defined in the previous sections enumerate the remapping
hardware units present at platform boot-time. The following sections illustrate the ACPI methods for
dynamic updates to remapping hardware resources, such as on I/O hub hot-plug, and assume
familiarity with the ACPI 3.0 specification and system software support for host-bridge hot-plug.
GUID: D8C1A3A6-BE9B-4C9B-91BF-C3CB81FC5DAF
The _DSM method would be located under the ACPI device scope where the platform wants to expose
the remapping hardware units. For example, the ACPI name-space includes representations for hot-
pluggable I/O hubs in the system as ACPI host bridges. For remapping hardware units implemented
in an I/O hub component, the _DSM method would be under the respective ACPI host bridge device.
1. Reserved Memory Region Reporting (RMRR) structures are not reported via _DSM, since the use of reserved
memory regions is limited to legacy devices (USB, iGFX etc.) that are not applicable to hot-plug.
1. Invoking the _DSM method does not modify the static DMAR tables. System software must
maintain the effective DMAR information comprehending the initial DMAR table reported by the
platform, and any remapping hardware units added or removed via _DSM upon host bridge hot-
add or hot-remove.
This chapter describes the memory-resident structures for DMA and interrupt remapping.
9.1 Root-entry
The following figure and table describe the root-entry.
(Figure: Root-entry format: bits 127:64 Reserved (0); bits 63:HAW Reserved (0); bits HAW-1:12 CTP; bits 11:1 Reserved (0); bit 0 P.)
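As a sketch of how software might compose a root-entry per the layout above (the structure and helper names are illustrative, not part of this specification):

#include <stdint.h>

struct root_entry { uint64_t lo, hi; };       /* 128-bit root-entry */

static struct root_entry make_root_entry(uint64_t context_table_pa)
{
    struct root_entry re = { 0, 0 };          /* upper 64 bits and reserved bits stay 0 */
    re.lo = (context_table_pa & ~0xfffULL)    /* CTP: 4KB-aligned context-table address */
          | 1ULL;                             /* P:   entry is present                  */
    return re;
}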
9.2 Context-entry
The following figure and table describe the context-entry.
(Figure: Context-entry format: bits 127:88 Reserved (0); 87:72 DID; 71 Reserved (0); 70:67 AVAIL; 66:64 AW; 63:HAW Reserved (0); HAW-1:12 ASR; 11:6 Reserved (0); 5 ALH; 4 EH; 3:2 T; 1 FPD; 0 P.)
87:72  DID: Domain Identifier  Context-entries programmed with the same domain identifier must
always reference the same address translation structure (through the ASR field). Similarly,
context-entries referencing the same address translation structure must be programmed with the
same domain id.
This field is evaluated by hardware only when the Present (P) field is
Set.
When Caching Mode (CM) field is reported as Set, the domain-id value
of zero is architecturally reserved. Software must not use domain-id
value of zero when CM is Set.
This field is evaluated by hardware only when the Present (P) field is
Set.
4  EH: Eviction Hint  This field may be Set by software to change the IOTLB default eviction
policy in hardware.
• 0: The default eviction policy of hardware is in effect for IOTLB entries corresponding to DMA
requests processed through this context-entry.
• 1: Hardware may eagerly evict IOTLB entries hosting effective translations with the TM field Set.
Refer to the Page-table Entry format in Section 9.3 for details on the TM field. Software may Set
the EH field if the device commonly uses transient DMA buffers.
This field is treated as reserved by hardware implementations reporting the CH (Caching Hints)
field as Clear in the Extended Capability register.
When supported, this field is evaluated by hardware only when the Present (P) field is Set.
1. Untranslated DMA and DMA Translation requests to addresses beyond the Maximum Guest Address Width
(MGAW) supported by hardware may be blocked and reported through other means such as PCI Express
Advanced Error Reporting (AER). Such errors (referred to as platform errors) may not be reported as
DMA-remapping faults and are outside the scope of this specification.
(Figure: Page-Table Entry format: bit 63 AVAIL; 62 TM; 61:52 AVAIL; 51:12 ADDR (bits 51:HAW reserved); 11 SNP; 10:8 AVAIL; 7 SP; 6:2 AVAIL; 1 W; 0 R.)
63      AVAIL: Available  This field is available to software. Hardware always ignores the programming
of this field.
62      TM: Transient Mapping1  For implementations supporting Caching Hints (CH field reported as
Set in Extended Capability register), hardware may eagerly evict mappings with this field Set in the
IOTLB, if the corresponding context-entry has EH (Eviction Hint) field Set.
61:52   AVAIL: Available  This field is available to software. Hardware always ignores programming of
this field.
51:12   ADDR: Address  Host physical address of the page frame if this is a leaf node. Otherwise a
pointer to the next level page table.
This field is evaluated by hardware only when at least one of the Read (R) and Write (W) fields is Set.
When evaluated, hardware treats bits 51:HAW as reserved, where HAW is the host address width of
the platform.
11      SNP: Snoop Behavior  This field indicates the snoop behavior of Untranslated DMA requests
remapped through this page-table entry.
• 0: Snoop behavior of Untranslated DMA requests processed through this page-table entry is
defined by the Root-Complex handling of NS (Non-Snoop) attribute in the DMA request.
• 1: Untranslated DMA requests processed through this page-table entry are treated as snooped,
irrespective of the NS (Non-Snoop) attribute in the DMA request.
For implementations supporting Device-IOTLBs, hardware returns this field as the ‘N’ field in
Translation Requests completions.
The SNP field is treated as reserved:
• Always in non-leaf entries
• In leaf entries, for hardware implementations reporting SC (Snoop Control) field as Clear in
Extended Capability register.
This field is evaluated by hardware when at least one of Read (R) and Write (W) fields is Set.
10:8    AVAIL: Available  This field is available to software. Hardware ignores the programming of this
field.
7       SP: Super Page  This field tells hardware whether to stop the page-walk before reaching a leaf
node mapping to a 4KB page:
• 0: Continue with the page-walk and use the next level table.
• 1: Stop the page-walk and form the host physical address using the unused bits in the input
address for the page-walk (N-1):0 along with bits (HAW-1):N of the page base address provided in
the address (ADDR) field.
Hardware treats the SP field as reserved in:
• Page-directory entries corresponding to super-page sizes not defined in the architecture.
• Page-directory entries corresponding to super-page sizes not supported by the hardware
implementation. (Hardware reports the supported super-page sizes through the Capability register.)
This field is evaluated by hardware only when at least one of Read (R) and Write (W) fields is Set.
Hardware always ignores the programming of this field in leaf page-table entries corresponding to
4KB pages.
6:2     AVAIL: Available  This field is available to software. Hardware always ignores the programming
of this field.
1. Setting the TM field in a page-table entry does not change software requirements for invalidating the IOTLB
when modifying the page-table entry.
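A minimal sketch of the super-page address formation described for the SP field is shown below, where n is the super-page shift and haw the host address width (for a 2MB super page, n is 21); the function name is illustrative.

#include <stdint.h>

/* Combine bits (N-1):0 of the input address with bits (HAW-1):N of the ADDR field. */
static uint64_t superpage_translate(uint64_t input_addr, uint64_t pde_addr_field,
                                    unsigned n, unsigned haw)
{
    uint64_t offset_mask = (1ULL << n) - 1;                    /* input bits (N-1):0  */
    uint64_t base_mask = ((1ULL << haw) - 1) & ~offset_mask;   /* ADDR bits (HAW-1):N */
    return (pde_addr_field & base_mask) | (input_addr & offset_mask);
}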
(Figure: Fault Record format: bit 127 Reserved (0); 126 T; 125:124 AT; 123:104 Reserved (0); 103:96 FR; 95:80 Reserved (0); 79:64 SID; 63:12 FI; 11:0 Reserved (0).)
125:124  AT: Address Type  AT field in the faulted DMA request. This field is valid only when the Fault
Reason (FR) indicates one of the DMA-remapping fault conditions.
79:64 SID: Source Identifier Requester-id associated with the fault condition.
63:12  FI: Fault Information  When the Fault Reason (FR) field indicates one of the DMA-remapping
fault conditions, bits 63:12 of this field contain the page address in the faulted DMA request.
When the Fault Reason (FR) field indicates one of the interrupt-remapping fault conditions, bits
63:48 of this field contain the interrupt_index computed for the faulted interrupt request, and bits
47:12 are cleared.
(Figure: Interrupt Remapping Table Entry (IRTE) format: bits 127:84 Reserved (0); 83:82 SVT; 81:80 SQ; 79:64 SID; 63:32 Destination ID; 31:24 Reserved (0); 23:16 Vector; 15:12 Reserved (0); 11:8 AVAIL; 7:5 Delivery Mode; 4 Trigger Mode; 3 Redirection Hint; 2 Destination Mode; 1 FPD; 0 P.)
83:82  SVT: Source Validation Type  This field specifies the type of validation that must be performed
by the interrupt-remapping hardware on the source-id of the interrupt requests referencing this IRTE.
• 00b: No requester-id verification is required.
• 01b: Verify requester-id in interrupt request using SID and SQ fields in the IRTE.
• 10b: Verify the most significant 8-bits of the requester-id (Bus#) in the interrupt request is equal
to or within the Startbus# and EndBus# specified through the upper and lower 8-bits of the SID
field respectively. This encoding may be used to verify interrupts originated behind
PCI-Express-to-PCI/PCI-X bridges. Refer Section 5.2 for more details.
• 11b: Reserved.
This field is evaluated by hardware only when the Present (P) field is Set.
This field is evaluated by hardware only when the Present (P) field is Set
and SVT field is 01b.
79:64  SID: Source Identifier  This field specifies the originator (source) of the interrupt request that
references this IRTE. The format of the SID field is determined by the programming of the SVT field.
If the SVT field is:
• 01b: The SID field contains the 16-bit requestor-id (Bus/Dev/Func #) of the device that is allowed
to originate interrupt requests referencing this IRTE. The SQ field is used by hardware to determine
which bits of the SID field must be considered for the interrupt request verification.
• 10b: The most significant 8-bits of the SID field contains the startbus#, and the least significant
8-bits of the SID field contains the endbus#. Interrupt requests that reference this IRTE must have
a requester-id whose bus# (most significant 8-bits of requester-id) has a value equal to or within
the startbus# to endbus# range.
This field is evaluated by hardware only when the Present (P) field is Set and SVT field is 01b or 10b.
23:16  V: Vector  This 8-bit field contains the interrupt vector associated with the remapped interrupt
request. This field is evaluated by hardware only when the Present (P) field is Set.
7:5  DLM: Delivery Mode  This 3-bit field specifies how the remapped interrupt is handled. Delivery
Modes operate only in conjunction with specified Trigger Modes (TM). Correct Trigger Modes must be
guaranteed by software. Restrictions are indicated below:
• 000b (Fixed Mode) – Deliver the interrupt to all the agents indicated by the Destination ID field.
The Trigger Mode for fixed delivery mode can be edge or level.
• 001b (Lowest Priority) – Deliver the interrupt to one (and only one) of the agents indicated by the
Destination ID field (the algorithm to pick the target agent is component specific and could include
priority based algorithm). The Trigger Mode can be edge or level.
• 010b (System Management Interrupt or SMI): SMI is an edge triggered interrupt regardless of the
setting of the Trigger Mode (TM) field. For systems that rely on SMI semantics, the vector field is
ignored, but must be programmed to all zeroes for future compatibility. (Support for this delivery
mode is implementation specific. Platforms supporting interrupt remapping are expected to
generate SMI through dedicated pin or platform-specific special messages)2
• 100b (NMI) – Deliver the signal to all the agents listed in the destination field. The vector
information is ignored. NMI is an edge triggered interrupt regardless of the Trigger Mode (TM)
setting. (Platforms supporting interrupt remapping are recommended to generate NMI through
dedicated pin or platform-specific special messages)2
• 101b (INIT) – Deliver this signal to all the agents indicated by the Destination ID field. The vector
information is ignored. INIT is an edge triggered interrupt regardless of the Trigger Mode (TM)
setting. (Support for this delivery mode is implementation specific. Platforms supporting interrupt
remapping are expected to generate INIT through dedicated pin or platform-specific special
messages)2
• 111b (ExtINT) – Deliver the signal to the INTR signal of all agents indicated by the Destination ID
field (as an interrupt that originated from an 8259A compatible interrupt controller). The vector is
supplied by the INTA cycle issued by the activation of the ExtINT. ExtINT is an edge triggered
interrupt regardless of the Trigger Mode (TM) setting.
This field is evaluated by hardware only when the Present (P) field is Set.
4  TM: Trigger Mode  This field indicates the signal type of the interrupt that uses the IRTE.
• 0: Indicates edge sensitive.
• 1: Indicates level sensitive.
This field is evaluated by hardware only when the Present (P) field is Set.
1. The various processor and platform interrupt modes (like Intel® 64 xAPIC mode, Intel® 64 x2APIC mode
and Itanium™ processor mode) are determined by platform/processor specific mechanisms and are
outside the scope of this specification.
2. Refer to Section 5.7 for hardware considerations for handling platform events.
10 Register Descriptions
This chapter describes the structure and use of the remapping registers.
Attribute   Description
RW          Read-Write field that may be either set or cleared by software to the desired state.
RW1C        “Read-only status, Write-1-to-clear status” field. A read of the field indicates status. A set bit
            indicating a status may be cleared by writing a 1. Writing a 0 to an RW1C field has no effect.
RW1CS       “Sticky Read-only status, Write-1-to-clear status” field. A read of the field indicates status. A set
            bit indicating a status may be cleared by writing a 1. Writing a 0 to an RW1CS field has no effect.
            Not initialized or modified by hardware except on powergood reset.
ROS         “Sticky Read-only” field that cannot be directly altered by software, and is not initialized or
            modified by hardware except on powergood reset.
RsvdZ       “Reserved and Zero” field that is reserved for future RW1C implementations. Registers are
            read-only and must return 0 when read. Software must use 0 for writes.
020h   Root-Entry Table Address Register            64   Register to set up location of root-entry table.
038h   Fault Event Control Register                 32   Interrupt control register for fault events.
044h   Fault Event Upper Address Register           32   Interrupt message upper address register for fault event messages.
06Ch   Protected Low Memory Limit Register          32   Register pointing to last address (limit) of the DMA-protected low memory region.
078h   Protected High Memory Limit Register         64   Register pointing to last address (limit) of the DMA-protected high memory region.
0A4h   Invalidation Completion Event Data Register  32   Invalidation Queue Event message data register for Invalidation Queue Events.
1. Hardware implementations may place IOTLB registers and fault recording registers in any reserved
addresses in the 4KB register space, or place them in adjoined 4KB regions. If one or more adjunct 4KB
regions are used, unused addresses in those pages must be treated as reserved by hardware. Location of
these registers is implementation dependent, and software must read the Capability register to determine
their offset location.
(Figure: Version Register: bits 31:8 Rsvd; 7:4 Max; 3:0 Min.)
Abbreviation VER_REG
General Register to report the architecture version supported. Backward compatibility for
Description the architecture is maintained with new revision numbers, allowing software to load
remapping hardware drivers written for prior architecture versions.
(Figure: Capability Register bit layout, including the DRD, DWD, MAMV, NFR, PSI, SPS, FRO, Isoch, ZLR, MGAW, SAGAW, CM, PHMR, PLMR, RWBF, AFL and ND fields.)
Abbreviation CAP_REG
General
Register to report general remapping hardware capabilities
Description
47:40  RO  X  NFR: Number of Fault-recording Registers  Implementations must support at least one
fault recording register (NFR = 0) for each remapping hardware unit in the platform.
1. Each remapping unit in the platform should support as many domains as the maximum number of
independently DMA-remappable devices expected to be attached behind it.
Abbreviation ECAP_REG
General
Register to report remapping hardware extended capabilities
Description
Abbreviation GCMD_REG
General Register to control remapping hardware. If multiple control fields in this register
Description need to be modified, software must serialize the modifications through multiple
writes to this register.
This field is valid only for Intel® 64 implementations supporting interrupt-remapping.
1. Implementations reporting write-buffer flushing as required in Capability register must perform implicit
write buffer flushing as a pre-condition to all context-cache and IOTLB invalidation operations.
(Figure: Global Status Register: bit 31 TES; 30 RTPS; 29 FLS; 28 AFLS; 27 WBFS; 26 QIES; 25 IRES; 24 IRTPS; 23 CFIS; 22:0 Rsvd.)
Abbreviation GSTS_REG
General
Register to report general remapping hardware status.
Description
30  RO  0  RTPS: Root Table Pointer Status  This field is cleared by hardware when software sets the
SRTP field in the Global Command register. This field is set by hardware when hardware completes
the ‘Set Root Table Pointer’ operation using the value provided in the Root-Entry Table Address
register.
29  RO  0  FLS: Fault Log Status  This field:
• Is cleared by hardware when software Sets the SFL field in the Global Command register.
• Is Set by hardware when hardware completes the ‘Set Fault Log Pointer’ operation using the value
provided in the Advanced Fault Log register.
(Figure: Root-Entry Table Address Register: bits 63:12 RTA; 11:0 Rsvd.)
Abbreviation RTADDR_REG
General
Register providing the base address of root-entry table.
Description
63:12  RW  0h  RTA: Root Table Address  Software specifies the base address of the root-entry table
through this register, and programs it in hardware through the SRTP field in the Global Command
register.
Abbreviation CCMD_REG
General Register to manage context cache. The act of writing the uppermost byte of the
Description CCMD_REG with the ICC field Set causes the hardware to perform the context-cache
invalidation.
15:0 RW 0h DID: Domain-ID The Capability register reports the domain-id width
supported by hardware. Software must ensure that
the value written to this field is within this limit.
Hardware ignores (and may not implement) bits 15:N,
where N is the supported domain-id width reported in
the Capability register.
XXXh + 008h IOTLB Invalidate Register 64 Register for IOTLB invalidation command
(Figure: IOTLB Invalidate Register: IVT, Rsvd, IIRG, Rsvd, IAIG, Rsvd, DR, DW, DID and Rsvd fields.)
Abbreviation IOTLB_REG
General Register to invalidate IOTLB. The act of writing the upper byte of the IOTLB_REG
Description with the IVT field Set causes the hardware to perform the IOTLB invalidation.
Register Offset XXXh + 0008h (where XXXh is the location of the IVA_REG)
(Figure: Invalidate Address Register: bits 63:12 ADDR; 11:7 Rsvd; 6 IH; 5:0 AM.)
Abbreviation IVA_REG
General Register to provide the DMA address whose corresponding IOTLB entry needs to be
Description invalidated through the corresponding IOTLB Invalidate register. This register is a
write-only register. A value returned on a read of this register is undefined.
Register Offset XXXh (XXXh is QWORD aligned and reported through the IVO field in the Extended
Capability register)
AM Value   ADDR bits masked   Pages invalidated
0          None               1
1          12                 2
2          13:12              4
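As an illustration, a register-based page-selective invalidation might be driven as in the following sketch; the register offsets follow the descriptions above (IOTLB_REG at IVA_REG + 8), while the IVT/IIRG/DID bit placements and helper names are assumptions of the sketch.

#include <stdint.h>

extern volatile uint8_t *remap_mmio;          /* base of the remapping unit's registers        */
extern unsigned iva_reg_off;                  /* XXXh: IVA_REG offset from the ECAP IVO field  */

#define IOTLB_IVT       (1ULL << 63)          /* invalidate IOTLB (bit 63 per the figure)      */
#define IOTLB_IIRG_PAGE (3ULL << 60)          /* page-selective granularity (assumed encoding) */

static void     wr64(unsigned off, uint64_t v) { *(volatile uint64_t *)(remap_mmio + off) = v; }
static uint64_t rd64(unsigned off)             { return *(volatile uint64_t *)(remap_mmio + off); }

void iotlb_inv_page(uint16_t did, uint64_t addr, unsigned am, int ih)
{
    /* IVA_REG: ADDR in bits 63:12, IH in bit 6, AM in bits 5:0. */
    wr64(iva_reg_off, (addr & ~0xfffULL) | ((uint64_t)(ih ? 1 : 0) << 6) | (am & 0x3f));
    /* IOTLB_REG sits at IVA_REG + 8; DID placement (bits 47:32) is assumed here. */
    wr64(iva_reg_off + 0x8, IOTLB_IVT | IOTLB_IIRG_PAGE | ((uint64_t)did << 32));
    while (rd64(iva_reg_off + 0x8) & IOTLB_IVT)
        ;                                     /* hardware clears IVT when the invalidation completes */
}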
(Figure: Fault Status Register: bits 31:16 Rsvd; 15:8 FRI; 7 Rsvd; 6 ITE; 5 ICE; 4 IQE; 3 APF; 2 AFO; 1 PPF; 0 PFO.)
Abbreviation FSTS_REG
General
Register indicating the various error status.
Description
(Figure: Fault Event Control Register: bit 31 IM; 30 IP; 29:0 Rsvd.)
Abbreviation FECTL_REG
General Register specifying the fault event interrupt message control bits. Section 7.3
Description describes hardware handling of fault events.
(Figure: Fault Event Data Register: bits 31:16 EIMD; 15:0 IMD.)
Abbreviation FEDATA_REG
General
Register specifying the interrupt message data.
Description
(Figure: Fault Event Address Register: bits 31:2 MA; 1:0 Rsvd.)
Abbreviation FEADDR_REG
General
Register specifying the interrupt message address.
Description
(Figure: Fault Event Upper Address Register: bits 31:0 MUA.)
Abbreviation FEUADDR_REG
General
Register specifying the interrupt message upper address.
Description
31:0  RW  0h  MUA: Message upper address  Software requirements for programming this register
are described in Section 5.6.
(Figure: Fault Recording Register bit layout; the low 64 bits carry FI (63:12) and Rsvd (11:0).)
General Registers to record fault information when primary fault logging is active. Hardware
Description reports the number and location of fault recording registers through the Capability
register. This register is relevant only for primary fault logging.
These registers are sticky and can be cleared only through power good reset or by
software clearing the RW1C fields by writing a 1.
63:12 ROS Xh FI: Fault Info When the Fault Reason (FR) field indicates one
of the interrupt-remapping fault conditions, bits
63:48 of this field indicate the interrupt_index
computed for the faulted interrupt request, and
bits 47:12 are cleared.
1. Hardware updates to this register may be disassembled as multiple doubleword writes. To ensure consistent
data is read from this register, software must first check the Primary Pending Fault (PPF) field in the
FSTS_REG is Set before reading the fault reporting register at offset as indicated in the FRI field of
FSTS_REG. Alternatively, software may read the highest doubleword in a fault recording register and check
if the Fault (F) field is Set before reading the rest of the data fields in that register.
Abbreviation AFLOG_REG
General Register to specify the base address of the memory-resident fault-log region. This
Description register is treated as RsvdZ for implementations not supporting advanced
translation fault logging (AFL field reported as 0 in the Capability register).
Abbreviation PMEN_REG
General Register to enable the DMA-protected memory regions set up through the PLMBASE,
Description PLMLIMT, PHMBASE, PHMLIMIT registers. This register is always treated as RO for
implementations not supporting protected memory regions (PLMR and PHMR fields
reported as Clear in the Capability register).
(Figure: Protected Low-Memory Base Register: bits 31:N+1 PLMB; N:0 Rsvd.)
Abbreviation: PLMBASE_REG
General Description: Register to set up the base address of the DMA-protected low-memory region below 4GB. This register must be set up before enabling protected memory through PMEN_REG, and must not be updated when protected memory regions are enabled.
The alignment of the protected low-memory region base depends on the number of reserved bits (N:0) of this register. Software may determine N by writing all 1s to this register and finding the most significant bit position with 0 in the value read back from the register. Bits N:0 of this register are decoded by hardware as all 0s.
Software must set up the protected low-memory region below 4GB. Section 10.4.18 describes the Protected Low-Memory Limit register and hardware decoding of these registers.
Software must not modify this register when protected memory regions are enabled (PRS field Set in PMEN_REG).
31:(N+1) | RW | 0h | PLMB: Protected Low-Memory Base. This register specifies the base of the protected low-memory region in system memory.
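A minimal C sketch of the N-discovery procedure described above for PLMBASE_REG, using hypothetical 32-bit MMIO helpers; this illustrates the procedure, it is not a required implementation:

    #include <stdint.h>

    /* Hypothetical 32-bit MMIO helpers for illustration only. */
    static inline void     mmio_write32(volatile void *a, uint32_t v) { *(volatile uint32_t *)a = v; }
    static inline uint32_t mmio_read32(const volatile void *a)        { return *(const volatile uint32_t *)a; }

    /* Write all 1s, read back, and locate the most significant bit position
     * that reads back as 0. Bits N:0 are the reserved bits that hardware
     * decodes as all 0s. Must be done before protected memory regions are
     * enabled. */
    static int plmbase_discover_n(volatile void *plmbase_reg)
    {
        mmio_write32(plmbase_reg, 0xFFFFFFFFu);
        uint32_t v = mmio_read32(plmbase_reg);

        for (int bit = 31; bit >= 0; bit--)
            if (!(v & (1u << bit)))
                return bit;        /* N: highest bit position that stays 0 */

        return -1;                 /* no reserved bits found (not expected) */
    }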
[Register bit-layout figure: PLML, Rsvd]
Abbreviation: PLMLIMIT_REG
General Description: Register to set up the limit address of the DMA-protected low-memory region below 4GB. This register must be set up before enabling protected memory through PMEN_REG, and must not be updated when protected memory regions are enabled.
The alignment of the protected low-memory region limit depends on the number of reserved bits (N:0) of this register. Software may determine N by writing all 1s to this register and finding the most significant bit position with 0 in the value read back from the register. Bits N:0 of the limit register are decoded by hardware as all 1s.
Software must not modify this register when protected memory regions are enabled (PRS field Set in PMEN_REG).
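As an arithmetic illustration (not part of the specification), the effective protected low-memory range follows from how hardware decodes the reserved bits: bits N:0 of the base read as 0s and bits N:0 of the limit read as 1s, so the region always spans whole 2^(N+1)-byte units:

    #include <stdint.h>

    /* Given the values programmed into PLMBASE_REG and PLMLIMIT_REG and the
     * reserved-bit count N discovered as shown earlier, compute the inclusive
     * address range that hardware protects. */
    static void protected_low_region(uint32_t plmbase, uint32_t plmlimit,
                                     unsigned n,
                                     uint32_t *start, uint32_t *end)
    {
        uint32_t low_mask = (n >= 31) ? 0xFFFFFFFFu : ((1u << (n + 1)) - 1u);

        *start = plmbase  & ~low_mask;   /* base bits N:0 decode as 0s */
        *end   = plmlimit |  low_mask;   /* limit bits N:0 decode as 1s */
    }

For example (hypothetical N = 19, i.e., 1MB granularity), a programmed base of 0010_0000h and limit of 002F_FFFFh protect addresses 0010_0000h through 002F_FFFFh inclusive.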
[Register bit-layout figure: PHMB, Rsvd]
Abbreviation: PHMBASE_REG
General Description: Register to set up the base address of the DMA-protected high-memory region. This register must be set up before enabling protected memory through PMEN_REG, and must not be updated when protected memory regions are enabled.
The alignment of the protected high-memory region base depends on the number of reserved bits (N:0) of this register. Software may determine N by writing all 1s to this register and finding the most significant bit position with 0, below the host address width (HAW), in the value read back from the register. Bits N:0 of this register are decoded by hardware as all 0s.
Software may set up the protected high-memory region either above or below 4GB. Section 10.4.20 describes the Protected High-Memory Limit register and hardware decoding of these registers.
Software must not modify this register when protected memory regions are enabled (PRS field Set in PMEN_REG).
[Register bit-layout figure: PHML, Rsvd]
Abbreviation: PHMLIMIT_REG
General Description: Register to set up the limit address of the DMA-protected high-memory region. This register must be set up before enabling protected memory through PMEN_REG, and must not be updated when protected memory regions are enabled.
The alignment of the protected high-memory region limit depends on the number of reserved bits (N:0) of this register. Software may determine N by writing all 1s to this register and finding the most significant bit position with 0, below the host address width (HAW), in the value read back from the register. Bits N:0 of the limit register are decoded by hardware as all 1s.
Software must not modify this register when protected memory regions are enabled (PRS field Set in PMEN_REG).
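The discovery procedure is the same as for the low-memory registers, except that only bit positions below the host address width (HAW) are considered, since bits at and above HAW may also read back as 0. A minimal sketch, assuming hypothetical 64-bit MMIO helpers:

    #include <stdint.h>

    /* Hypothetical 64-bit MMIO helpers for illustration only. */
    static inline void     mmio_write64(volatile void *a, uint64_t v) { *(volatile uint64_t *)a = v; }
    static inline uint64_t mmio_read64(const volatile void *a)        { return *(const volatile uint64_t *)a; }

    /* N discovery for PHMBASE_REG/PHMLIMIT_REG: write all 1s and find the
     * most significant zero bit below the host address width (HAW). */
    static int phm_discover_n(volatile void *reg, unsigned haw)
    {
        mmio_write64(reg, ~0ull);
        uint64_t v = mmio_read64(reg);

        for (int bit = (int)haw - 1; bit >= 0; bit--)
            if (!(v & (1ull << bit)))
                return bit;

        return -1;
    }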
[Register bit-layout figure: Rsvd, QH, Rsvd]
Abbreviation: IQH_REG
General Description: Register indicating the invalidation queue head. This register is treated as RsvdZ by implementations reporting Queued Invalidation (QI) as not supported in the Extended Capability register.
[Register bit-layout figure: Rsvd, QT, Rsvd]
Abbreviation: IQT_REG
General Description: Register indicating the invalidation queue tail. This register is treated as RsvdZ by implementations reporting Queued Invalidation (QI) as not supported in the Extended Capability register.
[Register bit-layout figure: IQA, Rsvd, QS]
Abbreviation: IQA_REG
General Description: Register to configure the base address and size of the invalidation queue. This register is treated as RsvdZ by implementations reporting Queued Invalidation (QI) as not supported in the Extended Capability register.
[Register bit-layout figure: Rsvd, IWC]
Abbreviation: ICS_REG
General Description: Register to report completion status of invalidation wait descriptor with Interrupt Flag (IF) Set. This register is treated as RsvdZ by implementations reporting Queued Invalidation (QI) as not supported in the Extended Capability register.
[Register bit-layout figure: IM, IP, Rsvd]
Abbreviation: IECTL_REG
General Description: Register specifying the invalidation event interrupt control bits. This register is treated as RsvdZ by implementations reporting Queued Invalidation (QI) as not supported in the Extended Capability register.
[Register bit-layout figure: EIMD, IMD]
Abbreviation: IEDATA_REG
General Description: Register specifying the Invalidation Event interrupt message data. This register is treated as RsvdZ by implementations reporting Queued Invalidation (QI) as not supported in the Extended Capability register.
[Register bit-layout figure: MA, Rsvd]
Abbreviation: IEADDR_REG
General Description: Register specifying the Invalidation Event interrupt message address. This register is treated as RsvdZ by implementations reporting Queued Invalidation (QI) as not supported in the Extended Capability register.
[Register bit-layout figure: MUA]
Abbreviation: IEUADDR_REG
General Description: Register specifying the Invalidation Event interrupt message upper address.
Abbreviation: IRTA_REG
General Description: Register providing the base address of the Interrupt remapping table. This register is treated as RsvdZ by implementations reporting Interrupt Remapping (IR) as not supported in the Extended Capability register.
63:12 | RW | 0h | IRTA: Interrupt Remapping Table Address. Hardware may ignore and not implement bits 63:HAW, where HAW is the host address width.
11 Programming Considerations
On a root table pointer set operation, software must globally invalidate the context-cache, and after
its completion globally invalidate the IOTLB to ensure hardware references only the new structures for
further remapping.
If software sets the root table pointer while DMA-remapping hardware is active, software must ensure that the structures referenced by the new root table pointer provide the same remapping results as the structures referenced by the previous root table pointer, so that any valid in-flight DMA requests are properly remapped. This is required since hardware may utilize either the old structures or the new structures to remap in-flight DMA requests until the context-cache and IOTLB invalidations are completed.
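The sequencing implied by the two paragraphs above can be summarized in a minimal C sketch. The helper routines are hypothetical stand-ins for the root-table-pointer set operation and for the global context-cache and IOTLB invalidations defined by the register interfaces; only the ordering is the point here:

    #include <stdint.h>

    /* Hypothetical helpers; each is assumed to wait for its operation to complete. */
    extern void set_root_table_pointer(uint64_t root_table_pa);  /* program and latch the root table address */
    extern void global_invalidate_context_cache(void);
    extern void global_invalidate_iotlb(void);

    static void update_root_table(uint64_t new_root_table_pa)
    {
        /* 1. Make the new root table visible to hardware. */
        set_root_table_pointer(new_root_table_pa);

        /* 2. Globally invalidate the context-cache, then (after it completes)
         *    globally invalidate the IOTLB, so hardware references only the
         *    new structures for further remapping. Until both invalidations
         *    complete, hardware may use either the old or the new structures
         *    for in-flight DMA requests. */
        global_invalidate_context_cache();
        global_invalidate_iotlb();
    }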
For implementations reporting Caching Mode (CM) as Set in the Capability register (a sketch of the resulting invalidation ordering follows this list):
• Software is required to invalidate the context-cache for any/all modifications to root or context
entries for them to be effective.
• Software is required to invalidate the IOTLB (after the context-cache invalidation completion) for
any modifications to present root or context entries for them to be effective.
• Software is required to invalidate the IOTLB for any/all modifications to any active page
directory/table entries for them to be effective.
• Software must not use a domain-id value of zero, as it is reserved on implementations reporting Caching Mode (CM) as Set.
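A minimal sketch of the invalidation ordering these Caching Mode requirements imply, assuming hypothetical helpers that stand in for the context-cache and IOTLB invalidation interfaces (granularity selection and completion polling are omitted):

    #include <stdint.h>

    /* Hypothetical helpers standing in for the Context Command and IOTLB
     * invalidation interfaces described earlier in this chapter. */
    extern void invalidate_context_cache_domain(uint16_t domain_id);
    extern void invalidate_iotlb_domain(uint16_t domain_id);

    /* On a CM = 1 implementation: after modifying root/context entries,
     * invalidate the context-cache and then, after that completes, the IOTLB.
     * After modifying active page directory/table entries, an IOTLB
     * invalidation alone suffices. Domain-id 0 is not used, since it is
     * reserved when CM is Set. */
    static void after_context_entry_update(uint16_t domain_id)
    {
        invalidate_context_cache_domain(domain_id);
        invalidate_iotlb_domain(domain_id);
    }

    static void after_page_table_update(uint16_t domain_id)
    {
        invalidate_iotlb_domain(domain_id);
    }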
For each remapping hardware unit, software must serialize commands submitted through the Global
Command register, Context Command register, IOTLB registers and Protected Memory Enable
register.
For platforms supporting more than one remapping hardware unit, there are no hardware serialization
requirements for operations across remapping hardware units.
Software must always follow the interrupt-remapping table pointer set operation with a global
invalidate of the IEC to ensure hardware references the new structures before enabling interrupt
remapping.
If software updates the interrupt-remapping table pointer while interrupt-remapping hardware is active, software must ensure that the structures referenced by the new interrupt-remapping table pointer provide the same remapping results as the structures referenced by the previous interrupt-remapping table pointer, so that any valid in-flight interrupt requests are properly remapped. This is required since hardware may utilize either the old structures or the new structures to remap in-flight interrupt requests until the IEC invalidation is completed.
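A minimal sketch of this ordering, assuming hypothetical helpers for programming the interrupt-remapping table address, performing the global Interrupt Entry Cache (IEC) invalidation, and enabling interrupt remapping:

    #include <stdint.h>

    /* Hypothetical helpers; each is assumed to wait for its operation to complete. */
    extern void set_interrupt_remapping_table(uint64_t irt_pa);
    extern void global_invalidate_iec(void);
    extern void enable_interrupt_remapping(void);

    /* The interrupt-remapping table pointer set operation is always followed
     * by a global IEC invalidation before interrupt remapping is enabled, so
     * hardware references only the new table. */
    static void install_interrupt_remapping_table(uint64_t irt_pa)
    {
        set_interrupt_remapping_table(irt_pa);
        global_invalidate_iec();
        enable_interrupt_remapping();
    }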
The following table describes the meaning of the codes assigned to various faults.
Fault Reason    Description
0h              Reserved. Used by software when initializing fault records (for advanced fault logging).
1h              The Present (P) field in the root-entry used to process the DMA request is Clear.
2h              The Present (P) field in the context-entry used to process the DMA request is Clear.
4h              The DMA request attempted to access an address beyond (2^X - 1), where X is the minimum of the maximum guest address width (MGAW) reported through the Capability register and the value in the Address-Width (AW) field of the context-entry used to process the DMA request.
5h              The Write (W) field in a page-table entry used for address translation of a write request or AtomicOp request is Clear.
6h              The Read (R) field in a page-table entry used for address translation of a read request or AtomicOp request is Clear. For implementations reporting the ZLR field as Set in the Capability register, this fault condition is not applicable to zero-length read requests to write-only pages.
7h              Hardware attempt to access the next-level page table through the Address (ADDR) field of the page-table entry resulted in an error.
8h              Hardware attempt to access the root-entry table through the Root-Table Address (RTA) field in the Root-entry Table Address register resulted in an error.
Ah              Hardware detected reserved field(s) that are not initialized to zero in a root-entry with the Present (P) field Set.
Bh              Hardware detected reserved field(s) that are not initialized to zero in a context-entry with the Present (P) field Set.
Ch              Hardware detected reserved field(s) that are not initialized to zero in a page-table entry with at least one of the Read (R) and Write (W) fields Set.
Dh              Translation request or translated DMA request explicitly blocked due to the programming of the Translation Type (T) field in the corresponding present context-entry.
Eh - 1Fh        Reserved.
20h             Decoding of the interrupt request per the Remappable request format detected one or more reserved fields as Set.
21h             The interrupt_index value computed for the interrupt request is greater than the maximum allowed for the interrupt-remapping table size configured by software.
22h             The Present (P) field in the IRTE entry corresponding to the interrupt_index of the interrupt request is Clear.
23h             Hardware attempt to access the interrupt-remapping table through the Interrupt Remapping Table Address (IRTA) field in the IRTA_REG register resulted in an error.
24h             Hardware detected one or more reserved field(s) that are not initialized to zero in an IRTE with the Present (P) field Set. This includes the case where software programmed conditional reserved fields incorrectly.
25h             On Intel® 64 platforms, hardware blocked an interrupt request in Compatibility format either due to Extended Interrupt Mode being enabled (EIME field Set in the Interrupt Remapping Table Address register) or due to Compatibility format interrupts being disabled (CFIS field Clear in the Global Status register). On Itanium™ platforms, hardware blocked an interrupt request in Compatibility format.