Intel® 64 and IA-32 Architectures Software Developer's Manual Documentation Changes
Documentation Changes
January 2013
Notice: The Intel 64 and IA-32 architectures may contain design defects or errors known as errata that may cause the product to deviate from published specifications. Current characterized errata are documented in the specification updates.
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. Intel, the Intel logo, Pentium, Xeon, Intel NetBurst, Intel Core, Intel Core Solo, Intel Core Duo, Intel Core 2 Duo, Intel Core 2 Extreme, Intel Pentium D, Itanium, Intel SpeedStep, MMX, Intel Atom, and VTune are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copyright 1997-2013 Intel Corporation. All rights reserved.
Contents
Revision History
Preface
Summary Tables of Changes
Documentation Changes
Revision History
Revision -001 -002
Initial release Added 1-10 Documentation Changes. Removed old Documentation Changes items that already have been incorporated in the published Software Developers manual Added 9 -17 Documentation Changes. Removed Documentation Change #6 - References to bits Gen and Len Deleted. Removed Documentation Change #4 - VIF Information Added to CLI Discussion Removed Documentation changes 1-17.
Description
-003
February 2003
-004 -005 -006 -007 -008 -009 -010 -011 -012 -013 -014 -015 -016 -017 -018 -019 -020 -021 -022 -023
June 2003 September 2003 November 2003 January 2004 March 2004 May 2004 August 2004 November 2004 March 2005 July 2005 September 2005 March 9, 2006 March 27, 2006 September 2006 October 2006 March 2007 May 2007 November 2007 August 2008 March 2009
Revision | Description | Date
-024 | Removed Documentation Changes 1-21. Added Documentation Changes 1-16. | June 2009
-025 | Removed Documentation Changes 1-16. Added Documentation Changes 1-18. | September 2009
-026 | Removed Documentation Changes 1-18. Added Documentation Changes 1-15. | December 2009
-027 | Removed Documentation Changes 1-15. Added Documentation Changes 1-24. | March 2010
-028 | Removed Documentation Changes 1-24. Added Documentation Changes 1-29. | June 2010
-029 | Removed Documentation Changes 1-29. Added Documentation Changes 1-29. | September 2010
-030 | Removed Documentation Changes 1-29. Added Documentation Changes 1-29. | January 2011
-031 | Removed Documentation Changes 1-29. Added Documentation Changes 1-29. | April 2011
-032 | Removed Documentation Changes 1-29. Added Documentation Changes 1-14. | May 2011
-033 | Removed Documentation Changes 1-14. Added Documentation Changes 1-38. | October 2011
-034 | Removed Documentation Changes 1-38. Added Documentation Changes 1-16. | December 2011
-035 | Removed Documentation Changes 1-16. Added Documentation Changes 1-18. | March 2012
-036 | Removed Documentation Changes 1-18. Added Documentation Changes 1-17. | May 2012
-037 | Removed Documentation Changes 1-17. Added Documentation Changes 1-28. | August 2012
-038 | Removed Documentation Changes 1-28. Added Documentation Changes 1-22. | January 2013
Preface
This document is an update to the specifications contained in the Affected Documents table below. This document is a compilation of device and documentation errata, specification clarifications and changes. It is intended for hardware system manufacturers and software developers of applications, operating systems, or tools.
Affected Documents
Document Title | Document Number/Location
Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 1: Basic Architecture | 253665
Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2A: Instruction Set Reference, A-M | 253666
Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2B: Instruction Set Reference, N-Z | 253667
Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2C: Instruction Set Reference | 326018
Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1 | 253668
Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3B: System Programming Guide, Part 2 | 253669
Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3C: System Programming Guide, Part 3 | 326019
Nomenclature
Documentation Changes include typos, errors, or omissions from the current published specifications. These will be incorporated in any new release of the specification.
Documentation Changes
No. | DOCUMENTATION CHANGES
1 | Updates to Chapter 8, Volume 1
2 | Updates to Chapter 13, Volume 1
3 | Updates to Chapter 3, Volume 2A
4 | Updates to Chapter 4, Volume 2B
5 | Updates to Chapter 5, Volume 2C
6 | Updates to Chapter 6, Volume 3A
7 | Updates to Chapter 9, Volume 3A
8 | Updates to Chapter 14, Volume 3B
9 | Updates to Chapter 16, Volume 3B
10 | Updates to Chapter 17, Volume 3B
11 | Updates to Chapter 18, Volume 3B
12 | Updates to Chapter 19, Volume 3B
13 | Updates to Chapter 24, Volume 3C
14 | Updates to Chapter 25, Volume 3C
15 | Updates to Chapter 26, Volume 3C
16 | Updates to Chapter 27, Volume 3C
17 | Updates to Chapter 28, Volume 3C
18 | Updates to Chapter 30, Volume 3C
19 | Updates to Chapter 31, Volume 3C
20 | Updates to Chapter 34, Volume 3C
21 | Updates to Chapter 35, Volume 3C
22 | Updates to Appendix A, Volume 3C
23 | Updates to Appendix B, Volume 3C
Documentation Changes
1. Updates to Chapter 8, Volume 1
Change bars show changes to Chapter 8 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 1: Basic Architecture.
------------------------------------------------------------------------------------------
...
8.1.8
The x87 FPU stores pointers to the instruction and data (operand) for the last non-control instruction executed. These are the x87 FPU instruction pointer and x87 FPU data (operand) pointers; software can save these pointers to provide state information for exception handlers. The pointers are illustrated in Figure 8-1 (the figure illustrates the pointers as used outside 64-bit mode; see below).

Note that the value in the x87 FPU data pointer register is always a pointer to a memory operand. If the last non-control instruction that was executed did not have a memory operand, the value in the data pointer register is undefined (reserved).

The contents of the x87 FPU instruction and data pointer registers remain unchanged when any of the control instructions (FCLEX/FNCLEX, FLDCW, FSTCW/FNSTCW, FSTSW/FNSTSW, FSTENV/FNSTENV, FLDENV, and WAIT/FWAIT) are executed.

For all the x87 FPUs and NPXs except the 8087, the x87 FPU instruction pointer points to any prefixes that preceded the instruction. For the 8087, the x87 FPU instruction pointer points only to the actual opcode.

The x87 FPU instruction and data pointers each consist of an offset and a segment selector. On processors that support IA-32e mode, each offset comprises 64 bits; on other processors, each offset comprises 32 bits. Each segment selector comprises 16 bits.

The pointers are accessed by the FINIT/FNINIT, FLDENV, FRSTOR, FSAVE/FNSAVE, FSTENV/FNSTENV, FXRSTOR, FXSAVE, XRSTOR, XSAVE, and XSAVEOPT instructions as follows:

• FINIT/FNINIT. Each instruction clears each 64-bit offset and 16-bit segment selector.
• FLDENV, FRSTOR. These instructions use the memory formats given in Figures 8-9 through 8-12:
  - For each 64-bit offset, each instruction loads the lower 32 bits from memory and clears the upper 32 bits.
  - If CR0.PE = 1, each instruction loads each 16-bit segment selector from memory; otherwise, it clears each 16-bit segment selector.
• FSAVE/FNSAVE, FSTENV/FNSTENV. These instructions use the memory formats given in Figures 8-9 through 8-12.
  - Each instruction saves the lower 32 bits of each 64-bit offset into memory. The upper 32 bits are not saved.
  - If CR0.PE = 1, each instruction saves each 16-bit segment selector into memory. If CPUID.(EAX=07H,ECX=0H):EBX[bit 13] = 1, the processor deprecates the segment selectors of the x87 FPU instruction and data pointers; it saves each segment selector as 0000H.
  - After saving these data into memory, FSAVE/FNSAVE clears each 64-bit offset and 16-bit segment selector.
• FXRSTOR, XRSTOR. These instructions load data from a memory image whose format depends on operating mode and the REX prefix. The memory formats are given in Tables 3-53, 3-56, and 3-57 in Chapter 3, "Instruction Set Reference, A-L," of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2A.
  - Outside of 64-bit mode or if REX.W = 0, the instructions operate as follows: for each 64-bit offset, each instruction loads the lower 32 bits from memory and clears the upper 32 bits. Each instruction loads each 16-bit segment selector from memory.
  - In 64-bit mode with REX.W = 1, the instructions operate as follows: each instruction loads each 64-bit offset from memory. Each instruction clears each 16-bit segment selector.
• FXSAVE, XSAVE, and XSAVEOPT. These instructions store data into a memory image whose format depends on operating mode and the REX prefix. The memory formats are given in Tables 3-53, 3-56, and 3-57 in Chapter 3, "Instruction Set Reference, A-L," of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2A.
  - Outside of 64-bit mode or if REX.W = 0, the instructions operate as follows: each instruction saves the lower 32 bits of each 64-bit offset into memory. The upper 32 bits are not saved. Each instruction saves each 16-bit segment selector into memory. If CPUID.(EAX=07H,ECX=0H):EBX[bit 13] = 1, the processor deprecates the segment selectors of the x87 FPU instruction and data pointers; it saves each segment selector as 0000H.
  - In 64-bit mode with REX.W = 1, each instruction saves each 64-bit offset into memory. The 16-bit segment selectors are not saved.
...
8.1.10
The FSTENV/FNSTENV and FSAVE/FNSAVE instructions store x87 FPU state information in memory for use by exception handlers and other system and application software. The FSTENV/FNSTENV instruction saves the contents of the status, control, tag, x87 FPU instruction pointer, x87 FPU data pointer, and opcode registers. The FSAVE/FNSAVE instruction stores that information plus the contents of the x87 FPU data registers. Note that the FSAVE/FNSAVE instruction also initializes the x87 FPU to default values (just as the FINIT/FNINIT instruction does) after it has saved the original state of the x87 FPU.

The manner in which this information is stored in memory depends on the operating mode of the processor (protected mode or real-address mode) and on the operand-size attribute in effect (32-bit or 16-bit). See Figures 8-9 through 8-12. In virtual-8086 mode or SMM, the real-address mode formats shown in Figure 8-12 are used. See Chapter 34, "System Management Mode," of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3C, for information on using the x87 FPU while in SMM.

The FLDENV and FRSTOR instructions allow x87 FPU state information to be loaded from memory into the x87 FPU. Here, the FLDENV instruction loads only the status, control, tag, x87 FPU instruction pointer, x87 FPU data pointer, and opcode registers, and the FRSTOR instruction loads all the x87 FPU registers, including the x87 FPU stack registers.
...
8.3.6
FCOM/FCOMP/FCOMPP - Compare floating point and set x87 FPU condition code flags.
FUCOM/FUCOMP/FUCOMPP - Unordered compare floating point and set x87 FPU condition code flags.
FICOM/FICOMP - Compare integer and set x87 FPU condition code flags.
FCOMI/FCOMIP - Compare floating point and set EFLAGS status flags.
FUCOMI/FUCOMIP - Unordered compare floating point and set EFLAGS status flags.
FTST - Test (compare floating point with 0.0).
FXAM - Examine.
Comparison of floating-point values differs from comparison of integers because floating-point values have four (rather than three) mutually exclusive relationships: less than, equal, greater than, and unordered. The unordered relationship is true when at least one of the two values being compared is a NaN or in an unsupported format. This additional relationship is required because, by definition, NaNs are not numbers, so they cannot have less than, equal, or greater than relationships with other floating-point values.

The FCOM, FCOMP, and FCOMPP instructions compare the value in register ST(0) with a floating-point source operand and set the condition code flags (C0, C2, and C3) in the x87 FPU status word according to the results (see Table 8-6). If an unordered condition is detected (one or both of the values are NaNs or in an undefined format), a floating-point invalid-operation exception is generated. The pop versions of the instruction pop the x87 FPU register stack once or twice after the comparison operation is complete.

The FUCOM, FUCOMP, and FUCOMPP instructions operate the same as the FCOM, FCOMP, and FCOMPP instructions. The only difference is that with the FUCOM, FUCOMP, and FUCOMPP instructions, if an unordered condition is detected because one or both of the operands are QNaNs, the floating-point invalid-operation exception is not generated.
Table 8-6. Setting of x87 FPU Condition Code Flags for Floating-Point Number Comparisons
Condition | C3 | C2 | C0
ST(0) > Source Operand | 0 | 0 | 0
ST(0) < Source Operand | 0 | 0 | 1
ST(0) = Source Operand | 1 | 0 | 0
Unordered | 1 | 1 | 1
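As a rough illustration of Table 8-6, the following Python sketch decodes the condition code flags from an x87 FPU status word value, such as one stored by FSTSW/FNSTSW. It assumes the architectural bit positions C0 = bit 8, C2 = bit 10, and C3 = bit 14. The function name and the result strings are hypothetical, chosen only for this example.

```python
# Hypothetical helper for illustration; not part of the manual.
# Assumes x87 status word bit positions: C0 = bit 8, C2 = bit 10, C3 = bit 14.
_TABLE_8_6 = {
    # (C3, C2, C0): relationship reported by FCOM/FCOMP/FCOMPP
    (0, 0, 0): "ST(0) > source",
    (0, 0, 1): "ST(0) < source",
    (1, 0, 0): "ST(0) = source",
    (1, 1, 1): "unordered",
}


def decode_fcom_status(status_word: int) -> str:
    """Map the C3/C2/C0 bits of a status word to a Table 8-6 relationship."""
    c0 = (status_word >> 8) & 1
    c2 = (status_word >> 10) & 1
    c3 = (status_word >> 14) & 1
    try:
        return _TABLE_8_6[(c3, c2, c0)]
    except KeyError:
        # Combinations outside the table are not produced by a comparison.
        raise ValueError("flag combination not listed in Table 8-6") from None
```

A caller would typically pass the 16-bit value written by FNSTSW; all other status word bits are ignored by this sketch.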
The FICOM and FICOMP instructions also operate the same as the FCOM and FCOMP instructions, except that the source operand is an integer value in memory. The integer value is automatically converted into a double extended-precision floating-point value prior to making the comparison. The FICOMP instruction pops the x87 FPU register stack following the comparison operation.

The FTST instruction performs the same operation as the FCOM instruction, except that the value in register ST(0) is always compared with the value 0.0.

The FCOMI and FCOMIP instructions were introduced into the IA-32 architecture in the P6 family processors. They perform the same comparison as the FCOM and FCOMP instructions, except that they set the status flags (ZF, PF, and CF) in the EFLAGS register to indicate the results of the comparison (see Table 8-7) instead of the x87 FPU condition code flags. The FCOMI and FCOMIP instructions allow conditional branch instructions (Jcc) to be executed directly from the results of their comparison.
Table 8-7. Setting of EFLAGS Status Flags for Floating-Point Number Comparisons
Comparison Results | ZF | PF | CF
ST0 > ST(i) | 0 | 0 | 0
ST0 < ST(i) | 0 | 0 | 1
ST0 = ST(i) | 1 | 0 | 0
Unordered | 1 | 1 | 1
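The following Python sketch shows one way software might interpret the flags of Table 8-7. It assumes the EFLAGS bit positions CF = bit 0, PF = bit 2, and ZF = bit 6, and it assumes the input was produced by one of the FCOMI-family instructions. The checks are ordered the way a sequence of conditional branches would usually be ordered: the unordered case is tested first (via PF) because it also sets ZF and CF. The function name is hypothetical.

```python
# Hypothetical helper for illustration; not part of the manual.
# Assumes EFLAGS bit positions: CF = bit 0, PF = bit 2, ZF = bit 6.
CF_MASK = 1 << 0
PF_MASK = 1 << 2
ZF_MASK = 1 << 6


def classify_fcomi(eflags: int) -> str:
    """Interpret EFLAGS after FCOMI/FCOMIP/FUCOMI/FUCOMIP per Table 8-7."""
    if eflags & PF_MASK:
        # Unordered sets ZF, PF, and CF together, so test PF before the others.
        return "unordered"
    if eflags & ZF_MASK:
        return "ST0 = ST(i)"
    if eflags & CF_MASK:
        return "ST0 < ST(i)"
    return "ST0 > ST(i)"
```

Testing PF first mirrors the common practice of branching on parity before the equal and below conditions, since an unordered result would otherwise satisfy both.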
Software can check if the FCOMI and FCOMIP instructions are supported by checking the processor's feature information with the CPUID instruction.

The FUCOMI and FUCOMIP instructions operate the same as the FCOMI and FCOMIP instructions, except that they do not generate a floating-point invalid-operation exception if the unordered condition is the result of one or both of the operands being a QNaN. The FCOMIP and FUCOMIP instructions pop the x87 FPU register stack following the comparison operation.

The FXAM instruction determines the classification of the floating-point value in the ST(0) register (that is, whether the value is zero, a denormal number, a normal finite number, ∞, a NaN, or an unsupported format) or that the register is empty. It sets the x87 FPU condition code flags to indicate the classification (see "FXAM - Examine" in Chapter 3, "Instruction Set Reference, A-L," of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2A). It also sets the C1 flag to indicate the sign of the value.
---
13.8
VCVTPH2PS and VCVTPS2PH are two instructions supporting half-precision floating-point data type conversion to and from single-precision floating-point data types. Half-precision floating-point values are not used by the processor directly for arithmetic operations, but the conversion operations are subject to SIMD floating-point exceptions. Additionally, the conversion operations of VCVTPS2PH allow the programmer to specify rounding control using control fields in an immediate byte. The effects of the immediate byte are listed in Table 13-11. Rounding control can use Imm[2] to select an override RC field specified in Imm[1:0] or use the MXCSR setting.
Table 13-11. Immediate Byte Encoding for 16-bit Floating-Point Conversion Instructions
Bits | Field Name/value | Description | Comment
Imm[1:0] | RC=00B | Round to nearest even | If Imm[2] = 0
Imm[1:0] | RC=01B | Round down | If Imm[2] = 0
Imm[1:0] | RC=10B | Round up | If Imm[2] = 0
Imm[1:0] | RC=11B | Truncate | If Imm[2] = 0
Imm[2] | MS1=0 | Use imm[1:0] for rounding | Ignore MXCSR.RC
Imm[2] | MS1=1 | Use MXCSR.RC for rounding |
Imm[7:3] | Ignored | Ignored by processor |
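The selection logic of Table 13-11 can be sketched in Python as follows. This is only an illustration: the function name is hypothetical, `mxcsr_rc` stands for the 2-bit MXCSR.RC field already extracted by the caller, and the sketch assumes MXCSR.RC uses the same 2-bit encoding as the RC values in the table.

```python
# Hypothetical helper for illustration; not part of the manual.
# Assumes MXCSR.RC uses the same 2-bit encoding as Imm[1:0] in Table 13-11.
_RC_NAMES = {
    0b00: "round to nearest even",
    0b01: "round down",
    0b10: "round up",
    0b11: "truncate",
}


def effective_rounding(imm8: int, mxcsr_rc: int) -> str:
    """Return the rounding mode VCVTPS2PH would use for this immediate byte."""
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("imm8 must fit in one byte")
    if not 0 <= mxcsr_rc <= 0b11:
        raise ValueError("mxcsr_rc must be a 2-bit value")
    ms1 = (imm8 >> 2) & 1          # Imm[2]: 0 = use Imm[1:0], 1 = use MXCSR.RC
    rc = mxcsr_rc if ms1 else (imm8 & 0b11)
    return _RC_NAMES[rc]           # Imm[7:3] never influences the result
```

For example, an immediate of 04H defers to whatever rounding mode MXCSR currently specifies, while 00H forces round to nearest even regardless of MXCSR.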
Specific SIMD floating-point exceptions that can occur in conversion operations are shown in Table 13-12 and Table 13-13.
1. The half precision output QNaN1 is created from the single precision input QNaN as follows: the sign bit is preserved, the 8-bit exponent FFH is replaced by the 5-bit exponent 1FH, and the 24-bit significand is truncated to an 11-bit significand by removing its 14 least significant bits.
2. The half precision output QNaN1 is created from the single precision input SNaN as follows: the sign bit is preserved, the 8-bit exponent FFH is replaced by the 5-bit exponent 1FH, and the 24-bit significand is truncated to an 11-bit significand by removing its 14 least significant bits. The second most significant bit of the significand is changed from 0 to 1 to convert the signaling NaN into a quiet NaN.
VCVTPS2PH can cause denormal exceptions if the value of the source operand is denormal relative to the numerical range represented by the source format (see Table 13-14).
VCVTPS2PH
#DE=1
1. Masked and unmasked result is shown in Table 13-12.
VCVTPS2PH can cause an underflow exception if the result of the conversion is less than the underflow threshold for the half-precision floating-point data type, i.e. $|x| < 1.0 \times 2^{-14}$.
NOTES:
1. Masked and unmasked result is shown in Table 13-12.
2. MXCSR.FTZ is ignored; the processor behaves as if MXCSR.FTZ = 0.
VCVTPS2PH can cause an overflow exception if the result of the conversion is greater than the maximum representable value for the half-precision floating-point data type, i.e. $|x| \geq 1.0 \times 2^{16}$.
NOTES:
VCVTPS2PH can cause an inexact exception if the result of the conversion is not exactly representable in the destination format.
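The two magnitude thresholds for the half-precision format (2^-14 for underflow and 2^16 for overflow) can be illustrated with a small Python sketch. It is a simplification under stated assumptions: it only compares a magnitude against the thresholds and does not model rounding, exception masking, or the inexact condition. Treating zero as "in range" is an assumption of this example, and the names are hypothetical.

```python
# Hypothetical helper for illustration; not part of the manual.
# Only the magnitude thresholds are modeled, not rounding or exception masking.
HALF_UNDERFLOW_THRESHOLD = 2.0 ** -14   # smallest normal half-precision magnitude
HALF_OVERFLOW_THRESHOLD = 2.0 ** 16     # first magnitude beyond the half-precision range


def classify_half_magnitude(x: float) -> str:
    """Compare |x| against the half-precision underflow/overflow thresholds."""
    magnitude = abs(x)
    if magnitude >= HALF_OVERFLOW_THRESHOLD:
        return "overflow"
    if 0.0 < magnitude < HALF_UNDERFLOW_THRESHOLD:
        # Assumption of this sketch: an exact zero is not counted as underflow.
        return "underflow"
    return "in range"
```

The largest finite half-precision value, 65504, falls just below the overflow threshold and is therefore reported as in range.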
...
AND - Logical AND
Opcode | Instruction | Op/En | 64-bit Mode | Compat/Leg Mode | Description
24 ib | AND AL, imm8 | I | Valid | Valid | AL AND imm8.
25 iw | AND AX, imm16 | I | Valid | Valid | AX AND imm16.
25 id | AND EAX, imm32 | I | Valid | Valid | EAX AND imm32.
REX.W + 25 id | AND RAX, imm32 | I | Valid | N.E. | RAX AND imm32 sign-extended to 64-bits.
80 /4 ib | AND r/m8, imm8 | MI | Valid | Valid | r/m8 AND imm8.
REX + 80 /4 ib | AND r/m8*, imm8 | MI | Valid | N.E. | r/m8 AND imm8.
81 /4 iw | AND r/m16, imm16 | MI | Valid | Valid | r/m16 AND imm16.
81 /4 id | AND r/m32, imm32 | MI | Valid | Valid | r/m32 AND imm32.
REX.W + 81 /4 id | AND r/m64, imm32 | MI | Valid | N.E. | r/m64 AND imm32 sign extended to 64-bits.
83 /4 ib | AND r/m16, imm8 | MI | Valid | Valid | r/m16 AND imm8 (sign-extended).
83 /4 ib | AND r/m32, imm8 | MI | Valid | Valid | r/m32 AND imm8 (sign-extended).
REX.W + 83 /4 ib | AND r/m64, imm8 | MI | Valid | N.E. | r/m64 AND imm8 (sign-extended).
20 /r | AND r/m8, r8 | MR | Valid | Valid | r/m8 AND r8.
REX + 20 /r | AND r/m8*, r8* | MR | Valid | N.E. | r/m64 AND r8 (sign-extended).
21 /r | AND r/m16, r16 | MR | Valid | Valid | r/m16 AND r16.
21 /r | AND r/m32, r32 | MR | Valid | Valid | r/m32 AND r32.
REX.W + 21 /r | AND r/m64, r64 | MR | Valid | N.E. | r/m64 AND r32.
22 /r | AND r8, r/m8 | RM | Valid | Valid | r8 AND r/m8.
REX + 22 /r | AND r8*, r/m8* | RM | Valid | N.E. | r/m64 AND r8 (sign-extended).
23 /r | AND r16, r/m16 | RM | Valid | Valid | r16 AND r/m16.
23 /r | AND r32, r/m32 | RM | Valid | Valid | r32 AND r/m32.
REX.W + 23 /r | AND r64, r/m64 | RM | Valid | N.E. | r64 AND r/m64.
NOTES:
* In 64-bit mode, r/m8 cannot be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.
Op/En | Operand 3 | Operand 4
MI | NA | NA
I | NA | NA
Description
Performs a bitwise AND operation on the destination (first) and source (second) operands and stores the result in the destination operand location. The source operand can be an immediate, a register, or a memory location; the destination operand can be a register or a memory location. (However, two memory operands cannot be used in one instruction.) Each bit of the result is set to 1 if both corresponding bits of the first and second operands are 1; otherwise, it is set to 0.

This instruction can be used with a LOCK prefix to allow it to be executed atomically.

In 64-bit mode, the instruction's default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.
Operation
DEST ← DEST AND SRC;
Flags Affected
The OF and CF flags are cleared; the SF, ZF, and PF flags are set according to the result. The state of the AF flag is undefined.
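A minimal Python sketch of the operation and its flag effects follows. The function name, the `width` parameter, and the dictionary representation of the flags are assumptions made for this example. PF is computed under the usual definition (set when the least-significant byte of the result contains an even number of 1 bits), and the undefined AF flag is represented as `None`.

```python
# Hypothetical model for illustration; not part of the manual.
def and_with_flags(dest: int, src: int, width: int = 32):
    """Return (result, flags) for a width-bit AND, following 'Flags Affected'."""
    mask = (1 << width) - 1
    result = dest & src & mask
    flags = {
        "OF": 0,                                   # always cleared
        "CF": 0,                                   # always cleared
        "SF": (result >> (width - 1)) & 1,         # most significant bit of result
        "ZF": int(result == 0),
        # PF: 1 when the low byte holds an even number of 1 bits.
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),
        "AF": None,                                # undefined after AND
    }
    return result, flags
```

Because OF and CF are always cleared, a conditional branch that tests only those flags after AND behaves the same for every input.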
Description
Searches the source operand (second operand) for the least significant set bit (1 bit). If a least significant 1 bit is found, its bit index is stored in the destination operand (first operand). The source operand can be a register or a memory location; the destination operand is a register. The bit index is an unsigned offset from bit 0 of the source operand. If the content of the source operand is 0, the content of the destination operand is undefined.

In 64-bit mode, the instruction's default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.
Operation
IF SRC = 0
    THEN
        ZF ← 1;
        DEST is undefined;
    ELSE
        ZF ← 0;
        temp ← 0;
        WHILE Bit(SRC, temp) = 0
        DO
            temp ← temp + 1;
        OD;
        DEST ← temp;
FI;
Flags Affected
The ZF flag is set to 1 if the source operand is 0; otherwise, the ZF flag is cleared. The CF, OF, SF, AF, and PF flags are undefined.
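The forward scan described above can be mirrored in Python as a short sketch. The function name and the `width` parameter are hypothetical, and the undefined destination for a zero source is modeled as `None`.

```python
# Hypothetical model for illustration; not part of the manual.
def bit_scan_forward(src: int, width: int = 32):
    """Return (ZF, DEST); DEST is None when the source is 0 (undefined)."""
    src &= (1 << width) - 1
    if src == 0:
        return 1, None                 # ZF = 1, destination undefined
    temp = 0
    while (src >> temp) & 1 == 0:      # walk upward from bit 0
        temp += 1
    return 0, temp                     # ZF = 0, index of lowest set bit
```

Callers that rely on the destination value should therefore check ZF (or test the source for zero) first.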
Description
Searches the source operand (second operand) for the most significant set bit (1 bit). If a most significant 1 bit is found, its bit index is stored in the destination operand (first operand). The source operand can be a register or a memory location; the destination operand is a register. The bit index is an unsigned offset from bit 0 of the source operand. If the content of the source operand is 0, the content of the destination operand is undefined.

In 64-bit mode, the instruction's default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.
Operation
IF SRC = 0
    THEN
        ZF ← 1;
        DEST is undefined;
    ELSE
        ZF ← 0;
        temp ← OperandSize - 1;
        WHILE Bit(SRC, temp) = 0
        DO
            temp ← temp - 1;
        OD;
        DEST ← temp;
FI;
Flags Affected
The ZF flag is set to 1 if the source operand is 0; otherwise, the ZF flag is cleared. The CF, OF, SF, AF, and PF flags are undefined.
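A corresponding Python sketch of the reverse scan is shown below. As before, the function name and `width` parameter are hypothetical, and `None` stands in for the undefined destination. For a nonzero source, the result coincides with Python's `int.bit_length() - 1`, which the usage note relies on.

```python
# Hypothetical model for illustration; not part of the manual.
def bit_scan_reverse(src: int, width: int = 32):
    """Return (ZF, DEST); DEST is None when the source is 0 (undefined)."""
    src &= (1 << width) - 1
    if src == 0:
        return 1, None                 # ZF = 1, destination undefined
    temp = width - 1                   # start at OperandSize - 1
    while (src >> temp) & 1 == 0:      # walk downward to the highest set bit
        temp -= 1
    return 0, temp
```

For any nonzero `src` that fits in `width` bits, `bit_scan_reverse(src)[1]` equals `src.bit_length() - 1`.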
#AC(0) | If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
#UD | If the LOCK prefix is used.
BT - Bit Test
Opcode | Instruction | Op/En | 64-bit Mode | Compat/Leg Mode | Description
0F A3 /r | BT r/m16, r16 | MR | Valid | Valid | Store selected bit in CF flag.
0F A3 /r | BT r/m32, r32 | MR | Valid | Valid | Store selected bit in CF flag.
REX.W + 0F A3 /r | BT r/m64, r64 | MR | Valid | N.E. | Store selected bit in CF flag.
0F BA /4 ib | BT r/m16, imm8 | MI | Valid | Valid | Store selected bit in CF flag.
0F BA /4 ib | BT r/m32, imm8 | MI | Valid | Valid | Store selected bit in CF flag.
REX.W + 0F BA /4 ib | BT r/m64, imm8 | MI | Valid | N.E. | Store selected bit in CF flag.
Description
Selects the bit in a bit string (specified with the first operand, called the bit base) at the bit-position designated by the bit offset (specified by the second operand) and stores the value of the bit in the CF flag. The bit base operand can be a register or a memory location; the bit offset operand can be a register or an immediate value: If the bit base operand specifies a register, the instruction takes the modulo 16, 32, or 64 of the bit offset operand (modulo size depends on the mode and register size; 64-bit operands are available only in 64-bit mode). If the bit base operand specifies a memory location, the operand represents the address of the byte in memory that contains the bit base (bit 0 of the specified byte) of the bit string. The range of the bit position that can be referenced by the offset operand depends on the operand size.
See also: Bit(BitBase, BitOffset) on page 3-10.

Some assemblers support immediate bit offsets larger than 31 by using the immediate bit offset field in combination with the displacement field of the memory operand. In this case, the low-order 3 or 5 bits (3 for 16-bit operands, 5 for 32-bit operands) of the immediate bit offset are stored in the immediate bit offset field, and the high-order bits are shifted and combined with the byte displacement in the addressing mode by the assembler. The processor will ignore the high-order bits if they are not zero.

When accessing a bit in memory, the processor may access 4 bytes starting from the memory address for a 32-bit operand size, using the following relationship:

Effective Address + (4 * (BitOffset DIV 32))

Or, it may access 2 bytes starting from the memory address for a 16-bit operand, using this relationship:

Effective Address + (2 * (BitOffset DIV 16))

It may do so even when only a single byte needs to be accessed to reach the given bit. When using this bit addressing mechanism, software should avoid referencing areas of memory close to address space holes. In particular, it should avoid references to memory-mapped I/O registers. Instead, software should use the MOV instructions to load from or store to these addresses, and use the register form of these instructions to manipulate the data.

In 64-bit mode, the instruction's default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64-bit operands. See the summary chart at the beginning of this section for encoding data and limits.
Operation
CF ← Bit(BitBase, BitOffset);
Flags Affected
The CF flag contains the value of the selected bit. The ZF flag is unaffected. The OF, SF, AF, and PF flags are undefined.
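The memory access window described for BT with a memory bit base, Effective Address + (4 * (BitOffset DIV 32)) for 32-bit operands and Effective Address + (2 * (BitOffset DIV 16)) for 16-bit operands, can be worked through with a short Python sketch. The function name is hypothetical, and the sketch is restricted to non-negative bit offsets and to the two operand sizes for which the text gives a formula.

```python
# Hypothetical helper for illustration; not part of the manual.
def bt_access_window(effective_address: int, bit_offset: int, operand_size: int):
    """Return (start_address, byte_count) of the memory the processor may access."""
    if operand_size not in (16, 32):
        raise ValueError("this sketch covers 16-bit and 32-bit operands only")
    if bit_offset < 0:
        raise ValueError("this sketch covers non-negative bit offsets only")
    unit_bytes = operand_size // 8                      # 2 or 4 bytes
    start = effective_address + unit_bytes * (bit_offset // operand_size)
    return start, unit_bytes
```

For instance, bit offset 37 from address 1000H with a 32-bit operand size yields a 4-byte access starting at 1004H, even though the addressed bit lives in a single byte. This is why the text advises keeping such accesses away from address space holes and memory-mapped I/O.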
Description
Selects the bit in a bit string (specified with the first operand, called the bit base) at the bit-position designated by the bit offset operand (second operand), stores the value of the bit in the CF flag, and complements the selected bit in the bit string. The bit base operand can be a register or a memory location; the bit offset operand can be a register or an immediate value: If the bit base operand specifies a register, the instruction takes the modulo 16, 32, or 64 of the bit offset operand (modulo size depends on the mode and register size; 64-bit operands are available only in 64-bit mode). This allows any bit position to be selected. If the bit base operand specifies a memory location, the operand represents the address of the byte in memory that contains the bit base (bit 0 of the specified byte) of the bit string. The range of the bit position that can be referenced by the offset operand depends on the operand size.
See also: Bit(BitBase, BitOffset) on page 3-10.

Some assemblers support immediate bit offsets larger than 31 by using the immediate bit offset field in combination with the displacement field of the memory operand. See "BT - Bit Test" in this chapter for more information on this addressing mechanism.

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.

In 64-bit mode, the instruction's default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.
Operation
CF ← Bit(BitBase, BitOffset);
Bit(BitBase, BitOffset) ← NOT Bit(BitBase, BitOffset);
Flags Affected
The CF flag contains the value of the selected bit before it is complemented. The ZF flag is unaffected. The OF, SF, AF, and PF flags are undefined.
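The register form of the operation, in which the bit offset is taken modulo the operand size, might be modeled in Python as follows. The function name and `width` parameter are hypothetical.

```python
# Hypothetical model for illustration; not part of the manual.
def btc_register(value: int, bit_offset: int, width: int = 32):
    """Return (CF, new_value) for a register-form bit test and complement."""
    if width not in (16, 32, 64):
        raise ValueError("width must be 16, 32, or 64")
    mask = (1 << width) - 1
    position = bit_offset % width          # offset is reduced modulo the width
    cf = (value >> position) & 1           # CF holds the bit before the change
    new_value = (value ^ (1 << position)) & mask
    return cf, new_value
```

Because of the modulo reduction, an offset of 35 with a 32-bit register selects bit 3.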
Description
Selects the bit in a bit string (specified with the first operand, called the bit base) at the bit-position designated by the bit offset operand (second operand), stores the value of the bit in the CF flag, and clears the selected bit in the bit string to 0. The bit base operand can be a register or a memory location; the bit offset operand can be a register or an immediate value: If the bit base operand specifies a register, the instruction takes the modulo 16, 32, or 64 of the bit offset operand (modulo size depends on the mode and register size; 64-bit operands are available only in 64-bit mode). This allows any bit position to be selected. If the bit base operand specifies a memory location, the operand represents the address of the byte in memory that contains the bit base (bit 0 of the specified byte) of the bit string. The range of the bit position that can be referenced by the offset operand depends on the operand size.
See also: Bit(BitBase, BitOffset) on page 3-10.

Some assemblers support immediate bit offsets larger than 31 by using the immediate bit offset field in combination with the displacement field of the memory operand. See "BT - Bit Test" in this chapter for more information on this addressing mechanism.

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.

In 64-bit mode, the instruction's default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.
Operation
CF ← Bit(BitBase, BitOffset);
Bit(BitBase, BitOffset) ← 0;
Flags Affected
The CF flag contains the value of the selected bit before it is cleared. The ZF flag is unaffected. The OF, SF, AF, and PF flags are undefined.
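For a memory bit base, the bit string is addressed relative to bit 0 of the byte named by the operand. The sketch below models that with a Python `bytearray`; the names `btr_memory`, `memory`, and `base` are hypothetical. It works at byte granularity, which selects the same bit as the operand-sized access on a little-endian machine, and it treats the offset as signed (an assumption that applies to register offsets).

```python
def btr_memory(memory, base, bit_offset):
    """Model BTR on a bit string in memory; clears the bit and returns CF."""
    byte_index = base + (bit_offset // 8)    # floor division also handles negative offsets
    bit_index = bit_offset % 8               # always 0..7 in Python
    if not 0 <= byte_index < len(memory):
        raise IndexError("bit offset falls outside the modeled memory")
    cf = (memory[byte_index] >> bit_index) & 1
    memory[byte_index] &= ~(1 << bit_index) & 0xFF
    return cf


memory = bytearray([0xFF, 0xFF, 0xFF, 0xFF])
print(btr_memory(memory, 1, 10), memory.hex())   # bit 10 from byte 1 is bit 2 of byte 2
print(btr_memory(memory, 1, -1), memory.hex())   # bit -1 from byte 1 is bit 7 of byte 0
```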
Description
Selects the bit in a bit string (specified with the first operand, called the bit base) at the bit-position designated by the bit offset operand (second operand), stores the value of the bit in the CF flag, and sets the selected bit in the bit string to 1. The bit base operand can be a register or a memory location; the bit offset operand can be a register or an immediate value: If the bit base operand specifies a register, the instruction takes the modulo 16, 32, or 64 of the bit offset operand (modulo size depends on the mode and register size; 64-bit operands are available only in 64-bit mode). This allows any bit position to be selected. If the bit base operand specifies a memory location, the operand represents the address of the byte in memory that contains the bit base (bit 0 of the specified byte) of the bit string. The range of the bit position that can be referenced by the offset operand depends on the operand size.
See also: Bit(BitBase, BitOffset) on page 3-10. Some assemblers support immediate bit offsets larger than 31 by using the immediate bit offset field in combination with the displacement field of the memory operand. See "BT: Bit Test" in this chapter for more information on this addressing mechanism. This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. In 64-bit mode, the instruction's default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.
Operation
CF ← Bit(BitBase, BitOffset);
Bit(BitBase, BitOffset) ← 1;
Flags Affected
The CF flag contains the value of the selected bit before it is set. The ZF flag is unaffected. The OF, SF, AF, and PF flags are undefined.
Compare packed double-precision floating-point values in xmm2/m128 and xmm1 using imm8 as comparison predicate. Compare packed double-precision floating-point values in xmm3/m128 and xmm2 using bits 4:0 of imm8 as a comparison predicate. Compare packed double-precision floating-point values in ymm3/m256 and ymm2 using bits 4:0 of imm8 as a comparison predicate.
Description
Performs a SIMD compare of the packed double-precision floating-point values in the source operand (second operand) and the destination operand (first operand) and returns the results of the comparison to the destination operand. The comparison predicate operand (third operand) specifies the type of comparison performed on each of the pairs of packed values. The result of each comparison is a quadword mask of all 1s (comparison true) or all 0s (comparison false). The sign of zero is ignored for comparisons, so that -0.0 is equal to +0.0. 128-bit Legacy SSE version: The first source and destination operand (first operand) is an XMM register. The second source operand (second operand) can be an XMM register or 128-bit memory location. The comparison predicate operand is an 8-bit immediate; bits 2:0 of the immediate define the type of comparison to be performed (see Table 3-7). Bits 7:3 of the immediate are reserved. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged. Two comparisons are performed with results written to bits 127:0 of the destination operand. ...
Compare packed single-precision floating-point values in xmm2/mem and xmm1 using imm8 as comparison predicate. Compare packed single-precision floating-point values in xmm3/m128 and xmm2 using bits 4:0 of imm8 as a comparison predicate. Compare packed single-precision floating-point values in ymm3/m256 and ymm2 using bits 4:0 of imm8 as a comparison predicate.
Description
Performs a SIMD compare of the packed single-precision floating-point values in the source operand (second operand) and the destination operand (first operand) and returns the results of the comparison to the destination operand. The comparison predicate operand (third operand) specifies the type of comparison performed on each of the pairs of packed values. The result of each comparison is a doubleword mask of all 1s (comparison true) or all 0s (comparison false). The sign of zero is ignored for comparisons, so that -0.0 is equal to +0.0. 128-bit Legacy SSE version: The first source and destination operand (first operand) is an XMM register. The second source operand (second operand) can be an XMM register or 128-bit memory location. The comparison predicate operand is an 8-bit immediate; bits 2:0 of the immediate define the type of comparison to be performed (see Table 3-7). Bits 7:3 of the immediate are reserved. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged. Four comparisons are performed with results written to bits 127:0 of the destination operand. The unordered relationship is true when at least one of the two source operands being compared is a NaN; the ordered relationship is true when neither source operand is a NaN. A subsequent computational instruction that uses the mask result in the destination operand as an input operand will not generate a fault, because a mask of all 0s corresponds to a floating-point value of +0.0 and a mask of all 1s corresponds to a QNaN. Note that processors with CPUID.1H:ECX.AVX = 0 do not implement the greater-than, greater-than-or-equal, not-greater-than, and not-greater-than-or-equal relation predicates. These comparisons can be made either by using the inverse relationship (that is, use the not-less-than-or-equal to make a greater-than comparison) or by using software emulation.
When using software emulation, the program must swap the operands (copying registers when necessary to protect the data that will now be in the destination operand), and then perform the compare using a different predicate. The predicate to be used for these emulations is listed in Table 3-7 under the heading Emulation. Compilers and assemblers may implement the following two-operand pseudo-ops in addition to the three-operand CMPPS instruction, for processors with CPUID.1H:ECX.AVX = 0. See Table 3-11. Compilers should treat reserved Imm8 values as illegal syntax.
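The two ways of obtaining a greater-than comparison described above can be compared in a small model. This is a sketch under stated assumptions: the eight predicate encodings for imm8 bits 2:0 (EQ, LT, LE, UNORD, NEQ, NLT, NLE, ORD) are taken to be those of Table 3-7, which is not reproduced in this excerpt, and the names `PREDICATES`, `cmpps`, and `greater_than_by_swap` are hypothetical. Python floats stand in for single-precision values, which does not affect ordering results.

```python
import math

# Assumed encodings of imm8 bits 2:0 (per Table 3-7, not shown in this excerpt).
PREDICATES = {
    0: lambda a, b: a == b,                                # EQ
    1: lambda a, b: a < b,                                 # LT
    2: lambda a, b: a <= b,                                # LE
    3: lambda a, b: math.isnan(a) or math.isnan(b),        # UNORD
    4: lambda a, b: not (a == b),                          # NEQ
    5: lambda a, b: not (a < b),                           # NLT
    6: lambda a, b: not (a <= b),                          # NLE
    7: lambda a, b: not (math.isnan(a) or math.isnan(b)),  # ORD
}


def cmpps(dest, src, imm8):
    """Return one doubleword mask (all 1s or all 0s) per element pair."""
    predicate = PREDICATES[imm8 & 0b111]
    return [0xFFFFFFFF if predicate(a, b) else 0 for a, b in zip(dest, src)]


def greater_than_by_swap(a, b):
    """Emulate a > b by swapping the operands and using LT."""
    return cmpps(b, a, 1)


a = [3.0, math.nan, 1.0, 2.0]
b = [1.0, 1.0, 2.0, 2.0]
print(greater_than_by_swap(a, b))   # operand swap: NaN lane is false
print(cmpps(a, b, 6))               # NLE: NaN lane is true
```

The two approaches agree on ordered inputs and differ only in lanes that contain a NaN, which is worth keeping in mind when choosing between them.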
In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15). ...
Compare low double-precision floating-point value in xmm2/m64 and xmm1 using imm8 as comparison predicate. Compare low double-precision floating-point value in xmm3/m64 and xmm2 using bits 4:0 of imm8 as comparison predicate.
Description
Compares the low double-precision floating-point values in the source operand (second operand) and the destination operand (first operand) and returns the results of the comparison to the destination operand. The comparison predicate operand (third operand) specifies the type of comparison performed. The comparison result is a quadword mask of all 1s (comparison true) or all 0s (comparison false). The sign of zero is ignored for comparisons, so that -0.0 is equal to +0.0. 128-bit Legacy SSE version: The first source and destination operand (first operand) is an XMM register. The second source operand (second operand) can be an XMM register or 64-bit memory location. The comparison predicate operand is an 8-bit immediate; bits 2:0 of the immediate define the type of comparison to be performed (see Table 3-7). Bits 7:3 of the immediate are reserved. Bits (VLMAX-1:64) of the corresponding YMM destination register remain unchanged. The unordered relationship is true when at least one of the two source operands being compared is a NaN; the ordered relationship is true when neither source operand is a NaN. A subsequent computational instruction that uses the mask result in the destination operand as an input operand will not generate a fault, because a mask of all 0s corresponds to a floating-point value of +0.0 and a mask of all 1s corresponds to a QNaN. Note that processors with CPUID.1H:ECX.AVX = 0 do not implement the greater-than, greater-than-or-equal, not-greater-than, and not-greater-than-or-equal relation predicates. These comparisons can be made either by using the inverse relationship (that is, use the not-less-than-or-equal to make a greater-than comparison) or by using software emulation. When using software emulation, the program must swap the operands (copying registers when necessary to protect the data that will now be in the destination operand), and then perform the compare using a different predicate.
The predicate to be used for these emulations is listed in Table 3-7 under the heading Emulation. Compilers and assemblers may implement the following two-operand pseudo-ops in addition to the three-operand CMPSD instruction, for processors with CPUID.1H:ECX.AVX = 0. See Table 3-13. Compilers should treat reserved Imm8 values as illegal syntax. ...
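The statement that an all-0s mask reads back as +0.0 and an all-1s mask reads back as a QNaN can be checked by reinterpreting the quadword bit patterns as IEEE 754 doubles. The helper name `bits_to_double` is hypothetical; the sketch uses only the standard `struct` and `math` modules.

```python
import math
import struct


def bits_to_double(bits):
    """Reinterpret a 64-bit integer pattern as a double-precision value."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


ALL_ONES = 0xFFFFFFFFFFFFFFFF
false_mask = bits_to_double(0)
true_mask = bits_to_double(ALL_ONES)

# All 0s: positive zero (sign bit clear).
print(false_mask == 0.0, math.copysign(1.0, false_mask) == 1.0)
# All 1s: a NaN whose most significant fraction bit (bit 51) is set, that is, a quiet NaN.
print(math.isnan(true_mask), (ALL_ONES >> 51) & 1 == 1)
```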
Compare low single-precision floating-point value in xmm2/m32 and xmm1 using imm8 as comparison predicate. Compare low single-precision floating-point value in xmm3/m32 and xmm2 using bits 4:0 of imm8 as comparison predicate.
Description
Compares the low single-precision floating-point values in the source operand (second operand) and the destination operand (first operand) and returns the results of the comparison to the destination operand. The comparison predicate operand (third operand) specifies the type of comparison performed. The comparison result is a doubleword mask of all 1s (comparison true) or all 0s (comparison false). The sign of zero is ignored for comparisons, so that -0.0 is equal to +0.0. 128-bit Legacy SSE version: The first source and destination operand (first operand) is an XMM register. The second source operand (second operand) can be an XMM register or 32-bit memory location. The comparison predicate operand is an 8-bit immediate; bits 2:0 of the immediate define the type of comparison to be performed (see Table 3-7). Bits 7:3 of the immediate are reserved. Bits (VLMAX-1:32) of the corresponding YMM destination register remain unchanged. The unordered relationship is true when at least one of the two source operands being compared is a NaN; the ordered relationship is true when neither source operand is a NaN. A subsequent computational instruction that uses the mask result in the destination operand as an input operand will not generate a fault, because a mask of all 0s corresponds to a floating-point value of +0.0 and a mask of all 1s corresponds to a QNaN. Note that processors with CPUID.1H:ECX.AVX = 0 do not implement the greater-than, greater-than-or-equal, not-greater-than, and not-greater-than-or-equal relation predicates. These comparisons can be made either by using the inverse relationship (that is, use the not-less-than-or-equal to make a greater-than comparison) or by using software emulation. When using software emulation, the program must swap the operands (copying registers when necessary to protect the data that will now be in the destination operand), and then perform the compare using a different predicate.
The predicate to be used for these emulations is listed in Table 3-7 under the heading Emulation. Compilers and assemblers may implement the following two-operand pseudo-ops in addition to the three-operand CMPSS instruction, for processors with CPUID.1H:ECX.AVX = 0. See Table 3-15. Compilers should treat reserved Imm8 values as illegal syntax. ...
Compare low double-precision floating-point values in xmm1 and xmm2/mem64 and set the EFLAGS flags accordingly. Compare low double-precision floating-point values in xmm1 and xmm2/mem64 and set the EFLAGS flags accordingly.
Description
Compares the double-precision floating-point values in the low quadwords of operand 1 (first operand) and operand 2 (second operand), and sets the ZF, PF, and CF flags in the EFLAGS register according to the result (unordered, greater than, less than, or equal). The OF, SF, and AF flags in the EFLAGS register are set to 0. The unordered result is returned if either source operand is a NaN (QNaN or SNaN). The sign of zero is ignored for comparisons, so that -0.0 is equal to +0.0. Operand 1 is an XMM register; operand 2 can be an XMM register or a 64-bit memory location. The COMISD instruction differs from the UCOMISD instruction in that it signals a SIMD floating-point invalid operation exception (#I) when a source operand is either a QNaN or SNaN. The UCOMISD instruction signals an invalid numeric exception only if a source operand is an SNaN. The EFLAGS register is not updated if an unmasked SIMD floating-point exception is generated. In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15). Note: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b; otherwise the instruction will #UD.
Operation
RESULT ← OrderedCompare(DEST[63:0] <> SRC[63:0]) {
(* Set EFLAGS *)
CASE (RESULT) OF
    UNORDERED: ZF,PF,CF ← 111;
    GREATER_THAN: ZF,PF,CF ← 000;
    LESS_THAN: ZF,PF,CF ← 001;
    EQUAL: ZF,PF,CF ← 100;
ESAC;
OF, AF, SF ← 0; }
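The flag assignments in the CASE statement above translate directly into a small lookup. The sketch below is illustrative; the function name `comisd_flags` is hypothetical, and the model covers only the flag results, not the #I signaling that distinguishes COMISD from UCOMISD.

```python
import math


def comisd_flags(op1, op2):
    """Return the EFLAGS bits produced by comparing op1 with op2."""
    if math.isnan(op1) or math.isnan(op2):
        zf, pf, cf = 1, 1, 1          # UNORDERED
    elif op1 > op2:
        zf, pf, cf = 0, 0, 0          # GREATER_THAN
    elif op1 < op2:
        zf, pf, cf = 0, 0, 1          # LESS_THAN
    else:
        zf, pf, cf = 1, 0, 0          # EQUAL (also covers -0.0 versus +0.0)
    return {"ZF": zf, "PF": pf, "CF": cf, "OF": 0, "AF": 0, "SF": 0}


print(comisd_flags(1.0, 2.0))
print(comisd_flags(-0.0, 0.0))
print(comisd_flags(math.nan, 0.0))
```

Software that branches on the result typically tests PF first to separate the unordered case, since ZF and CF are also set in that case.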
Other Exceptions
See Exceptions Type 3; additionally #UD ... If VEX.vvvv != 1111B.
Compare low single-precision floating-point values in xmm1 and xmm2/mem32 and set the EFLAGS flags accordingly. Compare low single-precision floating-point values in xmm1 and xmm2/mem32 and set the EFLAGS flags accordingly.
Description
Compares the single-precision floating-point values in the low doublewords of operand 1 (first operand) and operand 2 (second operand), and sets the ZF, PF, and CF flags in the EFLAGS register according to the result (unordered, greater than, less than, or equal). The OF, SF, and AF flags in the EFLAGS register are set to 0. The unordered result is returned if either source operand is a NaN (QNaN or SNaN). The sign of zero is ignored for comparisons, so that -0.0 is equal to +0.0. Operand 1 is an XMM register; operand 2 can be an XMM register or a 32-bit memory location. The COMISS instruction differs from the UCOMISS instruction in that it signals a SIMD floating-point invalid operation exception (#I) when a source operand is either a QNaN or SNaN. The UCOMISS instruction signals an invalid numeric exception only if a source operand is an SNaN. The EFLAGS register is not updated if an unmasked SIMD floating-point exception is generated. In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15). Note: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b; otherwise the instruction will #UD.
Operation
RESULT ← OrderedCompare(SRC1[31:0] <> SRC2[31:0]) {
(* Set EFLAGS *)
CASE (RESULT) OF
    UNORDERED: ZF,PF,CF ← 111;
Other Exceptions
See Exceptions Type 3; additionally #UD If VEX.vvvv != 1111B.
...
01H
ECX EDX
03H
ECX
EBX ECX
EDX 07H
Structured Extended Feature Flags Enumeration Leaf (Output depends on ECX input value)
ECX Reserved
EDX Reserved
NOTE: * If ECX contains an invalid sub-leaf index, EAX/EBX/ECX/EDX return 0. Invalid sub-leaves of EAX = 07H: ECX = n, n > 0.
EBX
ECX EDX
EBX
ECX
EDX
EBX
ECX EDX
Processor Extended State Enumeration Sub-leaf (EAX = 0DH, ECX = 1) 0DH EAX
80000002H EAX EBX ECX EDX 80000003H EAX EBX ECX EDX 80000004H EAX EBX ECX EDX 80000005H EAX EBX ECX EDX 80000006H EAX EBX ECX
EDX
80000008H EAX
specific resource type if the bit is set. The bit position corresponds to the sub-leaf index that software must use to query monitoring capability available for that type. See Table 3-17. When CPUID executes with EAX set to 0FH and ECX = n (n >= 1, and is a valid sub-leaf index), the processor returns information software can use to program IA32_PQR_ASSOC, IA32_QM_EVTSEL MSRs before reading QoS data from the IA32_QM_CTR MSR.
Table 3-24. Mapping of Brand Indices; and Intel 64 and IA-32 Processor Brand Strings

Brand Index    Brand String
00H            This processor does not support the brand identification feature
01H            Intel(R) Celeron(R) processor [1]
02H            Intel(R) Pentium(R) III processor [1]
03H            Intel(R) Pentium(R) III Xeon(R) processor; If processor signature = 000006B1h, then Intel(R) Celeron(R) processor
04H            Intel(R) Pentium(R) III processor
06H            Mobile Intel(R) Pentium(R) III processor-M
07H            Mobile Intel(R) Celeron(R) processor [1]
08H            Intel(R) Pentium(R) 4 processor
09H            Intel(R) Pentium(R) 4 processor
0AH            Intel(R) Celeron(R) processor [1]
0BH            Intel(R) Xeon(R) processor; If processor signature = 00000F13h, then Intel(R) Xeon(R) processor MP
0CH            Intel(R) Xeon(R) processor MP
0EH            Mobile Intel(R) Pentium(R) 4 processor-M; If processor signature = 00000F13h, then Intel(R) Xeon(R) processor
0FH            Mobile Intel(R) Celeron(R) processor [1]
11H            Mobile Genuine Intel(R) processor
12H            Intel(R) Celeron(R) M processor
13H            Mobile Intel(R) Celeron(R) processor [1]
14H            Intel(R) Celeron(R) processor
15H            Mobile Genuine Intel(R) processor
16H            Intel(R) Pentium(R) M processor
17H            Mobile Intel(R) Celeron(R) processor [1]
18H - 0FFH     RESERVED

NOTES:
1. Indicates versions of these processors that were introduced after the Pentium III
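The processor signature values quoted in Table 3-24 are the EAX value returned for CPUID leaf 01H, so they can be split into the fields that the Operation section lists for EAX = 1H. The sketch below is illustrative only; the function name `decode_signature` and the dictionary keys are hypothetical, while the bit ranges follow the EAX[3:0] through EAX[27:20] layout given for that leaf.

```python
def decode_signature(eax):
    """Split a CPUID.01H EAX value into its version-information fields."""
    return {
        "stepping": eax & 0xF,                    # EAX[3:0]
        "model": (eax >> 4) & 0xF,                # EAX[7:4]
        "family": (eax >> 8) & 0xF,               # EAX[11:8]
        "processor_type": (eax >> 12) & 0x3,      # EAX[13:12]
        "extended_model": (eax >> 16) & 0xF,      # EAX[19:16]
        "extended_family": (eax >> 20) & 0xFF,    # EAX[27:20]
    }


# The two signatures that Table 3-24 treats as special cases.
print(decode_signature(0x000006B1))
print(decode_signature(0x00000F13))
```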
Operation
IA32_BIOS_SIGN_ID MSR ← Update with installed microcode revision number;
CASE (EAX) OF
    EAX = 0:
        EAX ← Highest basic function input value understood by CPUID;
        EBX ← Vendor identification string;
        EDX ← Vendor identification string;
        ECX ← Vendor identification string;
    BREAK;
    EAX = 1H:
        EAX[3:0] ← Stepping ID;
        EAX[7:4] ← Model;
        EAX[11:8] ← Family;
        EAX[13:12] ← Processor type;
        EAX[15:14] ← Reserved;
        EAX[19:16] ← Extended Model;
        EAX[27:20] ← Extended Family;
        EAX[31:28] ← Reserved;
        EBX[7:0] ← Brand Index; (* Reserved if the value is zero. *)
        EBX[15:8] ← CLFLUSH Line Size;
        EBX[16:23] ← Reserved; (* Number of threads enabled = 2 if MT enable fuse set. *)
        EBX[24:31] ← Initial APIC ID;
        ECX ← Feature flags; (* See Figure 3-6. *)
        EDX ← Feature flags; (* See Figure 3-7. *)
    BREAK;
    EAX = 2H:
        EAX ← Cache and TLB information;
        EBX ← Cache and TLB information;
        ECX ← Cache and TLB information;
        EDX ← Cache and TLB information;
    BREAK;
    EAX = 3H:
        EAX ← Reserved;
        EBX ← Reserved;
        ECX ← ProcessorSerialNumber[31:0]; (* Pentium III processors only, otherwise reserved. *)
        EDX ← ProcessorSerialNumber[63:32]; (* Pentium III processors only, otherwise reserved. *)
    BREAK;
    EAX = 4H:
        EAX ← Deterministic Cache Parameters Leaf; (* See Table 3-17. *)
        EBX ← Deterministic Cache Parameters Leaf;
        ECX ← Deterministic Cache Parameters Leaf;
        EDX ← Deterministic Cache Parameters Leaf;
    BREAK;
    EAX = 5H:
        EAX ← MONITOR/MWAIT Leaf; (* See Table 3-17. *)
        EBX ← MONITOR/MWAIT Leaf;
        ECX ← MONITOR/MWAIT Leaf;
        EDX ← MONITOR/MWAIT Leaf;
    BREAK;
    EAX = 6H:
        EAX ← Thermal and Power Management Leaf; (* See Table 3-17. *)
        EBX ← Thermal and Power Management Leaf;
        ECX ← Thermal and Power Management Leaf;
        EDX ← Thermal and Power Management Leaf;
    BREAK;
    EAX = 7H:
        EAX ← Structured Extended Feature Flags Enumeration Leaf; (* See Table 3-17. *)
        EBX ← Structured Extended Feature Flags Enumeration Leaf;
        ECX ← Structured Extended Feature Flags Enumeration Leaf;
        EDX ← Structured Extended Feature Flags Enumeration Leaf;
    BREAK;
    EAX = 8H:
        EAX ← Reserved = 0;
        EBX ← Reserved = 0;
        ECX ← Reserved = 0;
        EDX ← Reserved = 0;
    BREAK;
    EAX = 9H:
        EAX ← Direct Cache Access Information Leaf; (* See Table 3-17. *)
        EBX ← Direct Cache Access Information Leaf;
        ECX ← Direct Cache Access Information Leaf;
        EDX ← Direct Cache Access Information Leaf;
    BREAK;
    EAX = AH:
        EAX ← Architectural Performance Monitoring Leaf; (* See Table 3-17. *)
        EBX ← Architectural Performance Monitoring Leaf;
        ECX ← Architectural Performance Monitoring Leaf;
        EDX ← Architectural Performance Monitoring Leaf;
    BREAK;
    EAX = BH:
        EAX ← Extended Topology Enumeration Leaf; (* See Table 3-17. *)
        EBX ← Extended Topology Enumeration Leaf;
        ECX ← Extended Topology Enumeration Leaf;
        EDX ← Extended Topology Enumeration Leaf;
    BREAK;
    EAX = CH:
        EAX ← Reserved = 0;
        EBX ← Reserved = 0;
        ECX ← Reserved = 0;
        EDX ← Reserved = 0;
    BREAK;
    EAX = DH:
        EAX ← Processor Extended State Enumeration Leaf; (* See Table 3-17. *)
        EBX ← Processor Extended State Enumeration Leaf;
        ECX ← Processor Extended State Enumeration Leaf;
        EDX ← Processor Extended State Enumeration Leaf;
    BREAK;
    EAX = EH:
        EAX ← Reserved = 0;
        EBX ← Reserved = 0;
        ECX ← Reserved = 0;
        EDX ← Reserved = 0;
    BREAK;
    EAX = FH:
        EAX ← Quality of Service Enumeration Leaf; (* See Table 3-17. *)
        EBX ← Quality of Service Enumeration Leaf;
        ECX ← Quality of Service Enumeration Leaf;
        EDX ← Quality of Service Enumeration Leaf;
    BREAK;
    EAX = 80000000H:
        EAX ← Highest extended function input value understood by CPUID;
        EBX ← Reserved;
        ECX ← Reserved;
        EDX ← Reserved;
    BREAK;
    EAX = 80000001H:
        EAX ← Reserved;
        EBX ← Reserved;
        ECX ← Extended Feature Bits (* See Table 3-17. *);
        EDX ← Extended Feature Bits (* See Table 3-17. *);
    BREAK;
    EAX = 80000002H:
        EAX ← Processor Brand String;
        EBX ← Processor Brand String, continued;
        ECX ← Processor Brand String, continued;
        EDX ← Processor Brand String, continued;
    BREAK;
    EAX = 80000003H:
        EAX ← Processor Brand String, continued;
        EBX ← Processor Brand String, continued;
        ECX ← Processor Brand String, continued;
        EDX ← Processor Brand String, continued;
    BREAK;
    EAX = 80000004H:
        EAX ← Processor Brand String, continued;
        EBX ← Processor Brand String, continued;
        ECX ← Processor Brand String, continued;
        EDX ← Processor Brand String, continued;
    BREAK;
    EAX = 80000005H:
        EAX ← Reserved = 0;
        EBX ← Reserved = 0;
        ECX ← Reserved = 0;
        EDX ← Reserved = 0;
    BREAK;
    EAX = 80000006H:
        EAX ← Reserved = 0;
        EBX ← Reserved = 0;
        ECX ← Cache information;
        EDX ← Reserved = 0;
    BREAK;
    EAX = 80000007H:
        EAX ← Reserved = 0;
        EBX ← Reserved = 0;
        ECX ← Reserved = 0;
        EDX ← Reserved = Misc Feature Flags;
    BREAK;
    EAX = 80000008H:
        EAX ← Reserved = Physical Address Size Information;
        EBX ← Reserved = Virtual Address Size Information;
        ECX ← Reserved = 0;
        EDX ← Reserved = 0;
    BREAK;
    EAX >= 40000000H and EAX <= 4FFFFFFFH:
    DEFAULT: (* EAX = Value outside of recognized range for CPUID. *)
        (* If the highest basic information leaf data depend on ECX input value, ECX is honored. *)
        EAX ← Reserved; (* Information returned for highest basic information leaf. *)
        EBX ← Reserved; (* Information returned for highest basic information leaf. *)
        ECX ← Reserved; (* Information returned for highest basic information leaf. *)
        EDX ← Reserved; (* Information returned for highest basic information leaf. *)
    BREAK;
ESAC;
...
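For the EAX = 0 case, the Operation lists the vendor identification string in the order EBX, EDX, ECX. A minimal sketch of reassembling it from the three register values follows. The function name `vendor_string` is hypothetical, the register values are the well-known ones for the "GenuineIntel" string rather than values read from hardware, and each register is assumed to hold four ASCII characters with the first character in its low byte.

```python
import struct


def vendor_string(ebx, edx, ecx):
    """Concatenate the leaf 0 registers in EBX, EDX, ECX order."""
    # "<III" packs three 32-bit values little-endian, so the low byte of each
    # register becomes the first of its four characters.
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


print(vendor_string(0x756E6547, 0x49656E69, 0x6C65746E))
```

Concatenating in EBX, ECX, EDX order instead is a common mistake and yields a scrambled string.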
Convert two packed signed doubleword integers from xmm2/m128 to two packed double-precision floating-point values in xmm1. Convert two packed signed doubleword integers from xmm2/mem to two packed double-precision floating-point values in xmm1. Convert four packed signed doubleword integers from xmm2/mem to four packed double-precision floating-point values in ymm1.
RM
V/V
AVX
RM
V/V
AVX
Description
Converts two packed signed doubleword integers in the source operand (second operand) to two packed double-precision floating-point values in the destination operand (first operand). In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15). 128-bit Legacy SSE version: The source operand is an XMM register or 64-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified. VEX.128 encoded version: The source operand is an XMM register or 64-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed. VEX.256 encoded version: The source operand is an XMM register or 128-bit memory location. The destination operand is a YMM register. Note: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b; otherwise the instruction will #UD.
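The 128-bit form of this conversion can be modeled on raw bytes. This is a sketch, with the hypothetical name `cvtdq2pd`: it reads two signed doublewords from the low 64 bits of the source and writes two doubles. Every 32-bit integer is exactly representable in double precision, so no rounding is involved.

```python
import struct


def cvtdq2pd(src_low64):
    """Convert 8 bytes (two signed doublewords) into 16 bytes (two doubles)."""
    x0, x1 = struct.unpack("<2i", src_low64)
    return struct.pack("<2d", float(x0), float(x1))


source = struct.pack("<2i", -7, 2147483647)
print(struct.unpack("<2d", cvtdq2pd(source)))
```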
[Figure: conversion diagram showing SRC elements X3, X2, X1, X0 and DEST elements X3, X2, X1, X0]
Other Exceptions
See Exceptions Type 5; additionally #UD ... If VEX.vvvv != 1111B.
Convert two packed double-precision floating-point values from xmm2/m128 to two packed signed doubleword integers in xmm1. Convert two packed double-precision floating-point values in xmm2/mem to two signed doubleword integers in xmm1. Convert four packed double-precision floating-point values in ymm2/mem to four signed doubleword integers in xmm1.
Description
Converts two packed double-precision floating-point values in the source operand (second operand) to two packed signed doubleword integers in the destination operand (first operand). The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register. The result is stored in the low quadword of the destination operand and the high quadword is cleared to all 0s. When a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register. If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned. In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15). 128-bit Legacy SSE version: The source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. Bits[127:64] of the destination XMM register are zeroed. However, the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified. VEX.128 encoded version: The source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:64) of the corresponding YMM register destination are zeroed. VEX.256 encoded version: The source operand is a YMM register or 256-bit memory location. The destination operand is an XMM register. The upper bits (255:128) of the corresponding YMM register destination are zeroed. Note: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b; otherwise the instruction will #UD. ...
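The rounding and overflow behavior described above can be sketched for the 128-bit form. Several assumptions are made here and should be read as such: the MXCSR rounding control is taken to be at its default of round-to-nearest-even (which Python's built-in `round` also implements), the invalid exception is taken to be masked, and NaN, infinity, and results below the minimum signed doubleword are treated the same way as results above the maximum. The names `INDEFINITE`, `convert_element`, and `cvtpd2dq` are hypothetical.

```python
import math

INDEFINITE = 0x80000000   # indefinite integer value returned when invalid is masked


def convert_element(value):
    """Convert one double to a signed doubleword bit pattern."""
    if math.isnan(value) or math.isinf(value):
        return INDEFINITE
    rounded = round(value)                     # round half to even
    if not -2**31 <= rounded <= 2**31 - 1:
        return INDEFINITE
    return rounded & 0xFFFFFFFF                # two's complement bit pattern


def cvtpd2dq(src):
    """Return four doublewords: two results, then the cleared high quadword."""
    return [convert_element(src[0]), convert_element(src[1]), 0, 0]


print([hex(x) for x in cvtpd2dq([2.5, -1.5])])
print([hex(x) for x in cvtpd2dq([3.0e9, math.nan])])
```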
Convert two packed double-precision floating-point values from xmm2/m128 to two packed signed doubleword integers in xmm1 using truncation. Convert two packed double-precision floating-point values in xmm2/mem to two signed doubleword integers in xmm1 using truncation. Convert four packed double-precision floating-point values in ymm2/mem to four signed doubleword integers in xmm1 using truncation.
RM
V/V
AVX
RM
V/V
AVX
Description
Converts two or four packed double-precision floating-point values in the source operand (second operand) to two or four packed signed doubleword integers in the destination operand (first operand). When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned. In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15). 128-bit Legacy SSE version: The source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified. VEX.128 encoded version: The source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed. VEX.256 encoded version: The source operand is a YMM register or 256-bit memory location. The destination operand is an XMM register. The upper bits (255:128) of the corresponding YMM register destination are zeroed. Note: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b; otherwise the instruction will #UD. ...
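Truncation differs from the MXCSR-controlled conversion only in how inexact values are handled. The sketch below illustrates round-toward-zero for a single element; the name `truncate_element` is hypothetical, the invalid exception is assumed to be masked, and NaN, infinity, and results below the minimum signed doubleword are assumed to yield the same indefinite value as results above the maximum.

```python
import math


def truncate_element(value):
    """Convert one double to a signed doubleword bit pattern, rounding toward zero."""
    if math.isnan(value) or math.isinf(value):
        return 0x80000000
    truncated = math.trunc(value)              # discard the fraction
    if not -2**31 <= truncated <= 2**31 - 1:
        return 0x80000000
    return truncated & 0xFFFFFFFF


for sample in (2.9, -2.9, 2147483647.9, 2147483648.0):
    print(sample, hex(truncate_element(sample)))
```

Note that 2147483647.9 is still in range after truncation, whereas a round-to-nearest conversion of the same input would overflow.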
Selectively multiply packed DP floating-point values from xmm1 with packed DP floating-point values from xmm2, add and selectively store the packed DP floating-point values to xmm1. Selectively multiply packed DP floating-point values from xmm2 with packed DP floating-point values from xmm3, add and selectively store the packed DP floating-point values to xmm1.
RVMI V/V
AVX
Description
Conditionally multiplies the packed double-precision floating-point values in the destination operand (first operand) with the packed double-precision floating-point values in the source (second operand) depending on a mask extracted from bits [5:4] of the immediate operand (third operand). If a condition mask bit is zero, the corresponding multiplication is replaced by a value of 0.0 in the manner described by Section 12.8.4 of Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 1. The two resulting double-precision values are summed into an intermediate result. The intermediate result is conditionally broadcast to the destination using a broadcast mask specified by bits [1:0] of the immediate byte. If a broadcast mask bit is "1", the intermediate result is copied to the corresponding qword element in the destination operand. If a broadcast mask bit is zero, the corresponding element in the destination is set to zero. DPPD follows the NaN forwarding rules stated in the Software Developer's Manual, vol. 1, table 4.7. These rules do not cover horizontal prioritization of NaNs. Horizontal propagation of NaNs to the destination and the positioning of those NaNs in the destination is implementation dependent. NaNs on the input sources or computationally generated NaNs will have at least one NaN propagated to the destination. 128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified. VEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed. If VDPPD is encoded with VEX.L = 1, an attempt to execute the instruction causes a #UD exception.
Operation
DP_primitive (SRC1, SRC2)
IF (imm8[4] = 1)
    THEN Temp1[63:0] ← DEST[63:0] * SRC[63:0]; // update SIMD exception flags
    ELSE Temp1[63:0] ← +0.0; FI;
IF (imm8[5] = 1)
    THEN Temp1[127:64] ← DEST[127:64] * SRC[127:64]; // update SIMD exception flags
    ELSE Temp1[127:64] ← +0.0; FI;
/* if unmasked exception reported, execute exception handler */
Temp2[63:0] ← Temp1[63:0] + Temp1[127:64]; // update SIMD exception flags
/* if unmasked exception reported, execute exception handler */
IF (imm8[0] = 1)
    THEN DEST[63:0] ← Temp2[63:0];
    ELSE DEST[63:0] ← +0.0; FI;
IF (imm8[1] = 1)
    THEN DEST[127:64] ← Temp2[63:0];
    ELSE DEST[127:64] ← +0.0; FI;

DPPD (128-bit Legacy SSE version)
DEST[127:0] ← DP_Primitive(SRC1[127:0], SRC2[127:0]);
DEST[VLMAX-1:128] (Unmodified)

VDPPD (VEX.128 encoded version)
DEST[127:0] ← DP_Primitive(SRC1[127:0], SRC2[127:0]);
DEST[VLMAX-1:128] ← 0
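The primitive above maps closely onto ordinary double-precision arithmetic, so it can be sketched with Python floats. The function name `dppd` is hypothetical; element 0 of each list stands for bits 63:0 and element 1 for bits 127:64. Exception flags and NaN placement are not modeled.

```python
def dppd(dest, src, imm8):
    """Model the DPPD primitive on two-element lists of doubles."""
    # imm8[5:4] select which products take part in the sum.
    temp_low = dest[0] * src[0] if imm8 & 0x10 else 0.0
    temp_high = dest[1] * src[1] if imm8 & 0x20 else 0.0
    total = temp_low + temp_high
    # imm8[1:0] select which destination elements receive the sum.
    return [total if imm8 & 0x01 else 0.0,
            total if imm8 & 0x02 else 0.0]


print(dppd([1.0, 2.0], [3.0, 4.0], 0x31))   # both products, stored to the low element
print(dppd([1.0, 2.0], [3.0, 4.0], 0x13))   # low product only, stored to both elements
```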
Flags Affected
None
Other Exceptions
See Exceptions Type 2; additionally #UD ... If VEX.L= 1.
Selectively multiply packed SP floating-point values from xmm1 with packed SP floating-point values from xmm2, add and selectively store the packed SP floating-point values or zero values to xmm1. Multiply packed SP floating-point values from xmm1 with packed SP floating-point values from xmm2/mem, selectively add and store to xmm1. Multiply packed single-precision floating-point values from ymm2 with packed SP floating-point values from ymm3/mem, selectively add pairs of elements and store to ymm1.
RVMI V/V
AVX
RVMI V/V
AVX
Description
Conditionally multiplies the packed single-precision floating-point values in the destination operand (first operand) with the packed single-precision floating-point values in the source (second operand) depending on a mask extracted from the high 4 bits of the immediate byte (third operand). If a condition mask bit in Imm8[7:4] is zero, the corresponding multiplication is replaced by a value of 0.0 in the manner described by Section 12.8.4 of Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 1. The four resulting single-precision values are summed into an intermediate result. The intermediate result is conditionally broadcast to the destination using a broadcast mask specified by bits [3:0] of the immediate byte. If a broadcast mask bit is "1", the intermediate result is copied to the corresponding dword element in the destination operand. If a broadcast mask bit is zero, the corresponding element in the destination is set to zero. DPPS follows the NaN forwarding rules stated in the Software Developer's Manual, vol. 1, table 4.7. These rules do not cover horizontal prioritization of NaNs. Horizontal propagation of NaNs to the destination and the positioning of those NaNs in the destination is implementation dependent. NaNs on the input sources or computationally generated NaNs will have at least one NaN propagated to the destination. 128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified. VEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed. VEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location.
The destination operand is a YMM register.
Operation
DP_primitive (SRC1, SRC2)
IF (imm8[4] = 1)
    THEN Temp1[31:0] ← DEST[31:0] * SRC[31:0]; // update SIMD exception flags
    ELSE Temp1[31:0] ← +0.0; FI;
IF (imm8[5] = 1)
    THEN Temp1[63:32] ← DEST[63:32] * SRC[63:32]; // update SIMD exception flags
    ELSE Temp1[63:32] ← +0.0; FI;
IF (imm8[6] = 1)
    THEN Temp1[95:64] ← DEST[95:64] * SRC[95:64]; // update SIMD exception flags
    ELSE Temp1[95:64] ← +0.0; FI;
IF (imm8[7] = 1)
    THEN Temp1[127:96] ← DEST[127:96] * SRC[127:96]; // update SIMD exception flags
    ELSE Temp1[127:96] ← +0.0; FI;
Temp2[31:0] ← Temp1[31:0] + Temp1[63:32]; // update SIMD exception flags
/* if unmasked exception reported, execute exception handler */
Temp3[31:0] ← Temp1[95:64] + Temp1[127:96]; // update SIMD exception flags
/* if unmasked exception reported, execute exception handler */
Temp4[31:0] ← Temp2[31:0] + Temp3[31:0]; // update SIMD exception flags
/* if unmasked exception reported, execute exception handler */
IF (imm8[0] = 1)
    THEN DEST[31:0] ← Temp4[31:0];
    ELSE DEST[31:0] ← +0.0; FI;
IF (imm8[1] = 1)
    THEN DEST[63:32] ← Temp4[31:0];
    ELSE DEST[63:32] ← +0.0; FI;
IF (imm8[2] = 1)
    THEN DEST[95:64] ← Temp4[31:0];
    ELSE DEST[95:64] ← +0.0; FI;
IF (imm8[3] = 1)
    THEN DEST[127:96] ← Temp4[31:0];
    ELSE DEST[127:96] ← +0.0; FI;

DPPS (128-bit Legacy SSE version)
DEST[127:0] ← DP_Primitive(SRC1[127:0], SRC2[127:0]);
DEST[VLMAX-1:128] (Unmodified)

VDPPS (VEX.128 encoded version)
DEST[127:0] ← DP_Primitive(SRC1[127:0], SRC2[127:0]);
DEST[VLMAX-1:128] ← 0

VDPPS (VEX.256 encoded version)
DEST[127:0] ← DP_Primitive(SRC1[127:0], SRC2[127:0]);
DEST[255:128] ← DP_Primitive(SRC1[255:128], SRC2[255:128]);
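The DP_primitive pseudocode can be mirrored in a short software model. The following is a minimal sketch in portable C, not a description of the hardware implementation: the function name dpps_model and its array-based interface are illustrative assumptions, and the model does not track MXCSR exception flags or NaN propagation details.

```c
#include <stdint.h>

/* Hypothetical software model of DP_primitive for one 128-bit lane.
 * a[] and b[] hold the four single-precision elements of the two inputs
 * (element 0 is bits 31:0). imm8[7:4] is the condition mask and
 * imm8[3:0] is the broadcast mask. SIMD exception flags are not modeled. */
void dpps_model(const float a[4], const float b[4], uint8_t imm8, float out[4])
{
    float temp1[4];

    /* Conditional multiplies: a masked-off product is replaced by +0.0. */
    for (int i = 0; i < 4; i++) {
        temp1[i] = (imm8 & (0x10u << i)) ? a[i] * b[i] : 0.0f;
    }

    /* Pairwise summation in the order given by the pseudocode. */
    float temp2 = temp1[0] + temp1[1];
    float temp3 = temp1[2] + temp1[3];
    float temp4 = temp2 + temp3;

    /* Conditional broadcast: a masked-off element is set to +0.0. */
    for (int i = 0; i < 4; i++) {
        out[i] = (imm8 & (1u << i)) ? temp4 : 0.0f;
    }
}
```

With imm8 = F1H all four products contribute and only element 0 of the result receives the sum; with imm8 = 3FH only the two low products contribute and the sum is broadcast to all four elements.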
Flags Affected
None
Other Exceptions
See Exceptions Type 2. ...
Description
The FST instruction copies the value in the ST(0) register to the destination operand, which can be a memory location or another register in the FPU register stack. When storing the value in memory, the value is converted to single-precision or double-precision floating-point format.
The FSTP instruction performs the same operation as the FST instruction and then pops the register stack. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1. The FSTP instruction can also store values in memory in double extended-precision floating-point format.
If the destination operand is a memory location, the operand specifies the address where the first byte of the destination value is to be stored. If the destination operand is a register, the operand specifies a register in the register stack relative to the top of the stack.
If the destination size is single-precision or double-precision, the significand of the value being stored is rounded to the width of the destination (according to the rounding mode specified by the RC field of the FPU control word), and the exponent is converted to the width and bias of the destination format. If the value being stored is too large for the destination format, a numeric overflow exception (#O) is generated and, if the exception is unmasked, no value is stored in the destination operand. If the value being stored is a denormal value, the denormal exception (#D) is not generated. This condition is simply signaled as a numeric underflow exception (#U) condition.
If the value being stored is ±0, ±∞, or a NaN, the least-significant bits of the significand and the exponent are truncated to fit the destination format. This operation preserves the value's identity as a 0, ∞, or NaN.
If the destination operand is a non-empty register, the invalid-operation exception is not generated.
This instruction's operation is the same in non-64-bit modes and 64-bit mode.
Operation
DEST ← ST(0);
IF Instruction = FSTP
    THEN PopRegisterStack;
FI;
Floating-Point Exceptions
#IS    Stack underflow occurred.
#IA    If destination result is an SNaN value or unsupported format, except when the destination format is in double extended-precision floating-point format.
#U     Result is too small for the destination format.
#O     Result is too large for the destination format.
#P     Value cannot be represented exactly in destination format.
Description
Saves the current state of the x87 FPU, MMX technology, XMM, and MXCSR registers to a 512-byte memory location specified in the destination operand. The content layout of the 512-byte region depends on whether the processor is operating in non-64-bit operating modes or 64-bit sub-mode of IA-32e mode. Bytes 464:511 are available for software use. The processor does not write to bytes 464:511 of an FXSAVE area. The operation of FXSAVE in non-64-bit modes is described first.
Table 3-53. Non-64-bit-Mode Layout of FXSAVE and FXRSTOR Memory Region (Contd.)
Byte offset | Contents (bytes 15 through 0)
240 | XMM5
256 | XMM6
272 | XMM7
288 | Reserved
304 | Reserved
320 | Reserved
336 | Reserved
352 | Reserved
368 | Reserved
384 | Reserved
400 | Reserved
416 | Reserved
432 | Reserved
448 | Reserved
464 | Available
480 | Available
496 | Available
The destination operand contains the first byte of the memory image, and it must be aligned on a 16-byte boundary. A misaligned destination operand will result in a general-protection (#GP) exception being generated (or in some cases, an alignment check exception [#AC]). The FXSAVE instruction is used when an operating system needs to perform a context switch or when an exception handler needs to save and examine the current state of the x87 FPU, MMX technology, and/or XMM and MXCSR registers. The fields in Table 3-53 are defined in Table 3-54.
MXCSR_MASK
The FXSAVE instruction saves an abridged version of the x87 FPU tag word in the FTW field (unlike the FSAVE instruction, which saves the complete tag word). The tag information is saved in physical register order (R0 through R7), rather than in top-of-stack (TOS) order. With the FXSAVE instruction, however, only a single bit (1 for valid or 0 for empty) is saved for each tag. For example, assume that the tag word is currently set as follows:
R7 R6 R5 R4 R3 R2 R1 R0
11 xx xx xx 11 11 11 11
Here, 11B indicates empty stack elements and xx indicates valid (00B), zero (01B), or special (10B). For this example, the FXSAVE instruction saves only the following 8 bits of information:
R7 R6 R5 R4 R3 R2 R1 R0
0  1  1  1  0  0  0  0
Here, a 1 is saved for any valid, zero, or special tag, and a 0 is saved for any empty tag.
The operation of the FXSAVE instruction differs from that of the FSAVE instruction as follows:
• The FXSAVE instruction does not check for pending unmasked floating-point exceptions. (The FXSAVE operation in this regard is similar to the operation of the FNSAVE instruction.)
• After the FXSAVE instruction has saved the state of the x87 FPU, MMX technology, XMM, and MXCSR registers, the processor retains the contents of the registers. Because of this behavior, the FXSAVE instruction cannot be used by an application program to pass a clean x87 FPU state to a procedure, since it retains the current state. To clean the x87 FPU state, an application must explicitly execute an FINIT instruction after an FXSAVE instruction to reinitialize the x87 FPU state.
• The format of the memory image saved with the FXSAVE instruction is the same regardless of the current addressing mode (32-bit or 16-bit) and operating mode (protected, real address, or system management). This behavior differs from the FSAVE instruction, where the memory image format is different depending on the addressing mode and operating mode. Because of the different image formats, the memory image saved with the FXSAVE instruction cannot be restored correctly with the FRSTOR instruction, and likewise the state saved with the FSAVE instruction cannot be restored correctly with the FXRSTOR instruction.
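The abridging of the tag word described above can be expressed as a small bit-manipulation routine. This is a minimal sketch in C; the function name abridge_tag_word is an illustrative assumption, and the input is taken to be the full 16-bit tag word with the tag for R0 in bits 1:0 and the tag for R7 in bits 15:14.

```c
#include <stdint.h>

/* Hypothetical helper: reduce a full 16-bit x87 tag word (two bits per
 * physical register, R0 in bits 1:0 through R7 in bits 15:14) to the
 * 8-bit abridged form. Bit i of the result is 1 when Ri is tagged
 * valid (00B), zero (01B), or special (10B), and 0 when empty (11B). */
uint8_t abridge_tag_word(uint16_t full_tag)
{
    uint8_t abridged = 0;

    for (int reg = 0; reg < 8; reg++) {
        unsigned tag = (full_tag >> (reg * 2)) & 0x3u;
        if (tag != 0x3u) {
            abridged |= (uint8_t)(1u << reg);
        }
    }
    return abridged;
}
```

For the example in the text (R7 and R3 through R0 empty, R6 through R4 in use), any full tag word of that shape, such as C0FFH, reduces to 70H, which is the bit pattern 0 1 1 1 0 0 0 0 shown above.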
The FSAVE format for FTW can be recreated from the FTW valid bits and the stored 80-bit FP data (assuming the stored data was not the contents of MMX technology registers) using Table 3-55.
The J-bit is defined to be the 1-bit binary integer to the left of the decimal place in the significand. The M-bit is defined to be the most significant bit of the fractional portion of the significand (i.e., the bit immediately to the right of the decimal place). When the M-bit is the most significant bit of the fractional portion of the significand, it must be 0 if the fraction is all 0s.
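The reverse direction, recreating a 2-bit FSAVE-style tag for one register from its abridged valid bit and its stored 80-bit image, can be sketched as follows. Table 3-55 is not reproduced in this excerpt, so the classification rules below are an assumption based on the usual reading of that table (an all-ones exponent is special, an all-zeros exponent is zero only when the whole significand is zero, and any other exponent is valid only when the J-bit is set). The function and constant names are illustrative.

```c
#include <stdint.h>

/* 2-bit tag encodings used by the full (FSAVE-style) tag word. */
enum { TAG_VALID = 0, TAG_ZERO = 1, TAG_SPECIAL = 2, TAG_EMPTY = 3 };

/* Hypothetical helper: classify one saved register image.
 *   valid_bit    the register's bit from the abridged FTW field
 *   exponent     bits 78:64 of the 80-bit image (sign bit excluded)
 *   significand  bits 63:0 of the image; bit 63 is the J-bit
 * Assumes the image holds x87 data, not MMX technology register contents. */
unsigned classify_tag(unsigned valid_bit, uint16_t exponent, uint64_t significand)
{
    if (!valid_bit) {
        return TAG_EMPTY;
    }
    exponent &= 0x7FFFu;
    if (exponent == 0x7FFFu) {
        return TAG_SPECIAL;                 /* infinity, NaN, unsupported */
    }
    if (exponent == 0) {
        /* Zero only when the J-bit and the fraction are all 0s. */
        return (significand == 0) ? TAG_ZERO : TAG_SPECIAL;
    }
    /* Ordinary exponent: valid only when the J-bit is 1. */
    return (significand >> 63) ? TAG_VALID : TAG_SPECIAL;
}
```

Calling this once per physical register and packing the eight results, two bits each, would rebuild a full tag word under the stated assumptions.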
There are two different layouts of the FXSAVE map in 64-bit mode, corresponding to FXSAVE64 (which requires REX.W=1) and FXSAVE (REX.W=0). In the FXSAVE64 map (Table 3-56), the FPU IP and FPU DP pointers are 64 bits wide. In the FXSAVE map for 64-bit mode (Table 3-57), the FPU IP and FPU DP pointers are 32 bits wide.
Byte offset | Contents (high-order bytes to low-order bytes)
0 | FPU IP, FOP, FTW, FSW, FCW
16 | MXCSR_MASK, MXCSR, FPU DP
32 | Reserved, ST0/MM0
48 | Reserved, ST1/MM1
64 | Reserved, ST2/MM2
80 | Reserved, ST3/MM3
96 | Reserved, ST4/MM4
112 | Reserved, ST5/MM5
128 | Reserved, ST6/MM6
144 | Reserved, ST7/MM7
160 | XMM0
176 | XMM1
192 | XMM2
208 | XMM3
224 | XMM4
240 | XMM5
256 | XMM6
272 | XMM7
288 | XMM8
304 | XMM9
320 | XMM10
336 | XMM11
352 | XMM12
368 | XMM13
384 | XMM14
400 | XMM15
416 | Reserved
432 | Reserved
448 | Reserved
464 | Available
480 | Available
496 | Available
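Byte offset | Contents (high-order bytes to low-order bytes)
0 | FPU CS, FPU IP, FOP, Reserved, FTW, FSW, FCW
16 | MXCSR_MASK, MXCSR, FPU DP
32 | Reserved, ST0/MM0
48 | Reserved, ST1/MM1
64 | Reserved, ST2/MM2
80 | Reserved, ST3/MM3
96 | Reserved, ST4/MM4
112 | Reserved, ST5/MM5
128 | Reserved, ST6/MM6
144 | Reserved, ST7/MM7
160 | XMM0
176 | XMM1
192 | XMM2
208 | XMM3
224 | XMM4
240 | XMM5
256 | XMM6
272 | XMM7
288 | XMM8
304 | XMM9
320 | XMM10
336 | XMM11
352 | XMM12
368 | XMM13
384 | XMM14
400 | XMM15
416 | Reserved
432 | Reserved
448 | Reserved
464 | Available
480 | Available
496 | Available
...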
MOV - Move
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
88 /r | MOV r/m8,r8 | MR | Valid | Valid | Move r8 to r/m8.
REX + 88 /r | MOV r/m8***,r8*** | MR | Valid | N.E. | Move r8 to r/m8.
89 /r | MOV r/m16,r16 | MR | Valid | Valid | Move r16 to r/m16.
89 /r | MOV r/m32,r32 | MR | Valid | Valid | Move r32 to r/m32.
REX.W + 89 /r | MOV r/m64,r64 | MR | Valid | N.E. | Move r64 to r/m64.
8A /r | MOV r8,r/m8 | RM | Valid | Valid | Move r/m8 to r8.
REX + 8A /r | MOV r8***,r/m8*** | RM | Valid | N.E. | Move r/m8 to r8.
8B /r | MOV r16,r/m16 | RM | Valid | Valid | Move r/m16 to r16.
8B /r | MOV r32,r/m32 | RM | Valid | Valid | Move r/m32 to r32.
REX.W + 8B /r | MOV r64,r/m64 | RM | Valid | N.E. | Move r/m64 to r64.
8C /r | MOV r/m16,Sreg** | MR | Valid | Valid | Move segment register to r/m16.
REX.W + 8C /r | MOV r/m64,Sreg** | MR | Valid | Valid | Move zero extended 16-bit segment register to r/m64.
8E /r | MOV Sreg,r/m16** | RM | Valid | Valid | Move r/m16 to segment register.
REX.W + 8E /r | MOV Sreg,r/m64** | RM | Valid | Valid | Move lower 16 bits of r/m64 to segment register.
A0 | MOV AL,moffs8* | FD | Valid | Valid | Move byte at (seg:offset) to AL.
REX.W + A0 | MOV AL,moffs8* | FD | Valid | N.E. | Move byte at (offset) to AL.
A1 | MOV AX,moffs16* | FD | Valid | Valid | Move word at (seg:offset) to AX.
A1 | MOV EAX,moffs32* | FD | Valid | Valid | Move doubleword at (seg:offset) to EAX.
REX.W + A1 | MOV RAX,moffs64* | FD | Valid | N.E. | Move quadword at (offset) to RAX.
A2 | MOV moffs8,AL | TD | Valid | Valid | Move AL to (seg:offset).
REX.W + A2 | MOV moffs8***,AL | TD | Valid | N.E. | Move AL to (offset).
A3 | MOV moffs16*,AX | TD | Valid | Valid | Move AX to (seg:offset).
A3 | MOV moffs32*,EAX | TD | Valid | Valid | Move EAX to (seg:offset).
REX.W + A3 | MOV moffs64*,RAX | TD | Valid | N.E. | Move RAX to (offset).
B0+ rb ib | MOV r8, imm8 | OI | Valid | Valid | Move imm8 to r8.
REX + B0+ rb ib | MOV r8***, imm8 | OI | Valid | N.E. | Move imm8 to r8.
B8+ rw iw | MOV r16, imm16 | OI | Valid | Valid | Move imm16 to r16.
B8+ rd id | MOV r32, imm32 | OI | Valid | Valid | Move imm32 to r32.
REX.W + B8+ rd io | MOV r64, imm64 | OI | Valid | N.E. | Move imm64 to r64.
C6 /0 ib | MOV r/m8, imm8 | MI | Valid | Valid | Move imm8 to r/m8.
Opcode | Instruction | Op/En | Description
REX + C6 /0 ib | MOV r/m8***, imm8 | MI | Move imm8 to r/m8.
C7 /0 iw | MOV r/m16, imm16 | MI | Move imm16 to r/m16.
C7 /0 id | MOV r/m32, imm32 | MI | Move imm32 to r/m32.
REX.W + C7 /0 io | MOV r/m64, imm32 | MI | Move imm32 sign extended to 64-bits to r/m64.
NOTES:
* The moffs8, moffs16, moffs32 and moffs64 operands specify a simple offset relative to the segment base, where 8, 16, 32 and 64 refer to the size of the data. The address-size attribute of the instruction determines the size of the offset, either 16, 32 or 64 bits.
** In 32-bit mode, the assembler may insert the 16-bit operand-size prefix with this instruction (see the following Description section for further information).
*** In 64-bit mode, r/m8 can not be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.
Description
Copies the second operand (source operand) to the first operand (destination operand). The source operand can be an immediate value, general-purpose register, segment register, or memory location; the destination register can be a general-purpose register, segment register, or memory location. Both operands must be the same size, which can be a byte, a word, a doubleword, or a quadword. The MOV instruction cannot be used to load the CS register. Attempting to do so results in an invalid opcode exception (#UD). To load the CS register, use the far JMP, CALL, or RET instruction. If the destination operand is a segment register (DS, ES, FS, GS, or SS), the source operand must be a valid segment selector. In protected mode, moving a segment selector into a segment register automatically causes the segment descriptor information associated with that segment selector to be loaded into the hidden (shadow) part of the segment register. While loading this information, the segment selector and segment descriptor information is validated (see the Operation algorithm below). The segment descriptor data is obtained from the GDT or LDT entry for the specified segment selector. A NULL segment selector (values 0000-0003) can be loaded into the DS, ES, FS, and GS registers without causing a protection exception. However, any subsequent attempt to reference a segment whose corresponding segment register is loaded with a NULL value causes a general protection exception (#GP) and no memory reference occurs. ...
Description
Moves the low quadword from the source operand (second operand) to the destination operand (first operand). The source operand is an XMM register and the destination operand is an MMX technology register. This instruction causes a transition from x87 FPU to MMX technology operation (that is, the x87 FPU top-of-stack pointer is set to 0 and the x87 FPU tag word is set to all 0s [valid]). If this instruction is executed while an x87 FPU floating-point exception is pending, the exception is handled before the MOVDQ2Q instruction is executed. In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).
Operation
DEST ← SRC[63:0];
MOVQ - Move Quadword
Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
0F 6F /r MOVQ mm, mm/m64 | RM | V/V | MMX | Move quadword from mm/m64 to mm.
0F 7F /r MOVQ mm/m64, mm | MR | V/V | MMX | Move quadword from mm to mm/m64.
F3 0F 7E /r MOVQ xmm1, xmm2/m64 | RM | V/V | SSE2 | Move quadword from xmm2/mem64 to xmm1.
VEX.128.F3.0F.WIG 7E /r VMOVQ xmm1, xmm2 | RM | V/V | AVX | Move quadword from xmm2 to xmm1.
VEX.128.F3.0F.WIG 7E /r VMOVQ xmm1, m64 | RM | V/V | AVX | Load quadword from m64 to xmm1.
66 0F D6 /r MOVQ xmm2/m64, xmm1 | MR | V/V | SSE2 | Move quadword from xmm1 to xmm2/mem64.
VEX.128.66.0F.WIG D6 /r VMOVQ xmm1/m64, xmm2 | MR | V/V | AVX | Move quadword from xmm2 register to xmm1/m64.
Description
Copies a quadword from the source operand (second operand) to the destination operand (first operand). The source and destination operands can be MMX technology registers, XMM registers, or 64-bit memory locations. This instruction can be used to move a quadword between two MMX technology registers or between an MMX technology register and a 64-bit memory location, or to move data between two XMM registers or between an XMM register and a 64-bit memory location. The instruction cannot be used to transfer data between memory locations. When the source operand is an XMM register, the low quadword is moved; when the destination operand is an XMM register, the quadword is stored to the low quadword of the register, and the high quadword is cleared to all 0s. In 64-bit mode, use of the REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15). Note: In VEX.128.66.0F D6 instruction version, VEX.vvvv and VEX.L=1 are reserved and the former must be 1111b otherwise instructions will #UD.
Note: In VEX.128.F3.0F 7E version, VEX.vvvv and VEX.L=1 are reserved and the former must be 1111b, otherwise instructions will #UD.
Operation
MOVQ instruction when operating on MMX technology registers and memory locations:
    DEST ← SRC;
MOVQ instruction when source and destination operands are XMM registers:
    DEST[63:0] ← SRC[63:0];
    DEST[127:64] ← 0000000000000000H;
MOVQ instruction when source operand is XMM register and destination operand is memory location:
    DEST ← SRC[63:0];
MOVQ instruction when source operand is memory location and destination operand is XMM register:
    DEST[63:0] ← SRC;
    DEST[127:64] ← 0000000000000000H;
VMOVQ (VEX.NDS.128.F3.0F 7E) with XMM register source and destination:
    DEST[63:0] ← SRC[63:0]
    DEST[VLMAX-1:64] ← 0
VMOVQ (VEX.128.66.0F D6) with XMM register source and destination:
    DEST[63:0] ← SRC[63:0]
    DEST[VLMAX-1:64] ← 0
VMOVQ (7E) with memory source:
    DEST[63:0] ← SRC[63:0]
    DEST[VLMAX-1:64] ← 0
VMOVQ (D6) with memory dest:
    DEST[63:0] ← SRC2[63:0]
Flags Affected
None.
Other Exceptions
See Table 22-8, Exception Conditions for Legacy SIMD/MMX Instructions without FP Exception, in the Intel 64 and IA-32 Architectures Software Developers Manual, Volume 3B. ...
Description
Moves the quadword from the source operand (second operand) to the low quadword of the destination operand (first operand). The source operand is an MMX technology register and the destination operand is an XMM register. This instruction causes a transition from x87 FPU to MMX technology operation (that is, the x87 FPU top-of-stack pointer is set to 0 and the x87 FPU tag word is set to all 0s [valid]). If this instruction is executed while an x87 FPU floating-point exception is pending, the exception is handled before the MOVQ2DQ instruction is executed. In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).
Operation
DEST[63:0] ← SRC[63:0];
DEST[127:64] ← 00000000000000000H;
Description | Op/En | 64/32-bit Mode | CPUID Feature Flag
Concatenate destination and source operands, extract byte-aligned result shifted to the right by constant value in imm8 into mm1.
Concatenate destination and source operands, extract byte-aligned result shifted to the right by constant value in imm8 into xmm1. | RMI | V/V | SSSE3
Concatenate xmm2 and xmm3/m128, extract byte aligned result shifted to the right by constant value in imm8 and result is stored in xmm1. | RVMI | V/V | AVX
NOTES: 1. See note in Section 2.4, Instruction Exception Specification in the Intel 64 and IA-32 Architectures Software Developers Manual, Volume 2A and Section 22.25.3, Exception Conditions of Legacy SIMD Instructions Operating on MMX Registers in the Intel 64 and IA-32 Architectures Software Developers Manual, Volume 3A.
Description
PALIGNR concatenates the destination operand (the first operand) and the source operand (the second operand) into an intermediate composite, shifts the composite at byte granularity to the right by a constant immediate, and extracts the right-aligned result into the destination. The first and the second operands can be an MMX or an XMM register. The immediate value is considered unsigned. Immediate shift counts larger than 2L (i.e., 32 for 128-bit operands, or 16 for 64-bit operands) produce a zero result. Both operands can be MMX registers or XMM registers. When the source operand is a 128-bit memory operand, the operand must be aligned on a 16-byte boundary or a general-protection exception (#GP) will be generated.
In 64-bit mode, use the REX prefix to access additional registers.
128-bit Legacy SSE version: Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.
VEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed. VEX.L must be 0, otherwise the instruction will #UD.
Operation
PALIGNR (with 64-bit operands)
    temp1[127:0] = CONCATENATE(DEST,SRC)>>(imm8*8)
    DEST[63:0] = temp1[63:0]
PALIGNR (with 128-bit operands)
    temp1[255:0] = CONCATENATE(DEST,SRC)>>(imm8*8)
    DEST[127:0] = temp1[127:0]
VPALIGNR
    temp1[255:0] ← CONCATENATE(SRC1,SRC2)>>(imm8*8)
    DEST[127:0] ← temp1[127:0]
    DEST[VLMAX-1:128] ← 0
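The 128-bit form of the operation can be illustrated with a byte-array model. This is a minimal sketch in C under stated assumptions: palignr128_model is a hypothetical name, operands are represented as 16-byte arrays with byte 0 as the least-significant byte, and no alignment checks or exceptions are modeled.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical software model of PALIGNR with 128-bit operands.
 * dst is both the first operand and the result; src is the second operand. */
void palignr128_model(uint8_t dst[16], const uint8_t src[16], uint8_t imm8)
{
    uint8_t composite[32];
    uint8_t result[16];

    /* CONCATENATE(DEST, SRC): SRC is the low half, DEST is the high half. */
    memcpy(composite, src, 16);
    memcpy(composite + 16, dst, 16);

    /* Shift right by imm8 bytes and keep the low 16 bytes.
     * Bytes shifted in from beyond the composite are zero. */
    for (unsigned i = 0; i < 16; i++) {
        unsigned idx = (unsigned)imm8 + i;
        result[i] = (idx < 32) ? composite[idx] : 0;
    }
    memcpy(dst, result, 16);
}
```

With imm8 = 4, the result holds bytes 4 through 15 of the source followed by bytes 0 through 3 of the original destination; with imm8 of 32 or more, the result is all zeros.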
Other Exceptions
See Exceptions Type 4; additionally #UD If VEX.L = 1. ...
Compare packed bytes in mm/m64 and mm for equality. Compare packed bytes in xmm2/m128 and xmm1 for equality. Compare packed words in mm/m64 and mm for equality. Compare packed words in xmm2/m128 and xmm1 for equality. Compare packed doublewords in mm/m64 and mm for equality. Compare packed doublewords in xmm2/m128 and xmm1 for equality. Compare packed bytes in xmm3/m128 and xmm2 for equality.
VEX.NDS.128.66.0F.WIG 75 /r VPCMPEQW xmm1, xmm2, xmm3/m128 | AVX | Compare packed words in xmm3/m128 and xmm2 for equality.
VEX.NDS.128.66.0F.WIG 76 /r VPCMPEQD xmm1, xmm2, xmm3/m128 | AVX | Compare packed doublewords in xmm3/m128 and xmm2 for equality.
NOTES:
1. See note in Section 2.4, Instruction Exception Specification in the Intel 64 and IA-32 Architectures Software Developers Manual, Volume 2A and Section 22.25.3, Exception Conditions of Legacy SIMD Instructions Operating on MMX Registers in the Intel 64 and IA-32 Architectures Software Developers Manual, Volume 3A.
Description
Performs a SIMD compare for equality of the packed bytes, words, or doublewords in the destination operand (first operand) and the source operand (second operand). If a pair of data elements is equal, the corresponding data element in the destination operand is set to all 1s; otherwise, it is set to all 0s. The source operand can be an MMX technology register or a 64-bit memory location, or it can be an XMM register or a 128-bit memory location. The destination operand can be an MMX technology register or an XMM register. The PCMPEQB instruction compares the corresponding bytes in the destination and source operands; the PCMPEQW instruction compares the corresponding words in the destination and source operands; and the PCMPEQD instruction compares the corresponding doublewords in the destination and source operands. In 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15). 128-bit Legacy SSE version: Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged. VEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed. VEX.L must be 0, otherwise the instruction will #UD. ...
Description | Op/En | 64/32-bit Mode | CPUID Feature Flag
Insert a byte integer value from r32/m8 into xmm1 at the destination element in xmm1 specified by imm8.
Insert a dword integer value from r/m32 into the xmm1 at the destination element specified by imm8. | RMI | V/V | SSE4_1
Insert a qword integer value from r/m64 into the xmm1 at the destination element specified by imm8. | RMI | V/N.E. | SSE4_1
Merge a byte integer value from r32/m8 and rest from xmm2 into xmm1 at the byte offset in imm8. | RVMI | V1/V | AVX
Insert a dword integer value from r32/m32 and rest from xmm2 into xmm1 at the dword offset in imm8. | RVMI | V/V | AVX
Insert a qword integer value from r64/m64 and rest from xmm2 into xmm1 at the qword offset in imm8. | RVMI | V/I | AVX
Description
Copies a byte/dword/qword from the source operand (second operand) and inserts it in the destination operand (first operand) at the location specified with the count operand (third operand). (The other elements in the destination register are left untouched.) The source operand can be a general-purpose register or a memory location. (When the source operand is a general-purpose register, PINSRB copies the low byte of the register.) The destination operand is an XMM register. The count operand is an 8-bit immediate. When specifying a qword (dword, byte) location in an XMM register, the 1 (2, 4) least-significant bit(s) of the count operand specify the location.
In 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15, R8-15). Use of REX.W permits the use of 64-bit general-purpose registers.
128-bit Legacy SSE version: Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.
VEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed. VEX.L must be 0, otherwise the instruction will #UD. Attempt to execute VPINSRQ in non-64-bit mode will cause #UD.
Operation
CASE OF
    PINSRB: SEL ← COUNT[3:0];
            MASK ← (0FFH << (SEL * 8));
            TEMP ← ((SRC[7:0] << (SEL * 8)) AND MASK);
    PINSRD: SEL ← COUNT[1:0];
            MASK ← (0FFFFFFFFH << (SEL * 32));
            TEMP ← ((SRC << (SEL * 32)) AND MASK);
    PINSRQ: SEL ← COUNT[0];
            MASK ← (0FFFFFFFFFFFFFFFFH << (SEL * 64));
            TEMP ← ((SRC << (SEL * 64)) AND MASK);
ESAC;
DEST ← ((DEST AND NOT MASK) OR TEMP);

VPINSRB (VEX.128 encoded version)
SEL ← imm8[3:0]
DEST[127:0] ← write_b_element(SEL, SRC2, SRC1)
DEST[VLMAX-1:128] ← 0

VPINSRD (VEX.128 encoded version)
SEL ← imm8[1:0]
DEST[127:0] ← write_d_element(SEL, SRC2, SRC1)
DEST[VLMAX-1:128] ← 0

VPINSRQ (VEX.128 encoded version)
SEL ← imm8[0]
DEST[127:0] ← write_q_element(SEL, SRC2, SRC1)
DEST[VLMAX-1:128] ← 0
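The legacy SSE insert operations can be modeled by treating the XMM register as an array of elements, which has the same effect as the mask-and-merge sequence in the pseudocode. This is a minimal sketch in C; the function names are hypothetical, and the only behavior modeled is the element selection by the low bits of the count operand.

```c
#include <stdint.h>

/* Hypothetical software models of PINSRB/PINSRD/PINSRQ.
 * Element 0 of each array is the least-significant element of the register.
 * Only the low 4, 2, or 1 bit(s) of imm8 select the destination element;
 * all other elements are left untouched. */

void pinsrb_model(uint8_t xmm[16], uint32_t src, uint8_t imm8)
{
    xmm[imm8 & 0xFu] = (uint8_t)src;   /* low byte of the source register */
}

void pinsrd_model(uint32_t xmm[4], uint32_t src, uint8_t imm8)
{
    xmm[imm8 & 0x3u] = src;
}

void pinsrq_model(uint64_t xmm[2], uint64_t src, uint8_t imm8)
{
    xmm[imm8 & 0x1u] = src;
}
```

For example, pinsrb_model with imm8 = 13H writes byte element 3, because only imm8[3:0] participates in the selection.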
Flags Affected
None.
Other Exceptions
See Exceptions Type 5; additionally #UD If VEX.L = 1. If VPINSRQ in non-64-bit mode with VEX.W=1. ...
Multiply signed and unsigned bytes, add horizontal pair of signed words, pack saturated signed-words to MM1. Multiply signed and unsigned bytes, add horizontal pair of signed words, pack saturated signed-words to XMM1. Multiply signed and unsigned bytes, add horizontal pair of signed words, pack saturated signed-words to xmm1.
Description
PMADDUBSW multiplies vertically each unsigned byte of the destination operand (first operand) with the corresponding signed byte of the source operand (second operand), producing intermediate signed 16-bit integers. Each adjacent pair of signed words is added and the saturated result is packed to the destination operand. For example, the lowest-order bytes (bits 7-0) in the source and destination operands are multiplied and the intermediate signed word result is added with the corresponding intermediate result from the 2nd lowest-order bytes (bits 15-8) of the operands; the sign-saturated result is stored in the lowest word of the destination register (15-0). The same operation is performed on the other pairs of adjacent bytes. Both operands can be MMX registers or XMM registers. When the source operand is a 128-bit memory operand, the operand must be aligned on a 16-byte boundary or a general-protection exception (#GP) will be generated.
In 64-bit mode, use the REX prefix to access additional registers.
128-bit Legacy SSE version: Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.
VEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed. VEX.L must be 0, otherwise the instruction will #UD.
Operation
PMADDUBSW (with 64 bit operands)
    DEST[15-0] = SaturateToSignedWord(SRC[15-8]*DEST[15-8]+SRC[7-0]*DEST[7-0]);
    DEST[31-16] = SaturateToSignedWord(SRC[31-24]*DEST[31-24]+SRC[23-16]*DEST[23-16]);
    DEST[47-32] = SaturateToSignedWord(SRC[47-40]*DEST[47-40]+SRC[39-32]*DEST[39-32]);
    DEST[63-48] = SaturateToSignedWord(SRC[63-56]*DEST[63-56]+SRC[55-48]*DEST[55-48]);
PMADDUBSW (with 128 bit operands)
    DEST[15-0] = SaturateToSignedWord(SRC[15-8]*DEST[15-8]+SRC[7-0]*DEST[7-0]);
    // Repeat operation for 2nd through 7th word
    SRC1/DEST[127-112] = SaturateToSignedWord(SRC[127-120]*DEST[127-120]+SRC[119-112]*DEST[119-112]);
VPMADDUBSW (VEX.128 encoded version)
    DEST[15:0] ← SaturateToSignedWord(SRC2[15:8]*SRC1[15:8]+SRC2[7:0]*SRC1[7:0])
    // Repeat operation for 2nd through 7th word
    DEST[127:112] ← SaturateToSignedWord(SRC2[127:120]*SRC1[127:120]+SRC2[119:112]*SRC1[119:112])
    DEST[VLMAX-1:128] ← 0
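The multiply, horizontal add, and saturate sequence can be sketched as a scalar loop. This is a minimal C model under stated assumptions: the names pmaddubsw128_model and saturate_to_signed_word are illustrative, the first operand is passed as unsigned bytes and the second as signed bytes, and the result is written to a separate output array for clarity.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate to the signed 16-bit range. */
static int16_t saturate_to_signed_word(int32_t value)
{
    if (value > INT16_MAX) {
        return INT16_MAX;
    }
    if (value < INT16_MIN) {
        return INT16_MIN;
    }
    return (int16_t)value;
}

/* Hypothetical software model of PMADDUBSW with 128-bit operands.
 * dst_u8 holds the unsigned bytes of the first operand, src_s8 the signed
 * bytes of the second operand; out receives the eight saturated words. */
void pmaddubsw128_model(const uint8_t dst_u8[16], const int8_t src_s8[16],
                        int16_t out[8])
{
    for (int w = 0; w < 8; w++) {
        int32_t lo = (int32_t)dst_u8[2 * w] * (int32_t)src_s8[2 * w];
        int32_t hi = (int32_t)dst_u8[2 * w + 1] * (int32_t)src_s8[2 * w + 1];
        out[w] = saturate_to_signed_word(lo + hi);
    }
}
```

Saturation matters because two products of 255 * 127 sum to 64770, which exceeds the signed word range and is therefore clamped to 32767.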
Other Exceptions
See Exceptions Type 4; additionally #UD If VEX.L = 1. ...
Shuffle the high words in xmm2/m128 based on the encoding in imm8 and store the result in xmm1. Shuffle the high words in xmm2/m128 based on the encoding in imm8 and store the result in xmm1.
Description
Copies words from the high quadword of the source operand (second operand) and inserts them in the high quadword of the destination operand (first operand) at word locations selected with the order operand (third operand). This operation is similar to the operation used by the PSHUFD instruction, which is illustrated in Figure 4-11. For the PSHUFHW instruction, each 2-bit field in the order operand selects the contents of one word location in the high quadword of the destination operand. The binary encodings of the order operand fields select words (0, 1, 2, or 3) from the high quadword of the source operand to be copied to the destination operand. The low quadword of the source operand is copied to the low quadword of the destination operand.
The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register. The order operand is an 8-bit immediate. Note that this instruction permits a word in the high quadword of the source operand to be copied to more than one word location in the high quadword of the destination operand.
In 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).
128-bit Legacy SSE version: Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.
VEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed. VEX.vvvv is reserved and must be 1111b, and VEX.L must be 0; otherwise the instruction will #UD.
Operation
PSHUFHW (128-bit Legacy SSE version)
DEST[63:0] ← SRC[63:0]
DEST[79:64] ← (SRC >> (imm[1:0] * 16))[79:64]
DEST[95:80] ← (SRC >> (imm[3:2] * 16))[79:64]
DEST[111:96] ← (SRC >> (imm[5:4] * 16))[79:64]
DEST[127:112] ← (SRC >> (imm[7:6] * 16))[79:64]
DEST[VLMAX-1:128] (Unmodified)

VPSHUFHW (VEX.128 encoded version)
DEST[63:0] ← SRC1[63:0]
DEST[79:64] ← (SRC1 >> (imm[1:0] * 16))[79:64]
DEST[95:80] ← (SRC1 >> (imm[3:2] * 16))[79:64]
DEST[111:96] ← (SRC1 >> (imm[5:4] * 16))[79:64]
DEST[127:112] ← (SRC1 >> (imm[7:6] * 16))[79:64]
DEST[VLMAX-1:128] ← 0
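The word selection performed by the shuffle can be sketched with a word-array model. This is a minimal C illustration; pshufhw_model is a hypothetical name, and words are indexed with word 0 as bits 15:0, so the high quadword occupies words 4 through 7.

```c
#include <stdint.h>

/* Hypothetical software model of PSHUFHW on a 128-bit operand.
 * Each 2-bit field of imm8 selects one of the four words in the high
 * quadword of src; the low quadword is copied unchanged. */
void pshufhw_model(const uint16_t src[8], uint8_t imm8, uint16_t dst[8])
{
    uint16_t result[8];

    for (int i = 0; i < 4; i++) {
        result[i] = src[i];                        /* low quadword */
    }
    for (int i = 0; i < 4; i++) {
        unsigned sel = (imm8 >> (2 * i)) & 0x3u;   /* imm[2i+1:2i] */
        result[4 + i] = src[4 + sel];              /* high quadword */
    }
    /* Copy out last so that dst may alias src. */
    for (int i = 0; i < 8; i++) {
        dst[i] = result[i];
    }
}
```

An order operand of 1BH reverses the four high words, and an order operand of 00H replicates word 4 into all four high word positions, which shows that one source word may be copied to more than one destination location.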
Flags Affected
None.
Other Exceptions
See Exceptions Type 4; additionally #UD If VEX.L = 1. If VEX.vvvv != 1111B. ...
Shift words in mm left by mm/m64 while shifting in 0s. Shift words in xmm1 left by xmm2/m128 while shifting in 0s. Shift words in mm left by imm8 while shifting in 0s. Shift words in xmm1 left by imm8 while shifting in 0s. Shift doublewords in mm left by mm/m64 while shifting in 0s. Shift doublewords in xmm1 left by xmm2/m128 while shifting in 0s. Shift doublewords in mm left by imm8 while shifting in 0s. Shift doublewords in xmm1 left by imm8 while shifting in 0s. Shift quadword in mm left by mm/m64 while shifting in 0s. Shift quadwords in xmm1 left by xmm2/m128 while shifting in 0s. Shift quadword in mm left by imm8 while shifting in 0s. Shift quadwords in xmm1 left by imm8 while shifting in 0s. Shift words in xmm2 left by amount specified in xmm3/m128 while shifting in 0s. Shift words in xmm2 left by imm8 while shifting in 0s. Shift doublewords in xmm2 left by amount specified in xmm3/m128 while shifting in 0s. Shift doublewords in xmm2 left by imm8 while shifting in 0s. Shift quadwords in xmm2 left by amount specified in xmm3/m128 while shifting in 0s.
VMI | V/V | AVX
1. See note in Section 2.4, Instruction Exception Specification in the Intel 64 and IA-32 Architectures Software Developers Manual, Volume 2A and Section 22.25.3, Exception Conditions of Legacy SIMD Instructions Operating on MMX Registers in the Intel 64 and IA-32 Architectures Software Developers Manual, Volume 3A.
Description
Shifts the bits in the individual data elements (words, doublewords, or quadword) in the destination operand (first operand) to the left by the number of bits specified in the count operand (second operand). As the bits in the data elements are shifted left, the empty low-order bits are cleared (set to 0). If the value specified by the count operand is greater than 15 (for words), 31 (for doublewords), or 63 (for a quadword), then the destination operand is set to all 0s. Figure 4-12 gives an example of shifting words in a 64-bit operand. The destination operand may be an MMX technology register or an XMM register; the count operand can be either an MMX technology register or a 64-bit memory location, an XMM register or a 128-bit memory location, or an 8-bit immediate. Note that only the first 64 bits of a 128-bit count operand are checked to compute the count. ...
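The count handling for the word form can be sketched as follows. This is a minimal C model with a hypothetical function name, psllw128_model; the count is passed as a 64-bit value to reflect that only the low 64 bits of a 128-bit count operand are examined.

```c
#include <stdint.h>

/* Hypothetical software model of PSLLW on a 128-bit destination.
 * count stands for the low 64 bits of the count operand. Any count
 * greater than 15 clears every word of the destination. */
void psllw128_model(uint16_t dst[8], uint64_t count)
{
    for (int i = 0; i < 8; i++) {
        if (count > 15) {
            dst[i] = 0;
        } else {
            /* Widen before shifting, then truncate back to 16 bits. */
            dst[i] = (uint16_t)((uint32_t)dst[i] << (unsigned)count);
        }
    }
}
```

The comparison is made on the full 64-bit count rather than on its low bits, so a count such as 100000000H also produces an all-zero result instead of wrapping around to a shift by 0.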
Description
Decrements the stack pointer and then stores the source operand on the top of the stack. Address and operand sizes are determined and used as follows: Address size. The D flag in the current code-segment descriptor determines the default address size; it may be overridden by an instruction prefix (67H). The address size is used only when referencing a source operand in memory. Operand size. The D flag in the current code-segment descriptor determines the default operand size; it may be overridden by instruction prefixes (66H or REX.W). The operand size (16, 32, or 64 bits) determines the amount by which the stack pointer is decremented (2, 4 or 8). If the source operand is an immediate and its size is less than the operand size, a sign-extended value is pushed on the stack. If the source operand is a segment register (16 bits) and the operand size is greater than 16 bits, a zero-extended value is pushed on the stack.
Stack-address size. Outside of 64-bit mode, the B flag in the current stack-segment descriptor determines the size of the stack pointer (16 or 32 bits); in 64-bit mode, the size of the stack pointer is always 64 bits. The stack-address size determines the width of the stack pointer when writing to the stack in memory and when decrementing the stack pointer. (As stated above, the amount by which the stack pointer is decremented is determined by the operand size.) If the operand size is less than the stack-address size, the PUSH instruction may result in a misaligned stack pointer (a stack pointer that is not aligned on a doubleword or quadword boundary).
The PUSH ESP instruction pushes the value of the ESP register as it existed before the instruction was executed. If a PUSH instruction uses a memory operand in which the ESP register is used for computing the operand address, the address of the operand is computed before the ESP register is decremented. If the ESP or SP register is 1 when the PUSH instruction is executed in real-address mode, a stack-fault exception (#SS) is generated (because the limit of the stack segment is violated). Its delivery encounters a second stack-fault exception (for the same reason), causing generation of a double-fault exception (#DF). Delivery of the double-fault exception encounters a third stack-fault exception, and the logical processor enters shutdown mode. See the discussion of the double-fault exception in Chapter 6 of the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A. ...
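The interaction between operand size and stack-address size described above can be sketched in C. The helper `push_new_sp` is hypothetical; it assumes the incoming stack pointer already fits in the stack-address width and does not check whether a given size combination is encodable in a particular mode.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: stack pointer value after a PUSH.
   operand_bits sets the decrement (2, 4, or 8 bytes);
   stack_addr_bits sets the width at which the pointer wraps. */
static uint64_t push_new_sp(uint64_t sp, unsigned operand_bits,
                            unsigned stack_addr_bits)
{
    assert(operand_bits == 16 || operand_bits == 32 || operand_bits == 64);
    assert(stack_addr_bits == 16 || stack_addr_bits == 32 ||
           stack_addr_bits == 64);
    uint64_t mask = (stack_addr_bits == 64)
                        ? UINT64_MAX
                        : ((UINT64_C(1) << stack_addr_bits) - 1);
    return (sp - operand_bits / 8) & mask;
}
```

With a 32-bit stack-address size, a 16-bit push from 0x1000 yields 0xFFE, which is no longer aligned on a doubleword boundary.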
RCL/RCR/ROL/ROR-Rotate
Opcode** D0 /2 REX + D0 /2 D2 /2 REX + D2 /2 C0 /2 ib REX + C0 /2 ib D1 /2 D3 /2 C1 /2 ib D1 /2 REX.W + D1 /2 D3 /2 REX.W + D3 /2 C1 /2 ib REX.W + C1 /2 ib D0 /3 REX + D0 /3 D2 /3 REX + D2 /3 C0 /3 ib Instruction RCL r/m8, 1 RCL r/m8*, 1 RCL r/m8, CL RCL r/m8*, CL RCL r/m8, imm8 RCL r/m8*, imm8 RCL r/m16, 1 RCL r/m16, CL RCL r/m16, imm8 RCL r/m32, 1 RCL r/m64, 1 RCL r/m32, CL RCL r/m64, CL RCL r/m32, imm8 RCL r/m64, imm8 RCR r/m8, 1 RCR r/m8*, 1 RCR r/m8, CL RCR r/m8*, CL RCR r/m8, imm8 Op/ En M1 M1 MC MC MI MI M1 MC MI M1 M1 MC MC MI MI M1 M1 MC MC MI 64-Bit Mode Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Compat/ Description Leg Mode Valid N.E. Valid N.E. Valid N.E. Valid Valid Valid Valid N.E. Valid N.E. Valid N.E. Valid N.E. Valid N.E. Valid Rotate 9 bits (CF, r/m8) left once. Rotate 9 bits (CF, r/m8) left once. Rotate 9 bits (CF, r/m8) left CL times. Rotate 9 bits (CF, r/m8) left CL times. Rotate 9 bits (CF, r/m8) left imm8 times. Rotate 9 bits (CF, r/m8) left imm8 times. Rotate 17 bits (CF, r/m16) left once. Rotate 17 bits (CF, r/m16) left CL times. Rotate 17 bits (CF, r/m16) left imm8 times. Rotate 33 bits (CF, r/m32) left once. Rotate 65 bits (CF, r/m64) left once. Uses a 6 bit count. Rotate 33 bits (CF, r/m32) left CL times. Rotate 65 bits (CF, r/m64) left CL times. Uses a 6 bit count. Rotate 33 bits (CF, r/m32) left imm8 times. Rotate 65 bits (CF, r/m64) left imm8 times. Uses a 6 bit count. Rotate 9 bits (CF, r/m8) right once. Rotate 9 bits (CF, r/m8) right once. Rotate 9 bits (CF, r/m8) right CL times. Rotate 9 bits (CF, r/m8) right CL times. Rotate 9 bits (CF, r/m8) right imm8 times.
RCR r/m8*, imm8 RCR r/m16, 1 RCR r/m16, CL RCR r/m16, imm8 RCR r/m32, 1 RCR r/m64, 1 RCR r/m32, CL RCR r/m64, CL RCR r/m32, imm8 RCR r/m64, imm8 ROL r/m8, 1 ROL r/m8*, 1 ROL r/m8, CL ROL r/m8*, CL ROL r/m8, imm8
MI M1 MC MI M1 M1 MC MC MI MI M1 M1 MC MC MI
Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid
N.E. Valid Valid Valid Valid N.E. Valid N.E. Valid N.E. Valid N.E. Valid N.E. Valid
Rotate 9 bits (CF, r/m8) right imm8 times. Rotate 17 bits (CF, r/m16) right once. Rotate 17 bits (CF, r/m16) right CL times. Rotate 17 bits (CF, r/m16) right imm8 times. Rotate 33 bits (CF, r/m32) right once. Rotate 65 bits (CF, r/m64) right once. Uses a 6 bit count. Rotate 33 bits (CF, r/m32) right CL times. Rotate 65 bits (CF, r/m64) right CL times. Uses a 6 bit count. Rotate 33 bits (CF, r/m32) right imm8 times. Rotate 65 bits (CF, r/m64) right imm8 times. Uses a 6 bit count. Rotate 8 bits r/m8 left once. Rotate 8 bits r/m8 left once. Rotate 8 bits r/m8 left CL times. Rotate 8 bits r/m8 left CL times. Rotate 8 bits r/m8 left imm8 times.
Opcode** REX + C0 /0 ib D1 /0 D3 /0 C1 /0 ib D1 /0
Instruction ROL r/m8*, imm8 ROL r/m16, 1 ROL r/m16, CL ROL r/m16, imm8 ROL r/m32, 1
Op/ En MI M1 MC MI M1
Compat/ Description Leg Mode N.E. Valid Valid Valid Valid Rotate 8 bits r/m8 left imm8 times. Rotate 16 bits r/m16 left once. Rotate 16 bits r/m16 left CL times. Rotate 16 bits r/m16 left imm8 times. Rotate 32 bits r/m32 left once.
ROL r/m64, 1 ROL r/m32, CL ROL r/m64, CL ROL r/m32, imm8 ROL r/m64, imm8 ROR r/m8, 1 ROR r/m8*, 1 ROR r/m8, CL ROR r/m8*, CL ROR r/m8, imm8 ROR r/m8*, imm8 ROR r/m16, 1 ROR r/m16, CL ROR r/m16, imm8 ROR r/m32, 1 ROR r/m64, 1 ROR r/m32, CL ROR r/m64, CL ROR r/m32, imm8 ROR r/m64, imm8
M1 MC MC MI MI M1 M1 MC MC MI MI M1 MC MI M1 M1 MC MC MI MI
Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid Valid
N.E. Valid N.E. Valid N.E. Valid N.E. Valid N.E. Valid N.E. Valid Valid Valid Valid N.E. Valid N.E. Valid N.E.
Rotate 64 bits r/m64 left once. Uses a 6 bit count. Rotate 32 bits r/m32 left CL times. Rotate 64 bits r/m64 left CL times. Uses a 6 bit count. Rotate 32 bits r/m32 left imm8 times. Rotate 64 bits r/m64 left imm8 times. Uses a 6 bit count. Rotate 8 bits r/m8 right once. Rotate 8 bits r/m8 right once. Rotate 8 bits r/m8 right CL times. Rotate 8 bits r/m8 right CL times. Rotate 8 bits r/m8 right imm8 times. Rotate 8 bits r/m8 right imm8 times. Rotate 16 bits r/m16 right once. Rotate 16 bits r/m16 right CL times. Rotate 16 bits r/m16 right imm8 times. Rotate 32 bits r/m32 right once. Rotate 64 bits r/m64 right once. Uses a 6 bit count. Rotate 32 bits r/m32 right CL times. Rotate 64 bits r/m64 right CL times. Uses a 6 bit count. Rotate 32 bits r/m32 right imm8 times. Rotate 64 bits r/m64 right imm8 times. Uses a 6 bit count.
NOTES:
* In 64-bit mode, r/m8 cannot be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.
** See IA-32 Architecture Compatibility section below.
Op/En MC MI
Operand 2 CL imm8
Operand 3 NA NA
Operand 4 NA NA
Description
Shifts (rotates) the bits of the first operand (destination operand) the number of bit positions specified in the second operand (count operand) and stores the result in the destination operand. The destination operand can be a register or a memory location; the count operand is an unsigned integer that can be an immediate or a value in the CL register. In legacy and compatibility mode, the processor restricts the count to a number between 0 and 31 by masking all the bits in the count operand except the 5 least-significant bits. The rotate left (ROL) and rotate through carry left (RCL) instructions shift all the bits toward more-significant bit positions, except for the most-significant bit, which is rotated to the least-significant bit location. The rotate right (ROR) and rotate through carry right (RCR) instructions shift all the bits toward less-significant bit positions, except for the least-significant bit, which is rotated to the most-significant bit location. The RCL and RCR instructions include the CF flag in the rotation. The RCL instruction shifts the CF flag into the least-significant bit and shifts the most-significant bit into the CF flag. The RCR instruction shifts the CF flag into the most-significant bit and shifts the least-significant bit into the CF flag. For the ROL and ROR instructions, the original value of the CF flag is not a part of the result, but the CF flag receives a copy of the bit that was shifted from one end to the other. The OF flag is defined only for the 1-bit rotates; it is undefined in all other cases (except that, for the RCL and RCR instructions only, a zero-bit rotate does nothing; that is, it affects no flags). For left rotates, the OF flag is set to the exclusive OR of the CF bit (after the rotate) and the most-significant bit of the result. For right rotates, the OF flag is set to the exclusive OR of the two most-significant bits of the result.
In 64-bit mode, using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Use of REX.W promotes the first operand to 64 bits and causes the count operand to become a 6-bit counter. ...
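The difference between rotating through the carry flag and rotating around it can be sketched for the 8-bit case. The names `rot8_result`, `rcl8_model`, and `rol8_model` are hypothetical, the count is masked to 5 bits as in legacy and compatibility mode, and the OF flag is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t value; /* rotated operand */
    bool cf;       /* carry flag after the rotate */
} rot8_result;

/* RCL r/m8: a 9-bit rotate in which CF sits above the most-significant bit. */
static rot8_result rcl8_model(uint8_t value, bool cf, unsigned count)
{
    count &= 0x1F;     /* keep the 5 least-significant bits */
    assert(count < 32);
    for (unsigned i = 0; i < count; i++) {
        bool msb = (value & 0x80) != 0;
        value = (uint8_t)((value << 1) | (cf ? 1 : 0)); /* CF enters bit 0 */
        cf = msb;                                       /* MSB leaves into CF */
    }
    return (rot8_result){ value, cf };
}

/* ROL r/m8: an 8-bit rotate; CF only receives a copy of the moved bit. */
static rot8_result rol8_model(uint8_t value, bool cf, unsigned count)
{
    count &= 0x1F;
    assert(count < 32);
    for (unsigned i = 0; i < count; i++) {
        bool msb = (value & 0x80) != 0;
        value = (uint8_t)((value << 1) | (msb ? 1 : 0)); /* MSB wraps to bit 0 */
        cf = msb;
    }
    return (rot8_result){ value, cf };
}
```

Because RCL on a byte rotates 9 bits, a count of 9 returns both the operand and CF to their starting values, whereas ROL returns the operand to its starting value after 8 steps.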
Round the low packed single precision floating-point value in xmm2/m32 and place the result in xmm1. The rounding mode is determined by imm8. Round the low packed single precision floating-point value in xmm3/m32 and place the result in xmm1. The rounding mode is determined by imm8. Also, upper packed single precision floating-point values (bits[127:32]) from xmm2 are copied to xmm1[127:32].
RVMI V/V
AVX
Description
Round the single-precision floating-point value in the lowest dword of the source operand (second operand) using the rounding mode specified in the immediate operand (third operand) and place the result in the destination operand (first operand). The rounding process rounds a single-precision floating-point input to an integer value and returns the result as a single-precision floating-point value in the lowest position. The upper three single-precision floating-point values in the destination are retained. The immediate operand specifies control fields for the rounding operation; three bit fields are defined and shown in Figure 4-17. Bit 3 of the immediate byte controls processor behavior for a precision exception, bit 2 selects the source of rounding mode control. Bits 1:0 specify a non-sticky rounding-mode value (Table 4-17 lists the encoded values for rounding-mode field). The Precision Floating-Point Exception is signaled according to the immediate operand. If any source operand is an SNaN then it will be converted to a QNaN. If DAZ is set to 1 then denormals will be converted to zero before rounding. 128-bit Legacy SSE version: The first source operand and the destination operand are the same. Bits (VLMAX-1:32) of the corresponding YMM destination register remain unchanged. VEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.
Operation
IF (imm[2] = 1)
    THEN // rounding mode is determined by MXCSR.RC
        DEST[31:0] ← ConvertSPFPToInteger_M(SRC[31:0]);
    ELSE // rounding mode is determined by IMM8.RC
        DEST[31:0] ← ConvertSPFPToInteger_Imm(SRC[31:0]);
FI;
DEST[127:32] remains unchanged;
ROUNDSS (128-bit Legacy SSE version)
DEST[31:0] ← RoundToInteger(SRC[31:0], ROUND_CONTROL)
DEST[VLMAX-1:32] (Unmodified)

VROUNDSS (VEX.128 encoded version)
DEST[31:0] ← RoundToInteger(SRC2[31:0], ROUND_CONTROL)
DEST[127:32] ← SRC1[127:32]
DEST[VLMAX-1:128] ← 0
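How the immediate byte selects the rounding control can be sketched as a small decoder. The type `round_ctl` and the function `decode_round_imm8` are hypothetical names; the sketch only extracts the three bit fields named in the description and does not interpret the rounding-mode encodings or the polarity of bit 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    unsigned rc;        /* effective 2-bit rounding control */
    bool use_mxcsr;     /* imm8 bit 2: rounding control comes from MXCSR.RC */
    bool precision_ctl; /* raw value of imm8 bit 3 */
} round_ctl;

/* mxcsr_rc is the current 2-bit MXCSR.RC field, supplied by the caller. */
static round_ctl decode_round_imm8(uint8_t imm8, unsigned mxcsr_rc)
{
    assert(mxcsr_rc <= 3);
    round_ctl c;
    c.use_mxcsr = (imm8 & 0x04) != 0;
    c.rc = c.use_mxcsr ? mxcsr_rc : (unsigned)(imm8 & 0x03);
    c.precision_ctl = (imm8 & 0x08) != 0;
    return c;
}
```

For example, an immediate of 07H with MXCSR.RC equal to 1 yields an effective rounding control of 1, because bit 2 hands the choice to MXCSR.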
Other Exceptions
See Exceptions Type 3. ...
Description
The SHLD instruction is used for multi-precision shifts of 64 bits or more. The instruction shifts the first operand (destination operand) to the left the number of bits specified by the third operand (count operand). The second operand (source operand) provides bits to shift in from the right (starting with bit 0 of the destination operand). The destination operand can be a register or a memory location; the source operand is a register. The count operand is an unsigned integer that can be stored in an immediate byte or in the CL register. If the count operand is CL, the shift count is the logical AND of CL and a count mask. In non-64-bit modes and default 64-bit mode, only bits 0 through 4 of the count are used. This masks the count to a value between 0 and 31. If a count is greater than the operand size, the result is undefined. If the count is 1 or greater, the CF flag is filled with the last bit shifted out of the destination operand. For a 1-bit shift, the OF flag is set if a sign change occurred; otherwise, it is cleared. If the count operand is 0, flags are not affected. In 64-bit mode, the instruction's default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits (upgrading the count mask to 6 bits). See the summary chart at the beginning of this section for encoding data and limits.
Operation
IF (In 64-Bit Mode and REX.W = 1)
    THEN COUNT ← COUNT MOD 64;
    ELSE COUNT ← COUNT MOD 32;
FI
SIZE ← OperandSize;
IF COUNT = 0
    THEN
        No operation;
    ELSE
        IF COUNT > SIZE
            THEN (* Bad parameters *)
                DEST is undefined;
                CF, OF, SF, ZF, AF, PF are undefined;
            ELSE (* Perform the shift *)
                CF ← BIT[DEST, SIZE - COUNT];
                (* Last bit shifted out on exit *)
                FOR i ← SIZE - 1 DOWN TO COUNT
                    DO
                        BIT[DEST, i] ← BIT[DEST, i - COUNT];
                    OD;
                FOR i ← COUNT - 1 DOWN TO 0
                    DO
                        BIT[DEST, i] ← BIT[SRC, i - COUNT + SIZE];
                    OD;
        FI;
FI;
...
Description
The SHRD instruction is useful for multi-precision shifts of 64 bits or more. The instruction shifts the first operand (destination operand) to the right the number of bits specified by the third operand (count operand). The second operand (source operand) provides bits to shift in from the left (starting with the most significant bit of the destination operand). The destination operand can be a register or a memory location; the source operand is a register. The count operand is an unsigned integer that can be stored in an immediate byte or the CL register. If the count operand is CL, the shift count is the logical AND of CL and a count mask. In non-64-bit modes and default 64-bit mode, the width of the count mask is 5 bits. Only bits 0 through 4 of the count register are used (masking the count to a value between 0 and 31). If the count is greater than the operand size, the result is undefined. If the count is 1 or greater, the CF flag is filled with the last bit shifted out of the destination operand. For a 1-bit shift, the OF flag is set if a sign change occurred; otherwise, it is cleared. If the count operand is 0, flags are not affected. In 64-bit mode, the instruction's default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits (upgrading the count mask to 6 bits). See the summary chart at the beginning of this section for encoding data and limits. ...
Description
Stores the segment selector from the local descriptor table register (LDTR) in the destination operand. The destination operand can be a general-purpose register or a memory location. The segment selector stored with this instruction points to the segment descriptor (located in the GDT) for the current LDT. This instruction can only be executed in protected mode. Outside IA-32e mode, when the destination operand is a 32-bit register, the 16-bit segment selector is copied into the low-order 16 bits of the register. The high-order 16 bits of the register are cleared for the Pentium 4, Intel Xeon, and P6 family processors. They are undefined for Pentium, Intel486, and Intel386 processors. When the destination operand is a memory location, the segment selector is written to memory as a 16-bit quantity, regardless of the operand size. In compatibility mode, when the destination operand is a 32-bit register, the 16-bit segment selector is copied into the low-order 16 bits of the register. The high-order 16 bits of the register are cleared. When the destination operand is a memory location, the segment selector is written to memory as a 16-bit quantity, regardless of the operand size. In 64-bit mode, using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). The behavior of SLDT with a 64-bit register is to zero-extend the 16-bit selector and store it in the register. If the destination is memory and operand size is 64, SLDT will write the 16-bit selector to memory as a 16-bit quantity, regardless of the operand size ...
Computes square root of the low double-precision floating-point value in xmm2/m64 and stores the results in xmm1. Computes square root of the low double-precision floating-point value in xmm3/m64 and stores the results in xmm1. Also, upper double-precision floating-point value (bits[127:64]) from xmm2 is copied to xmm1[127:64].
Description
Computes the square root of the low double-precision floating-point value in the source operand (second operand) and stores the double-precision floating-point result in the destination operand. The source operand can be an XMM register or a 64-bit memory location. The destination operand is an XMM register. The high quadword of the destination operand remains unchanged. See Figure 11-4 in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1, for an illustration of a scalar double-precision floating-point operation. In 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15). 128-bit Legacy SSE version: The first source operand and the destination operand are the same. Bits (VLMAX-1:64) of the corresponding YMM destination register remain unchanged. VEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.
Computes square root of the low single-precision floating-point value in xmm2/m32 and stores the results in xmm1. Computes square root of the low single-precision floating-point value in xmm3/m32 and stores the results in xmm1. Also, upper single-precision floating-point values (bits[127:32]) from xmm2 are copied to xmm1[127:32].
Description
Computes the square root of the low single-precision floating-point value in the source operand (second operand) and stores the single-precision floating-point result in the destination operand. The source operand can be an XMM register or a 32-bit memory location. The destination operand is an XMM register. The three high-order doublewords of the destination operand remain unchanged. See Figure 10-6 in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1, for an illustration of a scalar single-precision floating-point operation. In 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15). 128-bit Legacy SSE version: The first source operand and the destination operand are the same. Bits (VLMAX-1:32) of the corresponding YMM destination register remain unchanged. VEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed. ...
SUB-Subtract
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
2C ib | SUB AL, imm8 | I | Valid | Valid | Subtract imm8 from AL.
2D iw | SUB AX, imm16 | I | Valid | Valid | Subtract imm16 from AX.
2D id | SUB EAX, imm32 | I | Valid | Valid | Subtract imm32 from EAX.
REX.W + 2D id | SUB RAX, imm32 | I | Valid | N.E. | Subtract imm32 sign-extended to 64-bits from RAX.
80 /5 ib | SUB r/m8, imm8 | MI | Valid | Valid | Subtract imm8 from r/m8.
REX + 80 /5 ib | SUB r/m8*, imm8 | MI | Valid | N.E. | Subtract imm8 from r/m8.
81 /5 iw | SUB r/m16, imm16 | MI | Valid | Valid | Subtract imm16 from r/m16.
81 /5 id | SUB r/m32, imm32 | MI | Valid | Valid | Subtract imm32 from r/m32.
REX.W + 81 /5 id | SUB r/m64, imm32 | MI | Valid | N.E. | Subtract imm32 sign-extended to 64-bits from r/m64.
83 /5 ib | SUB r/m16, imm8 | MI | Valid | Valid | Subtract sign-extended imm8 from r/m16.
83 /5 ib | SUB r/m32, imm8 | MI | Valid | Valid | Subtract sign-extended imm8 from r/m32.
REX.W + 83 /5 ib | SUB r/m64, imm8 | MI | Valid | N.E. | Subtract sign-extended imm8 from r/m64.
28 /r | SUB r/m8, r8 | MR | Valid | Valid | Subtract r8 from r/m8.
REX + 28 /r | SUB r/m8*, r8* | MR | Valid | N.E. | Subtract r8 from r/m8.
29 /r | SUB r/m16, r16 | MR | Valid | Valid | Subtract r16 from r/m16.
29 /r | SUB r/m32, r32 | MR | Valid | Valid | Subtract r32 from r/m32.
REX.W + 29 /r | SUB r/m64, r64 | MR | Valid | N.E. | Subtract r64 from r/m64.
2A /r | SUB r8, r/m8 | RM | Valid | Valid | Subtract r/m8 from r8.
REX + 2A /r | SUB r8*, r/m8* | RM | Valid | N.E. | Subtract r/m8 from r8.
2B /r | SUB r16, r/m16 | RM | Valid | Valid | Subtract r/m16 from r16.
2B /r | SUB r32, r/m32 | RM | Valid | Valid | Subtract r/m32 from r32.
REX.W + 2B /r | SUB r64, r/m64 | RM | Valid | N.E. | Subtract r/m64 from r64.
NOTES: * In 64-bit mode, r/m8 cannot be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.
Description
Subtracts the second operand (source operand) from the first operand (destination operand) and stores the result in the destination operand. The destination operand can be a register or a memory location; the source operand can be an immediate, register, or memory location. (However, two memory operands cannot be used in one
instruction.) When an immediate value is used as an operand, it is sign-extended to the length of the destination operand format. ...
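The sign-extension rule for immediates matters most for the imm8 forms, where an immediate byte of FFH subtracts -1 rather than 255. A minimal C sketch of the r/m32, imm8 case follows; the function name `sub_r32_imm8` is hypothetical and flags are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of SUB r/m32, imm8 (result only). */
static uint32_t sub_r32_imm8(uint32_t dest, uint8_t imm8)
{
    /* Sign-extend the immediate byte to the 32-bit destination width. */
    uint32_t src = (imm8 & 0x80) ? (0xFFFFFF00u | imm8) : (uint32_t)imm8;
    uint32_t result = dest - src;   /* unsigned arithmetic wraps modulo 2^32 */
    assert(result + src == dest);   /* sanity check on the wraparound */
    return result;
}
```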
Compares (unordered) the low double-precision floating-point values in xmm1 and xmm2/m64 and sets the EFLAGS accordingly. Compare low double-precision floating-point values in xmm1 and xmm2/mem64 and set the EFLAGS flags accordingly.
Description
Performs an unordered compare of the double-precision floating-point values in the low quadwords of source operand 1 (first operand) and source operand 2 (second operand), and sets the ZF, PF, and CF flags in the EFLAGS register according to the result (unordered, greater than, less than, or equal). The OF, SF and AF flags in the EFLAGS register are set to 0. The unordered result is returned if either source operand is a NaN (QNaN or SNaN). The sign of zero is ignored for comparisons, so that -0.0 is equal to +0.0. Source operand 1 is an XMM register; source operand 2 can be an XMM register or a 64-bit memory location. The UCOMISD instruction differs from the COMISD instruction in that it signals a SIMD floating-point invalid operation exception (#I) only when a source operand is an SNaN. The COMISD instruction signals an invalid operation exception if a source operand is either a QNaN or an SNaN. The EFLAGS register is not updated if an unmasked SIMD floating-point exception is generated. In 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15). Note: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b; otherwise instructions will #UD.
Operation
RESULT ← UnorderedCompare(SRC1[63:0] <> SRC2[63:0]) {
(* Set EFLAGS *)
CASE (RESULT) OF
    UNORDERED: ZF, PF, CF ← 111;
    GREATER_THAN: ZF, PF, CF ← 000;
    LESS_THAN: ZF, PF, CF ← 001;
    EQUAL: ZF, PF, CF ← 100;
ESAC;
OF, AF, SF ← 0;
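The flag settings in the CASE statement can be reproduced with ordinary C comparisons, which is a convenient way to see why a NaN operand sets all three flags. The type `cmp_flags` and function `ucomisd_flags` are hypothetical; exception signaling is not modeled.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

typedef struct {
    bool zf, pf, cf;
} cmp_flags;

/* Hypothetical model of the ZF, PF, CF result of an unordered compare. */
static cmp_flags ucomisd_flags(double a, double b)
{
    cmp_flags f;
    if (isnan(a) || isnan(b))
        f = (cmp_flags){ true, true, true };    /* UNORDERED:    111 */
    else if (a > b)
        f = (cmp_flags){ false, false, false }; /* GREATER_THAN: 000 */
    else if (a < b)
        f = (cmp_flags){ false, false, true };  /* LESS_THAN:    001 */
    else
        f = (cmp_flags){ true, false, false };  /* EQUAL:        100 */
    assert(!f.pf || (f.zf && f.cf));            /* PF is set only when unordered */
    return f;
}
```

Because C compares -0.0 and +0.0 as equal, the model also reflects the rule that the sign of zero is ignored.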
...
Compare lower single-precision floating-point value in xmm1 register with lower single-precision floating-point value in xmm2/mem and set the status flags accordingly. Compare low single-precision floating-point values in xmm1 and xmm2/mem32 and set the EFLAGS flags accordingly.
RM
V/V
AVX
Description
Performs an unordered compare of the single-precision floating-point values in the low doublewords of source operand 1 (first operand) and source operand 2 (second operand), and sets the ZF, PF, and CF flags in the EFLAGS register according to the result (unordered, greater than, less than, or equal). The OF, SF and AF flags in the EFLAGS register are set to 0. The unordered result is returned if either source operand is a NaN (QNaN or SNaN). The sign of zero is ignored for comparisons, so that -0.0 is equal to +0.0. Source operand 1 is an XMM register; source operand 2 can be an XMM register or a 32-bit memory location. The UCOMISS instruction differs from the COMISS instruction in that it signals a SIMD floating-point invalid operation exception (#I) only when a source operand is an SNaN. The COMISS instruction signals an invalid operation exception if a source operand is either a QNaN or an SNaN. The EFLAGS register is not updated if an unmasked SIMD floating-point exception is generated. In 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15). Note: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b; otherwise instructions will #UD.
Operation
RESULT ← UnorderedCompare(SRC1[31:0] <> SRC2[31:0]) {
(* Set EFLAGS *)
CASE (RESULT) OF
    UNORDERED: ZF, PF, CF ← 111;
    GREATER_THAN: ZF, PF, CF ← 000;
    LESS_THAN: ZF, PF, CF ← 001;
    EQUAL: ZF, PF, CF ← 100;
ESAC;
OF, AF, SF ← 0;
...
Broadcast single-precision floating-point element in mem to four locations in xmm1. Broadcast single-precision floating-point element in mem to eight locations in ymm1. Broadcast double-precision floating-point element in mem to four locations in ymm1. Broadcast 128 bits of floating-point data in mem to low and high 128-bits in ymm1.
Description
Load floating-point values from the source operand (second operand) and broadcast to all elements of the destination operand (first operand). The destination operand is a YMM register. The source operand is either a 32-bit, 64-bit, or 128-bit memory location. Register source encodings are reserved and will #UD. VBROADCASTSD and VBROADCASTF128 are only supported as 256-bit wide versions. VBROADCASTSS is supported in both 128-bit and 256-bit wide versions. Note: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b; otherwise instructions will #UD. An attempt to execute VBROADCASTSD or VBROADCASTF128 encoded with VEX.L = 0 will cause an #UD exception. ...
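The broadcast operation itself is simple to model: one element is read from memory and replicated into every lane. The sketch below covers the single-precision case only; the name `vbroadcastss_model` and the use of a float array for the destination register are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of VBROADCASTSS.
   lanes is 4 for the 128-bit version and 8 for the 256-bit version. */
static void vbroadcastss_model(float *dest, size_t lanes, const float *mem)
{
    assert(dest != NULL && mem != NULL);
    assert(lanes == 4 || lanes == 8);
    float element = *mem;            /* a single 32-bit load from memory */
    for (size_t i = 0; i < lanes; i++)
        dest[i] = element;           /* replicate into every lane */
}
```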
Convert eight packed single-precision floating-point values in ymm2 to packed half-precision (16-bit) floating-point values in xmm1/mem. Imm8 provides rounding controls. Convert four packed single-precision floating-point values in xmm2 to packed half-precision (16-bit) floating-point values in xmm1/mem. Imm8 provides rounding controls.
MR
V/V
F16C
Description
Convert four or eight packed single-precision floating-point values in the first source operand to four or eight packed half-precision (16-bit) floating-point values. The rounding mode is specified using the immediate field (imm8). Underflow results (i.e., tiny results) are converted to denormals. MXCSR.FTZ is ignored. If a source element is denormal relative to input format with MXCSR.DAZ not set, DM masked and at least one of PM or UM unmasked, a SIMD exception will be raised with DE, UE and PE set. 128-bit version: The source operand is an XMM register. The destination operand is an XMM register or 64-bit memory location. The upper-bits vector register zeroing behavior of VEX prefix encoding still applies if the destination operand is an XMM register. So the upper bits (255:64) of the corresponding YMM register are zeroed. 256-bit version: The source operand is a YMM register. The destination operand is an XMM register or 128-bit memory location. The upper-bits vector register zeroing behavior of VEX prefix encoding still applies if the destination operand is an XMM register. So the upper bits (255:128) of the corresponding YMM register are zeroed. Note: VEX.vvvv is reserved (must be 1111b). The diagram below illustrates how data is converted from four packed single precision (in 128 bits) to four half precision (in 64 bits) FP values.
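[Figure: VCVTPS2PH (128-bit version). The single-precision elements of xmm2 (bits 127:0, with VS0 in bits 31:0) are converted to the half-precision elements of xmm1/mem64: VH3 in bits 63:48, VH2 in bits 47:32, VH1 in bits 31:16, and VH0 in bits 15:0.]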
Table 4-191. Immediate Byte Encoding for 16-bit Floating-Point Conversion Instructions
Bits | Field Name/value | Description | Comment
Imm[1:0] | RC=00B | Round to nearest even | If Imm[2] = 0
Imm[1:0] | RC=01B | Round down |
Imm[1:0] | RC=10B | Round up |
Imm[1:0] | RC=11B | Truncate |
Imm[2] | MS1=0 | Use imm[1:0] for rounding | Ignore MXCSR.RC
Imm[2] | MS1=1 | Use MXCSR.RC for rounding |
Imm[7:3] | Ignored | Ignored by processor |
Operation
vCvt_s2h(SRC1[31:0])
{
IF Imm[2] = 0
    THEN // using Imm[1:0] for rounding control, see Table 4-19
        RETURN Cvt_Single_Precision_To_Half_Precision_FP_Imm(SRC1[31:0]);
    ELSE // using MXCSR.RC for rounding control
        RETURN Cvt_Single_Precision_To_Half_Precision_FP_Mxcsr(SRC1[31:0]);
FI;
}

VCVTPS2PH (VEX.256 encoded version)
DEST[15:0] ← vCvt_s2h(SRC1[31:0]);
DEST[31:16] ← vCvt_s2h(SRC1[63:32]);
DEST[47:32] ← vCvt_s2h(SRC1[95:64]);
DEST[63:48] ← vCvt_s2h(SRC1[127:96]);
DEST[79:64] ← vCvt_s2h(SRC1[159:128]);
DEST[95:80] ← vCvt_s2h(SRC1[191:160]);
DEST[111:96] ← vCvt_s2h(SRC1[223:192]);
DEST[127:112] ← vCvt_s2h(SRC1[255:224]);
DEST[255:128] ← 0; // if DEST is a register

VCVTPS2PH (VEX.128 encoded version)
DEST[15:0] ← vCvt_s2h(SRC1[31:0]);
DEST[31:16] ← vCvt_s2h(SRC1[63:32]);
DEST[47:32] ← vCvt_s2h(SRC1[95:64]);
DEST[63:48] ← vCvt_s2h(SRC1[127:96]);
DEST[VLMAX-1:64] ← 0; // if DEST is a register
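The packing performed by the VEX.128 version can be sketched in portable C. This is a simplified software model, not the processor's algorithm: it implements only round-to-nearest-even (RC=00B), raises no exceptions, and its treatment of NaN payloads is an assumption. The function names `f32_to_f16_rne` and `cvt4_ps2ph_rne` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert one binary32 bit pattern to binary16, round to nearest even only. */
static uint16_t f32_to_f16_rne(uint32_t f)
{
    uint32_t sign = (f >> 16) & 0x8000u;
    uint32_t exp = (f >> 23) & 0xFFu;
    uint32_t man = f & 0x7FFFFFu;

    if (exp == 0xFFu) /* infinity, or NaN with the quiet bit forced (assumption) */
        return (uint16_t)(sign | 0x7C00u | (man ? (0x0200u | (man >> 13)) : 0u));

    int32_t e = (int32_t)exp - 127 + 15;     /* rebias the exponent */
    if (e >= 31)
        return (uint16_t)(sign | 0x7C00u);   /* too large: infinity */
    if (e <= 0) {
        if (e < -10)
            return (uint16_t)sign;           /* below half the smallest denormal */
        man |= 0x800000u;                    /* make the implicit bit explicit */
        uint32_t shift = (uint32_t)(14 - e); /* 14..24 */
        uint32_t half = man >> shift;        /* tiny results become denormals */
        uint32_t rem = man & ((1u << shift) - 1u);
        uint32_t tie = 1u << (shift - 1u);
        if (rem > tie || (rem == tie && (half & 1u)))
            half++;
        return (uint16_t)(sign | half);
    }
    uint32_t half = ((uint32_t)e << 10) | (man >> 13);
    uint32_t rem = man & 0x1FFFu;
    if (rem > 0x1000u || (rem == 0x1000u && (half & 1u)))
        half++;                              /* a carry into the exponent is intended */
    return (uint16_t)(sign | half);
}

/* Pack four converted elements: DEST[16i+15:16i] <- cvt(SRC[32i+31:32i]). */
static uint64_t cvt4_ps2ph_rne(const float src[4])
{
    assert(src != NULL);
    uint64_t dest = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t bits;
        memcpy(&bits, &src[i], sizeof bits); /* read the float's bit pattern */
        dest |= (uint64_t)f32_to_f16_rne(bits) << (16 * i);
    }
    return dest;
}
```

The returned 64-bit value corresponds to DEST[63:0]; for a register destination the remaining upper bits would be zeroed.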
Flags Affected
None ...
Description
The GETSEC[ENTERACCS] function loads, authenticates and executes an authenticated code module using an Intel TXT platform chipset's public key. The ENTERACCS leaf of GETSEC is selected with EAX set to 2 at entry. There are certain restrictions enforced by the processor for the execution of the GETSEC[ENTERACCS] instruction:
• Execution is not allowed unless the processor is in protected mode or IA-32e mode with CPL = 0 and EFLAGS.VM = 0.
• Processor cache must be available and not disabled, that is, CR0.CD and CR0.NW bits must be 0.
• For processor packages containing more than one logical processor, CR0.CD is checked to ensure consistency between enabled logical processors.
• For enforcing consistency of operation with numeric exception reporting using Interrupt 16, CR0.NE must be set.
• An Intel TXT-capable chipset must be present as communicated to the processor by sampling of the power-on configuration capability field after reset.
• The processor cannot already be in authenticated code execution mode (as launched by a previous GETSEC[ENTERACCS] or GETSEC[SENTER] instruction without a subsequent exit using GETSEC[EXITAC]).
• To avoid potential operability conflicts between modes, the processor is not allowed to execute this instruction if it currently is in SMM or VMX operation.
• To ensure consistent handling of SIPI messages, the processor executing the GETSEC[ENTERACCS] instruction must also be designated the BSP (boot-strap processor) as defined by IA32_APIC_BASE.BSP (Bit 8).
Failure to conform to the above conditions results in the processor signaling a general protection exception. Prior to execution of the ENTERACCS leaf, other logical processors, i.e. RLPs, in the platform must be: idle in a wait-for-SIPI state (as initiated by an INIT assertion or through reset for non-BSP designated processors), or in the SENTER sleep state as initiated by a GETSEC[SENTER] from the initiating logical processor (ILP).
If other logical processor(s) in the same package are not idle in one of these states, execution of ENTERACCS signals a general protection exception. The same requirement and action applies if the other logical processor(s) of the same package do not have CR0.CD = 0. A successful execution of ENTERACCS results in the ILP entering an authenticated code execution mode. Prior to reaching this point, the processor performs several checks. These include:
...
• Establish and check the location and size of the specified authenticated code module to be executed by the processor.
• Inhibit the ILP's response to the external events: INIT, A20M, NMI and SMI.
• Broadcast a message to enable protection of memory and I/O from other processor agents.
• Load the designated code module into an authenticated code execution area.
• Isolate the contents of the authenticated code execution area from further state modification by external agents.
• Authenticate the authenticated code module.
• Initialize the initiating logical processor state based on information contained in the authenticated code module header.
• Unlock the Intel TXT-capable chipset private configuration space and TPM locality 3 space.
• Begin execution in the authenticated code module at the defined entry point.
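The entry restrictions on the executing logical processor, described earlier in this section, amount to a conjunction of state checks; when any of them fails, the instruction signals a general protection exception. The C sketch below is purely illustrative: the `cpu_state` structure and its field names are hypothetical stand-ins for processor state, and the requirements on other logical processors in the package are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the state examined before ENTERACCS proceeds. */
typedef struct {
    bool protected_or_ia32e;  /* protected mode or IA-32e mode */
    unsigned cpl;             /* current privilege level */
    bool eflags_vm;           /* EFLAGS.VM */
    bool cr0_cd, cr0_nw;      /* cache disable / not write-through */
    bool cr0_ne;              /* numeric error reporting */
    bool txt_chipset_present; /* Intel TXT-capable chipset detected */
    bool in_ac_mode;          /* already in authenticated code execution mode */
    bool in_smm, in_vmx;      /* SMM or VMX operation */
    bool is_bsp;              /* IA32_APIC_BASE.BSP */
} cpu_state;

/* Returns true when none of the listed restrictions is violated;
   false corresponds to the general protection exception case. */
static bool enteraccs_allowed(const cpu_state *s)
{
    assert(s != NULL);
    return s->protected_or_ia32e && s->cpl == 0 && !s->eflags_vm
        && !s->cr0_cd && !s->cr0_nw && s->cr0_ne
        && s->txt_chipset_present && !s->in_ac_mode
        && !s->in_smm && !s->in_vmx && s->is_bsp;
}
```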
6.3.1
External Interrupts
External interrupts are received through pins on the processor or through the local APIC. The primary interrupt pins on Pentium 4, Intel Xeon, P6 family, and Pentium processors are the LINT[1:0] pins, which are connected to the local APIC (see Chapter 10, Advanced Programmable Interrupt Controller (APIC)). When the local APIC is enabled, the LINT[1:0] pins can be programmed through the APIC's local vector table (LVT) to be associated with any of the processor's exception or interrupt vectors.
When the local APIC is global/hardware disabled, these pins are configured as INTR and NMI pins, respectively. Asserting the INTR pin signals the processor that an external interrupt has occurred. The processor reads from the system bus the interrupt vector number provided by an external interrupt controller, such as an 8259A (see Section 6.2, Exception and Interrupt Vectors). Asserting the NMI pin signals a non-maskable interrupt (NMI), which is assigned to interrupt vector 2.
6.7.1
While an NMI interrupt handler is executing, the processor blocks delivery of subsequent NMIs until the next execution of the IRET instruction. This blocking of NMIs prevents nested execution of the NMI handler. It is recommended that the NMI interrupt handler be accessed through an interrupt gate to disable maskable hardware interrupts (see Section 6.8.1, Masking Maskable Hardware Interrupts). An execution of the IRET instruction unblocks NMIs even if the instruction causes a fault. For example, if the IRET instruction executes with EFLAGS.VM = 1 and IOPL of less than 3, a general-protection exception is generated (see Section 20.2.7, Sensitive Instructions). In such a case, NMIs are unmasked before the exception handler is invoked. ...
6.9
If more than one exception or interrupt is pending at an instruction boundary, the processor services them in a predictable order. Table 6-2 shows the priority among classes of exception and interrupt sources.
instruction of the handler. Lower priority exceptions are discarded; lower priority interrupts are held pending. Discarded exceptions are re-generated when the interrupt handler returns execution to the point in the program or task where the exceptions and/or interrupts occurred. ...
Abort.
Page Faults
Table 6-5 shows the various combinations of exception classes that cause a double fault to be generated. A double-fault exception falls in the abort class of exceptions. The program or task cannot be restarted or resumed. The double-fault handler can be used to collect diagnostic information about the state of the machine and/or, when possible, to shut the application and/or system down gracefully or restart the system. A segment or page fault may be encountered while prefetching instructions; however, this behavior is outside the domain of Table 6-5. Any further faults generated while the processor is attempting to transfer control to the appropriate fault handler could still lead to a double-fault sequence. ...
Fault.
9.1.1
Table 9-1 shows the state of the flags and other registers following power-up for the Pentium 4, Intel Xeon, P6 family (including Intel processors with CPUID DisplayFamily signature of 06H), and Pentium processors. The state of control register CR0 is 60000010H (see Figure 9-1). This places the processor in real-address mode with paging disabled.
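As a quick check of the CR0 reset value quoted above, the following C sketch tests individual flags in 60000010H. The bit positions used (PE = bit 0, ET = bit 4, NW = bit 29, CD = bit 30, PG = bit 31) are the architectural CR0 flag positions; the macro and function names are illustrative only and do not come from this document.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural CR0 flag positions (names here are illustrative). */
#define CR0_PE (1u << 0)   /* protection enable */
#define CR0_ET (1u << 4)   /* extension type */
#define CR0_NW (1u << 29)  /* not write-through */
#define CR0_CD (1u << 30)  /* cache disable */
#define CR0_PG (1u << 31)  /* paging */

/* CR0 value after power-up, as stated in the text. */
#define CR0_RESET_VALUE 0x60000010u

/* True when protection and paging are both off (real-address mode, no paging). */
static bool cr0_real_mode_no_paging(uint32_t cr0)
{
    return (cr0 & (CR0_PE | CR0_PG)) == 0;
}

/* True when both CD and NW are set. */
static bool cr0_cd_nw_set(uint32_t cr0)
{
    return (cr0 & (CR0_CD | CR0_NW)) == (CR0_CD | CR0_NW);
}
```

With CR0_RESET_VALUE as input, both helpers return true, and the only other bit set is ET (bit 4).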
9.1.2
Hardware may request that the BIST be performed at power-up. The EAX register is cleared (0H) if the processor passes the BIST. A nonzero value in the EAX register after the BIST indicates that a processor fault was detected. If the BIST is not requested, the contents of the EAX register after a hardware reset is 0H. The overhead for performing a BIST varies between processor families. For example, the BIST takes approximately 30 million processor clock periods to execute on the Pentium 4 processor. This clock count is model-specific; Intel reserves the right to change the number of periods for any Intel 64 or IA-32 processor, without notification.
P6 Family Processor (Including DisplayFamily = 06H) 00000002H 0000FFF0H 60000010H2 00000000H Selector = F000H Base = FFFF0000H Limit = FFFFH AR = Present, R/W, Accessed Selector = 0000H Base = 00000000H Limit = FFFFH AR = Present, R/W, Accessed 000n06xxH3 0
4
Pentium Processor 00000002H 0000FFF0H 60000010H2 00000000H Selector = F000H Base = FFFF0000H Limit = FFFFH AR = Present, R/W, Accessed Selector = 0000H Base = 00000000H Limit = FFFFH AR = Present, R/W, Accessed 000005xxH 04 00000000H Pwr up or Reset: +0.0 FINIT/FNINIT: Unchanged Pwr up or Reset: 0040H FINIT/FNINIT: 037FH Pwr up or Reset: 0000H FINIT/FNINIT: 0000H Pwr up or Reset: 5555H FINIT/FNINIT: FFFFH Pwr up or Reset: 0000H FINIT/FNINIT: 0000H Pwr up or Reset: 00000000H FINIT/FNINIT: 00000000H
EDX EAX EBX, ECX, ESI, EDI, EBP, ESP ST0 through ST75 x87 FPU Control Word5
00000000H Pwr up or Reset: +0.0 FINIT/FNINIT: Unchanged Pwr up or Reset: 0040H FINIT/FNINIT: 037FH
00000000H Pwr up or Reset: +0.0 FINIT/FNINIT: Unchanged Pwr up or Reset: 0040H FINIT/FNINIT: 037FH Pwr up or Reset: 0000H FINIT/FNINIT: 0000H Pwr up or Reset: 5555H FINIT/FNINIT: FFFFH Pwr up or Reset: 0000H FINIT/FNINIT: 0000H Pwr up or Reset: 00000000H FINIT/FNINIT: 00000000H
x87 FPU Status Word5 Pwr up or Reset: 0000H FINIT/FNINIT: 0000H x87 FPU Tag Word5 x87 FPU Data Operand and CS Seg. Selectors5 x87 FPU Data Operand and Inst. Pointers5 Pwr up or Reset: 5555H FINIT/FNINIT: FFFFH Pwr up or Reset: 0000H FINIT/FNINIT: 0000H Pwr up or Reset: 00000000H FINIT/FNINIT: 00000000H
Table 9-1. IA-32 Processor States Following Power-up, Reset, or INIT (Contd.)
Register MM0 through MM75 Pentium 4 and Intel Xeon Processor Pwr up or Reset: 0000000000000000H INIT or FINIT/FNINIT: Unchanged P6 Family Processor (Including DisplayFamily = 06H) Pentium II and Pentium III Processors Only Pwr up or Reset: 0000000000000000H INIT or FINIT/FNINIT: Unchanged If CPUID.01H:SSE is 1 Pwr up or Reset: 0H INIT: Unchanged Pentium III processor onlyPwr up or Reset: 1F80H INIT: Unchanged Base = 00000000H Limit = FFFFH AR = Present, R/W Selector = 0000H Base = 00000000H Limit = FFFFH AR = Present, R/W 00000000H FFFF0FF0H 00000400H Power up or Reset: 0H INIT: Unchanged Power up or Reset: 0H INIT: Unchanged Pwr up or Reset: Undefined INIT: Unchanged Invalid6 Pwr up or Reset: Disabled INIT: Unchanged Pwr up or Reset: Disabled INIT: Unchanged Pwr up or Reset: Undefined INIT: Unchanged Pwr up or Reset: Enabled INIT: Unchanged 0000000000000000H Base = 00000000H Limit = FFFFH AR = Present, R/W Selector = 0000H Base = 00000000H Limit = FFFFH AR = Present, R/W 00000000H FFFF0FF0H 00000400H Power up or Reset: 0H INIT: Unchanged Power up or Reset: 0H INIT: Unchanged Pwr up or Reset: Undefined INIT: Unchanged Invalid6 Not Implemented Not Implemented Not Implemented NA Pentium Processor Pentium with MMX Technology Only Pwr up or Reset: 0000000000000000H INIT or FINIT/FNINIT: Unchanged NA
Pwr up or Reset: 0H INIT: Unchanged Pwr up or Reset: 1F80H INIT: Unchanged Base = 00000000H Limit = FFFFH AR = Present, R/W Selector = 0000H Base = 00000000H Limit = FFFFH AR = Present, R/W 00000000H FFFF0FF0H 00000400H Power up or Reset: 0H INIT: Unchanged Power up or Reset: 0H INIT: Unchanged Pwr up or Reset: Undefined INIT: Unchanged Invalid6 Pwr up or Reset: Disabled INIT: Unchanged Pwr up or Reset: Disabled INIT: Unchanged Pwr up or Reset: Undefined INIT: Unchanged Pwr up or Reset: Enabled INIT: Unchanged 0000000000000000H
MXCSR
GDTR, IDTR
DR0, DR1, DR2, DR3 DR6 DR7 Time-Stamp Counter Perf. Counters and Event Select All Other MSRs
Data and Code Cache, TLBs Fixed MTRRs Variable MTRRs Machine-Check Architecture APIC R8-R157
Table 9-1. IA-32 Processor States Following Power-up, Reset, or INIT (Contd.)
Register XMM8-XMM157 YMMn[128:VLMAX]8 Pentium 4 and Intel Xeon Processor Pwr up or Reset: 0H INIT: Unchanged N.A. P6 Family Processor (Including DisplayFamily = 06H) Pwr up or Reset: 0H INIT: Unchanged Pwr up or Reset: 0H INIT: Unchanged N.A. N.A. Pentium Processor
NOTES:
1. The 10 most-significant bits of the EFLAGS register are undefined following a reset. Software should not depend on the states of any of these bits.
2. The CD and NW flags are unchanged, bit 4 is set to 1, all other bits are cleared.
3. Where n is the Extended Model Value for the respective processor.
4. If Built-In Self-Test (BIST) is invoked on power up or reset, EAX is 0 only if all tests passed. (BIST cannot be invoked during an INIT.)
5. The state of the x87 FPU and MMX registers is not changed by the execution of an INIT.
6. Internal caches are invalid after power-up and RESET, but left unchanged with an INIT.
7. If the processor supports IA-32e mode.
8. If the processor supports AVX.
...
14.5.3
Pentium 4, Intel Xeon and Pentium M processors also support software-controlled clock modulation. This provides a means for operating systems to implement a power management policy to reduce the power consumption of the processor. Here, the stop-clock duty cycle is controlled by software through the IA32_CLOCK_MODULATION MSR (see Figure 14-10).
Figure 14-10. IA32_CLOCK_MODULATION MSR (bits 63:5: Reserved; bit 4: On-Demand Clock Modulation Enable; bits 3:1: On-Demand Clock Modulation Duty Cycle; bit 0: Reserved)
On-Demand Clock Modulation Duty Cycle, bits 1 through 3 Selects the on-demand clock modulation duty cycle (see Table 14-1). This field is only active when the on-demand clock modulation enable flag is set.
Note that the on-demand clock modulation mechanism (like the thermal monitor) controls the processor's stop-clock circuitry internally to modulate the clock signal. The STPCLK# pin is not used in this mechanism.
The on-demand clock modulation mechanism can be used to control processor power consumption. Power management software can write to the IA32_CLOCK_MODULATION MSR to enable clock modulation and to select a modulation duty cycle. If on-demand clock modulation and TM1 are both enabled and the thermal status of the processor is hot (bit 0 of the IA32_THERM_STATUS MSR is set), clock modulation at the duty cycle specified by TM1 takes precedence, regardless of the setting of the on-demand clock modulation duty cycle. For Hyper-Threading Technology enabled processors, the IA32_CLOCK_MODULATION register is duplicated for each logical processor. In order for the on-demand clock modulation feature to work properly, the feature must be enabled on all the logical processors within a physical processor. If the programmed duty cycle is not identical for all the logical processors, the processor core clock will modulate to the highest duty cycle programmed for processors with any of the following CPUID DisplayFamily_DisplayModel signatures (see CPUID instruction in Chapter 3, Instruction Set Reference, A-L in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2A): 06_1A, 06_1C, 06_1E, 06_1F, 06_25, 06_26, 06_27, 06_2C, 06_2E, 06_2F, 06_35, 06_36, and 0F_xx. For all other processors, if the programmed duty cycle is not identical for all logical processors in the same core, the processor core will modulate at the lowest programmed duty cycle. For multiple processor cores in a physical package, each processor core can modulate to a programmed duty cycle independently. For the P6 family processors, on-demand clock modulation was implemented through the chipset, which controlled clock modulation through the processor's STPCLK# pin. ...
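The value that power management software writes to IA32_CLOCK_MODULATION can be composed as in the following C sketch. It assumes the layout of Figure 14-10 (duty cycle in bits 3:1, enable flag in bit 4); the helper names are hypothetical, and the actual WRMSR access, which requires ring 0, is deliberately left out.

```c
#include <stdint.h>

/* Assumed field layout of IA32_CLOCK_MODULATION (see Figure 14-10). */
#define CLOCK_MOD_ENABLE     (1ull << 4)  /* on-demand clock modulation enable */
#define CLOCK_MOD_DUTY_SHIFT 1            /* duty cycle field starts at bit 1 */
#define CLOCK_MOD_DUTY_MASK  0x7ull       /* 3-bit field, bits 3:1 */

/* Build an MSR value from a duty-cycle encoding (0..7, see Table 14-1)
 * and an enable flag. All reserved bits are left at zero. */
static uint64_t clock_mod_encode(unsigned duty_code, int enable)
{
    uint64_t v = ((uint64_t)duty_code & CLOCK_MOD_DUTY_MASK) << CLOCK_MOD_DUTY_SHIFT;
    if (enable)
        v |= CLOCK_MOD_ENABLE;
    return v;
}

/* Extract the duty-cycle encoding from an MSR value. */
static unsigned clock_mod_duty(uint64_t msr)
{
    return (unsigned)((msr >> CLOCK_MOD_DUTY_SHIFT) & CLOCK_MOD_DUTY_MASK);
}

/* Report whether on-demand clock modulation is enabled. */
static int clock_mod_enabled(uint64_t msr)
{
    return (msr & CLOCK_MOD_ENABLE) != 0;
}
```

Because the register is duplicated per logical processor, software following the text above would write the same encoded value on every logical processor of the physical processor.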
14.7.2
The specific RAPL domains available in a platform vary across product segments. Platforms targeting the client segment support the following RAPL domain hierarchy: Package Two power planes: PP0 and PP1 (PP1 may reflect to uncore devices) Package
Platforms targeting server segment support the following RAPL domain hierarchy:
Each level of the RAPL hierarchy provides a respective set of RAPL interface MSRs. Table 14-2 lists the RAPL MSR interfaces available for each RAPL domain. The power limit MSR of each RAPL domain is located at offset 0 relative to an MSR base address which is non-architectural (see Chapter 35). The energy status MSR of each domain is located at offset 1 relative to the MSR base address of the respective domain.
The presence of the optional MSR interfaces (the three right-most columns of Table 14-2) may be model-specific. See Chapter 35 for detail.
14.7.3
The MSR interfaces defined for the package RAPL domain are: MSR_PKG_POWER_LIMIT allows software to set power limits for the package and measurement attributes associated with each limit, MSR_PKG_ENERGY_STATUS reports measured actual energy usage, MSR_PKG_POWER_INFO reports the package power range information for RAPL usage.
MSR_PKG_PERF_STATUS can report the performance impact of power limiting, but its availability may be model-specific.
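Figure 14-17. MSR_PKG_POWER_LIMIT Register (bit 63: LOCK; bits 55:49: time window for limit #2; bit 48: Pkg clamping limit #2; bit 47: Enable limit #2; bits 46:32: power limit #2; bits 23:17: time window for limit #1; bit 16: Pkg clamping limit #1; bit 15: Enable limit #1; bits 14:0: power limit #1)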
MSR_PKG_POWER_LIMIT allows a software agent to define power limitation for the package domain. Power limitation is defined in terms of average power usage (Watts) over a time window specified in MSR_PKG_POWER_LIMIT. Two power limits can be specified, corresponding to time windows of different sizes. Each power limit provides independent clamping control that would permit the processor cores to go below the OS-requested state to meet the power limits. A lock mechanism allows the software agent to enforce power limit settings. Once the lock bit is set, the power limit settings are static and un-modifiable until next RESET. The bit fields of MSR_PKG_POWER_LIMIT (Figure 14-17) are:
Package Power Limit #1 (bits 14:0): Sets the average power usage limit of the package domain corresponding to time window #1. The unit of this field is specified by the Power Units field of MSR_RAPL_POWER_UNIT.
Enable Power Limit #1 (bit 15): 0 = disabled; 1 = enabled.
Package Clamping Limitation #1 (bit 16): Allow going below OS-requested P/T state setting during time window specified by bits 23:17.
Time Window for Power Limit #1 (bits 23:17): Indicates the time window for power limit #1.
Time limit = 2^Y * (1.0 + Z/4.0) * Time_Unit
Here Y is the unsigned integer value represented by bits 21:17, Z is an unsigned integer represented by bits 23:22. Time_Unit is specified by the Time Units field of MSR_RAPL_POWER_UNIT.
Package Power Limit #2 (bits 46:32): Sets the average power usage limit of the package domain corresponding to time window #2. The unit of this field is specified by the Power Units field of MSR_RAPL_POWER_UNIT.
Enable Power Limit #2 (bit 47): 0 = disabled; 1 = enabled.
Package Clamping Limitation #2 (bit 48): Allow going below OS-requested P/T state setting during time window specified by bits 55:49.
Time Window for Power Limit #2 (bits 55:49): Indicates the time window for power limit #2.
Time limit = 2^Y * (1.0 + Z/4.0) * Time_Unit
Here Y is the unsigned integer value represented by bits 53:49, Z is an unsigned integer represented by bits 55:54. Time_Unit is specified by the Time Units field of MSR_RAPL_POWER_UNIT. This field may have a hard-coded value in hardware and ignores values written by software.
Lock (bit 63): If set, all write attempts to this MSR are ignored until next RESET.
MSR_PKG_ENERGY_STATUS is a read-only MSR. It reports the actual energy use for the package domain. This MSR is updated every ~1msec. It has a wraparound time of around 60 secs when power consumption is high, and may be longer otherwise. ...
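The MSR_PKG_POWER_LIMIT field layout and the time-window formula can be illustrated with a short C sketch. It operates on a raw 64-bit value that software has already read; the structure and function names are hypothetical, and the time unit (in seconds) is assumed to have been derived separately from MSR_RAPL_POWER_UNIT.

```c
#include <stdint.h>

/* Decoded view of one of the two power limits (names are illustrative). */
struct pkg_power_limit {
    unsigned power_raw;  /* bits 14:0 (limit #1) or 46:32 (limit #2), in Power Units */
    int enabled;         /* bit 15 or bit 47 */
    int clamping;        /* bit 16 or bit 48 */
    unsigned y;          /* bits 21:17 or 53:49 */
    unsigned z;          /* bits 23:22 or 55:54 */
};

/* Decode limit #1 (which == 1) or limit #2 (which == 2). Limit #2 uses the
 * same layout as limit #1, shifted up by 32 bits. */
static struct pkg_power_limit decode_limit(uint64_t msr, int which)
{
    uint32_t half = (uint32_t)(msr >> ((which == 2) ? 32 : 0));
    struct pkg_power_limit l;
    l.power_raw = half & 0x7FFF;
    l.enabled = (half >> 15) & 1;
    l.clamping = (half >> 16) & 1;
    l.y = (half >> 17) & 0x1F;
    l.z = (half >> 22) & 0x3;
    return l;
}

/* Time limit = 2^Y * (1.0 + Z/4.0) * Time_Unit */
static double time_window_seconds(const struct pkg_power_limit *l, double time_unit)
{
    return (double)(1ull << l->y) * (1.0 + l->z / 4.0) * time_unit;
}

/* Lock (bit 63): nonzero when writes are ignored until the next RESET. */
static int pkg_limit_locked(uint64_t msr)
{
    return (int)(msr >> 63);
}
```

For example, Y = 3 and Z = 2 with a time unit of 1 second give 2^3 * (1.0 + 2/4.0) = 12 seconds.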
14.7.2
The MSR interfaces defined for the DRAM domain are supported only in the server platform. The MSR interfaces are: ... MSR_DRAM_POWER_LIMIT allows software to set power limits for the DRAM domain and measurement attributes associated with each limit, MSR_DRAM_ENERGY_STATUS reports measured actual energy usage, MSR_DRAM_POWER_INFO reports the DRAM domain power range information for RAPL usage. MSR_DRAM_PERF_STATUS can report the performance impact of power limiting.
16.4.3
MC error codes associated with integrated memory controllers are reported in the MSRs IA32_MC8_STATUS-IA32_MC11_STATUS. The supported error codes follow the architectural MCACOD definition type 1MMMCCCC (see Chapter 15, Machine-Check Architecture). MSR_ERROR_CONTROL.[bit 1] can enable additional information logging of the IMC. The additional error information logged by the IMC is stored in IA32_MCi_STATUS and IA32_MCi_MISC (i = 8, 11).
Table 16-15. Intel IMC MC Error Codes for IA32_MCi_STATUS (i= 8, 11)
Type MCA error codes1 Model specific errors Bit No. 0-15 31:16 Bit Function MCACOD Reserved except for the following Bit Description Bus error format: 1PPTRRRRIILL 0x001 - Address parity error 0x002 - HA Wrt buffer Data parity error 0x004 - HA Wrt byte enable parity error 0x008 - Corrected patrol scrub error 0x010 - Uncorrected patrol scrub error 0x020 - Corrected spare error 0x040 - Uncorrected spare error When MSR_ERROR_CONTROL.[1] is set, allows the iMC to log first device error when corrected error is detected during normal read. Reserved See Chapter 15, Machine-Check Architecture,
36-32 37 56-38
57-63
1. These fields are architecturally defined. Refer to Chapter 15, Machine-Check Architecture, for more information.
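As an illustration of how the model-specific field of Table 16-15 might be interpreted, the C sketch below extracts bits 31:16 of an IA32_MCi_STATUS value and looks the result up among the listed codes. It assumes each listed code appears as an exact field value (the table does not say whether codes can be combined), and the table and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Model-specific error codes listed for bits 31:16 in Table 16-15. */
static const struct {
    uint16_t code;
    const char *text;
} imc_errors[] = {
    { 0x001, "Address parity error" },
    { 0x002, "HA Wrt buffer Data parity error" },
    { 0x004, "HA Wrt byte enable parity error" },
    { 0x008, "Corrected patrol scrub error" },
    { 0x010, "Uncorrected patrol scrub error" },
    { 0x020, "Corrected spare error" },
    { 0x040, "Uncorrected spare error" },
};

/* Return the description for the model-specific error field of a status
 * value, or NULL when the field holds a value not listed in the table. */
static const char *imc_model_specific_error(uint64_t mci_status)
{
    uint16_t code = (uint16_t)((mci_status >> 16) & 0xFFFF);
    for (size_t i = 0; i < sizeof imc_errors / sizeof imc_errors[0]; i++) {
        if (imc_errors[i].code == code)
            return imc_errors[i].text;
    }
    return NULL;
}
```

A caller would normally check the status register validity indicators (bits 57-63) before interpreting this field.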
Table 16-16. Intel IMC MC Error Codes for IA32_MCi_MISC (i= 8, 11)
Type MCA addr info1 Model specific errors Bit No. 0-8 13:9 Bit Function Bit Description See Chapter 15, Machine-Check Architecture, When MSR_ERROR_CONTROL.[1] is set, allows the iMC to log second device error when corrected error is detected during normal read. Otherwise contain parity error if MCi_Status indicates HA_WB_Data or HA_W_BE parity error. ErrMask_1stErrDev When MSR_ERROR_CONTROL.[1] is set, allows the iMC to log first-device error bit mask.
29-14
Bit Description When MSR_ERROR_CONTROL.[1] is set, allows the iMC to log second-device error bit mask. When MSR_ERROR_CONTROL.[1] is set, allows the iMC to log first-device error failing rank. When MSR_ERROR_CONTROL.[1] is set, allows the iMC to log second-device error failing rank. When MSR_ERROR_CONTROL.[1] is set, allows the iMC to log first-device error failing DIMM slot. When MSR_ERROR_CONTROL.[1] is set, allows the iMC to log second-device error failing DIMM slot. When MSR_ERROR_CONTROL.[1] is set, indicates the iMC has logged valid data from the first correctable error in a memory device. When MSR_ERROR_CONTROL.[1] is set, indicates the iMC has logged valid data due to a second correctable error in a memory device. Use this information only after there is valid first error info indicated by bit 62.
1. These fields are architecturally defined. Refer to Chapter 15, Machine-Check Architecture, for more information.
NOTES:
16.5
INCREMENTAL DECODING INFORMATION: PROCESSOR FAMILY WITH CPUID DISPLAYFAMILY_DISPLAYMODEL SIGNATURE 06_3EH, MACHINE ERROR CODES FOR MACHINE CHECK
The next generation Intel Xeon processor based on Intel microarchitecture codenamed Ivy Bridge can be identified with CPUID DisplayFamily_DisplayModel signature 06_3EH. Incremental error codes for internal machine check errors from the PCU controller are reported in the register bank IA32_MC4; Table 16-17 lists the model-specific fields used to interpret error codes applicable to IA32_MC4_STATUS. Incremental MC error codes related to the Intel QPI links are reported in the register bank IA32_MC5. The information listed in Table 16-14 for QPI MC error codes applies to IA32_MC5_STATUS. Incremental error codes for the memory controller unit are reported in the register banks IA32_MC9-IA32_MC16. Table 16-18 lists the model-specific error codes that apply to IA32_MCi_STATUS, i = 9-16.
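The signature check mentioned above can be sketched in C as follows. The functions take the EAX value returned by CPUID leaf 01H and apply the usual DisplayFamily and DisplayModel derivation from the family, model, extended family, and extended model fields; the function names are illustrative, and executing CPUID itself is left to the caller.

```c
#include <stdint.h>

/* DisplayFamily: family ID (bits 11:8), plus the extended family ID
 * (bits 27:20) when the family ID is 0FH. */
static unsigned display_family(uint32_t eax)
{
    unsigned family = (eax >> 8) & 0xF;
    unsigned ext_family = (eax >> 20) & 0xFF;
    return (family == 0xF) ? family + ext_family : family;
}

/* DisplayModel: model (bits 7:4), with the extended model ID (bits 19:16)
 * as the upper nibble when the family ID is 06H or 0FH. */
static unsigned display_model(uint32_t eax)
{
    unsigned family = (eax >> 8) & 0xF;
    unsigned model = (eax >> 4) & 0xF;
    unsigned ext_model = (eax >> 16) & 0xF;
    return (family == 0x6 || family == 0xF) ? ((ext_model << 4) | model) : model;
}

/* Nonzero when the signature is DisplayFamily_DisplayModel 06_3EH. */
static int is_signature_06_3e(uint32_t eax)
{
    return display_family(eax) == 0x06 && display_model(eax) == 0x3E;
}
```

For instance, an EAX value of 000306E4H yields DisplayFamily 06H and DisplayModel 3EH.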
16.5.1
Type MCA error codes
Bit Description 0000b - No Error 0001b - Non_IMem_Sel 0010b - I_Parity_Error 0011b - Bad_OpCode 0100b - I_Stack_Underflow 0101b - I_Stack_Overflow 0110b - D_Stack_Underflow 0111b - D_Stack_Overflow 1000b - Non-DMem_Sel 1001b - D_Parity_Error
0-15 19:16
23-20 31-24
Reserved 00h - No Error 0Dh - MC_IMC_FORCE_SR_S3_TIMEOUT 0Eh - MC_CPD_UNCPD_ST_TIMOUT 0Fh - MC_PKGS_SAFE_WP_TIMEOUT 43h - MC_PECI_MAILBOX_QUIESCE_TIMEOUT 44h - MC_CRITICAL_VR_FAILED 45h - MC_ICC_MAX-NOTSUPPORTED 5Ch - MC_MORE_THAN_ONE_LT_AGENT 60h - MC_INVALID_PKGS_REQ_PCH 61h - MC_INVALID_PKGS_REQ_QPI 62h - MC_INVALID_PKGS_RES_QPI 63h - MC_INVALID_PKGC_RES_PCH 64h - MC_INVALID_PKG_STATE_CONFIG 70h - MC_WATCHDG_TIMEOUT_PKGC_SLAVE 71h - MC_WATCHDG_TIMEOUT_PKGC_MASTER 72h - MC_WATCHDG_TIMEOUT_PKGS_MASTER 7Ah - MC_HA_FAILSTS_CHANGE_DETECTED 7Bh - MC_PCIE_R2PCIE-RW_BLOCK_ACK_TIMEOUT 81h - MC_RECOVERABLE_DIE_THERMAL_TOO_HOT
Reserved
Reserved
57-63
1. These fields are architecturally defined. Refer to Chapter 15, Machine-Check Architecture, for more information.
16.5.2
MC error codes associated with integrated memory controllers are reported in the MSRs IA32_MC9_STATUS-IA32_MC16_STATUS. The supported error codes follow the architectural MCACOD definition type 1MMMCCCC (see Chapter 15, Machine-Check Architecture). MSR_ERROR_CONTROL.[bit 1] can enable additional information logging of the IMC. The additional error information logged by the IMC is stored in IA32_MCi_STATUS and IA32_MCi_MISC (i = 9, 16).
Table 16-18. Intel IMC MC Error Codes for IA32_MCi_STATUS (i= 9, 16)
Type MCA error codes Model specific errors
1
Bit Description Bus error format: 1PPTRRRRIILL 0x001 - Address parity error 0x002 - HA Wrt buffer Data parity error 0x004 - HA Wrt byte enable parity error 0x008 - Corrected patrol scrub error 0x010 - Uncorrected patrol scrub error 0x020 - Corrected spare error 0x040 - Uncorrected spare error 0x100 - iMC, WDB, parity errors
When MSR_ERROR_CONTROL.[1] is set, logs an encoded value from the first error device.
57-63
1. These fields are architecturally defined. Refer to Chapter 15, Machine-Check Architecture, for more information.
Type
Bit Description Reserved When MSR_ERROR_CONTROL.[1] is set, indicates the iMC has logged valid data from a correctable error from memory read associated with first error device. When MSR_ERROR_CONTROL.[1] is set, indicates the iMC has logged valid data due to a second correctable error in a memory device. Use this information only after there is valid first error info indicated by bit 62.
1. These fields are architecturally defined. Refer to Chapter 15, Machine-Check Architecture, for more information. ...
NOTES:
16.6.3.2
Note:
Processor Model Specific Error Code Field Type B: Bus and Interconnect Error
The Model Specific Error Code field in MC4_STATUS (bits 31:16)
Exactly one of the bits defined in the preceding table will be set for a Bus and Interconnect Error. The Data ECC can be correctable or uncorrectable (the MC4_STATUS.UC bit, of course, distinguishes between correctable and uncorrectable cases with the Other_Info field possibly providing the ECC Syndrome for correctable errors). All other errors for this processor MCA Error Type are uncorrectable.
16.6.3.3
Processor Model Specific Error Code Field Type C: Cache Bus Controller Error
Table 16-26. Type C Cache Bus Controller Error Codes
MC4_STATUS[31:16] (MSCE) Value    Error Description
0000_0000_0000_0001  0x0001       Inclusion Error from Core 0
0000_0000_0000_0010  0x0002       Inclusion Error from Core 1
0000_0000_0000_0011  0x0003       Write Exclusive Error from Core 0
0000_0000_0000_0100  0x0004       Write Exclusive Error from Core 1
0000_0000_0000_0101  0x0005       Inclusion Error from FSB
0000_0000_0000_0110  0x0006       SNP Stall Error from FSB
0000_0000_0000_0111  0x0007       Write Stall Error from FSB
0000_0000_0000_1000  0x0008       FSB Arb Timeout Error
0000_0000_0000_1001  0x0009       CBC OOD Queue Underflow/overflow
0000_0001_0000_0000  0x0100       Enhanced Intel SpeedStep Technology TM1-TM2 Error
0000_0010_0000_0000  0x0200       Internal Timeout error
0000_0011_0000_0000  0x0300       Internal Timeout Error
0000_0100_0000_0000  0x0400       Intel Cache Safe Technology Queue Full Error or Disabled-ways-in-a-set overflow
1100_0000_0000_0001  0xC001       Correctable ECC event on outgoing FSB data
1100_0000_0000_0010  0xC002       Correctable ECC event on outgoing Core 0 data
1100_0000_0000_0100  0xC004       Correctable ECC event on outgoing Core 1 data
1110_0000_0000_0001  0xE001       Uncorrectable ECC error on outgoing FSB data
1110_0000_0000_0010  0xE002       Uncorrectable ECC error on outgoing Core 0 data
1110_0000_0000_0100  0xE004       Uncorrectable ECC error on outgoing Core 1 data
all other encodings               Reserved
All errors in this table, except for the correctable ECC types, are uncorrectable. The correctable ECC events may supply the ECC syndrome in the Other_Info field of the MC4_STATUS MSR.
Table 16-27. Decoding Family 0FH Machine Check Codes for Cache Hierarchy Errors
Type MCA error codes1 Model specific error codes Bit No. 0-15 16-17 Tag Error Code Contains the tag error code for this machine check error: 00 = No error detected 01 = Parity error on tag miss with a clean line 10 = Parity error/multiple tag match on tag hit 11 = Parity error/multiple tag match on tag miss 18-19 Data Error Code Contains the data error code for this machine check error: 00 = No error detected 01 = Single bit error 10 = Double bit error on a clean line 11 = Double bit error on a modified line 20 L3 Error This bit is set if the machine check error originated in the L3 (it can be ignored for invalid PIC request errors): 1 = L3 error 0 = L2 error 21 Invalid PIC Request Indicates error due to invalid PIC request (access was made to PIC space with WB memory): 1 = Invalid PIC request error 0 = No invalid PIC request error 22-31 Other Information 32-39 Reserved 8-bit Error Count Reserved Holds a count of the number of errors since reset. The counter begins at 0 for the first error and saturates at a count of 255. Reserved Bit Function Bit Description
Reserved
57-63
1. These fields are architecturally defined. Refer to Chapter 15, Machine-Check Architecture, for more information. ...
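The model-specific fields of Table 16-27 in bits 16 through 21 can be pulled out of a status value as in the C sketch below. The structure and function names are hypothetical; only the bit positions and encodings listed in the table are used.

```c
#include <stdint.h>

/* Model-specific cache hierarchy fields from Table 16-27 (illustrative names). */
struct cache_mc_info {
    unsigned tag_error;   /* bits 17:16: 00 = none, 01/10/11 = tag parity cases */
    unsigned data_error;  /* bits 19:18: 00 = none, 01 = single bit, 10/11 = double bit */
    int l3_error;         /* bit 20: 1 = L3 error, 0 = L2 error */
    int invalid_pic;      /* bit 21: 1 = invalid PIC request error */
};

/* Extract the fields from a machine-check status value. */
static struct cache_mc_info decode_cache_mc(uint64_t status)
{
    struct cache_mc_info info;
    info.tag_error = (unsigned)((status >> 16) & 0x3);
    info.data_error = (unsigned)((status >> 18) & 0x3);
    info.l3_error = (int)((status >> 20) & 0x1);
    info.invalid_pic = (int)((status >> 21) & 0x1);
    return info;
}
```

Per the table, the l3_error field can be ignored when invalid_pic is set.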
17.2
DEBUG REGISTERS
Eight debug registers (see Figure 17-1 for 32-bit operation and Figure 17-2 for 64-bit operation) control the debug operation of the processor. These registers can be written to and read using the move to/from debug register form of the MOV instruction. A debug register may be the source or destination operand for one of these instructions.
Figure 17-1. Debug Registers, 32-bit operation (figure not reproduced; the surviving labels show DR0 through DR6, the DR6 flags BT, BS, BD, and B3 through B0, and reserved bits that are set to 1)
...
The length of the breakpoint location: 1, 2, 4, or 8 bytes (refer to the notes in Section 17.2.4). The operation that must be performed at the address for a debug exception to be generated. Whether the breakpoint is enabled. Whether the breakpoint condition was present when the debug exception was generated.
17.8
LAST BRANCH, CALL STACK, INTERRUPT, AND EXCEPTION RECORDING FOR PROCESSORS BASED ON INTEL MICROARCHITECTURE CODE NAME HASWELL
Generally, all of the last branch record, interrupt and exception recording facilities described in Section 17.7, Last Branch, Interrupt, and Exception Recording for Processors based on Intel Microarchitecture code name Sandy Bridge, apply to next generation processors based on Intel Microarchitecture code name Haswell. The LBR facility also supports an alternate capability to profile call stacks. The LBR facility is configured for call stack profiling by writing 1 to MSR_LBR_SELECT.EN_CALLSTACK[bit 9]; see Table 17-11. If MSR_LBR_SELECT.EN_CALLSTACK is clear, the LBR facility will capture branches normally as described in Section 17.7.
Access R/W R/W R/W R/W R/W R/W R/W R/W R/W
Description When set, do not capture branches occurring in ring 0 When set, do not capture branches occurring in ring >0 When set, do not capture conditional branches When set, do not capture near relative calls When set, do not capture near indirect calls When set, do not capture near returns When set, do not capture near indirect jumps except near indirect calls and near returns When set, do not capture near relative jumps except near relative calls. When set, do not capture far branches Enable LBR stack to use LIFO filtering to capture Call stack profile Must be zero
1. Must set a valid combination of bits 0-8 in conjunction with bit 9; otherwise the counter result is undefined.
The call stack profiling capability is an enhancement of the LBR facility. The LBR stack is a ring buffer typically used to profile control flow transitions resulting from branches. However, because of its finite depth, the LBR stack often becomes less effective when profiling certain high-level languages (e.g. C++), where a transition of the execution flow is accompanied by a large number of leaf function calls, each of which returns an individual parameter to form the list of parameters for the main execution function call. A long list of such parameters returned by the leaf functions would serve to flush the data captured in the LBR stack, often losing the main execution context. When the call stack feature is enabled, the LBR stack will capture unfiltered call data normally, but as return instructions are executed the last captured branch record is flushed from the on-chip registers in a last-in first-out (LIFO) manner. Thus, branch information relative to leaf functions will not be captured, while preserving the call stack information of the main line execution path.
The configuration of the call stack facility is summarized below:
Set IA32_DEBUGCTL.LBR (bit 0) to enable the LBR stack to capture branch records. The source and target addresses of the call branches will be captured in the 16 pairs of From/To LBR MSRs that form the LBR stack.
Program the Top of Stack (TOS) MSR that points to the last valid from/to pair. This register is incremented by 1, modulo 16, before recording the next pair of addresses.
Program the branch filtering bits of MSR_LBR_SELECT (bits 0:8) as desired.
Program the MSR_LBR_SELECT to enable LIFO filtering of return instructions with:
    The following bits in MSR_LBR_SELECT must be set to 1: JCC, NEAR_IND_JMP, NEAR_REL_JMP, FAR_BRANCH, EN_CALLSTACK;
    The following bits in MSR_LBR_SELECT must be cleared: NEAR_REL_CALL, NEAR_IND_CALL, NEAR_RET;
    At most one of CPL_EQ_0, CPL_NEQ_0 is set.
...
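The MSR_LBR_SELECT constraints for call stack profiling can be expressed as a small C sketch. The bit positions below are an assumption: they follow the order of the descriptions in Table 17-11 (bit 0 = CPL_EQ_0 through bit 8 = FAR_BRANCH), with EN_CALLSTACK at bit 9 as stated in the text. The helper names are hypothetical, and the MSR write itself is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed MSR_LBR_SELECT bit assignments (order of Table 17-11). */
enum {
    LBR_CPL_EQ_0      = 1u << 0,
    LBR_CPL_NEQ_0     = 1u << 1,
    LBR_JCC           = 1u << 2,
    LBR_NEAR_REL_CALL = 1u << 3,
    LBR_NEAR_IND_CALL = 1u << 4,
    LBR_NEAR_RET      = 1u << 5,
    LBR_NEAR_IND_JMP  = 1u << 6,
    LBR_NEAR_REL_JMP  = 1u << 7,
    LBR_FAR_BRANCH    = 1u << 8,
    LBR_EN_CALLSTACK  = 1u << 9
};

/* Bits that must be 1 and bits that must be 0 for call stack profiling. */
#define LBR_CS_MUST_SET   ((uint64_t)(LBR_JCC | LBR_NEAR_IND_JMP | LBR_NEAR_REL_JMP | \
                                      LBR_FAR_BRANCH | LBR_EN_CALLSTACK))
#define LBR_CS_MUST_CLEAR ((uint64_t)(LBR_NEAR_REL_CALL | LBR_NEAR_IND_CALL | LBR_NEAR_RET))
#define LBR_CS_CPL_BOTH   ((uint64_t)(LBR_CPL_EQ_0 | LBR_CPL_NEQ_0))

/* Build a select value; cpl_filter is 0, LBR_CPL_EQ_0, or LBR_CPL_NEQ_0. */
static uint64_t lbr_callstack_select(uint64_t cpl_filter)
{
    return LBR_CS_MUST_SET | cpl_filter;
}

/* Check a select value against the three rules listed in the text. */
static bool lbr_callstack_select_valid(uint64_t sel)
{
    return (sel & LBR_CS_MUST_SET) == LBR_CS_MUST_SET &&
           (sel & LBR_CS_MUST_CLEAR) == 0 &&
           (sel & LBR_CS_CPL_BOTH) != LBR_CS_CPL_BOTH;
}
```

Under these assumed bit positions, the value with no privilege-level filter is 3C4H, and adding CPL_EQ_0 gives 3C5H.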
17.14
Future generations of Intel Xeon processors may offer monitoring capability in each logical processor to measure a specific quality-of-service metric, for example, L3 cache occupancy. The programming interface for this capability is described in the rest of this chapter.
17.14.1
Cache QoS Monitoring provides a layer of abstraction between applications and logical processors through the use of Resource Monitoring IDs (RMIDs). Each logical processor in the system can be assigned an RMID independently, or multiple logical processors can be assigned to the same RMID value (e.g., to track an application with multiple threads). For each logical processor, only one RMID value is active at a time. This is enforced by the IA32_PQR_ASSOC MSR, which specifies the active RMID of a logical processor. Writing to this MSR by software changes the active RMID of the logical processor from an old value to a new value. The Cache QoS Hardware tracks cache utilization of memory accesses according to the RMIDs and reports monitored data via a counter register (IA32_QM_CTR). Software must also configure an event selection MSR (IA32_QM_EVTSEL) to specify which QoS metric is to be reported. Processor support of the QoS Monitoring framework is reported via the CPUID instruction. The resource type available to the QoS Monitoring framework is enumerated via a new leaf function in CPUID. Reading and writing to the QoS MSRs require RDMSR and WRMSR instructions.
17.14.2
Software can query processor support of QoS capabilities by executing the CPUID instruction with EAX = 07H, ECX = 0H as input. If CPUID.(EAX=07H, ECX=0):EBX.QOS[bit 12] reports 1, the processor provides the following programming interfaces for QoS monitoring: One or more sub-leaves in CPUID leaf function 0FH (QoS Enumeration leaf): QoS leaf sub-function 0 enumerates available resources that support QoS monitoring, i.e. executing CPUID with EAX=0FH and ECX=0H. In the initial implementation, L3 cache QoS is the only resource type available. Each supported resource type is represented by a bit field in CPUID.(EAX=0FH, ECX=0):EDX[31:1]. The bit position corresponds to the sub-leaf index that software must use to query details of the QoS monitoring capability of that resource type. Reserved bit fields of CPUID.(EAX=0FH, ECX=0):EDX[31:1] correspond to unsupported sub-leaves of the CPUID.0FH leaf (see Figure 17-19 and Figure 17-20). Additionally, CPUID.(EAX=0FH, ECX=0H):EBX reports the highest RMID value of any resource type that supports QoS monitoring in the processor.
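[Figure 17-19. Register layout for CPUID.(EAX=0FH, ECX=0H). Registers shown: EDX (bits 31:2 Reserved, bit 1 L3) and EBX.]
[Figure 17-20. Register layout for CPUID.(EAX=0FH, ECX=1H). EBX: Upscaling Factor. ECX: MaxRMID. EDX: EventTypeBitMask.]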
IA32_PQR_ASSOC: This MSR specifies the active RMID that QoS monitoring hardware will use to tag internal operations, such as L3 cache requests. The layout of the MSR is shown in Figure 17-21. Software specifies the active RMID to monitor in the IA32_PQR_ASSOC.RMID field. The width of the RMID field can vary from one implementation to another, and is derived from LOG2(1 + CPUID.(EAX=0FH, ECX=0):EBX[31:0]). In the initial implementation, the width of the RMID field is 10 bits. The value of this MSR after power-on is 0.
IA32_QM_EVTSEL: This MSR serves a role similar to the event select MSRs for programmable performance monitoring described in Chapter 18. The simplified layout of the MSR is shown in Figure 17-21. IA32_QM_EVTSEL.EvtID (bits 7:0) specifies an event code of a supported resource type for hardware to report QoS monitored data associated with IA32_QM_EVTSEL.RMID (bits 41:32). Software can configure IA32_QM_EVTSEL.RMID with any RMID that is active within the physical processor. The width of IA32_QM_EVTSEL.RMID matches that of IA32_PQR_ASSOC.RMID.
IA32_QM_CTR: This MSR reports monitored QoS data when available. It contains three bit fields. If software configures an unsupported RMID or event type in IA32_QM_EVTSEL, then IA32_QM_CTR.Error (bit 63) will be set, indicating there is no valid data to report. If IA32_QM_CTR.Unavailable (bit 62) is set, QoS monitored data for the RMID is not available, and IA32_QM_CTR.data (bits 61:0) should be ignored. Therefore, IA32_QM_CTR.data (bits 61:0) is valid only if bits 63 and 62 are both clear. For Cache QoS monitoring, software can convert IA32_QM_CTR.data into a cache occupancy metric by multiplying it by CPUID.(EAX=0FH, ECX=1H).EBX.
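The field arithmetic described for the three MSRs can be sketched in C as follows. This is a minimal illustration under stated assumptions, not a driver: the function and macro names are hypothetical, the actual RDMSR/WRMSR accesses are not shown, and the width helper assumes that 1 + MaxRMID is a power of two (otherwise it rounds down).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions taken from the MSR descriptions in the text. */
#define QM_EVTSEL_RMID_SHIFT 32                  /* IA32_QM_EVTSEL.RMID starts at bit 32 */
#define QM_CTR_ERROR         (1ull << 63)        /* IA32_QM_CTR.Error                    */
#define QM_CTR_UNAVAILABLE   (1ull << 62)        /* IA32_QM_CTR.Unavailable              */
#define QM_CTR_DATA_MASK     ((1ull << 62) - 1)  /* IA32_QM_CTR.data, bits 61:0          */

/* RMID width = LOG2(1 + MaxRMID), MaxRMID = CPUID.(EAX=0FH, ECX=0):EBX.
 * Assumption: 1 + MaxRMID is a power of two; otherwise this rounds down. */
static unsigned qos_rmid_width(uint32_t max_rmid)
{
    uint64_t n = (uint64_t)max_rmid + 1;  /* 64-bit so 0xFFFFFFFF + 1 does not wrap */
    unsigned width = 0;
    while (n > 1) {
        n >>= 1;
        width++;
    }
    return width;
}

/* Compose the value software would write to IA32_QM_EVTSEL (hypothetical helper). */
static uint64_t qos_evtsel_value(uint32_t rmid, uint8_t evt_id, unsigned rmid_width)
{
    assert(rmid_width >= 1 && rmid_width <= 32);
    uint64_t rmid_mask = (1ull << rmid_width) - 1;
    return (((uint64_t)rmid & rmid_mask) << QM_EVTSEL_RMID_SHIFT) | evt_id;
}

/* Decode a raw IA32_QM_CTR value. Returns false if Error or Unavailable is set;
 * otherwise stores data * upscaling factor (CPUID.(EAX=0FH, ECX=1H).EBX). */
static bool qos_ctr_decode(uint64_t raw, uint32_t upscale, uint64_t *occupancy)
{
    if (raw & (QM_CTR_ERROR | QM_CTR_UNAVAILABLE))
        return false;
    *occupancy = (raw & QM_CTR_DATA_MASK) * upscale;
    return true;
}
```

With MaxRMID = 1023, qos_rmid_width returns 10, which matches the 10-bit RMID field of the initial implementation. The event code passed to qos_evtsel_value must be one that the processor enumerates as supported; no particular code is implied here.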
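[Figure 17-21. Layout of the QoS monitoring MSRs. IA32_QM_EVTSEL: bits 63:42 Reserved, bits 41:32 RMID, bits 31:8 Reserved, bits 7:0 EvtID. IA32_QM_CTR: bit 63 E (Error), bit 62 U (Unavailable), bits 61:0 data.]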
3. If CPUID.(EAX=07H, ECX=0):EBX.QOS[bit 12] = 1, then execute CPUID with EAX=0FH, ECX=0 to query the available resource types that support QoS monitoring.
4. If CPUID.(EAX=0FH, ECX=0):EDX.L3[bit 1] = 1, then execute CPUID with EAX=0FH, ECX=1 to query the capability of L3 Cache QoS monitoring.
5. If CPUID.(EAX=0FH, ECX=0):EDX reports additional resource types supporting QoS monitoring, then execute CPUID with EAX=0FH and ECX set to the corresponding resource type ID, as enumerated by the bit position in CPUID.(EAX=0FH, ECX=0):EDX.
...
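The enumeration steps can be illustrated with a small C sketch that decodes CPUID register values already captured by the caller. The struct and function names are hypothetical. Resource-type bits are read from EDX[31:1], following the enumeration description in 17.14.2, and the sub-leaf 1 register assignment (EBX = Upscaling Factor, ECX = MaxRMID, EDX = EventTypeBitMask) follows Figure 17-20. How the registers are obtained is left open; with GCC or Clang, `__get_cpuid_count` from `<cpuid.h>` is one option.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Register values returned by one CPUID invocation. */
struct cpuid_regs {
    uint32_t eax, ebx, ecx, edx;
};

/* CPUID.(EAX=07H, ECX=0):EBX.QOS[bit 12] */
static bool qos_monitoring_supported(const struct cpuid_regs *leaf7)
{
    return (leaf7->ebx >> 12) & 1u;
}

/* CPUID.(EAX=0FH, ECX=0):EDX[31:1]: bit n set means sub-leaf n
 * describes a supported resource type (bit 1 = L3). */
static bool qos_resource_supported(const struct cpuid_regs *leaf0f_0, unsigned subleaf)
{
    assert(subleaf >= 1 && subleaf <= 31);
    return (leaf0f_0->edx >> subleaf) & 1u;
}

/* Capabilities reported by CPUID.(EAX=0FH, ECX=1H). */
struct l3_qos_caps {
    uint32_t upscaling_factor;  /* EBX */
    uint32_t max_rmid;          /* ECX */
    uint32_t event_type_mask;   /* EDX */
};

static struct l3_qos_caps l3_qos_decode(const struct cpuid_regs *leaf0f_1)
{
    struct l3_qos_caps caps = { leaf0f_1->ebx, leaf0f_1->ecx, leaf0f_1->edx };
    return caps;
}
```

Keeping the decoding separate from the CPUID invocation lets the logic be exercised with recorded register values on machines that do not implement leaf 0FH.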
Description
Unknown L3 cache miss
Minimal latency core cache hit. This request was satisfied by the L1 data cache.
Pending core cache HIT. Outstanding core cache miss to same cache-line address was already underway.
This data request was satisfied by the L2.
L3 HIT. Local or Remote home requests that hit L3 cache in the uncore with no coherency actions required (snooping).
L3 HIT. Local or Remote home requests that hit the L3 cache and were serviced by another processor core with a cross core snoop where no modified copies were found. (clean).
L3 HIT. Local or Remote home requests that hit the L3 cache and were serviced by another processor core with a cross core snoop where modified copies were found. (HITM).
Reserved/LLC Snoop HitM. Local or Remote home requests that hit the last level cache and were serviced by another core with a cross core snoop where modified copies were found.
L3 MISS. Local homed requests that missed the L3 cache and were serviced by forwarded data following a cross package snoop where no modified copies were found. (Remote home requests are not counted).
Reserved
L3 MISS. Local home requests that missed the L3 cache and were serviced by local DRAM (go to shared state).
L3 MISS. Remote home requests that missed the L3 cache and were serviced by remote DRAM (go to shared state).
L3 MISS. Local home requests that missed the L3 cache and were serviced by local DRAM (go to exclusive state).
L3 MISS. Remote home requests that missed the L3 cache and were serviced by remote DRAM (go to exclusive state).
I/O, Request of input/output operation
The request was to un-cacheable memory.
1. Bit 7 is supported only for processors with a CPUID DisplayFamily_DisplayModel signature of 06_2A and 06_2E; otherwise it is reserved. ...
18.8.4.3
Processors based on Intel microarchitecture code name Sandy Bridge offer a precise store capability that complements the load latency facility. It provides a means to profile store memory references in the system. Precise stores leverage the PEBS facility and provide additional information about sampled stores. Having precise memory reference events with linear address information for both loads and stores can help programmers improve data structure layout, eliminate remote node references, and identify cache-line conflicts in NUMA systems.
Only IA32_PMC3 can be used to capture precise store information. After enabling this facility, counter overflows will initiate the generation of PEBS records as previously described in PEBS. Upon counter overflow, hardware captures the linear address and other status information of the next store that retires. This information is then written to the PEBS record.
To enable the precise store facility, software must complete the following steps. Note that the precise store facility relies on the PEBS facility, so the PEBS configuration requirements must be completed before attempting to capture precise store information.
1. Complete the PEBS configuration steps.
2. Program the MEM_TRANS_RETIRED.PRECISE_STORE event in IA32_PERFEVTSEL3. Only counter 3 (IA32_PMC3) supports collection of precise store information.
3. Set IA32_PEBS_ENABLE[3] and IA32_PEBS_ENABLE[63]. This enables IA32_PMC3 as a PEBS counter and enables the precise store facility, respectively.
The precise store information written into a PEBS record affects entries at offsets 98H, A0H and A8H of Table 18-12. The specificity of the Data Source entry at offset A0H has been enhanced to report three pieces of information.
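The final enabling step for precise store amounts to setting two bits in IA32_PEBS_ENABLE. The sketch below only computes the value to be written; the macro and function names are hypothetical, and reading or writing the MSR itself (RDMSR/WRMSR) is assumed to be done elsewhere by privileged code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PEBS_EN_PMC3          (1ull << 3)   /* IA32_PEBS_ENABLE[3]: IA32_PMC3 as PEBS counter */
#define PEBS_EN_PRECISE_STORE (1ull << 63)  /* IA32_PEBS_ENABLE[63]: precise store facility   */

/* Return the IA32_PEBS_ENABLE value with precise store enabled,
 * preserving whatever other bits are already set in 'current'. */
static uint64_t pebs_enable_precise_store(uint64_t current)
{
    return current | PEBS_EN_PMC3 | PEBS_EN_PRECISE_STORE;
}

/* True only if both required bits are set. */
static bool precise_store_enabled(uint64_t pebs_enable)
{
    uint64_t need = PEBS_EN_PMC3 | PEBS_EN_PRECISE_STORE;
    return (pebs_enable & need) == need;
}
```

Using a read-modify-write form rather than a constant keeps any PEBS enable bits already programmed for other counters intact.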
18.8.4.4
Upon triggering a PEBS assist, there will be a finite delay between the time the counter overflows and when the microcode starts to carry out its data collection obligations. INST_RETIRED is a very common event that is used to sample where a performance bottleneck occurred and to help identify its location in instruction address space. Even if the delay is constant in core clock space, it invariably manifests as variable skid in instruction address space. This creates a challenge for programmers to profile a workload and pinpoint the location of bottlenecks.
The core PMU in processors based on Intel microarchitecture code name Sandy Bridge includes a facility referred to as precise distribution of Instruction Retired (PDIR). The PDIR facility mitigates the skid problem by providing an early indication of when the INST_RETIRED counter is about to overflow, allowing the machine to more precisely trap on the instruction that actually caused the counter overflow, thus eliminating skid.
PDIR applies only to the INST_RETIRED.ALL precise event, and must use IA32_PMC1 with PerfEvtSel1 properly configured and bit 1 in IA32_PEBS_ENABLE set to 1. INST_RETIRED.ALL is a non-architectural performance event; it is not supported in prior generation microarchitectures. Additionally, on processors with CPUID DisplayFamily_DisplayModel signatures of 06_2A and 06_2D, the tool that programs PDIR should quiesce the rest of the programmable counters in the core when PDIR is active. ...
Table 18-34. Precise Events That Support Data Linear Address Profiling
| Event Name | Event Name |
| MEM_UOPS_RETIRED.STLB_MISS_LOADS | MEM_UOPS_RETIRED.STLB_MISS_STORES |
| MEM_UOPS_RETIRED.LOCK_LOADS | MEM_UOPS_RETIRED.LOCK_STORES |
| MEM_UOPS_RETIRED.SPLIT_LOADS | MEM_UOPS_RETIRED.SPLIT_STORES |
| MEM_UOPS_RETIRED.ALL_LOADS | MEM_UOPS_RETIRED.ALL_STORES |
| MEM_LOAD_UOPS_RETIRED.L1_HIT | MEM_LOAD_UOPS_RETIRED.L2_HIT |
| MEM_LOAD_UOPS_RETIRED.LLC_HIT | MEM_LOAD_UOPS_RETIRED.L1_MISS |
| MEM_LOAD_UOPS_RETIRED.L2_MISS | MEM_LOAD_UOPS_RETIRED.LLC_MISS |
| MEM_LOAD_UOPS_RETIRED.HIT_LFB | MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_MISS |
| MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT | MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HITM |
| UOPS_RETIRED.ALL (if load or store is tagged) | MEM_LOAD_UOPS_MISC_RETIRED.UC |
| MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_NONE | MEM_LOAD_UOPS_LLC_MISS_RETIRED.LOCAL_DRAM |
| MEM_LOAD_UOPS_LLC_MISS_RETIRED.LOCAL_DRAM_SNP_HIT | MEM_LOAD_UOPS_LLC_MISS_RETIRED.REMOTE_DRAM |
| MEM_LOAD_UOPS_LLC_MISS_RETIRED.REMOTE_DRAM_SNP_HIT | MEM_LOAD_UOPS_LLC_MISS_RETIRED.REMOTE_HITM |
| MEM_LOAD_UOPS_LLC_MISS_RETIRED.REMOTE_FWD | MEM_LOAD_UOPS_MISC_RETIRED.NON_DRAM |
| MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS | |
DataLA can use any one of the IA32_PMC0-IA32_PMC3 counters. Counter overflows will initiate the generation of PEBS records. Upon counter overflow, hardware captures the linear address and possibly other status information of the retiring memory uop. This information is then written to the PEBS record that is subsequently generated.
To enable the DataLA facility, software must complete the following steps. Note that the DataLA facility relies on the PEBS facility, so the PEBS configuration requirements must be completed before attempting to capture DataLA information.
1. Complete the PEBS configuration steps.
2. Program an event listed in Table 18-34 using any one of IA32_PERFEVTSEL0-IA32_PERFEVTSEL3.
3. Set the corresponding IA32_PEBS_ENABLE.PEBS_EN_CTRx bit. This enables the corresponding IA32_PMCx as a PEBS counter and enables the DataLA facility.
When the DataLA facility is enabled, the relevant information written into a PEBS record affects entries at offsets 98H, A0H and A8H, as shown in Table 18-35. ...
When the IA32_PERFEVTSELx MSR is programmed with both IN_TX=0 and IN_TXCP=0 on a processor that supports Intel TSX, the result in a counter may include detectable conditions associated with a transaction code region for its aborted execution (if any) and completed execution. In the initial implementation, when IN_TX (bit 32) is set, AnyThread (bit 21) should be cleared to prevent incorrect results.
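The interaction between IN_TX and AnyThread can be sketched as a small encoder for IA32_PERFEVTSELx values. Only the IN_TX (bit 32) and AnyThread (bit 21) positions are stated in the paragraph above; the remaining bit positions are assumed from the standard IA32_PERFEVTSELx layout, and the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 21 and 32 come from the text; the others are assumed from the
 * standard IA32_PERFEVTSELx layout. */
#define EVTSEL_USR     (1ull << 16)  /* count in user mode             */
#define EVTSEL_OS      (1ull << 17)  /* count in operating system mode */
#define EVTSEL_EDGE    (1ull << 18)  /* edge detect                    */
#define EVTSEL_ANY     (1ull << 21)  /* AnyThread                      */
#define EVTSEL_EN      (1ull << 22)  /* enable counter                 */
#define EVTSEL_INV     (1ull << 23)  /* invert counter mask            */
#define EVTSEL_IN_TX   (1ull << 32)  /* in transactional region        */
#define EVTSEL_IN_TXCP (1ull << 33)  /* in Tx, exclude aborts          */

/* Compose an event-select value. If IN_TX is requested, AnyThread is
 * cleared, following the guidance for the initial implementation. */
static uint64_t evtsel_encode(uint8_t event, uint8_t umask, uint8_t cmask, uint64_t flags)
{
    uint64_t value = (uint64_t)event
                   | ((uint64_t)umask << 8)
                   | ((uint64_t)cmask << 24)
                   | flags;
    if (value & EVTSEL_IN_TX)
        value &= ~EVTSEL_ANY;
    return value;
}
```

For example, evtsel_encode(0x0E, 0x01, 1, EVTSEL_INV | EVTSEL_EN) yields 0x01C0010E, which corresponds to the "Set Cmask = 1, Inv = 1" stalled-cycle configuration that Table 19-2 lists for UOPS_ISSUED.ANY. Silently clearing AnyThread is a design choice of this sketch; a tool might prefer to reject the combination instead.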
[Figure: layout of the IA32_PERFEVTSELx MSRs supporting Intel TSX. Fields: Event Select (bits 7:0); USR (User Mode); OS (Operating system mode); E (Edge detect); PC (Pin control); INT (APIC interrupt enable); ANY (Any Thread); EN (Enable counters); INV (Invert counter mask); IN_TX (In Trans. Rgn); IN_TXCP (In Tx exclude abort); bits 63:34 Reserved.]
Additionally, a number of performance events are solely focused on characterizing the execution of Intel TSX transactional code; they are listed in Table 19-3. ...
Table 19-2. Non-Architectural Performance Events In the Processor Core of Next Generation Intel Core Processors
| Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment |
| 03H | 02H | LD_BLOCKS.STORE_FORWARD | Loads blocked by overlapping with store buffer that cannot be forwarded. | |
| 05H | 01H | MISALIGN_MEM_REF.LOADS | Speculative cache-line split load uops dispatched to L1D. | |
| 05H | 02H | MISALIGN_MEM_REF.STORES | Speculative cache-line split Store-address uops dispatched to L1D. | |
| 07H | 01H | LD_BLOCKS_PARTIAL.ADDRESS_ALIAS | False dependencies in MOB due to partial compare on address. | |
| 08H | 01H | DTLB_LOAD_MISSES.MISS_CAUSES_A_WALK | Misses in all TLB levels that cause a page walk of any page size. | |
| 08H | 02H | DTLB_LOAD_MISSES.WALK_COMPLETED_4K | Completed page walks due to demand load misses that caused 4K page walks in any TLB levels. | |
| 08H | 04H | DTLB_LOAD_MISSES.WALK_COMPLETED_2M_4M | Completed page walks due to demand load misses that caused 2M/4M page walks in any TLB levels. | |
| 08H | 0EH | DTLB_LOAD_MISSES.WALK_COMPLETED | Completed page walks in any TLB of any page size due to demand load misses. | |
| 08H | 10H | DTLB_LOAD_MISSES.WALK_DURATION | Cycle PMH is busy with a walk. | |
| 08H | 20H | DTLB_LOAD_MISSES.STLB_HIT_4K | Load misses that missed DTLB but hit STLB (4K). | |
| 08H | 40H | DTLB_LOAD_MISSES.STLB_HIT_2M | Load misses that missed DTLB but hit STLB (2M). | |
| 08H | 60H | DTLB_LOAD_MISSES.STLB_HIT | Number of cache load STLB hits. No page walk. | |
| 08H | 80H | DTLB_LOAD_MISSES.PDE_CACHE_MISS | DTLB demand load misses with low part of linear-to-physical address translation missed. | |
| 0DH | 03H | INT_MISC.RECOVERY_CYCLES | Cycles waiting to recover after Machine Clears except JEClear. Set Cmask = 1. | Set Edge to count occurrences. |
| 0EH | 01H | UOPS_ISSUED.ANY | Increments each cycle the # of uops issued by the RAT to RS. Set Cmask = 1, Inv = 1, Any = 1 to count stalled cycles of this core. | Set Cmask = 1, Inv = 1 to count stalled cycles. |
| 0EH | 10H | UOPS_ISSUED.FLAGS_MERGE | Number of flags-merge uops allocated. Such uops add delay. | |
| 0EH | 20H | UOPS_ISSUED.SLOW_LEA | Number of slow LEA or similar uops allocated. Such uop has 3 sources (e.g. 2 sources + immediate) regardless if as a result of LEA instruction or not. | |
| | | UOPS_ISSUED.SINGLE_MUL | Number of multiply packed/scalar single precision uops allocated. | |
| | | L2_RQSTS.DEMAND_DATA_RD_MISS | Demand Data Read requests that missed L2, no rejects. | |
| | | L2_RQSTS.DEMAND_DATA_RD_HIT | Demand Data Read requests that hit L2 cache. | |
| 24H | E1H | L2_RQSTS.ALL_DEMAND_DATA_RD | Counts any demand and L1 HW prefetch data load requests to L2. | |
| 24H | 42H | L2_RQSTS.RFO_HIT | Counts the number of store RFO requests that hit the L2 cache. | |
| 24H | 22H | L2_RQSTS.RFO_MISS | Counts the number of store RFO requests that miss the L2 cache. | |
| 24H | E2H | L2_RQSTS.ALL_RFO | Counts all L2 store RFO requests. | |
| 24H | 44H | L2_RQSTS.CODE_RD_HIT | Number of instruction fetches that hit the L2 cache. | |
| 24H | 24H | L2_RQSTS.CODE_RD_MISS | Number of instruction fetches that missed the L2 cache. | |
| 24H | 27H | L2_RQSTS.ALL_DEMAND_MISS | Demand requests that miss L2 cache. | |
| 24H | E7H | L2_RQSTS.ALL_DEMAND_REFERENCES | Demand requests to L2 cache. | |
| 24H | E4H | L2_RQSTS.ALL_CODE_RD | Counts all L2 code requests. | |
| 24H | 50H | L2_RQSTS.L2_PF_HIT | Counts all L2 HW prefetcher requests that hit L2. | |
| 24H | 30H | L2_RQSTS.L2_PF_MISS | Counts all L2 HW prefetcher requests that missed L2. | |
| 24H | F8H | L2_RQSTS.ALL_PF | Counts all L2 HW prefetcher requests. | |
| 24H | 3FH | L2_RQSTS.MISS | All requests that missed L2. | |
| 24H | FFH | L2_RQSTS.REFERENCES | All requests to L2 cache. | |
| 27H | 50H | L2_DEMAND_RQSTS.WB_HIT | Not rejected writebacks that hit L2 cache. | |
| 2EH | 4FH | LONGEST_LAT_CACHE.REFERENCE | This event counts requests originating from the core that reference a cache line in the last level cache. | See Table 19-1 |
| 2EH | 41H | LONGEST_LAT_CACHE.MISS | This event counts each cache miss condition for references to the last level cache. | See Table 19-1 |
| 3CH | 00H | CPU_CLK_UNHALTED.THREAD_P | Counts the number of thread cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. The core frequency may change from time to time due to power or thermal throttling. | See Table 19-1 |
| 3CH | 01H | | Increments at the frequency of XCLK (100 MHz) when not halted. | See Table 19-1 |
| 48H | 01H | | Increments the number of outstanding L1D misses every cycle. Set Cmask = 1 and Edge = 1 to count occurrences. | Counter 2 only; Set Cmask = 1 to count cycles. |
| | | DTLB_STORE_MISSES.MISS_CAUSES_A_WALK | Miss in all TLB levels causes a page walk of any page size (4K/2M/4M/1G). | |
| | | DTLB_STORE_MISSES.WALK_COMPLETED_4K | Completed page walks due to store misses in one or more TLB levels of 4K page structure. | |
| | | DTLB_STORE_MISSES.WALK_COMPLETED_2M_4M | Completed page walks due to store misses in one or more TLB levels of 2M/4M page structure. | |
| 49H | 0EH | DTLB_STORE_MISSES.WALK_COMPLETED | Completed page walks due to store miss in any TLB levels of any page size (4K/2M/4M/1G). | |
| 49H | 10H | DTLB_STORE_MISSES.WALK_DURATION | Cycles PMH is busy with this walk. | |
| 49H | 20H | DTLB_STORE_MISSES.STLB_HIT_4K | Store misses that missed DTLB but hit STLB (4K). | |
| 49H | 40H | DTLB_STORE_MISSES.STLB_HIT_2M | Store misses that missed DTLB but hit STLB (2M). | |
| 49H | 60H | DTLB_STORE_MISSES.STLB_HIT | Store operations that miss the first TLB level but hit the second and do not cause page walks. | |
| 49H | 80H | DTLB_STORE_MISSES.PDE_CACHE_MISS | DTLB store misses with low part of linear-to-physical address translation missed. | |
| 4CH | 01H | LOAD_HIT_PRE.SW_PF | Non-SW-prefetch load dispatches that hit fill buffer allocated for S/W prefetch. | |
| 4CH | 02H | LOAD_HIT_PRE.HW_PF | Non-SW-prefetch load dispatches that hit fill buffer allocated for H/W prefetch. | |
| 51H | 01H | L1D.REPLACEMENT | Counts the number of lines brought into the L1 data cache. | |
| 58H | 04H | MOVE_ELIMINATION.INT_NOT_ELIMINATED | Number of integer Move Elimination candidate uops that were not eliminated. | |
| 58H | 08H | MOVE_ELIMINATION.SIMD_NOT_ELIMINATED | Number of SIMD Move Elimination candidate uops that were not eliminated. | |
| 58H | 01H | MOVE_ELIMINATION.INT_ELIMINATED | Number of integer Move Elimination candidate uops that were eliminated. | |
| 58H | 02H | MOVE_ELIMINATION.SIMD_ELIMINATED | Number of SIMD Move Elimination candidate uops that were eliminated. | |
| 5CH | 01H | CPL_CYCLES.RING0 | Unhalted core cycles when the thread is in ring 0. | Use Edge to count transition |
| 5CH | 02H | CPL_CYCLES.RING123 | Unhalted core cycles when the thread is not in ring 0. | |
| 5EH | 01H | RS_EVENTS.EMPTY_CYCLES | Cycles the RS is empty for the thread. | |
| 60H | 01H | OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD | Offcore outstanding Demand Data Read transactions in SQ to uncore. Set Cmask=1 to count cycles. | Use only when HTT is off |
| 60H | 02H | OFFCORE_REQUESTS_OUTSTANDING.DEMAND_CODE_RD | Offcore outstanding Demand code Read transactions in SQ to uncore. Set Cmask=1 to count cycles. | Use only when HTT is off |
| 60H | 04H | OFFCORE_REQUESTS_OUTSTANDING.DEMAND_RFO | Offcore outstanding RFO store transactions in SQ to uncore. Set Cmask=1 to count cycles. | Use only when HTT is off |
| 60H | 08H | OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD | Offcore outstanding cacheable data read transactions in SQ to uncore. Set Cmask=1 to count cycles. | Use only when HTT is off |
| 63H | 01H | LOCK_CYCLES.SPLIT_LOCK_UC_LOCK_DURATION | Cycles in which the L1D and L2 are locked, due to a UC lock or split lock. | |
| 63H | 02H | LOCK_CYCLES.CACHE_LOCK_DURATION | Cycles in which the L1D is locked. | |
| 79H | 02H | IDQ.EMPTY | Counts cycles the IDQ is empty. | |
| 79H | 04H | IDQ.MITE_UOPS | Increment each cycle # of uops delivered to IDQ from MITE path. Set Cmask = 1 to count cycles. | Can combine Umask 04H and 20H |
| 79H | 08H | IDQ.DSB_UOPS | Increment each cycle. # of uops delivered to IDQ from DSB path. Set Cmask = 1 to count cycles. | Can combine Umask 08H and 10H |
| 79H | 10H | IDQ.MS_DSB_UOPS | Increment each cycle # of uops delivered to IDQ when MS_busy by DSB. Set Cmask = 1 to count cycles. Add Edge=1 to count # of delivery. | Can combine Umask 04H, 08H |
| 79H | 20H | IDQ.MS_MITE_UOPS | Increment each cycle # of uops delivered to IDQ when MS_busy by MITE. Set Cmask = 1 to count cycles. | Can combine Umask 04H, 08H |
| 79H | 30H | IDQ.MS_UOPS | Increment each cycle # of uops delivered to IDQ from MS by either DSB or MITE. Set Cmask = 1 to count cycles. | Can combine Umask 04H, 08H |
| 79H | 18H | IDQ.ALL_DSB_CYCLES_ANY_UOPS | Counts cycles DSB is delivered at least one uops. Set Cmask = 1. | |
| 79H | 18H | IDQ.ALL_DSB_CYCLES_4_UOPS | Counts cycles DSB is delivered four uops. Set Cmask = 4. | |
| 79H | 24H | IDQ.ALL_MITE_CYCLES_ANY_UOPS | Counts cycles MITE is delivered at least one uops. Set Cmask = 1. | |
| 79H | 24H | IDQ.ALL_MITE_CYCLES_4_UOPS | Counts cycles MITE is delivered four uops. Set Cmask = 4. | |
| 79H | 3CH | IDQ.MITE_ALL_UOPS | # of uops delivered to IDQ from any path. | |
| 80H | 02H | ICACHE.MISSES | Number of Instruction Cache, Streaming Buffer and Victim Cache Misses. Includes UC accesses. | |
| 85H | 01H | ITLB_MISSES.MISS_CAUSES_A_WALK | Misses in ITLB that causes a page walk of any page size. | |
| 85H | 02H | ITLB_MISSES.WALK_COMPLETED_4K | Completed page walks due to misses in ITLB 4K page entries. | |
| 85H | 04H | ITLB_MISSES.WALK_COMPLETED_2M_4M | Completed page walks due to misses in ITLB 2M/4M page entries. | |
| 85H | 0EH | ITLB_MISSES.WALK_COMPLETED | Completed page walks in ITLB of any page size. | |
| 85H | 10H | ITLB_MISSES.WALK_DURATION | Cycle PMH is busy with a walk. | |
| 85H | 20H | ITLB_MISSES.STLB_HIT_4K | ITLB misses that hit STLB (4K). | |
| 85H | 40H | ITLB_MISSES.STLB_HIT_2M | ITLB misses that hit STLB (2M). | |
| 85H | 60H | ITLB_MISSES.STLB_HIT | ITLB misses that hit STLB. No page walk. | |
| 87H | 01H | ILD_STALL.LCP | Stalls caused by changing prefix length of the instruction. | |
| 87H | 04H | ILD_STALL.IQ_FULL | Stall cycles due to IQ is full. | |
| 88H | 01H | BR_INST_EXEC.COND | Qualify conditional near branch instructions executed, but not necessarily retired. | Must combine with umask 40H, 80H |
| 88H | 02H | BR_INST_EXEC.DIRECT_JMP | Qualify all unconditional near branch instructions excluding calls and indirect branches. | Must combine with umask 80H |
| 88H | 04H | BR_INST_EXEC.INDIRECT_JMP_NON_CALL_RET | Qualify executed indirect near branch instructions that are not calls nor returns. | Must combine with umask 80H |
| 88H | 08H | BR_INST_EXEC.RETURN_NEAR | Qualify indirect near branches that have a return mnemonic. | Must combine with umask 80H |
| 88H | 10H | BR_INST_EXEC.DIRECT_NEAR_CALL | Qualify unconditional near call branch instructions, excluding non call branch, executed. | Must combine with umask 80H |
| 88H | 20H | BR_INST_EXEC.INDIRECT_NEAR_CALL | Qualify indirect near calls, including both register and memory indirect, executed. | Must combine with umask 80H |
| 88H | 40H | BR_INST_EXEC.NONTAKEN | Qualify non-taken near branches executed. | Applicable to umask 01H only |
| 88H | 80H | BR_INST_EXEC.TAKEN | Qualify taken near branches executed. Must combine with 01H, 02H, 04H, 08H, 10H, 20H. | |
| 88H | FFH | BR_INST_EXEC.ALL_BRANCHES | Counts all near executed branches (not necessarily retired). | |
| 89H | 01H | BR_MISP_EXEC.COND | Qualify conditional near branch instructions mispredicted. | Must combine with umask 40H, 80H |
| 89H | 04H | BR_MISP_EXEC.INDIRECT_JMP_NON_CALL_RET | Qualify mispredicted indirect near branch instructions that are not calls nor returns. | Must combine with umask 80H |
| 89H | 08H | BR_MISP_EXEC.RETURN_NEAR | Qualify mispredicted indirect near branches that have a return mnemonic. | Must combine with umask 80H |
| 89H | 10H | BR_MISP_EXEC.DIRECT_NEAR_CALL | Qualify mispredicted unconditional near call branch instructions, excluding non call branch, executed. | Must combine with umask 80H |
| 89H | 20H | BR_MISP_EXEC.INDIRECT_NEAR_CALL | Qualify mispredicted indirect near calls, including both register and memory indirect, executed. | Must combine with umask 80H |
| 89H | 40H | BR_MISP_EXEC.NONTAKEN | Qualify mispredicted non-taken near branches executed. | Applicable to umask 01H only |
| 89H | 80H | BR_MISP_EXEC.TAKEN | Qualify mispredicted taken near branches executed. Must combine with 01H, 02H, 04H, 08H, 10H, 20H. | |
| 89H | FFH | BR_MISP_EXEC.ALL_BRANCHES | Counts all near executed branches (not necessarily retired). | |
| 9CH | 01H | IDQ_UOPS_NOT_DELIVERED.CORE | Count number of non-delivered uops to RAT per thread. | Use Cmask to qualify uop b/w |
| A1H | 01H | UOPS_EXECUTED_PORT.PORT_0 | Cycles which a uop is dispatched on port 0 in this thread. | Set AnyThread to count per core |
| A1H | 02H | UOPS_EXECUTED_PORT.PORT_1 | Cycles which a uop is dispatched on port 1 in this thread. | Set AnyThread to count per core |
| A1H | 04H | UOPS_EXECUTED_PORT.PORT_2 | Cycles which a uop is dispatched on port 2 in this thread. | Set AnyThread to count per core |
| A1H | 08H | UOPS_EXECUTED_PORT.PORT_3 | Cycles which a uop is dispatched on port 3 in this thread. | Set AnyThread to count per core |
| A1H | 10H | UOPS_EXECUTED_PORT.PORT_4 | Cycles which a uop is dispatched on port 4 in this thread. | Set AnyThread to count per core |
| A1H | 20H | UOPS_EXECUTED_PORT.PORT_5 | Cycles which a uop is dispatched on port 5 in this thread. | Set AnyThread to count per core |
| A1H | 40H | UOPS_EXECUTED_PORT.PORT_6 | Cycles which a uop is dispatched on port 6 in this thread. | Set AnyThread to count per core |
| A1H | 80H | UOPS_EXECUTED_PORT.PORT_7 | Cycles which a uop is dispatched on port 7 in this thread. | Set AnyThread to count per core |
| A2H | 01H | RESOURCE_STALLS.ANY | Cycles Allocation is stalled due to Resource Related reason. | |
| A2H | 04H | RESOURCE_STALLS.RS | Cycles stalled due to no eligible RS entry available. | |
| A2H | 08H | RESOURCE_STALLS.SB | Cycles stalled due to no store buffers available (not including draining from sync). | |
| A2H | 10H | RESOURCE_STALLS.ROB | Cycles stalled due to re-order buffer full. | |
| A3H | 01H | CYCLE_ACTIVITY.CYCLES_L2_PENDING | Cycles with pending L2 miss loads. Set Cmask=2 to count cycle. | Use only when HTT is off |
| A3H | 02H | CYCLE_ACTIVITY.CYCLES_LDM_PENDING | Cycles with pending memory loads. Set Cmask=2 to count cycle. | |
| A3H | 05H | CYCLE_ACTIVITY.STALLS_L2_PENDING | Number of loads missed L2. | |
| A3H | 08H | CYCLE_ACTIVITY.CYCLES_L1D_PENDING | Cycles with pending L1 cache miss loads. Set Cmask=8 to count cycle. | |
| AEH | 01H | ITLB.ITLB_FLUSH | Counts the number of ITLB flushes, includes 4k/2M/4M pages. | |
| B0H | 01H | OFFCORE_REQUESTS.DEMAND_DATA_RD | Demand data read requests sent to uncore. | Use only when HTT is off |
| B0H | 02H | OFFCORE_REQUESTS.DEMAND_CODE_RD | Demand code read requests sent to uncore. | Use only when HTT is off |
| B0H | 04H | OFFCORE_REQUESTS.DEMAND_RFO | Demand RFO read requests sent to uncore, including regular RFOs, locks, ItoM. | Use only when HTT is off |
| B0H | 08H | OFFCORE_REQUESTS.ALL_DATA_RD | Data read requests sent to uncore (demand and prefetch). | Use only when HTT is off |
| B1H | 02H | UOPS_EXECUTED.CORE | Counts total number of uops to be executed per-core each cycle. | Do not need to set ANY |
| B7H | 01H | OFF_CORE_RESPONSE_0 | See Section 18.8.5, Off-core Response Performance Monitoring. | Requires MSR 01A6H |
| BBH | 01H | OFF_CORE_RESPONSE_1 | See Section 18.8.5, Off-core Response Performance Monitoring. | Requires MSR 01A7H |
| BCH | 11H | PAGE_WALKER_LOADS.DTLB_L1 | Number of DTLB page walker loads that hit in the L1+FB. | |
| BCH | 21H | PAGE_WALKER_LOADS.ITLB_L1 | Number of ITLB page walker loads that hit in the L1+FB. | |
| BCH | 12H | PAGE_WALKER_LOADS.DTLB_L2 | Number of DTLB page walker loads that hit in the L2. | |
| BCH | 22H | PAGE_WALKER_LOADS.ITLB_L2 | Number of ITLB page walker loads that hit in the L2. | |
| BCH | 14H | PAGE_WALKER_LOADS.DTLB_L3 | Number of DTLB page walker loads that hit in the L3. | |
| BCH | 24H | PAGE_WALKER_LOADS.ITLB_L3 | Number of ITLB page walker loads that hit in the L3. | |
| BCH | 18H | PAGE_WALKER_LOADS.DTLB_MEMORY | Number of DTLB page walker loads from memory. | |
| BCH | 28H | PAGE_WALKER_LOADS.ITLB_MEMORY | Number of ITLB page walker loads from memory. | |
| BDH | 01H | TLB_FLUSH.DTLB_THREAD | DTLB flush attempts of the thread-specific entries. | |
| BDH | 20H | TLB_FLUSH.STLB_ANY | Count number of STLB flush attempts. | |
| C0H | 00H | INST_RETIRED.ANY_P | Number of instructions at retirement. | See Table 19-1 |
| C0H | 01H | INST_RETIRED.ALL | Precise instruction retired event with HW to reduce effect of PEBS shadow in IP distribution. | PMC1 only |
| C1H | 08H | OTHER_ASSISTS.AVX_TO_SSE | Number of transitions from AVX-256 to legacy SSE when penalty applicable. | |
| C1H | 10H | OTHER_ASSISTS.SSE_TO_AVX | Number of transitions from SSE to AVX-256 when penalty applicable. | |
| C1H | 40H | OTHER_ASSISTS.ANY_WB_ASSIST | Number of microcode assists invoked by HW upon uop writeback. | |
| C2H | 01H | UOPS_RETIRED.ALL | Counts the number of micro-ops retired. Use cmask=1 and invert to count active cycles or stalled cycles. | Supports PEBS, use Any=1 for core granular. |
| | | UOPS_RETIRED.RETIRE_SLOTS | Counts the number of retirement slots used each cycle. | |
| | | MACHINE_CLEARS.MEMORY_ORDERING | Counts the number of machine clears due to memory order conflicts. | |
| | | MACHINE_CLEARS.SMC | Number of self-modifying-code machine clears detected. | |
| | | MACHINE_CLEARS.MASKMOV | Counts the number of executed AVX masked load operations that refer to an illegal address range with the mask bits set to 0. | |
| C4H | 00H | BR_INST_RETIRED.ALL_BRANCHES | Branch instructions at retirement. | See Table 19-1 |
| C4H | 01H | BR_INST_RETIRED.CONDITIONAL | Counts the number of conditional branch instructions retired. | Supports PEBS |
| C4H | 02H | BR_INST_RETIRED.NEAR_CALL | Direct and indirect near call instructions retired. | |
| C4H | 04H | BR_INST_RETIRED.ALL_BRANCHES | Counts the number of branch instructions retired. | |
| C4H | 08H | BR_INST_RETIRED.NEAR_RETURN | Counts the number of near return instructions retired. | |
| C4H | 10H | BR_INST_RETIRED.NOT_TAKEN | Counts the number of not taken branch instructions retired. | |
| C4H | 20H | BR_INST_RETIRED.NEAR_TAKEN | Number of near taken branches retired. | |
| C4H | 40H | BR_INST_RETIRED.FAR_BRANCH | Number of far branches retired. | |
| C5H | 00H | BR_MISP_RETIRED.ALL_BRANCHES | Mispredicted branch instructions at retirement. | See Table 19-1 |
| C5H | 01H | BR_MISP_RETIRED.CONDITIONAL | Mispredicted conditional branch instructions retired. | Supports PEBS |
| C5H | 04H | BR_MISP_RETIRED.ALL_BRANCHES | Mispredicted macro branch instructions retired. | |
| CAH | 02H | FP_ASSIST.X87_OUTPUT | Number of X87 FP assists due to Output values. | |
| CAH | 04H | FP_ASSIST.X87_INPUT | Number of X87 FP assists due to input values. | |
| CAH | 08H | FP_ASSIST.SIMD_OUTPUT | Number of SIMD FP assists due to Output values. | |
| CAH | 10H | FP_ASSIST.SIMD_INPUT | Number of SIMD FP assists due to input values. | |
| CAH | 1EH | FP_ASSIST.ANY | Cycles with any input/output SSE* or FP assists. | |
| CCH | 20H | ROB_MISC_EVENTS.LBR_INSERTS | Count cases of saving new LBR records by hardware. | |
| CDH | 01H | MEM_TRANS_RETIRED.LOAD_LATENCY | Randomly sampled loads whose latency is above a user defined threshold. A small fraction of the overall loads are sampled due to randomization. | Specify threshold in MSR 0x3F6 |
| | | MEM_UOP_RETIRED.LOADS | Qualify retired memory uops that are loads. Combine with umask 10H, 20H, 40H, 80H. | Supports PEBS and DataLA |
| | | MEM_UOP_RETIRED.STORES | Qualify retired memory uops that are stores. Combine with umask 10H, 20H, 40H, 80H. | Supports PEBS and DataLA |
| | | MEM_UOP_RETIRED.STLB_MISS | Qualify retired memory uops with STLB miss. Must combine with umask 01H, 02H, to produce counts. | Supports PEBS and DataLA |
| | | MEM_UOP_RETIRED.LOCK | Qualify retired memory uops with lock. Must combine with umask 01H, 02H, to produce counts. | Supports PEBS and DataLA |
| | | MEM_UOP_RETIRED.SPLIT | Qualify retired memory uops with line split. Must combine with umask 01H, 02H, to produce counts. | Supports PEBS and DataLA |
| | | MEM_UOP_RETIRED.ALL | Qualify any retired memory uops. Must combine with umask 01H, 02H, to produce counts. | Supports PEBS and DataLA |
| | | MEM_LOAD_UOPS_RETIRED.L1_HIT | Retired load uops with L1 cache hits as data sources. | Supports PEBS and DataLA |
Event Num. D1H D1H D1H D1H Umask Value 02H 04H 10H 40H Event Mask Mnemonic Description Comment
MEM_LOAD_UOPS_RETIRED.L2_ Retired load uops with L2 cache hits as data sources. Supports PEBS and HIT DataLA MEM_LOAD_UOPS_RETIRED.LLC Retired load uops with LLC cache hits as data _HIT sources. MEM_LOAD_UOPS_RETIRED.L2_ Retired load uops missed L2. Unknown data source MISS excluded. MEM_LOAD_UOPS_RETIRED.HIT Retired load uops which data sources were load uops _LFB missed L1 but hit FB due to preceding miss to the same cache line with data not ready. MEM_LOAD_UOPS_LLC_HIT_RE TIRED.XSNP_MISS MEM_LOAD_UOPS_LLC_HIT_RE TIRED.XSNP_HIT MEM_LOAD_UOPS_LLC_HIT_RE TIRED.XSNP_HITM MEM_LOAD_UOPS_LLC_HIT_RE TIRED.XSNP_NONE MEM_LOAD_UOPS_LLC_MISS_R ETIRED.LOCAL_DRAM BACLEARS.ANY L2_TRANS.DEMAND_DATA_RD L2_TRANS.RFO L2_TRANS.CODE_RD L2_TRANS.ALL_PF L2_TRANS.L1D_WB L2_TRANS.L2_FILL L2_TRANS.L2_WB L2_TRANS.ALL_REQUESTS L2_LINES_IN.I L2_LINES_IN.S L2_LINES_IN.E L2_LINES_IN.ALL Retired load uops which data sources were LLC hit and cross-core snoop missed in on-pkg core cache. Retired load uops which data sources were LLC and cross-core snoop hits in on-pkg core cache. Retired load uops which data sources were HitM responses from shared LLC. Retired load uops which data sources were hits in LLC without snoops required. Supports PEBS and DataLA Supports PEBS and DataLA Supports PEBS and DataLA Supports PEBS and DataLA Supports PEBS and DataLA Supports PEBS and DataLA
D2H D2H D2H D2H D3H E6H F0H F0H F0H F0H F0H F0H F0H F0H F1H F1H F1H F1H F2H F2H
01H 02H 04H 08H 01H 1FH 01H 02H 04H 08H 10H 20H 40H 80H 01H 02H 04H 07H 05H 06H
Retired load uops which data sources missed LLC but serviced from local dram. Supports PEBS and DataLA. Number of front end re-steers due to BPU misprediction. Demand Data Read requests that access L2 cache. RFO requests that access L2 cache. L2 cache accesses when fetching instructions. Any MLC or LLC HW prefetch accessing L2, including rejects. L1D writebacks that access L2 cache. L2 fill requests that access L2 cache. L2 writebacks that access L2 cache. Transactions accessing L2 pipe. L2 cache lines in I state filling L2. L2 cache lines in S state filling L2. L2 cache lines in E state filling L2. L2 cache lines filling L2. Counting does not cover rejects. Counting does not cover rejects. Counting does not cover rejects. Counting does not cover rejects.
L2_LINES_OUT.DEMAND_CLEAN Clean L2 cache lines evicted by demand. L2_LINES_OUT.DEMAND_DIRTY Dirty L2 cache lines evicted by demand.
....
Table 19-5. Non-Architectural Performance Events In the Processor Core of 3rd Generation Intel Core i7, i5, i3 Processors
| Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment |
| 03H | 02H | LD_BLOCKS.STORE_FORWARD | Loads blocked by overlapping with store buffer that cannot be forwarded. | |
| 05H | 01H | MISALIGN_MEM_REF.LOADS | Speculative cache-line split load uops dispatched to L1D. | |
| 05H | 02H | MISALIGN_MEM_REF.STORES | Speculative cache-line split Store-address uops dispatched to L1D. | |
| 07H | 01H | LD_BLOCKS_PARTIAL.ADDRESS_ALIAS | False dependencies in MOB due to partial compare on address. | |
| 08H | 81H | DTLB_LOAD_MISSES.MISS_CAUSES_A_WALK | Misses in all TLB levels that cause a page walk of any page size from demand loads. | |
| 08H | 82H | DTLB_LOAD_MISSES.WALK_COMPLETED | Misses in all TLB levels that caused page walk completed of any size by demand loads. | |
| 08H | 84H | DTLB_LOAD_MISSES.WALK_DURATION | Cycle PMH is busy with a walk due to demand loads. | |
| 0EH | 01H | UOPS_ISSUED.ANY | Increments each cycle the # of Uops issued by the RAT to RS. Set Cmask = 1, Inv = 1, Any = 1 to count stalled cycles of this core. | Set Cmask = 1, Inv = 1 to count stalled cycles |
| 0EH | 10H | UOPS_ISSUED.FLAGS_MERGE | Number of flags-merge uops allocated. Such uops add delay. | |
| 0EH | 20H | UOPS_ISSUED.SLOW_LEA | Number of slow LEA or similar uops allocated. Such a uop has 3 sources (e.g., 2 sources + immediate), regardless of whether it results from an LEA instruction. | |
| 0EH | 40H | UOPS_ISSUED.SINGLE_MUL | Number of multiply packed/scalar single precision uops allocated. | |
| 14H | 01H | ARITH.FPU_DIV_ACTIVE | Cycles that the divider is active, includes INT and FP. Set 'edge =1, cmask=1' to count the number of divides. | |
L2_RQSTS.DEMAND_DATA_RD_HIT Demand Data Read requests that hit L2 cache.
L2_RQSTS.ALL_DEMAND_DATA_RD Counts any demand and L1 HW prefetch data load requests to L2.
L2_RQSTS.RFO_HITS Counts the number of store RFO requests that hit the L2 cache.
L2_RQSTS.RFO_MISS Counts the number of store RFO requests that miss the L2 cache.
L2_RQSTS.ALL_RFO Counts all L2 store RFO requests.
L2_RQSTS.CODE_RD_HIT Number of instruction fetches that hit the L2 cache.
L2_RQSTS.CODE_RD_MISS Number of instruction fetches that missed the L2 cache.
| Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment |
| 24H | 30H | L2_RQSTS.ALL_CODE_RD | Counts all L2 code requests. | |
| 24H | 40H | L2_RQSTS.PF_HIT | Counts all L2 HW prefetcher requests that hit L2. | |
| 24H | 80H | L2_RQSTS.PF_MISS | Counts all L2 HW prefetcher requests that missed L2. | |
| 24H | C0H | L2_RQSTS.ALL_PF | Counts all L2 HW prefetcher requests. | |
| 27H | 01H | L2_STORE_LOCK_RQSTS.MISS | RFOs that miss cache lines. | |
| 27H | 08H | L2_STORE_LOCK_RQSTS.HIT_M | RFOs that hit cache lines in M state. | |
| 27H | 0FH | L2_STORE_LOCK_RQSTS.ALL | RFOs that access cache lines in any state. | |
| 28H | 01H | L2_L1D_WB_RQSTS.MISS | Not rejected writebacks that missed LLC. | |
| 28H | 04H | L2_L1D_WB_RQSTS.HIT_E | Not rejected writebacks from L1D to L2 cache lines in E state. | |
| 28H | 08H | L2_L1D_WB_RQSTS.HIT_M | Not rejected writebacks from L1D to L2 cache lines in M state. | |
| 28H | 0FH | L2_L1D_WB_RQSTS.ALL | Not rejected writebacks from L1D to L2 cache lines in any state. | |
| 2EH | 4FH | LONGEST_LAT_CACHE.REFERENCE | This event counts requests originating from the core that reference a cache line in the last level cache. | see Table 19-1 |
| 2EH | 41H | LONGEST_LAT_CACHE.MISS | This event counts each cache miss condition for references to the last level cache. | |
| 3CH | 00H | CPU_CLK_UNHALTED.THREAD_P | Counts the number of thread cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. The core frequency may change from time to time due to power or thermal throttling. | |
| 3CH | 01H | CPU_CLK_THREAD_UNHALTED.REF_XCLK | Increments at the frequency of XCLK (100 MHz) when not halted. | |
| 48H | 01H | L1D_PEND_MISS.PENDING | Increments the number of outstanding L1D misses every cycle. Set Cmask = 1 and Edge = 1 to count occurrences. | |
DTLB_STORE_MISSES.MISS_CAUSES_A_WALK Miss in all TLB levels causes a page walk of any page size (4K/2M/4M/1G).
DTLB_STORE_MISSES.WALK_COMPLETED Miss in all TLB levels causes a page walk that completes of any page size (4K/2M/4M/1G).
DTLB_STORE_MISSES.WALK_DURATION Cycles PMH is busy with this walk.
DTLB_STORE_MISSES.STLB_HIT Store operations that miss the first TLB level but hit the second and do not cause page walks.
LOAD_HIT_PRE.SW_PF Non-SW-prefetch load dispatches that hit fill buffer allocated for S/W prefetch.
LOAD_HIT_PRE.HW_PF Non-SW-prefetch load dispatches that hit fill buffer allocated for H/W prefetch.
Event Num. 51H 58H 58H 58H 58H 5CH 5CH 5EH 5FH 60H Umask Value 01H 04H 08H 01H 02H 01H 02H 01H 04H 01H Event Mask Mnemonic L1D.REPLACEMENT Description Counts the number of lines brought into the L1 data cache. Comment
MOVE_ELIMINATION.INT_NOT_EL Number of integer Move Elimination candidate uops IMINATED that were not eliminated. MOVE_ELIMINATION.SIMD_NOT_E Number of SIMD Move Elimination candidate uops LIMINATED that were not eliminated. MOVE_ELIMINATION.INT_ELIMINA Number of integer Move Elimination candidate uops TED that were eliminated. MOVE_ELIMINATION.SIMD_ELIMIN Number of SIMD Move Elimination candidate uops ATED that were eliminated. CPL_CYCLES.RING0 CPL_CYCLES.RING123 RS_EVENTS.EMPTY_CYCLES DTLB_LOAD_MISSES.STLB_HIT OFFCORE_REQUESTS_OUTSTAN DING.DEMAND_DATA_RD OFFCORE_REQUESTS_OUTSTAN DING.DEMAND_CODE_RD OFFCORE_REQUESTS_OUTSTAN DING.DEMAND_RFO OFFCORE_REQUESTS_OUTSTAN DING.ALL_DATA_RD LOCK_CYCLES.SPLIT_LOCK_UC_L OCK_DURATION Unhalted core cycles when the thread is in ring 0. Unhalted core cycles when the thread is not in ring 0. Cycles the RS is empty for the thread. Counts load operations that missed 1st level DTLB but hit the 2nd level. Offcore outstanding Demand Data Read transactions in SQ to uncore. Set Cmask=1 to count cycles. Offcore outstanding Demand Code Read transactions in SQ to uncore. Set Cmask=1 to count cycles. Offcore outstanding RFO store transactions in SQ to uncore. Set Cmask=1 to count cycles. Offcore outstanding cacheable data read transactions in SQ to uncore. Set Cmask=1 to count cycles. Cycles in which the L1D and L2 are locked, due to a UC lock or split lock. Use Edge to count transition
60H
02H
60H 60H
04H 08H
LOCK_CYCLES.CACHE_LOCK_DUR Cycles in which the L1D is locked. ATION IDQ.EMPTY IDQ.MITE_UOPS Counts cycles the IDQ is empty. Increment each cycle # of uops delivered to IDQ from MITE path. Set Cmask = 1 to count cycles. Increment each cycle. # of uops delivered to IDQ from DSB path. Set Cmask = 1 to count cycles. Increment each cycle # of uops delivered to IDQ when MS_busy by DSB. Set Cmask = 1 to count cycles. Add Edge=1 to count # of delivery. Can combine Umask 04H and 20H Can combine Umask 08H and 10H Can combine Umask 04H, 08H
79H
08H
IDQ.DSB_UOPS
79H
10H
IDQ.MS_DSB_UOPS
Event Num. 79H Umask Value 20H Event Mask Mnemonic IDQ.MS_MITE_UOPS Description Increment each cycle # of uops delivered to IDQ when MS_busy by MITE. Set Cmask = 1 to count cycles. Increment each cycle # of uops delivered to IDQ from MS by either DSB or MITE. Set Cmask = 1 to count cycles. Counts cycles DSB is delivered at least one uops. Set Cmask = 1. Counts cycles DSB is delivered four uops. Set Cmask = 4. Comment Can combine Umask 04H, 08H Can combine Umask 04H, 08H
79H
30H
IDQ.MS_UOPS
79H 79H 79H 79H 79H 80H 85H 85H 85H 85H 87H 87H 88H 88H 88H 88H 88H 88H 88H 88H
18H 18H 24H 24H 3CH 02H 01H 02H 04H 10H 01H 04H 01H 02H 04H 08H 10H 20H 40H 80H
IDQ.ALL_DSB_CYCLES_ANY_UOP S IDQ.ALL_DSB_CYCLES_4_UOPS
IDQ.ALL_MITE_CYCLES_ANY_UOP Counts cycles MITE is delivered at least one uops. S Set Cmask = 1. IDQ.ALL_MITE_CYCLES_4_UOPS IDQ.MITE_ALL_UOPS ICACHE.MISSES Counts cycles MITE is delivered four uops. Set Cmask = 4. # of uops delivered to IDQ from any path. Number of Instruction Cache, Streaming Buffer and Victim Cache Misses. Includes UC accesses.
ITLB_MISSES.MISS_CAUSES_A_W Misses in all ITLB levels that cause page walks ALK ITLB_MISSES.WALK_COMPLETED ITLB_MISSES.WALK_DURATION ITLB_MISSES.STLB_HIT ILD_STALL.LCP ILD_STALL.IQ_FULL BR_INST_EXEC.COND BR_INST_EXEC.DIRECT_JMP BR_INST_EXEC.INDIRECT_JMP_N ON_CALL_RET BR_INST_EXEC.RETURN_NEAR BR_INST_EXEC.DIRECT_NEAR_C ALL Misses in all ITLB levels that cause completed page walks Cycle PMH is busy with a walk. Number of cache load STLB hits. No page walk. Stalls caused by changing prefix length of the instruction. Stall cycles due to IQ is full. Qualify conditional near branch instructions executed, but not necessarily retired. Qualify all unconditional near branch instructions excluding calls and indirect branches. Qualify executed indirect near branch instructions that are not calls nor returns. Qualify indirect near branches that have a return mnemonic. Qualify unconditional near call branch instructions, excluding non call branch, executed. Must combine with umask 40H, 80H Must combine with umask 80H Must combine with umask 80H Must combine with umask 80H Must combine with umask 80H Must combine with umask 80H Applicable to umask 01H only
BR_INST_EXEC.INDIRECT_NEAR_ Qualify indirect near calls, including both register CALL and memory indirect, executed. BR_INST_EXEC.NONTAKEN BR_INST_EXEC.TAKEN Qualify non-taken near branches executed. Qualify taken near branches executed. Must combine with 01H,02H, 04H, 08H, 10H, 20H.
Event Num. 88H 89H 89H 89H 89H 89H 89H 89H 89H 9CH A1H A1H A1H A1H A1H A1H A1H A1H A1H A1H A2H Umask Value FFH 01H 04H 08H 10H 20H 40H 80H FFH 01H 01H 02H 04H 08H 0CH 10H 20H 30H 40H 80H 01H Event Mask Mnemonic BR_INST_EXEC.ALL_BRANCHES BR_MISP_EXEC.COND BR_MISP_EXEC.INDIRECT_JMP_N ON_CALL_RET BR_MISP_EXEC.RETURN_NEAR BR_MISP_EXEC.DIRECT_NEAR_C ALL Description Counts all near executed branches (not necessarily retired). Qualify conditional near branch instructions mispredicted. Qualify mispredicted indirect near branch instructions that are not calls nor returns. Qualify mispredicted indirect near branches that have a return mnemonic. Qualify mispredicted unconditional near call branch instructions, excluding non call branch, executed. Must combine with umask 40H, 80H Must combine with umask 80H Must combine with umask 80H Must combine with umask 80H Must combine with umask 80H Applicable to umask 01H only Comment
BR_MISP_EXEC.INDIRECT_NEAR_ Qualify mispredicted indirect near calls, including CALL both register and memory indirect, executed. BR_MISP_EXEC.NONTAKEN BR_MISP_EXEC.TAKEN BR_MISP_EXEC.ALL_BRANCHES Qualify mispredicted non-taken near branches executed. Qualify mispredicted taken near branches executed. Must combine with 01H,02H, 04H, 08H, 10H, 20H. Counts all near executed branches (not necessarily retired).
IDQ_UOPS_NOT_DELIVERED.COR Count number of non-delivered uops to RAT per E thread. UOPS_DISPATCHED_PORT.PORT_ Cycles which a Uop is dispatched on port 0. 0 UOPS_DISPATCHED_PORT.PORT_ Cycles which a Uop is dispatched on port 1. 1 UOPS_DISPATCHED_PORT.PORT_ Cycles which a load uop is dispatched on port 2. 2_LD UOPS_DISPATCHED_PORT.PORT_ Cycles which a store address uop is dispatched on 2_STA port 2. UOPS_DISPATCHED_PORT.PORT_ Cycles which a Uop is dispatched on port 2. 2 UOPS_DISPATCHED_PORT.PORT_ Cycles which a load uop is dispatched on port 3. 3_LD UOPS_DISPATCHED_PORT.PORT_ Cycles which a store address uop is dispatched on 3_STA port 3. UOPS_DISPATCHED_PORT.PORT_ Cycles which a Uop is dispatched on port 3. 3 UOPS_DISPATCHED_PORT.PORT_ Cycles which a Uop is dispatched on port 4. 4 UOPS_DISPATCHED_PORT.PORT_ Cycles which a Uop is dispatched on port 5. 5 RESOURCE_STALLS.ANY Cycles Allocation is stalled due to Resource Related reason.
Event Num. A2H A2H A2H A3H A3H A3H A3H ABH ABH ACH AEH B0H B0H B0H B0H B1H Umask Value 04H 08H 10H 01H 02H 08H 04H 01H 02H 08H 01H 01H 02H 04H 08H 01H Event Mask Mnemonic RESOURCE_STALLS.RS RESOURCE_STALLS.SB RESOURCE_STALLS.ROB Description Cycles stalled due to no eligible RS entry available. Cycles stalled due to no store buffers available (not including draining from sync). Cycles stalled due to re-order buffer full. Comment
CYCLE_ACTIVITY.CYCLES_L2_PEN Cycles with pending L2 miss loads. Set AnyThread DING to count per core. CYCLE_ACTIVITY.CYCLES_LDM_P ENDING Cycles with pending memory loads. Set AnyThread to count per core. PMC0-3 only. PMC2 only
CYCLE_ACTIVITY.CYCLES_L1D_PE Cycles with pending L1 cache miss loads. Set NDING AnyThread to count per core. CYCLE_ACTIVITY.CYCLES_NO_EX ECUTE DSB2MITE_SWITCHES.COUNT DSB2MITE_SWITCHES.PENALTY_ CYCLES DSB_FILL.EXCEED_DSB_LINES ITLB.ITLB_FLUSH Cycles of dispatch stalls. Set AnyThread to count per core. Number of DSB to MITE switches. Cycles DSB to MITE switches caused delay. DSB Fill encountered > 3 DSB lines. Counts the number of ITLB flushes, includes 4k/2M/ 4M pages.
OFFCORE_REQUESTS.DEMAND_DATA_RD Demand data read requests sent to uncore.
OFFCORE_REQUESTS.DEMAND_CODE_RD Demand code read requests sent to uncore.
OFFCORE_REQUESTS.DEMAND_RFO Demand RFO read requests sent to uncore, including regular RFOs, locks, ItoM.
OFFCORE_REQUESTS.ALL_DATA_RD Data read requests sent to uncore (demand and prefetch).
UOPS_EXECUTED.THREAD Counts total number of uops to be executed per-thread each cycle. Set Cmask = 1, INV = 1 to count stall cycles.
Counts total number of uops to be executed per-core each cycle. See Section 18.8.5, Off-core Response Performance Monitoring. See Section 18.8.5, Off-core Response Performance Monitoring. DTLB flush attempts of the thread-specific entries. Count number of STLB flush attempts. Number of instructions at retirement. See Table 19-1 Precise instruction retired event with HW to reduce effect of PEBS shadow in IP distribution. PMC1 only Do not need to set ANY Requires MSR 01A6H Requires MSR 01A7H
Event Num. C1H C1H C1H C2H Umask Value 08H 10H 20H 01H Event Mask Mnemonic OTHER_ASSISTS.AVX_STORE OTHER_ASSISTS.AVX_TO_SSE OTHER_ASSISTS.SSE_TO_AVX UOPS_RETIRED.ALL Description Number of assists associated with 256-bit AVX store operations. Number of transitions from AVX-256 to legacy SSE when penalty applicable. Number of transitions from SSE to AVX-256 when penalty applicable. Counts the number of micro-ops retired, Use Supports PEBS, use cmask=1 and invert to count active cycles or stalled Any=1 for core granular. cycles. Counts the number of retirement slots used each cycle. Comment
UOPS_RETIRED.RETIRE_SLOTS
MACHINE_CLEARS.MEMORY_ORD Counts the number of machine clears due to ERING memory order conflicts. MACHINE_CLEARS.SMC MACHINE_CLEARS.MASKMOV Number of self-modifying-code machine clears detected. Counts the number of executed AVX masked load operations that refer to an illegal address range with the mask bits set to 0. Branch instructions at retirement. See Table 19-1 Supports PEBS
C4H C4H C4H C4H C4H C4H C4H C4H C5H C5H C5H C5H C5H C5H CAH
00H 01H 02H 04H 08H 10H 20H 40H 00H 01H 02H 04H 10H 20H 02H
BR_INST_RETIRED.ALL_BRANCH ES
BR_INST_RETIRED.CONDITIONAL Counts the number of conditional branch instructions retired. BR_INST_RETIRED.NEAR_CALL BR_INST_RETIRED.ALL_BRANCH ES Direct and indirect near call instructions retired. Counts the number of branch instructions retired.
BR_INST_RETIRED.NEAR_RETUR Counts the number of near return instructions N retired. BR_INST_RETIRED.NOT_TAKEN BR_INST_RETIRED.NEAR_TAKEN Counts the number of not taken branch instructions retired. Number of near taken branches retired. See Table 19-1
BR_INST_RETIRED.FAR_BRANCH Number of far branches retired. BR_MISP_RETIRED.ALL_BRANCH Mispredicted branch instructions at retirement. ES BR_MISP_RETIRED.NEAR_CALL Direct and indirect mispredicted near call instructions retired.
BR_MISP_RETIRED.ALL_BRANCH Mispredicted macro branch instructions retired. ES BR_MISP_RETIRED.NOT_TAKEN BR_MISP_RETIRED.TAKEN FP_ASSIST.X87_OUTPUT Mispredicted not taken branch instructions retired. Mispredicted taken branch instructions retired. Number of X87 FP assists due to Output values.
Event Num. CAH CAH CAH CAH CCH CDH Umask Value 04H 08H 10H 1EH 20H 01H Event Mask Mnemonic FP_ASSIST.X87_INPUT FP_ASSIST.SIMD_OUTPUT FP_ASSIST.SIMD_INPUT FP_ASSIST.ANY ROB_MISC_EVENTS.LBR_INSERT S MEM_TRANS_RETIRED.LOAD_LA TENCY Description Number of X87 FP assists due to input values. Number of SIMD FP assists due to Output values. Number of SIMD FP assists due to input values. Cycles with any input/output SSE* or FP assists. Count cases of saving new LBR records by hardware. Randomly sampled loads whose latency is above a user defined threshold. A small fraction of the overall loads are sampled due to randomization. Specify threshold in MSR 0x3F6 See Section 18.8.4.3 Supports PEBS Comment
CDH D0H D0H D0H D0H D0H D0H D1H D1H D1H D1H D1H
02H 01H 02H 10H 20H 40H 80H 01H 02H 04H 20H 40H
MEM_TRANS_RETIRED.PRECISE_ Sample stores and collect precise store operation STORE via PEBS record. PMC3 only. MEM_UOPS_RETIRED.LOADS MEM_UOPS_RETIRED.STORES Qualify retired memory uops that are loads. Combine with umask 10H, 20H, 40H, 80H. Qualify retired memory uops that are stores. Combine with umask 10H, 20H, 40H, 80H.
MEM_UOPS_RETIRED.STLB_MISS Qualify retired memory uops with STLB miss. Must combine with umask 01H, 02H, to produce counts. MEM_UOPS_RETIRED.LOCK MEM_UOPS_RETIRED.SPLIT MEM_UOPS_RETIRED.ALL MEM_LOAD_UOPS_RETIRED.L1_ HIT MEM_LOAD_UOPS_RETIRED.L2_ HIT Qualify retired memory uops with lock. Must combine with umask 01H, 02H, to produce counts. Qualify retired memory uops with line split. Must combine with umask 01H, 02H, to produce counts. Qualify any retired memory uops. Must combine with umask 01H, 02H, to produce counts. Retired load uops with L1 cache hits as data sources. Retired load uops with L2 cache hits as data sources. Supports PEBS
MEM_LOAD_UOPS_RETIRED.LLC_HIT Retired load uops whose data source was LLC hit with no snoop required.
MEM_LOAD_UOPS_RETIRED.LLC_MISS Retired load uops whose data source is LLC miss.
MEM_LOAD_UOPS_RETIRED.HIT_LFB Retired load uops which data sources were load uops missed L1 but hit FB due to preceding miss to the same cache line with data not ready.
MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_MISS Retired load uops whose data source was an on-package core cache LLC hit and cross-core snoop missed.
MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT Retired load uops whose data source was an on-package LLC hit and cross-core snoop hits.
MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HITM Retired load uops whose data source was an on-package core cache with HitM responses.
Supports PEBS
D2H
01H
D2H D2H
02H 04H
Supports PEBS
| Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment |
| D2H | 08H | MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_NONE | Retired load uops whose data source was LLC hit with no snoop required. | |
| D3H | 01H | MEM_LOAD_UOPS_LLC_MISS_RETIRED.LOCAL_DRAM | Retired load uops whose data source was local memory (cross-socket snoop not needed or missed). | Supports PEBS. |
| E6H | 1FH | BACLEARS.ANY | Number of front end re-steers due to BPU misprediction. | |
| F0H | 01H | L2_TRANS.DEMAND_DATA_RD | Demand Data Read requests that access L2 cache. | |
| F0H | 02H | L2_TRANS.RFO | RFO requests that access L2 cache. | |
| F0H | 04H | L2_TRANS.CODE_RD | L2 cache accesses when fetching instructions. | |
| F0H | 08H | L2_TRANS.ALL_PF | Any MLC or LLC HW prefetch accessing L2, including rejects. | |
| F0H | 10H | L2_TRANS.L1D_WB | L1D writebacks that access L2 cache. | |
| F0H | 20H | L2_TRANS.L2_FILL | L2 fill requests that access L2 cache. | |
| F0H | 40H | L2_TRANS.L2_WB | L2 writebacks that access L2 cache. | |
| F0H | 80H | L2_TRANS.ALL_REQUESTS | Transactions accessing L2 pipe. | |
| F1H | 01H | L2_LINES_IN.I | L2 cache lines in I state filling L2. | Counting does not cover rejects. |
| F1H | 02H | L2_LINES_IN.S | L2 cache lines in S state filling L2. | Counting does not cover rejects. |
| F1H | 04H | L2_LINES_IN.E | L2 cache lines in E state filling L2. | Counting does not cover rejects. |
| F1H | 07H | L2_LINES_IN.ALL | L2 cache lines filling L2. | Counting does not cover rejects. |
| F2H | 01H | L2_LINES_OUT.DEMAND_CLEAN | Clean L2 cache lines evicted by demand. | |
| F2H | 02H | L2_LINES_OUT.DEMAND_DIRTY | Dirty L2 cache lines evicted by demand. | |
| F2H | 04H | L2_LINES_OUT.PF_CLEAN | Clean L2 cache lines evicted by the MLC prefetcher. | |
| F2H | 08H | L2_LINES_OUT.PF_DIRTY | Dirty L2 cache lines evicted by the MLC prefetcher. | |
| F2H | 0AH | L2_LINES_OUT.DIRTY_ALL | Dirty L2 cache lines filling the L2. | Counting does not cover rejects. |
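Several rows of Table 19-5 ask for Cmask, Inv, Edge or Any to be set, or for umask values to be combined. The following minimal Python sketch shows how such settings could be packed into one event-select value. It assumes the architectural IA32_PERFEVTSELx layout (event select in bits 7:0, umask in bits 15:8, USR bit 16, OS bit 17, Edge bit 18, AnyThread bit 21, EN bit 22, INV bit 23, Cmask in bits 31:24). The function name perfevtsel and its parameter names are illustrative and do not come from this manual.

```python
# Bit positions assumed from the architectural IA32_PERFEVTSELx layout.
UMASK_SHIFT = 8        # bits 15:8  unit mask
USR_BIT = 1 << 16      # count while in user mode
OS_BIT = 1 << 17       # count while in ring 0
EDGE_BIT = 1 << 18     # edge detect
ANY_BIT = 1 << 21      # AnyThread
EN_BIT = 1 << 22       # enable counter
INV_BIT = 1 << 23      # invert the Cmask comparison
CMASK_SHIFT = 24       # bits 31:24 counter mask


def perfevtsel(event, umask, cmask=0, inv=False, edge=False,
               any_thread=False, user=True, kernel=True, enable=True):
    """Pack one event into a 32-bit event-select value (illustrative helper)."""
    for field in (event, umask, cmask):
        if not 0 <= field <= 0xFF:
            raise ValueError("event, umask and cmask must each fit in 8 bits")
    value = event | (umask << UMASK_SHIFT) | (cmask << CMASK_SHIFT)
    flags = ((user, USR_BIT), (kernel, OS_BIT), (edge, EDGE_BIT),
             (any_thread, ANY_BIT), (enable, EN_BIT), (inv, INV_BIT))
    for is_set, bit in flags:
        if is_set:
            value |= bit
    return value


# UOPS_ISSUED.ANY (0EH/01H): Cmask = 1, Inv = 1 counts stalled cycles.
stall_cycles = perfevtsel(0x0E, 0x01, cmask=1, inv=True)

# ARITH.FPU_DIV_ACTIVE (14H/01H): edge = 1, cmask = 1 counts divides.
divides = perfevtsel(0x14, 0x01, cmask=1, edge=True)

# BR_INST_EXEC (88H): COND (01H) must be combined with TAKEN (80H).
taken_cond = perfevtsel(0x88, 0x01 | 0x80)

# MEM_UOPS_RETIRED (D0H): STLB_MISS (10H) must be combined with LOADS (01H).
stlb_miss_loads = perfevtsel(0xD0, 0x01 | 0x10)

print(f"{stall_cycles:#010x} {divides:#010x} "
      f"{taken_cond:#010x} {stlb_miss_loads:#010x}")
```

The qualifier umasks are single bits, so a bitwise OR is enough to build the combined values the Comment column calls for. Writing the result to a model-specific register is outside the scope of this sketch.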
Non-architectural performance monitoring events in the processor core that are applicable only to the next generation Intel Xeon processor family based on Intel microarchitecture Ivy Bridge, with a CPUID signature of DisplayFamily_DisplayModel 06_3EH, are listed in Table 19-6.
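A DisplayFamily_DisplayModel signature such as 06_3EH is derived from the value that CPUID leaf 01H returns in EAX. The following minimal Python sketch shows that derivation. The function names and the sample EAX values are illustrative assumptions; the code decodes a value supplied by the caller and does not execute CPUID itself.

```python
def display_family_model(eax):
    """Decode a CPUID.01H:EAX value into (DisplayFamily, DisplayModel)."""
    model = (eax >> 4) & 0xF          # bits 7:4
    family = (eax >> 8) & 0xF         # bits 11:8
    ext_model = (eax >> 16) & 0xF     # bits 19:16
    ext_family = (eax >> 20) & 0xFF   # bits 27:20

    # The extended family is added only when the base family is 0FH.
    display_family = family + ext_family if family == 0xF else family

    # The extended model is prepended only for families 06H and 0FH.
    if family in (0x6, 0xF):
        display_model = (ext_model << 4) | model
    else:
        display_model = model
    return display_family, display_model


def signature(eax):
    """Format the decoded pair in the DisplayFamily_DisplayModel notation."""
    family, model = display_family_model(eax)
    return f"{family:02X}_{model:02X}H"


# Sample EAX values chosen for illustration only.
print(signature(0x000306E4))  # 06_3EH
print(signature(0x000306A9))  # 06_3AH
```

Software that selects an event table by processor can compare the resulting string or tuple against the signature named for that table.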
Table 19-6. Non-Architectural Performance Events Applicable only to the Processor Core of Next Generation Intel Xeon Processor E5 Family
| Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment |
| D3H | 01H | MEM_LOAD_UOPS_LLC_MISS_RETIRED.LOCAL_DRAM | Retired load uops whose data source was local DRAM (cross-socket snoop not needed or missed). | Supports PEBS |
| D3H | 04H | MEM_LOAD_UOPS_LLC_MISS_RETIRED.REMOTE_DRAM | Retired load uops whose data source was remote DRAM. | Supports PEBS |
| D3H | 10H | MEM_LOAD_UOPS_LLC_MISS_RETIRED.REMOTE_HITM | Retired load uops whose data source was remote HITM. | Supports PEBS |
| D3H | 20H | MEM_LOAD_UOPS_LLC_MISS_RETIRED.REMOTE_FWD | Retired load uops whose data source was forwards from a remote cache. | Supports PEBS |
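One possible use of the four Table 19-6 events is to estimate how LLC-miss loads are split between local and remote data sources. The following minimal Python sketch computes each source's share of the total. The ratio is an illustrative derived metric, not one defined by this manual, and the function name and sample counts are hypothetical.

```python
# Umask values for event D3H, as listed in Table 19-6.
LLC_MISS_UMASKS = {
    "LOCAL_DRAM": 0x01,
    "REMOTE_DRAM": 0x04,
    "REMOTE_HITM": 0x10,
    "REMOTE_FWD": 0x20,
}


def llc_miss_source_shares(counts):
    """Return each data source's fraction of all counted LLC-miss loads."""
    unknown = set(counts) - set(LLC_MISS_UMASKS)
    if unknown:
        raise ValueError(f"not a Table 19-6 sub-event: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: value / total for name, value in counts.items()}


# Hypothetical counter readings, for illustration only.
sample_counts = {
    "LOCAL_DRAM": 600,
    "REMOTE_DRAM": 300,
    "REMOTE_HITM": 60,
    "REMOTE_FWD": 40,
}
shares = llc_miss_source_shares(sample_counts)
for name, share in shares.items():
    print(f"{name:12s} umask {LLC_MISS_UMASKS[name]:02X}H  {share:.1%}")
```

The shares are only as meaningful as the counts behind them, so all four sub-events should be collected over the same interval.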
...
Table 19-7. Non-Architectural Performance Events In the Processor Core Common to 2nd Generation Intel Core i7-2xxx, Intel Core i5-2xxx, Intel Core i3-2xxx Processor Series and Intel Xeon Processors E5 Family
| Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment |
| 03H | 01H | LD_BLOCKS.DATA_UNKNOWN | Blocked loads due to store buffer blocks with unknown data. | |
| 03H | 02H | LD_BLOCKS.STORE_FORWARD | Loads blocked by overlapping with store buffer that cannot be forwarded. | |
| 03H | 08H | LD_BLOCKS.NO_SR | # of Split loads blocked due to resource not available. | |
| 03H | 10H | LD_BLOCKS.ALL_BLOCK | Number of cases where any load is blocked but has no DCU miss. | |
| 05H | 01H | MISALIGN_MEM_REF.LOADS | Speculative cache-line split load uops dispatched to L1D. | |
| 05H | 02H | MISALIGN_MEM_REF.STORES | Speculative cache-line split Store-address uops dispatched to L1D. | |
| 07H | 01H | LD_BLOCKS_PARTIAL.ADDRESS_ALIAS | False dependencies in MOB due to partial compare on address. | |
| 07H | 08H | LD_BLOCKS_PARTIAL.ALL_STA_BLOCK | The number of times that load operations are temporarily blocked because of older stores, with addresses that are not yet known. A load operation may incur more than one block of this type. | |
| | | DTLB_LOAD_MISSES.MISS_CAUSES_A_WALK | Misses in all TLB levels that cause a page walk of any page size. | |
| | | DTLB_LOAD_MISSES.WALK_COMPLETED | Misses in all TLB levels that caused page walk completed of any size. | |
| | | DTLB_LOAD_MISSES.WALK_DURATION | Cycle PMH is busy with a walk. | |
| | | DTLB_LOAD_MISSES.STLB_HIT | Number of cache load STLB hits. No page walk. | |
| Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment |
| 0DH | 03H | INT_MISC.RECOVERY_CYCLES | Cycles waiting to recover after Machine Clears or JEClear. Set Cmask = 1. | Set Edge to count occurrences |
| 0DH | 40H | INT_MISC.RAT_STALL_CYCLES | Cycles RAT external stall is sent to IDQ for this thread. | |
| 0EH | 01H | UOPS_ISSUED.ANY | Increments each cycle the # of Uops issued by the RAT to RS. Set Cmask = 1, Inv = 1, Any = 1 to count stalled cycles of this core. | Set Cmask = 1, Inv = 1 to count stalled cycles |
| 10H | 01H | FP_COMP_OPS_EXE.X87 | Counts number of X87 uops executed. | |
| 10H | 10H | FP_COMP_OPS_EXE.SSE_FP_PACKED_DOUBLE | Counts number of SSE* double precision FP packed uops executed. | |
| 10H | 20H | FP_COMP_OPS_EXE.SSE_FP_SCALAR_SINGLE | Counts number of SSE* single precision FP scalar uops executed. | |
| 10H | 40H | FP_COMP_OPS_EXE.SSE_PACKED SINGLE | Counts number of SSE* single precision FP packed uops executed. | |
| 10H | 80H | FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE | Counts number of SSE* double precision FP scalar uops executed. | |
| 11H | 01H | SIMD_FP_256.PACKED_SINGLE | Counts 256-bit packed single-precision floating-point instructions. | |
| 11H | 02H | SIMD_FP_256.PACKED_DOUBLE | Counts 256-bit packed double-precision floating-point instructions. | |
| 14H | 01H | ARITH.FPU_DIV_ACTIVE | Cycles that the divider is active, includes INT and FP. Set 'edge =1, cmask=1' to count the number of divides. | |
| 17H | 01H | INSTS_WRITTEN_TO_IQ.INSTS | Counts the number of instructions written into the IQ every cycle. | |
| 24H | 01H | L2_RQSTS.DEMAND_DATA_RD_HIT | Demand Data Read requests that hit L2 cache. | |
| 24H | 03H | L2_RQSTS.ALL_DEMAND_DATA_RD | Counts any demand and L1 HW prefetch data load requests to L2. | |
| 24H | 04H | L2_RQSTS.RFO_HITS | Counts the number of store RFO requests that hit the L2 cache. | |
| 24H | 08H | L2_RQSTS.RFO_MISS | Counts the number of store RFO requests that miss the L2 cache. | |
| 24H | 0CH | L2_RQSTS.ALL_RFO | Counts all L2 store RFO requests. | |
| 24H | 10H | L2_RQSTS.CODE_RD_HIT | Number of instruction fetches that hit the L2 cache. | |
| 24H | 20H | L2_RQSTS.CODE_RD_MISS | Number of instruction fetches that missed the L2 cache. | |
| 24H | 30H | L2_RQSTS.ALL_CODE_RD | Counts all L2 code requests. | |
| 24H | 40H | L2_RQSTS.PF_HIT | Requests from L2 Hardware prefetcher that hit L2. | |
| 24H | 80H | L2_RQSTS.PF_MISS | Requests from L2 Hardware prefetcher that missed L2. | |
| Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment |
| 24H | C0H | L2_RQSTS.ALL_PF | Any requests from L2 Hardware prefetchers. | |
| 27H | 01H | L2_STORE_LOCK_RQSTS.MISS | RFOs that miss cache lines. | |
| 27H | 04H | L2_STORE_LOCK_RQSTS.HIT_E | RFOs that hit cache lines in E state. | |
| 27H | 08H | L2_STORE_LOCK_RQSTS.HIT_M | RFOs that hit cache lines in M state. | |
| 27H | 0FH | L2_STORE_LOCK_RQSTS.ALL | RFOs that access cache lines in any state. | |
| 28H | 01H | L2_L1D_WB_RQSTS.MISS | Not rejected writebacks from L1D to L2 cache lines that missed L2. | |
| 28H | 02H | L2_L1D_WB_RQSTS.HIT_S | Not rejected writebacks from L1D to L2 cache lines in S state. | |
| 28H | 04H | L2_L1D_WB_RQSTS.HIT_E | Not rejected writebacks from L1D to L2 cache lines in E state. | |
| 28H | 08H | L2_L1D_WB_RQSTS.HIT_M | Not rejected writebacks from L1D to L2 cache lines in M state. | |
| 28H | 0FH | L2_L1D_WB_RQSTS.ALL | Not rejected writebacks from L1D to L2 cache. | |
| 2EH | 4FH | LONGEST_LAT_CACHE.REFERENCE | This event counts requests originating from the core that reference a cache line in the last level cache. | see Table 19-1 |
| 2EH | 41H | LONGEST_LAT_CACHE.MISS | This event counts each cache miss condition for references to the last level cache. | |
| 3CH | 00H | CPU_CLK_UNHALTED.THREAD_P | Counts the number of thread cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. The core frequency may change from time to time due to power or thermal throttling. | |
| 3CH | 01H | CPU_CLK_THREAD_UNHALTED.REF_XCLK | Increments at the frequency of XCLK (100 MHz) when not halted. | |
| 48H | 01H | L1D_PEND_MISS.PENDING | Increments the number of outstanding L1D misses every cycle. Set Cmask = 1 and Edge = 1 to count occurrences. | |
| | | DTLB_STORE_MISSES.MISS_CAUSES_A_WALK | Miss in all TLB levels causes a page walk of any page size (4K/2M/4M/1G). | |
| | | DTLB_STORE_MISSES.WALK_COMPLETED | Miss in all TLB levels causes a page walk that completes of any page size (4K/2M/4M/1G). | |
| | | DTLB_STORE_MISSES.WALK_DURATION | Cycles PMH is busy with this walk. | |
| | | DTLB_STORE_MISSES.STLB_HIT | Store operations that miss the first TLB level but hit the second and do not cause page walks. | |
| | | LOAD_HIT_PRE.SW_PF | Not SW-prefetch load dispatches that hit fill buffer allocated for S/W prefetch. | |
Event Num. 4CH 4EH Umask Value 02H 02H Event Mask Mnemonic LOAD_HIT_PRE.HW_PF HW_PRE_REQ.DL1_MISS Description Not SW-prefetch load dispatches that hit fill buffer allocated for H/W prefetch. Hardware Prefetch requests that miss the L1D cache. A request is being counted each time it access the cache & miss it, including if a block is applicable or if hit the Fill Buffer for example. Counts the number of lines brought into the L1 data cache. Counts the number of allocations of modified L1D cache lines. Counts the number of modified lines evicted from the L1 data cache due to replacement. Cache lines in M state evicted out of L1D due to Snoop HitM or dirty line replacement. This accounts for both L1 streamer and IP-based (IPP) HW prefetchers. Comment
PARTIAL_RAT_STALLS.FLAGS_ Increments the number of flags-merge uops in flight MERGE_UOP each cycle. Set Cmask = 1 to count cycles. PARTIAL_RAT_STALLS.SLOW_ LEA_WINDOW Cycles with at least one slow LEA uop allocated.
59H 59H 5BH 5BH 5BH 5BH 5CH 5CH 5EH 60H
40H 80H 0CH 0FH 40H 4FH 01H 02H 01H 01H
PARTIAL_RAT_STALLS.MUL_SI Number of Multiply packed/scalar single precision NGLE_UOP uops allocated. RESOURCE_STALLS2.ALL_FL_ EMPTY Cycles stalled due to free list empty. PMC0-3 only regardless HTT
RESOURCE_STALLS2.ALL_PRF Cycles stalled due to control structures full for _CONTROL physical registers. RESOURCE_STALLS2.BOB_FUL Cycles Allocator is stalled due Branch Order Buffer. L RESOURCE_STALLS2.OOO_RS RC CPL_CYCLES.RING0 CPL_CYCLES.RING123 RS_EVENTS.EMPTY_CYCLES Cycles stalled due to out of order resources full. Unhalted core cycles when the thread is in ring 0. Unhalted core cycles when the thread is not in ring 0. Cycles the RS is empty for the thread. Use Edge to count transition
OFFCORE_REQUESTS_OUTSTA Offcore outstanding Demand Data Read NDING.DEMAND_DATA_RD transactions in SQ to uncore. Set Cmask=1 to count cycles. OFFCORE_REQUESTS_OUTSTA Offcore outstanding RFO store transactions in SQ to NDING.DEMAND_RFO uncore. Set Cmask=1 to count cycles. OFFCORE_REQUESTS_OUTSTA Offcore outstanding cacheable data read NDING.ALL_DATA_RD transactions in SQ to uncore. Set Cmask=1 to count cycles.
60H 60H
04H 08H
Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment
63H | 01H | LOCK_CYCLES.SPLIT_LOCK_UC_LOCK_DURATION | Cycles in which the L1D and L2 are locked, due to a UC lock or split lock. |
63H | 02H | LOCK_CYCLES.CACHE_LOCK_DURATION | Cycles in which the L1D is locked. |
79H | 02H | IDQ.EMPTY | Counts cycles the IDQ is empty. |
79H | 04H | IDQ.MITE_UOPS | Increment each cycle. # of uops delivered to IDQ from MITE path. Set Cmask = 1 to count cycles. | Can combine Umask 04H and 20H
79H | 08H | IDQ.DSB_UOPS | Increment each cycle. # of uops delivered to IDQ from DSB path. Set Cmask = 1 to count cycles. | Can combine Umask 08H and 10H
79H | 10H | IDQ.MS_DSB_UOPS | Increment each cycle. # of uops delivered to IDQ when MS busy by DSB. Set Cmask = 1 to count cycles MS is busy. Set Cmask=1 and Edge=1 to count MS activations. | Can combine Umask 08H and 10H
79H | 20H | IDQ.MS_MITE_UOPS | Increment each cycle. # of uops delivered to IDQ when MS is busy by MITE. Set Cmask = 1 to count cycles. | Can combine Umask 04H and 20H
79H | 30H | IDQ.MS_UOPS | Increment each cycle. # of uops delivered to IDQ from MS by either DSB or MITE. Set Cmask = 1 to count cycles. | Can combine Umask 04H, 08H and 30H
80H | 02H | ICACHE.MISSES | Number of Instruction Cache, Streaming Buffer and Victim Cache Misses. Includes UC accesses. |
85H | 01H | ITLB_MISSES.MISS_CAUSES_A_WALK | Misses in all ITLB levels that cause page walks. |
85H | 02H | ITLB_MISSES.WALK_COMPLETED | Misses in all ITLB levels that cause completed page walks. |
85H | 04H | ITLB_MISSES.WALK_DURATION | Cycle PMH is busy with a walk. |
85H | 10H | ITLB_MISSES.STLB_HIT | Number of cache load STLB hits. No page walk. |
87H | 01H | ILD_STALL.LCP | Stalls caused by changing prefix length of the instruction. |
87H | 04H | ILD_STALL.IQ_FULL | Stall cycles due to IQ is full. |
88H | 01H | BR_INST_EXEC.COND | Qualify conditional near branch instructions executed, but not necessarily retired. | Must combine with umask 40H, 80H
88H | 02H | BR_INST_EXEC.DIRECT_JMP | Qualify all unconditional near branch instructions excluding calls and indirect branches. | Must combine with umask 80H
88H | 04H | BR_INST_EXEC.INDIRECT_JMP_NON_CALL_RET | Qualify executed indirect near branch instructions that are not calls nor returns. | Must combine with umask 80H
88H | 08H | BR_INST_EXEC.RETURN_NEAR | Qualify indirect near branches that have a return mnemonic. | Must combine with umask 80H
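The IDQ descriptions above rely on the Cmask and Edge qualifiers ("Set Cmask = 1 to count cycles", "Set Cmask=1 and Edge=1 to count MS activations"). The following is a minimal Python sketch of how such a selection might be packed into an IA32_PERFEVTSELx value. It assumes the architectural IA32_PERFEVTSELx field layout (event select in bits 7:0, unit mask in bits 15:8, USR bit 16, OS bit 17, Edge bit 18, EN bit 22, INV bit 23, Cmask in bits 31:24). The helper name `perfevtsel` and its parameters are hypothetical and are not part of this manual.

```python
def perfevtsel(event, umask, cmask=0, edge=False, inv=False,
               usr=True, kernel=True, enable=True):
    """Pack fields into an IA32_PERFEVTSELx-style value (illustrative helper)."""
    for name, field in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")
    value = event | (umask << 8) | (cmask << 24)
    value |= int(usr) << 16      # count while CPL > 0
    value |= int(kernel) << 17   # count while CPL = 0
    value |= int(edge) << 18     # edge detect
    value |= int(enable) << 22   # enable the counter
    value |= int(inv) << 23      # invert the Cmask comparison
    return value


# IDQ.MS_DSB_UOPS is listed above as event 79H, umask 10H.
ms_dsb_uops = perfevtsel(0x79, 0x10)                              # uops delivered
ms_busy_cycles = perfevtsel(0x79, 0x10, cmask=1)                  # cycles MS is busy
ms_activations = perfevtsel(0x79, 0x10, cmask=1, edge=True)       # MS activations

if __name__ == "__main__":
    for label, value in (("uops", ms_dsb_uops),
                         ("cycles", ms_busy_cycles),
                         ("activations", ms_activations)):
        print(f"{label}: {value:#x}")
```

Writing the resulting value to a model-specific register requires operating-system support that is outside the scope of this sketch.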
Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment
88H | 10H | BR_INST_EXEC.DIRECT_NEAR_CALL | Qualify unconditional near call branch instructions, excluding non call branch, executed. | Must combine with umask 80H
88H | 20H | BR_INST_EXEC.INDIRECT_NEAR_CALL | Qualify indirect near calls, including both register and memory indirect, executed. | Must combine with umask 80H
88H | 40H | BR_INST_EXEC.NONTAKEN | Qualify non-taken near branches executed. | Applicable to umask 01H only
88H | 80H | BR_INST_EXEC.TAKEN | Qualify taken near branches executed. | Must combine with 01H, 02H, 04H, 08H, 10H, 20H.
88H | FFH | BR_INST_EXEC.ALL_BRANCHES | Counts all near executed branches (not necessarily retired). |
89H | 01H | BR_MISP_EXEC.COND | Qualify conditional near branch instructions mispredicted. | Must combine with umask 40H, 80H
89H | 04H | BR_MISP_EXEC.INDIRECT_JMP_NON_CALL_RET | Qualify mispredicted indirect near branch instructions that are not calls nor returns. | Must combine with umask 80H
89H | 08H | BR_MISP_EXEC.RETURN_NEAR | Qualify mispredicted indirect near branches that have a return mnemonic. | Must combine with umask 80H
89H | 10H | BR_MISP_EXEC.DIRECT_NEAR_CALL | Qualify mispredicted unconditional near call branch instructions, excluding non call branch, executed. | Must combine with umask 80H
89H | 20H | BR_MISP_EXEC.INDIRECT_NEAR_CALL | Qualify mispredicted indirect near calls, including both register and memory indirect, executed. | Must combine with umask 80H
89H | 40H | BR_MISP_EXEC.NONTAKEN | Qualify mispredicted non-taken near branches executed. | Applicable to umask 01H only
89H | 80H | BR_MISP_EXEC.TAKEN | Qualify mispredicted taken near branches executed. | Must combine with 01H, 02H, 04H, 08H, 10H, 20H
89H | FFH | BR_MISP_EXEC.ALL_BRANCHES | Counts all near executed branches (not necessarily retired). |
9CH | 01H | IDQ_UOPS_NOT_DELIVERED.CORE | Count number of non-delivered uops to RAT per thread. | Use Cmask to qualify uop b/w
A1H | 01H | UOPS_DISPATCHED_PORT.PORT_0 | Cycles which a Uop is dispatched on port 0. |
A1H | 02H | UOPS_DISPATCHED_PORT.PORT_1 | Cycles which a Uop is dispatched on port 1. |
A1H | 04H | UOPS_DISPATCHED_PORT.PORT_2_LD | Cycles which a load uop is dispatched on port 2. |
A1H | 08H | UOPS_DISPATCHED_PORT.PORT_2_STA | Cycles which a store address uop is dispatched on port 2. |
A1H | 0CH | UOPS_DISPATCHED_PORT.PORT_2 | Cycles which a Uop is dispatched on port 2. |
A1H | 10H | UOPS_DISPATCHED_PORT.PORT_3_LD | Cycles which a load uop is dispatched on port 3. |
A1H | 20H | UOPS_DISPATCHED_PORT.PORT_3_STA | Cycles which a store address uop is dispatched on port 3. |
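The Comment entries for BR_INST_EXEC state that the branch-type qualifiers (01H through 20H) must be combined with the TAKEN qualifier (80H), and that NONTAKEN (40H) applies to umask 01H only. The sketch below illustrates one way to build and check such a umask, assuming that "combine" means a bitwise OR of the umask bits (consistent with ALL_BRANCHES being listed as FFH). The function name and the validation rules as coded are an illustrative reading of the Comment column, not a definitive statement of hardware behavior.

```python
# Branch-type qualifier bits as listed for event 88H (BR_INST_EXEC).
TYPE_BITS = {
    "COND": 0x01,
    "DIRECT_JMP": 0x02,
    "INDIRECT_JMP_NON_CALL_RET": 0x04,
    "RETURN_NEAR": 0x08,
    "DIRECT_NEAR_CALL": 0x10,
    "INDIRECT_NEAR_CALL": 0x20,
}
NONTAKEN = 0x40
TAKEN = 0x80


def br_inst_exec_umask(types, taken=True, nontaken=False):
    """Return a combined umask, rejecting combinations the comments rule out."""
    types = list(types)
    if not types:
        raise ValueError("at least one branch-type qualifier is required")
    if not (taken or nontaken):
        raise ValueError("a direction qualifier (TAKEN or NONTAKEN) is required")
    if nontaken and set(types) != {"COND"}:
        raise ValueError("NONTAKEN (40H) is applicable to umask 01H only")
    umask = 0
    for name in types:
        umask |= TYPE_BITS[name]
    if taken:
        umask |= TAKEN
    if nontaken:
        umask |= NONTAKEN
    return umask


if __name__ == "__main__":
    print(f"{br_inst_exec_umask(['COND']):02X}H")
    print(f"{br_inst_exec_umask(['COND'], taken=True, nontaken=True):02X}H")
    print(f"{br_inst_exec_umask(['RETURN_NEAR', 'DIRECT_NEAR_CALL']):02X}H")
```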
Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment
A1H | 30H | UOPS_DISPATCHED_PORT.PORT_3 | Cycles which a Uop is dispatched on port 3. |
A1H | 40H | UOPS_DISPATCHED_PORT.PORT_4 | Cycles which a Uop is dispatched on port 4. |
A1H | 80H | UOPS_DISPATCHED_PORT.PORT_5 | Cycles which a Uop is dispatched on port 5. |
A2H | 01H | RESOURCE_STALLS.ANY | Cycles Allocation is stalled due to Resource Related reason. |
A2H | 02H | RESOURCE_STALLS.LB | Counts the cycles of stall due to lack of load buffers. |
A2H | 04H | RESOURCE_STALLS.RS | Cycles stalled due to no eligible RS entry available. |
A2H | 08H | RESOURCE_STALLS.SB | Cycles stalled due to no store buffers available (not including draining from sync). |
A2H | 10H | RESOURCE_STALLS.ROB | Cycles stalled due to re-order buffer full. |
A2H | 20H | RESOURCE_STALLS.FCSW | Cycles stalled due to writing the FPU control word. |
A2H | 40H | RESOURCE_STALLS.MXCSR | Cycles stalled due to the MXCSR register rename occurring too close to a previous MXCSR rename. |
A2H | 80H | RESOURCE_STALLS.OTHER | Cycles stalled while execution was stalled due to other resource issues. |
A3H | 02H | CYCLE_ACTIVITY.CYCLES_L1D_PENDING | Cycles with pending L1 cache miss loads. Set AnyThread to count per core. | PMC2 only
A3H | 01H | CYCLE_ACTIVITY.CYCLES_L2_PENDING | Cycles with pending L2 miss loads. Set AnyThread to count per core. |
A3H | 04H | CYCLE_ACTIVITY.CYCLES_NO_DISPATCH | Cycles of dispatch stalls. Set AnyThread to count per core. | PMC0-3 only
ABH | 01H | DSB2MITE_SWITCHES.COUNT | Number of DSB to MITE switches. |
ABH | 02H | DSB2MITE_SWITCHES.PENALTY_CYCLES | Cycles DSB to MITE switches caused delay. |
ACH | 02H | DSB_FILL.OTHER_CANCEL | Cases of cancelling valid DSB fill not because of exceeding way limit. |
ACH | 08H | DSB_FILL.EXCEED_DSB_LINES | DSB Fill encountered > 3 DSB lines. |
ACH | 0AH | DSB_FILL.ALL_CANCEL | Cases of cancelling valid Decode Stream Buffer (DSB) fill not because of exceeding way limit. |
AEH | 01H | ITLB.ITLB_FLUSH | Counts the number of ITLB flushes, includes 4k/2M/4M pages. |
B0H | 01H | OFFCORE_REQUESTS.DEMAND_DATA_RD | Demand data read requests sent to uncore. |
B0H | 04H | OFFCORE_REQUESTS.DEMAND_RFO | Demand RFO read requests sent to uncore, including regular RFOs, locks, ItoM. |
B0H | 08H | OFFCORE_REQUESTS.ALL_DATA_RD | Data read requests sent to uncore (demand and prefetch). |
Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment
B1H | 01H | UOPS_DISPATCHED.THREAD | Counts total number of uops to be dispatched per-thread each cycle. Set Cmask = 1, INV = 1 to count stall cycles. | PMC0-3 only regardless HTT
 |  | UOPS_DISPATCHED.CORE | Counts total number of uops to be dispatched per-core each cycle. | Do not need to set ANY
 |  | OFFCORE_REQUESTS_BUFFER.SQ_FULL | Offcore requests buffer cannot take more entries for this thread core. |
 |  | AGU_BYPASS_CANCEL.COUNT | Counts executed load operations with all the following traits: 1. addressing of the format [base + offset], 2. the offset is between 1 and 2047, 3. the address specified in the base register is in one page and the address [base+offset] is in another page. |
B7H | 01H |  | See Section 18.8.5, Off-core Response Performance Monitoring. | Requires MSR 01A6H
BBH | 01H |  | See Section 18.8.5, Off-core Response Performance Monitoring. | Requires MSR 01A7H
BDH | 01H |  | DTLB flush attempts of the thread-specific entries. |
BDH | 20H |  | Count number of STLB flush attempts. |
BFH | 05H | L1D_BLOCKS.BANK_CONFLICT_CYCLES | Cycles when dispatched loads are cancelled due to L1D bank conflicts with other load ports. | cmask=1
C0H | 00H | INST_RETIRED.ANY_P | Number of instructions at retirement. | See Table 19-1
C0H | 01H | INST_RETIRED.ALL | Precise instruction retired event with HW to reduce effect of PEBS shadow in IP distribution. | PMC1 only; Must quiesce other PMCs.
C1H | 02H | OTHER_ASSISTS.ITLB_MISS_RETIRED | Instructions that experienced an ITLB miss. |
C1H | 08H | OTHER_ASSISTS.AVX_STORE | Number of assists associated with 256-bit AVX store operations. |
C1H | 10H | OTHER_ASSISTS.AVX_TO_SSE | Number of transitions from AVX-256 to legacy SSE when penalty applicable. |
C1H | 20H | OTHER_ASSISTS.SSE_TO_AVX | Number of transitions from SSE to AVX-256 when penalty applicable. |
C2H | 01H | UOPS_RETIRED.ALL | Counts the number of micro-ops retired. Use cmask=1 and invert to count active cycles or stalled cycles. | Supports PEBS
 |  | UOPS_RETIRED.RETIRE_SLOTS | Counts the number of retirement slots used each cycle. |
 |  | MACHINE_CLEARS.MEMORY_ORDERING | Counts the number of machine clears due to memory order conflicts. |
 |  | MACHINE_CLEARS.SMC | Counts the number of times that a program writes to a code section. |
 |  | MACHINE_CLEARS.MASKMOV | Counts the number of executed AVX masked load operations that refer to an illegal address range with the mask bits set to 0. |
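The AGU_BYPASS_CANCEL.COUNT description above lists three traits that a load must have to be counted. As a rough illustration, the address conditions can be restated as a predicate. The sketch assumes 4-KByte pages and plain integer linear addresses; the function name `matches_agu_bypass_cancel` is hypothetical, and the code only restates the listed conditions without modeling the hardware.

```python
PAGE_SIZE = 4096  # assumption: 4-KByte pages


def matches_agu_bypass_cancel(base, offset, page_size=PAGE_SIZE):
    """Check the address traits listed for AGU_BYPASS_CANCEL.COUNT.

    Trait 1 ([base + offset] addressing) is implied by the arguments.
    """
    # Trait 2: the offset is between 1 and 2047.
    if not 1 <= offset <= 2047:
        return False
    # Trait 3: base and base + offset fall in different pages.
    return base // page_size != (base + offset) // page_size


if __name__ == "__main__":
    print(matches_agu_bypass_cancel(0x1FF0, 0x20))  # crosses into the next page
    print(matches_agu_bypass_cancel(0x1000, 0x20))  # stays within one page
```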
Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment
C4H | 00H | BR_INST_RETIRED.ALL_BRANCHES | Branch instructions at retirement. | See Table 19-1
C4H | 01H | BR_INST_RETIRED.CONDITIONAL | Counts the number of conditional branch instructions retired. | Supports PEBS
C4H | 02H | BR_INST_RETIRED.NEAR_CALL | Direct and indirect near call instructions retired. |
C4H | 04H | BR_INST_RETIRED.ALL_BRANCHES | Counts the number of branch instructions retired. |
C4H | 08H | BR_INST_RETIRED.NEAR_RETURN | Counts the number of near return instructions retired. |
C4H | 10H | BR_INST_RETIRED.NOT_TAKEN | Counts the number of not taken branch instructions retired. |
C4H | 20H | BR_INST_RETIRED.NEAR_TAKEN | Number of near taken branches retired. |
C4H | 40H | BR_INST_RETIRED.FAR_BRANCH | Number of far branches retired. |
C5H | 00H | BR_MISP_RETIRED.ALL_BRANCHES | Mispredicted branch instructions at retirement. |
C5H | 01H | BR_MISP_RETIRED.CONDITIONAL | Mispredicted conditional branch instructions retired. | Supports PEBS
C5H | 02H | BR_MISP_RETIRED.NEAR_CALL | Direct and indirect mispredicted near call instructions retired. |
C5H | 04H | BR_MISP_RETIRED.ALL_BRANCHES | Mispredicted macro branch instructions retired. |
C5H | 10H | BR_MISP_RETIRED.NOT_TAKEN | Mispredicted not taken branch instructions retired. |
C5H | 20H | BR_MISP_RETIRED.TAKEN | Mispredicted taken branch instructions retired. |
CAH | 02H | FP_ASSIST.X87_OUTPUT | Number of X87 assists due to output value. |
CAH | 04H | FP_ASSIST.X87_INPUT | Number of X87 assists due to input value. |
CAH | 08H | FP_ASSIST.SIMD_OUTPUT | Number of SIMD FP assists due to output values. |
CAH | 10H | FP_ASSIST.SIMD_INPUT | Number of SIMD FP assists due to input values. |
CAH | 1EH | FP_ASSIST.ANY | Cycles with any input/output SSE* or FP assists. |
CCH | 20H | ROB_MISC_EVENTS.LBR_INSERTS | Count cases of saving new LBR records by hardware. |
CDH | 01H | MEM_TRANS_RETIRED.LOAD_LATENCY | Randomly sampled loads whose latency is above a user defined threshold. A small fraction of the overall loads are sampled due to randomization. PMC3 only. | Specify threshold in MSR 0x3F6
CDH | 02H | MEM_TRANS_RETIRED.PRECISE_STORE | Sample stores and collect precise store operation via PEBS record. PMC3 only. |
D0H | 01H | MEM_UOP_RETIRED.LOADS | Qualify retired memory uops that are loads. Combine with umask 10H, 20H, 40H, 80H. |
Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment
D0H | 02H | MEM_UOP_RETIRED.STORES | Qualify retired memory uops that are stores. Combine with umask 10H, 20H, 40H, 80H. |
D0H | 10H | MEM_UOP_RETIRED.STLB_MISS | Qualify retired memory uops with STLB miss. Must combine with umask 01H, 02H, to produce counts. |
D0H | 20H | MEM_UOP_RETIRED.LOCK | Qualify retired memory uops with lock. Must combine with umask 01H, 02H, to produce counts. |
D0H | 40H | MEM_UOP_RETIRED.SPLIT | Qualify retired memory uops with line split. Must combine with umask 01H, 02H, to produce counts. |
D0H | 80H | MEM_UOP_RETIRED.ALL | Qualify any retired memory uops. Must combine with umask 01H, 02H, to produce counts. |
D1H | 01H | MEM_LOAD_UOPS_RETIRED.L1_HIT | Retired load uops with L1 cache hits as data sources. |
D1H | 02H | MEM_LOAD_UOPS_RETIRED.L2_HIT | Retired load uops with L2 cache hits as data sources. |
D1H | 04H | MEM_LOAD_UOPS_RETIRED.LLC_HIT | Retired load uops which data sources were data hits in LLC without snoops required. | Supports PEBS
D1H | 20H | MEM_LOAD_UOPS_RETIRED.LLC_MISS | Retired load uops which data sources were data missed LLC (excluding unknown data source). |
D1H | 40H | MEM_LOAD_UOPS_RETIRED.HIT_LFB | Retired load uops which data sources were load uops missed L1 but hit FB due to preceding miss to the same cache line with data not ready. |
E6H | 01H | BACLEARS.ANY | Counts the number of times the front end is resteered, mainly when the BPU cannot provide a correct prediction and this is corrected by other branch handling mechanisms at the front end. |
F0H | 01H | L2_TRANS.DEMAND_DATA_RD | Demand Data Read requests that access L2 cache. |
F0H | 02H | L2_TRANS.RFO | RFO requests that access L2 cache. |
F0H | 04H | L2_TRANS.CODE_RD | L2 cache accesses when fetching instructions. |
F0H | 08H | L2_TRANS.ALL_PF | L2 or LLC HW prefetches that access L2 cache. |
F0H | 10H | L2_TRANS.L1D_WB | L1D writebacks that access L2 cache. |
F0H | 20H | L2_TRANS.L2_FILL | L2 fill requests that access L2 cache. |
F0H | 40H | L2_TRANS.L2_WB | L2 writebacks that access L2 cache. |
F0H | 80H | L2_TRANS.ALL_REQUESTS | Transactions accessing L2 pipe. |
F1H | 01H | L2_LINES_IN.I | L2 cache lines in I state filling L2. | Counting does not cover rejects.
F1H | 02H | L2_LINES_IN.S | L2 cache lines in S state filling L2. | Counting does not cover rejects.
F1H | 04H | L2_LINES_IN.E | L2 cache lines in E state filling L2. | Counting does not cover rejects.
F1H | 07H | L2_LINES_IN.ALL | L2 cache lines filling L2. | Counting does not cover rejects.
Supports PEBS. PMC0-3 only regardless HTT including rejects Supports PEBS
Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment
F2H | 01H | L2_LINES_OUT.DEMAND_CLEAN | Clean L2 cache lines evicted by demand. |
F2H | 02H | L2_LINES_OUT.DEMAND_DIRTY | Dirty L2 cache lines evicted by demand. |
F2H | 04H | L2_LINES_OUT.PF_CLEAN | Clean L2 cache lines evicted by L2 prefetch. |
F2H | 08H | L2_LINES_OUT.PF_DIRTY | Dirty L2 cache lines evicted by L2 prefetch. |
F2H | 0AH | L2_LINES_OUT.DIRTY_ALL | Dirty L2 cache lines filling the L2. | Counting does not cover rejects.
F4H | 10H | SQ_MISC.SPLIT_LOCK | Split locks in SQ. |
...
Table 19-9. Non-Architectural Performance Events Applicable only to the Processor Core of Intel Xeon Processor E5 Family
Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment
CDH | 01H | MEM_TRANS_RETIRED.LOAD_LATENCY | Additional Configuration: Disable BL bypass and direct2core, and if the memory is remotely homed. The count is not reliable if the memory is locally homed. |
D1H | 04H | MEM_LOAD_UOPS_RETIRED.LLC_HIT | Additional Configuration: Disable BL bypass |
D1H | 20H | MEM_LOAD_UOPS_RETIRED.LLC_MISS | Additional Configuration: Disable BL bypass and direct2core |
D2H | 01H | MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_MISS | Additional Configuration: Disable bypass |
D2H | 02H | MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT | Additional Configuration: Disable bypass |
D2H | 04H | MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HITM | Additional Configuration: Disable bypass |
D2H | 08H | MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_NONE | Additional Configuration: Disable bypass |
D3H | 01H | MEM_LOAD_UOPS_LLC_MISS_RETIRED.LOCAL_DRAM | Retired load uops which data sources were data missed LLC but serviced by local DRAM. | Disable BL bypass and direct2core (see MSR 0x3C9)
D3H | 04H | MEM_LOAD_UOPS_LLC_MISS_RETIRED.REMOTE_DRAM | Retired load uops which data sources were data missed LLC but serviced by remote DRAM. | Disable BL bypass and direct2core (see MSR 0x3C9)
B7H/BBH | 01H | OFF_CORE_RESPONSE_N | Sub-events of OFF_CORE_RESPONSE_N (suffix N = 0, 1) programmed using MSR 01A6H/01A7H with values shown in the comment column. |
0x3FFFC00004 0x600400004 0x67F800004
Event Mask Mnemonic | Comment
OFFCORE_RESPONSE.DEMAND_CODE_RD.LLC_MISS.REMOTE_HIT_FWD_N | 0x87F800004
OFFCORE_RESPONSE.DEMAND_CODE_RD.LLC_MISS.REMOTE_HITM_N | 0x107FC00004
OFFCORE_RESPONSE.DEMAND_DATA_RD.LLC_MISS.ANY_DRAM_N | 0x67FC00001
OFFCORE_RESPONSE.DEMAND_DATA_RD.LLC_MISS.ANY_RESPONSE_N | 0x3F803C0001
OFFCORE_RESPONSE.DEMAND_DATA_RD.LLC_MISS.LOCAL_DRAM_N | 0x600400001
OFFCORE_RESPONSE.DEMAND_DATA_RD.LLC_MISS.REMOTE_DRAM_N | 0x67F800001
OFFCORE_RESPONSE.DEMAND_DATA_RD.LLC_MISS.REMOTE_HIT_FWD_N | 0x87F800001
OFFCORE_RESPONSE.DEMAND_DATA_RD.LLC_MISS.REMOTE_HITM_N | 0x107FC00001
OFFCORE_RESPONSE.PF_L2_CODE_RD.LLC_MISS.ANY_RESPONSE_N | 0x3F803C0040
OFFCORE_RESPONSE.PF_L2_DATA_RD.LLC_MISS.ANY_DRAM_N | 0x67FC00010
OFFCORE_RESPONSE.PF_L2_DATA_RD.LLC_MISS.ANY_RESPONSE_N | 0x3F803C0010
OFFCORE_RESPONSE.PF_L2_DATA_RD.LLC_MISS.LOCAL_DRAM_N | 0x600400010
OFFCORE_RESPONSE.PF_L2_DATA_RD.LLC_MISS.REMOTE_DRAM_N | 0x67F800010
OFFCORE_RESPONSE.PF_L2_DATA_RD.LLC_MISS.REMOTE_HIT_FWD_N | 0x87F800010
OFFCORE_RESPONSE.PF_L2_DATA_RD.LLC_MISS.REMOTE_HITM_N | 0x107FC00010
OFFCORE_RESPONSE.PF_LLC_CODE_RD.LLC_MISS.ANY_RESPONSE_N | 0x3FFFC00200
OFFCORE_RESPONSE.PF_LLC_DATA_RD.LLC_MISS.ANY_RESPONSE_N | 0x3FFFC00080
...
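The OFF_CORE_RESPONSE_N sub-events are described as being programmed through MSR 01A6H/01A7H together with event B7H/BBH, umask 01H. The sketch below shows how a tool might look up the pieces for one sub-event. It assumes that N = 0 pairs event B7H with MSR 01A6H and N = 1 pairs event BBH with MSR 01A7H, and that the listed values are read in order against the listed mnemonics. The dictionary holds only three of the sub-events, and `offcore_config` is a hypothetical helper, not an interface defined by this manual.

```python
# A few sub-event values taken from the Comment column (suffix _N dropped).
OFFCORE_VALUES = {
    "DEMAND_DATA_RD.LLC_MISS.LOCAL_DRAM": 0x600400001,
    "DEMAND_DATA_RD.LLC_MISS.REMOTE_DRAM": 0x67F800001,
    "PF_L2_DATA_RD.LLC_MISS.ANY_RESPONSE": 0x3F803C0010,
}

# Assumed pairing of suffix N with (event select, MSR address).
RESPONSE_REGS = {0: (0xB7, 0x1A6), 1: (0xBB, 0x1A7)}


def offcore_config(sub_event, n):
    """Return the event select, umask, MSR address and MSR value to program."""
    if n not in RESPONSE_REGS:
        raise ValueError("suffix N must be 0 or 1")
    event, msr = RESPONSE_REGS[n]
    return {
        "event": event,
        "umask": 0x01,
        "msr": msr,
        "msr_value": OFFCORE_VALUES[sub_event],
    }


if __name__ == "__main__":
    cfg = offcore_config("DEMAND_DATA_RD.LLC_MISS.REMOTE_DRAM", 1)
    print({key: hex(value) for key, value in cfg.items()})
```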
Table 19-13. Non-Architectural Performance Events In the Processor Core for Processors Based on Intel Microarchitecture Code Name Westmere
Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment
03H | 02H | LOAD_BLOCK.OVERLAP_STORE | Loads that partially overlap an earlier store. |
04H | 07H | SB_DRAIN.ANY | All Store buffer stall cycles. |
05H | 02H | MISALIGN_MEMORY.STORE | All store referenced with misaligned address. |
06H | 04H | STORE_BLOCKS.AT_RET | Counts number of loads delayed with at-Retirement block code. The following loads need to be executed at retirement and wait for all senior stores on the same thread to be drained: load splitting across 4K boundary (page split), load accessing uncacheable (UC or USWC) memory, load lock, and load with page table in UC or USWC memory region. |
 |  |  | Cacheable loads delayed with L1D block code. |
 |  |  | Counts false dependency due to partial address aliasing. |
 |  |  | Counts all load misses that cause a page walk. |
Event Num. 08H 08H 08H 08H Umask Value 02H 04H 10H 20H Event Mask Mnemonic DTLB_LOAD_MISSES.WALK_C OMPLETED Description Counts number of completed page walks due to load miss in the STLB. Comment
DTLB_LOAD_MISSES.WALK_CY Cycles PMH is busy with a page walk due to a load CLES miss in the STLB. DTLB_LOAD_MISSES.STLB_HI T DTLB_LOAD_MISSES.PDE_MIS S MEM_INST_RETIRED.LOADS Number of cache load STLB hits. Number of DTLB cache load misses where the low part of the linear to physical address translation was missed. Counts the number of instructions with an architecturally-visible load retired on the architected path. Counts the number of instructions with an architecturally-visible store retired on the architected path. In conjunction with ld_lat facility
0BH
01H
0BH
02H
MEM_INST_RETIRED.STORES
0BH 0CH
10H 01H
MEM_INST_RETIRED.LATENCY_ABOVE_THRESHOLD Counts the number of instructions exceeding the latency specified with ld_lat facility. MEM_STORE_RETIRED.DTLB_MISS The event counts the number of retired stores that missed the DTLB. The DTLB miss is not counted if the store operation causes a fault. Does not count prefetches. Counts both primary and secondary misses to the TLB. Counts the number of Uops issued by the Register Allocation Table to the Reservation Station, i.e. the UOPs issued from the front end to the back end.
0EH
01H
UOPS_ISSUED.ANY
0EH
01H
UOPS_ISSUED.STALLED_CYCL ES
Counts the number of cycles no Uops issued by the set invert=1, cmask = 1 Register Allocation Table to the Reservation Station, i.e. the UOPs issued from the front end to the back end. Counts the number of fused Uops that were issued from the Register Allocation Table to the Reservation Station. Load instructions retired with unknown LLC miss (Precise Event). Applicable to one and two sockets Applicable to one and two sockets Applicable to two sockets only Applicable to one and two sockets Applicable to two sockets only
0EH
02H
UOPS_ISSUED.FUSED
MEM_UNCORE_RETIRED.UNK NOWN_SOURCE
MEM_UNCORE_RETIRED.OHTE Load instructions retired that HIT modified data in R_CORE_L2_HIT sibling core (Precise Event). MEM_UNCORE_RETIRED.REMO Load instructions retired that HIT modified data in TE_HITM remote socket (Precise Event). MEM_UNCORE_RETIRED.LOCA Load instructions retired local dram and remote L_DRAM_AND_REMOTE_CACH cache HIT data sources (Precise Event). E_HIT MEM_UNCORE_RETIRED.REMO Load instructions retired remote DRAM and remote TE_DRAM home-remote cache HITM (Precise Event).
0FH
10H
Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment
0FH | 20H | MEM_UNCORE_RETIRED.OTHER_LLC_MISS | Load instructions retired other LLC miss (Precise Event). | Applicable to two sockets only
0FH | 80H | MEM_UNCORE_RETIRED.UNCACHEABLE | Load instructions retired I/O (Precise Event). | Applicable to one and two sockets
10H | 01H | FP_COMP_OPS_EXE.X87 | Counts the number of FP Computational Uops Executed. The number of FADD, FSUB, FCOM, FMULs, integer MULs and IMULs, FDIVs, FPREMs, FSQRTS, integer DIVs, and IDIVs. This event does not distinguish an FADD used in the middle of a transcendental flow from a separate FADD instruction. |
10H | 02H | FP_COMP_OPS_EXE.MMX | Counts number of MMX Uops executed. |
10H | 04H | FP_COMP_OPS_EXE.SSE_FP | Counts number of SSE and SSE2 FP uops executed. |
10H | 08H | FP_COMP_OPS_EXE.SSE2_INTEGER | Counts number of SSE2 integer uops executed. |
10H | 10H | FP_COMP_OPS_EXE.SSE_FP_PACKED | Counts number of SSE FP packed uops executed. |
10H | 20H | FP_COMP_OPS_EXE.SSE_FP_SCALAR | Counts number of SSE FP scalar uops executed. |
10H | 40H | FP_COMP_OPS_EXE.SSE_SINGLE_PRECISION | Counts number of SSE* FP single precision uops executed. |
10H | 80H | FP_COMP_OPS_EXE.SSE_DOUBLE_PRECISION | Counts number of SSE* FP double precision uops executed. |
12H | 01H | SIMD_INT_128.PACKED_MPY | Counts number of 128 bit SIMD integer multiply operations. |
12H | 02H | SIMD_INT_128.PACKED_SHIFT | Counts number of 128 bit SIMD integer shift operations. |
12H | 04H | SIMD_INT_128.PACK | Counts number of 128 bit SIMD integer pack operations. |
12H | 08H | SIMD_INT_128.UNPACK | Counts number of 128 bit SIMD integer unpack operations. |
12H | 10H | SIMD_INT_128.PACKED_LOGICAL | Counts number of 128 bit SIMD integer logical operations. |
12H | 20H | SIMD_INT_128.PACKED_ARITH | Counts number of 128 bit SIMD integer arithmetic operations. |
12H | 40H | SIMD_INT_128.SHUFFLE_MOVE | Counts number of 128 bit SIMD integer shuffle and move operations. |
13H | 01H | LOAD_DISPATCH.RS | Counts number of loads dispatched from the Reservation Station that bypass the Memory Order Buffer. |
Event Num. 13H Umask Value 02H Event Mask Mnemonic Description Comment
LOAD_DISPATCH.RS_DELAYED Counts the number of delayed RS dispatches at the stage latch. If an RS dispatch can not bypass to LB, it has another chance to dispatch from the onecycle delayed staging latch before it is written into the LB. LOAD_DISPATCH.MOB LOAD_DISPATCH.ANY ARITH.CYCLES_DIV_BUSY Counts the number of loads dispatched from the Reservation Station to the Memory Order Buffer. Counts all loads dispatched from the Reservation Station. Counts the number of cycles the divider is busy executing divide or square root operations. The divide can be integer, X87 or Streaming SIMD Extensions (SSE). The square root operation can be either X87 or SSE. Set 'edge =1, invert=1, cmask=1' to count the number of divides. Count may be incorrect When SMT is on
14H
02H
ARITH.MUL
Counts the number of multiply operations executed. Count may be incorrect This includes integer as well as floating point When SMT is on multiply operations but excludes DPPS mul and MPSAD. Counts the number of instructions written into the instruction queue every cycle. Counts number of instructions that require decoder 0 to be decoded. Usually, this means that the instruction maps to more than 1 uop. An instruction that generates two uops was decoded. This event counts the number of cycles during which instructions are written to the instruction queue. Dividing this counter by the number of instructions written to the instruction queue (INST_QUEUE_WRITES) yields the average number of instructions decoded each cycle. If this number is less than four and the pipe stalls, this indicates that the decoder is failing to decode enough instructions per cycle to sustain the 4-wide pipeline. Number of loops that can not stream from the instruction queue. Counts number of loads that hit the L2 cache. L2 loads include both L1D demand misses as well as L1D prefetches. L2 loads can be rejected for various reasons. Only non rejected loads are counted. If SSE* instructions that are 6 bytes or longer arrive one after another, then front end throughput may limit execution speed.
17H 18H
01H 01H
INST_QUEUE_WRITES INST_DECODED.DEC0
19H 1EH
01H 01H
TWO_UOP_INSTS_DECODED INST_QUEUE_WRITE_CYCLES
20H 24H
01H 01H
LSD_OVERFLOW L2_RQSTS.LD_HIT
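The INST_QUEUE_WRITE_CYCLES description above relates that counter to INST_QUEUE_WRITES and to the 4-wide pipeline. A small worked sketch of that arithmetic follows. It assumes the intended ratio is instructions written to the instruction queue per write cycle; the function names and the sample counter readings are hypothetical.

```python
def avg_instructions_per_write_cycle(inst_queue_writes, inst_queue_write_cycles):
    """Average number of instructions written to the queue per write cycle."""
    if inst_queue_write_cycles == 0:
        return 0.0
    return inst_queue_writes / inst_queue_write_cycles


def decoder_may_limit(avg, pipeline_stalled, width=4):
    """Flag the case described above: fewer than `width` per cycle and the pipe stalls."""
    return pipeline_stalled and avg < width


if __name__ == "__main__":
    # Hypothetical counter readings, for illustration only.
    writes, cycles = 3000, 1000
    avg = avg_instructions_per_write_cycle(writes, cycles)
    print(avg, decoder_may_limit(avg, pipeline_stalled=True))
```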
Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment
24H | 02H | L2_RQSTS.LD_MISS | Counts the number of loads that miss the L2 cache. L2 loads include both L1D demand misses as well as L1D prefetches. |
24H | 03H | L2_RQSTS.LOADS | Counts all L2 load requests. L2 loads include both L1D demand misses as well as L1D prefetches. |
24H | 04H | L2_RQSTS.RFO_HIT | Counts the number of store RFO requests that hit the L2 cache. L2 RFO requests include both L1D demand RFO misses as well as L1D RFO prefetches. Count includes WC memory requests, where the data is not fetched but the permission to write the line is required. |
24H | 08H | L2_RQSTS.RFO_MISS | Counts the number of store RFO requests that miss the L2 cache. L2 RFO requests include both L1D demand RFO misses as well as L1D RFO prefetches. |
24H | 0CH | L2_RQSTS.RFOS | Counts all L2 store RFO requests. L2 RFO requests include both L1D demand RFO misses as well as L1D RFO prefetches. |
24H | 10H | L2_RQSTS.IFETCH_HIT | Counts number of instruction fetches that hit the L2 cache. L2 instruction fetches include both L1I demand misses as well as L1I instruction prefetches. |
24H | 20H | L2_RQSTS.IFETCH_MISS | Counts number of instruction fetches that miss the L2 cache. L2 instruction fetches include both L1I demand misses as well as L1I instruction prefetches. |
24H | 30H | L2_RQSTS.IFETCHES | Counts all instruction fetches. L2 instruction fetches include both L1I demand misses as well as L1I instruction prefetches. |
 |  |  | Counts L2 prefetch hits for both code and data. |
 |  |  | Counts L2 prefetch misses for both code and data. |
 |  |  | Counts all L2 prefetches for both code and data. |
 |  |  | Counts all L2 misses for both code and data. |
 |  |  | Counts all L2 requests for both code and data. |
 |  | L2_DATA_RQSTS.DEMAND.I_STATE | Counts number of L2 data demand loads where the cache line to be loaded is in the I (invalid) state, i.e. a cache miss. L2 demand loads are both L1D demand misses and L1D prefetches. |
26H | 02H | L2_DATA_RQSTS.DEMAND.S_STATE | Counts number of L2 data demand loads where the cache line to be loaded is in the S (shared) state. L2 demand loads are both L1D demand misses and L1D prefetches. |
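The L2_RQSTS umasks listed above follow a visible pattern: each "all requests" umask equals the bitwise OR of the matching hit and miss umasks (for example LOADS 03H against LD_HIT 01H and LD_MISS 02H). The sketch below checks that relationship for the umasks that appear in this table. It is an observation about the listed values, offered as an illustration; the dictionary and function names are hypothetical.

```python
# Umask values for event 24H as listed in the table.
L2_RQSTS = {
    "LD_HIT": 0x01, "LD_MISS": 0x02, "LOADS": 0x03,
    "RFO_HIT": 0x04, "RFO_MISS": 0x08, "RFOS": 0x0C,
    "IFETCH_HIT": 0x10, "IFETCH_MISS": 0x20, "IFETCHES": 0x30,
}

# Aggregate sub-event -> the hit/miss sub-events it appears to cover.
AGGREGATES = {
    "LOADS": ("LD_HIT", "LD_MISS"),
    "RFOS": ("RFO_HIT", "RFO_MISS"),
    "IFETCHES": ("IFETCH_HIT", "IFETCH_MISS"),
}


def combined_umask(*names):
    """Bitwise OR of the umasks of the named sub-events."""
    umask = 0
    for name in names:
        umask |= L2_RQSTS[name]
    return umask


def aggregate_is_union(aggregate):
    """True if the aggregate umask equals the OR of its component umasks."""
    return L2_RQSTS[aggregate] == combined_umask(*AGGREGATES[aggregate])


if __name__ == "__main__":
    for name in AGGREGATES:
        print(name, f"{L2_RQSTS[name]:02X}H", aggregate_is_union(name))
```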
Event Num. 26H Umask Value 04H Event Mask Mnemonic L2_DATA_RQSTS.DEMAND.E_ STATE Description Counts number of L2 data demand loads where the cache line to be loaded is in the E (exclusive) state. L2 demand loads are both L1D demand misses and L1D prefetches. Comment
26H
08H
L2_DATA_RQSTS.DEMAND.M_ Counts number of L2 data demand loads where the STATE cache line to be loaded is in the M (modified) state. L2 demand loads are both L1D demand misses and L1D prefetches. L2_DATA_RQSTS.DEMAND.ME Counts all L2 data demand requests. L2 demand SI loads are both L1D demand misses and L1D prefetches. L2_DATA_RQSTS.PREFETCH.I_ Counts number of L2 prefetch data loads where the STATE cache line to be loaded is in the I (invalid) state, i.e. a cache miss. L2_DATA_RQSTS.PREFETCH.S Counts number of L2 prefetch data loads where the _STATE cache line to be loaded is in the S (shared) state. A prefetch RFO will miss on an S state line, while a prefetch read will hit on an S state line. L2_DATA_RQSTS.PREFETCH.E Counts number of L2 prefetch data loads where the _STATE cache line to be loaded is in the E (exclusive) state. L2_DATA_RQSTS.PREFETCH.M Counts number of L2 prefetch data loads where the _STATE cache line to be loaded is in the M (modified) state. L2_DATA_RQSTS.PREFETCH.M Counts all L2 prefetch requests. ESI L2_DATA_RQSTS.ANY L2_WRITE.RFO.I_STATE Counts all L2 data requests. Counts number of L2 demand store RFO requests This is a demand RFO where the cache line to be loaded is in the I (invalid) request state, i.e, a cache miss. The L1D prefetcher does not issue a RFO prefetch. Counts number of L2 store RFO requests where the This is a demand RFO cache line to be loaded is in the S (shared) state. request The L1D prefetcher does not issue a RFO prefetch,. Counts number of L2 store RFO requests where the This is a demand RFO cache line to be loaded is in the M (modified) state. request The L1D prefetcher does not issue a RFO prefetch. Counts number of L2 store RFO requests where the This is a demand RFO cache line to be loaded is in either the S, E or M request states. The L1D prefetcher does not issue a RFO prefetch. Counts all L2 store RFO requests.The L1D prefetcher does not issue a RFO prefetch. 
Counts number of L2 demand lock RFO requests where the cache line to be loaded is in the I (invalid) state, i.e. a cache miss. This is a demand RFO request
26H
0FH
26H
10H
26H
20H
27H
02H
L2_WRITE.RFO.S_STATE
27H
08H
L2_WRITE.RFO.M_STATE
27H
0EH
L2_WRITE.RFO.HIT
27H 27H
0FH 10H
L2_WRITE.RFO.MESI L2_WRITE.LOCK.I_STATE
Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment
27H | 20H | L2_WRITE.LOCK.S_STATE | Counts number of L2 lock RFO requests where the cache line to be loaded is in the S (shared) state. |
27H | 40H | L2_WRITE.LOCK.E_STATE | Counts number of L2 demand lock RFO requests where the cache line to be loaded is in the E (exclusive) state. |
27H | 80H | L2_WRITE.LOCK.M_STATE | Counts number of L2 demand lock RFO requests where the cache line to be loaded is in the M (modified) state. |
27H | E0H | L2_WRITE.LOCK.HIT | Counts number of L2 demand lock RFO requests where the cache line to be loaded is in either the S, E, or M state. |
27H | F0H | L2_WRITE.LOCK.MESI | Counts all L2 demand lock RFO requests. |
28H | 01H | L1D_WB_L2.I_STATE | Counts number of L1 writebacks to the L2 where the cache line to be written is in the I (invalid) state, i.e. a cache miss. |
28H | 02H | L1D_WB_L2.S_STATE | Counts number of L1 writebacks to the L2 where the cache line to be written is in the S state. |
28H | 04H | L1D_WB_L2.E_STATE | Counts number of L1 writebacks to the L2 where the cache line to be written is in the E (exclusive) state. |
28H | 08H | L1D_WB_L2.M_STATE | Counts number of L1 writebacks to the L2 where the cache line to be written is in the M (modified) state. |
28H | 0FH | L1D_WB_L2.MESI | Counts all L1 writebacks to the L2. |
2EH | 41H | L3_LAT_CACHE.MISS | Counts uncore Last Level Cache misses. Because of cache hierarchy, cache sizes and other implementation-specific characteristics, value comparison to estimate performance differences is not recommended. |
2EH | 4FH | L3_LAT_CACHE.REFERENCE | Counts uncore Last Level Cache references. Because of cache hierarchy, cache sizes and other implementation-specific characteristics, value comparison to estimate performance differences is not recommended. | see Table 19-1
3CH | 00H | CPU_CLK_UNHALTED.THREAD_P | Counts the number of thread cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. The core frequency may change from time to time due to power or thermal throttling. |
3CH | 01H | CPU_CLK_UNHALTED.REF_P | Increments at the frequency of TSC when not halted. |
49H | 01H | DTLB_MISSES.ANY | Counts the number of misses in the STLB that cause a page walk. |
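The Event Num. and Umask Value columns are the two low bytes of the value written to a performance event select register. The following is a minimal sketch, assuming the architectural IA32_PERFEVTSELx layout (event select in bits 7:0, unit mask in bits 15:8, USR in bit 16, OS in bit 17, EN in bit 22). The helper name `perfevtsel` is hypothetical and does not come from this table.

```python
# Hypothetical helper: packs a table row into an IA32_PERFEVTSELx value.
# Bit positions are assumed from the architectural register layout.
USR = 1 << 16  # count while running at privilege level 3
OS = 1 << 17   # count while running at privilege level 0
EN = 1 << 22   # enable the counter


def perfevtsel(event: int, umask: int, flags: int = USR | OS | EN) -> int:
    """Combine an event number and umask with the control flags."""
    if not (0 <= event <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event and umask must each fit in 8 bits")
    return flags | (umask << 8) | event


# L3_LAT_CACHE.MISS: event 2EH, umask 41H
print(hex(perfevtsel(0x2E, 0x41)))  # 0x43412e
# L3_LAT_CACHE.REFERENCE: event 2EH, umask 4FH
print(hex(perfevtsel(0x2E, 0x4F)))  # 0x434f2e
```

Writing the resulting value to the register requires ring-0 access (for example through an operating system driver), which this sketch does not attempt.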
Table 19-13. Non-Architectural Performance Events In the Processor Core for Processors Based on Intel Microarchitecture Code Name Westmere (Contd.)
Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment
49H | 02H | DTLB_MISSES.WALK_COMPLETED | Counts number of misses in the STLB which resulted in a completed page walk. |
49H | 04H | DTLB_MISSES.WALK_CYCLES | Counts cycles of page walk due to misses in the STLB. |
49H | 10H | DTLB_MISSES.STLB_HIT | Counts the number of DTLB first level misses that hit in the second level TLB. This event is only relevant if the core contains multiple DTLB levels. |
49H | 20H | DTLB_MISSES.PDE_MISS | Number of DTLB misses caused by low part of address, includes references to 2M pages because 2M pages do not use the PDE. |
49H | 80H | DTLB_MISSES.LARGE_WALK_COMPLETED | Counts number of completed large page walks due to misses in the STLB. |
4CH | 01H | LOAD_HIT_PRE | Counts load operations sent to the L1 data cache while a previous SSE prefetch instruction to the same cache line has started prefetching but has not yet finished. | Counter 0, 1 only
4EH | 01H | L1D_PREFETCH.REQUESTS | Counts number of hardware prefetch requests dispatched out of the prefetch FIFO. | Counter 0, 1 only
4EH | 02H | L1D_PREFETCH.MISS | Counts number of hardware prefetch requests that miss the L1D. There are two prefetchers in the L1D: a streamer, which predicts lines sequentially after this one should be fetched, and the IP prefetcher that remembers access patterns for the current instruction. The streamer prefetcher stops on an L1D hit, while the IP prefetcher does not. | Counter 0, 1 only
4EH | 04H | L1D_PREFETCH.TRIGGERS | Counts number of prefetch requests triggered by the Finite State Machine and pushed into the prefetch FIFO. Some of the prefetch requests are dropped due to overwrites or competition between the IP index prefetcher and streamer prefetcher. The prefetch FIFO contains 4 entries. | Counter 0, 1 only
 |  |  | Counts Extended Page walk cycles. |
 |  |  | Counts the number of lines brought into the L1 data cache. | Counter 0, 1 only
 |  |  | Counts the number of modified lines brought into the L1 data cache. | Counter 0, 1 only
 |  |  | Counts the number of modified lines evicted from the L1 data cache due to replacement. | Counter 0, 1 only
 |  |  | Counts the number of modified lines evicted from the L1 data cache due to snoop HITM intervention. | Counter 0, 1 only
 |  | L1D_CACHE_PREFETCH_LOCK_FB_HIT | Counts the number of cacheable load lock speculated instructions accepted into the fill buffer. |
Table 19-13. Non-Architectural Performance Events In the Processor Core for Processors Based on Intel Microarchitecture Code Name Westmere (Contd.)
Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment
60H | 01H | OFFCORE_REQUESTS_OUTSTANDING.DEMAND.READ_DATA | Counts weighted cycles of offcore demand data read requests. Does not include L2 prefetch requests. | counter 0
60H | 02H | OFFCORE_REQUESTS_OUTSTANDING.DEMAND.READ_CODE | Counts weighted cycles of offcore demand code read requests. Does not include L2 prefetch requests. | counter 0
 |  | OFFCORE_REQUESTS_OUTSTANDING.DEMAND.RFO | Counts weighted cycles of offcore demand RFO requests. Does not include L2 prefetch requests. | counter 0
 |  | OFFCORE_REQUESTS_OUTSTANDING.ANY.READ | Counts weighted cycles of offcore read requests of any kind. Includes L2 prefetch requests. | counter 0
 |  | CACHE_LOCK_CYCLES.L1D_L2 | Cycle count during which the L1D and L2 are locked. A lock is asserted when there is a locked memory access, due to uncacheable memory, a locked operation that spans two cache lines, or a page walk from an uncacheable page table. This event does not cause locks, it merely detects them. | Counter 0, 1 only. L1D and L2 locks have a very high performance penalty and it is highly recommended to avoid such accesses.
 |  | CACHE_LOCK_CYCLES.L1D | Counts the number of cycles that cacheline in the L1 data cache unit is locked. | Counter 0, 1 only.
 |  | IO_TRANSACTIONS | Counts the number of completed I/O transactions. |
 |  | L1I.HITS | Counts all instruction fetches that hit the L1 instruction cache. |
 |  | L1I.MISSES | Counts all instruction fetches that miss the L1I cache. This includes instruction cache misses, streaming buffer misses, victim cache misses and uncacheable fetches. An instruction fetch miss is counted only once and not once for every cycle it is outstanding. |
 |  |  | Counts all instruction fetches, including uncacheable fetches that bypass the L1I. |
 |  |  | Cycle counts for which an instruction fetch stalls due to a L1I cache miss, ITLB miss or ITLB fault. |
 |  |  | Counts number of large ITLB hits. |
 |  |  | Counts the number of misses in all levels of the ITLB which causes a page walk. |
 |  | ITLB_MISSES.WALK_COMPLETED | Counts number of misses in all levels of the ITLB which resulted in a completed page walk. |
 |  | ITLB_MISSES.WALK_CYCLES | Counts ITLB miss page walk cycles. |
 |  | ITLB_MISSES.STLB_HIT | Counts number of ITLB first level miss but second level hits. |
 |  | ITLB_MISSES.LARGE_WALK_COMPLETED | Counts number of completed large page walks due to misses in the STLB. |
Table 19-13. Non-Architectural Performance Events In the Processor Core for Processors Based on Intel Microarchitecture Code Name Westmere (Contd.)
Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment
87H | 01H | ILD_STALL.LCP | Cycles Instruction Length Decoder stalls due to length changing prefixes: 66, 67 or REX.W (for EM64T) instructions which change the length of the decoded instruction. |
87H | 02H | ILD_STALL.MRU | Instruction Length Decoder stall cycles due to Branch Prediction Unit (BPU) Most Recently Used (MRU) bypass. |
87H | 04H | ILD_STALL.IQ_FULL | Stall cycles due to a full instruction queue. |
87H | 08H | ILD_STALL.REGEN | Counts the number of regen stalls. |
87H | 0FH | ILD_STALL.ANY | Counts any cycles the Instruction Length Decoder is stalled. |
88H | 01H | BR_INST_EXEC.COND | Counts the number of conditional near branch instructions executed, but not necessarily retired. |
88H | 02H | BR_INST_EXEC.DIRECT | Counts all unconditional near branch instructions excluding calls and indirect branches. |
88H | 04H | BR_INST_EXEC.INDIRECT_NON_CALL | Counts the number of executed indirect near branch instructions that are not calls. |
88H | 07H | BR_INST_EXEC.NON_CALLS | Counts all non call near branch instructions executed, but not necessarily retired. |
88H | 08H | BR_INST_EXEC.RETURN_NEAR | Counts indirect near branches that have a return mnemonic. |
88H | 10H | BR_INST_EXEC.DIRECT_NEAR_CALL | Counts unconditional near call branch instructions, excluding non call branch, executed. |
88H | 20H | BR_INST_EXEC.INDIRECT_NEAR_CALL | Counts indirect near calls, including both register and memory indirect, executed. |
88H | 30H | BR_INST_EXEC.NEAR_CALLS | Counts all near call branches executed, but not necessarily retired. |
88H | 40H | BR_INST_EXEC.TAKEN | Counts taken near branches executed, but not necessarily retired. |
88H | 7FH | BR_INST_EXEC.ANY | Counts all near executed branches (not necessarily retired). This includes only instructions and not micro-op branches. Frequent branching is not necessarily a major performance issue. However, frequent branch mispredictions may be a problem. |
89H | 01H | BR_MISP_EXEC.COND | Counts the number of mispredicted conditional near branch instructions executed, but not necessarily retired. |
89H | 02H | BR_MISP_EXEC.DIRECT | Counts mispredicted macro unconditional near branch instructions, excluding calls and indirect branches (should always be 0). |
89H | 04H | BR_MISP_EXEC.INDIRECT_NON_CALL | Counts the number of executed mispredicted indirect near branch instructions that are not calls. |
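Because the table notes that mispredictions, rather than branch frequency, are the likely problem, the two conditional-branch counters are usually read together. A minimal sketch, assuming the raw counts of BR_MISP_EXEC.COND (89H/01H) and BR_INST_EXEC.COND (88H/01H) have already been collected by some tool; the function name and the sample numbers are hypothetical.

```python
def cond_mispredict_ratio(br_misp_exec_cond: int, br_inst_exec_cond: int) -> float:
    """Fraction of executed conditional near branches that were mispredicted."""
    if br_inst_exec_cond == 0:
        return 0.0  # no conditional branches executed in the sampling window
    return br_misp_exec_cond / br_inst_exec_cond


# Illustrative counts only: 25 mispredictions out of 1000 conditional branches.
print(cond_mispredict_ratio(25, 1000))  # 0.025
```

Both events count executed (not necessarily retired) branches, so the ratio includes work done on speculative paths.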
Table 19-13. Non-Architectural Performance Events In the Processor Core for Processors Based on Intel Microarchitecture Code Name Westmere (Contd.)
Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment
89H | 07H | BR_MISP_EXEC.NON_CALLS | Counts mispredicted non call near branches executed, but not necessarily retired. |
89H | 08H | BR_MISP_EXEC.RETURN_NEAR | Counts mispredicted indirect branches that have a near return mnemonic. |
89H | 10H | BR_MISP_EXEC.DIRECT_NEAR_CALL | Counts mispredicted non-indirect near calls executed (should always be 0). |
89H | 20H | BR_MISP_EXEC.INDIRECT_NEAR_CALL | Counts mispredicted indirect near calls executed, including both register and memory indirect. |
89H | 30H | BR_MISP_EXEC.NEAR_CALLS | Counts all mispredicted near call branches executed, but not necessarily retired. |
89H | 40H | BR_MISP_EXEC.TAKEN | Counts executed mispredicted near branches that are taken, but not necessarily retired. |
89H | 7FH | BR_MISP_EXEC.ANY | Counts the number of mispredicted near branch instructions that were executed, but not necessarily retired. |
A2H | 01H | RESOURCE_STALLS.ANY | Counts the number of Allocator resource related stalls. Includes register renaming buffer entries, memory buffer entries. In addition to resource related stalls, this event counts some other events. Includes stalls arising during branch misprediction recovery, such as if retirement of the mispredicted branch is delayed and stalls arising while store buffer is draining from synchronizing operations. | Does not include stalls due to SuperQ (off core) queue full, too many cache misses, etc.
A2H | 02H | RESOURCE_STALLS.LOAD | Counts the cycles of stall due to lack of load buffer for load operation. |
A2H | 04H | RESOURCE_STALLS.RS_FULL | This event counts the number of cycles when the number of instructions in the pipeline waiting for execution reaches the limit the processor can handle. A high count of this event indicates that there are long latency operations in the pipe (possibly load and store operations that miss the L2 cache, or instructions dependent upon instructions further down the pipeline that have yet to retire). | When RS is full, new instructions can not enter the reservation station and start execution.
A2H | 08H | RESOURCE_STALLS.STORE | This event counts the number of cycles that a resource related stall will occur due to the number of store instructions reaching the limit of the pipeline (i.e. all store buffers are used). The stall ends when a store instruction commits its data to the cache or memory. |
A2H | 10H | RESOURCE_STALLS.ROB_FULL | Counts the cycles of stall due to re-order buffer full. |
A2H | 20H | RESOURCE_STALLS.FPCW | Counts the number of cycles while execution was stalled due to writing the floating-point unit (FPU) control word. |
Table 19-13. Non-Architectural Performance Events In the Processor Core for Processors Based on Intel Microarchitecture Code Name Westmere (Contd.)
Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment
A2H | 40H | RESOURCE_STALLS.MXCSR | Stalls due to the MXCSR register rename occurring too close to a previous MXCSR rename. The MXCSR provides control and status for the MMX registers. |
A2H | 80H | RESOURCE_STALLS.OTHER | Counts the number of cycles while execution was stalled due to other resource issues. |
A6H | 01H | MACRO_INSTS.FUSIONS_DECODED | Counts the number of instructions decoded that are macro-fused but not necessarily executed or retired. |
A7H | 01H | BACLEAR_FORCE_IQ | Counts number of times a BACLEAR was forced by the Instruction Queue. The IQ is also responsible for providing conditional branch prediction direction based on a static scheme and dynamic data provided by the L2 Branch Prediction Unit. If the conditional branch target is not found in the Target Array and the IQ predicts that the branch is taken, then the IQ will force the Branch Address Calculator to issue a BACLEAR. Each BACLEAR asserted by the BAC generates approximately an 8 cycle bubble in the instruction fetch pipeline. |
A8H | 01H |  | Counts the number of micro-ops delivered by loop stream detector. | Use cmask=1 and invert to count cycles
AEH | 01H |  | Counts the number of ITLB flushes. |
B0H | 01H |  | Counts number of offcore demand data read requests. Does not count L2 prefetch requests. |
B0H | 02H |  | Counts number of offcore demand code read requests. Does not count L2 prefetch requests. |
B0H | 04H |  | Counts number of offcore demand RFO requests. Does not count L2 prefetch requests. |
B0H | 08H | OFFCORE_REQUESTS.ANY.READ | Counts number of offcore read requests. Includes L2 prefetch requests. |
B0H | 10H | OFFCORE_REQUESTS.ANY.RFO | Counts number of offcore RFO requests. Includes L2 prefetch requests. |
B0H | 40H | OFFCORE_REQUESTS.L1D_WRITEBACK | Counts number of L1D writebacks to the uncore. |
B0H | 80H | OFFCORE_REQUESTS.ANY | Counts all offcore requests. |
B1H | 01H | UOPS_EXECUTED.PORT0 | Counts number of Uops executed that were issued on port 0. Port 0 handles integer arithmetic, SIMD and FP add Uops. |
B1H | 02H | UOPS_EXECUTED.PORT1 | Counts number of Uops executed that were issued on port 1. Port 1 handles integer arithmetic, SIMD, integer shift, FP multiply and FP divide Uops. |
B1H | 04H | UOPS_EXECUTED.PORT2_CORE | Counts number of Uops executed that were issued on port 2. Port 2 handles the load Uops. This is a core count only and can not be collected per thread. |
Table 19-13. Non-Architectural Performance Events In the Processor Core for Processors Based on Intel Microarchitecture Code Name Westmere (Contd.)
Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment
B1H | 08H | UOPS_EXECUTED.PORT3_CORE | Counts number of Uops executed that were issued on port 3. Port 3 handles store Uops. This is a core count only and can not be collected per thread. |
B1H | 10H | UOPS_EXECUTED.PORT4_CORE | Counts number of Uops executed that were issued on port 4. Port 4 handles the value to be stored for the store Uops issued on port 3. This is a core count only and can not be collected per thread. |
B1H | 1FH | UOPS_EXECUTED.CORE_ACTIVE_CYCLES_NO_PORT5 | Counts number of cycles there are one or more uops being executed and were issued on ports 0-4. This is a core count only and can not be collected per thread. |
B1H | 20H |  | Counts number of Uops executed that were issued on port 5. |
B1H | 3FH |  | Counts number of cycles there are one or more uops being executed on any ports. This is a core count only and can not be collected per thread. |
 |  |  | Counts number of Uops executed that were issued on port 0, 1, or 5. | use cmask=1, invert=1 to count stall cycles
 |  |  | Counts number of Uops executed that were issued on port 2, 3, or 4. |
 |  | OFFCORE_REQUESTS_SQ_FULL | Counts number of cycles the SQ is full to handle off-core requests. |
 |  | SNOOPQ_REQUESTS_OUTSTANDING.DATA | Counts weighted cycles of snoopq requests for data. Counter 0 only. | Use cmask=1 to count cycles not empty.
B3H | 02H | SNOOPQ_REQUESTS_OUTSTANDING.INVALIDATE | Counts weighted cycles of snoopq invalidate requests. Counter 0 only. | Use cmask=1 to count cycles not empty.
B3H | 04H | SNOOPQ_REQUESTS_OUTSTANDING.CODE | Counts weighted cycles of snoopq requests for code. Counter 0 only. | Use cmask=1 to count cycles not empty.
 |  | SNOOPQ_REQUESTS.CODE | Counts the number of snoop code requests. |
 |  | SNOOPQ_REQUESTS.DATA | Counts the number of snoop data requests. |
 |  | SNOOPQ_REQUESTS.INVALIDATE | Counts the number of snoop invalidate requests. |
 |  | OFF_CORE_RESPONSE_0 | See Section 18.6.1.3, Off-core Response Performance Monitoring in the Processor Core. | Requires programming MSR 01A6H
 |  | SNOOP_RESPONSE.HIT | Counts HIT snoop response sent by this thread in response to a snoop request. |
 |  | SNOOP_RESPONSE.HITE | Counts HIT E snoop response sent by this thread in response to a snoop request. |
 |  | SNOOP_RESPONSE.HITM | Counts HIT M snoop response sent by this thread in response to a snoop request. |
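Several comments on this page ask for cmask=1, optionally with invert, to turn an occurrence count into a cycle count. A minimal sketch of how those two fields combine with an event number and umask, assuming the architectural IA32_PERFEVTSELx layout (INV in bit 23, CMASK in bits 31:24). The helper name is hypothetical, and event B3H with umask 04H is used purely as sample input.

```python
# Hypothetical helper; bit positions assumed from the architectural layout.
USR = 1 << 16  # count at privilege level 3
OS = 1 << 17   # count at privilege level 0
EN = 1 << 22   # enable the counter
INV = 1 << 23  # invert the counter-mask comparison


def perfevtsel_cmask(event: int, umask: int, cmask: int, invert: bool = False) -> int:
    """Event select value that counts cycles instead of occurrences.

    With cmask=n the counter increments in each cycle where the event
    occurred at least n times; with invert set, in each cycle where it
    occurred fewer than n times.
    """
    if not 0 <= cmask <= 0xFF:
        raise ValueError("cmask must fit in 8 bits")
    value = USR | OS | EN | (umask << 8) | event
    value |= cmask << 24
    if invert:
        value |= INV
    return value


not_empty = perfevtsel_cmask(0xB3, 0x04, cmask=1)            # cycles with >= 1
empty = perfevtsel_cmask(0xB3, 0x04, cmask=1, invert=True)   # cycles with 0
print(hex(not_empty))  # 0x14304b3
print(hex(empty))      # 0x1c304b3
```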
Table 19-13. Non-Architectural Performance Events In the Processor Core for Processors Based on Intel Microarchitecture Code Name Westmere (Contd.)
Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment
BBH | 01H | OFF_CORE_RESPONSE_1 | See Section 18.6.1.3, Off-core Response Performance Monitoring in the Processor Core. | Use MSR 01A7H
C0H | 00H | INST_RETIRED.ANY_P | See Table 19-1. Notes: INST_RETIRED.ANY is counted by a designated fixed counter. INST_RETIRED.ANY_P is counted by a programmable counter and is an architectural performance event. Event is supported if CPUID.A.EBX[1] = 0. | Counting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.
C0H | 02H | INST_RETIRED.X87 | Counts the number of floating point computational operations retired: floating point computational operations executed by the assist handler and suboperations of complex floating point instructions like transcendental instructions. |
C0H | 04H | INST_RETIRED.MMX | Counts the number of retired MMX instructions. |
C2H | 01H | UOPS_RETIRED.ANY | Counts the number of micro-ops retired (macro-fused=1, micro-fused=2, others=1; maximum count of 8 per cycle). Most instructions are composed of one or two micro-ops. Some instructions are decoded into longer sequences such as repeat instructions, floating point transcendental instructions, and assists. | Use cmask=1 and invert to count active cycles or stalled cycles
 |  |  | Counts the number of retirement slots used each cycle. |
 |  |  | Counts number of macro-fused uops retired. |
 |  |  | Counts the cycles machine clear is asserted. |
 |  | MACHINE_CLEARS.MEM_ORDER | Counts the number of machine clears due to memory order conflicts. |
 |  | MACHINE_CLEARS.SMC | Counts the number of times that a program writes to a code section. Self-modifying code causes a severe penalty in all Intel 64 and IA-32 processors. The modified cache line is written back to the L2 and L3 caches. |
 |  | BR_INST_RETIRED.ALL_BRANCHES | Branch instructions at retirement | See Table 19-1
 |  | BR_INST_RETIRED.CONDITIONAL | Counts the number of conditional branch instructions retired. |
 |  | BR_INST_RETIRED.NEAR_CALL | Counts the number of direct & indirect near unconditional calls retired. |
 |  | BR_MISP_RETIRED.ALL_BRANCHES | Mispredicted branch instructions at retirement |
 |  | BR_MISP_RETIRED.CONDITIONAL | Counts mispredicted conditional retired calls. |
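The availability condition quoted for INST_RETIRED.ANY_P, CPUID.A.EBX[1] = 0, is a single-bit test on the EBX value returned by CPUID leaf 0AH. A minimal sketch, assuming the EBX value has already been obtained by some platform-specific means (Python cannot execute CPUID directly); the function name and the sample values are hypothetical.

```python
def inst_retired_any_p_supported(cpuid_0a_ebx: int) -> bool:
    """True when bit 1 of CPUID.0AH:EBX is clear, the condition stated in the table."""
    return ((cpuid_0a_ebx >> 1) & 1) == 0


# Illustrative EBX values only, not read from real hardware.
print(inst_retired_any_p_supported(0b0000000))  # True
print(inst_retired_any_p_supported(0b0000010))  # False
```

Note the inverted sense: a set bit means the event is not available.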
Table 19-13. Non-Architectural Performance Events In the Processor Core for Processors Based on Intel Microarchitecture Code Name Westmere (Contd.)
Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment
C5H | 02H | BR_MISP_RETIRED.NEAR_CALL | Counts mispredicted direct & indirect near unconditional retired calls. |
C5H | 04H | BR_MISP_RETIRED.ALL_BRANCHES | Counts all mispredicted retired calls. |
C7H | 01H | SSEX_UOPS_RETIRED.PACKED_SINGLE | Counts SIMD packed single-precision floating point Uops retired. |
C7H | 02H | SSEX_UOPS_RETIRED.SCALAR_SINGLE | Counts SIMD scalar single-precision floating point Uops retired. |
C7H | 04H | SSEX_UOPS_RETIRED.PACKED_DOUBLE | Counts SIMD packed double-precision floating point Uops retired. |
C7H | 08H | SSEX_UOPS_RETIRED.SCALAR_DOUBLE | Counts SIMD scalar double-precision floating point Uops retired. |
C7H | 10H | SSEX_UOPS_RETIRED.VECTOR_INTEGER | Counts 128-bit SIMD vector integer Uops retired. |
C8H | 20H | ITLB_MISS_RETIRED | Counts the number of retired instructions that missed the ITLB when the instruction was fetched. |
CBH | 01H | MEM_LOAD_RETIRED.L1D_HIT | Counts number of retired loads that hit the L1 data cache. |
CBH | 02H | MEM_LOAD_RETIRED.L2_HIT | Counts number of retired loads that hit the L2 data cache. |
CBH | 04H | MEM_LOAD_RETIRED.L3_UNSHARED_HIT | Counts number of retired loads that hit their own, unshared lines in the L3 cache. |
CBH | 08H | MEM_LOAD_RETIRED.OTHER_CORE_L2_HIT_HITM | Counts number of retired loads that hit in a sibling core's L2 (on die core). Since the L3 is inclusive of all cores on the package, this is an L3 hit. This counts both clean or modified hits. |
CBH | 10H | MEM_LOAD_RETIRED.L3_MISS | Counts number of retired loads that miss the L3 cache. The load was satisfied by a remote socket, local memory or an IOH. |
CBH | 40H | MEM_LOAD_RETIRED.HIT_LFB | Counts number of retired loads that miss the L1D and the address is located in an allocated line fill buffer and will soon be committed to cache. This is counting secondary L1D misses. |
CBH | 80H | MEM_LOAD_RETIRED.DTLB_MISS | Counts the number of retired loads that missed the DTLB. The DTLB miss is not counted if the load operation causes a fault. This event counts loads from cacheable memory only. The event does not count loads by software prefetches. Counts both primary and secondary misses to the TLB. |
CCH | 01H | FP_MMX_TRANS.TO_FP | Counts the first floating-point instruction following any MMX instruction. You can use this event to estimate the penalties for the transitions between floating-point and MMX technology states. |
Table 19-13. Non-Architectural Performance Events In the Processor Core for Processors Based on Intel Microarchitecture Code Name Westmere (Contd.)
Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment
CCH | 02H | FP_MMX_TRANS.TO_MMX | Counts the first MMX instruction following a floating-point instruction. You can use this event to estimate the penalties for the transitions between floating-point and MMX technology states. |
CCH | 03H | FP_MMX_TRANS.ANY | Counts all transitions from floating point to MMX instructions and from MMX instructions to floating point instructions. You can use this event to estimate the penalties for the transitions between floating-point and MMX technology states. |
 |  | MACRO_INSTS.DECODED | Counts the number of instructions decoded (but not necessarily executed or retired). |
 |  | UOPS_DECODED.STALL_CYCLES | Counts the cycles of decoder stalls. | INV=1, Cmask=1
 |  | UOPS_DECODED.MS | Counts the number of Uops decoded by the Microcode Sequencer, MS. The MS delivers uops when the instruction is more than 4 uops long or a microcode assist is occurring. |
D1H | 04H | UOPS_DECODED.ESP_FOLDING | Counts number of stack pointer (ESP) instructions decoded: push, pop, call, ret, etc. ESP instructions do not generate a Uop to increment or decrement ESP. Instead, they update an ESP_Offset register that keeps track of the delta to the current value of the ESP register. |
D1H | 08H | UOPS_DECODED.ESP_SYNC | Counts number of stack pointer (ESP) sync operations where an ESP instruction is corrected by adding the ESP offset register to the current value of the ESP register. |
D2H | 01H | RAT_STALLS.FLAGS | Counts the number of cycles during which execution stalled due to several reasons, one of which is a partial flag register stall. A partial register stall may occur when two conditions are met: 1) an instruction modifies some, but not all, of the flags in the flag register and 2) the next instruction, which depends on flags, depends on flags that were not modified by this instruction. |
D2H | 02H | RAT_STALLS.REGISTERS | This event counts the number of cycles instruction execution latency became longer than the defined latency because the instruction used a register that was partially written by previous instruction. |
Table 19-13. Non-Architectural Performance Events In the Processor Core for Processors Based on Intel Microarchitecture Code Name Westmere (Contd.)
Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment
D2H | 04H | RAT_STALLS.ROB_READ_PORT | Counts the number of cycles when ROB read port stalls occurred, which did not allow new micro-ops to enter the out-of-order pipeline. Note that, at this stage in the pipeline, additional stalls may occur at the same cycle and prevent the stalled micro-ops from entering the pipe. In such a case, micro-ops retry entering the execution pipe in the next cycle and the ROB-read port stall is counted again. |
D2H | 08H | RAT_STALLS.SCOREBOARD | Counts the cycles where we stall due to microarchitecturally required serialization. Microcode scoreboarding stalls. |
D2H | 0FH | RAT_STALLS.ANY | Counts all Register Allocation Table stall cycles due to: cycles when ROB read port stalls occurred, which did not allow new micro-ops to enter the execution pipe; cycles when partial register stalls occurred; cycles when flag stalls occurred; cycles floating-point unit (FPU) status word stalls occurred. To count each of these conditions separately use the events: RAT_STALLS.ROB_READ_PORT, RAT_STALLS.PARTIAL, RAT_STALLS.FLAGS, and RAT_STALLS.FPSW. |
D4H | 01H | SEG_RENAME_STALLS | Counts the number of stall cycles due to the lack of renaming resources for the ES, DS, FS, and GS segment registers. If a segment is renamed but not retired and a second update to the same segment occurs, a stall occurs in the front-end of the pipeline until the renamed segment retires. |
 |  |  | Counts the number of times the ES segment register is renamed. |
 |  |  | Counts unfusion events due to floating point exception to a fused uop. |
 |  |  | Counts the number of branch instructions decoded. |
 |  |  | Counts number of times the Branch Prediction Unit missed predicting a call or return branch. |
 |  |  | Counts the number of times the front end is resteered, mainly when the Branch Prediction Unit cannot provide a correct prediction and this is corrected by the Branch Address Calculator at the front end. This can occur if the code has many branches such that they cannot be consumed by the BPU. Each BACLEAR asserted by the BAC generates approximately an 8 cycle bubble in the instruction fetch pipeline. The effect on total execution time depends on the surrounding code. |
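The umask of RAT_STALLS.ANY (0FH) is numerically the bitwise OR of the four RAT_STALLS sub-event umasks listed in this table (01H, 02H, 04H, 08H). A minimal sketch of that arithmetic; the dictionary and function names are hypothetical, and whether an arbitrary subset of bits counts the union of the corresponding conditions is an assumption the table does not confirm.

```python
# Umask values copied from the RAT_STALLS rows of Table 19-13.
RAT_STALLS_UMASKS = {
    "FLAGS": 0x01,
    "REGISTERS": 0x02,
    "ROB_READ_PORT": 0x04,
    "SCOREBOARD": 0x08,
}


def combined_umask(names) -> int:
    """OR together the umasks of the named RAT_STALLS sub-events."""
    mask = 0
    for name in names:
        mask |= RAT_STALLS_UMASKS[name]
    return mask


# All four sub-events together reproduce the RAT_STALLS.ANY umask.
print(hex(combined_umask(RAT_STALLS_UMASKS)))  # 0xf
```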
Table 19-13. Non-Architectural Performance Events In the Processor Core for Processors Based on Intel Microarchitecture Code Name Westmere (Contd.)
Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment
E6H | 02H | BACLEAR.BAD_TARGET | Counts number of Branch Address Calculator clears (BACLEAR) asserted due to conditional branch instructions in which there was a target hit but the direction was wrong. Each BACLEAR asserted by the BAC generates approximately an 8 cycle bubble in the instruction fetch pipeline. |
E8H | 01H | BPU_CLEARS.EARLY | Counts early (normal) Branch Prediction Unit clears: BPU predicted a taken branch after incorrectly assuming that it was not taken. | The BPU clear leads to 2 cycle bubble in the Front End.
E8H | 02H | BPU_CLEARS.LATE | Counts late Branch Prediction Unit clears due to Most Recently Used conflicts. The BPU clear leads to a 3 cycle bubble in the Front End. |
ECH | 01H | THREAD_ACTIVE | Counts cycles threads are active. |
F0H | 01H | L2_TRANSACTIONS.LOAD | Counts L2 load operations due to HW prefetch or demand loads. |
F0H | 02H | L2_TRANSACTIONS.RFO | Counts L2 RFO operations due to HW prefetch or demand RFOs. |
F0H | 04H | L2_TRANSACTIONS.IFETCH | Counts L2 instruction fetch operations due to HW prefetch or demand ifetch. |
F0H | 08H | L2_TRANSACTIONS.PREFETCH | Counts L2 prefetch operations. |
F0H | 10H | L2_TRANSACTIONS.L1D_WB | Counts L1D writeback operations to the L2. |
F0H | 20H | L2_TRANSACTIONS.FILL | Counts L2 cache line fill operations due to load, RFO, L1D writeback or prefetch. |
F0H | 40H | L2_TRANSACTIONS.WB | Counts L2 writeback operations to the L3. |
F0H | 80H | L2_TRANSACTIONS.ANY | Counts all L2 cache operations. |
F1H | 02H | L2_LINES_IN.S_STATE | Counts the number of cache lines allocated in the L2 cache in the S (shared) state. |
F1H | 04H | L2_LINES_IN.E_STATE | Counts the number of cache lines allocated in the L2 cache in the E (exclusive) state. |
F1H | 07H | L2_LINES_IN.ANY | Counts the number of cache lines allocated in the L2 cache. |
F2H | 01H | L2_LINES_OUT.DEMAND_CLEAN | Counts L2 clean cache lines evicted by a demand request. |
F2H | 02H | L2_LINES_OUT.DEMAND_DIRTY | Counts L2 dirty (modified) cache lines evicted by a demand request. |
F2H | 04H | L2_LINES_OUT.PREFETCH_CLEAN | Counts L2 clean cache line evicted by a prefetch request. |
F2H | 08H | L2_LINES_OUT.PREFETCH_DIRTY | Counts L2 modified cache line evicted by a prefetch request. |
F2H | 0FH | L2_LINES_OUT.ANY | Counts all L2 cache lines evicted for any reason. |
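The bubble sizes quoted for BACLEAR (about 8 cycles), BPU_CLEARS.EARLY (2 cycles) and BPU_CLEARS.LATE (3 cycles) allow a rough estimate of front-end cycles lost to resteers. A minimal sketch under the assumption that the bubbles simply add up and do not overlap with other stalls; the function name and the sample counts are hypothetical.

```python
# Approximate bubble sizes, in cycles, taken from the event descriptions.
BACLEAR_BUBBLE = 8
BPU_EARLY_BUBBLE = 2
BPU_LATE_BUBBLE = 3


def front_end_bubble_cycles(baclear: int, bpu_early: int, bpu_late: int) -> int:
    """Rough upper estimate of fetch-pipeline bubble cycles from clear counts."""
    return (BACLEAR_BUBBLE * baclear
            + BPU_EARLY_BUBBLE * bpu_early
            + BPU_LATE_BUBBLE * bpu_late)


# Illustrative counts only.
print(front_end_bubble_cycles(baclear=100, bpu_early=50, bpu_late=10))  # 930
```

As the BACLEAR description notes, the effect on total execution time depends on the surrounding code, so the result is best compared against total unhalted cycles rather than read as an absolute cost.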
Table 19-13. Non-Architectural Performance Events In the Processor Core for Processors Based on Intel Microarchitecture Code Name Westmere (Contd.)
Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment
F4H | 04H | SQ_MISC.LRU_HINTS | Counts number of Super Queue LRU hints sent to L3. |
F4H | 10H | SQ_MISC.SPLIT_LOCK | Counts the number of SQ lock splits across a cache line. |
F6H | 01H | SQ_FULL_STALL_CYCLES | Counts cycles the Super Queue is full. Neither of the threads on this core will be able to access the uncore. |
F7H | 01H | FP_ASSIST.ALL | Counts the number of floating point operations executed that required micro-code assist intervention. Assists are required in the following cases: SSE instructions (denormal input when the DAZ flag is off or underflow result when the FTZ flag is off); x87 instructions (NaN or denormal are loaded to a register or used as input from memory, division by 0 or underflow output). |
F7H | 02H | FP_ASSIST.OUTPUT | Counts number of floating point micro-code assist when the output value (destination register) is invalid. |
F7H | 04H | FP_ASSIST.INPUT | Counts number of floating point micro-code assist when the input value (one of the source operands to an FP instruction) is invalid. |
 |  |  | Counts number of SIMD integer 64 bit packed multiply operations. |
 |  |  | Counts number of SIMD integer 64 bit packed shift operations. |
 |  |  | Counts number of SIMD integer 64 bit pack operations. |
 |  |  | Counts number of SIMD integer 64 bit unpack operations. |
 |  | SIMD_INT_64.PACKED_LOGICAL | Counts number of SIMD integer 64 bit logical operations. |
 |  | SIMD_INT_64.PACKED_ARITH | Counts number of SIMD integer 64 bit arithmetic operations. |
...
Table 19-17. Non-Architectural Performance Events in Processors Based on Intel Core Microarchitecture
Event Num | Umask Value | Event Name | Definition | Description and Comment
03H | 02H | LOAD_BLOCK.STA | Loads blocked by a preceding store with unknown address | This event indicates that loads are blocked by preceding stores. A load is blocked when there is a preceding store to an address that is not yet calculated. The number of events is greater than or equal to the number of load operations that were blocked. If the load and the store are always to different addresses, check why the memory disambiguation mechanism is not working. To avoid such blocks, increase the distance between the store and the following load so that the store address is known at the time the load is dispatched.
03H | 04H | LOAD_BLOCK.STD | Loads blocked by a preceding store with unknown data | This event indicates that loads are blocked by preceding stores. A load is blocked when there is a preceding store to the same address and the stored data value is not yet known. The number of events is greater than or equal to the number of load operations that were blocked. To avoid such blocks, increase the distance between the store and the dependent load, so that the store data is known at the time the load is dispatched.
03H | 08H | LOAD_BLOCK.OVERLAP_STORE | Loads that partially overlap an earlier store, or 4-Kbyte aliased with a previous store | This event indicates that loads are blocked due to a variety of reasons. Some of the triggers for this event are when a load is blocked by a preceding store, in one of the following: (1) Some of the loaded byte locations are written by the preceding store and some are not. (2) The load is from bytes written by the preceding store, the store is aligned to its size and either (a) the load's data size is one or two bytes and it is not aligned to the store, or (b) the load's data size is four or eight bytes and the load is misaligned. (3) The load is from bytes written by the preceding store, the store is misaligned and the load is not aligned on the beginning of the store. (4) The load is split over an eight byte boundary (excluding 16-byte loads). (5) The load and store have the same offset relative to the beginning of different 4-KByte pages. This case is also called 4-KByte aliasing. In all these cases the load is blocked until after the blocking store retires and the stored data is committed to the cache hierarchy.
03H | 10H | LOAD_BLOCK.UNTIL_RETIRE | Loads blocked until retirement | This event indicates that load operations were blocked until retirement. The number of events is greater than or equal to the number of load operations that were blocked. This includes mainly uncacheable loads and split loads (loads that cross the cache line boundary) but may include other cases where loads are blocked until retirement.
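Two of the LOAD_BLOCK.OVERLAP_STORE triggers are plain address tests: the eight-byte split and 4-KByte aliasing. A minimal sketch of both checks on example addresses; the function names and the addresses are hypothetical, and the sketch ignores the other triggers (partial overlap and the alignment cases).

```python
PAGE_SIZE = 4096  # 4-KByte page


def is_4k_aliased(load_addr: int, store_addr: int) -> bool:
    """Same offset within the page, but the two addresses lie in different pages."""
    same_offset = (load_addr % PAGE_SIZE) == (store_addr % PAGE_SIZE)
    different_page = (load_addr // PAGE_SIZE) != (store_addr // PAGE_SIZE)
    return same_offset and different_page


def splits_8_byte_boundary(load_addr: int, size: int) -> bool:
    """True when a load crosses an eight-byte boundary; 16-byte loads are excluded."""
    if size == 16:
        return False
    first_chunk = load_addr // 8
    last_chunk = (load_addr + size - 1) // 8
    return first_chunk != last_chunk


print(is_4k_aliased(0x1040, 0x5040))      # True: offset 0x040 in two different pages
print(is_4k_aliased(0x1040, 0x1048))      # False: same page, different offset
print(splits_8_byte_boundary(0x1006, 4))  # True: bytes 0x1006..0x1009 cross 0x1008
print(splits_8_byte_boundary(0x1000, 8))  # False: fits in one aligned 8-byte chunk
```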
Table 19-17. Non-Architectural Performance Events in Processors Based on Intel Core Microarchitecture (Contd.)
Event Num 03H, Umask Value 20H, Event Name LOAD_BLOCK.L1D
Definition: Loads blocked by the L1 data cache
Description and Comment: This event indicates that loads are blocked due to one or more reasons. Some triggers for this event are: The number of L1 data cache misses exceeds the maximum number of outstanding misses supported by the processor. This includes misses generated as a result of demand fetches, software prefetches or hardware prefetches. Cache line split loads. Partial reads, such as reads to uncacheable memory, I/O instructions and more. A locked load operation is in progress. The number of events is greater than or equal to the number of load operations that were blocked.

Event Num 04H, Umask Value 01H, Event Name SB_DRAIN_CYCLES
Definition: Cycles while stores are blocked due to store buffer drain
Description and Comment: This event counts every cycle during which the store buffer is draining. This includes: Serializing operations such as CPUID. Synchronizing operations such as XCHG. Interrupt acknowledgment. Other conditions, such as cache flushing.

Event Num 04H, Umask Value 02H, Event Name STORE_BLOCK.ORDER
Definition: Cycles while store is waiting for a preceding store to be globally observed
Description and Comment: This event counts the total duration, in number of cycles, during which stores are waiting for a preceding stored cache line to be observed by other cores. This situation happens as a result of the strong store ordering behavior, as defined in "Memory Ordering," Chapter 8, Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A. The stall may occur and be noticeable if there are many cases when a store either misses the L1 data cache or hits a cache line in the Shared state. If the store requires a bus transaction to read the cache line, then the stall ends when the snoop response for the bus transaction arrives.

Event Num 04H, Umask Value 08H, Event Name STORE_BLOCK.SNOOP
Definition: A store is blocked due to a conflict with an external or internal snoop.
Description and Comment: This event counts the number of cycles the store port was used for snooping the L1 data cache and a store was stalled by the snoop. The store is typically resubmitted one cycle later.

Event Num 06H, Umask Value 00H, Event Name SEGMENT_REG_LOADS
Definition: Number of segment register loads
Description and Comment: This event counts the number of segment register load operations. Instructions that load new values into segment registers cause a penalty. This event indicates performance issues in 16-bit code. If this event occurs frequently, it may be useful to calculate the number of instructions retired per segment register load. If the resulting calculation is low (on average a small number of instructions are executed between segment register loads), then the code's segment register usage should be optimized.
Description and Comment (SEGMENT_REG_LOADS, continued): As a result of branch misprediction, this event is speculative and may include segment register loads that do not actually occur. However, most segment register loads are internally serialized and such speculative effects are minimized.

Event Num 07H, Umask Value 00H, Event Name SSE_PRE_EXEC.NTA
Definition: Streaming SIMD Extensions (SSE) Prefetch NTA instructions executed
Description and Comment: This event counts the number of times the SSE instruction prefetchNTA is executed. This instruction prefetches the data to the L1 data cache.

Event Num 07H, Umask Value 01H, Event Name SSE_PRE_EXEC.L1
Definition: Streaming SIMD Extensions (SSE) PrefetchT0 instructions executed
Description and Comment: This event counts the number of times the SSE instruction prefetchT0 is executed. This instruction prefetches the data to the L1 data cache and L2 cache.

Event Num 07H, Umask Value 02H, Event Name SSE_PRE_EXEC.L2
Definition: Streaming SIMD Extensions (SSE) PrefetchT1 and PrefetchT2 instructions executed
Description and Comment: This event counts the number of times the SSE instructions prefetchT1 and prefetchT2 are executed. These instructions prefetch the data to the L2 cache.

Event Num 07H, Umask Value 03H, Event Name SSE_PRE_EXEC.STORES
Definition: Streaming SIMD Extensions (SSE) Weakly-ordered store instructions executed
Description and Comment: This event counts the number of times SSE non-temporal store instructions are executed.
Event Num 08H, Umask Value 01H, Event Name DTLB_MISSES.ANY
Definition: Memory accesses that missed the DTLB
Description and Comment: This event counts the number of Data Translation Lookaside Buffer (DTLB) misses. The count includes misses detected as a result of speculative accesses. Typically a high count for this event indicates that the code accesses a large number of data pages.

Event Num 08H, Umask Value 02H, Event Name DTLB_MISSES.MISS_LD
Description and Comment: This event counts the number of Data Translation Lookaside Buffer (DTLB) misses due to load operations. This count includes misses detected as a result of speculative accesses.

Event Num 08H, Umask Value 04H, Event Name DTLB_MISSES.L0_MISS_LD
Definition: L0 DTLB misses due to load operations
Description and Comment: This event counts the number of level 0 Data Translation Lookaside Buffer (DTLB0) misses due to load operations. This count includes misses detected as a result of speculative accesses. Loads that miss the DTLB0 and hit the DTLB1 can incur a two-cycle penalty.

Event Num 08H, Umask Value 08H, Event Name DTLB_MISSES.MISS_ST
Description and Comment: This event counts the number of Data Translation Lookaside Buffer (DTLB) misses due to store operations. This count includes misses detected as a result of speculative accesses. Address translation for store operations is performed in the DTLB1.
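The Event Num and Umask Value columns are the two fields that software places in a performance event select register. The following is a minimal sketch only. It assumes the architectural IA32_PERFEVTSELx layout (event select in bits 7:0, unit mask in bits 15:8, USR in bit 16, OS in bit 17, EN in bit 22), which is described elsewhere in the manual and not in this table. The helper name perfevtsel is hypothetical.

```python
def perfevtsel(event_num: int, umask: int, usr: bool = True, os: bool = True) -> int:
    """Build an event-select value from a table row (assumed IA32_PERFEVTSELx layout)."""
    if not 0 <= event_num <= 0xFF:
        raise ValueError("Event Num must fit in 8 bits")
    if not 0 <= umask <= 0xFF:
        raise ValueError("Umask Value must fit in 8 bits")
    value = event_num | (umask << 8)   # bits 7:0 event select, bits 15:8 unit mask
    if usr:
        value |= 1 << 16               # count while in user mode (assumed USR bit)
    if os:
        value |= 1 << 17               # count while in kernel mode (assumed OS bit)
    value |= 1 << 22                   # enable the counter (assumed EN bit)
    return value


# DTLB_MISSES.ANY: Event Num 08H, Umask Value 01H
print(hex(perfevtsel(0x08, 0x01)))     # prints 0x430108
```

Writing the value to the register requires privileged access (for example, WRMSR in ring 0), which this sketch does not attempt.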
Event Num 09H, Umask Value 01H, Event Name MEMORY_DISAMBIGUATION.RESET
Description and Comment: This event counts the number of cycles during which memory disambiguation misprediction occurs. As a result the execution pipeline is cleaned and execution of the mispredicted load instruction and all succeeding instructions restarts.
Description and Comment (MEMORY_DISAMBIGUATION.RESET, continued): This event occurs when the data address accessed by a load instruction collides infrequently with preceding stores, but usually there is no collision. It happens rarely, and may have a penalty of about 20 cycles.

Event Num 09H, Umask Value 02H, Event Name MEMORY_DISAMBIGUATION.SUCCESS
Definition: Number of loads successfully disambiguated.
Description and Comment: This event counts the number of load operations that were successfully disambiguated. Loads are preceded by a store with an unknown address, but they are not blocked.

Event Num 0CH, Umask Value 01H, Event Name PAGE_WALKS.COUNT
Definition: Number of page-walks executed
Description and Comment: This event counts the number of page-walks executed due to either a DTLB or ITLB miss. The page walk duration, PAGE_WALKS.CYCLES, divided by the number of page walks is the average duration of a page walk. The average can hint at whether most of the page-walks are satisfied by the caches or cause an L2 cache miss.

Event Num 0CH, Umask Value 02H, Event Name PAGE_WALKS.CYCLES
Description and Comment: This event counts the duration of page-walks in core cycles. The paging mode in use typically affects the duration of page walks. Page walk duration divided by the number of page walks is the average duration of page-walks. The average can hint at whether most of the page-walks are satisfied by the caches or cause an L2 cache miss.
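The ratio described for the two PAGE_WALKS events can be sketched as follows. This is an illustration only: the function name and the sample counter values are hypothetical, and the counts are assumed to have been collected over the same interval.

```python
def average_page_walk_cycles(page_walks_cycles: int, page_walks_count: int) -> float:
    """Average page-walk duration in core cycles: PAGE_WALKS.CYCLES / PAGE_WALKS.COUNT."""
    if page_walks_count == 0:
        return 0.0  # no page walks were observed in the interval
    return page_walks_cycles / page_walks_count


# Hypothetical sample: 1200 cycles spent in 40 page walks.
print(average_page_walk_cycles(1200, 40))  # prints 30.0
```

What counts as a "long" average depends on the platform's cache and memory latencies, so the result is best compared across runs of the same system.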
Event Num 10H, Umask Value 00H
Description and Comment: This event counts the number of floating point computational micro-ops executed. Use IA32_PMC0 only.

Event Num 11H, Umask Value 00H
Description and Comment: This event counts the number of floating point operations executed that required micro-code assist intervention. Assists are required in the following cases: Streaming SIMD Extensions (SSE) instructions: Denormal input when the DAZ (Denormals Are Zeros) flag is off. Underflow result when the FTZ (Flush To Zero) flag is off. X87 instructions: NaN or denormal are loaded to a register or used as input from memory. Division by 0. Underflow output. Use IA32_PMC1 only.

Event Num 12H, Umask Value 00H, Event Name MUL
Description and Comment: This event counts the number of multiply operations executed. This includes integer as well as floating point multiply operations. Use IA32_PMC1 only.

Event Num 13H, Umask Value 00H, Event Name DIV
Description and Comment: This event counts the number of divide operations executed. This includes integer divides, floating point divides and square-root operations executed. Use IA32_PMC1 only.
Event Num 14H, Umask Value 00H, Event Name CYCLES_DIV_BUSY
Definition: Cycles the divider is busy
Description and Comment: This event counts the number of cycles the divider is busy executing divide or square root operations. The divide can be integer, X87 or Streaming SIMD Extensions (SSE). The square root operation can be either X87 or SSE. Use IA32_PMC0 only.

Event Num 18H, Umask Value 00H, Event Name IDLE_DURING_DIV
Definition: Cycles the divider is busy and all other execution units are idle.
Description and Comment: This event counts the number of cycles the divider is busy (with a divide or a square root operation) and no other execution unit or load operation is in progress. Load operations are assumed to hit the L1 data cache. This event considers only micro-ops dispatched after the divider started operating. Use IA32_PMC0 only.

Event Num 19H, Umask Value 00H, Event Name DELAYED_BYPASS.FP
Definition: Delayed bypass to FP operation
Description and Comment: This event counts the number of times floating point operations use data immediately after the data was generated by a non-floating point execution unit. Such cases result in one penalty cycle due to data bypass between the units. Use IA32_PMC1 only.

Event Num 19H, Umask Value 01H, Event Name DELAYED_BYPASS.SIMD
Definition: Delayed bypass to SIMD operation
Description and Comment: This event counts the number of times SIMD operations use data immediately after the data was generated by a non-SIMD execution unit. Such cases result in one penalty cycle due to data bypass between the units. Use IA32_PMC1 only.

Event Num 19H, Umask Value 02H, Event Name DELAYED_BYPASS.LOAD
Definition: Delayed bypass to load operation
Description and Comment: This event counts the number of delayed bypass penalty cycles that a load operation incurred. When load operations use data immediately after the data was generated by an integer execution unit, they may (depending on certain dynamic internal conditions) incur one penalty cycle due to delayed data bypass between the units. Use IA32_PMC1 only.

Event Num 21H, Umask Value See Table 18-2, Event Name L2_ADS.(Core)
Definition: Cycles L2 address bus is in use
Description and Comment: This event counts the number of cycles the L2 address bus is being used for accesses to the L2 cache or bus queue. It can count occurrences for this core or both cores.

Event Num 23H, Umask Value See Table 18-2, Event Name L2_DBUS_BUSY_RD.(Core)
Definition: Cycles the L2 transfers data to the core
Description and Comment: This event counts the number of cycles during which the L2 data bus is busy transferring data from the L2 cache to the core. It counts for all L1 cache misses (data and instruction) that hit the L2 cache. This event can count occurrences for this core or both cores.
Event Num 24H, Umask Value Combined mask from Table 18-2 and Table 18-4, Event Name L2_LINES_IN.(Core, Prefetch)
Definition: L2 cache misses
Description and Comment: This event counts the number of cache lines allocated in the L2 cache. Cache lines are allocated in the L2 cache as a result of requests from the L1 data and instruction caches and the L2 hardware prefetchers to cache lines that are missing in the L2 cache. This event can count occurrences for this core or both cores. It can also count demand requests and L2 hardware prefetch requests together or separately.

Event Num 25H, Umask Value See Table 18-2, Event Name L2_M_LINES_IN.(Core)
Definition: L2 cache line modifications
Description and Comment: This event counts whenever a modified cache line is written back from the L1 data cache to the L2 cache. This event can count occurrences for this core or both cores.

Event Num 26H, Umask Value See Table 18-2 and Table 18-4, Event Name L2_LINES_OUT.(Core, Prefetch)
Definition: L2 cache lines evicted
Description and Comment: This event counts the number of L2 cache lines evicted. This event can count occurrences for this core or both cores. It can also count evictions due to demand requests and L2 hardware prefetch requests together or separately.

Event Num 27H, Umask Value See Table 18-2 and Table 18-4, Event Name L2_M_LINES_OUT.(Core, Prefetch)
Description and Comment: This event counts the number of L2 modified cache lines evicted. These lines are written back to memory unless they also exist in a modified state in one of the L1 data caches. This event can count occurrences for this core or both cores. It can also count evictions due to demand requests and L2 hardware prefetch requests together or separately.

Event Num 28H, Umask Value Combined mask from Table 18-2 and Table 18-5
Description and Comment: This event counts the number of instruction cache line requests from the IFU. It does not include fetch requests from uncacheable memory. It does not include ITLB miss accesses. This event can count occurrences for this core or both cores. It can also count accesses to cache lines at different MESI states.
Event Num 29H, Umask Value Combined mask from Table 18-2, Table 18-4, and Table 18-5, Event Name L2_LD.(Core, Prefetch, Cache Line State)
Definition: L2 cache reads
Description and Comment: This event counts L2 cache read requests coming from the L1 data cache and L2 prefetchers. The event can count occurrences: for this core or both cores; due to demand requests and L2 hardware prefetch requests together or separately; of accesses to cache lines at different MESI states.
Event Num 2AH, Umask Value See Table 18-2 and Table 18-5, Event Name L2_ST.(Core, Cache Line State)
Definition: L2 store requests
Description and Comment: This event counts all store operations that miss the L1 data cache and request the data from the L2 cache. The event can count occurrences for this core or both cores. It can also count accesses to cache lines at different MESI states.

Event Num 2BH, Umask Value See Table 18-2 and Table 18-5, Event Name L2_LOCK.(Core, Cache Line State)
Definition: L2 locked accesses
Description and Comment: This event counts all locked accesses to cache lines that miss the L1 data cache. The event can count occurrences for this core or both cores. It can also count accesses to cache lines at different MESI states.

Event Num 2EH, Umask Value See Table 18-2, Table 18-4, and Table 18-5, Event Name L2_RQSTS.(Core, Prefetch, Cache Line State)
Definition: L2 cache requests
Description and Comment: This event counts all completed L2 cache requests. This includes L1 data cache reads, writes, and locked accesses, L1 data prefetch requests, instruction fetches, and all L2 hardware prefetch requests. This event can count occurrences: for this core or both cores; due to demand requests and L2 hardware prefetch requests together or separately; of accesses to cache lines at different MESI states.

Event Num 2EH, Umask Value 41H, Event Name L2_RQSTS.SELF.DEMAND.I_STATE
Definition: L2 cache demand requests from this core that missed the L2
Description and Comment: This event counts all completed L2 cache demand requests from this core that miss the L2 cache. This includes L1 data cache reads, writes, and locked accesses, L1 data prefetch requests, and instruction fetches. This is an architectural performance event.

Event Num 2EH, Umask Value 4FH, Event Name L2_RQSTS.SELF.DEMAND.MESI
Definition: L2 cache demand requests from this core
Description and Comment: This event counts all completed L2 cache demand requests from this core. This includes L1 data cache reads, writes, and locked accesses, L1 data prefetch requests, and instruction fetches. This is an architectural performance event.

Event Num 30H, Umask Value See Table 18-2, Table 18-4, and Table 18-5, Event Name L2_REJECT_BUSQ.(Core, Prefetch, Cache Line State)
Definition: Rejected L2 cache requests
Description and Comment: This event indicates that a pending L2 cache request that requires a bus transaction is delayed from moving to the bus queue. Some of the reasons for this event are: The bus queue is full. The bus queue already holds an entry for a cache line in the same set. The number of events is greater than or equal to the number of requests that were rejected. The event can count occurrences: for this core or both cores; due to demand requests and L2 hardware prefetch requests together or separately; of accesses to cache lines at different MESI states.
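The two fixed unit masks listed for L2_RQSTS (41H and 4FH) illustrate how a combined mask is assembled from core, prefetch, and cache-line-state qualifiers. The sketch below is not taken from this table: the sub-field positions (core in bits 7:6, prefetch in bits 5:4, MESI in bits 3:0 of the Umask Value) and their encodings are assumptions about Tables 18-2, 18-4, and 18-5, chosen because they reproduce the 41H and 4FH values above. All constant and function names are hypothetical.

```python
# Assumed sub-field encodings within the 8-bit Umask Value (see lead-in).
CORE_THIS = 0b01 << 6          # this core only (SELF)
CORE_BOTH = 0b11 << 6          # both cores
PREFETCH_EXCLUDE = 0b00 << 4   # demand requests only
PREFETCH_ONLY = 0b01 << 4      # L2 hardware prefetch requests only
PREFETCH_ALL = 0b11 << 4       # demand and prefetch requests together
MESI_M, MESI_E, MESI_S, MESI_I = 0b1000, 0b0100, 0b0010, 0b0001
MESI_ALL = MESI_M | MESI_E | MESI_S | MESI_I


def combined_umask(core: int, prefetch: int, mesi: int) -> int:
    """OR the three qualifier sub-fields into one Umask Value."""
    return core | prefetch | mesi


# L2_RQSTS.SELF.DEMAND.I_STATE and L2_RQSTS.SELF.DEMAND.MESI
print(f"{combined_umask(CORE_THIS, PREFETCH_EXCLUDE, MESI_I):02X}H")    # prints 41H
print(f"{combined_umask(CORE_THIS, PREFETCH_EXCLUDE, MESI_ALL):02X}H")  # prints 4FH
```

Events that take only some of the qualifiers (for example, core and cache line state but not prefetch) would leave the unused sub-field at zero under the same assumptions.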
Event Num 32H, Umask Value See Table 18-2, Event Name L2_NO_REQ.(Core)
Definition: Cycles no L2 cache requests are pending
Description and Comment: This event counts the number of cycles that no L2 cache requests were pending from a core. When using the BOTH_CORE modifier, the event counts only if none of the cores have a pending request. The event also counts when one core is halted and the other is not halted. The event can count occurrences for this core or both cores.

Event Num 3AH, Umask Value 00H, Event Name EIST_TRANS
Definition: Number of Enhanced Intel SpeedStep Technology (EIST) transitions
Description and Comment: This event counts the number of transitions that include a frequency change, either with or without voltage change. This includes Enhanced Intel SpeedStep Technology (EIST) and TM2 transitions. The event is incremented only while the counting core is in C0 state. Since transitions to higher-numbered CxE states and TM2 transitions include a frequency change or voltage transition, the event is incremented accordingly.

Event Num 3BH, Umask Value C0H, Event Name THERMAL_TRIP
Definition: Number of thermal trips
Description and Comment: This event counts the number of thermal trips. A thermal trip occurs whenever the processor temperature exceeds the thermal trip threshold temperature. Following a thermal trip, the processor automatically reduces frequency and voltage. The processor checks the temperature every millisecond and returns to normal when the temperature falls below the thermal trip threshold temperature.

Event Num 3CH, Umask Value 00H, Event Name CPU_CLK_UNHALTED.CORE_P
Definition: Core cycles when core is not halted
Description and Comment: This event counts the number of core cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason, this event may have a changing ratio in regard to time. When the core frequency is constant, this event can give the approximate elapsed time while the core is not in the halt state. This is an architectural performance event.

Event Num 3CH, Umask Value 01H, Event Name CPU_CLK_UNHALTED.BUS
Definition: Bus cycles when core is not halted
Description and Comment: This event counts the number of bus cycles while the core is not in the halt state. This event can give a measurement of the elapsed time while the core was not in the halt state. The core enters the halt state when it is running the HLT instruction. The event also has a constant ratio with the CPU_CLK_UNHALTED.REF event, which is the maximum bus to processor frequency ratio. Non-halted bus cycles are a component in many key event ratios.
Event Num 3CH, Umask Value 02H, Event Name CPU_CLK_UNHALTED.NO_OTHER
Definition: Bus cycles when core is active and the other is halted
Description and Comment: This event counts the number of bus cycles during which the core remains non-halted and the other core on the processor is halted. This event can be used to determine the amount of parallelism exploited by an application or a system. Divide this event count by the bus frequency to determine the amount of time that only one core was in use.
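The division described for CPU_CLK_UNHALTED.NO_OTHER can be sketched as below. This is an illustration under stated assumptions: the function name is hypothetical, the bus frequency must be supplied by the caller in hertz, and the sample values are made up.

```python
def single_core_seconds(no_other_bus_cycles: int, bus_frequency_hz: float) -> float:
    """Time during which only one core was in use: event count / bus frequency."""
    if bus_frequency_hz <= 0:
        raise ValueError("bus frequency must be positive")
    return no_other_bus_cycles / bus_frequency_hz


# Hypothetical sample: 133,000,000 counted bus cycles on a 266 MHz bus clock.
print(single_core_seconds(133_000_000, 266_000_000.0))  # prints 0.5
```

Comparing this time against the total measurement interval gives a rough view of how much of the run had only a single core active.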
Event Num 40H, Umask Value See Table 18-5, Event Name L1D_CACHE_LD.(Cache Line State)
Definition: L1 cacheable data reads
Description and Comment: This event counts the number of data reads from cacheable memory. Locked reads are not counted.

Event Num 41H, Umask Value See Table 18-5, Event Name L1D_CACHE_ST.(Cache Line State)
Definition: L1 cacheable data writes
Description and Comment: This event counts the number of data writes to cacheable memory. Locked writes are not counted.

Event Num 42H, Umask Value See Table 18-5, Event Name L1D_CACHE_LOCK.(Cache Line State)
Definition: L1 data cacheable locked reads
Description and Comment: This event counts the number of locked data reads from cacheable memory.

Event Num 42H, Umask Value 10H, Event Name L1D_CACHE_LOCK_DURATION
Definition: Duration of L1 data cacheable locked operation
Description and Comment: This event counts the number of cycles during which any cache line is locked by any locking instruction. Locking happens at retirement and therefore the event does not occur for instructions that are speculatively executed. Locking duration is shorter than locked instruction execution duration.

Event Num 43H, Umask Value 01H, Event Name L1D_ALL_REF
Description and Comment: This event counts all references to the L1 data cache, including all loads and stores with any memory types. The event counts memory accesses only when they are actually performed. For example, a load blocked by unknown store address and later performed is only counted once. The event includes non-cacheable accesses, such as I/O accesses.

Event Num 43H, Umask Value 02H, Event Name L1D_ALL_CACHE_REF
Description and Comment: This event counts the number of data reads and writes from cacheable memory, including locked operations. This event is a sum of: L1D_CACHE_LD.MESI, L1D_CACHE_ST.MESI, and L1D_CACHE_LOCK.MESI.

Event Num 45H, Umask Value 0FH, Event Name L1D_REPL
Definition: Cache lines allocated in the L1 data cache
Description and Comment: This event counts the number of lines brought into the L1 data cache.

Event Num 46H, Umask Value 00H, Event Name L1D_M_REPL
Definition: Modified cache lines allocated in the L1 data cache
Description and Comment: This event counts the number of modified lines brought into the L1 data cache.

Event Num 47H, Umask Value 00H, Event Name L1D_M_EVICT
Definition: Modified cache lines evicted from the L1 data cache
Description and Comment: This event counts the number of modified lines evicted from the L1 data cache, whether due to replacement or by snoop HITM intervention.
Event Num 48H, Umask Value 00H, Event Name L1D_PEND_MISS
Definition: Total number of outstanding L1 data cache misses at any cycle
Description and Comment: This event counts the number of outstanding L1 data cache misses at any cycle. An L1 data cache miss is outstanding from the cycle on which the miss is determined until the first chunk of data is available. This event counts: all cacheable demand requests; L1 data cache hardware prefetch requests; requests to write through memory; requests to write combine memory. Uncacheable requests are not counted. The count of this event divided by the number of L1 data cache misses, L1D_REPL, is the average duration in core cycles of an L1 data cache miss.

Event Num 49H, Umask Value 01H, Event Name L1D_SPLIT.LOADS
Definition: Cache line split loads from the L1 data cache
Description and Comment: This event counts the number of load operations that span two cache lines. Such load operations are also called split loads. Split load operations are executed at retirement.

Event Num 49H, Umask Value 02H
Definition: Cache line split stores to the L1 data cache
Description and Comment: This event counts the number of store operations that span two cache lines.

Event Num 4BH, Umask Value 00H
Definition: Streaming SIMD Extensions (SSE) Prefetch NTA instructions missing all cache levels
Description and Comment: This event counts the number of times the SSE instruction prefetchNTA was executed and missed all cache levels. Due to speculation, an executed instruction might not retire. This instruction prefetches the data to the L1 data cache.

Event Num 4BH, Umask Value 01H, Event Name SSE_PRE_MISS.L1
Definition: Streaming SIMD Extensions (SSE) PrefetchT0 instructions missing all cache levels
Description and Comment: This event counts the number of times the SSE instruction prefetchT0 was executed and missed all cache levels. Due to speculation, an executed instruction might not retire. The prefetchT0 instruction prefetches data to the L2 cache and L1 data cache.

Event Num 4BH, Umask Value 02H, Event Name SSE_PRE_MISS.L2
Definition: Streaming SIMD Extensions (SSE) PrefetchT1 and PrefetchT2 instructions missing all cache levels
Description and Comment: This event counts the number of times the SSE instructions prefetchT1 and prefetchT2 were executed and missed all cache levels. Due to speculation, an executed instruction might not retire. The prefetchT1 and prefetchT2 instructions prefetch data to the L2 cache.

Event Num 4CH, Umask Value 00H, Event Name LOAD_HIT_PRE
Definition: Load operations conflicting with a software prefetch to the same address
Description and Comment: This event counts load operations sent to the L1 data cache while a previous Streaming SIMD Extensions (SSE) prefetch instruction to the same cache line has started prefetching but has not yet finished.

Event Num 4EH, Umask Value 10H, Event Name L1D_PREFETCH.REQUESTS
Definition: L1 data cache prefetch requests
Description and Comment: This event counts the number of times the L1 data cache requested to prefetch a data cache line. Requests can be rejected when the L2 cache is busy and resubmitted later or lost. All requests are counted, including those that are rejected.
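The L1D_PEND_MISS description gives a ratio for the average L1 data cache miss duration. A minimal sketch of that calculation follows. The function name and sample counts are hypothetical, and both counters are assumed to cover the same measurement interval.

```python
def average_l1d_miss_cycles(l1d_pend_miss: int, l1d_repl: int) -> float:
    """Average L1 data cache miss duration in core cycles: L1D_PEND_MISS / L1D_REPL."""
    if l1d_repl == 0:
        return 0.0  # no lines were brought into the L1 data cache
    return l1d_pend_miss / l1d_repl


# Hypothetical sample: 5000 outstanding-miss cycles over 100 allocated lines.
print(average_l1d_miss_cycles(5000, 100))  # prints 50.0
```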
Event Num 60H, Umask Value See Table 18-2 and Table 18-3, Event Name BUS_REQUEST_OUTSTANDING.(Core and Bus Agents)
Definition: Outstanding cacheable data read bus requests duration
Description and Comment: This event counts the number of pending full cache line read transactions on the bus occurring in each cycle. A read transaction is pending from the cycle it is sent on the bus until the full cache line is received by the processor. The event counts only full-line cacheable read requests from either the L1 data cache or the L2 prefetchers. It does not count Read for Ownership transactions, instruction byte fetch transactions, or any other bus transaction.
Event Num 61H
Description and Comment: This event counts the number of Bus Not Ready (BNR) signals that the processor asserts on the bus to suspend additional bus requests by other bus agents. A bus agent asserts the BNR signal when the number of data and snoop transactions is close to the maximum that the bus can handle. To obtain the number of bus cycles during which the BNR signal is asserted, multiply the event count by two. While this signal is asserted, new transactions cannot be submitted on the bus. As a result, transaction latency may have a higher impact on program performance.

Event Num 62H
Description and Comment: This event counts the number of bus cycles during which the DRDY (Data Ready) signal is asserted on the bus. The DRDY signal is asserted when data is sent on the bus. With the 'THIS_AGENT' mask, this event counts the number of bus cycles during which this agent (the processor) writes data on the bus back to memory or to other bus agents. This includes all explicit and implicit data writebacks, as well as partial writes. With the 'ALL_AGENTS' mask, this event counts the number of bus cycles during which any bus agent sends data on the bus. This includes all data reads and writes on the bus.

Event Num 63H
Description and Comment: This event counts the number of bus cycles during which the LOCK signal is asserted on the bus. A LOCK signal is asserted when there is a locked memory access, due to: uncacheable memory; a locked operation that spans two cache lines; a page-walk from an uncacheable page table. Bus locks have a very high performance penalty and it is highly recommended to avoid such accesses.

Event Num 64H, Event Name BUS_DATA_RCV.(Core)
Description and Comment: This event counts the number of bus cycles during which the processor is busy receiving data.
Event Num 65H, Umask Value See Table 18-2 and Table 18-3, Event Name BUS_TRANS_BRD.(Core and Bus Agents)
Definition: Burst read bus transactions
Description and Comment: This event counts the number of burst read transactions including: L1 data cache read misses (and L1 data cache hardware prefetches); L2 hardware prefetches by the DPL and L2 streamer; IFU read misses of cacheable lines. It does not include RFO transactions.

Event Num 66H, Umask Value See Table 18-2 and Table 18-3, Event Name BUS_TRANS_RFO.(Core and Bus Agents)
Definition: RFO bus transactions
Description and Comment: This event counts the number of Read For Ownership (RFO) bus transactions, due to store operations that miss the L1 data cache and the L2 cache. It also counts RFO bus transactions due to locked operations.

Event Num 67H, Umask Value See Table 18-2 and Table 18-3
Description and Comment: This event counts all explicit writeback bus transactions due to dirty line evictions. It does not count implicit writebacks due to invalidation by a snoop request.

Event Num 68H, Umask Value See Table 18-2 and Table 18-3
Description and Comment: This event counts all instruction fetch full cache line bus transactions.

Event Num 69H, Umask Value See Table 18-2 and Table 18-3
Description and Comment: This event counts all invalidate transactions. Invalidate transactions are generated when: A store operation hits a shared line in the L2 cache. A full cache line write misses the L2 cache or hits a shared line in the L2 cache.

Event Num 6AH, Umask Value See Table 18-2 and Table 18-3
Description and Comment: This event counts partial write bus transactions.

Event Num 6BH, Umask Value See Table 18-2 and Table 18-3
Description and Comment: This event counts all (read and write) partial bus transactions.
Event Num 6CH, Umask Value See Table 18-2 and Table 18-3, Event Name BUS_TRANS_IO.(Core and Bus Agents)
Definition: IO bus transactions
Description and Comment: This event counts the number of completed I/O bus transactions as a result of IN and OUT instructions. The count does not include memory mapped IO.

Event Num 6DH, Umask Value See Table 18-2 and Table 18-3

Event Num 6EH, Umask Value See Table 18-2 and Table 18-3
Description and Comment: This event counts burst (full cache line) transactions including: Burst reads; RFOs; Explicit writebacks; Write combine lines.

Event Num 6FH, Umask Value See Table 18-2 and Table 18-3
Description and Comment: This event counts all memory bus transactions including: Burst transactions; Partial reads and writes; Invalidate transactions. The BUS_TRANS_MEM count is the sum of BUS_TRANS_BURST, BUS_TRANS_P and BUS_TRANS_IVAL.

Event Num 70H, Umask Value See Table 18-2 and Table 18-3
Description and Comment: This event counts all bus transactions. This includes: Memory transactions; IO transactions (non memory-mapped); Deferred transaction completion; Other less frequent transactions, such as interrupts.

Event Num 77H, Umask Value See Table 18-2 and Table 18-6
Definition: External snoops
Description and Comment: This event counts the snoop responses to bus transactions. Responses can be counted separately by type and by bus agent. With the 'THIS_AGENT' mask, the event counts snoop responses from this processor to bus transactions sent by this processor. With the 'ALL_AGENTS' mask, the event counts all snoop responses seen on the bus.

Event Num 78H, Umask Value See Table 18-2 and Table 18-7
Definition: L1 data cache snooped by other core
Description and Comment: This event counts the number of times the L1 data cache is snooped for a cache line that is needed by the other core in the same processor. The cache line is either missing in the L1 instruction or data caches of the other core, or is available for reading only and the other core wishes to write the cache line.
Description and Comment (Event Num 78H, continued): The snoop operation may change the cache line state. If the other core issued a read request that hit this core in E state, typically the state changes to S state in this core. If the other core issued a read for ownership request (due to a write miss or hit to S state) that hits this core's cache line in E or S state, this typically results in invalidation of the cache line in this core. If the snoop hits a line in M state, the state is changed at a later opportunity. These snoops are performed through the L1 data cache store port. Therefore, frequent snoops may conflict with extensive stores to the L1 data cache, which may increase store latency and impact performance.

Event Num 7AH, Umask Value See Table 18-3, Event Name BUS_HIT_DRV.(Bus Agents)
Definition: HIT signal asserted
Description and Comment: This event counts the number of bus cycles during which the processor drives the HIT# pin to signal HIT snoop response.

Event Num 7BH, Umask Value See Table 18-3, Event Name BUS_HITM_DRV.(Bus Agents)
Definition: HITM signal asserted
Description and Comment: This event counts the number of bus cycles during which the processor drives the HITM# pin to signal HITM snoop response.

Event Num 7DH, Umask Value See Table 18-2, Event Name BUSQ_EMPTY.(Core)
Definition: Bus queue empty
Description and Comment: This event counts the number of cycles during which the core did not have any pending transactions in the bus queue. It also counts when the core is halted and the other core is not halted. This event can count occurrences for this core or both cores.

Event Num 7EH, Umask Value See Table 18-2 and Table 18-3, Event Name SNOOP_STALL_DRV.(Core and Bus Agents)
Definition: Bus stalled for snoops
Description and Comment: This event counts the number of times that the bus snoop stall signal is asserted. To obtain the number of bus cycles during which snoops on the bus are prohibited, multiply the event count by two. During the snoop stall cycles, no new bus transactions requiring a snoop response can be initiated on the bus. A bus agent asserts a snoop stall signal if it cannot respond to a snoop request within three bus cycles.

Event Num 7FH, Event Name BUS_IO_WAIT.(Core)
Definition: IO requests waiting in the bus queue
Description and Comment: This event counts the number of core cycles during which IO requests wait in the bus queue. With the SELF modifier this event counts IO requests per core. With the BOTH_CORE modifier, this event increments by one for any cycle for which there is a request from either core.

Event Num 80H, Umask Value 00H, Event Name L1I_READS
Definition: Instruction fetches
Description and Comment: This event counts all instruction fetches, including uncacheable fetches that bypass the Instruction Fetch Unit (IFU).

Event Num 81H, Umask Value 00H, Event Name L1I_MISSES
Description and Comment: This event counts all instruction fetches that miss the Instruction Fetch Unit (IFU) or produce memory requests. This includes uncacheable fetches. An instruction fetch miss is counted only once and not once for every cycle it is outstanding.

Event Num 82H, Umask Value 02H, Event Name ITLB.SMALL_MISS
Definition: ITLB small page misses
Description and Comment: This event counts the number of instruction fetches from small pages that miss the ITLB.
Event Num Umask Value Event Name Definition Description and Comment
82H 10H ITLB.LARGE_MISS ITLB large page misses
This event counts the number of instruction fetches from large pages that miss the ITLB.
82H 40H ITLB.FLUSH ITLB flushes
This event counts the number of ITLB flushes. This usually happens upon CR3 or CR0 writes, which are executed by the operating system during process switches.
82H 12H ITLB.MISSES ITLB misses
This event counts the number of instruction fetches from either small or large pages that miss the ITLB.
83H 02H INST_QUEUE.FULL Cycles during which the instruction queue is full
This event counts the number of cycles during which the instruction queue is full. In this situation, the core front-end stops fetching more instructions. This is an indication of very long stalls in the back-end pipeline stages.
86H 00H CYCLES_L1I_MEM_STALLED Cycles during which instruction fetches stalled
This event counts the number of cycles for which an instruction fetch stalls, including stalls due to any of the following reasons: Instruction Fetch Unit cache misses, instruction TLB misses, and instruction TLB faults.
87H 00H ILD_STALL
This event counts the number of cycles during which the instruction length decoder uses the slow length decoder. Usually, instruction length decoding is done in one cycle. When the slow decoder is used, instruction decoding requires 6 cycles. The slow decoder is used when an operand override prefix (66H) precedes an instruction with immediate data, or when an address override prefix (67H) precedes an instruction with a ModR/M byte in real, big real, 16-bit protected or 32-bit protected modes. To avoid instruction length decoding stalls, generate code using imm8 or imm32 values instead of imm16 values. If you must use an imm16 value, store the value in a register using mov reg, imm32 and use the register format of the instruction.
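The Event Num and Umask Value columns are the two codes that software programs into a performance event select register. The following is a minimal sketch in Python, not text from this table. It assumes the usual IA32_PERFEVTSELx layout (event select in bits 7:0, unit mask in bits 15:8, USR in bit 16, OS in bit 17, EN in bit 22). The function name perfevtsel and its parameters are hypothetical. It also illustrates that the ITLB.MISSES umask (12H) equals the bitwise OR of the ITLB.SMALL_MISS (02H) and ITLB.LARGE_MISS (10H) umasks.

```python
def perfevtsel(event_num, umask, usr=True, os=True, enable=True):
    """Pack an event number and umask into an IA32_PERFEVTSELx-style value.

    Assumed layout: bits 7:0 event select, bits 15:8 unit mask,
    bit 16 USR, bit 17 OS, bit 22 EN. Hypothetical helper for illustration.
    """
    if not 0 <= event_num <= 0xFF or not 0 <= umask <= 0xFF:
        raise ValueError("event_num and umask must each fit in 8 bits")
    value = event_num | (umask << 8)
    if usr:
        value |= 1 << 16  # count at privilege levels 1-3
    if os:
        value |= 1 << 17  # count at privilege level 0
    if enable:
        value |= 1 << 22  # enable the counter
    return value


# (event number, umask) pairs copied from the ITLB rows of the table.
ITLB_SMALL_MISS = (0x82, 0x02)
ITLB_LARGE_MISS = (0x82, 0x10)
ITLB_MISSES = (0x82, 0x12)

# The combined event's umask is the OR of the two page-size umasks.
combined = ITLB_SMALL_MISS[1] | ITLB_LARGE_MISS[1]
print(hex(combined), hex(perfevtsel(*ITLB_MISSES)))
```

The raw value still has to be written to the model-specific register by privileged code or passed through an operating system profiling interface; that step is outside this sketch.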
88H 00H BR_INST_EXEC
This event counts all executed branches (not necessarily retired). This includes only instructions and not micro-op branches. Frequent branching is not necessarily a major performance issue. However, frequent branch mispredictions may be a problem.
89H 00H
This event counts the number of mispredicted branch instructions that were executed.
8AH 00H
This event counts the number of branch instructions that were mispredicted at decoding.
8BH 00H Conditional branch instructions executed
This event counts the number of conditional branch instructions executed, but not necessarily retired.
8CH 00H BR_CND_MISSP_EXEC Mispredicted conditional branch instructions executed
This event counts the number of mispredicted conditional branch instructions that were executed.
8DH 00H BR_IND_EXEC Indirect branch instructions executed
This event counts the number of indirect branch instructions that were executed.
8EH 00H BR_IND_MISSP_EXEC Mispredicted indirect branch instructions executed
This event counts the number of mispredicted indirect branch instructions that were executed.
BR_RET_EXEC RET instructions executed
This event counts the number of RET instructions that were executed.
BR_RET_MISSP_EXEC Mispredicted RET instructions executed
This event counts the number of mispredicted RET instructions that were executed.
BR_RET_BAC_MISSP_EXEC RET instructions executed mispredicted at decoding
This event counts the number of RET instructions that were executed and were mispredicted at decoding.
BR_CALL_EXEC CALL instructions executed
This event counts the number of CALL instructions executed.
BR_CALL_MISSP_EXEC Mispredicted CALL instructions executed
This event counts the number of mispredicted CALL instructions that were executed.
BR_IND_CALL_EXEC Indirect CALL instructions executed
This event counts the number of indirect CALL instructions that were executed.
BR_TKN_BUBBLE_1 Branch predicted taken with bubble 1
The events BR_TKN_BUBBLE_1 and BR_TKN_BUBBLE_2 together count the number of times a taken branch prediction incurred a one-cycle penalty. The penalty is incurred when too many taken branches are placed together (to avoid this, unroll loops and add a non-taken branch in the middle of the taken sequence) or when the branch target is unaligned (to avoid this, align the branch target).
98H 00H BR_TKN_BUBBLE_2
The events BR_TKN_BUBBLE_1 and BR_TKN_BUBBLE_2 together count the number of times a taken branch prediction incurred a one-cycle penalty. The penalty is incurred when too many taken branches are placed together (to avoid this, unroll loops and add a non-taken branch in the middle of the taken sequence) or when the branch target is unaligned (to avoid this, align the branch target).
A0H 00H Micro-ops dispatched for execution
This event counts the number of micro-ops dispatched for execution. Up to six micro-ops can be dispatched in each cycle.
A1H 01H Cycles micro-ops dispatched for execution on port 0
This event counts the number of cycles for which micro-ops are dispatched for execution. Each cycle, at most one micro-op can be dispatched on the port. Issue Ports are described in Intel 64 and IA-32 Architectures Optimization Reference Manual. Use IA32_PMC0 only.
A1H 02H RS_UOPS_DISPATCHED.PORT1 Cycles micro-ops dispatched for execution on port 1
This event counts the number of cycles for which micro-ops are dispatched for execution. Each cycle, at most one micro-op can be dispatched on the port. Use IA32_PMC0 only.
A1H 04H RS_UOPS_DISPATCHED.PORT2 Cycles micro-ops dispatched for execution on port 2
This event counts the number of cycles for which micro-ops are dispatched for execution. Each cycle, at most one micro-op can be dispatched on the port. Use IA32_PMC0 only.
A1H 08H RS_UOPS_DISPATCHED.PORT3 Cycles micro-ops dispatched for execution on port 3
This event counts the number of cycles for which micro-ops are dispatched for execution. Each cycle, at most one micro-op can be dispatched on the port. Use IA32_PMC0 only.
A1H 10H RS_UOPS_DISPATCHED.PORT4 Cycles micro-ops dispatched for execution on port 4
This event counts the number of cycles for which micro-ops are dispatched for execution. Each cycle, at most one micro-op can be dispatched on the port. Use IA32_PMC0 only.
A1H 20H RS_UOPS_DISPATCHED.PORT5 Cycles micro-ops dispatched for execution on port 5
This event counts the number of cycles for which micro-ops are dispatched for execution. Each cycle, at most one micro-op can be dispatched on the port. Use IA32_PMC0 only.
AAH 01H MACRO_INSTS.DECODED Instructions decoded
This event counts the number of instructions decoded (but not necessarily executed or retired).
AAH 08H MACRO_INSTS.CISC_DECODED CISC Instructions decoded
This event counts the number of complex instructions decoded. Complex instructions usually have more than four micro-ops. Only one complex instruction can be decoded at a time.
ABH 01H ESP.SYNCH
This event counts the number of times that the ESP register is explicitly used in the address expression of a load or store operation, after it is implicitly used, for example by a push or a pop instruction. The ESP synch micro-op uses resources from the rename pipestage up to retirement. The expected ratio of this event divided by the number of ESP implicit changes is 0.2. If the ratio is higher, consider rearranging your code to avoid ESP synchronization events.
ABH 02H ESP.ADDITIONS ESP register automatic additions
This event counts the number of ESP additions performed automatically by the decoder. A high count of this event is good, since each automatic addition performed by the decoder saves a micro-op from the execution units. To maximize the number of ESP additions performed automatically by the decoder, choose instructions that implicitly use the ESP, such as PUSH, POP, CALL, and RET instructions whenever possible.
B0H 00H SIMD_UOPS_EXEC SIMD micro-ops executed (excluding stores)
This event counts all the SIMD micro-ops executed. It does not count MOVQ and MOVD stores from register to memory.
B1H 00H SIMD saturated arithmetic micro-ops executed
This event counts the number of SIMD saturated arithmetic micro-ops executed.
B3H 01H SIMD packed multiply micro-ops executed
This event counts the number of SIMD packed multiply micro-ops executed.
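The ESP.SYNCH guidance in this part of the table reduces to a ratio check. The sketch below assumes both counts have already been collected. The function names and sample counts are hypothetical, the 0.2 threshold is the expected ratio quoted in the ESP.SYNCH description, and the source of the "ESP implicit changes" count is left to the caller because the table does not name a counter for it.

```python
EXPECTED_ESP_SYNCH_RATIO = 0.2  # expected ratio quoted in the ESP.SYNCH description


def esp_synch_ratio(esp_synch_count, esp_implicit_changes):
    """Return ESP.SYNCH divided by the number of ESP implicit changes."""
    if esp_implicit_changes <= 0:
        raise ValueError("esp_implicit_changes must be positive")
    return esp_synch_count / esp_implicit_changes


def exceeds_expected_ratio(esp_synch_count, esp_implicit_changes):
    """True when the ratio is above the expected value, suggesting a code review."""
    ratio = esp_synch_ratio(esp_synch_count, esp_implicit_changes)
    return ratio > EXPECTED_ESP_SYNCH_RATIO


# Made-up sample counts, for illustration only.
print(exceeds_expected_ratio(150, 1000))
print(exceeds_expected_ratio(350, 1000))
```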
B3H 02H SIMD_UOP_TYPE_EXEC.SHIFT SIMD packed shift micro-ops executed
This event counts the number of SIMD packed shift micro-ops executed.
B3H 04H SIMD_UOP_TYPE_EXEC.PACK SIMD pack micro-ops executed
This event counts the number of SIMD pack micro-ops executed.
B3H 08H SIMD_UOP_TYPE_EXEC.UNPACK SIMD unpack micro-ops executed
This event counts the number of SIMD unpack micro-ops executed.
B3H 10H SIMD_UOP_TYPE_EXEC.LOGICAL SIMD packed logical micro-ops executed
This event counts the number of SIMD packed logical micro-ops executed.
B3H 20H SIMD_UOP_TYPE_EXEC.ARITHMETIC SIMD packed arithmetic micro-ops executed
This event counts the number of SIMD packed arithmetic micro-ops executed.
C0H 00H INST_RETIRED.ANY_P Instructions retired
This event counts the number of instructions that retire execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. The counter continues counting during hardware interrupts, traps, and inside interrupt handlers. INST_RETIRED.ANY_P is an architectural performance event.
C0H 01H INST_RETIRED.LOADS Instructions retired, which contain a load
This event counts the number of instructions retired that contain a load operation.
C0H 02H INST_RETIRED.STORES Instructions retired, which contain a store
This event counts the number of instructions retired that contain a store operation.
C0H 04H INST_RETIRED.OTHER Instructions retired, with no load or store operation
This event counts the number of instructions retired that do not contain a load or a store operation.
C1H 01H X87_OPS_RETIRED.FXCH FXCH instructions retired
This event counts the number of FXCH instructions retired. Modern compilers generate more efficient code and are less likely to use this instruction. If you obtain a high count for this event, consider recompiling the code.
C1H FEH X87_OPS_RETIRED.ANY
This event counts the number of floating-point computational operations retired. It counts floating-point computational operations executed by the assist handler, and sub-operations of complex floating-point instructions like transcendental instructions. This event does not count floating-point computational operations that cause traps or assists, or floating-point loads and stores. When this event is captured with the precise event mechanism, the collected samples contain the address of the instruction that was executed immediately after the instruction that caused the event.
C2H 01H UOPS_RETIRED.LD_IND_BR Fused load+op or load+indirect branch retired
This event counts the number of retired micro-ops that fused a load with another operation. This includes fusion of a load and an arithmetic operation, such as with the instruction ADD EAX, [EBX], where the content of the memory location specified by the EBX register is loaded, added to the EAX register, and the result is stored in EAX. It also includes fusion of a load and a branch in an indirect branch operation, such as with the instructions JMP [RDI+200] and RET. Fusion decreases the number of micro-ops in the processor pipeline. A high value for this event count indicates that the code is using the processor resources effectively.
C2H 02H UOPS_RETIRED.STD_STA Fused store address + data retired
This event counts the number of store address calculations that are fused with store data emission into one micro-op. Traditionally, each store operation required two micro-ops. This event counts fusion of retired micro-ops only. Fusion decreases the number of micro-ops in the processor pipeline. A high value for this event count indicates that the code is using the processor resources effectively.
C2H 04H UOPS_RETIRED.MACRO_FUSION Retired instruction pairs fused into one micro-op
This event counts the number of times CMP or TEST instructions were fused with a conditional branch instruction into one micro-op. It counts fusion by retired micro-ops only. Fusion decreases the number of micro-ops in the processor pipeline. A high value for this event count indicates that the code uses the processor resources more effectively.
C2H 07H UOPS_RETIRED.FUSED Fused micro-ops retired
This event counts the total number of retired fused micro-ops. The counts include the following fusion types: fusion of a load operation with an arithmetic operation or with an indirect branch (counted by event UOPS_RETIRED.LD_IND_BR); fusion of store address and data (counted by event UOPS_RETIRED.STD_STA); fusion of a CMP or TEST instruction with a conditional branch instruction (counted by event UOPS_RETIRED.MACRO_FUSION). Fusion decreases the number of micro-ops in the processor pipeline. A high value for this event count indicates that the code is using the processor resources effectively.
C2H 08H UOPS_RETIRED.NON_FUSED Non-fused micro-ops retired
This event counts the number of micro-ops retired that were not fused.
C2H 0FH UOPS_RETIRED.ANY Micro-ops retired
This event counts the number of micro-ops retired. The processor decodes complex macro instructions into a sequence of simpler micro-ops. Most instructions are composed of one or two micro-ops. Some instructions are decoded into longer sequences such as repeat instructions, floating point transcendental instructions, and assists. In some cases micro-op sequences are fused or whole instructions are fused into one micro-op. See other UOPS_RETIRED events for differentiating retired fused and non-fused micro-ops.
C3H 01H MACHINE_NUKES.SMC Self-Modifying Code detected
This event counts the number of times that a program writes to a code section. Self-modifying code causes a severe penalty in all Intel 64 and IA-32 processors.
C3H 04H MACHINE_NUKES.MEM_ORDER Execution pipeline restart due to memory ordering conflict or memory disambiguation misprediction
This event counts the number of times the pipeline is restarted due to either multi-threaded memory ordering conflicts or memory disambiguation misprediction. A multi-threaded memory ordering conflict occurs when a store, which is executed in another core, hits a load that is executed out of order in this core but not yet retired. As a result, the load needs to be restarted to satisfy the memory ordering model. See Chapter 8, Multiple-Processor Management in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A. To count memory disambiguation mispredictions, use the event MEMORY_DISAMBIGUATION.RESET.
C4H 00H BR_INST_RETIRED.ANY Retired branch instructions
This event counts the number of branch instructions retired. This is an architectural performance event.
C4H 01H BR_INST_RETIRED.PRED_NOT_TAKEN Retired branch instructions that were predicted not-taken
This event counts the number of branch instructions retired that were correctly predicted to be not-taken.
C4H 02H BR_INST_RETIRED.MISPRED_NOT_TAKEN Retired branch instructions that were mispredicted not-taken
This event counts the number of branch instructions retired that were mispredicted and not-taken.
C4H 04H BR_INST_RETIRED.PRED_TAKEN Retired branch instructions that were predicted taken
This event counts the number of branch instructions retired that were correctly predicted to be taken.
C4H 08H BR_INST_RETIRED.MISPRED_TAKEN Retired branch instructions that were mispredicted taken
This event counts the number of branch instructions retired that were mispredicted and taken.
C4H 0CH BR_INST_RETIRED.TAKEN Retired taken branch instructions
This event counts the number of branches retired that were taken.
C5H 00H BR_INST_RETIRED.MISPRED Retired mispredicted branch instructions. (precise event)
This event counts the number of retired branch instructions that were mispredicted by the processor. A branch misprediction occurs when the processor predicts that the branch would be taken, but it is not, or vice-versa. This is an architectural performance event.
C6H 01H CYCLES_INT_MASKED Cycles during which interrupts are disabled
This event counts the number of cycles during which interrupts are disabled.
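Several umask values in the UOPS_RETIRED and BR_INST_RETIRED rows are sums of the umasks of the sub-events they aggregate (for example 07H = 01H + 02H + 04H). The sketch below treats the umask as a bit field and lists which sub-events a combined value covers. This bit-field reading is an interpretation consistent with the listed values rather than a statement quoted from the table. The helper names, and the misprediction ratio built from BR_INST_RETIRED.MISPRED and BR_INST_RETIRED.ANY, are illustrative assumptions.

```python
# Single-bit umasks copied from the UOPS_RETIRED (C2H) rows.
UOPS_RETIRED_BITS = {
    "LD_IND_BR": 0x01,
    "STD_STA": 0x02,
    "MACRO_FUSION": 0x04,
    "NON_FUSED": 0x08,
}

# Single-bit umasks copied from the BR_INST_RETIRED (C4H) rows.
BR_INST_RETIRED_BITS = {
    "PRED_NOT_TAKEN": 0x01,
    "MISPRED_NOT_TAKEN": 0x02,
    "PRED_TAKEN": 0x04,
    "MISPRED_TAKEN": 0x08,
}


def components(umask, bits):
    """List the sub-event names whose bit is set in a combined umask."""
    return [name for name, bit in bits.items() if umask & bit]


def mispredict_ratio(mispredicted, retired_branches):
    """Fraction of retired branches that were mispredicted (hypothetical metric)."""
    if retired_branches <= 0:
        raise ValueError("retired_branches must be positive")
    return mispredicted / retired_branches


print(components(0x07, UOPS_RETIRED_BITS))     # UOPS_RETIRED.FUSED
print(components(0x0F, UOPS_RETIRED_BITS))     # UOPS_RETIRED.ANY
print(components(0x0C, BR_INST_RETIRED_BITS))  # BR_INST_RETIRED.TAKEN
```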
C6H 02H CYCLES_INT_PENDING_AND_MASKED Cycles during which interrupts are pending and disabled
This event counts the number of cycles during which there are pending interrupts but interrupts are disabled.
SIMD_INST_RETIRED.PACKED_SINGLE Retired SSE packed-single instructions
This event counts the number of SSE packed-single instructions retired.
SIMD_INST_RETIRED.SCALAR_SINGLE Retired SSE scalar-single instructions
This event counts the number of SSE scalar-single instructions retired.
SIMD_INST_RETIRED.PACKED_DOUBLE Retired SSE2 packed-double instructions
This event counts the number of SSE2 packed-double instructions retired.
SIMD_INST_RETIRED.SCALAR_DOUBLE Retired SSE2 scalar-double instructions
This event counts the number of SSE2 scalar-double instructions retired.
SIMD_INST_RETIRED.VECTOR Retired SSE2 vector integer instructions
This event counts the number of SSE2 vector integer instructions retired.
SIMD_INST_RETIRED.ANY Retired Streaming SIMD instructions (precise event)
This event counts the overall number of retired SIMD instructions that use XMM registers. To count each type of SIMD instruction separately, use the following events: SIMD_INST_RETIRED.PACKED_SINGLE, SIMD_INST_RETIRED.SCALAR_SINGLE, SIMD_INST_RETIRED.PACKED_DOUBLE, SIMD_INST_RETIRED.SCALAR_DOUBLE, and SIMD_INST_RETIRED.VECTOR. When this event is captured with the precise event mechanism, the collected samples contain the address of the instruction that was executed immediately after the instruction that caused the event.
This event counts the number of hardware interrupts received by the processor.
This event counts the number of retired instructions that missed the ITLB when they were fetched.
Retired computational SSE packed-single instructions
This event counts the number of computational SSE packed-single instructions retired. Computational instructions perform arithmetic computations (for example: add, multiply and divide). Instructions that perform load and store operations or logical operations, like XOR, OR, and AND are not counted by this event.
CAH 02H Retired computational SSE scalar-single instructions
This event counts the number of computational SSE scalar-single instructions retired. Computational instructions perform arithmetic computations (for example: add, multiply and divide). Instructions that perform load and store operations or logical operations, like XOR, OR, and AND are not counted by this event.
CAH 04H SIMD_COMP_INST_RETIRED.PACKED_DOUBLE Retired computational SSE2 packed-double instructions
This event counts the number of computational SSE2 packed-double instructions retired. Computational instructions perform arithmetic computations (for example: add, multiply and divide). Instructions that perform load and store operations or logical operations, like XOR, OR, and AND are not counted by this event.
CAH 08H SIMD_COMP_INST_RETIRED.SCALAR_DOUBLE Retired computational SSE2 scalar-double instructions
This event counts the number of computational SSE2 scalar-double instructions retired. Computational instructions perform arithmetic computations (for example: add, multiply and divide). Instructions that perform load and store operations or logical operations, like XOR, OR, and AND are not counted by this event.
CBH 01H Retired loads that miss the L1 data cache (precise event)
This event counts the number of retired load operations that missed the L1 data cache. This includes loads from cache lines that are currently being fetched, due to a previous L1 data cache miss to the same cache line. This event counts loads from cacheable memory only. The event does not count loads by software prefetches. When this event is captured with the precise event mechanism, the collected samples contain the address of the instruction that was executed immediately after the instruction that caused the event. Use IA32_PMC0 only.
CBH 02H
This event counts the number of load operations that miss the L1 data cache and send a request to the L2 cache to fetch the missing cache line. That is, fetching of the missing cache line has not yet started. The event count is equal to the number of cache lines fetched from the L2 cache by retired loads. This event counts loads from cacheable memory only. The event does not count loads by software prefetches. The event might not be counted if the load is blocked (see LOAD_BLOCK events). When this event is captured with the precise event mechanism, the collected samples contain the address of the instruction that was executed immediately after the instruction that caused the event. Use IA32_PMC0 only.
CBH 04H MEM_LOAD_RETIRED.L2_MISS Retired loads that miss the L2 cache (precise event)
This event counts the number of retired load operations that missed the L2 cache. This event counts loads from cacheable memory only. It does not count loads by software prefetches. When this event is captured with the precise event mechanism, the collected samples contain the address of the instruction that was executed immediately after the instruction that caused the event. Use IA32_PMC0 only.
CBH 08H MEM_LOAD_RETIRED.L2_LINE_MISS
This event counts the number of load operations that miss the L2 cache and result in a bus request to fetch the missing cache line. That is, fetching of the missing cache line has not yet started. This event count is equal to the number of cache lines fetched from memory by retired loads. This event counts loads from cacheable memory only. The event does not count loads by software prefetches. The event might not be counted if the load is blocked (see LOAD_BLOCK events). When this event is captured with the precise event mechanism, the collected samples contain the address of the instruction that was executed immediately after the instruction that caused the event. Use IA32_PMC0 only.
CBH 10H Retired loads that miss the DTLB (precise event)
This event counts the number of retired loads that missed the DTLB. The DTLB miss is not counted if the load operation causes a fault. This event counts loads from cacheable memory only. The event does not count loads by software prefetches. When this event is captured with the precise event mechanism, the collected samples contain the address of the instruction that was executed immediately after the instruction that caused the event. Use IA32_PMC0 only.
CCH 01H FP_MMX_TRANS_TO_MMX Transitions from Floating Point to MMX Instructions
This event counts the first MMX instructions following a floating-point instruction. Use this event to estimate the penalties for the transitions between floating-point and MMX states.
CCH 02H FP_MMX_TRANS_TO_FP Transitions from MMX Instructions to Floating Point Instructions
This event counts the first floating-point instructions following any MMX instruction. Use this event to estimate the penalties for the transitions between floating-point and MMX states.
CDH 00H SIMD_ASSIST SIMD assists invoked
This event counts the number of SIMD assists invoked. SIMD assists are invoked when an EMMS instruction is executed, changing the MMX state in the floating point stack.
CEH 00H SIMD_INSTR_RETIRED SIMD Instructions retired
This event counts the number of retired SIMD instructions that use MMX registers.
CFH 00H SIMD_SAT_INSTR_RETIRED Saturated arithmetic instructions retired
This event counts the number of saturated arithmetic SIMD instructions that retired.
D2H 01H RAT_STALLS.ROB_READ_PORT ROB read port stall cycles
This event counts the number of cycles when ROB read port stalls occurred, which did not allow new micro-ops to enter the out-of-order pipeline. Note that, at this stage in the pipeline, additional stalls may occur at the same cycle and prevent the stalled micro-ops from entering the pipe. In such a case, micro-ops retry entering the execution pipe in the next cycle and the ROB read port stall is counted again.
D2H 02H RAT_STALLS.PARTIAL_CYCLES Partial register stall cycles
This event counts the number of cycles instruction execution latency became longer than the defined latency because the instruction uses a register that was partially written by previous instructions.
D2H 04H RAT_STALLS.FLAGS
This event counts the number of cycles during which execution stalled due to several reasons, one of which is a partial flag register stall. A partial register stall may occur when two conditions are met: an instruction modifies some, but not all, of the flags in the flag register, and the next instruction, which depends on flags, depends on flags that were not modified by this instruction.
D2H 08H RAT_STALLS.FPSW FPU status word stall
This event indicates that the FPU status word (FPSW) is written. To obtain the number of times the FPSW is written, divide the event count by 2. The FPSW is written by instructions with long latency; a small count may indicate a high penalty.
D2H 0FH RAT_STALLS.ANY All RAT stall cycles
This event counts the number of stall cycles due to conditions described by: RAT_STALLS.ROB_READ_PORT, RAT_STALLS.PARTIAL, RAT_STALLS.FLAGS, and RAT_STALLS.FPSW.
D4H 01H SEG_RENAME_STALLS.ES Segment rename stalls - ES
This event counts the number of stalls due to the lack of renaming resources for the ES segment register. If a segment is renamed, but not retired and a second update to the same segment occurs, a stall occurs in the front-end of the pipeline until the renamed segment retires.
D4H 02H SEG_RENAME_STALLS.DS Segment rename stalls - DS
This event counts the number of stalls due to the lack of renaming resources for the DS segment register. If a segment is renamed, but not retired and a second update to the same segment occurs, a stall occurs in the front-end of the pipeline until the renamed segment retires.
D4H 04H SEG_RENAME_STALLS.FS Segment rename stalls - FS
This event counts the number of stalls due to the lack of renaming resources for the FS segment register. If a segment is renamed, but not retired and a second update to the same segment occurs, a stall occurs in the front-end of the pipeline until the renamed segment retires.
D4H 08H SEG_RENAME_STALLS.GS Segment rename stalls - GS
This event counts the number of stalls due to the lack of renaming resources for the GS segment register. If a segment is renamed, but not retired and a second update to the same segment occurs, a stall occurs in the front-end of the pipeline until the renamed segment retires.
D4H 0FH SEG_RENAME_STALLS.ANY
This event counts the number of stalls due to the lack of renaming resources for the ES, DS, FS, and GS segment registers. If a segment is renamed but not retired and a second update to the same segment occurs, a stall occurs in the front-end of the pipeline until the renamed segment retires.
SEG_REG_RENAMES.ES Segment renames - ES
This event counts the number of times the ES segment register is renamed.
SEG_REG_RENAMES.DS Segment renames - DS
This event counts the number of times the DS segment register is renamed.
SEG_REG_RENAMES.FS Segment renames - FS
This event counts the number of times the FS segment register is renamed.
SEG_REG_RENAMES.GS Segment renames - GS
This event counts the number of times the GS segment register is renamed.
SEG_REG_RENAMES.ANY Any (ES/DS/FS/GS) segment rename
This event counts the number of times any of the four segment registers (ES/DS/FS/GS) is renamed.
RESOURCE_STALLS.ROB_FULL Cycles during which the ROB is full
This event counts the number of cycles when the number of instructions in the pipeline waiting for retirement reaches the limit the processor can handle. A high count for this event indicates that there are long latency operations in the pipe (possibly load and store operations that miss the L2 cache, and other instructions that depend on these cannot execute until the former instructions complete execution). In this situation new instructions cannot enter the pipe and start execution.
DCH 02H RESOURCE_STALLS.RS_FULL
This event counts the number of cycles when the number of instructions in the pipeline waiting for execution reaches the limit the processor can handle. A high count of this event indicates that there are long latency operations in the pipe (possibly load and store operations that miss the L2 cache, and other instructions that depend on these cannot execute until the former instructions complete execution). In this situation new instructions cannot enter the pipe and start execution.
203
Table 19-17. Non-Architectural Performance Events in Processors Based on Intel Core Microarchitecture (Contd.)
Event Num DCH Umask Value 04 Event Name RESOURCE_ STALLS.LD_ST Definition Description and Comment
Cycles during which This event counts the number of cycles while resourcethe pipeline has related stalls occur due to: exceeded load or store The number of load instructions in the pipeline reached limit or waiting to the limit the processor can handle. The stall ends when a commit all stores loading instruction retires. The number of store instructions in the pipeline reached the limit the processor can handle. The stall ends when a storing instruction commits its data to the cache or memory. There is an instruction in the pipe that can be executed only when all previous stores complete and their data is committed in the caches or memory. For example, the SFENCE and MFENCE instructions require this behavior. Cycles stalled due to FPU control word write Cycles stalled due to branch misprediction This event counts the number of cycles while execution was stalled due to writing the floating-point unit (FPU) control word. This event counts the number of cycles after a branch misprediction is detected at execution until the branch and all older micro-ops retire. During this time new micro-ops cannot enter the out-of-order pipeline.
DCH
08H
DCH
10H
DCH
1FH
RESOURCE_ STALLS.ANY
Resource related stalls This event counts the number of cycles while resourcerelated stalls occurs for any conditions described by the following events: RESOURCE_STALLS.ROB_FULL RESOURCE_STALLS.RS_FULL RESOURCE_STALLS.LD_ST RESOURCE_STALLS.FPCW RESOURCE_STALLS.BR_MISS_CLEAR
| E0H | 00H | | | This event counts the number of branch instructions decoded. |
| E4H | 00H | | | This event counts the number of byte sequences that were mistakenly detected as taken branch instructions. This results in a BACLEAR event. This occurs mainly after task switches. |
| E6H | 00H | BACLEARS | BACLEARS asserted | This event counts the number of times the front end is resteered, mainly when the BPU cannot provide a correct prediction and this is corrected by other branch handling mechanisms at the front end. This can occur if the code has many branches such that they cannot be consumed by the BPU. Each BACLEAR asserted costs approximately 7 cycles of instruction fetch. The effect on total execution time depends on the surrounding code. |
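The approximate per-event cost quoted for BACLEARS can be turned into a rough estimate of instruction-fetch cycles lost to front-end resteers. The following is a minimal sketch only: the function name and the sample count are hypothetical, and the 7-cycle figure is the approximation given in the description, not a measured value.

```python
# Approximate instruction-fetch cost of one BACLEAR, as stated in the
# BACLEARS description. This is an approximation, not a guaranteed constant.
BACLEAR_FETCH_CYCLES = 7


def estimated_baclear_fetch_cycles(baclears):
    """Return a rough estimate of fetch cycles lost to front-end resteers.

    `baclears` is a BACLEARS event count read from a performance counter
    (hypothetical input; how the counter is read is outside this sketch).
    """
    if baclears < 0:
        raise ValueError("event counts cannot be negative")
    return baclears * BACLEAR_FETCH_CYCLES


if __name__ == "__main__":
    # Made-up sample count, for illustration only.
    print(estimated_baclear_fetch_cycles(1000))
```

Because the effect on total execution time depends on the surrounding code, this figure describes instruction-fetch cycles only and should not be read as lost execution time.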
Table 19-17. Non-Architectural Performance Events in Processors Based on Intel Core Microarchitecture (Contd.)
| Event Num | Umask Value | Event Name | Definition | Description and Comment |
| F0H | 00H | PREF_RQSTS_UP | Upward prefetches issued from DPL | This event counts the number of upward prefetches issued from the Data Prefetch Logic (DPL) to the L2 cache. A prefetch request issued to the L2 cache cannot be cancelled and the requested cache line is fetched to the L2 cache. |
| F8H | 00H | PREF_RQSTS_DN | Downward prefetches issued from DPL. | This event counts the number of downward prefetches issued from the Data Prefetch Logic (DPL) to the L2 cache. A prefetch request issued to the L2 cache cannot be cancelled and the requested cache line is fetched to the L2 cache. |
...
STORE_FORWARDS.GOOD | Good store forwards
SEGMENT_REG_LOADS.ANY | Number of segment register loads
07H
06H
07H
08H
PREFETCH.PREFETCHNTA | Streaming SIMD Extensions (SSE) Prefetch NTA instructions executed | This event counts the number of times the SSE instruction prefetchNTA is executed. This instruction prefetches the data to the L1 data cache.
DATA_TLB_MISSES.DTLB_MISS | Memory accesses that missed the DTLB | This event counts the number of Data Translation Lookaside Buffer (DTLB) misses. The count includes misses detected as a result of speculative accesses. Typically a high count for this event indicates that the code accesses a large number of data pages.
08H
07H
Table 19-18. Non-Architectural Performance Events for Intel Atom Processors (Contd.)
| Event Num. | Umask Value | Event Name | Definition | Description and Comment |
| 08H | 05H | DATA_TLB_MISSES.DTLB_MISS_LD | DTLB misses due to load operations | This event counts the number of Data Translation Lookaside Buffer (DTLB) misses due to load operations. This count includes misses detected as a result of speculative accesses. |
| 08H | 09H | DATA_TLB_MISSES.L0_DTLB_MISS_LD | L0_DTLB misses due to load operations | This event counts the number of L0_DTLB misses due to load operations. This count includes misses detected as a result of speculative accesses. |
| 08H | 06H | DATA_TLB_MISSES.DTLB_MISS_ST | DTLB misses due to store operations | This event counts the number of Data Translation Lookaside Buffer (DTLB) misses due to store operations. This count includes misses detected as a result of speculative accesses. |
| 0CH | 03H | PAGE_WALKS.WALKS | Number of page-walks executed | This event counts the number of page-walks executed due to either a DTLB or ITLB miss. The page walk duration, PAGE_WALKS.CYCLES, divided by the number of page walks is the average duration of a page walk. This can hint at whether most of the page-walks are satisfied by the caches or cause an L2 cache miss. Edge trigger bit must be set. |
| 0CH | 03H | PAGE_WALKS.CYCLES | Duration of page-walks in core cycles | This event counts the duration of page-walks in core cycles. The paging mode in use typically affects the duration of page walks. Page walk duration divided by the number of page walks is the average duration of page-walks. This can hint at whether most of the page-walks are satisfied by the caches or cause an L2 cache miss. Edge trigger bit must be cleared. |
| 10H | 01H | | Floating point computational micro-ops executed | This event counts the number of x87 floating point computational micro-ops executed. |
| 10H | 81H | | Floating point computational micro-ops retired | This event counts the number of x87 floating point computational micro-ops retired. |
| 11H | 01H | | Floating point assists | This event counts the number of floating point operations executed that required micro-code assist intervention. These assists are required in the following cases: X87 instructions: 1. NaN or denormal are loaded to a register or used as input from memory 2. Division by 0 3. Underflow output |
| 11H | 81H | FP_ASSIST.AR | | This event counts the number of floating point operations executed that required micro-code assist intervention. These assists are required in the following cases: X87 instructions: 1. NaN or denormal are loaded to a register or used as input from memory 2. Division by 0 3. Underflow output |
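PAGE_WALKS.WALKS and PAGE_WALKS.CYCLES share event number 0CH and umask 03H and differ only in the edge trigger setting, and their ratio gives the average page-walk duration. The sketch below illustrates both points. It assumes the architectural IA32_PERFEVTSELx layout (event select in bits 7:0, unit mask in bits 15:8, USR in bit 16, OS in bit 17, edge detect in bit 18, enable in bit 22). The helper names are hypothetical and the counter readings are made-up sample inputs.

```python
# Bit positions assumed from the architectural IA32_PERFEVTSELx layout.
USR_BIT = 1 << 16   # count while in user mode
OS_BIT = 1 << 17    # count while in kernel mode
EDGE_BIT = 1 << 18  # edge detect ("edge trigger") bit
EN_BIT = 1 << 22    # enable the counter


def perfevtsel(event_num, umask, edge=False):
    """Build an event-select value for one event (hypothetical helper)."""
    value = (event_num & 0xFF) | ((umask & 0xFF) << 8)
    value |= USR_BIT | OS_BIT | EN_BIT
    if edge:
        value |= EDGE_BIT
    return value


def average_page_walk_cycles(walk_cycles, walks):
    """PAGE_WALKS.CYCLES divided by PAGE_WALKS.WALKS, or None if no walks."""
    if walks == 0:
        return None
    return walk_cycles / walks


# Same event number and umask; only the edge trigger bit differs.
PAGE_WALKS_WALKS = perfevtsel(0x0C, 0x03, edge=True)    # edge bit set
PAGE_WALKS_CYCLES = perfevtsel(0x0C, 0x03, edge=False)  # edge bit cleared

if __name__ == "__main__":
    print(hex(PAGE_WALKS_WALKS), hex(PAGE_WALKS_CYCLES))
    print(average_page_walk_cycles(1200, 40))  # sample readings only
```

A small average suggests that most page walks are satisfied by the caches, while a large one hints at walks that cause an L2 cache miss. The threshold between the two is workload-specific and is not stated in the table.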
Table 19-18. Non-Architectural Performance Events for Intel Atom Processors (Contd.)
| Event Num. | Umask Value | Event Name | Definition | Description and Comment |
| 12H | 01H | MUL.S | Multiply operations executed | This event counts the number of multiply operations executed. This includes integer as well as floating point multiply operations. |
| 12H | 81H | MUL.AR | Multiply operations retired | This event counts the number of multiply operations retired. This includes integer as well as floating point multiply operations. |
| 13H | 01H | DIV.S | Divide operations executed | This event counts the number of divide operations executed. This includes integer divides, floating point divides and square-root operations executed. |
| 13H | 81H | DIV.AR | Divide operations retired | This event counts the number of divide operations retired. This includes integer divides, floating point divides and square-root operations executed. |
| 14H | 01H | CYCLES_DIV_BUSY | Cycles the divider is busy | This event counts the number of cycles the divider is busy executing divide or square root operations. The divide can be integer, X87 or Streaming SIMD Extensions (SSE). The square root operation can be either X87 or SSE. |
| 21H | | L2_ADS | | This event counts the number of cycles the L2 address bus is being used for accesses to the L2 cache or bus queue. This event can count occurrences for this core or both cores. |
| 22H | | L2_DBUS_BUSY | Cycles the L2 cache data bus is busy | This event counts core cycles during which the L2 cache data bus is busy transferring data from the L2 cache to the core. It counts for all L1 cache misses (data and instruction) that hit the L2 cache. The count will increment by two for a full cache-line request. |
| 24H | | L2_LINES_IN | L2 cache misses | This event counts the number of cache lines allocated in the L2 cache. Cache lines are allocated in the L2 cache as a result of requests from the L1 data and instruction caches and the L2 hardware prefetchers to cache lines that are missing in the L2 cache. This event can count occurrences for this core or both cores. This event can also count demand requests and L2 hardware prefetch requests together or separately. |
| 25H | | L2_M_LINES_IN | L2 cache line modifications | This event counts whenever a modified cache line is written back from the L1 data cache to the L2 cache. This event can count occurrences for this core or both cores. |
| 26H | | L2_LINES_OUT | L2 cache lines evicted | This event counts the number of L2 cache lines evicted. This event can count occurrences for this core or both cores. This event can also count evictions due to demand requests and L2 hardware prefetch requests together or separately. |
Table 19-18. Non-Architectural Performance Events for Intel Atom Processors (Contd.)
| Event Num. | Umask Value | Event Name | Definition | Description and Comment |
| 27H | See Table 18-2 and Table 18-4 | L2_M_LINES_OUT | Modified lines evicted from the L2 cache | This event counts the number of L2 modified cache lines evicted. These lines are written back to memory unless they also exist in a shared-state in one of the L1 data caches. This event can count occurrences for this core or both cores. This event can also count evictions due to demand requests and L2 hardware prefetch requests together or separately. |
| 28H | See Table 18-2 and Table 18-5 | L2_IFETCH | L2 cacheable instruction fetch requests | This event counts the number of instruction cache line requests from the ICache. It does not include fetch requests from uncacheable memory. It does not include ITLB miss accesses. This event can count occurrences for this core or both cores. This event can also count accesses to cache lines at different MESI states. |
| 29H | See Table 18-2, Table 18-4 and Table 18-5 | L2_LD | L2 cache reads | This event counts L2 cache read requests coming from the L1 data cache and L2 prefetchers. This event can count occurrences - for this core or both cores. - due to demand requests and L2 hardware prefetch requests together or separately. - of accesses to cache lines at different MESI states. |
| 2AH | See Table 18-2 and Table 18-5 | L2_ST | L2 store requests | This event counts all store operations that miss the L1 data cache and request the data from the L2 cache. This event can count occurrences for this core or both cores. This event can also count accesses to cache lines at different MESI states. |
| 2BH | See Table 18-2 and Table 18-5 | L2_LOCK | L2 locked accesses | This event counts all locked accesses to cache lines that miss the L1 data cache. This event can count occurrences for this core or both cores. This event can also count accesses to cache lines at different MESI states. |
| 2EH | See Table 18-2, Table 18-4 and Table 18-5 | L2_RQSTS | L2 cache requests | This event counts all completed L2 cache requests. This includes L1 data cache reads, writes, and locked accesses, L1 data prefetch requests, instruction fetches, and all L2 hardware prefetch requests. This event can count occurrences - for this core or both cores. - due to demand requests and L2 hardware prefetch requests together, or separately. - of accesses to cache lines at different MESI states. |
Table 19-18. Non-Architectural Performance Events for Intel Atom Processors (Contd.)
| Event Num. | Umask Value | Event Name | Definition | Description and Comment |
| 2EH | 41H | L2_RQSTS.SELF.DEMAND.I_STATE | L2 cache demand requests from this core that missed the L2 | This event counts all completed L2 cache demand requests from this core that miss the L2 cache. This includes L1 data cache reads, writes, and locked accesses, L1 data prefetch requests, and instruction fetches. This is an architectural performance event. |
| 2EH | 4FH | L2_RQSTS.SELF.DEMAND.MESI | L2 cache demand requests from this core | This event counts all completed L2 cache demand requests from this core. This includes L1 data cache reads, writes, and locked accesses, L1 data prefetch requests, and instruction fetches. This is an architectural performance event. |
| 30H | | L2_REJECT_BUSQ | Rejected L2 cache requests | This event indicates that a pending L2 cache request that requires a bus transaction is delayed from moving to the bus queue. Some of the reasons for this event are: - The bus queue is full. - The bus queue already holds an entry for a cache line in the same set. The number of events is greater than or equal to the number of requests that were rejected. This event can count occurrences - for this core or both cores. - due to demand requests and L2 hardware prefetch requests together, or separately. - of accesses to cache lines at different MESI states. |
| 32H | | L2_NO_REQ | Cycles no L2 cache requests are pending | This event counts the number of cycles that no L2 cache requests are pending. |
| 3AH | | EIST_TRANS | Number of Enhanced Intel SpeedStep(R) Technology (EIST) transitions | This event counts the number of Enhanced Intel SpeedStep(R) Technology (EIST) transitions that include a frequency change, either with or without VID change. This event is incremented only while the counting core is in C0 state. In situations where an EIST transition was caused by hardware as a result of CxE state transitions, those EIST transitions will also be registered in this event. Enhanced Intel SpeedStep Technology transitions are commonly initiated by the OS, but can be initiated by hardware internally. For example: CxE states are C-states (C1, C2, C3) which not only place the CPU into a sleep state by turning off the clock and other components, but also lower the voltage (which reduces the leakage power consumption). The same is true for thermal throttling transitions, which use Enhanced Intel SpeedStep Technology internally. |
Table 19-18. Non-Architectural Performance Events for Intel Atom Processors (Contd.)
| Event Num. | Umask Value | Event Name | Definition | Description and Comment |
| 3BH | C0H | THERMAL_TRIP | Number of thermal trips | This event counts the number of thermal trips. A thermal trip occurs whenever the processor temperature exceeds the thermal trip threshold temperature. Following a thermal trip, the processor automatically reduces frequency and voltage. The processor checks the temperature every millisecond, and returns to normal when the temperature falls below the thermal trip threshold temperature. |
| 3CH | 00H | CPU_CLK_UNHALTED.CORE_P | | This event counts the number of core cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. In mobile systems the core frequency may change from time to time. For this reason this event may have a changing ratio with regard to time. In systems with a constant core frequency, this event can give you a measurement of the elapsed time while the core was not in halt state by dividing the event count by the core frequency. - This is an architectural performance event. - The event CPU_CLK_UNHALTED.CORE_P is counted by a programmable counter. - The event CPU_CLK_UNHALTED.CORE is counted by a designated fixed counter, leaving the two programmable counters available for other events. |
| 3CH | 01H | CPU_CLK_UNHALTED.BUS | Bus cycles when core is not halted | This event counts the number of bus cycles while the core is not in the halt state. This event can give you a measurement of the elapsed time while the core was not in the halt state, by dividing the event count by the bus frequency. The core enters the halt state when it is running the HLT instruction. The event also has a constant ratio with CPU_CLK_UNHALTED.REF event, which is the maximum bus to processor frequency ratio. Non-halted bus cycles are a component in many key event ratios. |
| 3CH | 02H | CPU_CLK_UNHALTED.NO_OTHER | Bus cycles when core is active and the other is halted | This event counts the number of bus cycles during which the core remains non-halted, and the other core on the processor is halted. This event can be used to determine the amount of parallelism exploited by an application or a system. Divide this event count by the bus frequency to determine the amount of time that only one core was in use. |
| 40H | 21H | L1D_CACHE.LD | L1 Cacheable Data Reads | This event counts the number of data reads from cacheable memory. |
| 40H | 22H | L1D_CACHE.ST | L1 Cacheable Data Writes | This event counts the number of data writes to cacheable memory. |
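The CPU_CLK_UNHALTED descriptions above convert cycle counts to time by dividing by a frequency. A minimal sketch of that arithmetic follows. The function names, the sample counts, and the sample frequencies are hypothetical, and the conversion is only meaningful when the relevant frequency stays constant over the measurement, as the descriptions note.

```python
def unhalted_seconds(cycle_count, frequency_hz):
    """Convert an unhalted-cycle count to seconds at a constant frequency.

    Use the core frequency with CPU_CLK_UNHALTED.CORE_P counts and the bus
    frequency with CPU_CLK_UNHALTED.BUS or CPU_CLK_UNHALTED.NO_OTHER counts.
    """
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return cycle_count / frequency_hz


def single_core_share(no_other_bus_cycles, unhalted_bus_cycles):
    """Fraction of this core's unhalted bus cycles spent with the other
    core halted (hypothetical derived ratio, not defined in the table)."""
    if unhalted_bus_cycles == 0:
        return None
    return no_other_bus_cycles / unhalted_bus_cycles


if __name__ == "__main__":
    # Made-up readings: 3.2e9 core cycles at an assumed constant 1.6 GHz.
    print(unhalted_seconds(3_200_000_000, 1_600_000_000))
    # Made-up readings: 266e6 bus cycles at an assumed 133 MHz bus clock.
    print(unhalted_seconds(266_000_000, 133_000_000))
    print(single_core_share(50_000_000, 200_000_000))
```

On mobile systems where the core frequency changes over time, the core-cycle conversion is not reliable. The bus-cycle events are the safer basis for elapsed-time estimates in that case.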
Table 19-18. Non-Architectural Performance Events for Intel Atom Processors (Contd.)
| Event Num. | Umask Value | Event Name | Definition | Description and Comment |
| 60H | See Table 18-2 and Table 18-3 | BUS_REQUEST_OUTSTANDING | Outstanding cacheable data read bus requests duration | This event counts the number of pending full cache line read transactions on the bus occurring in each cycle. A read transaction is pending from the cycle it is sent on the bus until the full cache line is received by the processor. NOTE: This event is thread-independent and will not provide a count per logical processor when AnyThr is disabled. |
| 61H | See Table 18-3 | BUS_BNR_DRV | Number of Bus Not Ready signals asserted | This event counts the number of Bus Not Ready (BNR) signals that the processor asserts on the bus to suspend additional bus requests by other bus agents. A bus agent asserts the BNR signal when the number of data and snoop transactions is close to the maximum that the bus can handle. While this signal is asserted, new transactions cannot be submitted on the bus. As a result, transaction latency may have higher impact on program performance. NOTE: This event is thread-independent and will not provide a count per logical processor when AnyThr is disabled. |
| 62H | | BUS_DRDY_CLOCKS | | This event counts the number of bus cycles during which the DRDY (Data Ready) signal is asserted on the bus. The DRDY signal is asserted when data is sent on the bus. This event counts the number of bus cycles during which this agent (the processor) writes data on the bus back to memory or to other bus agents. This includes all explicit and implicit data writebacks, as well as partial writes. NOTE: This event is thread-independent and will not provide a count per logical processor when AnyThr is disabled. |
| 63H | | BUS_LOCK_CLOCKS | Bus cycles when a LOCK signal is asserted. | This event counts the number of bus cycles during which the LOCK signal is asserted on the bus. A LOCK signal is asserted when there is a locked memory access, due to: - Uncacheable memory - Locked operation that spans two cache lines - Page-walk from an uncacheable page table. Bus locks have a very high performance penalty and it is highly recommended to avoid such accesses. NOTE: This event is thread-independent and will not provide a count per logical processor when AnyThr is disabled. |
| 64H | | BUS_DATA_RCV | Bus cycles while processor receives data | This event counts the number of cycles during which the processor is busy receiving data. NOTE: This event is thread-independent and will not provide a count per logical processor when AnyThr is disabled. |
| 65H | | BUS_TRANS_BRD | Burst read bus transactions | This event counts the number of burst read transactions including: - L1 data cache read misses (and L1 data cache hardware prefetches) - L2 hardware prefetches by the DPL and L2 streamer - IFU read misses of cacheable lines. It does not include RFO transactions. |
Table 19-18. Non-Architectural Performance Events for Intel Atom Processors (Contd.)
| Event Num. | Umask Value | Event Name | Definition | Description and Comment |
| 66H | See Table 18-2 and Table 18-3 | BUS_TRANS_RFO | RFO bus transactions | This event counts the number of Read For Ownership (RFO) bus transactions, due to store operations that miss the L1 data cache and the L2 cache. This event also counts RFO bus transactions due to locked operations. |
| 67H | See Table 18-2 and Table 18-3 | BUS_TRANS_WB | | This event counts all explicit writeback bus transactions due to dirty line evictions. It does not count implicit writebacks due to invalidation by a snoop request. |
| 68H | See Table 18-2 and Table 18-3 | BUS_TRANS_IFETCH | | This event counts all instruction fetch full cache line bus transactions. |
| 69H | See Table 18-2 and Table 18-3 | BUS_TRANS_INVAL | | This event counts all invalidate transactions. Invalidate transactions are generated when: - A store operation hits a shared line in the L2 cache. - A full cache line write misses the L2 cache or hits a shared line in the L2 cache. |
| 6AH | See Table 18-2 and Table 18-3 | BUS_TRANS_PWR | | |
| 6BH | See Table 18-2 and Table 18-3 | BUS_TRANS_P | | This event counts all (read and write) partial bus transactions. |
| 6CH | See Table 18-2 and Table 18-3 | BUS_TRANS_IO | IO bus transactions | This event counts the number of completed I/O bus transactions as a result of IN and OUT instructions. The count does not include memory mapped IO. |
| 6DH | See Table 18-2 and Table 18-3 | BUS_TRANS_DEF | | |
Table 19-18. Non-Architectural Performance Events for Intel Atom Processors (Contd.)
| Event Num. | Umask Value | Event Name | Definition | Description and Comment |
| 6EH | See Table 18-2 and Table 18-3 | BUS_TRANS_BURST | Burst (full cache-line) bus transactions. | This event counts burst (full cache line) transactions including: - Burst reads - RFOs - Explicit writebacks - Write combine lines |
| 6FH | See Table 18-2 and Table 18-3 | BUS_TRANS_MEM | Memory bus transactions | This event counts all memory bus transactions including: - burst transactions - partial reads and writes - invalidate transactions. The BUS_TRANS_MEM count is the sum of BUS_TRANS_BURST, BUS_TRANS_P and BUS_TRANS_INVAL. |
| 70H | See Table 18-2 and Table 18-3 | BUS_TRANS_ANY | All bus transactions | This event counts all bus transactions. This includes: - Memory transactions - IO transactions (non memory-mapped) - Deferred transaction completion - Other less frequent transactions, such as interrupts |
| 77H | See Table 18-2 and Table 18-5 | EXT_SNOOP | External snoops | This event counts the snoop responses to bus transactions. Responses can be counted separately by type and by bus agent. NOTE: This event is thread-independent and will not provide a count per logical processor when AnyThr is disabled. |
| 7AH | See Table 18-3 | BUS_HIT_DRV | | This event counts the number of bus cycles during which the processor drives the HIT# pin to signal HIT snoop response. NOTE: This event is thread-independent and will not provide a count per logical processor when AnyThr is disabled. |
| 7BH | See Table 18-3 | BUS_HITM_DRV | | This event counts the number of bus cycles during which the processor drives the HITM# pin to signal HITM snoop response. NOTE: This event is thread-independent and will not provide a count per logical processor when AnyThr is disabled. |
| 7DH | See Table 18-2 | BUSQ_EMPTY | | This event counts the number of cycles during which the core did not have any pending transactions in the bus queue. NOTE: This event is thread-independent and will not provide a count per logical processor when AnyThr is disabled. |
| 7EH | See Table 18-2 and Table 18-3 | SNOOP_STALL_DRV | | This event counts the number of times that the bus snoop stall signal is asserted. During the snoop stall cycles no new bus transactions requiring a snoop response can be initiated on the bus. NOTE: This event is thread-independent and will not provide a count per logical processor when AnyThr is disabled. |
| 7FH | See Table 18-2 | BUS_IO_WAIT | | This event counts the number of core cycles during which IO requests wait in the bus queue. This event counts IO requests from the core. |
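The BUS_TRANS_MEM description states that its count is the sum of BUS_TRANS_BURST, BUS_TRANS_P and BUS_TRANS_INVAL. The sketch below shows one way to sanity-check collected counts against that relation. The function name and sample numbers are hypothetical, and it assumes all four events were counted over the same interval with the same core and agent umask qualification; counts gathered in separate runs may not agree exactly.

```python
def bus_trans_mem_residual(mem, burst, partial, inval):
    """Return BUS_TRANS_MEM minus (BURST + P + INVAL).

    A result of zero matches the relation stated in the BUS_TRANS_MEM
    description; a non-zero result suggests the counts were not collected
    under identical conditions.
    """
    return mem - (burst + partial + inval)


if __name__ == "__main__":
    # Made-up sample counts, for illustration only.
    print(bus_trans_mem_residual(mem=1500, burst=1000, partial=300, inval=200))
```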
Table 19-18. Non-Architectural Performance Events for Intel Atom Processors (Contd.)
| Event Num. | Umask Value | Event Name | Definition | Description and Comment |
| 80H | 03H | ICACHE.ACCESSES | Instruction fetches | This event counts all instruction fetches, including uncacheable fetches. |
| 80H | 02H | ICACHE.MISSES | Icache miss | This event counts all instruction fetches that miss the Instruction cache or produce memory requests. This includes uncacheable fetches. An instruction fetch miss is counted only once and not once for every cycle it is outstanding. |
| | | ITLB.FLUSH | | This event counts the number of ITLB flushes. |
| | | ITLB.MISSES | | This event counts the number of instruction fetches that miss the ITLB. |
| | | MACRO_INSTS.CISC_DECODED | CISC macro instructions decoded | This event counts the number of complex instructions decoded, but not necessarily executed or retired. Only one complex instruction can be decoded at a time. |
| AAH | 03H | MACRO_INSTS.ALL_DECODED | All Instructions decoded | This event counts the number of instructions decoded. |
| B0H | 00H | SIMD_UOPS_EXEC.S | SIMD micro-ops executed (excluding stores) | This event counts all the SIMD micro-ops executed. This event does not count MOVQ and MOVD stores from register to memory. |
| B0H | 80H | SIMD_UOPS_EXEC.AR | SIMD micro-ops retired (excluding stores) | This event counts the number of SIMD saturated arithmetic micro-ops executed. |
| B1H | 00H | SIMD_SAT_UOP_EXEC.S | SIMD saturated arithmetic micro-ops executed | This event counts the number of SIMD saturated arithmetic micro-ops executed. |
| B1H | 80H | SIMD_SAT_UOP_EXEC.AR | SIMD saturated arithmetic micro-ops retired | This event counts the number of SIMD saturated arithmetic micro-ops retired. |
| | | SIMD_UOP_TYPE_EXEC.MUL.S | SIMD packed multiply micro-ops executed | This event counts the number of SIMD packed multiply micro-ops executed. |
| | | SIMD_UOP_TYPE_EXEC.MUL.AR | SIMD packed multiply micro-ops retired | This event counts the number of SIMD packed multiply micro-ops retired. |
| | | SIMD_UOP_TYPE_EXEC.SHIFT.S | SIMD packed shift micro-ops executed | This event counts the number of SIMD packed shift micro-ops executed. |
| | | SIMD_UOP_TYPE_EXEC.SHIFT.AR | SIMD packed shift micro-ops retired | This event counts the number of SIMD packed shift micro-ops retired. |
| | | SIMD_UOP_TYPE_EXEC.PACK.S | SIMD pack micro-ops executed | This event counts the number of SIMD pack micro-ops executed. |
| | | SIMD_UOP_TYPE_EXEC.PACK.AR | SIMD pack micro-ops retired | This event counts the number of SIMD pack micro-ops retired. |
| | | SIMD_UOP_TYPE_EXEC.UNPACK.S | SIMD unpack micro-ops executed | This event counts the number of SIMD unpack micro-ops executed. |
| | | SIMD_UOP_TYPE_EXEC.UNPACK.AR | SIMD unpack micro-ops retired | This event counts the number of SIMD unpack micro-ops retired. |
| | | SIMD_UOP_TYPE_EXEC.LOGICAL.S | SIMD packed logical micro-ops executed | This event counts the number of SIMD packed logical micro-ops executed. |
Table 19-18. Non-Architectural Performance Events for Intel Atom Processors (Contd.)
| Event Num. | Umask Value | Event Name | Definition | Description and Comment |
| B3H | 90H | SIMD_UOP_TYPE_EXEC.LOGICAL.AR | SIMD packed logical micro-ops retired | This event counts the number of SIMD packed logical micro-ops retired. |
| B3H | 20H | SIMD_UOP_TYPE_EXEC.ARITHMETIC.S | SIMD packed arithmetic micro-ops executed | This event counts the number of SIMD packed arithmetic micro-ops executed. |
| B3H | A0H | SIMD_UOP_TYPE_EXEC.ARITHMETIC.AR | SIMD packed arithmetic micro-ops retired | This event counts the number of SIMD packed arithmetic micro-ops retired. |
| C0H | 00H | INST_RETIRED.ANY_P | Instructions retired (precise event). | This event counts the number of instructions that retire execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. The counter continues counting during hardware interrupts, traps, and inside interrupt handlers. |
| N/A | 00H | INST_RETIRED.ANY | Instructions retired | This event counts the number of instructions that retire execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. The counter continues counting during hardware interrupts, traps, and inside interrupt handlers. |
| C2H | 10H | UOPS_RETIRED.ANY | Micro-ops retired | This event counts the number of micro-ops retired. The processor decodes complex macro instructions into a sequence of simpler micro-ops. Most instructions are composed of one or two micro-ops. Some instructions are decoded into longer sequences such as repeat instructions, floating point transcendental instructions, and assists. In some cases micro-op sequences are fused or whole instructions are fused into one micro-op. See other UOPS_RETIRED events for differentiating retired fused and non-fused micro-ops. |
| C3H | 01H | | | This event counts the number of times that a program writes to a code section. Self-modifying code causes a severe penalty in all Intel architecture processors. |
| C4H | 00H | | | This event counts the number of branch instructions retired. This is an architectural performance event. |
| C4H | 01H | | | This event counts the number of branch instructions retired that were correctly predicted to be not-taken. |
| C4H | 02H | BR_INST_RETIRED.MISPRED_NOT_TAKEN | Retired branch instructions that were mispredicted not-taken | This event counts the number of branch instructions retired that were mispredicted and not-taken. |
| C4H | 04H | BR_INST_RETIRED.PRED_TAKEN | Retired branch instructions that were predicted taken | This event counts the number of branch instructions retired that were correctly predicted to be taken. |
| C4H | 08H | BR_INST_RETIRED.MISPRED_TAKEN | Retired branch instructions that were mispredicted taken | This event counts the number of branch instructions retired that were mispredicted and taken. |
Table 19-18. Non-Architectural Performance Events for Intel Atom Processors (Contd.)
| Event Num. | Umask Value | Event Name | Definition | Description and Comment |
| C4H | 0AH | | | This event counts the number of retired branch instructions that were mispredicted by the processor. A branch misprediction occurs when the processor predicts that the branch would be taken, but it is not, or vice-versa. Mispredicted branches degrade performance because the processor starts executing instructions along the wrong path it predicted. When the misprediction is discovered, all the instructions executed in the wrong path must be discarded, and the processor must start again on the correct path. Using the Profile-Guided Optimization (PGO) features of the Intel C++ compiler may help reduce branch mispredictions. See the compiler documentation for more information on this feature. To determine the branch misprediction ratio, divide the BR_INST_RETIRED.MISPRED event count by the BR_INST_RETIRED.ANY event count. To determine the number of mispredicted branches per instruction, divide the number of mispredicted branches by the INST_RETIRED.ANY event count. To measure the impact of the branch mispredictions use the event RESOURCE_STALLS.BR_MISS_CLEAR. Tips: - See the optimization guide for tips on reducing branch mispredictions. - PGO's purpose is to have straight line code for the most frequent execution paths, reducing branches taken and increasing the "basic block" size, possibly also reducing the code footprint or working-set. |
| C4H | 0CH | BR_INST_RETIRED.TAKEN | Retired taken branch instructions | This event counts the number of branches retired that were taken. |
| C4H | 0FH | BR_INST_RETIRED.ANY1 | Retired branch instructions | This event counts the number of branch instructions retired that were mispredicted. This event is a duplicate of BR_INST_RETIRED.MISPRED. |
| C5H | 00H | | | This event counts the number of retired branch instructions that were mispredicted by the processor. A branch misprediction occurs when the processor predicts that the branch would be taken, but it is not, or vice-versa. Mispredicted branches degrade performance because the processor starts executing instructions along the wrong path it predicted. When the misprediction is discovered, all the instructions executed in the wrong path must be discarded, and the processor must start again on the correct path. Using the Profile-Guided Optimization (PGO) features of the Intel C++ compiler may help reduce branch mispredictions. See the compiler documentation for more information on this feature. To determine the branch misprediction ratio, divide the BR_INST_RETIRED.MISPRED event count by the BR_INST_RETIRED.ANY event count. To determine the number of mispredicted branches per instruction, divide the number of mispredicted branches by the INST_RETIRED.ANY event count. To measure the impact of the branch mispredictions use the event RESOURCE_STALLS.BR_MISS_CLEAR. Tips: - See the optimization guide for tips on reducing branch mispredictions. - PGO's purpose is to have straight line code for the most frequent execution paths, reducing branches taken and increasing the "basic block" size, possibly also reducing the code footprint or working-set. |
| C6H | 01H | CYCLES_INT_MASKED.CYCLES_INT_MASKED | Cycles during which interrupts are disabled | This event counts the number of cycles during which interrupts are disabled. |
| C6H | 02H | CYCLES_INT_MASKED.CYCLES_INT_PENDING_AND_MASKED | Cycles during which interrupts are pending and disabled | This event counts the number of cycles during which there are pending interrupts but interrupts are disabled. |
| C7H | 01H | SIMD_INST_RETIRED.PACKED_SINGLE | Retired Streaming SIMD Extensions (SSE) packed-single instructions | This event counts the number of SSE packed-single instructions retired. |
| C7H | 02H | SIMD_INST_RETIRED.SCALAR_SINGLE | Retired Streaming SIMD Extensions (SSE) scalar-single instructions | |
| C7H | 04H | SIMD_INST_RETIRED.PACKED_DOUBLE | Retired Streaming SIMD Extensions 2 (SSE2) packed-double instructions | |
| C7H | 08H | SIMD_INST_RETIRED.SCALAR_DOUBLE | Retired Streaming SIMD Extensions 2 (SSE2) scalar-double instructions. | |
| C7H | 10H | SIMD_INST_RETIRED.VECTOR | Retired Streaming SIMD Extensions 2 (SSE2) vector instructions. | |
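The two ratios defined in the branch misprediction descriptions above can be computed as in the following minimal sketch. The function names and the sample counts are hypothetical; the inputs are assumed to be BR_INST_RETIRED.MISPRED, BR_INST_RETIRED.ANY and INST_RETIRED.ANY counts collected over the same interval.

```python
def _safe_ratio(numerator, denominator):
    """Divide two event counts, returning None when the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator


def branch_misprediction_ratio(mispredicted_branches, retired_branches):
    """BR_INST_RETIRED.MISPRED divided by BR_INST_RETIRED.ANY."""
    return _safe_ratio(mispredicted_branches, retired_branches)


def mispredictions_per_instruction(mispredicted_branches, retired_instructions):
    """Mispredicted branches divided by INST_RETIRED.ANY."""
    return _safe_ratio(mispredicted_branches, retired_instructions)


if __name__ == "__main__":
    # Made-up sample counts, for illustration only.
    print(branch_misprediction_ratio(50, 1000))
    print(mispredictions_per_instruction(50, 10000))
```

These ratios describe how often mispredictions occur, not what they cost. The descriptions point to RESOURCE_STALLS.BR_MISS_CLEAR for measuring the impact.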
Table 19-18. Non-Architectural Performance Events for Intel Atom Processors (Contd.)
| Event Num. | Umask Value | Event Name | Definition | Description and Comment |
| C7H | 1FH | | | This event counts the overall number of SIMD instructions retired. To count each type of SIMD instruction separately, use the following events: SIMD_INST_RETIRED.PACKED_SINGLE, SIMD_INST_RETIRED.SCALAR_SINGLE, SIMD_INST_RETIRED.PACKED_DOUBLE, SIMD_INST_RETIRED.SCALAR_DOUBLE, and SIMD_INST_RETIRED.VECTOR. |
| C8H | 00H | HW_INT_RCV | Hardware interrupts received | This event counts the number of hardware interrupts received by the processor. This event will count twice for dual-pipe micro-ops. |
| CAH | 01H | SIMD_COMP_INST_RETIRED.PACKED_SINGLE | Retired computational Streaming SIMD Extensions (SSE) packed-single instructions. | This event counts the number of computational SSE packed-single instructions retired. Computational instructions perform arithmetic computations, like add, multiply and divide. Instructions that perform load and store operations or logical operations, like XOR, OR, and AND are not counted by this event. |
| CAH | 02H | SIMD_COMP_INST_RETIRED.SCALAR_SINGLE | Retired computational Streaming SIMD Extensions (SSE) scalar-single instructions. | This event counts the number of computational SSE scalar-single instructions retired. Computational instructions perform arithmetic computations, like add, multiply and divide. Instructions that perform load and store operations or logical operations, like XOR, OR, and AND are not counted by this event. |
| CAH | 04H | SIMD_COMP_INST_RETIRED.PACKED_DOUBLE | Retired computational Streaming SIMD Extensions 2 (SSE2) packed-double instructions. | This event counts the number of computational SSE2 packed-double instructions retired. Computational instructions perform arithmetic computations, like add, multiply and divide. Instructions that perform load and store operations or logical operations, like XOR, OR, and AND are not counted by this event. |
| CAH | 08H | SIMD_COMP_INST_RETIRED.SCALAR_DOUBLE | Retired computational Streaming SIMD Extensions 2 (SSE2) scalar-double instructions | This event counts the number of computational SSE2 scalar-double instructions retired. Computational instructions perform arithmetic computations, like add, multiply and divide. Instructions that perform load and store operations or logical operations, like XOR, OR, and AND are not counted by this event. |
| CBH | 01H | MEM_LOAD_RETIRED.L2_HIT | Retired loads that hit the L2 cache (precise event) | This event counts the number of retired load operations that missed the L1 data cache and hit the L2 cache. |
| CBH | 02H | MEM_LOAD_RETIRED.L2_MISS | Retired loads that miss the L2 cache (precise event) | This event counts the number of retired load operations that missed the L2 cache. |
| CBH | 04H | MEM_LOAD_RETIRED.DTLB_MISS | Retired loads that miss the DTLB (precise event) | This event counts the number of retired loads that missed the DTLB. The DTLB miss is not counted if the load operation causes a fault. |
218
Table 19-18. Non-Architectural Performance Events for Intel Atom Processors (Contd.)
Event Num. CDH Umask Value 00H Event Name SIMD_ASSIST Definition SIMD assists invoked Description and Comment This event counts the number of SIMD assists invoked. SIMD assists are invoked when an EMMS instruction is executed after MMX technology code has changed the MMX state in the floating point stack. For example, these assists are required in the following cases: Streaming SIMD Extensions (SSE) instructions: 1. Denormal input when the DAZ (Denormals Are Zeros) flag is off 2. Underflow result when the FTZ (Flush To Zero) flag is off CEH CFH E0H E4H 00H 00H 01H 01H SIMD_INSTR_RETIRED SIMD Instructions retired This event counts the number of SIMD instructions that retired. This event counts the number of saturated arithmetic SIMD instructions that retired. This event counts the number of branch instructions decoded. This event counts the number of byte sequences that were mistakenly detected as taken branch instructions. This results in a BACLEAR event and the BTB is flushed. This occurs mainly after task switches. This event counts the number of times the front end is redirected for a branch prediction, mainly when an early branch prediction is corrected by other branch handling mechanisms in the front-end. This can occur if the code has many branches such that they cannot be consumed by the branch predictor. Each Baclear asserted costs approximately 7 cycles. The effect on total execution time depends on the surrounding code.
SIMD_SAT_INSTR_RETI Saturated arithmetic RED instructions retired BR_INST_DECODED BOGUS_BR Branch instructions decoded Bogus branches
E6H
01H
BACLEARS.ANY
BACLEARS asserted
...
24.1 OVERVIEW
A logical processor uses virtual-machine control data structures (VMCSs) while it is in VMX operation. These manage transitions into and out of VMX non-root operation (VM entries and VM exits) as well as processor behavior in VMX non-root operation. This structure is manipulated by the new instructions VMCLEAR, VMPTRLD, VMREAD, and VMWRITE.
A VMM can use a different VMCS for each virtual machine that it supports. For a virtual machine with multiple logical processors (virtual processors), the VMM can use a different VMCS for each virtual processor.
A logical processor associates a region in memory with each VMCS. This region is called the VMCS region.¹ Software references a specific VMCS using the 64-bit physical address of the region (a VMCS pointer). VMCS pointers must be aligned on a 4-KByte boundary (bits 11:0 must be zero). These pointers must not set bits beyond the processor's physical-address width.¹,²
A logical processor may maintain a number of VMCSs that are active. The processor may optimize VMX operation by maintaining the state of an active VMCS in memory, on the processor, or both. At any given time, at most one of the active VMCSs is the current VMCS. (This document frequently uses the term "the VMCS" to refer to the current VMCS.) The VMLAUNCH, VMREAD, VMRESUME, and VMWRITE instructions operate only on the current VMCS.
The following items describe how a logical processor determines which VMCSs are active and which is current:
• The memory operand of the VMPTRLD instruction is the address of a VMCS. After execution of the instruction, that VMCS is both active and current on the logical processor. Any other VMCS that had been active remains so, but no other VMCS is current.
• The VMCS link pointer field in the current VMCS (see Section 24.4.2) is itself the address of a VMCS. If VM entry is performed successfully with the 1-setting of the "VMCS shadowing" VM-execution control, the VMCS referenced by the VMCS link pointer field becomes active on the logical processor. The identity of the current VMCS does not change.
• The memory operand of the VMCLEAR instruction is also the address of a VMCS. After execution of the instruction, that VMCS is neither active nor current on the logical processor. If the VMCS had been current on the logical processor, the logical processor no longer has a current VMCS.
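The pointer constraints above (4-KByte alignment, no bits beyond the physical-address width, and the additional 63:32 restriction when IA32_VMX_BASIC[48] reads as 1) can be sketched as a pure check. This is a minimal illustration, not code from the manual: the function name is hypothetical, and the caller is assumed to have already obtained the physical-address width from CPUID and bit 48 from the MSR.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if `ptr` satisfies the VMCS-pointer
 * rules described in the text.
 *   phys_addr_width : value of EAX[7:0] after CPUID with 80000008H in EAX
 *   basic_bit48     : value of IA32_VMX_BASIC[48]
 * Both are supplied by the caller, so no privileged instruction is needed.
 */
bool vmcs_pointer_is_valid(uint64_t ptr, unsigned phys_addr_width,
                           bool basic_bit48)
{
    /* Bits 11:0 must be zero (4-KByte alignment). */
    if (ptr & 0xFFFULL)
        return false;

    /* No bit at or above the physical-address width may be set. */
    if (phys_addr_width < 64 && (ptr >> phys_addr_width) != 0)
        return false;

    /* If IA32_VMX_BASIC[48] is 1, bits 63:32 must also be zero. */
    if (basic_bit48 && (ptr >> 32) != 0)
        return false;

    return true;
}
```

A VMM might run such a check before VMPTRLD or VMCLEAR to turn a hard-to-diagnose instruction failure into an early, explicit error.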
The VMPTRST instruction stores the address of the logical processor's current VMCS into a specified memory location (it stores the value FFFFFFFF_FFFFFFFFH if there is no current VMCS).
The launch state of a VMCS determines which VM-entry instruction should be used with that VMCS: the VMLAUNCH instruction requires a VMCS whose launch state is "clear"; the VMRESUME instruction requires a VMCS whose launch state is "launched". A logical processor maintains a VMCS's launch state in the corresponding VMCS region. The following items describe how a logical processor manages the launch state of a VMCS:
• If the launch state of the current VMCS is "clear", successful execution of the VMLAUNCH instruction changes the launch state to "launched".
• The memory operand of the VMCLEAR instruction is the address of a VMCS. After execution of the instruction, the launch state of that VMCS is "clear".
• There are no other ways to modify the launch state of a VMCS (it cannot be modified using VMWRITE) and there is no direct way to discover it (it cannot be read using VMREAD).
Figure 24-1 illustrates the different states of a VMCS. It uses "X" to refer to the VMCS and "Y" to refer to any other VMCS. Thus: "VMPTRLD X" always makes X current and active; "VMPTRLD Y" always makes X not current (because it makes Y current); VMLAUNCH makes the launch state of X "launched" if X was current and its launch state was "clear"; and VMCLEAR X always makes X inactive and not current and makes its launch state "clear". The figure does not illustrate operations that do not modify the VMCS state relative to these parameters (e.g., execution of VMPTRLD X when X is already current). Note that VMCLEAR X makes X "inactive, not current, and clear," even if X's current state is not defined (e.g., even if X has not yet been initialized). See Section 24.11.3.
Because a shadow VMCS (see Section 24.10) cannot be used for VM entry, the launch state of a shadow VMCS is not meaningful. Figure 24-1 does not illustrate all the ways in which a shadow VMCS may be made active.
1. The amount of memory required for a VMCS region is at most 4 KBytes. The exact size is implementation specific and can be determined by consulting the VMX capability MSR IA32_VMX_BASIC to determine the size of the VMCS region (see Appendix A.1).
1. Software can determine a processor's physical-address width by executing CPUID with 80000008H in EAX. The physical-address width is returned in bits 7:0 of EAX.
2. If IA32_VMX_BASIC[48] is read as 1, these pointers must not set any bits in the range 63:32; see Appendix A.1.
[Figure 24-1: state diagram for VMCS X. Transition labels: VMCLEAR X, VMPTRLD X, VMPTRLD Y, VMLAUNCH, Anything Else.]
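The active, current, and launch-state rules described for Figure 24-1 can be mirrored in a small software model. The sketch below is only an illustration of those rules for a single logical processor; the type and function names are hypothetical, and VM entry is assumed to succeed whenever the launch-state requirement is met.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the per-VMCS state tracked by the text. */
struct vmcs_model {
    bool active;    /* active on this logical processor */
    bool launched;  /* launch state: false = "clear", true = "launched" */
};

/* Hypothetical model of one logical processor: at most one current VMCS. */
struct lp_model {
    struct vmcs_model *current;
};

/* VMPTRLD X: X becomes active and current; other VMCSs stay active. */
void model_vmptrld(struct lp_model *lp, struct vmcs_model *x)
{
    x->active = true;
    lp->current = x;
}

/* VMCLEAR X: X becomes inactive, not current, and "clear". */
void model_vmclear(struct lp_model *lp, struct vmcs_model *x)
{
    x->active = false;
    x->launched = false;
    if (lp->current == x)
        lp->current = NULL;
}

/* VMLAUNCH: requires a current VMCS whose launch state is "clear". */
bool model_vmlaunch(struct lp_model *lp)
{
    if (lp->current == NULL || lp->current->launched)
        return false;
    lp->current->launched = true;
    return true;
}

/* VMRESUME: requires a current VMCS whose launch state is "launched". */
bool model_vmresume(const struct lp_model *lp)
{
    return lp->current != NULL && lp->current->launched;
}
```

A model like this is mainly useful in VMM unit tests, where it can flag a VMRESUME issued against a VMCS that was cleared since its last launch.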
24.2
A VMCS region comprises up to 4-KBytes.¹ The format of a VMCS region is given in Table 24-1.
Byte Offset | Contents
0 | Bits 30:0: VMCS revision identifier; Bit 31: shadow-VMCS indicator (see Section 24.10)
4 | VMX-abort indicator
8 | VMCS data (implementation-specific format)
The first 4 bytes of the VMCS region contain the VMCS revision identifier at bits 30:0.² Processors that maintain VMCS data in different formats (see below) use different VMCS revision identifiers. These identifiers enable software to avoid using a VMCS region formatted for one processor on a processor that uses a different format.³ Bit 31 of this 4-byte region indicates whether the VMCS is a shadow VMCS (see Section 24.10).
Software should write the VMCS revision identifier to the VMCS region before using that region for a VMCS. The VMCS revision identifier is never written by the processor; VMPTRLD fails if its operand references a VMCS region whose VMCS revision identifier differs from that used by the processor. (VMPTRLD also fails if the shadow-VMCS indicator is 1 and the processor does not support the 1-setting of the "VMCS shadowing" VM-execution control; see Section 24.6.2.) Software can discover the VMCS revision identifier that a processor uses by reading the VMX capability MSR IA32_VMX_BASIC (see Appendix A.1).
1. The exact size is implementation specific and can be determined by consulting the VMX capability MSR IA32_VMX_BASIC to determine the size of the VMCS region (see Appendix A.1).
2. Earlier versions of this manual specified that the VMCS revision identifier was a 32-bit field. For all processors produced prior to this change, bit 31 of the VMCS revision identifier was 0.
3. Logical processors that use the same VMCS revision identifier use the same size for VMCS regions.
Software should clear or set the shadow-VMCS indicator depending on whether the VMCS is to be an ordinary VMCS or a shadow VMCS (see Section 24.10). VMPTRLD fails if the shadow-VMCS indicator is set and the processor does not support the 1-setting of the "VMCS shadowing" VM-execution control. Software can discover support for this setting by reading the VMX capability MSR IA32_VMX_PROCBASED_CTLS2 (see Appendix A.3.3).
The next 4 bytes of the VMCS region are used for the VMX-abort indicator. The contents of these bits do not control processor operation in any way. A logical processor writes a non-zero value into these bits if a VMX abort occurs (see Section 27.7). Software may also write into this field.
The remainder of the VMCS region is used for VMCS data (those parts of the VMCS that control VMX non-root operation and the VMX transitions). The format of these data is implementation-specific. VMCS data are discussed in Section 24.3 through Section 24.9. To ensure proper behavior in VMX operation, software should maintain the VMCS region and related structures (enumerated in Section 24.11.4) in writeback cacheable memory. Future implementations may allow or require a different memory type¹. Software should consult the VMX capability MSR IA32_VMX_BASIC (see Appendix A.1).
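The header layout in Table 24-1 (revision identifier in bits 30:0 and shadow-VMCS indicator in bit 31 of the first 4 bytes, VMX-abort indicator in the next 4 bytes) can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the region is an ordinary byte buffer, and the caller is assumed to have read the revision identifier from IA32_VMX_BASIC beforehand.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define VMCS_REVISION_MASK 0x7FFFFFFFu  /* bits 30:0 of the first 4 bytes */
#define VMCS_SHADOW_BIT    0x80000000u  /* bit 31: shadow-VMCS indicator  */

/* Compose the first 4 bytes of a VMCS region. */
uint32_t vmcs_first_dword(uint32_t revision_id, bool shadow)
{
    uint32_t value = revision_id & VMCS_REVISION_MASK;
    if (shadow)
        value |= VMCS_SHADOW_BIT;
    return value;
}

/* Write the header at byte offset 0; the rest of the region is untouched. */
void vmcs_region_write_header(uint8_t *region, uint32_t revision_id,
                              bool shadow)
{
    uint32_t first = vmcs_first_dword(revision_id, shadow);
    memcpy(region, &first, sizeof first);
}

/* Read the VMX-abort indicator at byte offset 4 (non-zero after an abort). */
uint32_t vmcs_region_abort_indicator(const uint8_t *region)
{
    uint32_t value;
    memcpy(&value, region + 4, sizeof value);
    return value;
}
```

Per the text, the shadow flag should only be set on processors that support the 1-setting of the "VMCS shadowing" control, and the indicator should not be changed while the VMCS is active.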
24.3
The VMCS data are organized into six logical groups:
• Guest-state area. Processor state is saved into the guest-state area on VM exits and loaded from there on VM entries.
• Host-state area. Processor state is loaded from the host-state area on VM exits.
• VM-execution control fields. These fields control processor behavior in VMX non-root operation. They determine in part the causes of VM exits.
• VM-exit control fields. These fields control VM exits.
• VM-entry control fields. These fields control VM entries.
• VM-exit information fields. These fields receive information on VM exits and describe the cause and the nature of VM exits. On some processors, these fields are read-only.²
The VM-execution control fields, the VM-exit control fields, and the VM-entry control fields are sometimes referred to collectively as VMX controls. ...
1. Alternatively, software may map any of these regions or structures with the UC memory type. Doing so is strongly discouraged unless necessary as it will cause the performance of transitions using those structures to suffer significantly. In addition, the processor will continue to use the memory type reported in the VMX capability MSR IA32_VMX_BASIC with exceptions noted in Appendix A.1.
2. Software can discover whether these fields can be written by reading the VMX capability MSR IA32_VMX_MISC (see Appendix A.6).
Table 24-7 lists the secondary processor-based VM-execution controls. See Chapter 25 for more details of how these controls affect processor behavior in VMX non-root operation.
Name | Description
PAUSE-loop exiting | This control determines whether a series of executions of PAUSE can cause a VM exit (see Section 24.6.13 and Section 25.1.3).
RDRAND exiting | This control determines whether executions of RDRAND cause VM exits.
Enable INVPCID | If this control is 0, any execution of INVPCID causes an invalid-opcode exception (#UD).
Enable VM functions | Setting this control to 1 enables use of the VMFUNC instruction in VMX non-root operation. See Section 25.5.5.
VMCS shadowing | If this control is 1, executions of VMREAD and VMWRITE in VMX non-root operation may access a shadow VMCS (instead of causing VM exits). See Section 24.10 and Section 30.3.
EPT-violation #VE | If this control is 1, EPT violations may cause virtualization exceptions (#VE) instead of VM exits. See Section 25.5.6.
...
...
24.9
The VMCS contains a section of fields that contain information about the most recent VM exit. On some processors, attempts to write to these fields with VMWRITE fail (see "VMWRITE - Write Field to Virtual-Machine Control Structure" in Chapter 30 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3C).¹ ...
24.10
Every VMCS is either an ordinary VMCS or a shadow VMCS. A VMCS's type is determined by the shadow-VMCS indicator in the VMCS region (this is the value of bit 31 of the first 4 bytes of the VMCS region; see Table 24-1): 0 indicates an ordinary VMCS, while 1 indicates a shadow VMCS. Shadow VMCSs are supported only on processors that support the 1-setting of the "VMCS shadowing" VM-execution control (see Section 24.6.2).
A shadow VMCS differs from an ordinary VMCS in two ways:
• An ordinary VMCS can be used for VM entry but a shadow VMCS cannot. Attempts to perform VM entry when the current VMCS is a shadow VMCS fail (see Section 26.1).
• The VMREAD and VMWRITE instructions can be used in VMX non-root operation to access a shadow VMCS but not an ordinary VMCS. This fact results from the following:
  - If the "VMCS shadowing" VM-execution control is 0, execution of the VMREAD and VMWRITE instructions in VMX non-root operation always causes VM exits (see Section 25.1.3).
  - If the "VMCS shadowing" VM-execution control is 1, execution of the VMREAD and VMWRITE instructions in VMX non-root operation can access the VMCS referenced by the VMCS link pointer (see Section 30.3).
  - If the "VMCS shadowing" VM-execution control is 1, VM entry ensures that any VMCS referenced by the VMCS link pointer is a shadow VMCS (see Section 26.3.1.5).
In VMX root operation, both types of VMCSs can be accessed with the VMREAD and VMWRITE instructions.
Software should not modify the shadow-VMCS indicator in the VMCS region of a VMCS that is active. Doing so may cause the VMCS to become corrupted (see Section 24.11.1). Before modifying the shadow-VMCS indicator, software should execute VMCLEAR for the VMCS to ensure that it is not active.
1. Software can discover whether these fields can be written by reading the VMX capability MSR IA32_VMX_MISC (see Appendix A.6).
...
(Software can avoid these hazards by removing any linear-address mappings to a VMCS region before executing a VMPTRLD for that region and by not remapping it until after executing VMCLEAR for that region.)
If a logical processor leaves VMX operation, any VMCSs active on that logical processor may be corrupted (see below). To prevent such corruption of a VMCS that may be used either after a return to VMX operation or on another logical processor, software should execute VMCLEAR for that VMCS before executing the VMXOFF instruction or removing power from the processor (e.g., as part of a transition to the S3 and S4 power states).
This section has identified operations that may cause a VMCS to become corrupted. These operations may cause the VMCS's data to become undefined. Behavior may be unpredictable if that VMCS is used subsequently on any logical processor. The following items detail some hazards of VMCS corruption:
...
• VM entries may fail for unexplained reasons or may load undesired processor state.
• The processor may not correctly support VMX non-root operation as documented in Chapter 25 and may generate unexpected VM exits.
• VM exits may load undesired processor state, save incorrect state into the VMCS, or cause the logical processor to transition to a shutdown state.
given, in 64-bit mode, an operand that sets an encoding bit beyond bit 32. See Chapter 30 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3C, for a description of these instructions.
The structure of the 32-bit encodings of the VMCS components is determined principally by the width of the fields and their function in the VMCS. See Table 24-17.
The following items detail the meaning of the bits in each encoding:
• Field width. Bits 14:13 encode the width of the field.
  - A value of 0 indicates a 16-bit field.
  - A value of 1 indicates a 64-bit field.
  - A value of 2 indicates a 32-bit field.
  - A value of 3 indicates a natural-width field. Such fields have 64 bits on processors that support Intel 64 architecture and 32 bits on processors that do not.
  Fields whose encodings use value 1 are specially treated to allow 32-bit software access to all 64 bits of the field. Such access is allowed by defining, for each such field, an encoding that allows direct access to the high 32 bits of the field. See below.
• Field type. Bits 11:10 encode the type of VMCS field: control, guest-state, host-state, or VM-exit information. (The last category also includes the VM-instruction error field.)
• Index. Bits 9:1 distinguish components with the same field width and type.
• Access type. Bit 0 must be 0 for all fields except for 64-bit fields (those with field-width 1; see above). A VMREAD or VMWRITE using an encoding with this bit cleared to 0 accesses the entire field. For a 64-bit field with field-width 1, a VMREAD or VMWRITE using an encoding with this bit set to 1 accesses only the high 32 bits of the field.
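The bit assignments listed above can be made concrete with a small decoder. The sketch below is illustrative only: the struct and function names are hypothetical, and the field type is returned as a raw 2-bit value because the numeric assignment of types is given in Table 24-17 and is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of a 32-bit VMCS component encoding. */
struct vmcs_encoding {
    unsigned width;        /* bits 14:13: 0=16-bit, 1=64-bit, 2=32-bit, 3=natural */
    unsigned type;         /* bits 11:10: raw field-type value (see Table 24-17) */
    unsigned index;        /* bits 9:1 */
    unsigned access_high;  /* bit 0: 1 = high 32 bits of a 64-bit field */
};

struct vmcs_encoding vmcs_decode(uint32_t enc)
{
    struct vmcs_encoding d;
    d.width = (enc >> 13) & 0x3u;
    d.type = (enc >> 10) & 0x3u;
    d.index = (enc >> 1) & 0x1FFu;
    d.access_high = enc & 0x1u;
    return d;
}

/* Bit 0 must be 0 for every field except 64-bit fields (field-width 1). */
bool vmcs_access_type_is_valid(uint32_t enc)
{
    struct vmcs_encoding d = vmcs_decode(enc);
    return d.access_high == 0 || d.width == 1;
}
```

For example, the arbitrary value 2801H decodes to field-width 1 (64-bit), type 2, index 0, with the high access type, while 4001H is rejected because it sets bit 0 on a 32-bit field.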
Appendix B gives the encodings of all fields in the VMCS.
The following describes the operation of VMREAD and VMWRITE based on processor mode, VMCS-field width, and access type:
• 16-bit fields:
  - A VMREAD returns the value of the field in bits 15:0 of the destination operand; other bits of the destination operand are cleared to 0.
  - A VMWRITE writes the value of bits 15:0 of the source operand into the VMCS field; other bits of the source operand are not used.
• 32-bit fields:
  - A VMREAD returns the value of the field in bits 31:0 of the destination operand; in 64-bit mode, bits 63:32 of the destination operand are cleared to 0.
  - A VMWRITE writes the value of bits 31:0 of the source operand into the VMCS field; in 64-bit mode, bits 63:32 of the source operand are not used.
• 64-bit fields and natural-width fields using the full access type outside IA-32e mode.
  - A VMREAD returns the value of bits 31:0 of the field in its destination operand; bits 63:32 of the field are ignored.
  - A VMWRITE writes the value of its source operand to bits 31:0 of the field and clears bits 63:32 of the field.
• 64-bit fields and natural-width fields using the full access type in 64-bit mode (only on processors that support Intel 64 architecture).
  - A VMREAD returns the value of the field in bits 63:0 of the destination operand.
  - A VMWRITE writes the value of bits 63:0 of the source operand into the VMCS field.
• 64-bit fields using the high access type.
  - A VMREAD returns the value of bits 63:32 of the field in bits 31:0 of the destination operand; in 64-bit mode, bits 63:32 of the destination operand are cleared to 0.
  - A VMWRITE writes the value of bits 31:0 of the source operand to bits 63:32 of the field; in 64-bit mode, bits 63:32 of the source operand are not used.
Software seeking to read a 64-bit field outside IA-32e mode can use VMREAD with the full access type (reading bits 31:0 of the field) and VMREAD with the high access type (reading bits 63:32 of the field); the order of the two VMREAD executions is not important. Software seeking to modify a 64-bit field outside IA-32e mode should first use VMWRITE with the full access type (establishing bits 31:0 of the field while clearing bits 63:32) and then use VMWRITE with the high access type (establishing bits 63:32 of the field). ...
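The ordering rule for 64-bit fields outside IA-32e mode is easy to get wrong, so a software model of the documented semantics may help. The functions below do not execute VMREAD or VMWRITE; they are hypothetical stand-ins that apply the described effect to a plain 64-bit variable.

```c
#include <stdint.h>

/* Model: VMREAD, full access type, outside IA-32e mode (bits 31:0). */
uint32_t model_vmread_full32(const uint64_t *field)
{
    return (uint32_t)*field;
}

/* Model: VMREAD, high access type (bits 63:32 of the field). */
uint32_t model_vmread_high32(const uint64_t *field)
{
    return (uint32_t)(*field >> 32);
}

/* Model: VMWRITE, full access type, outside IA-32e mode.
 * Establishes bits 31:0 and clears bits 63:32 of the field. */
void model_vmwrite_full32(uint64_t *field, uint32_t src)
{
    *field = src;
}

/* Model: VMWRITE, high access type. Establishes bits 63:32 only. */
void model_vmwrite_high32(uint64_t *field, uint32_t src)
{
    *field = (*field & 0xFFFFFFFFULL) | ((uint64_t)src << 32);
}

/* Read a 64-bit field in two halves; the order of the reads is irrelevant. */
uint64_t read_field64(const uint64_t *field)
{
    uint64_t low = model_vmread_full32(field);
    uint64_t high = model_vmread_high32(field);
    return (high << 32) | low;
}

/* Write a 64-bit field: full access first, then high access. */
void write_field64(uint64_t *field, uint64_t value)
{
    model_vmwrite_full32(field, (uint32_t)value);
    model_vmwrite_high32(field, (uint32_t)(value >> 32));
}
```

Reversing the two writes in `write_field64` would lose the upper half, because the full-access write clears bits 63:32 after the high-access write has set them.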
Before executing VMXON, software should write the VMCS revision identifier (see Section 24.2) to the VMXON region. (Specifically, it should write the 31-bit VMCS revision identifier to bits 30:0 of the first 4 bytes of the VMXON region; bit 31 should be cleared to 0.) It need not initialize the VMXON region in any other way. Software should use a separate region for each logical processor and should not access or modify the VMXON region of a logical processor between execution of VMXON and VMXOFF on that logical processor. Doing otherwise may lead to unpredictable behavior (including behaviors identified in Section 24.11.1).
1. The amount of memory required for the VMXON region is the same as that required for a VMCS region. This size is implementation specific and can be determined by consulting the VMX capability MSR IA32_VMX_BASIC (see Appendix A.1).
2. Software can determine a processor's physical-address width by executing CPUID with 80000008H in EAX. The physical-address width is returned in bits 7:0 of EAX.
3. If IA32_VMX_BASIC[48] is read as 1, the VMXON pointer must not set any bits in the range 63:32; see Appendix A.1.
...
In a virtualized environment using VMX, the guest software stack typically runs on a logical processor in VMX non-root operation. This mode of operation is similar to that of ordinary processor operation outside of the virtualized environment. This chapter describes the differences between VMX non-root operation and ordinary processor operation with special attention to causes of VM exits (which bring a logical processor from VMX non-root operation to root operation). The differences between VMX non-root operation and ordinary processor operation are described in the following sections:
• Section 25.1, "Instructions That Cause VM Exits"
• Section 25.2, "Other Causes of VM Exits"
• Section 25.3, "Changes to Instruction Behavior in VMX Non-Root Operation"
• Section 25.4, "Other Changes in VMX Non-Root Operation"
• Section 25.5, "Features Specific to VMX Non-Root Operation"
• Section 25.6, "Unrestricted Guests"
Chapter 24, "Virtual-Machine Control Structures," describes the data control structures that govern VMX non-root operation. Chapter 26, "VM Entries," describes the operation of VM entries by which the processor transitions from VMX root operation to VMX non-root operation. Chapter 27, "VM Exits," describes the operation of VM exits by which the processor transitions from VMX non-root operation to VMX root operation. Chapter 28, "VMX Support for Address Translation," describes two features that support address translation in VMX non-root operation. Chapter 29, "APIC Virtualization and Virtual Interrupts," describes features that support virtualization of interrupts and the Advanced Programmable Interrupt Controller (APIC) in VMX non-root operation. ...
25.1.2
The following instructions cause VM exits when they are executed in VMX non-root operation: CPUID, GETSEC,¹ INVD, and XSETBV. This is also true of instructions introduced with VMX, which include: INVEPT, INVVPID, VMCALL,² VMCLEAR, VMLAUNCH, VMPTRLD, VMPTRST, VMRESUME, VMXOFF, and VMXON.
1. An execution of GETSEC in VMX non-root operation causes a VM exit if CR4.SMXE[Bit 14] = 1 regardless of the value of CPL or RAX. An execution of GETSEC causes an invalid-opcode exception (#UD) if CR4.SMXE[Bit 14] = 0.
25.1.3
Certain instructions cause VM exits in VMX non-root operation depending on the setting of the VM-execution controls. The following instructions can cause "fault-like" VM exits based on the conditions described:
• CLTS. The CLTS instruction causes a VM exit if the bits in position 3 (corresponding to CR0.TS) are set in both the CR0 guest/host mask and the CR0 read shadow.
• HLT. The HLT instruction causes a VM exit if the "HLT exiting" VM-execution control is 1.
• IN, INS/INSB/INSW/INSD, OUT, OUTS/OUTSB/OUTSW/OUTSD. The behavior of each of these instructions is determined by the settings of the "unconditional I/O exiting" and "use I/O bitmaps" VM-execution controls:
  - If both controls are 0, the instruction executes normally.
  - If the "unconditional I/O exiting" VM-execution control is 1 and the "use I/O bitmaps" VM-execution control is 0, the instruction causes a VM exit.
  - If the "use I/O bitmaps" VM-execution control is 1, the instruction causes a VM exit if it attempts to access an I/O port corresponding to a bit set to 1 in the appropriate I/O bitmap (see Section 24.6.4). If an I/O operation "wraps around" the 16-bit I/O-port space (accesses ports FFFFH and 0000H), the I/O instruction causes a VM exit (the "unconditional I/O exiting" VM-execution control is ignored if the "use I/O bitmaps" VM-execution control is 1).
  See Section 25.1.1 for information regarding the priority of VM exits relative to faults that may be caused by the INS and OUTS instructions.
• INVLPG. The INVLPG instruction causes a VM exit if the "INVLPG exiting" VM-execution control is 1.
• INVPCID. The INVPCID instruction causes a VM exit if the "INVLPG exiting" and "enable INVPCID" VM-execution controls are both 1.¹
• LGDT, LIDT, LLDT, LTR, SGDT, SIDT, SLDT, STR. These instructions cause VM exits if the "descriptor-table exiting" VM-execution control is 1.²
• LMSW. In general, the LMSW instruction causes a VM exit if it would write, for any bit set in the low 4 bits of the CR0 guest/host mask, a value different than the corresponding bit in the CR0 read shadow. LMSW never clears bit 0 of CR0 (CR0.PE); thus, LMSW causes a VM exit if either of the following is true:
  - The bits in position 0 (corresponding to CR0.PE) are set in both the CR0 guest/host mask and the source operand, and the bit in position 0 is clear in the CR0 read shadow.
  - For any bit position in the range 3:1, the bit in that position is set in the CR0 guest/host mask and the values of the corresponding bits in the source operand and the CR0 read shadow differ.
• MONITOR. The MONITOR instruction causes a VM exit if the "MONITOR exiting" VM-execution control is 1.
• MOV from CR3. The MOV from CR3 instruction causes a VM exit if the "CR3-store exiting" VM-execution control is 1. The first processors to support the virtual-machine extensions supported only the 1-setting of this control.
• MOV from CR8. The MOV from CR8 instruction causes a VM exit if the "CR8-store exiting" VM-execution control is 1.
2. Under the dual-monitor treatment of SMIs and SMM, executions of VMCALL cause SMM VM exits in VMX root operation outside SMM. See Section 34.15.2.
1. "Enable INVPCID" is a secondary processor-based VM-execution control. If bit 31 of the primary processor-based VM-execution controls is 0, VMX non-root operation functions as if the "enable INVPCID" VM-execution control were 0. See Section 24.6.2.
2. "Descriptor-table exiting" is a secondary processor-based VM-execution control. If bit 31 of the primary processor-based VM-execution controls is 0, VMX non-root operation functions as if the "descriptor-table exiting" VM-execution control were 0. See Section 24.6.2.
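The CLTS and LMSW conditions described above are simple bit tests on the CR0 guest/host mask, the CR0 read shadow, and (for LMSW) the source operand. The following sketch expresses them as predicates. It is an illustration only; the function names are hypothetical, and the mask and shadow values are assumed to have been read from the VMCS by the caller.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE        0x1ULL  /* bit 0 */
#define CR0_TS        0x8ULL  /* bit 3 */
#define CR0_BITS_3_1  0xEULL  /* bits 3:1 */

/* CLTS exits if bit 3 is set in both the guest/host mask and read shadow. */
bool clts_causes_vm_exit(uint64_t cr0_mask, uint64_t cr0_shadow)
{
    return (cr0_mask & cr0_shadow & CR0_TS) != 0;
}

/* LMSW exit conditions; `src` is the 16-bit source operand. */
bool lmsw_causes_vm_exit(uint64_t cr0_mask, uint64_t cr0_shadow, uint16_t src)
{
    /* Bit 0: LMSW can set CR0.PE but never clear it, so only an attempt
     * to set an owned PE bit whose shadow value is 0 causes an exit. */
    if ((cr0_mask & CR0_PE) && (src & CR0_PE) && !(cr0_shadow & CR0_PE))
        return true;

    /* Bits 3:1: any owned bit whose source value differs from the shadow. */
    return (cr0_mask & CR0_BITS_3_1 & ((uint64_t)src ^ cr0_shadow)) != 0;
}
```

With a mask of 0, neither predicate can return true, which matches the rule that bits not set in the guest/host mask do not cause these VM exits.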
MOV to CR0. The MOV to CR0 instruction causes a VM exit unless the value of its source operand matches, for the position of each bit set in the CR0 guest/host mask, the corresponding bit in the CR0 read shadow. (If every bit is clear in the CR0 guest/host mask, MOV to CR0 cannot cause a VM exit.) MOV to CR3. The MOV to CR3 instruction causes a VM exit unless the CR3-load exiting VM-execution control is 0 or the value of its source operand is equal to one of the CR3-target values specified in the VMCS. If the CR3-target count in n, only the first n CR3-target values are considered; if the CR3-target count is 0, MOV to CR3 always causes a VM exit. The first processors to support the virtual-machine extensions supported only the 1-setting of the CR3-load exiting VM-execution control. These processors always consult the CR3-target controls to determine whether an execution of MOV to CR3 causes a VM exit.
MOV to CR4. The MOV to CR4 instruction causes a VM exit unless the value of its source operand matches, for the position of each bit set in the CR4 guest/host mask, the corresponding bit in the CR4 read shadow. MOV to CR8. The MOV to CR8 instruction causes a VM exit if the CR8-load exiting VM-execution control is 1. MOV DR. The MOV DR instruction causes a VM exit if the MOV-DR exiting VM-execution control is 1. Such VM exits represent an exception to the principles identified in Section 25.1.1 in that they take priority over the following: general-protection exceptions based on privilege level; and invalid-opcode exceptions that occur because CR4.DE=1 and the instruction specified access to DR4 or DR5. MWAIT. The MWAIT instruction causes a VM exit if the MWAIT exiting VM-execution control is 1. If this control is 0, the behavior of the MWAIT instruction may be modified (see Section 25.3). PAUSE.The behavior of each of this instruction depends on CPL and the settings of the PAUSE exiting and PAUSE-loop exiting VM-execution controls:1 CPL = 0. If the PAUSE exiting and PAUSE-loop exiting VM-execution controls are both 0, the PAUSE instruction executes normally. If the PAUSE exiting VM-execution control is 1, the PAUSE instruction causes a VM exit (the PAUSEloop exiting VM-execution control is ignored if CPL = 0 and the PAUSE exiting VM-execution control is 1). If the PAUSE exiting VM-execution control is 0 and the PAUSE-loop exiting VM-execution control is 1, the following treatment applies. The processor determines the amount of time between this execution of PAUSE and the previous execution of PAUSE at CPL 0. If this amount of time exceeds the value of the VM-execution control field PLE_Gap, the processor considers this execution to be the first execution of PAUSE in a loop. (It also does so for the first execution of PAUSE at CPL 0 after VM entry.) 
Otherwise, the processor determines the amount of time since the most recent execution of PAUSE that was considered to be the first in a loop. If this amount of time exceeds the value of the VMexecution control field PLE_Window, a VM exit occurs. For purposes of these computations, time is measured based on a counter that runs at the same rate as the timestamp counter (TSC). CPL > 0. If the PAUSE exiting VM-execution control is 0, the PAUSE instruction executes normally. If the PAUSE exiting VM-execution control is 1, the PAUSE instruction causes a VM exit.
The PAUSE-loop exiting VM-execution control is ignored if CPL > 0. RDMSR. The RDMSR instruction causes a VM exit if any of the following are true:
1. PAUSE-loop exiting is a secondary processor-based VM-execution control. If bit 31 of the primary processor-based VM-execution controls is 0, VMX non-root operation functions as if the PAUSE-loop exiting VM-execution control were 0. See Section 24.6.2.
230
The use MSR bitmaps VM-execution control is 0. The value of ECX is not in the ranges 00000000H 00001FFFH and C0000000H C0001FFFH. The value of ECX is in the range 00000000H 00001FFFH and bit n in read bitmap for low MSRs is 1, where n is the value of ECX. The value of ECX is in the range C0000000H C0001FFFH and bit n in read bitmap for high MSRs is 1, where n is the value of ECX & 00001FFFH. See Section 24.6.9 for details regarding how these bitmaps are identified. RDPMC. The RDPMC instruction causes a VM exit if the RDPMC exiting VM-execution control is 1. RDRAND. The RDRAND instruction causes a VM exit if the RDRAND exiting VM-execution control is 1.1 RDTSC. The RDTSC instruction causes a VM exit if the RDTSC exiting VM-execution control is 1. RDTSCP. The RDTSCP instruction causes a VM exit if the RDTSC exiting and enable RDTSCP VM-execution controls are both 1.2 RSM. The RSM instruction causes a VM exit if executed in system-management mode (SMM).3 VMREAD. The VMREAD instruction causes a VM exit if any of the following are true: The VMCS shadowing VM-execution control is 0.4 Bits 63:15 (bits 31:15 outside 64-bit mode) of the register source operand are not all 0. Bit n in VMREAD bitmap is 1, where n is the value of bits 14:0 of the register source operand. See Section 24.6.15 for details regarding how the VMREAD bitmap is identified. If the VMREAD instruction does not cause a VM exit, it reads from the VMCS referenced by the VMCS link pointer. See Chapter 30, VMREADRead Field from Virtual-Machine Control Structure for details of the operation of the VMREAD instruction. VMWRITE. The VMWRITE instruction causes a VM exit if any of the following are true: The VMCS shadowing VM-execution control is 0. Bits 63:15 (bits 31:15 outside 64-bit mode) of the register source operand are not all 0. Bit n in VMWRITE bitmap is 1, where n is the value of bits 14:0 of the register source operand. See Section 24.6.15 for details regarding how the VMWRITE bitmap is identified. 
  If the VMWRITE instruction does not cause a VM exit, it writes to the VMCS referenced by the VMCS link pointer. See Chapter 30, "VMWRITE - Write Field to Virtual-Machine Control Structure" for details of the operation of the VMWRITE instruction.
• WBINVD. The WBINVD instruction causes a VM exit if the "WBINVD exiting" VM-execution control is 1.[5]
• WRMSR. The WRMSR instruction causes a VM exit if any of the following are true:
  - The "use MSR bitmaps" VM-execution control is 0.
  - The value of ECX is not in the ranges 00000000H – 00001FFFH and C0000000H – C0001FFFH.

1. "RDRAND exiting" is a secondary processor-based VM-execution control. If bit 31 of the primary processor-based VM-execution controls is 0, VMX non-root operation functions as if the "RDRAND exiting" VM-execution control were 0. See Section 24.6.2.
2. "Enable RDTSCP" is a secondary processor-based VM-execution control. If bit 31 of the primary processor-based VM-execution controls is 0, VMX non-root operation functions as if the "enable RDTSCP" VM-execution control were 0. See Section 24.6.2.
3. Execution of the RSM instruction outside SMM causes an invalid-opcode exception regardless of whether the processor is in VMX operation. It also does so in VMX root operation in SMM; see Section 34.15.3.
4. "VMCS shadowing" is a secondary processor-based VM-execution control. If bit 31 of the primary processor-based VM-execution controls is 0, VMX non-root operation functions as if the "VMCS shadowing" VM-execution control were 0. See Section 24.6.2.
5. "WBINVD exiting" is a secondary processor-based VM-execution control. If bit 31 of the primary processor-based VM-execution controls is 0, VMX non-root operation functions as if the "WBINVD exiting" VM-execution control were 0. See Section 24.6.2.
  - The value of ECX is in the range 00000000H – 00001FFFH and bit n in write bitmap for low MSRs is 1, where n is the value of ECX.
  - The value of ECX is in the range C0000000H – C0001FFFH and bit n in write bitmap for high MSRs is 1, where n is the value of ECX & 00001FFFH.
  See Section 24.6.9 for details regarding how these bitmaps are identified.
...
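The RDMSR and WRMSR conditions can be summarized in a short C sketch. It is illustrative only: the struct msr_bitmaps type and the function names are hypothetical, the four bitmaps are passed in as separate byte arrays rather than located as Section 24.6.9 describes, and bit n is assumed to live in byte n/8 at bit position n%8.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the four bitmaps; how they are found in
 * memory (Section 24.6.9) is deliberately not modeled here. */
struct msr_bitmaps {
    const uint8_t *read_low;    /* reads,  ECX in 00000000H..00001FFFH */
    const uint8_t *read_high;   /* reads,  ECX in C0000000H..C0001FFFH */
    const uint8_t *write_low;   /* writes, ECX in 00000000H..00001FFFH */
    const uint8_t *write_high;  /* writes, ECX in C0000000H..C0001FFFH */
};

/* Assumption: bit n of a bitmap is bit (n % 8) of byte (n / 8). */
static bool bitmap_bit(const uint8_t *bitmap, uint32_t n)
{
    return ((bitmap[n / 8] >> (n % 8)) & 1u) != 0;
}

/* Returns true if RDMSR (is_write == false) or WRMSR (is_write == true)
 * with the given ECX would cause a VM exit under the listed conditions. */
bool msr_access_causes_vm_exit(bool use_msr_bitmaps,
                               const struct msr_bitmaps *bm,
                               uint32_t ecx, bool is_write)
{
    if (!use_msr_bitmaps)
        return true;                       /* control is 0: always exit */
    if (ecx <= 0x00001FFFu)
        return bitmap_bit(is_write ? bm->write_low : bm->read_low, ecx);
    if (ecx >= 0xC0000000u && ecx <= 0xC0001FFFu)
        return bitmap_bit(is_write ? bm->write_high : bm->read_high,
                          ecx & 0x00001FFFu);
    return true;                           /* ECX outside both ranges */
}
```

A monitor that wants a guest to read one MSR freely would clear the corresponding bit in the read bitmap only; writes to the same MSR would still exit unless the write bitmap is cleared as well.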
25.5
Some VM-execution controls support features that are specific to VMX non-root operation. These are the VMX-preemption timer (Section 25.5.1), the monitor trap flag (Section 25.5.2), translation of guest-physical addresses (Section 25.5.3), APIC virtualization (Section 25.5.4), VM functions (Section 25.5.5), and virtualization exceptions (Section 25.5.6). ...
25.5.4
APIC Virtualization
APIC virtualization is a collection of features that can be used to support the virtualization of interrupts and the Advanced Programmable Interrupt Controller (APIC). When APIC virtualization is enabled, the processor emulates many accesses to the APIC, tracks the state of the virtual APIC, and delivers virtual interrupts, all in VMX non-root operation without a VM exit. Details of APIC virtualization are given in Chapter 29. ...
25.5.5.3
EPTP Switching
EPTP switching is VM function 0. This VM function allows software in VMX non-root operation to load a new value for the EPT pointer (EPTP), thereby establishing a different EPT paging-structure hierarchy (see Section 28.2 for details of the operation of EPT). Software is limited to selecting from a list of potential EPTP values configured in advance by software in VMX root operation.
Specifically, the value of ECX is used to select an entry from the EPTP list, the 4-KByte structure referenced by the EPTP-list address (see Section 24.6.14; because this structure contains 512 8-Byte entries, VMFUNC causes a VM exit if ECX ≥ 512). If the selected entry is a valid EPTP value (it would not cause VM entry to fail; see Section 26.2.1.1), it is stored in the EPTP field of the current VMCS and is used for subsequent accesses using guest-physical addresses. The following pseudocode provides details:
IF ECX ≥ 512
    THEN VM exit;
    ELSE
        tent_EPTP ← 8 bytes from EPTP-list address + 8 * ECX;
        IF tent_EPTP is not a valid EPTP value (would cause VM entry to fail if in EPTP)
            THEN VM exit;
            ELSE
                write tent_EPTP to the EPTP field in the current VMCS;
                use tent_EPTP as the new EPTP value for address translation;
                IF processor supports the 1-setting of the "EPT-violation #VE" VM-execution control
                    THEN
                        write ECX[15:0] to EPTP-index field in current VMCS;
                        use ECX[15:0] as EPTP index for subsequent EPT-violation virtualization exceptions (see Section
Execution of the EPTP-switching VM function does not modify the state of any registers; no flags are modified.
As noted in Section 25.5.5.2, an execution of the EPTP-switching VM function that causes a VM exit (as specified above) uses the basic exit reason 59, indicating "VMFUNC". The length of the VMFUNC instruction is saved into the VM-exit instruction-length field. No additional VM-exit information is provided.
An execution of VMFUNC that loads EPTP from the EPTP list (and thus does not cause a fault or VM exit) is called an EPTP-switching VMFUNC. After an EPTP-switching VMFUNC, control passes to the next instruction. The logical processor starts creating and using guest-physical and combined mappings associated with the new value of bits 51:12 of EPTP; the combined mappings created and used are associated with the current VPID and PCID (these are not changed by VMFUNC).[1] If the "enable VPID" VM-execution control is 0, an EPTP-switching VMFUNC invalidates combined mappings associated with VPID 0000H (for all PCIDs and for all EP4TA values, where EP4TA is the value of bits 51:12 of EPTP).
Because an EPTP-switching VMFUNC may change the translation of guest-physical addresses, it may affect use of the guest-physical address in CR3. The EPTP-switching VMFUNC cannot itself cause a VM exit due to an EPT violation or an EPT misconfiguration due to the translation of that guest-physical address through the new EPT paging structures. The following items provide details that apply if CR0.PG = 1:
• If 32-bit paging or IA-32e paging is in use (either CR4.PAE = 0 or IA32_EFER.LMA = 1), the next memory access with a linear address uses the translation of the guest-physical address in CR3 through the new EPT paging structures. As a result, this access may cause a VM exit due to an EPT violation or an EPT misconfiguration encountered during that translation.
• If PAE paging is in use (CR4.PAE = 1 and IA32_EFER.LMA = 0), an EPTP-switching VMFUNC does not load the four page-directory-pointer-table entries (PDPTEs) from the guest-physical address in CR3. The logical processor continues to use the four guest-physical addresses already present in the PDPTEs. The guest-physical address in CR3 is not translated through the new EPT paging structures (until some operation that would load the PDPTEs).
  The EPTP-switching VMFUNC cannot itself cause a VM exit due to an EPT violation or an EPT misconfiguration encountered during the translation of a guest-physical address in any of the PDPTEs. A subsequent memory access with a linear address uses the translation of the guest-physical address in the appropriate PDPTE through the new EPT paging structures. As a result, such an access may cause a VM exit due to an EPT violation or an EPT misconfiguration encountered during that translation.
If an EPTP-switching VMFUNC establishes an EPTP value that enables accessed and dirty flags for EPT (by setting bit 6), subsequent memory accesses may fail to set those flags as specified if there has been no appropriate execution of INVEPT since the last use of an EPTP value that does not enable accessed and dirty flags for EPT (because bit 6 is clear) and that is identical to the new value on bits 51:12.
If the processor supports the 1-setting of the "EPT-violation #VE" VM-execution control, an EPTP-switching VMFUNC loads the value in ECX[15:0] into the EPTP-index field in the current VMCS. Subsequent EPT-violation virtualization exceptions will save this value into the virtualization-exception information area (see Section 25.5.6.2).
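The EPTP-switching steps described in this section can be modeled in a few lines of C. This is a sketch under stated assumptions, not processor behavior: struct ept_caps, struct vcpu_model, and both function names are hypothetical, the EPTP list is an ordinary array instead of a 4-KByte page in physical memory, and the validity test mirrors only the EPTP checks listed for VM entry in Section 26.2.1.1.

```c
#include <stdbool.h>
#include <stdint.h>

#define EPTP_LIST_ENTRIES 512u

/* Hypothetical summary of the capabilities the EPTP checks depend on. */
struct ept_caps {
    uint8_t  supported_memtypes;  /* bit t set: EPT memory type t supported */
    bool     ad_flags_supported;  /* accessed/dirty flags for EPT supported */
    unsigned phys_addr_width;     /* N, the physical-address width          */
};

/* Hypothetical slice of VMCS state touched by EPTP switching. */
struct vcpu_model {
    uint64_t eptp;
    uint16_t eptp_index;
    bool     ve_control_supported; /* 1-setting of "EPT-violation #VE" */
};

/* True if the value would pass the EPTP checks performed by VM entry. */
bool eptp_is_valid(uint64_t eptp, const struct ept_caps *caps)
{
    unsigned memtype = (unsigned)(eptp & 0x7u);           /* bits 2:0 */
    if (((caps->supported_memtypes >> memtype) & 1u) == 0)
        return false;
    if (((eptp >> 3) & 0x7u) != 3)                         /* bits 5:3 */
        return false;
    if (((eptp >> 6) & 1u) && !caps->ad_flags_supported)   /* bit 6 */
        return false;
    if ((eptp & 0xF80ull) != 0)                            /* bits 11:7 */
        return false;
    if (caps->phys_addr_width < 64 &&
        (eptp >> caps->phys_addr_width) != 0)              /* bits 63:N */
        return false;
    return true;
}

/* Returns true if the switch completes, false where a VM exit would occur. */
bool vmfunc_eptp_switch(struct vcpu_model *v,
                        const uint64_t eptp_list[EPTP_LIST_ENTRIES],
                        uint32_t ecx, const struct ept_caps *caps)
{
    if (ecx >= EPTP_LIST_ENTRIES)
        return false;
    uint64_t tent_eptp = eptp_list[ecx];
    if (!eptp_is_valid(tent_eptp, caps))
        return false;
    v->eptp = tent_eptp;
    if (v->ve_control_supported)
        v->eptp_index = (uint16_t)(ecx & 0xFFFFu);
    return true;
}
```

Note that a failed switch leaves the modeled EPTP and EPTP index untouched, matching the statement that the only effect of a failing VMFUNC is the VM exit.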
25.5.6
Virtualization Exceptions
A virtualization exception is a new processor exception. It uses vector 20 and is abbreviated #VE.
1. If the enable VPID VM-execution control is 0, the current VPID is 0000H; if CR4.PCIDE = 0, the current PCID is 000H.
A virtualization exception can occur only in VMX non-root operation. Virtualization exceptions occur only with certain settings of certain VM-execution controls. Generally, these settings imply that certain conditions that would normally cause VM exits instead cause virtualization exceptions.
In particular, the 1-setting of the "EPT-violation #VE" VM-execution control causes some EPT violations to generate virtualization exceptions instead of VM exits. Section 25.5.6.1 provides the details of how the processor determines whether an EPT violation causes a virtualization exception or a VM exit.
When the processor encounters a virtualization exception, it saves information about the exception to the virtualization-exception information area; see Section 25.5.6.2. After saving virtualization-exception information, the processor delivers a virtualization exception as it would any other exception; see Section 25.5.6.3 for details.
25.5.6.1
If the "EPT-violation #VE" VM-execution control is 0 (e.g., on processors that do not support this feature), EPT violations always cause VM exits.[1] If instead the control is 1, certain EPT violations may be converted to cause virtualization exceptions instead; such EPT violations are convertible.
The values of certain EPT paging-structure entries determine which EPT violations are convertible. Specifically, bit 63 of certain EPT paging-structure entries may be defined to mean suppress #VE:
• If bits 2:0 of an EPT paging-structure entry are all 0, the entry is not present. If the processor encounters such an entry while translating a guest-physical address, it causes an EPT violation. The EPT violation is convertible if and only if bit 63 of the entry is 0.
• If bits 2:0 of an EPT paging-structure entry are not all 0, the following cases apply:
  - If the value of the EPT paging-structure entry is not supported, the entry is misconfigured. If the processor encounters such an entry while translating a guest-physical address, it causes an EPT misconfiguration (not an EPT violation). EPT misconfigurations always cause VM exits.
  - If the value of the EPT paging-structure entry is supported, the following cases apply:
    - If bit 7 of the entry is 1, or if the entry is an EPT PTE, the entry maps a page. If the processor uses such an entry to translate a guest-physical address, and if an access to that address causes an EPT violation, the EPT violation is convertible if and only if bit 63 of the entry is 0.
    - If bit 7 of the entry is 0 and the entry is not an EPT PTE, the entry references another EPT paging structure. The processor does not use the value of bit 63 of the entry to determine whether any subsequent EPT violation is convertible.
If an access to a guest-physical address causes an EPT violation, bit 63 of exactly one of the EPT paging-structure entries used to translate that address is used to determine whether the EPT violation is convertible: either an entry that is not present (if the guest-physical address does not translate to a physical address) or an entry that maps a page (if it does).
A convertible EPT violation instead causes a virtualization exception if the following all hold:
• CR0.PE = 1;
• the logical processor is not in the process of delivering an event through the IDT; and
• the 32 bits at offset 4 in the virtualization-exception information area are all 0.
Delivery of virtualization exceptions writes the value FFFFFFFFH to offset 4 in the virtualization-exception information area (see Section 25.5.6.2). Thus, once a virtualization exception occurs, another can occur only if software clears this field.
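The decision between a VM exit and a virtualization exception can be expressed as a small C function. The sketch below assumes the caller has already identified the one deciding EPT paging-structure entry (the not-present entry or the entry that maps the page); the type and function names are hypothetical, and the busy word stands in for the 32 bits at offset 4 of the information area.

```c
#include <stdbool.h>
#include <stdint.h>

enum ept_violation_result { EPT_VIOLATION_VM_EXIT, EPT_VIOLATION_VE };

/* Hypothetical inputs to the decision. */
struct ept_violation_ctx {
    bool      ve_control;        /* "EPT-violation #VE" VM-execution control */
    uint64_t  deciding_entry;    /* entry whose bit 63 means "suppress #VE"  */
    bool      cr0_pe;            /* CR0.PE                                   */
    bool      delivering_event;  /* currently delivering an event via IDT    */
    uint32_t *busy_word;         /* 32 bits at offset 4 of the info area     */
};

enum ept_violation_result classify_ept_violation(struct ept_violation_ctx *c)
{
    /* Convertible only if the control is 1 and bit 63 of the entry is 0. */
    bool convertible = c->ve_control &&
                       ((c->deciding_entry >> 63) & 1u) == 0;

    if (convertible && c->cr0_pe && !c->delivering_event &&
        *c->busy_word == 0) {
        /* Delivery writes FFFFFFFFH, blocking further #VE until
         * software clears the field. */
        *c->busy_word = 0xFFFFFFFFu;
        return EPT_VIOLATION_VE;
    }
    return EPT_VIOLATION_VM_EXIT;
}
```

The second of two back-to-back convertible violations therefore falls back to a VM exit unless the guest handler has cleared the word in between.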
25.5.6.2
Virtualization-Exception Information
Virtualization exceptions save data into the virtualization-exception information area (see Section 24.6.16). Table 25-1 enumerates the data saved and the format of the area.
25.5.6.3
After saving virtualization-exception information, the processor treats a virtualization exception as it does other exceptions:
• If bit 20 (#VE) is 1 in the exception bitmap in the VMCS, a virtualization exception causes a VM exit (see below). If the bit is 0, the virtualization exception is delivered using gate descriptor 20 in the IDT.
• Virtualization exceptions produce no error code. Delivery of a virtualization exception pushes no error code on the stack.
• With respect to double faults, virtualization exceptions have the same severity as page faults. If delivery of a virtualization exception encounters a nested fault that is either contributory or a page fault, a double fault (#DF) is generated. See Chapter 6, "Interrupt 8 - Double Fault Exception (#DF)" in Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A.
  It is not possible for a virtualization exception to be encountered while delivering another exception (see Section 25.5.6.1).
If a virtualization exception causes a VM exit directly (because bit 20 is 1 in the exception bitmap), information about the exception is saved normally in the VM-exit interruption information field in the VMCS (see Section 27.2.2). Specifically, the event is reported as a hardware exception with vector 20 and no error code. Bit 12 of the field (NMI unblocking due to IRET) is set normally.
If a virtualization exception causes a VM exit indirectly (because bit 20 is 0 in the exception bitmap and delivery of the exception generates an event that causes a VM exit), information about the exception is saved normally in the IDT-vectoring information field in the VMCS (see Section 27.2.3). Specifically, the event is reported as a hardware exception with vector 20 and no error code. ...
Change bars show changes to Chapter 26 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3C: System Programming Guide, Part 3.
...
26.1
Before a VM entry commences, the current state of the logical processor is checked in the following order:
1. If the logical processor is in virtual-8086 mode or compatibility mode, an invalid-opcode exception is generated.
2. If the current privilege level (CPL) is not zero, a general-protection exception is generated.
3. If there is no current VMCS, RFLAGS.CF is set to 1 and control passes to the next instruction.
4. If there is a current VMCS but the current VMCS is a shadow VMCS (see Section 24.10), RFLAGS.CF is set to 1 and control passes to the next instruction.
5. If there is a current VMCS that is not a shadow VMCS, the following conditions are evaluated in order; any of these cause VM entry to fail:
   a. if there is MOV-SS blocking (see Table 24-3)
   b. if the VM entry is invoked by VMLAUNCH and the VMCS launch state is not clear
   c. if the VM entry is invoked by VMRESUME and the VMCS launch state is not launched
   If any of these checks fail, RFLAGS.ZF is set to 1 and control passes to the next instruction. An error number indicating the cause of the failure is stored in the VM-instruction error field. See Chapter 30 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3C for the error numbers.
...
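Because the ordering of these checks determines which outcome software observes, it can help to see them as a single function that returns on the first failing condition. The following C sketch is a model only; the struct, enum, and function names are hypothetical, and the error number stored for ZF failures is not modeled.

```c
#include <stdbool.h>

enum launch_state { LAUNCH_STATE_CLEAR, LAUNCH_STATE_LAUNCHED };

enum precheck_result {
    PRECHECK_INVALID_OPCODE,      /* step 1: #UD                         */
    PRECHECK_GENERAL_PROTECTION,  /* step 2: #GP                         */
    PRECHECK_FAIL_CF,             /* steps 3-4: RFLAGS.CF set to 1       */
    PRECHECK_FAIL_ZF,             /* step 5: RFLAGS.ZF set to 1          */
    PRECHECK_PROCEED              /* continue with the remaining checks  */
};

/* Hypothetical snapshot of the state consulted by the checks. */
struct entry_context {
    bool v8086_or_compat_mode;
    unsigned cpl;
    bool has_current_vmcs;
    bool current_vmcs_is_shadow;
    bool mov_ss_blocking;
    bool is_vmlaunch;             /* false means VMRESUME */
    enum launch_state launch_state;
};

enum precheck_result vm_entry_precheck(const struct entry_context *c)
{
    if (c->v8086_or_compat_mode)
        return PRECHECK_INVALID_OPCODE;
    if (c->cpl != 0)
        return PRECHECK_GENERAL_PROTECTION;
    if (!c->has_current_vmcs)
        return PRECHECK_FAIL_CF;
    if (c->current_vmcs_is_shadow)
        return PRECHECK_FAIL_CF;
    if (c->mov_ss_blocking)
        return PRECHECK_FAIL_ZF;
    if (c->is_vmlaunch && c->launch_state != LAUNCH_STATE_CLEAR)
        return PRECHECK_FAIL_ZF;
    if (!c->is_vmlaunch && c->launch_state != LAUNCH_STATE_LAUNCHED)
        return PRECHECK_FAIL_ZF;
    return PRECHECK_PROCEED;
}
```

For example, a VMLAUNCH at CPL 3 with a shadow VMCS current reports the general-protection exception, since the privilege check precedes every VMCS check.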
26.2.1.1
VM entries perform the following checks on the VM-execution control fields:[1]
• Reserved bits in the pin-based VM-execution controls must be set properly. Software may consult the VMX capability MSRs to determine the proper settings (see Appendix A.3.1).
• Reserved bits in the primary processor-based VM-execution controls must be set properly. Software may consult the VMX capability MSRs to determine the proper settings (see Appendix A.3.2).
• If the "activate secondary controls" primary processor-based VM-execution control is 1, reserved bits in the secondary processor-based VM-execution controls must be cleared. Software may consult the VMX capability MSRs to determine which bits are reserved (see Appendix A.3.3).
  If the "activate secondary controls" primary processor-based VM-execution control is 0 (or if the processor does not support the 1-setting of that control), no checks are performed on the secondary processor-based VM-execution controls. The logical processor operates as if all the secondary processor-based VM-execution controls were 0.
• The CR3-target count must not be greater than 4. Future processors may support a different number of CR3-target values. Software should read the VMX capability MSR IA32_VMX_MISC to determine the number of values supported (see Appendix A.6).
1. If the "activate secondary controls" primary processor-based VM-execution control is 0, VM entry operates as if each secondary processor-based VM-execution control were 0.
• If the "use I/O bitmaps" VM-execution control is 1, bits 11:0 of each I/O-bitmap address must be 0. Neither address should set any bits beyond the processor's physical-address width.[1][2]
• If the "use MSR bitmaps" VM-execution control is 1, bits 11:0 of the MSR-bitmap address must be 0. The address should not set any bits beyond the processor's physical-address width.[3]
• If the "use TPR shadow" VM-execution control is 1, the virtual-APIC address must satisfy the following checks:
  - Bits 11:0 of the address must be 0.
  - The address should not set any bits beyond the processor's physical-address width.[4]
  If all of the above checks are satisfied and the "use TPR shadow" VM-execution control is 1, bytes 3:1 of VTPR (see Section 29.1.1) may be cleared (behavior may be implementation-specific). The clearing of these bytes may occur even if the VM entry fails. This is true either if the failure causes control to pass to the instruction following the VM-entry instruction or if it causes processor state to be loaded from the host-state area of the VMCS.
• If the "use TPR shadow" VM-execution control is 1 and the "virtual-interrupt delivery" VM-execution control is 0, bits 31:4 of the TPR threshold VM-execution control field must be 0.[5]
• The following check is performed if the "use TPR shadow" VM-execution control is 1 and the "virtualize APIC accesses" and "virtual-interrupt delivery" VM-execution controls are both 0: the value of bits 3:0 of the TPR threshold VM-execution control field should not be greater than the value of bits 7:4 of VTPR (see Section 29.1.1).
• If the "NMI exiting" VM-execution control is 0, the "virtual NMIs" VM-execution control must be 0.
• If the "virtual NMIs" VM-execution control is 0, the "NMI-window exiting" VM-execution control must be 0.
• If the "virtualize APIC accesses" VM-execution control is 1, the APIC-access address must satisfy the following checks:
  - Bits 11:0 of the address must be 0.
  - The address should not set any bits beyond the processor's physical-address width.[6]
• If the "use TPR shadow" VM-execution control is 0, the following VM-execution controls must also be 0: "virtualize x2APIC mode", "APIC-register virtualization", and "virtual-interrupt delivery".[7]
• If the "virtualize x2APIC mode" VM-execution control is 1, the "virtualize APIC accesses" VM-execution control must be 0.
• If the "virtual-interrupt delivery" VM-execution control is 1, the "external-interrupt exiting" VM-execution control must be 1.
• If the "process posted interrupts" VM-execution control is 1, the following must be true:[8]
  - The "virtual-interrupt delivery" VM-execution control is 1.
1. Software can determine a processor's physical-address width by executing CPUID with 80000008H in EAX. The physical-address width is returned in bits 7:0 of EAX.
2. If IA32_VMX_BASIC[48] is read as 1, these addresses must not set any bits in the range 63:32; see Appendix A.1.
3. If IA32_VMX_BASIC[48] is read as 1, this address must not set any bits in the range 63:32; see Appendix A.1.
4. If IA32_VMX_BASIC[48] is read as 1, this address must not set any bits in the range 63:32; see Appendix A.1.
5. "Virtual-interrupt delivery" is a secondary processor-based VM-execution control. If bit 31 of the primary processor-based VM-execution controls is 0, VM entry functions as if the "virtual-interrupt delivery" VM-execution control were 0. See Section 24.6.2.
6. If IA32_VMX_BASIC[48] is read as 1, this address must not set any bits in the range 63:32; see Appendix A.1.
7. "Virtualize x2APIC mode" and "APIC-register virtualization" are secondary processor-based VM-execution controls. If bit 31 of the primary processor-based VM-execution controls is 0, VM entry functions as if these controls were 0. See Section 24.6.2.
8. "Process posted interrupts" is a secondary processor-based VM-execution control. If bit 31 of the primary processor-based VM-execution controls is 0, VM entry functions as if the "process posted interrupts" VM-execution control were 0. See Section 24.6.2.
  - The "acknowledge interrupt on exit" VM-exit control is 1.
  - The posted-interrupt notification vector has a value in the range 0–255 (bits 15:8 are all 0).
  - Bits 5:0 of the posted-interrupt descriptor address are all 0.
  - The posted-interrupt descriptor address does not set any bits beyond the processor's physical-address width.[1]
• If the "enable VPID" VM-execution control is 1, the value of the VPID VM-execution control field must not be 0000H.[2]
• If the "enable EPT" VM-execution control is 1, the EPTP VM-execution control field (see Table 24-8 in Section 24.6.11) must satisfy the following checks:[3]
  - The EPT memory type (bits 2:0) must be a value supported by the processor as indicated in the IA32_VMX_EPT_VPID_CAP MSR (see Appendix A.10).
  - Bits 5:3 (1 less than the EPT page-walk length) must be 3, indicating an EPT page-walk length of 4; see Section 28.2.2.
  - Bit 6 (enable bit for accessed and dirty flags for EPT) must be 0 if bit 21 of the IA32_VMX_EPT_VPID_CAP MSR (see Appendix A.10) is read as 0, indicating that the processor does not support accessed and dirty flags for EPT.
  - Reserved bits 11:7 and 63:N (where N is the processor's physical-address width) must all be 0.
• If the "unrestricted guest" VM-execution control is 1, the "enable EPT" VM-execution control must also be 1.[4]
• If the "enable VM functions" processor-based VM-execution control is 1, reserved bits in the VM-function controls must be clear.[5] Software may consult the VMX capability MSRs to determine which bits are reserved (see Appendix A.11). In addition, the following check is performed based on the setting of bits in the VM-function controls (see Section 24.6.14):
  - If the "EPTP switching" VM-function control is 1, the "enable EPT" VM-execution control must also be 1. In addition, the EPTP-list address must satisfy the following checks:
    - Bits 11:0 of the address must be 0.
    - The address must not set any bits beyond the processor's physical-address width.
  If the "enable VM functions" processor-based VM-execution control is 0, no checks are performed on the VM-function controls.
• If the "VMCS shadowing" VM-execution control is 1, the VMREAD-bitmap and VMWRITE-bitmap addresses must each satisfy the following checks:[6]
  - Bits 11:0 of the address must be 0.
  - The address must not set any bits beyond the processor's physical-address width.

1. If IA32_VMX_BASIC[48] is read as 1, this address must not set any bits in the range 63:32; see Appendix A.1.
2. "Enable VPID" is a secondary processor-based VM-execution control. If bit 31 of the primary processor-based VM-execution controls is 0, VM entry functions as if the "enable VPID" VM-execution control were 0. See Section 24.6.2.
3. "Enable EPT" is a secondary processor-based VM-execution control. If bit 31 of the primary processor-based VM-execution controls is 0, VM entry functions as if the "enable EPT" VM-execution control were 0. See Section 24.6.2.
4. "Unrestricted guest" and "enable EPT" are both secondary processor-based VM-execution controls. If bit 31 of the primary processor-based VM-execution controls is 0, VM entry functions as if both these controls were 0. See Section 24.6.2.
5. "Enable VM functions" is a secondary processor-based VM-execution control. If bit 31 of the primary processor-based VM-execution controls is 0, VM entry functions as if the "enable VM functions" VM-execution control were 0. See Section 24.6.2.
6. "VMCS shadowing" is a secondary processor-based VM-execution control. If bit 31 of the primary processor-based VM-execution controls is 0, VM entry functions as if the "VMCS shadowing" VM-execution control were 0. See Section 24.6.2.
• If the "EPT-violation #VE" VM-execution control is 1, the virtualization-exception information address must satisfy the following checks:[1]
  - Bits 11:0 of the address must be 0.
  - The address must not set any bits beyond the processor's physical-address width.
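Many of the address checks in this section share one pattern: the address must be 4-KByte aligned and must fit within the physical-address width, and the footnotes add that some addresses must also fit in 32 bits when IA32_VMX_BASIC[48] is read as 1. A hedged C helper for that pattern might look as follows; the function name and the limit_to_32_bits parameter are hypothetical, and the caller is assumed to have obtained the width from CPUID.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if addr passes the common page-address checks.
 * phys_addr_width:  the processor's physical-address width in bits.
 * limit_to_32_bits: pass true where a footnote applies and
 *                   IA32_VMX_BASIC[48] is read as 1. */
bool vmcs_page_address_ok(uint64_t addr, unsigned phys_addr_width,
                          bool limit_to_32_bits)
{
    if ((addr & 0xFFFu) != 0)                     /* bits 11:0 must be 0 */
        return false;
    if (phys_addr_width < 64 && (addr >> phys_addr_width) != 0)
        return false;                             /* bits beyond the width */
    if (limit_to_32_bits && (addr >> 32) != 0)    /* bits 63:32 must be 0 */
        return false;
    return true;
}
```

The posted-interrupt descriptor address is the exception to the alignment rule in this list, since only its bits 5:0 are required to be 0, so it would need a separate check.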
...
26.3.1.5
The following checks are performed on fields in the guest-state area corresponding to non-register state:
• Activity state.
  - The activity-state field must contain a value in the range 0–3, indicating an activity state supported by the implementation (see Section 24.4.2). Future processors may include support for other activity states. Software should read the VMX capability MSR IA32_VMX_MISC (see Appendix A.6) to determine what activity states are supported.
  - The activity-state field must not indicate the HLT state if the DPL (bits 6:5) in the access-rights field for SS is not 0.[2]
  - The activity-state field must indicate the active state if the interruptibility-state field indicates blocking by either MOV-SS or by STI (if either bit 0 or bit 1 in that field is 1).
  - If the valid bit (bit 31) in the VM-entry interruption-information field is 1, the interruption to be delivered (as defined by interruption type and vector) must not be one that would normally be blocked while a logical processor is in the activity state corresponding to the contents of the activity-state field. The following items enumerate the interruptions (as specified in the VM-entry interruption-information field) whose injection is allowed for the different activity states:
    - Active. Any interruption is allowed.
    - HLT. The only events allowed are the following:
      - Those with interruption type external interrupt or non-maskable interrupt (NMI).
      - Those with interruption type hardware exception and vector 1 (debug exception) or vector 18 (machine-check exception).
      - Those with interruption type other event and vector 0 (pending MTF VM exit).
      See Table 24-13 in Section 24.8.3 for details regarding the format of the VM-entry interruption-information field.
    - Shutdown. Only NMIs and machine-check exceptions are allowed.
    - Wait-for-SIPI. No interruptions are allowed.
  - The activity-state field must not indicate the wait-for-SIPI state if the "entry to SMM" VM-entry control is 1.
• Interruptibility state.
  - The reserved bits (bits 31:4) must be 0.
  - The field cannot indicate blocking by both STI and MOV SS (bits 0 and 1 cannot both be 1).
  - Bit 0 (blocking by STI) must be 0 if the IF flag (bit 9) is 0 in the RFLAGS field.
1. "EPT-violation #VE" is a secondary processor-based VM-execution control. If bit 31 of the primary processor-based VM-execution controls is 0, VM entry functions as if the "EPT-violation #VE" VM-execution control were 0. See Section 24.6.2.
2. As noted in Section 24.4.1, SS.DPL corresponds to the logical processor's current privilege level (CPL).
  - Bit 0 (blocking by STI) and bit 1 (blocking by MOV-SS) must both be 0 if the valid bit (bit 31) in the VM-entry interruption-information field is 1 and the interruption type (bits 10:8) in that field has value 0, indicating external interrupt.
  - Bit 1 (blocking by MOV-SS) must be 0 if the valid bit (bit 31) in the VM-entry interruption-information field is 1 and the interruption type (bits 10:8) in that field has value 2, indicating non-maskable interrupt (NMI).
  - Bit 2 (blocking by SMI) must be 0 if the processor is not in SMM.
  - Bit 2 (blocking by SMI) must be 1 if the "entry to SMM" VM-entry control is 1.
  - A processor may require bit 0 (blocking by STI) to be 0 if the valid bit (bit 31) in the VM-entry interruption-information field is 1 and the interruption type (bits 10:8) in that field has value 2, indicating NMI. Other processors may not make this requirement.
  - Bit 3 (blocking by NMI) must be 0 if the "virtual NMIs" VM-execution control is 1, the valid bit (bit 31) in the VM-entry interruption-information field is 1, and the interruption type (bits 10:8) in that field has value 2 (indicating NMI).
NOTE
If the "virtual NMIs" VM-execution control is 0, there is no requirement that bit 3 be 0 if the valid bit in the VM-entry interruption-information field is 1 and the interruption type in that field has value 2.
• Pending debug exceptions.
  - Bits 11:4, bit 13, and bits 63:15 (bits 31:15 on processors that do not support Intel 64 architecture) must be 0.
  - The following checks are performed if any of the following holds: (1) the interruptibility-state field indicates blocking by STI (bit 0 in that field is 1); (2) the interruptibility-state field indicates blocking by MOV SS (bit 1 in that field is 1); or (3) the activity-state field indicates HLT:
    - Bit 14 (BS) must be 1 if the TF flag (bit 8) in the RFLAGS field is 1 and the BTF flag (bit 1) in the IA32_DEBUGCTL field is 0.
    - Bit 14 (BS) must be 0 if the TF flag (bit 8) in the RFLAGS field is 0 or the BTF flag (bit 1) in the IA32_DEBUGCTL field is 1.
• VMCS link pointer. The following checks apply if the field contains a value other than FFFFFFFF_FFFFFFFFH:
  - Bits 11:0 must be 0.
  - Bits beyond the processor's physical-address width must be 0.[1][2]
  - The 4 bytes located in memory referenced by the value of the field (as a physical address) must satisfy the following:
    - Bits 30:0 must contain the processor's VMCS revision identifier (see Section 24.2).[3]
    - Bit 31 must contain the setting of the "VMCS shadowing" VM-execution control.[4] This implies that the referenced VMCS is a shadow VMCS (see Section 24.10) if and only if the "VMCS shadowing" VM-execution control is 1.
1. Software can determine a processor's physical-address width by executing CPUID with 80000008H in EAX. The physical-address width is returned in bits 7:0 of EAX.
2. If IA32_VMX_BASIC[48] is read as 1, this field must not set any bits in the range 63:32; see Appendix A.1.
3. Earlier versions of this manual specified that the VMCS revision identifier was a 32-bit field. For all processors produced prior to this change, bit 31 of the VMCS revision identifier was 0.
  - If the processor is not in SMM or the "entry to SMM" VM-entry control is 1, the field must not contain the current VMCS pointer.
  - If the processor is in SMM and the "entry to SMM" VM-entry control is 0, the field must differ from the executive-VMCS pointer.
...
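The activity-state rule on event injection in this section lends itself to a table-like C function. The sketch below is illustrative: the names are hypothetical, the placement of the vector in bits 7:0 and the encodings 3 (hardware exception) and 7 (other event) are assumptions taken from the format described in Table 24-13, while the values 0 (external interrupt) and 2 (NMI), the type field in bits 10:8, and the valid bit 31 are stated in the text.

```c
#include <stdbool.h>
#include <stdint.h>

enum activity_state {
    ACTIVITY_ACTIVE = 0, ACTIVITY_HLT = 1,
    ACTIVITY_SHUTDOWN = 2, ACTIVITY_WAIT_FOR_SIPI = 3
};

enum intr_type {
    INTR_EXTERNAL = 0, INTR_NMI = 2,
    INTR_HW_EXCEPTION = 3,  /* assumed encoding, see Table 24-13 */
    INTR_OTHER_EVENT = 7    /* assumed encoding, see Table 24-13 */
};

/* Builds a valid VM-entry interruption-information value (illustrative). */
uint32_t make_intr_info(enum intr_type type, uint8_t vector)
{
    return 0x80000000u | ((uint32_t)type << 8) | vector;
}

/* True if the injection described by intr_info is allowed in this state. */
bool injection_allowed(enum activity_state state, uint32_t intr_info)
{
    if ((intr_info >> 31) == 0)
        return true;                       /* valid bit clear: no injection */

    unsigned type = (intr_info >> 8) & 0x7u;
    unsigned vector = intr_info & 0xFFu;

    switch (state) {
    case ACTIVITY_ACTIVE:
        return true;
    case ACTIVITY_HLT:
        if (type == INTR_EXTERNAL || type == INTR_NMI)
            return true;
        if (type == INTR_HW_EXCEPTION && (vector == 1 || vector == 18))
            return true;                   /* debug or machine check */
        return type == INTR_OTHER_EVENT && vector == 0;  /* pending MTF */
    case ACTIVITY_SHUTDOWN:
        return type == INTR_NMI ||
               (type == INTR_HW_EXCEPTION && vector == 18);
    case ACTIVITY_WAIT_FOR_SIPI:
    default:
        return false;
    }
}
```

For instance, injecting a general-protection exception (vector 13) into a guest whose activity state is HLT would fail this check, whereas injecting an external interrupt would pass.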
26.5.1
Vectored-Event Injection
VM entry delivers an injected vectored event within the guest context established by VM entry. This means that delivery occurs after all components of guest state have been loaded (including MSRs) and after the VM-execution control fields have been established.[1] The event is delivered using the vector in that field to select a descriptor in the IDT. Since event injection occurs after loading IDTR from the guest-state area, this is the guest IDT.
Section 26.5.1.1 provides details of vectored-event injection. In general, the event is delivered exactly as if it had been generated normally.
If event delivery encounters a nested exception (for example, a general-protection exception because the vector indicates a descriptor beyond the IDT limit), the exception bitmap is consulted using the vector of that exception:
• If the bit for the nested exception is 0, the nested exception is delivered normally. If the nested exception is benign, it is delivered through the IDT. If it is contributory or a page fault, a double fault may be generated, depending on the nature of the event whose delivery encountered the nested exception. See Chapter 6, "Interrupt 8 - Double Fault Exception (#DF)" in Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A.[2]
• If the bit for the nested exception is 1, a VM exit occurs. Section 26.5.1.2 details cases in which event injection causes a VM exit.
4. "VMCS shadowing" is a secondary processor-based VM-execution control. If bit 31 of the primary processor-based VM-execution controls is 0, VM entry functions as if the "VMCS shadowing" VM-execution control were 0. See Section 24.6.2.
1. This does not imply that injection of an exception or interrupt will cause a VM exit due to the settings of VM-execution control fields (such as the exception bitmap) that would cause a VM exit if the event had occurred in VMX non-root operation. In contrast, a nested exception encountered during event delivery may cause a VM exit; see Section 26.5.1.1.
2. Hardware exceptions with the following unused vectors are considered benign: 15 and 21–31. A hardware exception with vector 20 is considered benign unless the processor supports the 1-setting of the "EPT-violation #VE" VM-execution control; in that case, it has the same severity as page faults.
...
27.4
SAVING MSRS
After processor state is saved to the guest-state area, values of MSRs may be stored into the VM-exit MSR-store area (see Section 24.7.2). Specifically each entry in that area (up to the number specified in the VM-exit MSR-store count) is processed in order by storing the value of the MSR indexed by bits 31:0 (as they would be read by RDMSR) into bits 127:64. Processing of an entry fails in any of the following cases:
• The value of bits 31:8 is 000008H, meaning that the indexed MSR is one that allows access to an APIC register when the local APIC is in x2APIC mode.
• The value of bits 31:0 indicates an MSR that can be read only in system-management mode (SMM) and the VM exit will not end in SMM. (IA32_SMBASE is an MSR that can be read only in SMM.)
• The value of bits 31:0 indicates an MSR that cannot be saved on VM exits for model-specific reasons. A processor may prevent certain MSRs (based on the value of bits 31:0) from being stored on VM exits, even if they can normally be read by RDMSR. Such model-specific behavior is documented in Chapter 35.
• Bits 63:32 of the entry are not all 0.
• An attempt to read the MSR indexed by bits 31:0 would cause a general-protection exception if executed via RDMSR with CPL = 0.
A VMX abort occurs if processing fails for any entry. See Section 27.7. ...
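The per-entry processing of the VM-exit MSR-store area described in Section 27.4 can be sketched as follows. This is only an illustration: the struct layout follows the bit ranges given in the text, but the type and function names are hypothetical, and the RDMSR-dependent failure cases (SMM-only MSRs, model-specific restrictions, reads that would fault) are folded into a caller-supplied callback, since they cannot be decided from the entry alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* One 128-bit entry: bits 31:0 MSR index, bits 63:32 must be 0, bits 127:64 data. */
struct msr_entry {
    uint32_t index;
    uint32_t reserved;
    uint64_t data;
};

/* Hypothetical stand-in for RDMSR; returns false if the read is not permitted. */
typedef bool (*rdmsr_fn)(uint32_t index, uint64_t *value);

/* Bits 31:8 equal to 000008H select an x2APIC register MSR. */
static bool is_x2apic_msr(uint32_t index)
{
    return (index >> 8) == 0x000008u;
}

/*
 * Process entries in order. Returns the position of the first entry whose
 * processing fails (the text says a VMX abort occurs in that case), or
 * count if every entry was stored.
 */
static uint32_t store_msrs(struct msr_entry *area, uint32_t count, rdmsr_fn read_msr)
{
    for (uint32_t i = 0; i < count; i++) {
        uint64_t value;
        if (is_x2apic_msr(area[i].index) || area[i].reserved != 0 ||
            !read_msr(area[i].index, &value))
            return i;
        area[i].data = value;
    }
    return count;
}

/* Fake reader used only to exercise the sketch; real code would execute RDMSR. */
static bool fake_rdmsr(uint32_t index, uint64_t *value)
{
    *value = 0x1000u + index;
    return index != 0xDEADu;
}
```

Returning the failing position rather than aborting keeps the sketch testable; a real VMM model would treat any return value smaller than count as a VMX abort.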
28.2.1
EPT Overview
EPT is used when the "enable EPT" VM-execution control is 1.¹ It translates the guest-physical addresses used in VMX non-root operation and those used by VM entry for event injection.
The translation from guest-physical addresses to physical addresses is determined by a set of EPT paging structures. The EPT paging structures are similar to those used to translate linear addresses while the processor is in IA-32e mode. Section 28.2.2 gives the details of the EPT paging structures.
If CR0.PG = 1, linear addresses are translated through paging structures referenced through control register CR3. While the "enable EPT" VM-execution control is 1, these are called guest paging structures. There are no guest paging structures if CR0.PG = 0.²
When the "enable EPT" VM-execution control is 1, the identity of guest-physical addresses depends on the value of CR0.PG:
• If CR0.PG = 0, each linear address is treated as a guest-physical address.
• If CR0.PG = 1, guest-physical addresses are those derived from the contents of control register CR3 and the guest paging structures. (This includes the values of the PDPTEs, which logical processors store in internal, non-architectural registers.) The latter includes (in page-table entries and in other paging-structure entries for which bit 7 (PS) is 1) the addresses to which linear addresses are translated by the guest paging structures.
1. "Enable EPT" is a secondary processor-based VM-execution control. If bit 31 of the primary processor-based VM-execution controls is 0, the logical processor operates as if the "enable EPT" VM-execution control were 0. See Section 24.6.2.
2. If the capability MSR IA32_VMX_CR0_FIXED0 reports that CR0.PG must be 1 in VMX operation, CR0.PG can be 0 in VMX non-root operation only if the "unrestricted guest" VM-execution control and bit 31 of the primary processor-based VM-execution controls are both 1.
If CR0.PG = 1, the translation of a linear address to a physical address requires multiple translations of guest-physical addresses using EPT. Assume, for example, that CR4.PAE = CR4.PSE = 0. The translation of a 32-bit linear address then operates as follows:
• Bits 31:22 of the linear address select an entry in the guest page directory located at the guest-physical address in CR3. The guest-physical address of the guest page-directory entry (PDE) is translated through EPT to determine the guest PDE's physical address.
• Bits 21:12 of the linear address select an entry in the guest page table located at the guest-physical address in the guest PDE. The guest-physical address of the guest page-table entry (PTE) is translated through EPT to determine the guest PTE's physical address.
• Bits 11:0 of the linear address are the offset in the page frame located at the guest-physical address in the guest PTE. The guest-physical address determined by this offset is translated through EPT to determine the physical address to which the original linear address translates.
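The three guest-physical addresses produced in this example can be computed as in the sketch below. It assumes the 32-bit paging layout from Volume 3A (4-byte entries, table base in bits 31:12 of CR3 and of each entry), which this section does not restate; the function names are hypothetical. Each returned value is a guest-physical address that would itself still have to be translated through EPT.

```c
#include <stdint.h>

/* Assumed 32-bit paging layout: base address in bits 31:12, 4-byte entries. */
#define BASE_31_12 0xFFFFF000u

/* Guest-physical address of the guest PDE selected by bits 31:22. */
static uint32_t guest_pde_gpa(uint32_t cr3, uint32_t linear)
{
    return (cr3 & BASE_31_12) | (((linear >> 22) & 0x3FFu) << 2);
}

/* Guest-physical address of the guest PTE selected by bits 21:12. */
static uint32_t guest_pte_gpa(uint32_t guest_pde, uint32_t linear)
{
    return (guest_pde & BASE_31_12) | (((linear >> 12) & 0x3FFu) << 2);
}

/* Guest-physical address of the byte selected by the offset in bits 11:0. */
static uint32_t guest_final_gpa(uint32_t guest_pte, uint32_t linear)
{
    return (guest_pte & BASE_31_12) | (linear & 0xFFFu);
}
```

A single 32-bit linear access therefore involves three EPT translations under these assumptions: one for the PDE, one for the PTE, and one for the final address.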
In addition to translating a guest-physical address to a physical address, EPT specifies the privileges that software is allowed when accessing the address. Attempts at disallowed accesses are called EPT violations and cause VM exits. See Section 28.2.3.
A logical processor uses EPT to translate guest-physical addresses only when those addresses are used to access memory. This principle implies the following:
• The MOV to CR3 instruction loads CR3 with a guest-physical address. Whether that address is translated through EPT depends on whether PAE paging is being used.¹
  - If PAE paging is not being used, the instruction does not use that address to access memory and does not cause it to be translated through EPT. (If CR0.PG = 1, the address will be translated through EPT on the next memory access using a linear address.)
  - If PAE paging is being used, the instruction loads the four (4) page-directory-pointer-table entries (PDPTEs) from that address and it does cause the address to be translated through EPT.
• Section 4.4.1 identifies executions of MOV to CR0 and MOV to CR4 that load the PDPTEs from the guest-physical address in CR3. Such executions cause that address to be translated through EPT.
• The PDPTEs contain guest-physical addresses. The instructions that load the PDPTEs (see above) do not use those addresses to access memory and do not cause them to be translated through EPT. The address in a PDPTE will be translated through EPT on the next memory access using a linear address that uses that PDPTE.
28.2.2
EPT Translation Mechanism
The EPT translation mechanism uses only bits 47:0 of each guest-physical address.² It uses a page-walk length of 4, meaning that at most 4 EPT paging-structure entries are accessed to translate a guest-physical address.³ These 48 bits are partitioned by the logical processor to traverse the EPT paging structures:
• A 4-KByte naturally aligned EPT PML4 table is located at the physical address specified in bits 51:12 of the extended-page-table pointer (EPTP), a VM-execution control field (see Table 24-8 in Section 24.6.11). An EPT PML4 table comprises 512 64-bit entries (EPT PML4Es). An EPT PML4E is selected using the physical address defined as follows:
  - Bits 63:52 are all 0.
  - Bits 51:12 are from the EPTP.
  - Bits 11:3 are bits 47:39 of the guest-physical address.
  - Bits 2:0 are all 0.
  Because an EPT PML4E is identified using bits 47:39 of the guest-physical address, it controls access to a 512-GByte region of the guest-physical-address space. The format of an EPT PML4E is given in Table 28-1.
1. A logical processor uses PAE paging if CR0.PG = 1, CR4.PAE = 1 and IA32_EFER.LMA = 0. See Section 4.4 in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A.
2. No processors supporting the Intel 64 architecture support more than 48 physical-address bits. Thus, no such processor can produce a guest-physical address with more than 48 bits. An attempt to use such an address causes a page fault. An attempt to load CR3 with such an address causes a general-protection fault. If PAE paging is being used, an attempt to load CR3 that would load a PDPTE with such an address causes a general-protection fault.
3. Future processors may include support for other EPT page-walk lengths. Software should read the VMX capability MSR IA32_VMX_EPT_VPID_CAP (see Appendix A.10) to determine what EPT page-walk lengths are supported.
Contents
Read access; indicates whether reads are allowed from the 512-GByte region controlled by this entry
Write access; indicates whether writes are allowed to the 512-GByte region controlled by this entry
Execute access; indicates whether instruction fetches are allowed from the 512-GByte region controlled by this entry
Reserved (must be 0)
If bit 6 of EPTP is 1, accessed flag for EPT; indicates whether software has accessed the 512-GByte region controlled by this entry (see Section 28.2.4). Ignored if bit 6 of EPTP is 0
Ignored
Physical address of 4-KByte aligned EPT page-directory-pointer table referenced by this entry¹
Reserved (must be 0)
Ignored
NOTES:
1. N is the physical-address width supported by the processor. Software can determine a processor's physical-address width by executing CPUID with 80000008H in EAX. The physical-address width is returned in bits 7:0 of EAX.
• A 4-KByte naturally aligned EPT page-directory-pointer table is located at the physical address specified in bits 51:12 of the EPT PML4E. An EPT page-directory-pointer table comprises 512 64-bit entries (EPT PDPTEs). An EPT PDPTE is selected using the physical address defined as follows:
  - Bits 63:52 are all 0.
  - Bits 51:12 are from the EPT PML4E.
  - Bits 11:3 are bits 38:30 of the guest-physical address.
  - Bits 2:0 are all 0.
Because an EPT PDPTE is identified using bits 47:30 of the guest-physical address, it controls access to a 1-GByte region of the guest-physical-address space. Use of the EPT PDPTE depends on the value of bit 7 in that entry:¹
• If bit 7 of the EPT PDPTE is 1, the EPT PDPTE maps a 1-GByte page. The final physical address is computed as follows:
  - Bits 63:52 are all 0.
  - Bits 51:30 are from the EPT PDPTE.
  - Bits 29:0 are from the original guest-physical address.
  The format of an EPT PDPTE that maps a 1-GByte page is given in Table 28-2.
1. Not all processors allow bit 7 of an EPT PDPTE to be set to 1. Software should read the VMX capability MSR IA32_VMX_EPT_VPID_CAP (see Appendix A.10) to determine whether this is allowed.
Table 28-2. Format of an EPT Page-Directory-Pointer-Table Entry (PDPTE) that Maps a 1-GByte Page
Bit Position(s)  Contents
0                Read access; indicates whether reads are allowed from the 1-GByte page referenced by this entry
1                Write access; indicates whether writes are allowed to the 1-GByte page referenced by this entry
2                Execute access; indicates whether instruction fetches are allowed from the 1-GByte page referenced by this entry
5:3              EPT memory type for this 1-GByte page (see Section 28.2.5)
6                Ignore PAT memory type for this 1-GByte page (see Section 28.2.5)
7                Must be 1 (otherwise, this entry references an EPT page directory)
8                If bit 6 of EPTP is 1, accessed flag for EPT; indicates whether software has accessed the 1-GByte page referenced by this entry (see Section 28.2.4). Ignored if bit 6 of EPTP is 0
9                If bit 6 of EPTP is 1, dirty flag for EPT; indicates whether software has written to the 1-GByte page referenced by this entry (see Section 28.2.4). Ignored if bit 6 of EPTP is 0
11:10            Ignored
29:12            Reserved (must be 0)
(N-1):30         Physical address of the 1-GByte page referenced by this entry¹
51:N             Reserved (must be 0)
62:52            Ignored
63               Suppress #VE. If the "EPT-violation #VE" VM-execution control is 1, EPT violations caused by accesses to this page are convertible to virtualization exceptions only if this bit is 0 (see Section 25.5.6.1). If the "EPT-violation #VE" VM-execution control is 0, this bit is ignored.
NOTES:
1. N is the physical-address width supported by the logical processor.
If bit 7 of the EPT PDPTE is 0, a 4-KByte naturally aligned EPT page directory is located at the physical address specified in bits 51:12 of the EPT PDPTE. The format of an EPT PDPTE that references an EPT page directory is given in Table 28-3.
Table 28-3. Format of an EPT Page-Directory-Pointer-Table Entry (PDPTE) that References an EPT Page Directory
Bit Position(s)  Contents
0                Read access; indicates whether reads are allowed from the 1-GByte region controlled by this entry
1                Write access; indicates whether writes are allowed to the 1-GByte region controlled by this entry
2                Execute access; indicates whether instruction fetches are allowed from the 1-GByte region controlled by this entry
7:3              Reserved (must be 0)
8                If bit 6 of EPTP is 1, accessed flag for EPT; indicates whether software has accessed the 1-GByte region controlled by this entry (see Section 28.2.4). Ignored if bit 6 of EPTP is 0
11:9             Ignored
(N-1):12         Physical address of 4-KByte aligned EPT page directory referenced by this entry¹
51:N             Reserved (must be 0)
63:52            Ignored
NOTES:
1. N is the physical-address width supported by the logical processor.
An EPT page directory comprises 512 64-bit entries (PDEs). An EPT PDE is selected using the physical address defined as follows:
  - Bits 63:52 are all 0.
  - Bits 51:12 are from the EPT PDPTE.
  - Bits 11:3 are bits 29:21 of the guest-physical address.
  - Bits 2:0 are all 0.
Because an EPT PDE is identified using bits 47:21 of the guest-physical address, it controls access to a 2-MByte region of the guest-physical-address space. Use of the EPT PDE depends on the value of bit 7 in that entry:
• If bit 7 of the EPT PDE is 1, the EPT PDE maps a 2-MByte page. The final physical address is computed as follows:
  - Bits 63:52 are all 0.
  - Bits 51:21 are from the EPT PDE.
  - Bits 20:0 are from the original guest-physical address.
  The format of an EPT PDE that maps a 2-MByte page is given in Table 28-4.
• If bit 7 of the EPT PDE is 0, a 4-KByte naturally aligned EPT page table is located at the physical address specified in bits 51:12 of the EPT PDE. The format of an EPT PDE that references an EPT page table is given in Table 28-5. An EPT page table comprises 512 64-bit entries (PTEs). An EPT PTE is selected using a physical address defined as follows:
  - Bits 63:52 are all 0.
  - Bits 51:12 are from the EPT PDE.
  - Bits 11:3 are bits 20:12 of the guest-physical address.
  - Bits 2:0 are all 0.
Because an EPT PTE is identified using bits 47:12 of the guest-physical address, every EPT PTE maps a 4-KByte page. The final physical address is computed as follows:
  - Bits 63:52 are all 0.
  - Bits 51:12 are from the EPT PTE.
  - Bits 11:0 are from the original guest-physical address.
The format of an EPT PTE is given in Table 28-6.
If bits 2:0 of an EPT paging-structure entry are all 0, the entry is not present. The processor ignores bits 62:3 and uses the entry neither to reference another EPT paging-structure entry nor to produce a physical address. A reference using a guest-physical address whose translation encounters an EPT paging-structure that is not present causes an EPT violation (see Section 28.2.3.2). (If the "EPT-violation #VE" VM-execution control is 1, the EPT violation is convertible to a virtualization exception only if bit 63 is 0; see Section 25.5.6.1. If the "EPT-violation #VE" VM-execution control is 0, this bit is ignored.)
Table 28-4. Format of an EPT Page-Directory Entry (PDE) that Maps a 2-MByte Page
Bit Position(s)  Contents
0                Read access; indicates whether reads are allowed from the 2-MByte page referenced by this entry
1                Write access; indicates whether writes are allowed to the 2-MByte page referenced by this entry
2                Execute access; indicates whether instruction fetches are allowed from the 2-MByte page referenced by this entry
5:3              EPT memory type for this 2-MByte page (see Section 28.2.5)
6                Ignore PAT memory type for this 2-MByte page (see Section 28.2.5)
7                Must be 1 (otherwise, this entry references an EPT page table)
8                If bit 6 of EPTP is 1, accessed flag for EPT; indicates whether software has accessed the 2-MByte page referenced by this entry (see Section 28.2.4). Ignored if bit 6 of EPTP is 0
9                If bit 6 of EPTP is 1, dirty flag for EPT; indicates whether software has written to the 2-MByte page referenced by this entry (see Section 28.2.4). Ignored if bit 6 of EPTP is 0
11:10            Ignored
20:12            Reserved (must be 0)
(N-1):21         Physical address of the 2-MByte page referenced by this entry¹
51:N             Reserved (must be 0)
62:52            Ignored
63               Suppress #VE. If the "EPT-violation #VE" VM-execution control is 1, EPT violations caused by accesses to this page are convertible to virtualization exceptions only if this bit is 0 (see Section 25.5.6.1). If the "EPT-violation #VE" VM-execution control is 0, this bit is ignored.
NOTES:
1. N is the physical-address width supported by the logical processor.
Table 28-5. Format of an EPT Page-Directory Entry (PDE) that References an EPT Page Table
Bit Position(s)  Contents
0                Read access; indicates whether reads are allowed from the 2-MByte region controlled by this entry
1                Write access; indicates whether writes are allowed to the 2-MByte region controlled by this entry
2                Execute access; indicates whether instruction fetches are allowed from the 2-MByte region controlled by this entry
6:3              Reserved (must be 0)
7                Must be 0 (otherwise, this entry maps a 2-MByte page)
8                If bit 6 of EPTP is 1, accessed flag for EPT; indicates whether software has accessed the 2-MByte region controlled by this entry (see Section 28.2.4). Ignored if bit 6 of EPTP is 0
11:9             Ignored
(N-1):12         Physical address of 4-KByte aligned EPT page table referenced by this entry¹
51:N             Reserved (must be 0)
63:52            Ignored
NOTES:
1. N is the physical-address width supported by the logical processor.
The discussion above describes how the EPT paging structures reference each other and how the logical processor traverses those structures when translating a guest-physical address. It does not cover all details of the translation process. Additional details are provided as follows:
• Situations in which the translation process may lead to VM exits (sometimes before the process completes) are described in Section 28.2.3.
• Interactions between the EPT translation mechanism and memory typing are described in Section 28.2.5.
Figure 28-1 gives a summary of the formats of the EPTP and the EPT paging-structure entries. For the EPT paging structure entries, it identifies separately the format of entries that map pages, those that reference other EPT paging structures, and those that do neither because they are not present; bits 2:0 and bit 7 are highlighted because they determine how a paging-structure entry is used.
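The address arithmetic used at each step of the traversal in this section can be sketched in a few helper functions. The names and the numeric level convention (4 for PML4E down to 1 for PTE) are choices made for this illustration only; permission checks, misconfiguration checks, and memory typing are omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 51:12, the table-address field of the EPTP and of referencing entries. */
#define EPT_ADDR_51_12 0x000FFFFFFFFFF000ull

/*
 * Physical address of the entry selected at a given level.
 * level: 4 = PML4E, 3 = PDPTE, 2 = PDE, 1 = PTE (a convention of this sketch).
 * table_ref: the EPTP for level 4, otherwise the entry from the level above.
 */
static uint64_t ept_entry_address(uint64_t table_ref, uint64_t gpa, int level)
{
    unsigned shift = 12u + 9u * (unsigned)(level - 1);   /* 39, 30, 21, 12 */
    uint64_t index = (gpa >> shift) & 0x1FFu;            /* 9-bit table index */
    return (table_ref & EPT_ADDR_51_12) | (index << 3);  /* bits 2:0 are 0 */
}

/* An entry is not present if bits 2:0 are all 0. */
static bool ept_entry_present(uint64_t entry)
{
    return (entry & 7u) != 0;
}

/* Bit 7 of a PDPTE or PDE indicates that the entry maps a page. */
static bool ept_entry_maps_page(uint64_t entry)
{
    return ((entry >> 7) & 1u) != 0;
}

/*
 * Final physical address for a page-mapping entry.
 * page_shift: 30 for a 1-GByte page, 21 for 2-MByte, 12 for 4-KByte.
 */
static uint64_t ept_final_address(uint64_t leaf_entry, uint64_t gpa, unsigned page_shift)
{
    uint64_t offset_mask = (1ull << page_shift) - 1;
    return (leaf_entry & EPT_ADDR_51_12 & ~offset_mask) | (gpa & offset_mask);
}
```

A full walk would call ept_entry_address for levels 4 through 1, stopping early when an entry is not present or when bit 7 indicates a 1-GByte or 2-MByte page.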
28.2.3
EPT-Induced VM Exits
Accesses using guest-physical addresses may cause VM exits due to EPT misconfigurations and EPT violations. An EPT misconfiguration occurs when, in the course of translating a guest-physical address, the logical processor encounters an EPT paging-structure entry that contains an unsupported value. An EPT violation occurs when there is no EPT misconfiguration but the EPT paging-structure entries disallow an access using the guest-physical address.
EPT misconfigurations and EPT violations occur only due to an attempt to access memory with a guest-physical address. Loading CR3 with a guest-physical address with the MOV to CR3 instruction can cause neither an EPT misconfiguration nor an EPT violation until that address is used to access a paging structure.¹
1. If the logical processor is using PAE paging (because CR0.PG = CR4.PAE = 1 and IA32_EFER.LMA = 0), the MOV to CR3 instruction loads the PDPTEs from memory using the guest-physical address being loaded into CR3. In this case, therefore, the MOV to CR3 instruction may cause an EPT misconfiguration or an EPT violation.
NOTES:
1. N is the physical-address width supported by the logical processor.
If the "EPT-violation #VE" VM-execution control is 1, certain EPT violations may cause virtualization exceptions instead of VM exits. See Section 25.5.6.1.
28.2.3.1
EPT Misconfigurations
An EPT misconfiguration occurs if any of the following is identified while translating a guest-physical address:
• The value of bits 2:0 of an EPT paging-structure entry is either 010b (write-only) or 110b (write/execute).
• The value of bits 2:0 of an EPT paging-structure entry is 100b (execute-only) and this value is not supported by the logical processor. Software should read the VMX capability MSR IA32_VMX_EPT_VPID_CAP to determine whether this value is supported (see Appendix A.10).
• The value of bits 2:0 of an EPT paging-structure entry is not 000b (the entry is present) and one of the following holds:
  - A reserved bit is set. This includes the setting of a bit in the range 51:12 that is beyond the logical processor's physical-address width.¹ See Section 28.2.2 for details of which bits are reserved in which EPT paging-structure entries.
  - The entry is the last one used to translate a guest-physical address (either an EPT PDE with bit 7 set to 1 or an EPT PTE) and the value of bits 5:3 (EPT memory type) is 2, 3, or 7 (these values are reserved).
EPT misconfigurations result when an EPT paging-structure entry is configured with settings reserved for future functionality. Software developers should be aware that such settings may be used in the future and that an EPT paging-structure entry that causes an EPT misconfiguration on one processor might not do so in the future.
1. Software can determine a processor's physical-address width by executing CPUID with 80000008H in EAX. The physical-address width is returned in bits 7:0 of EAX.
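The conditions listed in Section 28.2.3.1 can be approximated with the sketch below. It is deliberately partial: of the reserved bits, it checks only address bits in 51:12 beyond the physical-address width, not the per-entry reserved fields from the tables in Section 28.2.2. The function names and parameters are hypothetical, and the physical-address width is passed in as a value the caller has already obtained from CPUID.

```c
#include <stdbool.h>
#include <stdint.h>

#define EPT_ADDR_51_12 0x000FFFFFFFFFF000ull

/* Physical-address width from the EAX value returned by CPUID leaf 80000008H. */
static unsigned phys_addr_width(uint32_t cpuid_80000008_eax)
{
    return cpuid_80000008_eax & 0xFFu;   /* bits 7:0 */
}

/*
 * Partial misconfiguration check for one EPT paging-structure entry.
 * is_leaf: the entry is the last one used in the translation.
 * maxphyaddr: physical-address width (assumed to be at most 52).
 * exec_only_supported: whether 100b in bits 2:0 is supported, as reported
 * by IA32_VMX_EPT_VPID_CAP (read by the caller).
 */
static bool ept_misconfigured(uint64_t entry, bool is_leaf, unsigned maxphyaddr,
                              bool exec_only_supported)
{
    unsigned rwx = (unsigned)(entry & 7u);

    if (rwx == 2u || rwx == 6u)            /* 010b write-only, 110b write/execute */
        return true;
    if (rwx == 4u && !exec_only_supported) /* 100b execute-only */
        return true;
    if (rwx == 0u)                         /* not present: other checks do not apply */
        return false;

    /* Address bits in 51:12 that lie at or above the physical-address width. */
    uint64_t in_width = (maxphyaddr >= 52u) ? EPT_ADDR_51_12
                                            : (EPT_ADDR_51_12 & ((1ull << maxphyaddr) - 1));
    if ((entry & EPT_ADDR_51_12 & ~in_width) != 0)
        return true;

    if (is_leaf) {
        unsigned memory_type = (unsigned)((entry >> 3) & 7u);
        if (memory_type == 2u || memory_type == 3u || memory_type == 7u)
            return true;                   /* reserved EPT memory types */
    }
    return false;
}
```

Because misconfiguration settings are reserved for future functionality, a check like this is tied to one processor's capabilities and should not be treated as a permanent rule.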
28.2.3.2
EPT Violations
An EPT violation may occur during an access using a guest-physical address whose translation does not cause an EPT misconfiguration. An EPT violation occurs in any of the following situations:
• Translation of the guest-physical address encounters an EPT paging-structure entry that is not present (see Section 28.2.2).
• The access is a data read and bit 0 was clear in any of the EPT paging-structure entries used to translate the guest-physical address. Reads by the logical processor of guest paging structures to translate a linear address are considered to be data reads.
[Figure 28-1: formats of the EPTP and the EPT paging-structure entries. Rows shown: EPTP; PML4E (present, not present); PDPTE (1GB page, page directory, not present); PDE (2MB page, page table, not present); PTE (4KB page, not present).]
(This does not apply to loads of the PDPTE registers by the MOV to CR instruction for PAE paging; see Section 4.4.1. Those loads of guest PDPTEs are treated as reads and do not cause EPT violations due to a guest-physical address not being writable.)
...
• The access is an instruction fetch and bit 2 was clear in any of the EPT paging-structure entries used to translate the guest-physical address.
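The read and instruction-fetch cases above, together with the not-present case, can be sketched as a scan over the entries used in one translation. The write case is elided in this excerpt and is therefore not modeled; the enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Access kinds covered by this sketch; data writes are intentionally omitted. */
enum ept_access { EPT_ACCESS_READ, EPT_ACCESS_FETCH };

/*
 * entries: the EPT paging-structure entries used to translate one
 * guest-physical address, in walk order. Returns true if the access
 * would cause an EPT violation under the rules sketched here.
 */
static bool ept_violation(const uint64_t *entries, size_t count, enum ept_access access)
{
    uint64_t required = (access == EPT_ACCESS_READ) ? 1u : 4u;  /* bit 0 or bit 2 */

    for (size_t i = 0; i < count; i++) {
        if ((entries[i] & 7u) == 0)          /* entry not present */
            return true;
        if ((entries[i] & required) == 0)    /* permission clear at some level */
            return true;
    }
    return false;
}
```

The loop reflects the wording "in any of the EPT paging-structure entries": one entry with the relevant bit clear is enough, regardless of the settings at other levels.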
30.1
OVERVIEW
This chapter describes the virtual-machine extensions (VMX) for the Intel 64 and IA-32 architectures. VMX is intended to support virtualization of processor hardware and a system software layer acting as a host to multiple guest software environments. The virtual-machine extensions (VMX) include five instructions that manage the virtual-machine control structure (VMCS), four instructions that manage VMX operation, two TLB-management instructions, and two instructions for use by guest software. Additional details of VMX are described in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3C.
The behavior of the VMCS-maintenance instructions is summarized below:
• VMPTRLD: This instruction takes a single 64-bit source operand that is in memory. It makes the referenced VMCS active and current, loading the current-VMCS pointer with this operand and establishing the current VMCS based on the contents of the VMCS-data area in the referenced VMCS region. Because this makes the referenced VMCS active, a logical processor may start maintaining on the processor some of the VMCS data for the VMCS.
• VMPTRST: This instruction takes a single 64-bit destination operand that is in memory. The current-VMCS pointer is stored into the destination operand.
• VMCLEAR: This instruction takes a single 64-bit operand that is in memory. The instruction sets the launch state of the VMCS referenced by the operand to "clear", renders that VMCS inactive, and ensures that data for the VMCS have been written to the VMCS-data area in the referenced VMCS region. If the operand is the same as the current-VMCS pointer, that pointer is made invalid.
• VMREAD: This instruction reads a component from a VMCS (the encoding of that field is given in a register operand) and stores it into a destination operand that may be a register or in memory.
• VMWRITE: This instruction writes a component to a VMCS (the encoding of that field is given in a register operand) from a source operand that may be a register or in memory.
• VMLAUNCH: This instruction launches a virtual machine managed by the VMCS. A VM entry occurs, transferring control to the VM.
• VMRESUME: This instruction resumes a virtual machine managed by the VMCS. A VM entry occurs, transferring control to the VM.
• VMXOFF: This instruction causes the processor to leave VMX operation.
• VMXON: This instruction takes a single 64-bit source operand that is in memory. It causes a logical processor to enter VMX root operation and to use the memory referenced by the operand to support VMX operation.
• INVEPT: This instruction invalidates entries in the TLBs and paging-structure caches that were derived from extended page tables (EPT).
• INVVPID: This instruction invalidates entries in the TLBs and paging-structure caches based on a Virtual-Processor Identifier (VPID).
None of the instructions above can be executed in compatibility mode; they generate invalid-opcode exceptions if executed in compatibility mode.
The behavior of the guest-available instructions is summarized below:
...
• VMCALL: This instruction allows software in VMX non-root operation to call the VMM for service. A VM exit occurs, transferring control to the VMM.
• VMFUNC: This instruction allows software in VMX non-root operation to invoke a VM function (processor functionality enabled and configured by software in VMX root operation) without a VM exit.
Description
Effects a VM entry managed by the current VMCS.
• VMLAUNCH fails if the launch state of the current VMCS is not "clear". If the instruction is successful, it sets the launch state to "launched".
• VMRESUME fails if the launch state of the current VMCS is not "launched".
If VM entry is attempted, the logical processor performs a series of consistency checks as detailed in Chapter 26, "VM Entries," in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3C. Failure to pass checks on the VMX controls or on the host-state area passes control to the instruction following the VMLAUNCH or VMRESUME instruction. If these pass but checks on the guest-state area fail, the logical processor loads state from the host-state area of the VMCS, passing control to the instruction referenced by the RIP field in the host-state area.
VM entry is not allowed when events are blocked by MOV SS or POP SS. Neither VMLAUNCH nor VMRESUME should be used immediately after either MOV to SS or POP to SS.
Operation
IF (not in VMX operation) or (CR0.PE = 0) or (RFLAGS.VM = 1) or (IA32_EFER.LMA = 1 and CS.L = 0)
    THEN #UD;
ELSIF in VMX non-root operation
    THEN VMexit;
ELSIF CPL > 0
    THEN #GP(0);
ELSIF current-VMCS pointer is not valid
    THEN VMfailInvalid;
ELSIF events are being blocked by MOV SS
    THEN VMfailValid(VM entry with events blocked by MOV SS);
ELSIF (VMLAUNCH and launch state of current VMCS is not "clear")
    THEN VMfailValid(VMLAUNCH with non-clear VMCS);
ELSIF (VMRESUME and launch state of current VMCS is not "launched")
    THEN VMfailValid(VMRESUME with non-launched VMCS);
ELSE
    Check settings of VMX controls and host-state area;
    IF invalid settings
        THEN VMfailValid(VM entry with invalid VMX-control field(s)) or
             VMfailValid(VM entry with invalid host-state field(s)) or
             VMfailValid(VM entry with invalid executive-VMCS pointer) or
             VMfailValid(VM entry with non-launched executive VMCS) or
             VMfailValid(VM entry with executive-VMCS pointer not VMXON pointer) or
             VMfailValid(VM entry with invalid VM-execution control fields in executive VMCS)
             as appropriate;
        ELSE
            Attempt to load guest state and PDPTRs as appropriate;
            clear address-range monitoring;
            IF failure in checking guest state or PDPTRs
                THEN VM entry fails (see Section 26.7, in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3C);
                ELSE
                    Attempt to load MSRs from VM-entry MSR-load area;
                    IF failure
                        THEN VM entry fails (see Section 26.7, in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3C);
                        ELSE
                            IF VMLAUNCH
                                THEN launch state of VMCS ← "launched";
                            FI;
                            IF in SMM and "entry to SMM" VM-entry control is 0
                                THEN
                                    IF "deactivate dual-monitor treatment" VM-entry control is 0
                                        THEN SMM-transfer VMCS pointer ← current-VMCS pointer;
                                    FI;
                                    IF executive-VMCS pointer is VMX pointer
                                        THEN current-VMCS pointer ← VMCS-link pointer;
                                        ELSE current-VMCS pointer ← executive-VMCS pointer;
                                    FI;
                                    leave SMM;
                            FI;
                            VM entry succeeds;
                    FI;
            FI;
    FI;
FI;
Further details of the operation of the VM-entry appear in Chapter 26 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3C.
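The launch-state rules and the order of the early VMfail checks in the operation above can be modeled as a small state check. This sketch covers only the steps from the current-VMCS pointer check through the launch-state checks; all type and function names are hypothetical, and the consistency checks on controls, host state, and guest state are not modeled.

```c
#include <stdbool.h>

enum launch_state { LAUNCH_STATE_CLEAR, LAUNCH_STATE_LAUNCHED };
enum entry_insn { INSN_VMLAUNCH, INSN_VMRESUME };

/* Hypothetical labels for the outcomes of the early checks. */
enum precheck {
    PRECHECK_PASS,
    FAIL_INVALID,                 /* VMfailInvalid: no valid current-VMCS pointer */
    FAIL_BLOCKED_BY_MOV_SS,
    FAIL_VMLAUNCH_NON_CLEAR,
    FAIL_VMRESUME_NON_LAUNCHED
};

/* Applies the checks in the same order as the operation section. */
static enum precheck vm_entry_precheck(enum entry_insn insn, bool vmcs_ptr_valid,
                                       bool blocked_by_mov_ss, enum launch_state state)
{
    if (!vmcs_ptr_valid)
        return FAIL_INVALID;
    if (blocked_by_mov_ss)
        return FAIL_BLOCKED_BY_MOV_SS;
    if (insn == INSN_VMLAUNCH && state != LAUNCH_STATE_CLEAR)
        return FAIL_VMLAUNCH_NON_CLEAR;
    if (insn == INSN_VMRESUME && state != LAUNCH_STATE_LAUNCHED)
        return FAIL_VMRESUME_NON_LAUNCHED;
    return PRECHECK_PASS;
}

/* Only a VMLAUNCH whose VM entry succeeds changes the launch state. */
static enum launch_state state_after_successful_entry(enum entry_insn insn,
                                                      enum launch_state state)
{
    return (insn == INSN_VMLAUNCH) ? LAUNCH_STATE_LAUNCHED : state;
}
```

In this model a VMM would use VMLAUNCH for the first entry with a given VMCS and VMRESUME afterwards, until VMCLEAR returns the launch state to "clear".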
Flags Affected
See the operation section and Section 30.2.
Description
Marks the current-VMCS pointer valid and loads it with the physical address in the instruction operand. The instruction fails if its operand is not properly aligned, sets unsupported physical-address bits, or is equal to the VMXON pointer. In addition, the instruction fails if the 32 bits in memory referenced by the operand do not match the VMCS revision identifier supported by this processor.¹
The operand of this instruction is always 64 bits and is always in memory.
Operation
IF (register operand) or (not in VMX operation) or (CR0.PE = 0) or (RFLAGS.VM = 1) or (IA32_EFER.LMA = 1 and CS.L = 0)
    THEN #UD;
ELSIF in VMX non-root operation
    THEN VMexit;
ELSIF CPL > 0
    THEN #GP(0);
ELSE
    addr ← contents of 64-bit in-memory source operand;
    IF addr is not 4KB-aligned OR addr sets any bits beyond the physical-address width²
        THEN VMfail(VMPTRLD with invalid physical address);
    ELSIF addr = VMXON pointer
        THEN VMfail(VMPTRLD with VMXON pointer);
    ELSE
        rev ← 32 bits located at physical address addr;
        IF rev[30:0] ≠ VMCS revision identifier supported by processor OR
           rev[31] = 1 AND processor does not support 1-setting of "VMCS shadowing"
            THEN VMfail(VMPTRLD with incorrect VMCS revision identifier);
            ELSE
                current-VMCS pointer ← addr;
                VMsucceed;
        FI;
    FI;
FI;
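The operand checks in the ELSE branch above can be sketched as a pure function. The names are hypothetical, and the values that hardware would obtain itself (the VMXON pointer, the 32 bits read from memory, the supported revision identifier) are passed in as parameters. A caller modeling the IA32_VMX_BASIC[48] = 1 case from the footnote could pass 32 as the width.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result labels mirroring the three VMfail cases. */
enum vmptrld_result {
    VMPTRLD_OK,
    VMPTRLD_INVALID_ADDRESS,
    VMPTRLD_VMXON_POINTER,
    VMPTRLD_BAD_REVISION
};

static enum vmptrld_result vmptrld_check(uint64_t addr, uint64_t vmxon_ptr,
                                         unsigned phys_width, uint32_t rev_in_memory,
                                         uint32_t supported_rev, bool shadowing_supported)
{
    /* Must be 4-KByte aligned and set no bits beyond the physical-address width. */
    bool beyond_width = phys_width < 64 && (addr >> phys_width) != 0;
    if ((addr & 0xFFFu) != 0 || beyond_width)
        return VMPTRLD_INVALID_ADDRESS;

    if (addr == vmxon_ptr)
        return VMPTRLD_VMXON_POINTER;

    /* rev[30:0] must match; rev[31] may be 1 only if VMCS shadowing is supported. */
    bool id_mismatch = (rev_in_memory & 0x7FFFFFFFu) != (supported_rev & 0x7FFFFFFFu);
    bool bit31_set = (rev_in_memory >> 31) != 0;
    if (id_mismatch || (bit31_set && !shadowing_supported))
        return VMPTRLD_BAD_REVISION;

    return VMPTRLD_OK;
}
```

The order of the checks follows the pseudocode, so an address that is both misaligned and equal to the VMXON pointer is reported as an invalid physical address.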
Flags Affected
See the operation section and Section 30.2.
1. Software should consult the VMX capability MSR IA32_VMX_BASIC to discover the VMCS revision identifier supported by this processor (see Appendix A, "VMX Capability Reporting Facility," in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3C).
2. If IA32_VMX_BASIC[48] is read as 1, VMfail occurs if addr sets any bits in the range 63:32; see Appendix A.1.
Description
Reads a specified field from a VMCS and stores it into a specified destination operand (register or memory). In VMX root operation, the instruction reads from the current VMCS. If executed in VMX non-root operation, the instruction reads from the VMCS referenced by the VMCS link pointer field in the current VMCS.
The VMCS field is specified by the VMCS-field encoding contained in the register source operand. Outside IA-32e mode, the source operand has 32 bits, regardless of the value of CS.D. In 64-bit mode, the source operand has 64 bits; however, if bits 63:32 of the source operand are not zero, VMREAD will fail due to an attempt to access an unsupported VMCS component (see operation section).
The effective size of the destination operand, which may be a register or in memory, is always 32 bits outside IA-32e mode (the setting of CS.D is ignored with respect to operand size) and 64 bits in 64-bit mode. If the VMCS field specified by the source operand is shorter than this effective operand size, the high bits of the destination operand are cleared to 0. If the VMCS field is longer, then the high bits of the field are not read.
Note that any faults resulting from accessing a memory destination operand can occur only after determining, in the operation section below, that the relevant VMCS pointer is valid and that the specified VMCS field is supported.
Operation
IF (not in VMX operation) or (RFLAGS.VM = 1) or (IA32_EFER.LMA = 1 and CS.L = 0)
    THEN #UD;
ELSIF in VMX non-root operation AND ("VMCS shadowing" is 0 OR source operand sets bits in range 63:15 OR VMREAD bit corresponding to bits 14:0 of source operand is 1) (see footnote 1)
    THEN VMexit;
ELSIF CPL > 0
    THEN #GP(0);
ELSIF (in VMX root operation AND current-VMCS pointer is not valid) OR (in VMX non-root operation AND VMCS link pointer is not valid)
    THEN VMfailInvalid;
ELSIF source operand does not correspond to any VMCS field
    THEN VMfailValid(VMREAD/VMWRITE from/to unsupported VMCS component);
ELSE
    IF in VMX root operation
        THEN destination operand ← contents of field indexed by source operand in current VMCS;
        ELSE destination operand ← contents of field indexed by source operand in VMCS referenced by VMCS link pointer;
    FI;
    VMsucceed;
FI;
1. The VMREAD bit for a source operand is defined as follows. Let x be the value of bits 14:0 of the source operand and let addr be the VMREAD-bitmap address. The corresponding VMREAD bit is in bit position x & 7 of the byte at physical address addr | (x >> 3).
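The bitmap lookup described in footnote 1 can be sketched in C. The helper names below are hypothetical, and the bitmap is modelled as a 4-KByte byte array in ordinary memory (an assumption of this sketch, consistent with 15 bits of x selecting one of 32768 bits); the same arithmetic applies to the VMWRITE bitmap.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset within the bitmap: (x >> 3), where x is bits 14:0. */
size_t vmcs_bitmap_byte(uint64_t encoding)
{
    return (size_t)((encoding & 0x7FFF) >> 3);
}

/* Bit position within that byte: x & 7. */
unsigned vmcs_bitmap_bit(uint64_t encoding)
{
    return (unsigned)(encoding & 7);
}

/* Model of the VM-exit condition in the VMREAD pseudocode for VMX non-root
 * operation. bitmap is a software copy of the 4-KByte VMREAD bitmap. */
bool vmread_causes_exit(const uint8_t *bitmap, uint64_t encoding,
                        bool vmcs_shadowing)
{
    if (!vmcs_shadowing)
        return true;                      /* "VMCS shadowing" is 0 */
    if (encoding >> 15)
        return true;                      /* bits 63:15 are set */
    return ((bitmap[vmcs_bitmap_byte(encoding)] >>
             vmcs_bitmap_bit(encoding)) & 1) != 0;
}
```

Because the bitmap address is 4KB-aligned and x >> 3 is less than 4096, the OR in the footnote is equivalent to adding the byte offset to addr.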
Flags Affected
See the operation section and Section 30.2.
Description
Writes the contents of a primary source operand (register or memory) to a specified field in a VMCS. In VMX root operation, the instruction writes to the current VMCS. If executed in VMX non-root operation, the instruction writes to the VMCS referenced by the VMCS link pointer field in the current VMCS. The VMCS field is specified by the VMCS-field encoding contained in the register secondary source operand. Outside IA-32e mode, the secondary source operand is always 32 bits, regardless of the value of CS.D. In 64-bit mode, the secondary source operand has 64 bits; however, if bits 63:32 of the secondary source operand are not zero, VMWRITE will fail due to an attempt to access an unsupported VMCS component (see operation section). The effective size of the primary source operand, which may be a register or in memory, is always 32 bits outside IA-32e mode (the setting of CS.D is ignored with respect to operand size) and 64 bits in 64-bit mode. If the VMCS field specified by the secondary source operand is shorter than this effective operand size, the high bits of the primary source operand are ignored. If the VMCS field is longer, then the high bits of the field are cleared to 0. Note that any faults resulting from accessing a memory source operand occur after determining, in the operation section below, that the relevant VMCS pointer is valid but before determining if the destination VMCS field is supported.
Operation
IF (not in VMX operation) or (CR0.PE = 0) or (RFLAGS.VM = 1) or (IA32_EFER.LMA = 1 and CS.L = 0)
    THEN #UD;
ELSIF in VMX non-root operation AND ("VMCS shadowing" is 0 OR secondary source operand sets bits in range 63:15 OR VMWRITE bit corresponding to bits 14:0 of secondary source operand is 1) (see footnote 1)
    THEN VMexit;
ELSIF CPL > 0
    THEN #GP(0);
ELSIF (in VMX root operation AND current-VMCS pointer is not valid) OR (in VMX non-root operation AND VMCS link pointer is not valid)
    THEN VMfailInvalid;
ELSIF secondary source operand does not correspond to any VMCS field
    THEN VMfailValid(VMREAD/VMWRITE from/to unsupported VMCS component);
ELSIF VMCS field indexed by secondary source operand is a VM-exit information field AND processor does not support writing to such fields (see footnote 2)
    THEN VMfailValid(VMWRITE to read-only VMCS component);
ELSE
    IF in VMX root operation
        THEN field indexed by secondary source operand in current VMCS ← primary source operand;
        ELSE field indexed by secondary source operand in VMCS referenced by VMCS link pointer ← primary source operand;
    FI;
    VMsucceed;
FI;
1. The VMWRITE bit for a secondary source operand is defined as follows. Let x be the value of bits 14:0 of the secondary source operand and let addr be the VMWRITE-bitmap address. The corresponding VMWRITE bit is in bit position x & 7 of the byte at physical address addr | (x >> 3).
2. Software can discover whether these fields can be written by reading the VMX capability MSR IA32_VMX_MISC (see Appendix A.6).
Flags Affected
See the operation section and Section 30.2. ...
Description
Puts the logical processor in VMX operation with no current VMCS, blocks INIT signals, disables A20M, and clears any address-range monitoring established by the MONITOR instruction (see footnote 1). The operand of this instruction is a 4KB-aligned physical address (the VMXON pointer) that references the VMXON region, which the logical processor may use to support VMX operation. This operand is always 64 bits and is always in memory.
Operation
IF (register operand) or (CR0.PE = 0) or (CR4.VMXE = 0) or (RFLAGS.VM = 1) or (IA32_EFER.LMA = 1 and CS.L = 0)
    THEN #UD;
ELSIF not in VMX operation
    THEN
        IF (CPL > 0) or (in A20M mode) or
           (the values of CR0 and CR4 are not supported in VMX operation; see footnote 2) or
           (bit 0 (lock bit) of IA32_FEATURE_CONTROL MSR is clear) or
           (in SMX operation (see footnote 3) and bit 1 of IA32_FEATURE_CONTROL MSR is clear) or
           (outside SMX operation and bit 2 of IA32_FEATURE_CONTROL MSR is clear)
            THEN #GP(0);
        ELSE
            addr ← contents of 64-bit in-memory source operand;
            IF addr is not 4KB-aligned or addr sets any bits beyond the physical-address width (see footnote 4)
                THEN VMfailInvalid;
            ELSE
                rev ← 32 bits located at physical address addr;
                IF rev[30:0] ≠ VMCS revision identifier supported by processor OR rev[31] = 1
                    THEN VMfailInvalid;
                ELSE
                    current-VMCS pointer ← FFFFFFFF_FFFFFFFFH;
                    enter VMX operation;
                    block INIT signals;
                    block and disable A20M;
                    clear address-range monitoring;
                    VMsucceed;
                FI;
            FI;
        FI;
ELSIF in VMX non-root operation
    THEN VMexit;
ELSIF CPL > 0
    THEN #GP(0);
ELSE VMfail(VMXON executed in VMX root operation);
FI;
1. See the information on MONITOR/MWAIT in Chapter 8, Multiple-Processor Management, of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A.
2. See Section 19.8 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3B.
3. A logical processor is in SMX operation if GETSEC[SEXIT] has not been executed since the last execution of GETSEC[SENTER]. A logical processor is outside SMX operation if GETSEC[SENTER] has not been executed or if GETSEC[SEXIT] was executed after the last execution of GETSEC[SENTER]. See Chapter 6, Safer Mode Extensions Reference.
4. If IA32_VMX_BASIC[48] is read as 1, VMfailInvalid occurs if addr sets any bits in the range 63:32; see Appendix A.1.
Flags Affected
See the operation section and Section 30.2.
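The IA32_FEATURE_CONTROL conditions that lead to #GP(0) in the VMXON operation can be expressed as a C predicate. This is a sketch; the macro and function names are hypothetical, and the caller is assumed to have read the MSR value and to know whether the logical processor is in SMX operation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions taken from the VMXON pseudocode. */
#define FEATURE_CONTROL_LOCK            (1ull << 0)
#define FEATURE_CONTROL_VMX_IN_SMX      (1ull << 1)
#define FEATURE_CONTROL_VMX_OUTSIDE_SMX (1ull << 2)

/* Returns true if the IA32_FEATURE_CONTROL value permits VMXON, that is,
 * none of the three MSR-related #GP(0) conditions applies. */
bool feature_control_allows_vmxon(uint64_t feature_control, bool in_smx)
{
    if (!(feature_control & FEATURE_CONTROL_LOCK))
        return false;                     /* lock bit is clear */

    uint64_t enable = in_smx ? FEATURE_CONTROL_VMX_IN_SMX
                             : FEATURE_CONTROL_VMX_OUTSIDE_SMX;
    return (feature_control & enable) != 0;
}
```

A value of 5 (lock bit and bit 2 set) permits VMXON outside SMX operation but not in SMX operation.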
                   If executed in A20M mode.
                   If the source operand is in the CS, DS, ES, FS, or GS segments and the memory address is in a non-canonical form.
#PF(fault-code)    If a page fault occurs in accessing the memory source operand.
#SS(0)             If the source operand is in the SS segment and the memory address is in a non-canonical form.
#UD                If operand is a register.
                   If executed with CR4.VMXE = 0.
...
31.5
VMMs need to ensure that the processor is running in protected mode with paging before entering VMX operation. The following list describes the minimal steps required to enter VMX root operation with a VMM running at CPL = 0.
• Check VMX support in processor using CPUID.
• Determine the VMX capabilities supported by the processor through the VMX capability MSRs. See Section 31.5.1 and Appendix A.
• Create a VMXON region in non-pageable memory of a size specified by IA32_VMX_BASIC MSR and aligned to a 4-KByte boundary. Software should read the capability MSRs to determine the width of the physical addresses that may be used for the VMXON region and ensure the entire VMXON region can be addressed by addresses with that width. Also, software must ensure that the VMXON region is hosted in cache-coherent memory.
• Initialize the version identifier in the VMXON region (the first 31 bits) with the VMCS revision identifier reported by capability MSRs. Clear bit 31 of the first 4 bytes of the VMXON region.
• Ensure the current processor operating mode meets the required CR0 fixed bits (CR0.PE = 1, CR0.PG = 1). Other required CR0 fixed bits can be detected through the IA32_VMX_CR0_FIXED0 and IA32_VMX_CR0_FIXED1 MSRs.
• Enable VMX operation by setting CR4.VMXE = 1. Ensure the resultant CR4 value supports all the CR4 fixed bits reported in the IA32_VMX_CR4_FIXED0 and IA32_VMX_CR4_FIXED1 MSRs.
• Ensure that the IA32_FEATURE_CONTROL MSR (MSR index 3AH) has been properly programmed and that its lock bit is set (Bit 0 = 1). This MSR is generally configured by the BIOS using WRMSR.
• Execute VMXON with the physical address of the VMXON region as the operand. Check successful execution of VMXON by checking if RFLAGS.CF = 0.
Upon successful execution of the steps above, the processor is in VMX root operation. A VMM executing in VMX root operation and CPL = 0 leaves VMX operation by executing VMXOFF and verifies successful execution by checking if RFLAGS.CF = 0 and RFLAGS.ZF = 0. If an SMM monitor has been configured to service SMIs while in VMX operation (see Section 34.15), the SMM monitor needs to be torn down before the executive monitor can leave VMX operation (see Section 34.15.7).
VMXOFF fails for the executive monitor (a VMM that entered VMX operation by way of issuing VMXON) if SMM monitor is configured. ...
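Two of the preparation steps for entering VMX operation lend themselves to a short C sketch: writing the revision identifier into the VMXON region, and checking a CR0 or CR4 value against the fixed-bit MSRs. The function names are hypothetical. The sketch assumes that the VMCS revision identifier occupies bits 30:0 of IA32_VMX_BASIC and that a 1 in a FIXED0 MSR means the bit must be 1 while a 0 in a FIXED1 MSR means the bit must be 0; both details come from Appendix A rather than from the text above and should be confirmed there.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Store the revision identifier in the first 4 bytes of a VMXON (or VMCS)
 * region, with bit 31 cleared. vmx_basic is the raw IA32_VMX_BASIC value,
 * read by the caller. */
void set_revision_id(void *region, uint64_t vmx_basic)
{
    uint32_t rev = (uint32_t)(vmx_basic & 0x7FFFFFFFull);
    memcpy(region, &rev, sizeof rev);
}

/* Check a CR0 or CR4 value against the matching FIXED0/FIXED1 MSR pair. */
bool cr_fixed_bits_ok(uint64_t cr, uint64_t fixed0, uint64_t fixed1)
{
    return (cr & fixed0) == fixed0     /* every must-be-1 bit is set */
        && (cr & ~fixed1) == 0;        /* no must-be-0 bit is set    */
}
```

Running cr_fixed_bits_ok on the intended CR0 and CR4 values before VMXON lets a VMM report an unsupported configuration directly instead of taking #GP(0).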
31.6
The following list describes the minimal steps required by the VMM to set up and launch a guest VM.
• Create a VMCS region in non-pageable memory of size specified by the VMX capability MSR IA32_VMX_BASIC and aligned to 4-KBytes. Software should read the capability MSRs to determine the width of the physical addresses that may be used for a VMCS region and ensure the entire VMCS region can be addressed by addresses with that width. The term "guest-VMCS address" refers to the physical address of the new VMCS region for the following steps.
• Initialize the version identifier in the VMCS (first 31 bits) with the VMCS revision identifier reported by the VMX capability MSR IA32_VMX_BASIC. Clear bit 31 of the first 4 bytes of the VMCS region.
• Execute the VMCLEAR instruction by supplying the guest-VMCS address. This will initialize the new VMCS region in memory and set the launch state of the VMCS to "clear". This action also invalidates the working-VMCS pointer register to FFFFFFFF_FFFFFFFFH. Software should verify successful execution of VMCLEAR by checking if RFLAGS.CF = 0 and RFLAGS.ZF = 0.
• Execute the VMPTRLD instruction by supplying the guest-VMCS address. This initializes the working-VMCS pointer with the new VMCS region's physical address.
• Issue a sequence of VMWRITEs to initialize various host-state area fields in the working VMCS. The initialization sets up the context and entry-points to the VMM upon subsequent VM exits from the guest. Host-state fields include control registers (CR0, CR3 and CR4), selector fields for the segment registers (CS, SS, DS, ES, FS, GS and TR), and base-address fields (for FS, GS, TR, GDTR and IDTR; RSP, RIP and the MSRs that control fast system calls). Chapter 25 describes the host-state consistency checking done by the processor for VM entries. The VMM is required to set up host state that complies with these consistency checks. For example, VMX requires the host area to have a task register (TR) selector with TI and RPL fields set to 0 and pointing to a valid TSS.
• Use VMWRITEs to set up the various VM-exit control fields, VM-entry control fields, and VM-execution control fields in the VMCS. Care should be taken to make sure the settings of individual fields match the allowed 0 and 1 settings for the respective controls as reported by the VMX capability MSRs (see Appendix A). Any settings inconsistent with the settings reported by the capability MSRs will cause VM entries to fail.
• Use VMWRITE to initialize various guest-state area fields in the working VMCS. This sets up the context and entry-point for guest execution upon VM entry. Chapter 25 describes the guest-state loading and checking done by the processor for VM entries to protected and virtual-8086 guest execution.
• The VMM is required to set up guest state that complies with these consistency checks:
    • If the VMM design requires the initial VM launch to cause guest software (typically the guest virtual BIOS) execution from the guest's reset vector, it may need to initialize the guest execution state to reflect the state of a physical processor at power-on reset (described in Chapter 9, Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A).
    • The VMM may need to initialize additional guest execution state that is not captured in the VMCS guest-state area by loading it directly into the respective processor registers. Examples include general purpose registers, the CR2 control register, debug registers, floating point registers and so forth. The VMM may support lazy loading of FPU, MMX, SSE, and SSE2 states with CR0.TS = 1 (described in Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A).
• Execute VMLAUNCH to launch the guest VM. If VMLAUNCH fails due to any consistency checks before guest-state loading, RFLAGS.CF or RFLAGS.ZF will be set and the VM-instruction error field (see Section 24.9.5) will contain the error-code. If guest-state consistency checks fail upon guest-state loading, the processor loads state from the host-state area as if a VM exit had occurred (see Section 31.6).
VMLAUNCH updates the controlling-VMCS pointer with the working-VMCS pointer and saves the old value of controlling-VMCS as the parent pointer. In addition, the launch state of the guest VMCS is changed to "launched" from "clear". Any programmed exit conditions will cause the guest to VM exit to the VMM. The VMM should execute the VMRESUME instruction for subsequent VM entries to guests in a "launched" state. ...
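The requirement that control fields match the allowed 0 and 1 settings can be sketched in C. The helper names are hypothetical. The sketch assumes the capability-MSR layout described in Appendix A, in which a 1 in bits 31:0 means the corresponding control must be 1 and a 0 in bits 63:32 means the corresponding control must be 0; that layout is not restated in the list above and should be verified against Appendix A for the specific MSR in use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Force a desired control value into the range permitted by a VMX
 * capability MSR (layout assumed as described in the lead-in). */
uint32_t adjust_vmx_control(uint32_t desired, uint64_t cap_msr)
{
    uint32_t must_be_one = (uint32_t)cap_msr;          /* bits 31:0  */
    uint32_t may_be_one  = (uint32_t)(cap_msr >> 32);  /* bits 63:32 */
    return (desired | must_be_one) & may_be_one;
}

/* Check whether a control value is consistent with the capability MSR. */
bool vmx_control_ok(uint32_t value, uint64_t cap_msr)
{
    return adjust_vmx_control(value, cap_msr) == value;
}
```

Note that adjust_vmx_control silently drops requested features the processor does not allow, so a VMM that depends on a particular control should compare the adjusted value with the desired one before issuing the VMWRITE.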
31.9.4
A 32-bit guest can be launched by either IA-32e-mode hosts or non-IA-32e-mode hosts. A 64-bit guest can only be launched by an IA-32e-mode host. In addition to the steps outlined in Section 31.6, VMM writers need to:
• Set the "IA-32e-mode guest" VM-entry control to 1 in the VMCS to assure VM-entry (VMLAUNCH or VMRESUME) will establish a 64-bit (or 32-bit compatible) guest operating environment.
• Enable paging (CR0.PG) and PAE mode (CR4.PAE) to assure VM-entry to a 64-bit guest will succeed.
• Ensure that the host is in IA-32e mode (IA32_EFER.LMA must be set to 1) and that the "host address-space size" VM-exit control in the VMCS is also set to 1.
If each of the above conditions holds true, then VM-entry will copy the value of the VM-entry "IA-32e-mode guest" control bit into the guest's IA32_EFER.LME bit, which will result in subsequent activation of IA-32e mode. If any of the above conditions is false, the VM-entry will fail and load state from the host-state area of the working VMCS as if a VM exit had occurred (see Section 26.7).
The following VMCS controls determine the value of IA32_EFER on a VM entry: the "IA-32e-mode guest" VM-entry control (described above), the "load IA32_EFER" VM-entry control, the "VM-entry MSR-load count," and the "VM-entry MSR-load address" (see Section 26.4).
If the "load IA32_EFER" VM-entry control is 1, the value of the LME and LMA bits in the IA32_EFER field in the guest-state area must be the value of the "IA-32e-mode guest" VM-entry control. Otherwise, the VM entry fails.
The loading of the IA32_EFER.LME bit (described above) precedes any loading of the IA32_EFER MSR from the VM-entry MSR-load area of the VMCS. If loading of IA32_EFER is specified in the VM-entry MSR-load area, the value of the LME bit in the load image should match the setting of the "IA-32e-mode guest" VM-entry control. Otherwise, the attempt to modify the LME bit (while paging is enabled) results in a failed VM entry. However, IA32_EFER.LMA is always set by the processor to equal IA32_EFER.LME & CR0.PG; the value specified for LMA in the load image of the IA32_EFER MSR is ignored. For these and performance reasons, VMM writers may choose to not use the VM-exit/entry MSR-load/save areas for the IA32_EFER MSR.
Note that the VMM can control the processor's architectural state when transferring control to a VM. VMM writers may choose to launch guests in protected mode and subsequently allow the guest to activate IA-32e mode, or they may allow guests to toggle in and out of IA-32e mode. In this case, the VMM should require VM exit on accesses to the IA32_EFER MSR to detect changes in the operating mode and modify the VM-entry "IA-32e-mode guest" control accordingly.
A VMM should save/restore the extended (full 64-bit) contents of the guest general-purpose registers, the new general-purpose registers (R8-R15) and the SIMD registers introduced in 64-bit mode should it need to modify these upon VM exit. ...
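The IA32_EFER rules in this section can be modelled in C as two small functions. The names are hypothetical; the bit positions used (IA32_EFER.LME = bit 8, IA32_EFER.LMA = bit 10, CR0.PG = bit 31) are the architectural ones. The first function follows the rule exactly as worded above and is meant as an illustration, not as a complete VM-entry check.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFER_LME (1ull << 8)
#define EFER_LMA (1ull << 10)
#define CR0_PG   (1ull << 31)

/* With "load IA32_EFER" = 1: LME and LMA in the guest-state IA32_EFER field
 * must both equal the "IA-32e-mode guest" VM-entry control. */
bool guest_efer_field_ok(uint64_t guest_efer, bool ia32e_mode_guest)
{
    bool lme = (guest_efer & EFER_LME) != 0;
    bool lma = (guest_efer & EFER_LMA) != 0;
    return lme == ia32e_mode_guest && lma == ia32e_mode_guest;
}

/* LMA is derived by the processor as LME & CR0.PG; any LMA value supplied
 * in a load image is ignored. */
uint64_t efer_with_derived_lma(uint64_t efer, uint64_t cr0)
{
    bool lma = (efer & EFER_LME) && (cr0 & CR0_PG);
    efer &= ~EFER_LMA;
    return lma ? (efer | EFER_LMA) : efer;
}
```

A VMM that intercepts guest writes to IA32_EFER could use logic of this kind to decide when to update the "IA-32e-mode guest" control.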
-----------------------------------------------------------------------------------------...
34.1
SMM is a special-purpose operating mode provided for handling system-wide functions like power management, system hardware control, or proprietary OEM-designed code. It is intended for use only by system firmware, not by applications software or general-purpose systems software. The main benefit of SMM is that it offers a distinct and easily isolated processor environment that operates transparently to the operating system or executive and software applications.
When SMM is invoked through a system management interrupt (SMI), the processor saves the current state of the processor (the processor's context), then switches to a separate operating environment defined by a new address space. The system management software executive (SMI handler) starts execution in that environment, and the critical code and data of the SMI handler reside in a physical memory region (SMRAM) within that address space. While in SMM, the processor executes SMI handler code to perform operations such as powering down unused disk drives or monitors, executing proprietary code, or placing the whole system in a suspended state. When the SMI handler has completed its operations, it executes a resume (RSM) instruction. This instruction causes the processor to reload the saved context of the processor, switch back to protected or real mode, and resume executing the interrupted application or operating-system program or task.
The following SMM mechanisms make it transparent to applications programs and operating systems:
• The only way to enter SMM is by means of an SMI.
• The processor executes SMM code in a separate address space that can be made inaccessible from the other operating modes.
• Upon entering SMM, the processor saves the context of the interrupted program or task.
• All interrupts normally handled by the operating system are disabled upon entry into SMM.
• The RSM instruction can be executed only in SMM.
Section 34.3 describes transitions into and out of SMM. The execution environment after entering SMM is in real-address mode with paging disabled (CR0.PE = CR0.PG = 0). In this initial execution environment, the SMI handler can address up to 4 GBytes of memory and can execute all I/O and system instructions. Section 34.5 describes in detail the initial SMM execution environment for an SMI handler and operation within that environment. The SMI handler may subsequently switch to other operating modes while remaining in SMM.
NOTES
Software developers should be aware that, even if a logical processor was using the physical-address extension (PAE) mechanism (introduced in the P6 family processors) or was in IA-32e mode before an SMI, this will not be the case after the SMI is delivered. This is because delivery of an SMI disables paging (see Table 34-4). (This does not apply if the dual-monitor treatment of SMIs and SMM is active; see Section 34.15.) ...
34.3.1
Entering SMM
The processor always handles an SMI on an architecturally defined interruptible point in program execution (which is commonly at an IA-32 architecture instruction boundary). When the processor receives an SMI, it waits for all instructions to retire and for all stores to complete. The processor then saves its current context in SMRAM (see Section 34.4), enters SMM, and begins to execute the SMI handler.
Upon entering SMM, the processor signals external hardware that SMI handling has begun. The signaling mechanism used is implementation dependent. For the P6 family processors, an SMI acknowledge transaction is generated on the system bus and the multiplexed status signal EXF4 is asserted each time a bus transaction is generated while the processor is in SMM. For the Pentium and Intel486 processors, the SMIACT# pin is asserted. An SMI has a greater priority than debug exceptions and external interrupts. Thus, if an NMI, maskable hardware interrupt, or a debug exception occurs at an instruction boundary along with an SMI, only the SMI is handled. Subsequent SMI requests are not acknowledged while the processor is in SMM. The first SMI interrupt request that occurs while the processor is in SMM (that is, after SMM has been acknowledged to external hardware) is latched and serviced when the processor exits SMM with the RSM instruction. The processor will latch only one SMI while in SMM. See Section 34.5 for a detailed description of the execution environment when in SMM.
34.3.2
The only way to exit SMM is to execute the RSM instruction. The RSM instruction is only available to the SMI handler; if the processor is not in SMM, attempts to execute the RSM instruction result in an invalid-opcode exception (#UD) being generated.
The RSM instruction restores the processor's context by loading the state save image from SMRAM back into the processor's registers. The processor then returns an SMIACK transaction on the system bus and returns program control back to the interrupted program.
Upon successful completion of the RSM instruction, the processor signals external hardware that SMM has been exited. For the P6 family processors, an SMI acknowledge transaction is generated on the system bus and the multiplexed status signal EXF4 is no longer generated on bus cycles. For the Pentium and Intel486 processors, the SMIACT# pin is deasserted.
If the processor detects invalid state information saved in the SMRAM, it enters the shutdown state and generates a special bus cycle to indicate it has entered shutdown state. Shutdown happens only in the following situations:
• A reserved bit in control register CR4 is set to 1 on a write to CR4. This error should not happen unless SMI handler code modifies reserved areas of the SMRAM saved state map (see Section 34.4.1). CR4 is saved in the state map in a reserved location and cannot be read or modified in its saved state.
• An illegal combination of bits is written to control register CR0, in particular PG set to 1 and PE set to 0, or NW set to 1 and CD set to 0.
• CR4.PCIDE would be set to 1 and IA32_EFER.LMA to 0.
• (For the Pentium and Intel486 processors only.) If the address stored in the SMBASE register when an RSM instruction is executed is not aligned on a 32-KByte boundary. This restriction does not apply to the P6 family processors.
In the shutdown state, Intel processors stop executing instructions until a RESET#, INIT# or NMI# is asserted. While Pentium family processors recognize the SMI# signal in shutdown state, P6 family and Intel486 processors do not. Intel does not support using SMI# to recover from shutdown states for any processor family; the response of processors in this circumstance is not well defined. On Pentium 4 and later processors, shutdown will inhibit INTR and A20M but will not change any of the other inhibits. On these processors, NMIs will be inhibited if no action is taken in the SMI handler to uninhibit them (see Section 34.8). If the processor is in the HALT state when the SMI is received, the processor handles the return from SMM slightly differently (see Section 34.10). Also, the SMBASE address can be changed on a return from SMM (see Section 34.11).
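The shutdown conditions listed in this section can be summarized as a C predicate over the state that RSM is about to restore. This is an illustrative model only: the function name and parameters are hypothetical, the set of reserved CR4 bits varies by processor and is therefore passed in as a caller-supplied mask, and the SMBASE alignment check is optional because it applies only to the Pentium and Intel486 processors.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE    (1u << 0)
#define CR0_NW    (1u << 29)
#define CR0_CD    (1u << 30)
#define CR0_PG    (1u << 31)
#define CR4_PCIDE (1u << 17)

/* Returns true if the saved state would cause RSM to enter shutdown.
 * cr4_reserved_mask: processor-specific reserved CR4 bits (assumption).
 * check_smbase_alignment: true only when modelling Pentium/Intel486. */
bool rsm_state_causes_shutdown(uint32_t cr0, uint32_t cr4, bool efer_lma,
                               uint32_t cr4_reserved_mask, uint32_t smbase,
                               bool check_smbase_alignment)
{
    if (cr4 & cr4_reserved_mask)
        return true;                           /* reserved CR4 bit set  */
    if ((cr0 & CR0_PG) && !(cr0 & CR0_PE))
        return true;                           /* PG = 1 with PE = 0    */
    if ((cr0 & CR0_NW) && !(cr0 & CR0_CD))
        return true;                           /* NW = 1 with CD = 0    */
    if ((cr4 & CR4_PCIDE) && !efer_lma)
        return true;                           /* PCIDE = 1 with LMA = 0 */
    if (check_smbase_alignment && (smbase & 0x7FFFu))
        return true;                           /* not 32-KByte aligned  */
    return false;
}
```

An SMI handler that modifies the state save map could run an equivalent check before executing RSM, since an invalid image leaves the processor in shutdown rather than raising a recoverable fault.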
34.4
SMRAM
Upon entering SMM, the processor switches to a new address space. Because paging is disabled upon entering SMM, this initial address space maps all memory accesses to the low 4 GBytes of the processor's physical address space. The SMI handler's critical code and data reside in a memory region referred to as system-management RAM (SMRAM). The processor uses a pre-defined region within SMRAM to save the processor's pre-SMI context. SMRAM can also be used to store system management information (such as the system configuration and specific information about powered-down devices) and OEM-specific information. The default SMRAM size is 64 KBytes beginning at a base physical address in physical memory called the SMBASE (see Figure 34-1). The SMBASE default value following a hardware reset is 30000H. The processor looks for the first instruction of the SMI handler at the address [SMBASE + 8000H]. It stores the processor's state in the area from [SMBASE + FE00H] to [SMBASE + FFFFH]. See Section 34.4.1 for a description of the mapping of the state save area. The system logic is minimally required to decode the physical address range for the SMRAM from [SMBASE + 8000H] to [SMBASE + FFFFH]. A larger area can be decoded if needed. The size of this SMRAM can be between 32 KBytes and 4 GBytes. The location of the SMRAM can be changed by changing the SMBASE value (see Section 34.11). It should be noted that all processors in a multiple-processor system are initialized with the same SMBASE value (30000H). Initialization software must sequentially place each processor in SMM and change its SMBASE so that it does not overlap those of other processors. The actual physical location of the SMRAM can be in system memory or in a separate RAM memory. The processor generates an SMI acknowledge transaction (P6 family processors) or asserts the SMIACT# pin (Pentium and Intel486 processors) when the processor receives an SMI (see Section 34.3.1). 
System logic can use the SMI acknowledge transaction or the assertion of the SMIACT# pin to decode accesses to the SMRAM and redirect them (if desired) to specific SMRAM memory. If a separate RAM memory is used for SMRAM, system logic should provide a programmable method of mapping the SMRAM into system memory space when the processor is not in SMM. This mechanism will enable start-up procedures to initialize the SMRAM space (that is, load the SMI handler) before executing the SMI handler during SMM. ...
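The SMRAM addresses given in this section follow directly from SMBASE, as the C sketch below illustrates. The names are hypothetical. The overlap test compares only the minimally decoded range [SMBASE + 8000H, SMBASE + FFFFH] of two processors; treating that range as the region that must not overlap is an assumption of this sketch, and SMBASE values are assumed small enough that the additions do not wrap.

```c
#include <stdbool.h>
#include <stdint.h>

#define SMBASE_DEFAULT    0x30000u
#define SMI_ENTRY_OFFSET  0x8000u
#define STATE_SAVE_FIRST  0xFE00u
#define STATE_SAVE_LAST   0xFFFFu

/* Address of the first instruction of the SMI handler. */
uint32_t smi_handler_entry(uint32_t smbase)
{
    return smbase + SMI_ENTRY_OFFSET;
}

/* First and last byte of the processor state save area. */
uint32_t state_save_first(uint32_t smbase) { return smbase + STATE_SAVE_FIRST; }
uint32_t state_save_last(uint32_t smbase)  { return smbase + STATE_SAVE_LAST; }

/* True if the minimally decoded SMRAM ranges of two SMBASE values overlap. */
bool smram_min_ranges_overlap(uint32_t smbase_a, uint32_t smbase_b)
{
    uint32_t a_lo = smbase_a + SMI_ENTRY_OFFSET;
    uint32_t a_hi = smbase_a + STATE_SAVE_LAST;
    uint32_t b_lo = smbase_b + SMI_ENTRY_OFFSET;
    uint32_t b_hi = smbase_b + STATE_SAVE_LAST;
    return a_lo <= b_hi && b_lo <= a_hi;
}
```

With the default SMBASE of 30000H, the handler entry point is 38000H and the state save area spans 3FE00H through 3FFFFH.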
34.5
Section 34.5.1 describes the initial execution environment for an SMI handler. An SMI handler may re-configure its execution environment to other supported operating modes. Section 34.5.2 discusses modifications an SMI handler can make to its execution environment.
34.5.1
After saving the current context of the processor, the processor initializes its core registers to the values shown in Table 34-4. Upon entering SMM, the PE and PG flags in control register CR0 are cleared, which places the processor in an environment similar to real-address mode. The differences between the SMM execution environment and the real-address mode execution environment are as follows:
• The addressable address space ranges from 0 to FFFFFFFFH (4 GBytes).
• The normal 64-KByte segment limit for real-address mode is increased to 4 GBytes.
• The default operand and address sizes are set to 16 bits, which restricts the addressable SMRAM address space to the 1-MByte real-address mode limit for native real-address-mode code. However, operand-size and address-size override prefixes can be used to access the address space beyond the first 1 MByte.
• The value in segment register CS is automatically set to the default of 30000H for the SMBASE shifted 4 bits to the right; that is, 3000H. The EIP register is set to 8000H. When the EIP value is added to the shifted CS value (the SMBASE), the resulting linear address points to the first instruction of the SMI handler.
• The other segment registers (DS, SS, ES, FS, and GS) are cleared to 0 and their segment limits are set to 4 GBytes. In this state, the SMRAM address space may be treated as a single flat 4-GByte linear address space. If a segment register is loaded with a 16-bit value, that value is then shifted left by 4 bits and loaded into the segment base (hidden part of the segment register). The limits and attributes are not modified.
Maskable hardware interrupts, exceptions, NMI interrupts, SMI interrupts, A20M interrupts, single-step traps, breakpoint traps, and INIT operations are inhibited when the processor enters SMM. Maskable hardware interrupts, exceptions, single-step traps, and breakpoint traps can be enabled in SMM if the SMM execution environment provides and initializes an interrupt table and the necessary interrupt and exception handlers (see Section 34.6).
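The relationship between SMBASE, the initial CS value, and the handler entry point for the default SMBASE can be worked through in C. The function names are hypothetical, and the sketch covers only the default case described above.

```c
#include <stdint.h>

/* Initial CS selector value: SMBASE shifted right by 4 bits. */
uint16_t smm_initial_cs(uint32_t smbase)
{
    return (uint16_t)(smbase >> 4);
}

/* Segment base produced when a 16-bit value is loaded into a segment
 * register in this environment: the value shifted left by 4 bits. */
uint32_t segment_base_from_selector(uint16_t selector)
{
    return (uint32_t)selector << 4;
}

/* Linear address formed from a segment base and an offset such as EIP. */
uint32_t linear_address(uint32_t segment_base, uint32_t offset)
{
    return segment_base + offset;
}
```

For the default SMBASE of 30000H, the CS value is 3000H, the segment base is 30000H, and adding the initial EIP of 8000H gives the linear address 38000H of the first SMI handler instruction.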
34.5.2
Within SMM, an SMI handler may change the processor's operating mode (e.g., to enable PAE paging, enter 64-bit mode, etc.) after it has made proper preparation and initialization to do so. For example, if switching to 32-bit protected mode, the SMI handler should follow the guidelines provided in Chapter 9, Processor Management and Initialization. If the SMI handler does wish to change operating mode, it is responsible for executing the appropriate mode-transition code after each SMI.
It is recommended that the SMI handler make use of all means available to protect the integrity of its critical code and data. In particular, it should use the system-management range register (SMRR) interface if it is available (see Section 11.11.2.4). The SMRR interface can protect only the first 4 GBytes of the physical address space. The SMI handler should take that fact into account if it uses operating modes that allow access to physical addresses beyond that 4-GByte limit (e.g., PAE paging or 64-bit mode). Execution of the RSM instruction restores the pre-SMI processor state from the SMRAM state-save map (see Section 34.4.1) into which it was stored when the processor entered SMM. (The SMBASE field in the SMRAM state-save map does not determine the state following RSM but rather the initial environment following the next entry to SMM.) Any required change to operating mode is performed by the RSM instruction; there is no need for the SMI handler to change modes explicitly prior to executing RSM.
34.6
When the processor enters SMM, all hardware interrupts are disabled in the following manner:
• The IF flag in the EFLAGS register is cleared, which inhibits maskable hardware interrupts from being generated.
• The TF flag in the EFLAGS register is cleared, which disables single-step traps.
• Debug register DR7 is cleared, which disables breakpoint traps. (This action prevents a debugger from accidentally breaking into an SMI handler if a debug breakpoint is set in normal address space that overlays code or data in SMRAM.)
• NMI, SMI, and A20M interrupts are blocked by internal SMM logic. (See Section 34.8 for more information about how NMIs are handled in SMM.)
Software-invoked interrupts and exceptions can still occur, and maskable hardware interrupts can be enabled by setting the IF flag. Intel recommends that SMM code be written so that it does not invoke software interrupts (with the INT n, INTO, INT 3, or BOUND instructions) or generate exceptions. If the SMI handler requires interrupt and exception handling, an SMM interrupt table and the necessary exception and interrupt handlers must be created and initialized from within SMM. Until the interrupt table is correctly initialized (using the LIDT instruction), exceptions and software interrupts will result in unpredictable processor behavior. The following restrictions apply when designing SMM interrupt and exception-handling facilities: The interrupt table should be located at linear address 0 and must contain real-address mode style interrupt vectors (4 bytes containing CS and IP). Due to the real-address mode style of base address formation, an interrupt or exception cannot transfer control to a segment with a base address of more than 20 bits. An interrupt or exception cannot transfer control to a segment offset of more than 16 bits (64 KBytes). When an exception or interrupt occurs, only the 16 least-significant bits of the return address (EIP) are pushed onto the stack. If the offset of the interrupted procedure is greater than 64 KBytes, it is not possible for the interrupt/exception handler to return control to that procedure. (One solution to this problem is for a handler to adjust the return address on the stack.) The SMBASE relocation feature affects the way the processor will return from an interrupt or exception generated while the SMI handler is executing. For example, if the SMBASE is relocated to above 1 MByte, but the exception handlers are below 1 MByte, a normal return to the SMI handler is not possible.
One solution is to provide the exception handler with a mechanism for calculating a return address above 1 MByte from the 16-bit return address on the stack, then use a 32-bit far call to return to the interrupted procedure. If an SMI handler needs access to the debug trap facilities, it must ensure that an SMM-accessible debug handler is available and save the current contents of debug registers DR0 through DR3 (for later restoration). Debug registers DR0 through DR3 and DR7 must then be initialized with the appropriate values.
If an SMI handler needs access to the single-step mechanism, it must ensure that an SMM-accessible single-step handler is available, and then set the TF flag in the EFLAGS register. If the SMI design requires the processor to respond to maskable hardware interrupts or software-generated interrupts while in SMM, it must ensure that SMM-accessible interrupt handlers are available and then set the IF flag in the EFLAGS register (using the STI instruction). Software interrupts are not blocked upon entry to SMM, so they do not need to be enabled.
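The address arithmetic behind the interrupt-table restrictions can be illustrated with a short sketch. The C fragment below is not taken from the manual: the type and function names are hypothetical, and it only models the numbers involved (a 4-byte vector holding IP and CS, a segment base formed as CS shifted left by 4, and a return offset truncated to 16 bits).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one real-address-mode vector: 4 bytes, with IP in
 * the low word and CS in the high word. */
typedef struct {
    uint16_t ip;
    uint16_t cs;
} rm_vector_t;

/* Linear address of the table entry for `vector`, table at linear address 0. */
static uint32_t rm_vector_slot(unsigned vector)
{
    assert(vector < 256);
    return (uint32_t)vector * 4u;
}

/* Handler entry point: base = CS << 4 (at most 20 bits), offset = IP (16 bits). */
static uint32_t rm_handler_linear(rm_vector_t v)
{
    return ((uint32_t)v.cs << 4) + v.ip;
}

/* Only the 16 least-significant bits of EIP are pushed as the return offset. */
static uint16_t pushed_return_offset(uint32_t eip)
{
    return (uint16_t)(eip & 0xFFFFu);
}

/* Nonzero when the pushed offset alone cannot identify the interrupted code. */
static int return_offset_truncated(uint32_t eip)
{
    return eip > 0xFFFFu;
}
```

A handler that finds `return_offset_truncated()` true for the interrupted code would need some other record of the upper EIP bits before it could adjust the return address on the stack.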
...
34.8
NMI interrupts are blocked upon entry to the SMI handler. If an NMI request occurs during the SMI handler, it is latched and serviced after the processor exits SMM. Only one NMI request will be latched during the SMI handler. If an NMI request is pending when the processor executes the RSM instruction, the NMI is serviced before the next instruction of the interrupted code sequence. This assumes that NMIs were not blocked before the SMI occurred. If NMIs were blocked before the SMI occurred, they are blocked after execution of RSM. Although NMI requests are blocked when the processor enters SMM, they may be enabled through software by executing an IRET instruction. If the SMI handler requires the use of NMI interrupts, it should invoke a dummy interrupt service routine for the purpose of executing an IRET instruction. Once an IRET instruction is executed, NMI interrupt requests are serviced in the same real mode manner in which they are handled outside of SMM. A special case can occur if an SMI handler nests inside an NMI handler and then another NMI occurs. During NMI interrupt handling, NMI interrupts are disabled, so normally NMI interrupts are serviced and completed with an IRET instruction one at a time. When the processor enters SMM while executing an NMI handler, the processor saves the SMRAM state save map but does not save the attribute to keep NMI interrupts disabled. Potentially, an NMI could be latched (while in SMM or upon exit) and serviced upon exit of SMM even though the previous NMI handler has still not completed. One or more NMIs could thus be nested inside the first NMI handler. The NMI interrupt handler should take this possibility into consideration. Also, for the Pentium processor, exceptions that invoke a trap or fault handler will enable NMI interrupts from inside of SMM. This behavior is implementation specific for the Pentium processor and is not part of the IA-32 architecture. ...
34.11
SMBASE RELOCATION
The default base address for the SMRAM is 30000H. This value is contained in an internal processor register called the SMBASE register. The operating system or executive can relocate the SMRAM by setting the SMBASE field in the saved state map (at offset 7EF8H) to a new value (see Figure 34-4). The RSM instruction reloads the internal SMBASE register with the value in the SMBASE field each time it exits SMM. All subsequent SMI requests will use the new SMBASE value to find the starting address for the SMI handler (at SMBASE + 8000H) and the SMRAM state save area (from SMBASE + FE00H to SMBASE + FFFFH). (The processor resets the value in its internal SMBASE register to 30000H on a RESET, but does not change it on an INIT.)
[Figure 34-4: SMBASE relocation field (SMM Base, high bit 31)]
In multiple-processor systems, initialization software must adjust the SMBASE value for each processor so that the SMRAM state save areas for each processor do not overlap. (For Pentium and Intel486 processors, the SMBASE values must be aligned on a 32-KByte boundary or the processor will enter shutdown state during the execution of an RSM instruction.) If the SMBASE relocation flag in the SMM revision identifier field is set, it indicates the ability to relocate the SMBASE (see Section 34.9).
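The addresses derived from SMBASE can be checked with a small calculation. The following C sketch uses hypothetical helper names and models only the arithmetic stated in this section. It assumes that state-save-map offsets such as 7EF8H are relative to SMBASE + 8000H, which is consistent with the save area spanning SMBASE + FE00H through SMBASE + FFFFH.

```c
#include <assert.h>
#include <stdint.h>

#define SMBASE_DEFAULT      0x30000u  /* value after RESET */
#define SMI_ENTRY_OFFSET    0x8000u   /* SMI handler starts at SMBASE + 8000H */
#define SAVE_AREA_FIRST     0xFE00u   /* state save area: SMBASE + FE00H ... */
#define SAVE_AREA_LAST      0xFFFFu   /* ... through SMBASE + FFFFH */
#define SMBASE_FIELD_OFFSET 0x7EF8u   /* assumed relative to SMBASE + 8000H */

static uint32_t smi_entry(uint32_t smbase)
{
    assert(smbase <= UINT32_MAX - SAVE_AREA_LAST);  /* keep the sums in range */
    return smbase + SMI_ENTRY_OFFSET;
}

static uint32_t smbase_field_addr(uint32_t smbase)
{
    return smi_entry(smbase) + SMBASE_FIELD_OFFSET;
}

/* Pentium and Intel486 processors require 32-KByte alignment. */
static int smbase_aligned_32k(uint32_t smbase)
{
    return (smbase & 0x7FFFu) == 0;
}

/* Two processors' state save areas overlap when their SMBASE values are
 * closer together than the size of the area (200H bytes). */
static int save_areas_overlap(uint32_t a, uint32_t b)
{
    uint32_t size = SAVE_AREA_LAST - SAVE_AREA_FIRST + 1u;
    uint32_t dist = (a > b) ? a - b : b - a;
    return dist < size;
}
```

Initialization software could run checks of this kind over every pair of per-processor SMBASE values before writing them into the SMBASE fields. The overlap test covers only the state save areas and says nothing about handler code placed elsewhere in SMRAM.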
34.12
If the I/O instruction restart flag in the SMM revision identifier field is set (see Section 34.9), the I/O instruction restart mechanism is present on the processor. This mechanism allows an interrupted I/O instruction to be re-executed upon returning from SMM. For example, if an I/O instruction is used to access a powered-down I/O device, a chip set supporting this device can intercept the access and respond by asserting SMI#. This action invokes the SMI handler to power up the device. Upon returning from the SMI handler, the I/O instruction restart mechanism can be used to re-execute the I/O instruction that caused the SMI. The I/O instruction restart field (at offset 7F00H in the SMM state-save area, see Figure 34-5) controls I/O instruction restart. When an RSM instruction is executed, if this field contains the value FFH, then the EIP register is modified to point to the I/O instruction that received the SMI request. The processor will then automatically re-execute the I/O instruction that the SMI trapped. (The processor saves the necessary machine state to ensure that re-execution of the instruction is handled coherently.) ...
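As a rough illustration of the restart request, the sketch below writes FFH into a byte buffer that stands in for the memory indexed by state-save offsets. All names are hypothetical. The bit positions used for the SMM revision identifier flags (bit 16 for I/O instruction restart, bit 17 for SMBASE relocation) are an assumption taken from the SDM's description of that field, not from this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IO_RESTART_OFFSET 0x7F00u  /* offset within the SMM state-save area */
#define IO_RESTART_VALUE  0xFFu    /* FFH requests re-execution on RSM */

/* Assumed flag positions in the SMM revision identifier. */
static int has_io_restart(uint32_t smm_rev_id)
{
    return (int)((smm_rev_id >> 16) & 1u);
}

static int has_smbase_relocation(uint32_t smm_rev_id)
{
    return (int)((smm_rev_id >> 17) & 1u);
}

/* `save_map` models the memory that offsets such as 7F00H index into and
 * `len` is its size in bytes. Returns 1 if the request was written. */
static int request_io_restart(uint8_t *save_map, size_t len, uint32_t smm_rev_id)
{
    assert(save_map != NULL && len > IO_RESTART_OFFSET);
    if (!has_io_restart(smm_rev_id))
        return 0;  /* mechanism not present: leave the field alone */
    save_map[IO_RESTART_OFFSET] = IO_RESTART_VALUE;
    return 1;
}
```

In a real handler the write would target SMRAM rather than an ordinary buffer, and it would be done only when the SMI was in fact caused by a trapped I/O instruction.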
The checks above are performed before the checks described in Section 34.15.4.2 and before any of the following checks: If the deactivate dual-monitor treatment VM-entry control is 0 and the executive-VMCS pointer field does not contain the VMXON pointer, the launch state of the executive VMCS (the VMCS referenced by the executive-VMCS pointer field) must be launched (see Section 24.11.3). If the deactivate dual-monitor treatment VM-entry control is 1, the executive-VMCS pointer field must contain the VMXON pointer (see Section 34.15.7; see also footnote 3).
...
1. Software can determine a processor's physical-address width by executing CPUID with 80000008H in EAX. The physical-address width is returned in bits 7:0 of EAX.
2. If IA32_VMX_BASIC[48] is read as 1, this pointer must not set any bits in the range 63:32; see Appendix A.1.
3. The STM can determine the VMXON pointer by reading the executive-VMCS pointer field in the current VMCS after the SMM VM exit that activates the dual-monitor treatment.
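Footnotes 1 and 2 describe two numeric constraints that are easy to express in code. The sketch below is illustrative only: the function names are hypothetical, the inputs are register values the caller has already read, and it assumes the elided text requires that the pointer set no bits beyond the physical-address width.

```c
#include <assert.h>
#include <stdint.h>

/* Physical-address width from CPUID.80000008H: bits 7:0 of EAX. */
static unsigned phys_addr_width(uint32_t cpuid_80000008_eax)
{
    return cpuid_80000008_eax & 0xFFu;
}

/* Returns 1 when `ptr` sets no bit at or above `width` and, if
 * IA32_VMX_BASIC[48] was read as 1, no bit in the range 63:32. */
static int vmx_pointer_ok(uint64_t ptr, unsigned width, int vmx_basic_bit48)
{
    assert(width >= 1 && width <= 64);
    if (vmx_basic_bit48 && (ptr >> 32) != 0)
        return 0;
    if (width < 64 && (ptr >> width) != 0)
        return 0;
    return 1;
}
```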
The 32 bytes located at the MSEG base address are called the MSEG header. The format of the MSEG header is given in Table 34-10 (each field is 32 bits).
1. Software should consult the VMX capability MSR IA32_VMX_BASIC (see Appendix A.1) to determine whether the dual-monitor treatment is supported.
To ensure proper behavior in VMX operation, software should maintain the MSEG header in writeback cacheable memory. Future implementations may allow or require a different memory type; software should consult the VMX capability MSR IA32_VMX_BASIC (see Appendix A.1). SMM code should enable the dual-monitor treatment (by setting the valid bit in IA32_SMM_MONITOR_CTL MSR) only after establishing the content of the MSEG header as follows: Bytes 3:0 contain the MSEG revision identifier. Different processors may use different MSEG revision identifiers. These identifiers enable software to avoid using an MSEG header formatted for one processor on a processor that uses a different format. Software can discover the MSEG revision identifier that a processor uses by reading the VMX capability MSR IA32_VMX_MISC (see Appendix A.6). Bytes 7:4 contain the SMM-transfer monitor features field. Bits 31:1 of this field are reserved and must be zero. Bit 0 of the field is the IA-32e mode SMM feature bit. It indicates whether the logical processor will be in IA-32e mode after the STM is activated (see Section 34.15.6). Bytes 31:8 contain fields that determine how processor state is loaded when the STM is activated (see Section 34.15.6.6). SMM code should establish these fields so that activating the STM invokes the STM's initialization code.
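The header layout just described maps naturally onto a structure of eight 32-bit fields. The C sketch below is a hypothetical model, not the manual's definition: the six state-loading fields are kept as an opaque array because this section does not itemize them, and the location of the revision identifier in IA32_VMX_MISC (bits 63:32) is an assumption taken from Appendix A.6.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t revision_id;       /* bytes 3:0 */
    uint32_t monitor_features;  /* bytes 7:4, bit 0 = IA-32e mode SMM feature */
    uint32_t state_fields[6];   /* bytes 31:8, not itemized in this section */
} mseg_header_t;

/* Assumption: IA32_VMX_MISC reports the MSEG revision identifier in 63:32. */
static uint32_t mseg_revision_from_vmx_misc(uint64_t vmx_misc)
{
    return (uint32_t)(vmx_misc >> 32);
}

/* Fill in a header before the valid bit of IA32_SMM_MONITOR_CTL is set. */
static void mseg_header_init(mseg_header_t *h, uint64_t vmx_misc,
                             int ia32e_mode, const uint32_t state_fields[6])
{
    assert(h != NULL && state_fields != NULL);
    assert(sizeof *h == 32);  /* the MSEG header is 32 bytes */
    h->revision_id = mseg_revision_from_vmx_misc(vmx_misc);
    h->monitor_features = ia32e_mode ? 1u : 0u;  /* bits 31:1 stay zero */
    memcpy(h->state_fields, state_fields, sizeof h->state_fields);
}
```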
...
2. The logical processor reads the SMM-transfer monitor features field: Bit 0 of the field is the IA-32e mode SMM feature bit, and it indicates whether the logical processor will be in IA-32e mode after the SMM-transfer monitor (STM) is activated. If the VMCALL is executed on a processor that does not support Intel 64 architecture, the IA-32e mode SMM feature bit must be 0. If the VMCALL is executed in 64-bit mode, the IA-32e mode SMM feature bit must be 1.
Bits 31:1 of this field are currently reserved and must be zero. If any of these checks fail, subsequent checks are skipped and the VMCALL fails. ...
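The checks on the SMM-transfer monitor features field can be summarized as a predicate. This is a minimal sketch with hypothetical names. It mirrors only the three conditions stated here and returns 0 where the text says the VMCALL fails.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 when the features field passes the checks, 0 otherwise. */
static int stm_features_ok(uint32_t features, int cpu_has_intel64,
                           int vmcall_in_64bit_mode)
{
    int ia32e = (int)(features & 1u);  /* IA-32e mode SMM feature bit */

    /* 64-bit mode cannot occur without Intel 64 support. */
    assert(cpu_has_intel64 || !vmcall_in_64bit_mode);

    if (!cpu_has_intel64 && ia32e)
        return 0;  /* bit must be 0 without Intel 64 architecture */
    if (vmcall_in_64bit_mode && !ia32e)
        return 0;  /* bit must be 1 when VMCALL runs in 64-bit mode */
    if (features & 0xFFFFFFFEu)
        return 0;  /* bits 31:1 are reserved and must be zero */
    return 1;
}
```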
...
34.16
On processors that support processor extended states using XSAVE/XRSTOR (see Chapter 13, System Programming for Instruction Set Extensions and Processor Extended States), the processor does not save any XSAVE/XRSTOR related state on an SMI. It is the responsibility of the SMI handler code to properly preserve the state information (including CR4.OSXSAVE, XCR0, and possibly processor extended states using XSAVE/XRSTOR). Therefore, the SMI handler must follow the rules described in Chapter 13. ...
06_0EH 06_0DH 06_36H 06_1CH, 06_26H, 06_27H, 06_35, 06_36 0F_06H 0F_03H, 0F_04H 06_09H 0F_02H 0F_0H, 0F_01H 06_7H, 06_08H, 06_0AH, 06_0BH 06_03H, 06_05H 06_01H 05_01H, 05_02H, 05_04H
...
35.1
ARCHITECTURAL MSRS
Many MSRs have carried over from one generation of IA-32 processors to the next and to Intel 64 processors. A subset of MSRs and associated bit fields, which do not change on future processor generations, are now considered architectural MSRs. For historical reasons (beginning with the Pentium 4 processor), these architectural MSRs were given the prefix IA32_. Table 35-2 lists the architectural MSRs, their addresses, their current names, their names in previous IA-32 processors, and bit fields that are considered architectural. MSR addresses outside Table 35-2 and certain bit fields in an MSR address that may overlap with architectural MSR addresses are model-specific. Code that accesses a machine specified MSR and that is executed on a processor that does not support that MSR will generate an exception. Architectural MSRs or individual bit fields in an architectural MSR may be introduced or transitioned at the granularity of certain processor family/model or the presence of certain CPUID feature flags. The right-most column of Table 35-2 provides information on the introduction of each architectural MSR or its individual fields. This information is expressed either as signature values of DF_DM (see Table 35-1) or via CPUID flags. Certain bit field positions may be related to the maximum physical address width, the value of which is expressed as MAXPHYWID in Table 35-2. MAXPHYWID is reported by the CPUID.8000_0008H leaf. The MSR address range 40000000H - 400000FFH is marked as a specially reserved range. No existing or future processor will implement any feature using an MSR in this range.
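A DF_DM signature such as 06_2AH can be derived from the value CPUID.01H returns in EAX. The sketch below is an illustration with hypothetical names. The rule for combining the base and extended family/model fields is an assumption taken from the CPUID instruction description rather than from this section.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned family;  /* DisplayFamily */
    unsigned model;   /* DisplayModel */
} df_dm_t;

static df_dm_t display_family_model(uint32_t cpuid1_eax)
{
    unsigned family     = (cpuid1_eax >> 8) & 0xFu;
    unsigned model      = (cpuid1_eax >> 4) & 0xFu;
    unsigned ext_family = (cpuid1_eax >> 20) & 0xFFu;
    unsigned ext_model  = (cpuid1_eax >> 16) & 0xFu;
    df_dm_t r;

    r.family = (family == 0xFu) ? family + ext_family : family;
    r.model  = (family == 0x6u || family == 0xFu)
                   ? (ext_model << 4) + model
                   : model;
    return r;
}

/* Pack a signature as (family << 8) | model so that 06_2AH becomes 0x062A. */
static unsigned df_dm_key(df_dm_t s)
{
    assert(s.model <= 0xFFu);  /* DisplayModel fits in 8 bits */
    return (s.family << 8) | s.model;
}

/* The specially reserved MSR range 40000000H - 400000FFH. */
static int msr_in_reserved_range(uint32_t msr)
{
    return msr >= 0x40000000u && msr <= 0x400000FFu;
}
```

Software could compare `df_dm_key()` against the signatures in the right-most column of Table 35-2 before touching an MSR whose introduction is tied to a family/model value.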
MSR/Bit Description See Section 35.15, MSRs in Pentium Processors. See Section 35.15, MSRs in Pentium Processors. See Section 8.10.5, Monitor/Mwait Address Range Determination. See Section 17.13, Time-Stamp Counter. Platform ID (RO) The operating system can use this MSR to determine slot information for the processor and the proper microcode update to load. Reserved.
49:0
Platform Id (RO) Contains information concerning the intended platform for the processor.
Bit 52  Bit 51  Bit 50
0       0       0       Processor Flag 0
0       0       1       Processor Flag 1
0       1       0       Processor Flag 2
0       1       1       Processor Flag 3
1       0       0       Processor Flag 4
1       0       1       Processor Flag 5
1       1       0       Processor Flag 6
1       1       1       Processor Flag 7
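The three-bit Platform Id encoding in bits 52:50 selects one of the eight processor flags. As a hedged illustration, the sketch below extracts that value from a 64-bit MSR reading and tests it against a processor-flags bit mask. The function names are hypothetical. The idea that a microcode update carries a mask in which bit n corresponds to Processor Flag n is an assumption based on the SDM's microcode update chapter.

```c
#include <assert.h>
#include <stdint.h>

/* Platform Id: bits 52:50 of the platform ID MSR value. */
static unsigned platform_id(uint64_t platform_id_msr)
{
    return (unsigned)((platform_id_msr >> 50) & 0x7u);
}

/* Nonzero when bit `platform_id` is set in the update's processor flags. */
static int update_applies(uint64_t platform_id_msr, uint32_t update_processor_flags)
{
    unsigned id = platform_id(platform_id_msr);
    assert(id < 8);  /* a 3-bit field selects Processor Flag 0 through 7 */
    return (int)((update_processor_flags >> id) & 1u);
}
```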
63:53 1BH 27 IA32_APIC_BASE (APIC_BASE) 7:0 8 9 10 11 (MAXPHYWID - 1):12 63: MAXPHYWID 3AH 58 IA32_FEATURE_CONTROL
Reserved. 06_01H Reserved BSP flag (R/W) Reserved Enable x2APIC mode APIC Global Enable (R/W) APIC Base (R/W) Reserved Control Features in Intel 64 Processor (R/W) Lock bit (R/WO): (1 = locked). When set, locks this MSR from being written, writes to this bit will result in GP(0). Note: Once the Lock bit is set, the contents of this register cannot be modified. Therefore the lock bit must be set after configuring support for Intel Virtualization Technology and prior to transferring control to an option ROM or the OS. Hence, once the Lock bit is set, the entire IA32_FEATURE_CONTROL_MSR contents are preserved across RESET when PWRGOOD is not deasserted. If CPUID.01H: ECX[bit 5 or bit 6] = 1 If CPUID.01H:ECX[bit 5 or bit 6] = 1 06_1AH
Enable VMX inside SMX operation (R/WL): This bit enables a system executive to use VMX in conjunction with SMX to support Intel Trusted Execution Technology. BIOS must set this bit only when the CPUID function 1 returns VMX feature flag and SMX feature flag set (ECX bits 5 and 6 respectively).
Enable VMX outside SMX operation (R/WL): This bit enables VMX for system executives that do not require SMX. BIOS must set this bit only when the CPUID function 1 returns VMX feature flag set (ECX bit 5).
If CPUID.01H:ECX[bit 5 or bit 6] = 1
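The ordering constraint on IA32_FEATURE_CONTROL (configure the VMX enables first, then set the lock bit) can be sketched as follows. The names are hypothetical. The bit positions of the lock bit and the two VMX enables (bits 0, 1 and 2) are an assumption taken from the full table, since the bit column is not legible in this excerpt. The policy of enabling everything CPUID reports is only one possible BIOS choice.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions in IA32_FEATURE_CONTROL. */
#define FC_LOCK            (1ull << 0)
#define FC_VMX_INSIDE_SMX  (1ull << 1)
#define FC_VMX_OUTSIDE_SMX (1ull << 2)

#define CPUID1_ECX_VMX (1u << 5)
#define CPUID1_ECX_SMX (1u << 6)

/* Value a BIOS might write: enable what CPUID.01H:ECX reports, then lock. */
static uint64_t feature_control_for(uint64_t current, uint32_t cpuid1_ecx)
{
    uint64_t v = current;

    assert((current & FC_LOCK) == 0);  /* a locked MSR cannot be rewritten */
    if (cpuid1_ecx & CPUID1_ECX_VMX)
        v |= FC_VMX_OUTSIDE_SMX;
    if ((cpuid1_ecx & CPUID1_ECX_VMX) && (cpuid1_ecx & CPUID1_ECX_SMX))
        v |= FC_VMX_INSIDE_SMX;
    return v | FC_LOCK;
}

/* Nonzero when the MSR is locked with VMX outside SMX enabled. */
static int vmx_enabled_outside_smx(uint64_t feature_control)
{
    return (feature_control & FC_LOCK) && (feature_control & FC_VMX_OUTSIDE_SMX);
}
```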
7:3 14:8
Reserved SENTER Local Function Enables (R/WL): When set, each bit in the field represents an enable control for a corresponding SENTER function. This bit is supported only if CPUID.1:ECX.[bit 6] is set SENTER Global Enable (R/WL): This bit must be set to enable SENTER leaf functions. This bit is supported only if CPUID.1:ECX.[bit 6] is set Reserved Per Logical Processor TSC Adjust (R/Write to clear) THREAD_ADJUST: Local offset value of the IA32_TSC for a logical processor. Reset value is Zero. A write to IA32_TSC will modify the local offset in IA32_TSC_ADJUST and the content of IA32_TSC, but does not affect the internal invariant TSC hardware. If CPUID.(EAX=07H, ECX=0H): EBX[1] = 1 If CPUID.01H:ECX[bit 6] = 1
15
If CPUID.01H:ECX[bit 6] = 1
79H
121
IA32_BIOS_UPDT_TRIG (BIOS_UPDT_TRIG)
BIOS Update Trigger (W) Executing a WRMSR instruction to this MSR causes a microcode update to be loaded into the processor. See Section 9.11.6, Microcode Update Loader. A processor may prevent writing to this MSR when loading guest states on VM entries or saving guest states on VM exits.
06_01H
BIOS Update Signature (RO) Returns the microcode update signature following the execution of CPUID.01H. A processor may prevent writing to this MSR when loading guest states on VM entries or saving guest states on VM exits.
31:0 63:32
Reserved It is recommended that this field be preloaded with 0 prior to executing CPUID. If the field remains 0 following the execution of CPUID, this indicates that no microcode update is loaded. Any non-zero value is the microcode update signature.
9BH
155
SMM Monitor Configuration (R/W) Valid (R/W) Reserved Controls SMI unblocking by VMXOFF (see Section 34.14.4) Reserved MSEG Base (R/W) Reserved Base address of the logical processor's SMRAM image (RO, SMM only) General Performance Counter 0 (R/W) General Performance Counter 1 (R/W) General Performance Counter 2 (R/W) General Performance Counter 3 (R/W) General Performance Counter 4 (R/W) General Performance Counter 5 (R/W) General Performance Counter 6 (R/W) General Performance Counter 7 (R/W)
If IA32_VMX_MISC[bit 28]
IA32_SMBASE IA32_PMC0 (PERFCTR0) IA32_PMC1 (PERFCTR1) IA32_PMC2 IA32_PMC3 IA32_PMC4 IA32_PMC5 IA32_PMC6 IA32_PMC7
If IA32_VMX_MISC[bit 15] If CPUID.0AH: EAX[15:8] > 0 If CPUID.0AH: EAX[15:8] > 1 If CPUID.0AH: EAX[15:8] > 2 If CPUID.0AH: EAX[15:8] > 3 If CPUID.0AH: EAX[15:8] > 4 If CPUID.0AH: EAX[15:8] > 5 If CPUID.0AH: EAX[15:8] > 6 If CPUID.0AH: EAX[15:8] > 7
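The availability conditions listed for IA32_PMC0 through IA32_PMC7 follow one pattern: counter n exists when CPUID.0AH:EAX[15:8] is greater than n. A minimal sketch of that test is shown below with hypothetical names. The MSR address of IA32_PMC0 (C1H, with the remaining counters at consecutive addresses) is an assumption, because the address column is not legible in this excerpt.

```c
#include <assert.h>
#include <stdint.h>

#define IA32_PMC0_ADDR  0xC1u  /* assumed address of IA32_PMC0 */
#define MAX_LISTED_PMCS 8u     /* the table lists IA32_PMC0 .. IA32_PMC7 */

/* Number of general-purpose counters: CPUID.0AH:EAX[15:8]. */
static unsigned gp_counter_count(uint32_t cpuid_0a_eax)
{
    return (cpuid_0a_eax >> 8) & 0xFFu;
}

/* IA32_PMCn is present when CPUID.0AH:EAX[15:8] > n. */
static int pmc_present(uint32_t cpuid_0a_eax, unsigned n)
{
    return gp_counter_count(cpuid_0a_eax) > n;
}

static uint32_t pmc_msr(unsigned n)
{
    assert(n < MAX_LISTED_PMCS);
    return IA32_PMC0_ADDR + n;
}
```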
Maximum Qualified Performance Clock Counter (R/Write to clear) C0_MCNT: C0 Maximum Frequency Clock Count Increments at fixed interval (relative to TSC freq.) when the logical processor is in C0. Cleared upon overflow / wrap-around of IA32_APERF.
E8H
232
IA32_APERF 63:0
Actual Performance Clock Counter (R/Write to clear) C0_ACNT: C0 Actual Frequency Clock Count Accumulates core clock counts at the coordinated clock frequency, when the logical processor is in C0. Cleared upon overflow / wrap-around of IA32_MPERF.
If CPUID.06H: ECX[0] = 1
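One common use of the two counters is to compare how far each advanced over the same interval. The sketch below is an illustration rather than a documented procedure. It assumes the caller sampled both MSRs at two points in time and that neither counter wrapped or was cleared in between. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The counters are available when CPUID.06H:ECX[0] = 1. */
static int has_aperf_mperf(uint32_t cpuid_06_ecx)
{
    return (int)(cpuid_06_ecx & 1u);
}

/* Ratio of actual (C0_ACNT) to maximum-frequency (C0_MCNT) clock counts
 * accumulated between two samples taken while in C0. */
static double c0_frequency_ratio(uint64_t aperf_before, uint64_t aperf_after,
                                 uint64_t mperf_before, uint64_t mperf_after)
{
    uint64_t d_aperf = aperf_after - aperf_before;
    uint64_t d_mperf = mperf_after - mperf_before;

    assert(d_mperf != 0);  /* the interval must contain some C0 time */
    return (double)d_aperf / (double)d_mperf;
}
```

Because each counter is cleared when the other overflows, samples taken far apart may not be comparable. Keeping the measurement interval short avoids most of that risk.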
FEH
254
MTRR Capability (RO) Section 11.11.2.1, IA32_MTRR_DEF_TYPE MSR. VCNT: The number of variable memory type ranges in the processor. Fixed range MTRRs are supported when set. Reserved. WC Supported when set. SMRR Supported when set. Reserved. SYSENTER_CS_MSR (R/W) CS Selector Reserved. SYSENTER_ESP_MSR (R/W) SYSENTER_EIP_MSR (R/W) Global Machine Check Capability (RO) Count: Number of reporting banks. MCG_CTL_P: IA32_MCG_CTL is present if this bit is set MCG_EXT_P: Extended machine check state registers are present if this bit is set
06_01H
174H
372
06_01H
MCG_CMCI_P: Support for corrected MC error event is present. MCG_TES_P: Threshold-based error status registers are present if this bit is set. Reserved MCG_EXT_CNT: Number of extended machine check state registers present. MCG_SER_P: The processor supports software error recovery if this bit is set. Reserved. Global Machine Check Status (RO) Global Machine Check Control (R/W)
16 17 18 19 20 21
22
23
CMASK: When CMASK is not zero, the corresponding performance counter increments each cycle if the event count is greater than or equal to the CMASK. Reserved. Performance Event Select Register 1 (R/W) Performance Event Select Register 2 (R/W) Performance Event Select Register 3 (R/W)
63:32 187H 188H 189H 18AH-197H 198H 391 392 393 394-407 408 IA32_PERFEVTSEL1 (PERFEVTSEL1) IA32_PERFEVTSEL2 IA32_PERFEVTSEL3 Reserved IA32_PERF_STATUS 15:0 63:16 199H 409 IA32_PERF_CTL 15:0 31:16 32 63:33 19AH 410 IA32_CLOCK_MODULATION
If CPUID.0AH: EAX[15:8] > 1 If CPUID.0AH: EAX[15:8] > 2 If CPUID.0AH: EAX[15:8] > 3 06_0EH2
(RO) Current performance State Value Reserved. (R/W) Target performance State Value Reserved. IDA Engage. (R/W) When set to 1: disengages IDA Reserved. Clock Modulation Control (R/W) See Section 14.5.3, Software Controlled Clock Modulation.
0F_03H
0F_03H
06_0FH (Mobile)
0F_0H
0 3:1
Extended On-Demand Clock Modulation Duty Cycle: On-Demand Clock Modulation Duty Cycle: Specific encoded values for target duty cycle modulation. On-Demand Clock Modulation Enable: Set 1 to enable modulation. Reserved. Thermal Interrupt Control (R/W) Enables and disables the generation of an interrupt on temperature transitions detected with the processors thermal sensors and thermal monitor. See Section 14.5.2, Thermal Monitor.
If CPUID.06H:EAX[5] = 1
0F_0H
High-Temperature Interrupt Enable Low-Temperature Interrupt Enable PROCHOT# Interrupt Enable FORCEPR# Interrupt Enable Critical Temperature Interrupt Enable Reserved. Threshold #1 Value Threshold #1 Interrupt Enable Threshold #2 Value Threshold #2 Interrupt Enable Power Limit Notification Enable Reserved. Thermal Status Information (RO) Contains status information about the processor's thermal sensor and automatic thermal monitoring facilities. See Section 14.5.2, Thermal Monitor.
If CPUID.06H:EAX[4] = 1 0F_0H
Thermal Status (RO): Thermal Status Log (R/W): PROCHOT# or FORCEPR# event (RO) PROCHOT# or FORCEPR# log (R/WC0) Critical Temperature Status (RO) Critical Temperature Status log (R/WC0) Thermal Threshold #1 Status (RO) Thermal Threshold #1 log (R/WC0) Thermal Threshold #2 Status (RO) Thermal Threshold #2 log (R/WC0) Power Limitation Status (RO) Power Limitation log (R/WC0) Reserved. Digital Readout (RO) Reserved. Resolution in Degrees Celsius (RO) Reading Valid (RO) If CPUID.06H:EAX[0] = 1 If CPUID.06H:EAX[0] = 1 If CPUID.06H:EAX[0] = 1 If CPUID.01H:ECX[8] = 1 If CPUID.01H:ECX[8] = 1 If CPUID.01H:ECX[8] = 1 If CPUID.01H:ECX[8] = 1 If CPUID.06H:EAX[4] = 1 If CPUID.06H:EAX[4] = 1
Reserved. Branch Trace Storage Unavailable (RO) 1 = Processor doesn't support branch trace storage (BTS); 0 = BTS is supported. 06_0FH
12
Precise Event Based Sampling (PEBS) Unavailable (RO) 1 = PEBS is not supported; 0 = PEBS is supported.
15:13
Reserved.
Enhanced Intel SpeedStep Technology Enable (R/W) 0 = Enhanced Intel SpeedStep Technology disabled; 1 = Enhanced Intel SpeedStep Technology enabled.
17 18
Reserved. ENABLE MONITOR FSM (R/W) When this bit is set to 0, the MONITOR feature flag is not set (CPUID.01H:ECX[bit 3] = 0). This indicates that MONITOR/MWAIT are not supported. Software attempts to execute MONITOR/MWAIT will cause #UD when this bit is 0. When this bit is set to 1 (default), MONITOR/MWAIT are supported (CPUID.01H:ECX[bit 3] = 1). If the SSE3 feature flag ECX[0] is not set (CPUID.01H:ECX[bit 0] = 0), the OS must not attempt to alter this bit. BIOS must leave it in the default state. Writing this bit when the SSE3 feature flag is set to 0 may generate a #GP exception. 0F_03H
21:19 22
Reserved. Limit CPUID Maxval (R/W) When this bit is set to 1, CPUID.00H returns a maximum value in EAX[7:0] of 3. BIOS should contain a setup question that allows users to specify when the installed OS does not support CPUID functions greater than 3. Before setting this bit, BIOS must execute the CPUID.0H and examine the maximum value returned in EAX[7:0]. If the maximum value is greater than 3, the bit is supported. Otherwise, the bit is not supported. Writing to this bit when the maximum value is not greater than 3 may generate a #GP exception. Setting this bit may cause unexpected behavior in software that depends on the availability of CPUID leaves greater than 3. 0F_03H
xTPR Message Disable (R/W) When set to 1, xTPR messages are disabled. xTPR messages are optional messages that allow the processor to inform the chipset of its priority.
33:24 34
Reserved. XD Bit Disable (R/W) When set to 1, the Execute Disable Bit feature (XD Bit) is disabled and the XD Bit extended feature flag will be clear (CPUID.80000001H: EDX[20]=0). When set to 0 (default), the Execute Disable Bit feature (if available) allows the OS to enable PAE paging and take advantage of data-only pages. BIOS must not alter the contents of this bit location if the XD bit is not supported. Writing this bit to 1 when the XD Bit extended feature flag is set to 0 may generate a #GP exception. if CPUID.80000001H:EDX[20] = 1
Reserved. Performance Energy Bias Hint (R/W) Power Policy Preference: 0 indicates preference to highest performance. 15 indicates preference to maximize energy saving. if CPUID.6H:ECX[3] = 1
Reserved. Package Thermal Status Information (RO) Contains status information about the package's thermal sensor. See Section 14.6, Package Level Thermal Management. 06_2AH
0 1 2 3 4 5
Pkg Thermal Status (RO): Pkg Thermal Status Log (R/W): Pkg PROCHOT# event (RO) Pkg PROCHOT# log (R/WC0) Pkg Critical Temperature Status (RO) Pkg Critical Temperature Status log (R/WC0)
Pkg Thermal Threshold #1 Status (RO) Pkg Thermal Threshold #1 log (R/WC0) Pkg Thermal Threshold #2 Status (RO) Pkg Thermal Threshold #2 log (R/WC0) Pkg Power Limitation Status (RO) Pkg Power Limitation log (R/WC0) Reserved. Pkg Digital Readout (RO) Reserved. Pkg Thermal Interrupt Control (R/W) Enables and disables the generation of an interrupt on temperature transitions detected with the package's thermal sensor. See Section 14.6, Package Level Thermal Management.
06_2AH
Pkg High-Temperature Interrupt Enable Pkg Low-Temperature Interrupt Enable Pkg PROCHOT# Interrupt Enable Reserved. Pkg Overheat Interrupt Enable Reserved. Pkg Threshold #1 Value Pkg Threshold #1 Interrupt Enable Pkg Threshold #2 Value Pkg Threshold #2 Interrupt Enable Pkg Power Limit Notification Enable Reserved. Trace/Profile Resource Control (R/W) LBR: Setting this bit to 1 enables the processor to record a running trace of the most recent branches taken by the processor in the LBR stack. BTF: Setting this bit to 1 enables the processor to treat EFLAGS.TF as single-step on branches instead of single-step on instructions. 06_0EH 06_01H
06_01H
06_0EH
9 10 11
06_0FH 06_0FH If CPUID.01H: ECX[15] = 1 and CPUID.0AH: EAX[7:0] > 1 If CPUID.01H: ECX[15] = 1 and CPUID.0AH: EAX[7:0] > 1 06_1AH
12
13
14
MTRRphysBase6 MTRRphysMask6 MTRRphysBase7 MTRRphysMask7 MTRRphysBase8 MTRRphysMask8 MTRRphysBase9 MTRRphysMask9 MTRRfix64K_00000 MTRRfix16K_80000 MTRRfix16K_A0000 See Section 11.11.2.2, Fixed Range MTRRs. MTRRfix4K_C8000 MTRRfix4K_D0000 MTRRfix4K_D8000 MTRRfix4K_E0000 MTRRfix4K_E8000 MTRRfix4K_F0000 MTRRfix4K_F8000 IA32_PAT (R/W) PA0 Reserved. PA1 Reserved. PA2 Reserved. PA3 Reserved. PA4 Reserved.
EN1_OS: Enable Fixed Counter 1 to count while CPL = 0. EN1_Usr: Enable Fixed Counter 1 to count while CPL > 0. AnyThread: When set to 1, it enables counting the associated event conditions occurring across all logical processors sharing a processor core. When set to 0, the counter only increments the associated event conditions occurring in the logical processor which programmed the MSR. EN1_PMI: Enable PMI when fixed counter 1 overflows. EN2_OS: Enable Fixed Counter 2 to count while CPL = 0. EN2_Usr: Enable Fixed Counter 2 to count while CPL > 0. AnyThread: When set to 1, it enables counting the associated event conditions occurring across all logical processors sharing a processor core. When set to 0, the counter only increments the associated event conditions occurring in the logical processor which programmed the MSR. EN2_PMI: Enable PMI when fixed counter 2 overflows. Reserved. Global Performance Counter Status (RO) Ovf_PMC0: Overflow status of IA32_PMC0. Ovf_PMC1: Overflow status of IA32_PMC1. Ovf_PMC2: Overflow status of IA32_PMC2. Ovf_PMC3: Overflow status of IA32_PMC3. Reserved. Ovf_FixedCtr0: Overflow status of IA32_FIXED_CTR0. Ovf_FixedCtr1: Overflow status of IA32_FIXED_CTR1. Ovf_FixedCtr2: Overflow status of IA32_FIXED_CTR2. Reserved.
7 8 9 10
If CPUID.0AH: EAX[7:0] > 0 If CPUID.0AH: EAX[7:0] > 0 If CPUID.0AH: EAX[7:0] > 0 06_2EH 06_2EH If CPUID.0AH: EAX[7:0] > 1 If CPUID.0AH: EAX[7:0] > 1 If CPUID.0AH: EAX[7:0] > 1
Ovf_Uncore: Uncore counter overflow status. OvfBuf: DS SAVE area Buffer overflow status. CondChg: status bits of this register have changed. Global Performance Counter Control (R/W) Counter increments while the result of ANDing the respective enable bit in this MSR with the corresponding OS or USR bits in the general-purpose or fixed counter control MSR is true. EN_PMC0 EN_PMC1 Reserved. EN_FIXED_CTR0 EN_FIXED_CTR1 EN_FIXED_CTR2 Reserved. Global Performance Counter Overflow Control (R/W) Set 1 to Clear Ovf_PMC0 bit. Set 1 to Clear Ovf_PMC1 bit. Reserved. Set 1 to Clear Ovf_FIXED_CTR0 bit. Set 1 to Clear Ovf_FIXED_CTR1 bit. Set 1 to Clear Ovf_FIXED_CTR2 bit. Reserved. Set 1 to Clear Ovf_Uncore bit. Set 1 to Clear OvfBuf bit. Set 1 to Clear CondChg bit. PEBS Control (R/W) Enable PEBS on IA32_PMC0. Reserved or Model specific. Reserved. Reserved or Model specific. Reserved.
0 1 31:2 32 33 34 63:35 390H 912 IA32_PERF_GLOBAL_OVF_CTRL (MSR_PERF_GLOBAL_OVF_CTRL) 0 1 31:2 32 33 34 60:35 61 62 63 3F1H 1009 IA32_PEBS_ENABLE 0 1-3 31:4 35-32 63:36
If CPUID.0AH: EAX[7:0] > 0 If CPUID.0AH: EAX[7:0] > 0 If CPUID.0AH: EAX[7:0] > 1 If CPUID.0AH: EAX[7:0] > 1 If CPUID.0AH: EAX[7:0] > 1 If CPUID.0AH: EAX[7:0] > 0 If CPUID.0AH: EAX[7:0] > 0 If CPUID.0AH: EAX[7:0] > 0 If CPUID.0AH: EAX[7:0] > 1 If CPUID.0AH: EAX[7:0] > 1 If CPUID.0AH: EAX[7:0] > 1 06_2EH If CPUID.0AH: EAX[7:0] > 0 If CPUID.0AH: EAX[7:0] > 0 06_0FH
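The global control and overflow-control registers share one bit layout in the rows shown: general-purpose counters in bits 0 and 1, fixed counters in bits 32 through 34, and (for the overflow control only) bits 61, 62 and 63 for Ovf_Uncore, OvfBuf and CondChg. A minimal sketch of building such values, with hypothetical function names, could look like this.

```c
#include <assert.h>
#include <stdint.h>

/* Build an IA32_PERF_GLOBAL_CTRL value. Bit n of `pmc_mask` selects
 * EN_PMCn and bit n of `fixed_mask` selects EN_FIXED_CTRn. */
static uint64_t perf_global_ctrl(unsigned pmc_mask, unsigned fixed_mask)
{
    /* Only PMC0-1 and FIXED_CTR0-2 appear in the rows listed here. */
    assert(pmc_mask <= 0x3u && fixed_mask <= 0x7u);
    return (uint64_t)pmc_mask | ((uint64_t)fixed_mask << 32);
}

/* Build an IA32_PERF_GLOBAL_OVF_CTRL value: writing 1 clears a status bit. */
static uint64_t perf_global_ovf_clear(unsigned pmc_mask, unsigned fixed_mask,
                                      int uncore, int ovf_buf, int cond_chg)
{
    uint64_t v = perf_global_ctrl(pmc_mask, fixed_mask);

    if (uncore)
        v |= 1ull << 61;  /* clear Ovf_Uncore */
    if (ovf_buf)
        v |= 1ull << 62;  /* clear OvfBuf */
    if (cond_chg)
        v |= 1ull << 63;  /* clear CondChg */
    return v;
}
```

Whether a given bit is implemented still depends on the CPUID.0AH conditions and family/model signatures listed in the table, so a caller would mask the result accordingly.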
MC0_CTL MC0_STATUS MC0_ADDR MC0_MISC MC1_CTL MC1_STATUS MC1_ADDR MC1_MISC MC2_CTL MC2_STATUS MC2_ADDR MC2_MISC MC3_CTL MC3_STATUS MC3_ADDR MC3_MISC MC4_CTL MC4_STATUS MC4_ADDR MC4_MISC MC5_CTL MC5_STATUS MC5_ADDR MC5_MISC MC6_CTL MC6_STATUS MC6_ADDR MC6_MISC MC7_CTL MC7_STATUS MC7_ADDR MC7_MISC MC8_CTL MC8_STATUS
Introduced as Architectural MSR P6 Family Processors P6 Family Processors P6 Family Processors P6 Family Processors P6 Family Processors P6 Family Processors P6 Family Processors P6 Family Processors P6 Family Processors P6 Family Processors P6 Family Processors P6 Family Processors P6 Family Processors P6 Family Processors P6 Family Processors P6 Family Processors P6 Family Processors P6 Family Processors P6 Family Processors P6 Family Processors 06_0FH 06_0FH 06_0FH 06_0FH 06_1DH 06_1DH 06_1DH 06_1DH 06_1AH 06_1AH 06_1AH 06_1AH 06_1AH 06_1AH
MC8_ADDR MC8_MISC MC9_CTL MC9_STATUS MC9_ADDR MC9_MISC MC10_CTL MC10_STATUS MC10_ADDR MC10_MISC MC11_CTL MC11_STATUS MC11_ADDR MC11_MISC MC12_CTL MC12_STATUS MC12_ADDR MC12_MISC MC13_CTL MC13_STATUS MC13_ADDR MC13_MISC MC14_CTL MC14_STATUS MC14_ADDR MC14_MISC MC15_CTL MC15_STATUS MC15_ADDR MC15_MISC MC16_CTL MC16_STATUS MC16_ADDR MC16_MISC
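The machine-check bank registers repeat in groups of four (CTL, STATUS, ADDR, MISC). The sketch below computes a bank register's MSR address under the assumption that bank 0 starts at 400H and each bank occupies four consecutive addresses. That base is not legible in this excerpt, and the names are hypothetical. The bank count comes from the Count field of the global machine check capability register, assumed here to be bits 7:0.

```c
#include <assert.h>
#include <stdint.h>

enum mc_reg { MC_CTL = 0, MC_STATUS = 1, MC_ADDR = 2, MC_MISC = 3 };

/* Number of reporting banks: assumed to be bits 7:0 of IA32_MCG_CAP. */
static unsigned mc_bank_count(uint64_t mcg_cap)
{
    return (unsigned)(mcg_cap & 0xFFu);
}

/* MSR address of register `reg` in bank `bank`, assuming MC0_CTL = 400H. */
static uint32_t mc_bank_msr(unsigned bank, enum mc_reg reg, unsigned bank_count)
{
    assert(bank < bank_count);  /* never touch a bank the processor lacks */
    return 0x400u + 4u * bank + (uint32_t)reg;
}
```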
Capability Reporting Register of VM-entry Controls (R/O) See Appendix A.5, VM-Entry Controls. Reporting Register of Miscellaneous VMX Capabilities (R/O) See Appendix A.6, Miscellaneous Data. Capability Reporting Register of CR0 Bits Fixed to 0 (R/O) See Appendix A.7, VMX-Fixed Bits in CR0. Capability Reporting Register of CR0 Bits Fixed to 1 (R/O) See Appendix A.7, VMX-Fixed Bits in CR0. Capability Reporting Register of CR4 Bits Fixed to 0 (R/O) See Appendix A.8, VMX-Fixed Bits in CR4. Capability Reporting Register of CR4 Bits Fixed to 1 (R/O) See Appendix A.8, VMX-Fixed Bits in CR4. Capability Reporting Register of VMCS Field Enumeration (R/O) See Appendix A.9, VMCS Enumeration. Capability Reporting Register of Secondary Processor-based VM-execution Controls (R/O) See Appendix A.3.3, Secondary Processor-Based VM-Execution Controls.
485H
1157
IA32_VMX_MISC
486H
1158
IA32_VMX_CR0_FIXED0
487H
1159
IA32_VMX_CR0_FIXED1
488H
1160
IA32_VMX_CR4_FIXED0
489H
1161
IA32_VMX_CR4_FIXED1
48AH
1162
IA32_VMX_VMCS_ENUM
48BH
1163
IA32_VMX_PROCBASED_CTLS2
48CH
1164
IA32_VMX_EPT_VPID_CAP
Capability Reporting Register of EPT and VPID (R/O) See Appendix A.10, VPID and EPT Capabilities.
48DH
1165
IA32_VMX_TRUE_PINBASED_CTLS
Capability Reporting Register of Pin-based VM-execution Flex Controls (R/O) See Appendix A.3.1, Pin-Based VM-Execution Controls.
48EH
1166
IA32_VMX_TRUE_PROCBASED_CTLS
Capability Reporting Register of Primary Processor-based VM-execution Flex Controls (R/O) See Appendix A.3.2, Primary Processor-Based VM-Execution Controls.
Capability Reporting Register of VM-exit Flex Controls (R/O) See Appendix A.4, VM-Exit Controls. Capability Reporting Register of VM-entry Flex Controls (R/O) See Appendix A.5, VM-Entry Controls. Full Width Writable IA32_PMC0 Alias (R/W)
490H
1168
IA32_VMX_TRUE_ENTRY_CTLS
4C1H
1217
IA32_A_PMC0
DS Save Area (R/W) Points to the linear address of the first byte of the DS buffer management area, which is used to manage the BTS and PEBS buffers. See Section 18.11.4, Debug Store (DS) Mechanism.
63:0
The linear address of the first byte of the DS buffer management area, if IA-32e mode is active. The linear address of the first byte of the DS buffer management area, if not in IA-32e mode. Reserved iff not in IA-32e mode. TSC Target of Local APIC's TSC Deadline Mode (R/W) x2APIC ID Register (R/O) See x2APIC Specification x2APIC Version Register (R/O) x2APIC Task Priority Register (R/W) x2APIC Processor Priority Register (R/O) x2APIC EOI Register (W/O) x2APIC Logical Destination Register (R/O) x2APIC Spurious Interrupt Vector Register (R/W) x2APIC In-Service Register Bits 31:0 (R/O) x2APIC In-Service Register Bits 63:32 (R/O) x2APIC In-Service Register Bits 95:64 (R/O) x2APIC In-Service Register Bits 127:96 (R/O) x2APIC In-Service Register Bits 159:128 (R/O) If ( CPUID.01H:ECX.[bit 24] =1) If ( CPUID.01H:ECX.[bit 21] =1) If ( CPUID.01H:ECX.[bit 21] =1) If ( CPUID.01H:ECX.[bit 21] =1) If ( CPUID.01H:ECX.[bit 21] =1) If ( CPUID.01H:ECX.[bit 21] =1) If ( CPUID.01H:ECX.[bit 21] =1) If ( CPUID.01H:ECX.[bit 21] =1) If ( CPUID.01H:ECX.[bit 21] =1) If ( CPUID.01H:ECX.[bit 21] =1) If ( CPUID.01H:ECX.[bit 21] =1) If ( CPUID.01H:ECX.[bit 21] =1) If ( CPUID.01H:ECX.[bit 21] =1)
31:0
63:32 6E0H 802H 803H 808H 80AH 80BH 80DH 80FH 810H 811H 812H 813H 814H 1760 2050 2051 2056 2058 2059 2061 2063 2064 2065 2066 2067 2068 IA32_TSC_DEADLINE IA32_X2APIC_APICID IA32_X2APIC_VERSION IA32_X2APIC_TPR IA32_X2APIC_PPR IA32_X2APIC_EOI IA32_X2APIC_LDR IA32_X2APIC_SIVR IA32_X2APIC_ISR0 IA32_X2APIC_ISR1 IA32_X2APIC_ISR2 IA32_X2APIC_ISR3 IA32_X2APIC_ISR4
MSR/Bit Description x2APIC In-Service Register Bits 191:160 (R/O) x2APIC In-Service Register Bits 223:192 (R/O) x2APIC In-Service Register Bits 255:224 (R/O) x2APIC Trigger Mode Register Bits 31:0 (R/O) x2APIC Trigger Mode Register Bits 63:32 (R/O) x2APIC Trigger Mode Register Bits 95:64 (R/O) x2APIC Trigger Mode Register Bits 127:96 (R/O) x2APIC Trigger Mode Register Bits 159:128 (R/O) x2APIC Trigger Mode Register Bits 191:160 (R/O) x2APIC Trigger Mode Register Bits 223:192 (R/O) x2APIC Trigger Mode Register Bits 255:224 (R/O) x2APIC Interrupt Request Register Bits 31:0 (R/O) x2APIC Interrupt Request Register Bits 63:32 (R/O) x2APIC Interrupt Request Register Bits 95:64 (R/O) x2APIC Interrupt Request Register Bits 127:96 (R/O) x2APIC Interrupt Request Register Bits 159:128 (R/O) x2APIC Interrupt Request Register Bits 191:160 (R/O) x2APIC Interrupt Request Register Bits 223:192 (R/O) x2APIC Interrupt Request Register Bits 255:224 (R/O) x2APIC Error Status Register (R/W)
MSR/Bit Description
x2APIC LVT Corrected Machine Check Interrupt Register (R/W)
x2APIC Interrupt Command Register (R/W)
x2APIC LVT Timer Interrupt Register (R/W)
x2APIC LVT Thermal Sensor Interrupt Register (R/W)
x2APIC LVT Performance Monitor Interrupt Register (R/W)
x2APIC LVT LINT0 Register (R/W)
x2APIC LVT LINT1 Register (R/W)
x2APIC LVT Error Register (R/W)
x2APIC Initial Count Register (R/W)
x2APIC Current Count Register (R/O)
x2APIC Divide Configuration Register (R/W)  If ( CPUID.01H:ECX.[bit 21] = 1 )
x2APIC Self IPI Register (W/O)  If ( CPUID.01H:ECX.[bit 21] = 1 )
QoS Monitoring Event Select Register (R/W)  If ( CPUID.(EAX=07H, ECX=0):EBX.[bit 12] = 1 )
Event ID: ID of a supported QoS monitoring event to report via IA32_QM_CTR.
Reserved.
Resource Monitoring ID: ID for QoS monitoring hardware to report monitored data via IA32_QM_CTR.
Reserved.
QoS Monitoring Counter Register (R/O)  If ( CPUID.(EAX=07H, ECX=0):EBX.[bit 12] = 1 )
Resource Monitored Data Unavailable: If 1, indicates data for this RMID is not available or not monitored for this resource or RMID.
N = Log2 ( CPUID.(EAX= 0FH, ECX=0H).EBX[31:0] +1)
MSR/Bit Description Error: If 1, indicates an unsupported RMID or event type was written to IA32_PQR_QM_EVTSEL.
C8FH
3215
IA32_PQR_ASSOC N-1:0
QoS Resource Association Register (R/W)  If ( CPUID.(EAX=07H, ECX=0):EBX.[bit 12] = 1 )
Resource Monitoring ID: ID for QoS monitoring hardware to track internal operation, e.g. memory access.
Reserved.
No existing or future processor will implement an MSR in this range.
N = Log2 ( CPUID.(EAX= 0FH, ECX=0H).EBX[31:0] +1)
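The field width N in the formula above follows from the RMID range that CPUID reports. The following is a minimal sketch in C, assuming the caller has already read CPUID.(EAX=0FH, ECX=0H).EBX into `max_rmid`. The helper name `rmid_field_width` is hypothetical, and rounding up when the count is not a power of two is an assumption of this sketch rather than something the table states.

```c
#include <stdint.h>

/* Smallest N such that 2^N >= max_rmid + 1, i.e. Log2(max_rmid + 1) rounded up.
 * max_rmid stands for CPUID.(EAX=0FH, ECX=0H).EBX[31:0].
 * Rounding up for non-power-of-two counts is an assumption of this sketch. */
static unsigned rmid_field_width(uint32_t max_rmid)
{
    uint64_t count = (uint64_t)max_rmid + 1; /* 64-bit so 0xFFFFFFFF + 1 cannot wrap */
    unsigned n = 0;

    while (((uint64_t)1 << n) < count)
        n++;
    return n;
}
```

With this width, the RMID field occupies bits N-1:0 and bits 63:N are reserved, matching the bit ranges listed for IA32_PQR_ASSOC.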
63:N 4000_ 0000H 4000_ 00FFH C000_ 0080H Reserved MSR Address Space
IA32_EFER
Reserved. IA-32e Mode Enable (R/W) Enables IA-32e mode operation. Reserved. IA-32e Mode Active (R) Indicates IA-32e mode is active when set. Execute Disable Bit Enable (R/W) Reserved. System Call Target Address (R/W) If CPUID.80000001.EDX.[bit 29] = 1 If CPUID.80000001.EDX.[bit 29] = 1 If CPUID.80000001.EDX.[bit 29] = 1
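The IA32_EFER flags listed above can be tested with simple masks. This is a sketch only: the bit positions used here (8, 10 and 11) come from the architectural definition of IA32_EFER, not from this excerpt, which lists the fields without their bit numbers. The struct and function names are hypothetical, and the raw MSR value is assumed to have been read elsewhere (RDMSR requires privilege level 0).

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions per the architectural IA32_EFER layout (assumed, not shown above). */
#define EFER_LME (1ULL << 8)   /* IA-32e Mode Enable */
#define EFER_LMA (1ULL << 10)  /* IA-32e Mode Active */
#define EFER_NXE (1ULL << 11)  /* Execute Disable Bit Enable */

struct efer_flags {
    bool lme;
    bool lma;
    bool nxe;
};

/* Decode a raw IA32_EFER value into the three flags described in the table. */
static struct efer_flags decode_efer(uint64_t efer)
{
    struct efer_flags f;

    f.lme = (efer & EFER_LME) != 0;
    f.lma = (efer & EFER_LMA) != 0;
    f.nxe = (efer & EFER_NXE) != 0;
    return f;
}
```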
IA32_LSTAR
IA-32e Mode System Call Target Address (R/W) System Call Flag Mask (R/W)
IA32_FMASK
IA32_GS_BASE
IA32_KERNEL_GS_BASE
Swap Target of BASE Address of GS (R/W)
Auxiliary TSC (RW)
AUX: Auxiliary signature of TSC
Reserved.
NOTES:
1. In processors based on Intel NetBurst microarchitecture, MSR addresses 180H-197H are supported; software must treat them as model-specific. Starting with Intel Core Duo processors, MSR addresses 180H-185H and 188H-197H are reserved.
2. The *_ADDR MSRs may or may not be present; this depends on flag settings in IA32_MCi_STATUS. See Section 15.3.2.3 and Section 15.3.2.4 for more information.
...
133.33 MHz should be utilized if performing calculation with System Bus Speed when encoding is 001B.
166.67 MHz should be utilized if performing calculation with System Bus Speed when encoding is 011B.
266.67 MHz should be utilized if performing calculation with System Bus Speed when encoding is 110B.
333.33 MHz should be utilized if performing calculation with System Bus Speed when encoding is 111B.
63:3 E7H E8H FEH 11EH 231 232 254 286 IA32_MPERF IA32_APERF IA32_MTRRCAP 11 MSR_BBL_CR_CTL3 0 Unique Unique Unique Unique Shared L2 Hardware Enabled (RO) 1= 0= 7:1 If the L2 is hardware-enabled Indicates if the L2 is hardware-disabled Reserved. Maximum Performance Frequency Clock Count (RW) See Table 35-2. Actual Performance Frequency Clock Count (RW) See Table 35-2. See Table 35-2. SMRR Capability Using MSR 0A0H and 0A1H (R)
Reserved.
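The System Bus Speed encodings above map naturally onto a lookup. The sketch below covers only the four encodings this excerpt gives values for (001B, 011B, 110B, 111B). The function name `bus_speed_mhz` is hypothetical, and returning 0.0 for any other encoding is a convention of this sketch, not a statement about what those encodings mean.

```c
/* Map the 3-bit System Bus Speed encoding to the frequency (in MHz) that the
 * text says should be used in calculations. Only the encodings listed in this
 * excerpt are handled; everything else yields 0.0 (sketch convention). */
static double bus_speed_mhz(unsigned encoding)
{
    switch (encoding & 0x7u) {
    case 0x1: return 133.33;  /* 001B */
    case 0x3: return 166.67;  /* 011B */
    case 0x6: return 266.67;  /* 110B */
    case 0x7: return 333.33;  /* 111B */
    default:  return 0.0;     /* not covered by this excerpt */
    }
}
```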
Reserved. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2.
480H
1152
IA32_VMX_BASIC
Unique
61H
97
MSR_ LASTBRANCH_1_TO_IP
Reserved. L2 Enabled. (R/W) 1 = L2 cache has been initialized 0 = Disabled (default) Until this bit is set the processor will not respond to the WBINVD instruction or the assertion of the FLUSH# input.
22:9 23
63:24 174H 175H 176H 17AH 372 373 374 378 IA32_SYSENTER_CS IA32_SYSENTER_ESP IA32_SYSENTER_EIP IA32_MCG_STATUS 0 Unique Unique Unique Unique
Reserved. See Table 35-2. See Table 35-2. See Table 35-2. RIPV When set, bit indicates that the instruction addressed by the instruction pointer pushed on the stack (when the machine check was generated) can be used to restart the program. If cleared, the program cannot be reliably restarted.
EIPV When set, bit indicates that the instruction addressed by the instruction pointer pushed on the stack (when the machine check was generated) is directly associated with the error.
MCIP When set, bit indicates that a machine check has been generated. If a second machine check is detected while this bit is still set, the processor enters a shutdown state. Software should write this bit to 0 after processing a machine check exception.
63:3 186H 187H 198H 198H 390 391 408 408 IA32_PERFEVTSEL0 IA32_PERFEVTSEL1 IA32_PERF_STATUS MSR_PERF_STATUS Unique Unique Shared Shared
Reserved. See Table 35-2. See Table 35-2. See Table 35-2.
Table 35-6. MSRs in Processors Based on Intel Microarchitecture Code Name Nehalem (Contd.)
Register Address (Hex, Dec) / Register Name / Scope / Bit Description
    63:32  Reserved.
3AH  58  IA32_FEATURE_CONTROL  Thread  Control Features in Intel 64 Processor (R/W) See Table 35-2.
79H  121  IA32_BIOS_UPDT_TRIG  Core  BIOS Update Trigger Register (W) See Table 35-2.
8BH  139  IA32_BIOS_SIGN_ID  Thread  BIOS Update Signature ID (RO) See Table 35-2.
C1H  193  IA32_PMC0  Thread  Performance Counter Register See Table 35-2.
C2H  194  IA32_PMC1  Thread  Performance Counter Register See Table 35-2.
C3H  195  IA32_PMC2  Thread  Performance Counter Register See Table 35-2.
C4H  196  IA32_PMC3  Thread  Performance Counter Register See Table 35-2.
CEH  206  MSR_PLATFORM_INFO  Package  See http://biosbits.org.
    7:0  Reserved.
    15:8  Package  Maximum Non-Turbo Ratio (R/O) This is the ratio of the frequency that invariant TSC runs at. The invariant TSC frequency can be computed by multiplying this ratio by 133.33 MHz.
    27:16  Reserved.
    28  Package  Programmable Ratio Limit for Turbo Mode (R/O) When set to 1, indicates that Programmable Ratio Limits for Turbo mode is enabled, and when set to 0, indicates Programmable Ratio Limits for Turbo mode is disabled.
    29  Package  Programmable TDC-TDP Limit for Turbo Mode (R/O) When set to 1, indicates that TDC/TDP Limits for Turbo mode are programmable, and when set to 0, indicates TDC and TDP Limits for Turbo mode are not programmable.
    39:30  Reserved.
    47:40  Package  Maximum Efficiency Ratio (R/O) This is the minimum ratio (maximum efficiency) at which the processor can operate, in units of 133.33 MHz.
    63:48  Reserved.
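The MSR_PLATFORM_INFO description above states that the invariant TSC frequency is the Maximum Non-Turbo Ratio (bits 15:8) multiplied by 133.33 MHz, and that the Maximum Efficiency Ratio (bits 47:40) uses the same unit. A minimal sketch of that arithmetic follows. The function names and the `BCLK_MHZ` macro are hypothetical, and the raw 64-bit MSR value is assumed to have been obtained elsewhere.

```c
#include <stdint.h>

#define BCLK_MHZ 133.33  /* ratio unit given in the table for this processor family */

/* Bits 15:8: Maximum Non-Turbo Ratio. */
static unsigned max_non_turbo_ratio(uint64_t platform_info)
{
    return (unsigned)((platform_info >> 8) & 0xFF);
}

/* Bits 47:40: Maximum Efficiency Ratio (the minimum operating ratio). */
static unsigned max_efficiency_ratio(uint64_t platform_info)
{
    return (unsigned)((platform_info >> 40) & 0xFF);
}

/* Convert a ratio to a frequency in MHz. */
static double ratio_to_mhz(unsigned ratio)
{
    return ratio * BCLK_MHZ;
}
```

For example, a Maximum Non-Turbo Ratio of 20 corresponds to an invariant TSC frequency of about 2666.6 MHz.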
E2H  226  MSR_PKG_CST_CONFIG_CONTROL  Core  C-State Configuration Control (R/W) Note: C-state values are processor-specific C-state code names, unrelated to MWAIT extension C-state parameters or ACPI C-States. See http://biosbits.org.
    2:0  Package C-State Limit (R/W) Specifies the lowest processor-specific C-state code name (consuming the least power) for the package. The default is set as factory-configured package C-state limit. The following C-state code name encodings are supported:
        000b: C0 (no package C-state support)
        001b: C1 (Behavior is the same as 000b)
        010b: C3
        011b: C6
        100b: C7
        101b and 110b: Reserved
        111b: No package C-state limit.
        Note: This field cannot be used to limit package C-state to C3.
    9:3  Reserved.
    10  I/O MWAIT Redirection Enable (R/W) When set, will map IO_read instructions sent to the IO register specified by MSR_PMG_IO_CAPTURE_BASE to MWAIT instructions.
    14:11  Reserved.
    15  CFG Lock (R/WO) When set, locks bits 15:0 of this register until next reset.
    23:16  Reserved.
    24  Interrupt filtering enable (R/W) When set, processor cores in a deep C-State will wake only when the event message is destined for that core. When 0, all processor cores in a deep C-State will wake for an event message.
    25  C3 state auto demotion enable (R/W) When set, the processor will conditionally demote C6/C7 requests to C3 based on uncore auto-demote information.
    26  C1 state auto demotion enable (R/W) When set, the processor will conditionally demote C3/C6/C7 requests to C1 based on uncore auto-demote information.
    63:27  Reserved.
E4H  228  MSR_PMG_IO_CAPTURE_BASE  Core  Power Management IO Redirection in C-state (R/W) See http://biosbits.org.
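The Package C-State Limit encodings and the two control bits described for MSR_PKG_CST_CONFIG_CONTROL can be decoded as follows. This is a sketch under stated assumptions: the function names are hypothetical, the strings simply restate the encodings listed in the table, and the raw MSR value is assumed to have been read elsewhere.

```c
#include <stdint.h>

/* Bits 2:0: Package C-State Limit, returned as the code name listed in the table. */
static const char *pkg_cstate_limit_name(uint64_t msr)
{
    static const char *const names[8] = {
        "C0 (no package C-state support)",  /* 000b */
        "C1 (same behavior as 000b)",       /* 001b */
        "C3",                               /* 010b */
        "C6",                               /* 011b */
        "C7",                               /* 100b */
        "Reserved",                         /* 101b */
        "Reserved",                         /* 110b */
        "No package C-state limit"          /* 111b */
    };
    return names[msr & 0x7];
}

/* Bit 10: I/O MWAIT Redirection Enable. */
static int io_mwait_redirection_enabled(uint64_t msr)
{
    return (int)((msr >> 10) & 1);
}

/* Bit 15: CFG Lock (bits 15:0 are locked until the next reset when set). */
static int cfg_locked(uint64_t msr)
{
    return (int)((msr >> 15) & 1);
}
```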
    15:0  LVL_2 Base Address (R/W) Specifies the base address visible to software for IO redirection. If IO MWAIT Redirection is enabled, reads to this address will be consumed by the power management logic and decoded to MWAIT instructions. When IO port address redirection is enabled, this is the IO port address reported to the OS/software.
    18:16  C-state Range (R/W) Specifies the encoding value of the maximum C-State code name to be included when IO read to MWAIT redirection is enabled by MSR_PKG_CST_CONFIG_CONTROL[bit 10]:
        000b - C3 is the max C-State to include
        001b - C6 is the max C-State to include
        010b - C7 is the max C-State to include
    63:19  Reserved.
E7H  231  IA32_MPERF  Thread  Maximum Performance Frequency Clock Count (RW) See Table 35-2.
E8H  232  IA32_APERF  Thread  Actual Performance Frequency Clock Count (RW) See Table 35-2.
FEH  254  IA32_MTRRCAP  Thread  See Table 35-2.
174H  372  IA32_SYSENTER_CS  Thread  See Table 35-2.
175H  373  IA32_SYSENTER_ESP  Thread  See Table 35-2.
176H  374  IA32_SYSENTER_EIP  Thread  See Table 35-2.
179H  377  IA32_MCG_CAP  Thread  See Table 35-2.
17AH  378  IA32_MCG_STATUS  Thread
    0  RIPV When set, bit indicates that the instruction addressed by the instruction pointer pushed on the stack (when the machine check was generated) can be used to restart the program. If cleared, the program cannot be reliably restarted.
    1  EIPV When set, bit indicates that the instruction addressed by the instruction pointer pushed on the stack (when the machine check was generated) is directly associated with the error.
    2  MCIP When set, bit indicates that a machine check has been generated. If a second machine check is detected while this bit is still set, the processor enters a shutdown state. Software should write this bit to 0 after processing a machine check exception.
    63:3  Reserved.
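The three IA32_MCG_STATUS flags described above (RIPV at bit 0, EIPV at bit 1, MCIP at bit 2) are typically examined together by a machine-check handler. The following sketch shows the bit tests and the value to write back once handling is complete. The macro and function names are hypothetical, and the actual RDMSR/WRMSR access is assumed to happen elsewhere.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCG_STATUS_RIPV (1ULL << 0)  /* restart IP valid */
#define MCG_STATUS_EIPV (1ULL << 1)  /* error IP valid */
#define MCG_STATUS_MCIP (1ULL << 2)  /* machine check in progress */

/* True if the pushed instruction pointer can be used to restart the program. */
static bool mcg_restartable(uint64_t status)
{
    return (status & MCG_STATUS_RIPV) != 0;
}

/* True if the pushed instruction pointer is directly associated with the error. */
static bool mcg_ip_is_error_source(uint64_t status)
{
    return (status & MCG_STATUS_EIPV) != 0;
}

/* True while a machine check is being processed. */
static bool mcg_in_progress(uint64_t status)
{
    return (status & MCG_STATUS_MCIP) != 0;
}

/* Value to write back after handling: MCIP cleared, other bits unchanged. */
static uint64_t mcg_clear_mcip(uint64_t status)
{
    return status & ~MCG_STATUS_MCIP;
}
```

Clearing MCIP last matters because, as the description notes, a second machine check that arrives while MCIP is still set puts the processor into a shutdown state.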
Register Address Hex 186H 187H 188H 189H 198H Dec 390 391 392 393 408 IA32_PERFEVTSEL0 IA32_PERFEVTSEL1 IA32_PERFEVTSEL2 IA32_PERFEVTSEL3 IA32_PERF_STATUS 15:0 63:16 199H 19AH 409 410 IA32_PERF_CTL IA32_CLOCK_MODULATION Thread Thread Thread Thread Thread Thread Core See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. Current Performance State Value. Reserved. See Table 35-2. Clock Modulation (R/W) See Table 35-2. IA32_CLOCK_MODULATION MSR was originally named IA32_THERM_CONTROL MSR. 0 3:1 4 63:5 19BH 19CH 1A0 411 412 416 IA32_THERM_INTERRUPT IA32_THERM_STATUS IA32_MISC_ENABLE 0 2:1 3 6:4 7 10:8 11 Thread Thread Thread Thread Core Core Reserved. On demand Clock Modulation Duty Cycle (R/W) On demand Clock Modulation Enable (R/W) Reserved. Thermal Interrupt Control (R/W) See Table 35-2. Thermal Monitor Status (R/W) See Table 35-2. Enable Misc. Processor Features (R/W) Allows a variety of processor functions to be enabled and disabled. Fast-Strings Enable See Table 35-2. Reserved. Automatic Thermal Control Circuit Enable (R/W) See Table 35-2. Reserved. Performance Monitoring Available (R) See Table 35-2. Reserved. Branch Trace Storage Unavailable (RO) See Table 35-2. Scope
Register Address Hex Dec 12 15:13 16 18 21:19 22 23 33:24 34 37:35 38 Package Thread Thread Thread Package Thread Thread Precise Event Based Sampling Unavailable (RO) See Table 35-2. Reserved. Enhanced Intel SpeedStep Technology Enable (R/W) See Table 35-2. ENABLE MONITOR FSM. (R/W) See Table 35-2. Reserved. Limit CPUID Maxval (R/W) See Table 35-2. xTPR Message Disable (R/W) See Table 35-2. Reserved. XD Bit Disable (R/W) See Table 35-2. Reserved. Turbo Mode Disable (R/W) When set to 1 on processors that support Intel Turbo Boost Technology, the turbo mode feature is disabled and the IDA_Enable feature flag will be clear (CPUID.06H: EAX[1]=0). When set to 0 on processors that support IDA, CPUID.06H: EAX[1] reports that the processor's support of turbo mode is enabled. Note: the power-on default value is used by BIOS to detect hardware support of turbo mode. If power-on default value is 1, turbo mode is available in the processor. If power-on default value is 0, turbo mode is not available. 63:39 1A2H 418 MSR_TEMPERATURE_TARGET 15:0 23:16 Thread Reserved. Temperature Target (R) The minimum temperature at which PROCHOT# will be asserted. The value is in degrees C. 63:24 1A6H 1AAH 422 426 MSR_OFFCORE_RSP_0 MSR_MISC_PWR_MGMT Thread Reserved. Offcore Response Event Select Register (R/W) See http://biosbits.org. Reserved. Scope
    0  Package  EIST Hardware Coordination Disable (R/W) When 0, enables hardware coordination of Enhanced Intel SpeedStep Technology requests from processor cores; when 1, disables hardware coordination of Enhanced Intel SpeedStep Technology requests.
    1  Thread  Energy/Performance Bias Enable (R/W) This bit makes the IA32_ENERGY_PERF_BIAS register (MSR 1B0h) visible to software with Ring 0 privileges. This bit's status (1 or 0) is also reflected by CPUID.(EAX=06h):ECX[3].
    63:2  Reserved.
1ACH  428  MSR_TURBO_POWER_CURRENT_LIMIT  See http://biosbits.org.
    14:0  Package  TDP Limit (R/W) TDP limit in 1/8 Watt granularity.
    15  Package  TDP Limit Override Enable (R/W) A value = 0 indicates override is not active, and a value = 1 indicates active.
    30:16  Package  TDC Limit (R/W) TDC limit in 1/8 Amp granularity.
    31  Package  TDC Limit Override Enable (R/W) A value = 0 indicates override is not active, and a value = 1 indicates active.
    63:32  Reserved.
1ADH  429  MSR_TURBO_RATIO_LIMIT  Package  Maximum Ratio Limit of Turbo Mode. RO if MSR_PLATFORM_INFO.[28] = 0, RW if MSR_PLATFORM_INFO.[28] = 1
    7:0  Package  Maximum Ratio Limit for 1C: Maximum turbo ratio limit of 1 core active.
    15:8  Package  Maximum Ratio Limit for 2C: Maximum turbo ratio limit of 2 cores active.
    23:16  Package  Maximum Ratio Limit for 3C: Maximum turbo ratio limit of 3 cores active.
    31:24  Package  Maximum Ratio Limit for 4C: Maximum turbo ratio limit of 4 cores active.
    63:32  Reserved.
1C8H  456  MSR_LBR_SELECT  Core  Last Branch Record Filtering Select Register (R/W) See Section 17.6.2, Filtering of Last Branch Records.
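The TDP and TDC limit fields of MSR_TURBO_POWER_CURRENT_LIMIT are expressed in eighths of a watt and eighths of an amp, so converting them to physical units is a mask, a shift and a division by 8. The sketch below illustrates this using the field positions given above (14:0, 15, 30:16, 31). The struct and function names are hypothetical, and the raw value is assumed to have been read elsewhere.

```c
#include <stdbool.h>
#include <stdint.h>

struct turbo_power_limits {
    double tdp_watts;     /* bits 14:0, in units of 1/8 W */
    bool   tdp_override;  /* bit 15 */
    double tdc_amps;      /* bits 30:16, in units of 1/8 A */
    bool   tdc_override;  /* bit 31 */
};

/* Decode a raw MSR_TURBO_POWER_CURRENT_LIMIT value into physical units. */
static struct turbo_power_limits decode_turbo_power_current_limit(uint64_t msr)
{
    struct turbo_power_limits l;

    l.tdp_watts    = (double)(msr & 0x7FFF) / 8.0;
    l.tdp_override = ((msr >> 15) & 1) != 0;
    l.tdc_amps     = (double)((msr >> 16) & 0x7FFF) / 8.0;
    l.tdc_override = ((msr >> 31) & 1) != 0;
    return l;
}
```

A raw TDP field of 760, for instance, decodes to 95 W, and a raw TDC field of 880 decodes to 110 A.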
Register Address Hex 1C9H Dec 457 MSR_LASTBRANCH_TOS Thread Last Branch Record Stack TOS (R/W) Contains an index (bits 0-3) that points to the MSR containing the most recent branch record. See MSR_LASTBRANCH_0_FROM_IP (at 680H). 1D9H 1DDH 473 477 IA32_DEBUGCTL MSR_LER_FROM_LIP Thread Thread Debug Control (R/W) See Table 35-2. Last Exception Record From Linear IP (R) Contains a pointer to the last branch instruction that the processor executed prior to the last exception that was generated or the last interrupt that was handled. 1DEH 478 MSR_LER_TO_LIP Thread Last Exception Record To Linear IP (R) This area contains a pointer to the target of the last branch instruction that the processor executed prior to the last exception that was generated or the last interrupt that was handled. 1F2H 1F3H 1FCH 498 499 508 IA32_SMRR_PHYSBASE IA32_SMRR_PHYSMASK MSR_POWER_CTL 0 1 Package Core Core Core See Table 35-2. See Table 35-2. Power Control Register. See http://biosbits.org. Reserved. C1E Enable (R/W) When set to 1, will enable the CPU to switch to the Minimum Enhanced Intel SpeedStep Technology operating point when all execution cores enter MWAIT (C1). 63:2 200H 201H 202H 203H 204H 205H 206H 207H 208H 209H 20AH 512 513 514 515 516 517 518 519 520 521 522 IA32_MTRR_PHYSBASE0 IA32_MTRR_PHYSMASK0 IA32_MTRR_PHYSBASE1 IA32_MTRR_PHYSMASK1 IA32_MTRR_PHYSBASE2 IA32_MTRR_PHYSMASK2 IA32_MTRR_PHYSBASE3 IA32_MTRR_PHYSMASK3 IA32_MTRR_PHYSBASE4 IA32_MTRR_PHYSMASK4 IA32_MTRR_PHYSBASE5 Thread Thread Thread Thread Thread Thread Thread Thread Thread Thread Thread Reserved. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. Scope
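MSR_LASTBRANCH_TOS holds a 4-bit index (bits 0-3) that selects the most recent entry in the sixteen-entry last branch record stack. The sketch below turns that index into the addresses of the matching From-IP and To-IP registers. It assumes the entries are laid out contiguously from MSR_LASTBRANCH_0_FROM_IP at 680H and MSR_LASTBRANCH_0_TO_IP at 6C0H, which is consistent with the addresses listed in this table. The function and macro names are hypothetical.

```c
#include <stdint.h>

#define LBR_FROM_IP_BASE 0x680u  /* MSR_LASTBRANCH_0_FROM_IP */
#define LBR_TO_IP_BASE   0x6C0u  /* MSR_LASTBRANCH_0_TO_IP */
#define LBR_TOS_MASK     0xFu    /* bits 0-3 of MSR_LASTBRANCH_TOS */

/* Address of the From-IP MSR holding the most recent branch record. */
static uint32_t lbr_from_ip_msr(uint64_t tos)
{
    return LBR_FROM_IP_BASE + (uint32_t)(tos & LBR_TOS_MASK);
}

/* Address of the To-IP MSR paired with that record. */
static uint32_t lbr_to_ip_msr(uint64_t tos)
{
    return LBR_TO_IP_BASE + (uint32_t)(tos & LBR_TOS_MASK);
}
```

Masking with 0xF keeps the result inside the sixteen-entry stack even if the upper bits of the TOS value are nonzero.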
Register Address Hex 20BH 20CH 20DH 20EH 20FH 210H 211H 212H 213H 250H 258H 259H 268H 269H 26AH 26BH 26CH 26DH 26EH 26FH 277H 280H 281H 282H 283H 284H 285H 286H 287H 288H Dec 523 524 525 526 527 528 529 530 531 592 600 601 616 617 618 619 620 621 622 623 631 640 641 642 643 644 645 646 647 648 IA32_MTRR_PHYSMASK5 IA32_MTRR_PHYSBASE6 IA32_MTRR_PHYSMASK6 IA32_MTRR_PHYSBASE7 IA32_MTRR_PHYSMASK7 IA32_MTRR_PHYSBASE8 IA32_MTRR_PHYSMASK8 IA32_MTRR_PHYSBASE9 IA32_MTRR_PHYSMASK9 IA32_MTRR_FIX64K_ 00000 IA32_MTRR_FIX16K_ 80000 IA32_MTRR_FIX16K_ A0000 IA32_MTRR_FIX4K_C0000 IA32_MTRR_FIX4K_C8000 IA32_MTRR_FIX4K_D0000 IA32_MTRR_FIX4K_D8000 IA32_MTRR_FIX4K_E0000 IA32_MTRR_FIX4K_E8000 IA32_MTRR_FIX4K_F0000 IA32_MTRR_FIX4K_F8000 IA32_PAT IA32_MC0_CTL2 IA32_MC1_CTL2 IA32_MC2_CTL2 IA32_MC3_CTL2 IA32_MC4_CTL2 IA32_MC5_CTL2 IA32_MC6_CTL2 IA32_MC7_CTL2 IA32_MC8_CTL2 Thread Thread Thread Thread Thread Thread Thread Thread Thread Thread Thread Thread Thread Thread Thread Thread Thread Thread Thread Thread Thread Package Package Core Core Core Core Package Package Package See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. See Table 35-2. Scope
Register Address Hex 2FFH 309H 30AH 30BH 345H Dec 767 777 778 779 837 IA32_MTRR_DEF_TYPE IA32_FIXED_CTR0 IA32_FIXED_CTR1 IA32_FIXED_CTR2 IA32_PERF_CAPABILITIES 5:0 6 7 11:8 12 63:13 38DH 38EH 38EH 909 910 910 IA32_FIXED_CTR_CTRL IA32_PERF_GLOBAL_STATUS 61 38FH 390H 390H 911 912 912 IA32_PERF_GLOBAL_CTRL IA32_PERF_GLOBAL_OVF_CTRL MSR_PERF_GLOBAL_OVF_CTRL 61 3F1H 1009 MSR_PEBS_ENABLE 0 1 2 3 Thread Thread Thread Thread Thread Thread Thread Thread Thread Thread Thread Default Memory Types (R/W) See Table 35-2. Fixed-Function Performance Counter Register 0 (R/W) See Table 35-2. Fixed-Function Performance Counter Register 1 (R/W) See Table 35-2. Fixed-Function Performance Counter Register 2 (R/W) See Table 35-2. See Table 35-2. See Section 17.4.1, IA32_DEBUGCTL MSR. LBR Format. See Table 35-2. PEBS Record Format. PEBSSaveArchRegs. See Table 35-2. PEBS_REC_FORMAT. See Table 35-2. SMM_FREEZE. See Table 35-2. Reserved. Fixed-Function-Counter Control Register (R/W) See Table 35-2. See Table 35-2. See Section 18.4.2, Global Counter Control Facilities. (RO) UNC_Ovf Uncore overflowed if 1. See Table 35-2. See Section 18.4.2, Global Counter Control Facilities. See Table 35-2. See Section 18.4.2, Global Counter Control Facilities. (R/W) CLR_UNC_Ovf Set 1 to clear UNC_Ovf. See Section 18.6.1.1, Precise Event Based Sampling (PEBS). Enable PEBS on IA32_PMC0. (R/W) Enable PEBS on IA32_PMC1. (R/W) Enable PEBS on IA32_PMC2. (R/W) Enable PEBS on IA32_PMC3. (R/W) Scope
MSR_PERF_GLOBAL_STATUS Thread
Register Address Hex Dec 31:4 32 33 34 35 63:36 3F6H 1014 MSR_PEBS_LD_LAT 15:0 63:36 3F8H 1016 MSR_PKG_C3_RESIDENCY Package Thread Reserved. Enable Load Latency on IA32_PMC0. (R/W) Enable Load Latency on IA32_PMC1. (R/W) Enable Load Latency on IA32_PMC2. (R/W) Enable Load Latency on IA32_PMC3. (R/W) Reserved. See Section 18.6.1.2, Load Latency Performance Monitoring Facility. Minimum threshold latency value of tagged load operation that will be counted. (R/W) Reserved. Note: C-state values are processor specific C-state code names, unrelated to MWAIT extension C-state parameters or ACPI CStates. Package C3 Residency Counter. (R/O) Value since last reset that this package is in processor-specific C3 states. Count at the same frequency as the TSC. 3F9H 1017 MSR_PKG_C6_RESIDENCY Package Note: C-state values are processor specific C-state code names, unrelated to MWAIT extension C-state parameters or ACPI CStates. Package C6 Residency Counter. (R/O) Value since last reset that this package is in processor-specific C6 states. Count at the same frequency as the TSC. 3FAH 1018 MSR_PKG_C7_RESIDENCY Package Note: C-state values are processor specific C-state code names, unrelated to MWAIT extension C-state parameters or ACPI CStates. Package C7 Residency Counter. (R/O) Value since last reset that this package is in processor-specific C7 states. Count at the same frequency as the TSC. 3FCH 1020 MSR_CORE_C3_RESIDENCY Core Note: C-state values are processor specific C-state code names, unrelated to MWAIT extension C-state parameters or ACPI CStates. CORE C3 Residency Counter. (R/O) Value since last reset that this core is in processor-specific C3 states. Count at the same frequency as the TSC. 3FDH 1021 MSR_CORE_C6_RESIDENCY Core Note: C-state values are processor specific C-state code names, unrelated to MWAIT extension C-state parameters or ACPI CStates. Scope
63:0
63:0
63:0
63:0
Register Address Hex Dec 63:0 CORE C6 Residency Counter. (R/O) Value since last reset that this core is in processor-specific C6 states. Count at the same frequency as the TSC. 400H 401H 402H 1024 1025 1026 IA32_MC0_CTL IA32_MC0_STATUS IA32_MC0_ADDR Package Package Package See Section 15.3.2.1, IA32_MCi_CTL MSRs. See Section 15.3.2.2, IA32_MCi_STATUS MSRS. See Section 15.3.2.3, IA32_MCi_ADDR MSRs. The IA32_MC0_ADDR register is either not implemented or contains no address if the ADDRV flag in the IA32_MC0_STATUS register is clear. When not implemented in the processor, all reads and writes to this MSR will cause a general-protection exception. 403H 404H 405H 406H 1027 1028 1029 1030 MSR_MC0_MISC IA32_MC1_CTL IA32_MC1_STATUS IA32_MC1_ADDR Package Package Package Package See Section 15.3.2.4, IA32_MCi_MISC MSRs. See Section 15.3.2.1, IA32_MCi_CTL MSRs. See Section 15.3.2.2, IA32_MCi_STATUS MSRS. See Section 15.3.2.3, IA32_MCi_ADDR MSRs. The IA32_MC1_ADDR register is either not implemented or contains no address if the ADDRV flag in the IA32_MC1_STATUS register is clear. When not implemented in the processor, all reads and writes to this MSR will cause a general-protection exception. 407H 408H 409H 40AH 1031 1032 1033 1034 MSR_MC1_MISC IA32_MC2_CTL IA32_MC2_STATUS IA32_MC2_ADDR Package Core Core Core See Section 15.3.2.4, IA32_MCi_MISC MSRs. See Section 15.3.2.1, IA32_MCi_CTL MSRs. See Section 15.3.2.2, IA32_MCi_STATUS MSRS. See Section 15.3.2.3, IA32_MCi_ADDR MSRs. The IA32_MC2_ADDR register is either not implemented or contains no address if the ADDRV flag in the IA32_MC2_STATUS register is clear. When not implemented in the processor, all reads and writes to this MSR will cause a general-protection exception. 40BH 40CH 40DH 40EH 1035 1036 1037 1038 MSR_MC2_MISC MSR_MC3_CTL MSR_MC3_STATUS MSR_MC3_ADDR Core Core Core Core See Section 15.3.2.4, IA32_MCi_MISC MSRs. See Section 15.3.2.1, IA32_MCi_CTL MSRs. See Section 15.3.2.2, IA32_MCi_STATUS MSRS. 
See Section 15.3.2.3, IA32_MCi_ADDR MSRs. The MSR_MC3_ADDR register is either not implemented or contains no address if the ADDRV flag in the MSR_MC3_STATUS register is clear. When not implemented in the processor, all reads and writes to this MSR will cause a general-protection exception. Scope
Register Address Hex 40FH 410H 411H 412H Dec 1039 1040 1041 1042 MSR_MC3_MISC MSR_MC4_CTL MSR_MC4_STATUS MSR_MC4_ADDR Core Core Core Core See Section 15.3.2.4, IA32_MCi_MISC MSRs. See Section 15.3.2.1, IA32_MCi_CTL MSRs. See Section 15.3.2.2, IA32_MCi_STATUS MSRs. See Section 15.3.2.3, IA32_MCi_ADDR MSRs. The MSR_MC4_ADDR register is either not implemented or contains no address if the ADDRV flag in the MSR_MC4_STATUS register is clear. When not implemented in the processor, all reads and writes to this MSR will cause a general-protection exception. 413H 414H 415H 416H 417H 418H 419H 41AH 41BH 41CH 41DH 41EH 41FH 420H 421H 422H 423H 480H 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1152 MSR_MC4_MISC MSR_MC5_CTL MSR_MC5_STATUS MSR_MC5_ADDR MSR_MC5_MISC MSR_MC6_CTL MSR_MC6_STATUS MSR_MC6_ADDR MSR_MC6_MISC MSR_MC7_CTL MSR_MC7_STATUS MSR_MC7_ADDR MSR_MC7_MISC MSR_MC8_CTL MSR_MC8_STATUS MSR_MC8_ADDR MSR_MC8_MISC IA32_VMX_BASIC Core Core Core Core Core Package Package Package Package Package Package Package Package Package Package Package Package Thread See Section 15.3.2.4, IA32_MCi_MISC MSRs. See Section 15.3.2.1, IA32_MCi_CTL MSRs. See Section 15.3.2.2, IA32_MCi_STATUS MSRs. See Section 15.3.2.3, IA32_MCi_ADDR MSRs. See Section 15.3.2.4, IA32_MCi_MISC MSRs. See Section 15.3.2.1, IA32_MCi_CTL MSRs. See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16. See Section 15.3.2.3, IA32_MCi_ADDR MSRs. See Section 15.3.2.4, IA32_MCi_MISC MSRs. See Section 15.3.2.1, IA32_MCi_CTL MSRs. See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16. See Section 15.3.2.3, IA32_MCi_ADDR MSRs. See Section 15.3.2.4, IA32_MCi_MISC MSRs. See Section 15.3.2.1, IA32_MCi_CTL MSRs. See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16. See Section 15.3.2.3, IA32_MCi_ADDR MSRs. See Section 15.3.2.4, IA32_MCi_MISC MSRs. Reporting Register of Basic VMX Capabilities (R/O) See Table 35-2. See Appendix A.1, Basic VMX Information.
481H 1153 IA32_VMX_PINBASED_ CTLS Thread Capability Reporting Register of Pin-based VM-execution Controls (R/O) See Table 35-2. See Appendix A.3, VM-Execution Controls. Scope
Register Address Hex 482H Dec 1154 IA32_VMX_PROCBASED_ CTLS IA32_VMX_EXIT_CTLS Thread Capability Reporting Register of Primary Processor-based VM-execution Controls (R/O) See Appendix A.3, VM-Execution Controls. 483H 1155 Thread Capability Reporting Register of VM-exit Controls (R/O) See Table 35-2. See Appendix A.4, VM-Exit Controls. 484H 1156 IA32_VMX_ENTRY_CTLS Thread Capability Reporting Register of VM-entry Controls (R/O) See Table 35-2. See Appendix A.5, VM-Entry Controls. 485H 1157 IA32_VMX_MISC Thread Reporting Register of Miscellaneous VMX Capabilities (R/O) See Table 35-2. See Appendix A.6, Miscellaneous Data. 486H 1158 IA32_VMX_CR0_FIXED0 Thread Capability Reporting Register of CR0 Bits Fixed to 0 (R/O) See Table 35-2. See Appendix A.7, VMX-Fixed Bits in CR0. 487H 1159 IA32_VMX_CR0_FIXED1 Thread Capability Reporting Register of CR0 Bits Fixed to 1 (R/O) See Table 35-2. See Appendix A.7, VMX-Fixed Bits in CR0. 488H 1160 IA32_VMX_CR4_FIXED0 Thread Capability Reporting Register of CR4 Bits Fixed to 0 (R/O) See Table 35-2. See Appendix A.8, VMX-Fixed Bits in CR4. 489H 1161 IA32_VMX_CR4_FIXED1 Thread Capability Reporting Register of CR4 Bits Fixed to 1 (R/O) See Table 35-2. See Appendix A.8, VMX-Fixed Bits in CR4. 48AH 1162 IA32_VMX_VMCS_ENUM Thread Capability Reporting Register of VMCS Field Enumeration (R/ O). See Table 35-2. See Appendix A.9, VMCS Enumeration. 48BH 1163 IA32_VMX_PROCBASED_ CTLS2 IA32_DS_AREA Thread Capability Reporting Register of Secondary Processor-based VM-execution Controls (R/O) See Appendix A.3, VM-Execution Controls. 600H 1536 Thread DS Save Area (R/W) See Table 35-2. See Section 18.11.4, Debug Store (DS) Mechanism. Scope
Register Address (Hex) | (Dec) | Register Name | Scope | Bit Description
680H | 1664 | MSR_LASTBRANCH_0_FROM_IP | Thread | Last Branch Record 0 From IP (R/W). One of sixteen pairs of last branch record registers on the last branch record stack. This part of the stack contains pointers to the source instruction for one of the last sixteen branches, exceptions, or interrupts taken by the processor. See also: Last Branch Record Stack TOS at 1C9H; Section 17.6.1, LBR Stack.
681H | 1665 | MSR_LASTBRANCH_1_FROM_IP | Thread | Last Branch Record 1 From IP (R/W). See description of MSR_LASTBRANCH_0_FROM_IP.
682H | 1666 | MSR_LASTBRANCH_2_FROM_IP | Thread | Last Branch Record 2 From IP (R/W). See description of MSR_LASTBRANCH_0_FROM_IP.
683H | 1667 | MSR_LASTBRANCH_3_FROM_IP | Thread | Last Branch Record 3 From IP (R/W). See description of MSR_LASTBRANCH_0_FROM_IP.
684H | 1668 | MSR_LASTBRANCH_4_FROM_IP | Thread | Last Branch Record 4 From IP (R/W). See description of MSR_LASTBRANCH_0_FROM_IP.
685H | 1669 | MSR_LASTBRANCH_5_FROM_IP | Thread | Last Branch Record 5 From IP (R/W). See description of MSR_LASTBRANCH_0_FROM_IP.
686H | 1670 | MSR_LASTBRANCH_6_FROM_IP | Thread | Last Branch Record 6 From IP (R/W). See description of MSR_LASTBRANCH_0_FROM_IP.
687H | 1671 | MSR_LASTBRANCH_7_FROM_IP | Thread | Last Branch Record 7 From IP (R/W). See description of MSR_LASTBRANCH_0_FROM_IP.
688H | 1672 | MSR_LASTBRANCH_8_FROM_IP | Thread | Last Branch Record 8 From IP (R/W). See description of MSR_LASTBRANCH_0_FROM_IP.
689H | 1673 | MSR_LASTBRANCH_9_FROM_IP | Thread | Last Branch Record 9 From IP (R/W). See description of MSR_LASTBRANCH_0_FROM_IP.
68AH | 1674 | MSR_LASTBRANCH_10_FROM_IP | Thread | Last Branch Record 10 From IP (R/W). See description of MSR_LASTBRANCH_0_FROM_IP.
68BH | 1675 | MSR_LASTBRANCH_11_FROM_IP | Thread | Last Branch Record 11 From IP (R/W). See description of MSR_LASTBRANCH_0_FROM_IP.
68CH | 1676 | MSR_LASTBRANCH_12_FROM_IP | Thread | Last Branch Record 12 From IP (R/W). See description of MSR_LASTBRANCH_0_FROM_IP.
68DH | 1677 | MSR_LASTBRANCH_13_FROM_IP | Thread | Last Branch Record 13 From IP (R/W). See description of MSR_LASTBRANCH_0_FROM_IP.
68EH | 1678 | MSR_LASTBRANCH_14_FROM_IP | Thread | Last Branch Record 14 From IP (R/W). See description of MSR_LASTBRANCH_0_FROM_IP.
Register Address (Hex) | (Dec) | Register Name | Scope | Bit Description
68FH | 1679 | MSR_LASTBRANCH_15_FROM_IP | Thread | Last Branch Record 15 From IP (R/W). See description of MSR_LASTBRANCH_0_FROM_IP.
6C0H | 1728 | MSR_LASTBRANCH_0_TO_IP | Thread | Last Branch Record 0 To IP (R/W). One of sixteen pairs of last branch record registers on the last branch record stack. This part of the stack contains pointers to the destination instruction for one of the last sixteen branches, exceptions, or interrupts taken by the processor.
6C1H | 1729 | MSR_LASTBRANCH_1_TO_IP | Thread | Last Branch Record 1 To IP (R/W). See description of MSR_LASTBRANCH_0_TO_IP.
6C2H | 1730 | MSR_LASTBRANCH_2_TO_IP | Thread | Last Branch Record 2 To IP (R/W). See description of MSR_LASTBRANCH_0_TO_IP.
6C3H | 1731 | MSR_LASTBRANCH_3_TO_IP | Thread | Last Branch Record 3 To IP (R/W). See description of MSR_LASTBRANCH_0_TO_IP.
6C4H | 1732 | MSR_LASTBRANCH_4_TO_IP | Thread | Last Branch Record 4 To IP (R/W). See description of MSR_LASTBRANCH_0_TO_IP.
6C5H | 1733 | MSR_LASTBRANCH_5_TO_IP | Thread | Last Branch Record 5 To IP (R/W). See description of MSR_LASTBRANCH_0_TO_IP.
6C6H | 1734 | MSR_LASTBRANCH_6_TO_IP | Thread | Last Branch Record 6 To IP (R/W). See description of MSR_LASTBRANCH_0_TO_IP.
6C7H | 1735 | MSR_LASTBRANCH_7_TO_IP | Thread | Last Branch Record 7 To IP (R/W). See description of MSR_LASTBRANCH_0_TO_IP.
6C8H | 1736 | MSR_LASTBRANCH_8_TO_IP | Thread | Last Branch Record 8 To IP (R/W). See description of MSR_LASTBRANCH_0_TO_IP.
6C9H | 1737 | MSR_LASTBRANCH_9_TO_IP | Thread | Last Branch Record 9 To IP (R/W). See description of MSR_LASTBRANCH_0_TO_IP.
6CAH | 1738 | MSR_LASTBRANCH_10_TO_IP | Thread | Last Branch Record 10 To IP (R/W). See description of MSR_LASTBRANCH_0_TO_IP.
6CBH | 1739 | MSR_LASTBRANCH_11_TO_IP | Thread | Last Branch Record 11 To IP (R/W). See description of MSR_LASTBRANCH_0_TO_IP.
6CCH | 1740 | MSR_LASTBRANCH_12_TO_IP | Thread | Last Branch Record 12 To IP (R/W). See description of MSR_LASTBRANCH_0_TO_IP.
6CDH | 1741 | MSR_LASTBRANCH_13_TO_IP | Thread | Last Branch Record 13 To IP (R/W). See description of MSR_LASTBRANCH_0_TO_IP.
6CEH | 1742 | MSR_LASTBRANCH_14_TO_IP | Thread | Last Branch Record 14 To IP (R/W). See description of MSR_LASTBRANCH_0_TO_IP.
Register Address (Hex) | (Dec) | Register Name | Scope | Bit Description
6CFH | 1743 | MSR_LASTBRANCH_15_TO_IP | Thread | Last Branch Record 15 To IP (R/W). See description of MSR_LASTBRANCH_0_TO_IP.
802H | 2050 | IA32_X2APIC_APICID | Thread | x2APIC ID register (R/O). See x2APIC Specification.
803H | 2051 | IA32_X2APIC_VERSION | Thread | x2APIC Version register (R/O)
808H | 2056 | IA32_X2APIC_TPR | Thread | x2APIC Task Priority register (R/W)
80AH | 2058 | IA32_X2APIC_PPR | Thread | x2APIC Processor Priority register (R/O)
80BH | 2059 | IA32_X2APIC_EOI | Thread | x2APIC EOI register (W/O)
80DH | 2061 | IA32_X2APIC_LDR | Thread | x2APIC Logical Destination register (R/O)
80FH | 2063 | IA32_X2APIC_SIVR | Thread | x2APIC Spurious Interrupt Vector register (R/W)
810H | 2064 | IA32_X2APIC_ISR0 | Thread | x2APIC In-Service register bits [31:0] (R/O)
811H | 2065 | IA32_X2APIC_ISR1 | Thread | x2APIC In-Service register bits [63:32] (R/O)
812H | 2066 | IA32_X2APIC_ISR2 | Thread | x2APIC In-Service register bits [95:64] (R/O)
813H | 2067 | IA32_X2APIC_ISR3 | Thread | x2APIC In-Service register bits [127:96] (R/O)
814H | 2068 | IA32_X2APIC_ISR4 | Thread | x2APIC In-Service register bits [159:128] (R/O)
815H | 2069 | IA32_X2APIC_ISR5 | Thread | x2APIC In-Service register bits [191:160] (R/O)
816H | 2070 | IA32_X2APIC_ISR6 | Thread | x2APIC In-Service register bits [223:192] (R/O)
817H | 2071 | IA32_X2APIC_ISR7 | Thread | x2APIC In-Service register bits [255:224] (R/O)
818H | 2072 | IA32_X2APIC_TMR0 | Thread | x2APIC Trigger Mode register bits [31:0] (R/O)
819H | 2073 | IA32_X2APIC_TMR1 | Thread | x2APIC Trigger Mode register bits [63:32] (R/O)
81AH | 2074 | IA32_X2APIC_TMR2 | Thread | x2APIC Trigger Mode register bits [95:64] (R/O)
81BH | 2075 | IA32_X2APIC_TMR3 | Thread | x2APIC Trigger Mode register bits [127:96] (R/O)
81CH | 2076 | IA32_X2APIC_TMR4 | Thread | x2APIC Trigger Mode register bits [159:128] (R/O)
81DH | 2077 | IA32_X2APIC_TMR5 | Thread | x2APIC Trigger Mode register bits [191:160] (R/O)
81EH | 2078 | IA32_X2APIC_TMR6 | Thread | x2APIC Trigger Mode register bits [223:192] (R/O)
81FH | 2079 | IA32_X2APIC_TMR7 | Thread | x2APIC Trigger Mode register bits [255:224] (R/O)
820H | 2080 | IA32_X2APIC_IRR0 | Thread | x2APIC Interrupt Request register bits [31:0] (R/O)
821H | 2081 | IA32_X2APIC_IRR1 | Thread | x2APIC Interrupt Request register bits [63:32] (R/O)
822H | 2082 | IA32_X2APIC_IRR2 | Thread | x2APIC Interrupt Request register bits [95:64] (R/O)
823H | 2083 | IA32_X2APIC_IRR3 | Thread | x2APIC Interrupt Request register bits [127:96] (R/O)
824H | 2084 | IA32_X2APIC_IRR4 | Thread | x2APIC Interrupt Request register bits [159:128] (R/O)
825H | 2085 | IA32_X2APIC_IRR5 | Thread | x2APIC Interrupt Request register bits [191:160] (R/O)
826H | 2086 | IA32_X2APIC_IRR6 | Thread | x2APIC Interrupt Request register bits [223:192] (R/O)
827H | 2087 | IA32_X2APIC_IRR7 | Thread | x2APIC Interrupt Request register bits [255:224] (R/O)
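The ISR, TMR, and IRR rows above each split a 256-bit register into eight 32-bit MSRs at consecutive addresses. As an illustrative sketch only (the function and constant names are hypothetical, and the assumption that bit N of the 256-bit register corresponds to interrupt vector N is not stated in this table), the MSR address and bit position for a given vector could be computed as follows:

```python
# Base addresses taken from the table rows above.
X2APIC_ISR0 = 0x810  # In-Service register bits [31:0]
X2APIC_TMR0 = 0x818  # Trigger Mode register bits [31:0]
X2APIC_IRR0 = 0x820  # Interrupt Request register bits [31:0]


def x2apic_vector_location(base_msr, vector):
    """Return (msr_address, bit_index) holding the bit for `vector`.

    Assumes bit N of the 256-bit register maps to vector N, with each
    MSR covering 32 consecutive bits as listed in the table.
    """
    if not 0 <= vector <= 255:
        raise ValueError("vector must be in the range 0..255")
    return base_msr + vector // 32, vector % 32


if __name__ == "__main__":
    msr, bit = x2apic_vector_location(X2APIC_ISR0, 0x41)
    print(f"vector 41H: MSR {msr:X}H, bit {bit}")
```

Actually reading the MSR requires privileged access (for example RDMSR in ring 0) and is outside the scope of this sketch.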
Register Address (Hex) | (Dec) | Register Name | Scope | Bit Description
828H | 2088 | IA32_X2APIC_ESR | Thread | x2APIC Error Status register (R/W)
82FH | 2095 | IA32_X2APIC_LVT_CMCI | Thread | x2APIC LVT Corrected Machine Check Interrupt register (R/W)
830H | 2096 | IA32_X2APIC_ICR | Thread | x2APIC Interrupt Command register (R/W)
832H | 2098 | IA32_X2APIC_LVT_TIMER | Thread | x2APIC LVT Timer Interrupt register (R/W)
833H | 2099 | IA32_X2APIC_LVT_THERMAL | Thread | x2APIC LVT Thermal Sensor Interrupt register (R/W)
834H | 2100 | IA32_X2APIC_LVT_PMI | Thread | x2APIC LVT Performance Monitor register (R/W)
835H | 2101 | IA32_X2APIC_LVT_LINT0 | Thread | x2APIC LVT LINT0 register (R/W)
836H | 2102 | IA32_X2APIC_LVT_LINT1 | Thread | x2APIC LVT LINT1 register (R/W)
837H | 2103 | IA32_X2APIC_LVT_ERROR | Thread | x2APIC LVT Error register (R/W)
838H | 2104 | IA32_X2APIC_INIT_COUNT | Thread | x2APIC Initial Count register (R/W)
839H | 2105 | IA32_X2APIC_CUR_COUNT | Thread | x2APIC Current Count register (R/O)
83EH | 2110 | IA32_X2APIC_DIV_CONF | Thread | x2APIC Divide Configuration register (R/W)
83FH | 2111 | IA32_X2APIC_SELF_IPI | Thread | x2APIC Self IPI register (W/O)
C000_0080H |  | IA32_EFER | Thread | Extended Feature Enables. See Table 35-2.
C000_0081H |  | IA32_STAR | Thread | System Call Target Address (R/W). See Table 35-2.
C000_0082H |  | IA32_LSTAR | Thread | IA-32e Mode System Call Target Address (R/W). See Table 35-2.
C000_0084H |  | IA32_FMASK | Thread | System Call Flag Mask (R/W). See Table 35-2.
C000_0100H |  | IA32_FS_BASE | Thread | Map of BASE Address of FS (R/W). See Table 35-2.
C000_0101H |  | IA32_GS_BASE | Thread | Map of BASE Address of GS (R/W). See Table 35-2.
C000_0102H |  | IA32_KERNEL_GSBASE | Thread | Swap Target of BASE Address of GS (R/W). See Table 35-2.
C000_0103H |  | IA32_TSC_AUX | Thread | AUXILIARY TSC Signature (R/W). See Table 35-2 and Section 17.13.2, IA32_TSC_AUX Register and RDTSCP Support.
...
Table 35-11. MSRs Supported by Intel Processors Based on Intel Microarchitecture Code Name Sandy Bridge
Register Address (Hex) | (Dec) | Register Name | Scope | Bit Description
0H | 0 | IA32_P5_MC_ADDR | Thread | See Section 35.15, MSRs in Pentium Processors.
1H | 1 | IA32_P5_MC_TYPE | Thread | See Section 35.15, MSRs in Pentium Processors.
6H | 6 | IA32_MONITOR_FILTER_SIZE | Thread | See Section 8.10.5, Monitor/Mwait Address Range Determination, and Table 35-2.
10H | 16 | IA32_TIME_STAMP_COUNTER | Thread | See Section 17.13, Time-Stamp Counter, and see Table 35-2.
17H | 23 | IA32_PLATFORM_ID | Package | Platform ID (R). See Table 35-2.
1BH | 27 | IA32_APIC_BASE | Thread | See Section 10.4.4, Local APIC Status and Location, and Table 35-2.
34H | 52 | MSR_SMI_COUNT | Thread | SMI Counter (R/O)
 |  | 31:0 |  | SMI Count (R/O). Count SMIs.
 |  | 63:32 |  | Reserved.
3AH | 58 | IA32_FEATURE_CONTROL | Thread | Control Features in Intel 64 Processor (R/W). See Table 35-2.
79H | 121 | IA32_BIOS_UPDT_TRIG | Core | BIOS Update Trigger Register (W). See Table 35-2.
8BH | 139 | IA32_BIOS_SIGN_ID | Thread | BIOS Update Signature ID (RO). See Table 35-2.
C1H | 193 | IA32_PMC0 | Thread | Performance Counter Register. See Table 35-2.
C2H | 194 | IA32_PMC1 | Thread | Performance Counter Register. See Table 35-2.
C3H | 195 | IA32_PMC2 | Thread | Performance Counter Register. See Table 35-2.
C4H | 196 | IA32_PMC3 | Thread | Performance Counter Register. See Table 35-2.
C5H | 197 | IA32_PMC4 | Core | Performance Counter Register. See Table 35-2.
C6H | 198 | IA32_PMC5 | Core | Performance Counter Register. See Table 35-2.
C7H | 199 | IA32_PMC6 | Core | Performance Counter Register. See Table 35-2.
Register Address (Hex) | (Dec) | Register Name | Scope | Bit Description
C8H | 200 | IA32_PMC7 | Core | Performance Counter Register. See Table 35-2.
CEH | 206 | MSR_PLATFORM_INFO | Package | See http://biosbits.org.
 |  | 7:0 |  | Reserved.
 |  | 15:8 | Package | Maximum Non-Turbo Ratio (R/O). This is the ratio of the frequency that invariant TSC runs at. Frequency = ratio * 100 MHz.
 |  | 27:16 |  | Reserved.
 |  | 28 | Package | Programmable Ratio Limit for Turbo Mode (R/O). When set to 1, indicates that Programmable Ratio Limits for Turbo mode is enabled, and when set to 0, indicates Programmable Ratio Limits for Turbo mode is disabled.
 |  | 29 | Package | Programmable TDP Limit for Turbo Mode (R/O). When set to 1, indicates that TDP Limits for Turbo mode are programmable, and when set to 0, indicates TDP Limit for Turbo mode is not programmable.
 |  | 39:30 |  | Reserved.
 |  | 47:40 | Package | Maximum Efficiency Ratio (R/O). This is the minimum ratio (maximum efficiency) at which the processor can operate, in units of 100 MHz.
 |  | 63:48 |  | Reserved.
E2H | 226 | MSR_PKG_CST_CONFIG_CONTROL | Core | C-State Configuration Control (R/W). Note: C-state values are processor specific C-state code names, unrelated to MWAIT extension C-state parameters or ACPI C-States. See http://biosbits.org.
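The MSR_PLATFORM_INFO fields can be decoded with plain shifts and masks. The following is a minimal sketch, not a definitive implementation: the function name and the returned dictionary keys are hypothetical, and the 100 MHz multiplier is the one quoted in the bit descriptions for this table.

```python
def decode_platform_info(value, ratio_unit_mhz=100):
    """Decode a raw 64-bit MSR_PLATFORM_INFO (CEH) value.

    Field positions follow the table: bits 15:8, 28, 29, and 47:40.
    """
    max_non_turbo_ratio = (value >> 8) & 0xFF
    max_efficiency_ratio = (value >> 40) & 0xFF
    return {
        "max_non_turbo_ratio": max_non_turbo_ratio,
        # Frequency = ratio * 100 MHz, per the bits 15:8 description.
        "max_non_turbo_mhz": max_non_turbo_ratio * ratio_unit_mhz,
        "programmable_ratio_limit": bool((value >> 28) & 1),
        "programmable_tdp_limit": bool((value >> 29) & 1),
        "max_efficiency_ratio": max_efficiency_ratio,
        "max_efficiency_mhz": max_efficiency_ratio * ratio_unit_mhz,
    }


if __name__ == "__main__":
    # Made-up sample value: ratio 34 in bits 15:8, bit 28 set, ratio 8 in 47:40.
    sample = (8 << 40) | (1 << 28) | (34 << 8)
    print(decode_platform_info(sample))
```

The sample value is invented for illustration and does not describe any particular processor.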
Register Address (Hex) | (Dec) | Register Name | Scope | Bit Description
 |  | 2:0 |  | Package C-State Limit (R/W). Specifies the lowest processor-specific C-state code name (consuming the least power) for the package. The default is set as factory-configured package C-state limit. The following C-state code name encodings are supported: 000b: C0/C1 (no package C-state support); 001b: C2; 010b: C6 no retention; 011b: C6 retention; 100b: C7; 101b: C7s; 111b: No package C-state limit. Note: This field cannot be used to limit package C-state to C3.
 |  | 9:3 |  | Reserved.
 |  | 10 |  | I/O MWAIT Redirection Enable (R/W). When set, will map IO_read instructions sent to IO register specified by MSR_PMG_IO_CAPTURE_BASE to MWAIT instructions.
 |  | 14:11 |  | Reserved.
 |  | 15 |  | CFG Lock (R/WO). When set, lock bits 15:0 of this register until next reset.
 |  | 24:16 |  | Reserved.
 |  | 25 |  | C3 state auto demotion enable (R/W). When set, the processor will conditionally demote C6/C7 requests to C3 based on uncore auto-demote information.
 |  | 26 |  | C1 state auto demotion enable (R/W). When set, the processor will conditionally demote C3/C6/C7 requests to C1 based on uncore auto-demote information.
 |  | 27 |  | Enable C3 undemotion (R/W). When set, enables undemotion from demoted C3.
 |  | 28 |  | Enable C1 undemotion (R/W). When set, enables undemotion from demoted C1.
 |  | 63:29 |  | Reserved.
E4H | 228 | MSR_PMG_IO_CAPTURE_BASE | Core | Power Management IO Redirection in C-state (R/W). See http://biosbits.org.
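The C-state limit encodings and control bits of MSR_PKG_CST_CONFIG_CONTROL listed above lend themselves to a small lookup-based decoder. This is a sketch under stated assumptions: the function name, dictionary keys, and the "encoding not listed" fallback string are hypothetical; only the bit positions and encodings come from the table.

```python
# Encodings of bits 2:0 as listed in the table (110b is not listed there).
PKG_CSTATE_LIMIT = {
    0b000: "C0/C1 (no package C-state support)",
    0b001: "C2",
    0b010: "C6 no retention",
    0b011: "C6 retention",
    0b100: "C7",
    0b101: "C7s",
    0b111: "No package C-state limit",
}


def bit(value, n):
    """Return True if bit n of value is set."""
    return bool((value >> n) & 1)


def decode_pkg_cst_config_control(value):
    """Decode a raw MSR_PKG_CST_CONFIG_CONTROL (E2H) value."""
    return {
        "package_cstate_limit": PKG_CSTATE_LIMIT.get(value & 0x7, "encoding not listed"),
        "io_mwait_redirection": bit(value, 10),
        "cfg_lock": bit(value, 15),
        "c3_auto_demotion": bit(value, 25),
        "c1_auto_demotion": bit(value, 26),
        "c3_undemotion": bit(value, 27),
        "c1_undemotion": bit(value, 28),
    }


if __name__ == "__main__":
    print(decode_pkg_cst_config_control(0x06008402))  # made-up sample value
```

Because bit 15 (CFG Lock) locks bits 15:0 until the next reset, a tool built on this would presumably check `cfg_lock` before attempting to change the limit.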
Register Address (Hex) | (Dec) | Register Name | Scope | Bit Description
 |  | 15:0 |  | LVL_2 Base Address (R/W). Specifies the base address visible to software for IO redirection. If IO MWAIT Redirection is enabled, reads to this address will be consumed by the power management logic and decoded to MWAIT instructions. When IO port address redirection is enabled, this is the IO port address reported to the OS/software.
 |  | 18:16 |  | C-state Range (R/W). Specifies the encoding value of the maximum C-State code name to be included when IO read to MWAIT redirection is enabled by MSR_PKG_CST_CONFIG_CONTROL[bit10]: 000b - C3 is the max C-State to include; 001b - C6 is the max C-State to include; 010b - C7 is the max C-State to include.
 |  | 63:19 |  | Reserved.
E7H | 231 | IA32_MPERF | Thread | Maximum Performance Frequency Clock Count (RW). See Table 35-2.
E8H | 232 | IA32_APERF | Thread | Actual Performance Frequency Clock Count (RW). See Table 35-2.
FEH | 254 | IA32_MTRRCAP | Thread | See Table 35-2.
174H | 372 | IA32_SYSENTER_CS | Thread | See Table 35-2.
175H | 373 | IA32_SYSENTER_ESP | Thread | See Table 35-2.
176H | 374 | IA32_SYSENTER_EIP | Thread | See Table 35-2.
179H | 377 | IA32_MCG_CAP | Thread | See Table 35-2.
17AH | 378 | IA32_MCG_STATUS | Thread |
 |  | 0 |  | RIPV. When set, this bit indicates that the instruction addressed by the instruction pointer pushed on the stack (when the machine check was generated) can be used to restart the program. If cleared, the program cannot be reliably restarted.
 |  | 1 |  | EIPV. When set, this bit indicates that the instruction addressed by the instruction pointer pushed on the stack (when the machine check was generated) is directly associated with the error.
 |  | 2 |  | MCIP. When set, this bit indicates that a machine check has been generated. If a second machine check is detected while this bit is still set, the processor enters a shutdown state. Software should write this bit to 0 after processing a machine check exception.
 |  | 63:3 |  | Reserved.
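The three IA32_MCG_STATUS flags described above occupy bits 0 through 2. A minimal sketch of decoding them follows; the function name and the `restartable` convenience key are hypothetical additions, while the bit meanings are those given in the table.

```python
def decode_mcg_status(value):
    """Decode the RIPV, EIPV, and MCIP flags of IA32_MCG_STATUS (17AH)."""
    ripv = bool(value & 0x1)         # bit 0: restart IP valid
    eipv = bool((value >> 1) & 0x1)  # bit 1: error IP valid
    mcip = bool((value >> 2) & 0x1)  # bit 2: machine check in progress
    return {
        "RIPV": ripv,
        "EIPV": eipv,
        "MCIP": mcip,
        # Per the RIPV description, the program can be reliably
        # restarted only when RIPV is set.
        "restartable": ripv,
    }


if __name__ == "__main__":
    print(decode_mcg_status(0b101))
```

A handler would clear MCIP (write bit 2 to 0) after processing the exception, as the MCIP description requires; that write is not shown here.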
Register Address (Hex) | (Dec) | Register Name | Scope | Bit Description
186H | 390 | IA32_PERFEVTSEL0 | Thread | See Table 35-2.
187H | 391 | IA32_PERFEVTSEL1 | Thread | See Table 35-2.
188H | 392 | IA32_PERFEVTSEL2 | Thread | See Table 35-2.
189H | 393 | IA32_PERFEVTSEL3 | Thread | See Table 35-2.
18AH | 394 | IA32_PERFEVTSEL4 | Core | See Table 35-2; If CPUID.0AH:EAX[15:8] = 8
18BH | 395 | IA32_PERFEVTSEL5 | Core | See Table 35-2; If CPUID.0AH:EAX[15:8] = 8
18CH | 396 | IA32_PERFEVTSEL6 | Core | See Table 35-2; If CPUID.0AH:EAX[15:8] = 8
18DH | 397 | IA32_PERFEVTSEL7 | Core | See Table 35-2; If CPUID.0AH:EAX[15:8] = 8
198H | 408 | IA32_PERF_STATUS | Package | See Table 35-2.
 |  | 15:0 |  | Current Performance State Value.
 |  | 63:16 |  | Reserved.
198H | 408 | MSR_PERF_STATUS | Package |
 |  | 47:32 |  | Core Voltage (R/O). P-state core voltage can be computed by MSR_PERF_STATUS[47:32] * (float) 1/(2^13).
199H | 409 | IA32_PERF_CTL | Thread | See Table 35-2.
19AH | 410 | IA32_CLOCK_MODULATION | Thread | Clock Modulation (R/W). See Table 35-2. IA32_CLOCK_MODULATION MSR was originally named IA32_THERM_CONTROL MSR.
 |  | 3:0 |  | On demand Clock Modulation Duty Cycle (R/W). In 6.25% increments.
 |  | 4 |  | On demand Clock Modulation Enable (R/W)
 |  | 63:5 |  | Reserved.
19BH | 411 | IA32_THERM_INTERRUPT | Core | Thermal Interrupt Control (R/W). See Table 35-2.
19CH | 412 | IA32_THERM_STATUS | Core | Thermal Monitor Status (R/W). See Table 35-2.
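The core-voltage formula for MSR_PERF_STATUS can be worked through in a few lines. This sketch assumes the formula applies to the whole 16-bit field in bits 47:32 (the field named in the bit column) and that the result is in volts; the function name is hypothetical.

```python
def core_voltage(msr_perf_status):
    """Compute P-state core voltage from a raw MSR_PERF_STATUS (198H) value.

    Uses bits 47:32 scaled by 1/(2^13), as given in the bit description.
    The unit is assumed to be volts.
    """
    raw = (msr_perf_status >> 32) & 0xFFFF
    return raw / float(2 ** 13)


if __name__ == "__main__":
    # Made-up sample: 2400H in bits 47:32, arbitrary low bits.
    sample = (0x2400 << 32) | 0x2200
    print(f"{core_voltage(sample):.3f}")
```

For the invented sample above, the field value 2400H (9216) divided by 8192 gives 1.125.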
Register Address (Hex) | (Dec) | Register Name | Scope | Bit Description
1A0H | 416 | IA32_MISC_ENABLE |  | Enable Misc. Processor Features (R/W). Allows a variety of processor functions to be enabled and disabled.
 |  | 0 | Thread | Fast-Strings Enable. See Table 35-2.
 |  | 6:1 |  | Reserved.
 |  | 7 | Thread | Performance Monitoring Available (R). See Table 35-2.
 |  | 10:8 |  | Reserved.
 |  | 11 | Thread | Branch Trace Storage Unavailable (RO). See Table 35-2.
 |  | 12 | Thread | Precise Event Based Sampling Unavailable (RO). See Table 35-2.
 |  | 15:13 |  | Reserved.
 |  | 16 | Package | Enhanced Intel SpeedStep Technology Enable (R/W). See Table 35-2.
 |  | 18 | Thread | ENABLE MONITOR FSM (R/W). See Table 35-2.
 |  | 21:19 |  | Reserved.
 |  | 22 | Thread | Limit CPUID Maxval (R/W). See Table 35-2.
 |  | 23 | Thread | xTPR Message Disable (R/W). See Table 35-2.
 |  | 33:24 |  | Reserved.
 |  | 34 | Thread | XD Bit Disable (R/W). See Table 35-2.
 |  | 37:35 |  | Reserved.
 |  | 38 | Package | Turbo Mode Disable (R/W). When set to 1 on processors that support Intel Turbo Boost Technology, the turbo mode feature is disabled and the IDA_Enable feature flag will be clear (CPUID.06H: EAX[1]=0). When set to 0 on processors that support IDA, CPUID.06H: EAX[1] reports that the processor's support of turbo mode is enabled. Note: the power-on default value is used by BIOS to detect hardware support of turbo mode. If power-on default value is 1, turbo mode is available in the processor. If power-on default value is 0, turbo mode is not available.
 |  | 63:39 |  | Reserved.
Register Address (Hex) | (Dec) | Register Name | Scope | Bit Description
1A2H | 418 | MSR_TEMPERATURE_TARGET | Unique |
 |  | 15:0 |  | Reserved.
 |  | 23:16 |  | Temperature Target (R). The minimum temperature at which PROCHOT# will be asserted. The value is in degrees C.
 |  | 63:24 |  | Reserved.
1A6H | 422 | MSR_OFFCORE_RSP_0 | Thread | Offcore Response Event Select Register (R/W)
1A7H | 423 | MSR_OFFCORE_RSP_1 | Thread | Offcore Response Event Select Register (R/W)
1AAH | 426 | MSR_MISC_PWR_MGMT |  | See http://biosbits.org.
1ADH | 429 | MSR_TURBO_PWR_CURRENT_LIMIT |  | See http://biosbits.org.
1B0H | 432 | IA32_ENERGY_PERF_BIAS | Package | See Table 35-2.
1B1H | 433 | IA32_PACKAGE_THERM_STATUS | Package | See Table 35-2.
1B2H | 434 | IA32_PACKAGE_THERM_INTERRUPT | Package | See Table 35-2.
1C8H | 456 | MSR_LBR_SELECT | Thread | Last Branch Record Filtering Select Register (R/W). See Section 17.6.2, Filtering of Last Branch Records.
1C9H | 457 | MSR_LASTBRANCH_TOS | Thread | Last Branch Record Stack TOS (R/W). Contains an index (bits 0-3) that points to the MSR containing the most recent branch record. See MSR_LASTBRANCH_0_FROM_IP (at 680H).
1D9H | 473 | IA32_DEBUGCTL | Thread | Debug Control (R/W). See Table 35-2.
1DDH | 477 | MSR_LER_FROM_LIP | Thread | Last Exception Record From Linear IP (R). Contains a pointer to the last branch instruction that the processor executed prior to the last exception that was generated or the last interrupt that was handled.
1DEH | 478 | MSR_LER_TO_LIP | Thread | Last Exception Record To Linear IP (R). This area contains a pointer to the target of the last branch instruction that the processor executed prior to the last exception that was generated or the last interrupt that was handled.
1F2H | 498 | IA32_SMRR_PHYSBASE | Core | See Table 35-2.
1F3H | 499 | IA32_SMRR_PHYSMASK | Core | See Table 35-2.
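MSR_LASTBRANCH_TOS holds a 4-bit index into the sixteen From/To register pairs. The sketch below shows one way to turn that index into MSR addresses, using the base addresses 680H (MSR_LASTBRANCH_0_FROM_IP) and 6C0H (MSR_LASTBRANCH_0_TO_IP) from these tables. The function names are hypothetical, and the newest-to-oldest walk assumes the stack is circular with older records at decreasing indices, which this table does not state.

```python
LBR_FROM_BASE = 0x680  # MSR_LASTBRANCH_0_FROM_IP
LBR_TO_BASE = 0x6C0    # MSR_LASTBRANCH_0_TO_IP
LBR_DEPTH = 16         # sixteen From/To pairs


def lbr_pair_addresses(tos_value):
    """Return (from_msr, to_msr) for the most recent branch record.

    Only bits 0-3 of the TOS value are used as the index.
    """
    index = tos_value & 0xF
    return LBR_FROM_BASE + index, LBR_TO_BASE + index


def lbr_indices_newest_first(tos_value):
    """List stack indices from newest to oldest.

    Assumption (not from the table): the stack wraps around, so the
    record before index 0 is index 15.
    """
    index = tos_value & 0xF
    return [(index - i) % LBR_DEPTH for i in range(LBR_DEPTH)]


if __name__ == "__main__":
    from_msr, to_msr = lbr_pair_addresses(0x5)
    print(f"most recent record: From {from_msr:X}H, To {to_msr:X}H")
```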
Register Address (Hex) | (Dec) | Register Name | Scope | Bit Description
1FCH | 508 | MSR_POWER_CTL | Core | See http://biosbits.org.
200H | 512 | IA32_MTRR_PHYSBASE0 | Thread | See Table 35-2.
201H | 513 | IA32_MTRR_PHYSMASK0 | Thread | See Table 35-2.
202H | 514 | IA32_MTRR_PHYSBASE1 | Thread | See Table 35-2.
203H | 515 | IA32_MTRR_PHYSMASK1 | Thread | See Table 35-2.
204H | 516 | IA32_MTRR_PHYSBASE2 | Thread | See Table 35-2.
205H | 517 | IA32_MTRR_PHYSMASK2 | Thread | See Table 35-2.
206H | 518 | IA32_MTRR_PHYSBASE3 | Thread | See Table 35-2.
207H | 519 | IA32_MTRR_PHYSMASK3 | Thread | See Table 35-2.
208H | 520 | IA32_MTRR_PHYSBASE4 | Thread | See Table 35-2.
209H | 521 | IA32_MTRR_PHYSMASK4 | Thread | See Table 35-2.
20AH | 522 | IA32_MTRR_PHYSBASE5 | Thread | See Table 35-2.
20BH | 523 | IA32_MTRR_PHYSMASK5 | Thread | See Table 35-2.
20CH | 524 | IA32_MTRR_PHYSBASE6 | Thread | See Table 35-2.
20DH | 525 | IA32_MTRR_PHYSMASK6 | Thread | See Table 35-2.
20EH | 526 | IA32_MTRR_PHYSBASE7 | Thread | See Table 35-2.
20FH | 527 | IA32_MTRR_PHYSMASK7 | Thread | See Table 35-2.
210H | 528 | IA32_MTRR_PHYSBASE8 | Thread | See Table 35-2.
211H | 529 | IA32_MTRR_PHYSMASK8 | Thread | See Table 35-2.
212H | 530 | IA32_MTRR_PHYSBASE9 | Thread | See Table 35-2.
213H | 531 | IA32_MTRR_PHYSMASK9 | Thread | See Table 35-2.
250H | 592 | IA32_MTRR_FIX64K_00000 | Thread | See Table 35-2.
258H | 600 | IA32_MTRR_FIX16K_80000 | Thread | See Table 35-2.
259H | 601 | IA32_MTRR_FIX16K_A0000 | Thread | See Table 35-2.
268H | 616 | IA32_MTRR_FIX4K_C0000 | Thread | See Table 35-2.
269H | 617 | IA32_MTRR_FIX4K_C8000 | Thread | See Table 35-2.
26AH | 618 | IA32_MTRR_FIX4K_D0000 | Thread | See Table 35-2.
26BH | 619 | IA32_MTRR_FIX4K_D8000 | Thread | See Table 35-2.
26CH | 620 | IA32_MTRR_FIX4K_E0000 | Thread | See Table 35-2.
26DH | 621 | IA32_MTRR_FIX4K_E8000 | Thread | See Table 35-2.
26EH | 622 | IA32_MTRR_FIX4K_F0000 | Thread | See Table 35-2.
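The variable-range MTRR rows above follow a regular pattern: PHYSBASEn sits at 200H + 2n and PHYSMASKn at 201H + 2n, for n from 0 to 9. A short sketch of that address arithmetic is given below; the function name is hypothetical, and the range check reflects only the ten pairs listed in this table.

```python
MTRR_PHYSBASE0 = 0x200   # IA32_MTRR_PHYSBASE0
VARIABLE_MTRR_PAIRS = 10  # pairs 0..9 are listed in this table


def variable_mtrr_msrs(n):
    """Return (physbase_msr, physmask_msr) for variable-range MTRR pair n."""
    if not 0 <= n < VARIABLE_MTRR_PAIRS:
        raise ValueError("this table lists MTRR pairs 0..9 only")
    base = MTRR_PHYSBASE0 + 2 * n
    return base, base + 1


if __name__ == "__main__":
    for n in range(VARIABLE_MTRR_PAIRS):
        base, mask = variable_mtrr_msrs(n)
        print(f"IA32_MTRR_PHYSBASE{n}: {base:X}H  IA32_MTRR_PHYSMASK{n}: {mask:X}H")
```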
Register Address (Hex) | (Dec) | Register Name | Scope | Bit Description
26FH | 623 | IA32_MTRR_FIX4K_F8000 | Thread | See Table 35-2.
277H | 631 | IA32_PAT | Thread | See Table 35-2.
280H | 640 | IA32_MC0_CTL2 | Core | See Table 35-2.
281H | 641 | IA32_MC1_CTL2 | Core | See Table 35-2.
282H | 642 | IA32_MC2_CTL2 | Core | See Table 35-2.
283H | 643 | IA32_MC3_CTL2 | Core | See Table 35-2.
284H | 644 | MSR_MC4_CTL2 | Package | Always 0 (CMCI not supported).
2FFH | 767 | IA32_MTRR_DEF_TYPE | Thread | Default Memory Types (R/W). See Table 35-2.
309H | 777 | IA32_FIXED_CTR0 | Thread | Fixed-Function Performance Counter Register 0 (R/W). See Table 35-2.
30AH | 778 | IA32_FIXED_CTR1 | Thread | Fixed-Function Performance Counter Register 1 (R/W). See Table 35-2.
30BH | 779 | IA32_FIXED_CTR2 | Thread | Fixed-Function Performance Counter Register 2 (R/W). See Table 35-2.
345H | 837 | IA32_PERF_CAPABILITIES | Thread | See Table 35-2. See Section 17.4.1, IA32_DEBUGCTL MSR.
 |  | 5:0 |  | LBR Format. See Table 35-2.
 |  | 6 |  | PEBS Record Format.
 |  | 7 |  | PEBSSaveArchRegs. See Table 35-2.
 |  | 11:8 |  | PEBS_REC_FORMAT. See Table 35-2.
 |  | 12 |  | SMM_FREEZE. See Table 35-2.
 |  | 63:13 |  | Reserved.
38DH | 909 | IA32_FIXED_CTR_CTRL | Thread | Fixed-Function-Counter Control Register (R/W). See Table 35-2.
38EH | 910 | IA32_PERF_GLOBAL_STATUS | Thread | See Table 35-2. See Section 18.4.2, Global Counter Control Facilities.
38FH | 911 | IA32_PERF_GLOBAL_CTRL | Thread | See Table 35-2. See Section 18.4.2, Global Counter Control Facilities.
390H | 912 | IA32_PERF_GLOBAL_OVF_CTRL | Thread | See Table 35-2. See Section 18.4.2, Global Counter Control Facilities.
3F1H | 1009 | MSR_PEBS_ENABLE | Thread | See Section 18.6.1.1, Precise Event Based Sampling (PEBS).
 |  | 0 |  | Enable PEBS on IA32_PMC0. (R/W)
 |  | 1 |  | Enable PEBS on IA32_PMC1. (R/W)
 |  | 2 |  | Enable PEBS on IA32_PMC2. (R/W)
Register Address (Hex) | (Dec) | Register Name | Scope | Bit Description
 |  | 3 |  | Enable PEBS on IA32_PMC3. (R/W)
 |  | 31:4 |  | Reserved.
 |  | 32 |  | Enable Load Latency on IA32_PMC0. (R/W)
 |  | 33 |  | Enable Load Latency on IA32_PMC1. (R/W)
 |  | 34 |  | Enable Load Latency on IA32_PMC2. (R/W)
 |  | 35 |  | Enable Load Latency on IA32_PMC3. (R/W)
 |  | 63:36 |  | Reserved.
3F6H | 1014 | MSR_PEBS_LD_LAT | Thread | See Section 18.6.1.2, Load Latency Performance Monitoring Facility.
 |  | 15:0 |  | Minimum threshold latency value of tagged load operation that will be counted. (R/W)
 |  | 63:16 |  | Reserved.
3F8H | 1016 | MSR_PKG_C3_RESIDENCY | Package | Note: C-state values are processor specific C-state code names, unrelated to MWAIT extension C-state parameters or ACPI C-States.
 |  | 63:0 |  | Package C3 Residency Counter. (R/O) Value since last reset that this package is in processor-specific C3 states. Count at the same frequency as the TSC.
3F9H | 1017 | MSR_PKG_C6_RESIDENCY | Package | Note: C-state values are processor specific C-state code names, unrelated to MWAIT extension C-state parameters or ACPI C-States.
 |  | 63:0 |  | Package C6 Residency Counter. (R/O) Value since last reset that this package is in processor-specific C6 states. Count at the same frequency as the TSC.
3FAH | 1018 | MSR_PKG_C7_RESIDENCY | Package | Note: C-state values are processor specific C-state code names, unrelated to MWAIT extension C-state parameters or ACPI C-States.
 |  | 63:0 |  | Package C7 Residency Counter. (R/O) Value since last reset that this package is in processor-specific C7 states. Count at the same frequency as the TSC.
3FCH | 1020 | MSR_CORE_C3_RESIDENCY | Core | Note: C-state values are processor specific C-state code names, unrelated to MWAIT extension C-state parameters or ACPI C-States.
 |  | 63:0 |  | CORE C3 Residency Counter. (R/O) Value since last reset that this core is in processor-specific C3 states. Count at the same frequency as the TSC.
3FDH | 1021 | MSR_CORE_C6_RESIDENCY | Core | Note: C-state values are processor specific C-state code names, unrelated to MWAIT extension C-state parameters or ACPI C-States.
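The MSR_PEBS_ENABLE (3F1H) layout in this table places the PEBS enables for IA32_PMC0 through IA32_PMC3 in bits 0 to 3 and the matching load-latency enables in bits 32 to 35. As a hedged illustration, a value for that register could be composed as shown below; the function name and parameter names are hypothetical.

```python
PEBS_COUNTERS = range(4)  # IA32_PMC0..IA32_PMC3 have PEBS enable bits


def pebs_enable_value(pebs_counters=(), load_latency_counters=()):
    """Build a 64-bit value for MSR_PEBS_ENABLE.

    pebs_counters: counter numbers to set in bits 0-3.
    load_latency_counters: counter numbers to set in bits 32-35.
    """
    value = 0
    for n in pebs_counters:
        if n not in PEBS_COUNTERS:
            raise ValueError("PEBS enable bits exist for counters 0..3 only")
        value |= 1 << n
    for n in load_latency_counters:
        if n not in PEBS_COUNTERS:
            raise ValueError("load latency bits exist for counters 0..3 only")
        value |= 1 << (32 + n)
    return value


if __name__ == "__main__":
    # PEBS on PMC0 and PMC3, load latency on PMC0.
    print(hex(pebs_enable_value([0, 3], [0])))
```

All other bits stay zero, which matches their Reserved status in the table.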
Register Address (Hex) | (Dec) | Register Name | Scope | Bit Description
 |  | 63:0 |  | CORE C6 Residency Counter. (R/O) Value since last reset that this core is in processor-specific C6 states. Count at the same frequency as the TSC.
3FEH | 1022 | MSR_CORE_C7_RESIDENCY | Core | Note: C-state values are processor specific C-state code names, unrelated to MWAIT extension C-state parameters or ACPI C-States.
 |  | 63:0 |  | CORE C7 Residency Counter. (R/O) Value since last reset that this core is in processor-specific C7 states. Count at the same frequency as the TSC.
400H | 1024 | IA32_MC0_CTL | Core | See Section 15.3.2.1, IA32_MCi_CTL MSRs.
401H | 1025 | IA32_MC0_STATUS | Core | See Section 15.3.2.2, IA32_MCi_STATUS MSRS, and Chapter 16.
402H | 1026 | IA32_MC0_ADDR | Core | See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
403H | 1027 | IA32_MC0_MISC | Core | See Section 15.3.2.4, IA32_MCi_MISC MSRs.
404H | 1028 | IA32_MC1_CTL | Core | See Section 15.3.2.1, IA32_MCi_CTL MSRs.
405H | 1029 | IA32_MC1_STATUS | Core | See Section 15.3.2.2, IA32_MCi_STATUS MSRS, and Chapter 16.
406H | 1030 | IA32_MC1_ADDR | Core | See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
407H | 1031 | IA32_MC1_MISC | Core | See Section 15.3.2.4, IA32_MCi_MISC MSRs.
408H | 1032 | IA32_MC2_CTL | Core | See Section 15.3.2.1, IA32_MCi_CTL MSRs.
409H | 1033 | IA32_MC2_STATUS | Core | See Section 15.3.2.2, IA32_MCi_STATUS MSRS, and Chapter 16.
40AH | 1034 | IA32_MC2_ADDR | Core | See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
40BH | 1035 | IA32_MC2_MISC | Core | See Section 15.3.2.4, IA32_MCi_MISC MSRs.
40CH | 1036 | IA32_MC3_CTL | Core | See Section 15.3.2.1, IA32_MCi_CTL MSRs.
40DH | 1037 | IA32_MC3_STATUS | Core | See Section 15.3.2.2, IA32_MCi_STATUS MSRS, and Chapter 16.
40EH | 1038 | IA32_MC3_ADDR | Core | See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
40FH | 1039 | IA32_MC3_MISC | Core | See Section 15.3.2.4, IA32_MCi_MISC MSRs.
410H | 1040 | MSR_MC4_CTL | Core | See Section 15.3.2.1, IA32_MCi_CTL MSRs.
 |  | 0 |  | PCU Hardware Error (R/W). When set, enables signaling of PCU hardware detected errors.
 |  | 1 |  | PCU Controller Error (R/W). When set, enables signaling of PCU controller detected errors.
 |  | 2 |  | PCU Firmware Error (R/W). When set, enables signaling of PCU firmware detected errors.
 |  | 63:3 |  | Reserved.
411H | 1041 | IA32_MC4_STATUS | Core | See Section 15.3.2.2, IA32_MCi_STATUS MSRS, and Chapter 16.
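The machine-check bank rows above are laid out in groups of four consecutive addresses starting at 400H: CTL, STATUS, ADDR, then MISC. A minimal sketch of that address arithmetic follows. The function name is hypothetical, and whether a given bank implements all four registers on a particular processor is not something this sketch checks.

```python
MC_BANK_BASE = 0x400  # IA32_MC0_CTL
MC_BANK_STRIDE = 4    # CTL, STATUS, ADDR, MISC per bank


def mc_bank_msrs(bank):
    """Return the MSR addresses for machine-check bank `bank`.

    Follows the pattern visible in the table: bank i starts at
    400H + 4 * i.
    """
    if bank < 0:
        raise ValueError("bank number must be non-negative")
    base = MC_BANK_BASE + MC_BANK_STRIDE * bank
    return {"CTL": base, "STATUS": base + 1, "ADDR": base + 2, "MISC": base + 3}


if __name__ == "__main__":
    for name, address in mc_bank_msrs(2).items():
        print(f"MC2_{name}: {address:X}H")
```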
Register Address (Hex) | (Dec) | Register Name | Scope | Bit Description
480H | 1152 | IA32_VMX_BASIC | Thread | Reporting Register of Basic VMX Capabilities (R/O). See Table 35-2. See Appendix A.1, Basic VMX Information.
481H | 1153 | IA32_VMX_PINBASED_CTLS | Thread | Capability Reporting Register of Pin-based VM-execution Controls (R/O). See Table 35-2. See Appendix A.3, VM-Execution Controls.
482H | 1154 | IA32_VMX_PROCBASED_CTLS | Thread | Capability Reporting Register of Primary Processor-based VM-execution Controls (R/O). See Appendix A.3, VM-Execution Controls.
483H | 1155 | IA32_VMX_EXIT_CTLS | Thread | Capability Reporting Register of VM-exit Controls (R/O). See Table 35-2. See Appendix A.4, VM-Exit Controls.
484H | 1156 | IA32_VMX_ENTRY_CTLS | Thread | Capability Reporting Register of VM-entry Controls (R/O). See Table 35-2. See Appendix A.5, VM-Entry Controls.
485H | 1157 | IA32_VMX_MISC | Thread | Reporting Register of Miscellaneous VMX Capabilities (R/O). See Table 35-2. See Appendix A.6, Miscellaneous Data.
486H | 1158 | IA32_VMX_CR0_FIXED0 | Thread | Capability Reporting Register of CR0 Bits Fixed to 0 (R/O). See Table 35-2. See Appendix A.7, VMX-Fixed Bits in CR0.
487H | 1159 | IA32_VMX_CR0_FIXED1 | Thread | Capability Reporting Register of CR0 Bits Fixed to 1 (R/O). See Table 35-2. See Appendix A.7, VMX-Fixed Bits in CR0.
488H | 1160 | IA32_VMX_CR4_FIXED0 | Thread | Capability Reporting Register of CR4 Bits Fixed to 0 (R/O). See Table 35-2. See Appendix A.8, VMX-Fixed Bits in CR4.
489H | 1161 | IA32_VMX_CR4_FIXED1 | Thread | Capability Reporting Register of CR4 Bits Fixed to 1 (R/O). See Table 35-2. See Appendix A.8, VMX-Fixed Bits in CR4.
48AH | 1162 | IA32_VMX_VMCS_ENUM | Thread | Capability Reporting Register of VMCS Field Enumeration (R/O). See Table 35-2. See Appendix A.9, VMCS Enumeration.
48BH | 1163 | IA32_VMX_PROCBASED_CTLS2 | Thread | Capability Reporting Register of Secondary Processor-based VM-execution Controls (R/O). See Appendix A.3, VM-Execution Controls.
Register Address (Hex) | (Dec) | Register Name | Scope | Bit Description
4C1H | 1217 | IA32_A_PMC0 | Thread | See Table 35-2.
4C2H | 1218 | IA32_A_PMC1 | Thread | See Table 35-2.
4C3H | 1219 | IA32_A_PMC2 | Thread | See Table 35-2.
4C4H | 1220 | IA32_A_PMC3 | Thread | See Table 35-2.
4C5H | 1221 | IA32_A_PMC4 | Core | See Table 35-2.
4C6H | 1222 | IA32_A_PMC5 | Core | See Table 35-2.
4C7H | 1223 | IA32_A_PMC6 | Core | See Table 35-2.
4C8H | 1224 | IA32_A_PMC7 | Core | See Table 35-2.
600H | 1536 | IA32_DS_AREA | Thread | DS Save Area (R/W). See Table 35-2. See Section 18.11.4, Debug Store (DS) Mechanism.
606H | 1542 | MSR_RAPL_POWER_UNIT | Package | Unit Multipliers used in RAPL Interfaces (R/O). See Section 14.7.1, RAPL Interfaces.
60AH | 1546 | MSR_PKGC3_IRTL | Package | Package C3 Interrupt Response Limit (R/W). Note: C-state values are processor specific C-state code names, unrelated to MWAIT extension C-state parameters or ACPI C-States.
 |  | 9:0 |  | Interrupt response time limit (R/W). Specifies the limit that should be used to decide if the package should be put into a package C3 state.
 |  | 12:10 |  | Time Unit (R/W). Specifies the encoding value of time unit of the interrupt response time limit. The following time unit encodings are supported: 000b: 1 ns; 001b: 32 ns; 010b: 1024 ns; 011b: 32768 ns; 100b: 1048576 ns; 101b: 33554432 ns.
 |  | 14:13 |  | Reserved.
 |  | 15 |  | Valid (R/W). Indicates whether the values in bits 12:0 are valid and can be used by the processor for package C-state management.
 |  | 63:16 |  | Reserved.
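The MSR_PKGCn_IRTL registers share one layout: a 10-bit limit, a 3-bit time-unit encoding, and a valid bit. The sketch below decodes that layout and converts the limit to nanoseconds. It assumes the effective limit is the bits 9:0 value multiplied by the selected time unit, which is the natural reading of the field descriptions but is not spelled out in the table; the function and key names are hypothetical.

```python
# Time unit encodings for bits 12:10, in nanoseconds, as listed in the table.
IRTL_TIME_UNIT_NS = {
    0b000: 1,
    0b001: 32,
    0b010: 1024,
    0b011: 32768,
    0b100: 1048576,
    0b101: 33554432,
}


def decode_irtl(value):
    """Decode a raw MSR_PKGC3_IRTL / MSR_PKGC6_IRTL / MSR_PKGC7_IRTL value."""
    limit = value & 0x3FF              # bits 9:0
    unit_code = (value >> 10) & 0x7    # bits 12:10
    valid = bool((value >> 15) & 0x1)  # bit 15
    unit_ns = IRTL_TIME_UNIT_NS.get(unit_code)  # None if encoding not listed
    limit_ns = None if unit_ns is None else limit * unit_ns
    return {"valid": valid, "limit": limit, "unit_ns": unit_ns, "limit_ns": limit_ns}


if __name__ == "__main__":
    # Made-up sample: valid, unit 010b (1024 ns), limit 100.
    print(decode_irtl((1 << 15) | (0b010 << 10) | 100))
```

Encodings 110b and 111b are not listed in the table, so the sketch reports them as `None` rather than guessing a unit.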
Register Address (Hex) | (Dec) | Register Name | Scope | Bit Description
60BH | 1547 | MSR_PKGC6_IRTL | Package | Package C6 Interrupt Response Limit (R/W). This MSR defines the budget allocated for the package to exit from C6 to a C0 state, where an interrupt request can be delivered to the core and serviced. Additional core-exit latency may be applicable depending on the actual C-state the core is in. Note: C-state values are processor specific C-state code names, unrelated to MWAIT extension C-state parameters or ACPI C-States.
 |  | 9:0 |  | Interrupt response time limit (R/W). Specifies the limit that should be used to decide if the package should be put into a package C6 state.
 |  | 12:10 |  | Time Unit (R/W). Specifies the encoding value of time unit of the interrupt response time limit. The following time unit encodings are supported: 000b: 1 ns; 001b: 32 ns; 010b: 1024 ns; 011b: 32768 ns; 100b: 1048576 ns; 101b: 33554432 ns.
 |  | 14:13 |  | Reserved.
 |  | 15 |  | Valid (R/W). Indicates whether the values in bits 12:0 are valid and can be used by the processor for package C-state management.
 |  | 63:16 |  | Reserved.
60CH | 1548 | MSR_PKGC7_IRTL | Package | Package C7 Interrupt Response Limit (R/W). This MSR defines the budget allocated for the package to exit from C7 to a C0 state, where an interrupt request can be delivered to the core and serviced. Additional core-exit latency may be applicable depending on the actual C-state the core is in. Note: C-state values are processor specific C-state code names, unrelated to MWAIT extension C-state parameters or ACPI C-States.
 |  | 9:0 |  | Interrupt response time limit (R/W). Specifies the limit that should be used to decide if the package should be put into a package C7 state.
Register Address (Hex) | (Dec) | Register Name | Scope | Bit Description
 |  | 12:10 |  | Time Unit (R/W). Specifies the encoding value of time unit of the interrupt response time limit. The following time unit encodings are supported: 000b: 1 ns; 001b: 32 ns; 010b: 1024 ns; 011b: 32768 ns; 100b: 1048576 ns; 101b: 33554432 ns.
 |  | 14:13 |  | Reserved.
 |  | 15 |  | Valid (R/W). Indicates whether the values in bits 12:0 are valid and can be used by the processor for package C-state management.
 |  | 63:16 |  | Reserved.
60DH | 1549 | MSR_PKG_C2_RESIDENCY | Package | Note: C-state values are processor specific C-state code names, unrelated to MWAIT extension C-state parameters or ACPI C-States.
 |  | 63:0 |  | Package C2 Residency Counter. (R/O) Value since last reset that this package is in processor-specific C2 states. Count at the same frequency as the TSC.
610H | 1552 | MSR_PKG_POWER_LIMIT | Package | PKG RAPL Power Limit Control (R/W). See Section 14.7.3, Package RAPL Domain.
611H | 1553 | MSR_PKG_ENERGY_STATUS | Package | PKG Energy Status (R/O). See Section 14.7.3, Package RAPL Domain.
614H | 1556 | MSR_PKG_POWER_INFO | Package | PKG RAPL Parameters (R/W). See Section 14.7.3, Package RAPL Domain.
638H | 1592 | MSR_PP0_POWER_LIMIT | Package | PP0 RAPL Power Limit Control (R/W). See Section 14.7.4, PP0/PP1 RAPL Domains.
639H | 1593 | MSR_PP0_ENERGY_STATUS | Package | PP0 Energy Status (R/O). See Section 14.7.4, PP0/PP1 RAPL Domains.
63AH | 1594 | MSR_PP0_POLICY | Package | PP0 Balance Policy (R/W). See Section 14.7.4, PP0/PP1 RAPL Domains.
63BH | 1595 | MSR_PP0_PERF_STATUS | Package | PP0 Performance Throttling Status (R/O). See Section 14.7.4, PP0/PP1 RAPL Domains.
Table 35-11. MSRs Supported by Intel Processors Based on Intel Microarchitecture Code Name Sandy Bridge (Contd.)
Hex  Dec  Register Name  Scope  Bit Description
680H  1664  MSR_LASTBRANCH_0_FROM_IP  Thread  Last Branch Record 0 From IP (R/W). One of sixteen pairs of last branch record registers on the last branch record stack. This part of the stack contains pointers to the source instruction for one of the last sixteen branches, exceptions, or interrupts taken by the processor. See also: Last Branch Record Stack TOS at 1C9H; Section 17.6.1, LBR Stack.
681H  1665  MSR_LASTBRANCH_1_FROM_IP  Thread  Last Branch Record 1 From IP (R/W). See description of MSR_LASTBRANCH_0_FROM_IP.
682H  1666  MSR_LASTBRANCH_2_FROM_IP  Thread  Last Branch Record 2 From IP (R/W). See description of MSR_LASTBRANCH_0_FROM_IP.
683H  1667  MSR_LASTBRANCH_3_FROM_IP  Thread  Last Branch Record 3 From IP (R/W). See description of MSR_LASTBRANCH_0_FROM_IP.
684H  1668  MSR_LASTBRANCH_4_FROM_IP  Thread  Last Branch Record 4 From IP (R/W). See description of MSR_LASTBRANCH_0_FROM_IP.
685H  1669  MSR_LASTBRANCH_5_FROM_IP  Thread  Last Branch Record 5 From IP (R/W). See description of MSR_LASTBRANCH_0_FROM_IP.
686H  1670  MSR_LASTBRANCH_6_FROM_IP  Thread  Last Branch Record 6 From IP (R/W). See description of MSR_LASTBRANCH_0_FROM_IP.
687H  1671  MSR_LASTBRANCH_7_FROM_IP  Thread  Last Branch Record 7 From IP (R/W). See description of MSR_LASTBRANCH_0_FROM_IP.
688H  1672  MSR_LASTBRANCH_8_FROM_IP  Thread  Last Branch Record 8 From IP (R/W). See description of MSR_LASTBRANCH_0_FROM_IP.
689H  1673  MSR_LASTBRANCH_9_FROM_IP  Thread  Last Branch Record 9 From IP (R/W). See description of MSR_LASTBRANCH_0_FROM_IP.
68AH  1674  MSR_LASTBRANCH_10_FROM_IP  Thread  Last Branch Record 10 From IP (R/W). See description of MSR_LASTBRANCH_0_FROM_IP.
68BH  1675  MSR_LASTBRANCH_11_FROM_IP  Thread  Last Branch Record 11 From IP (R/W). See description of MSR_LASTBRANCH_0_FROM_IP.
68CH  1676  MSR_LASTBRANCH_12_FROM_IP  Thread  Last Branch Record 12 From IP (R/W). See description of MSR_LASTBRANCH_0_FROM_IP.
68DH  1677  MSR_LASTBRANCH_13_FROM_IP  Thread  Last Branch Record 13 From IP (R/W). See description of MSR_LASTBRANCH_0_FROM_IP.
Table 35-11. MSRs Supported by Intel Processors Based on Intel Microarchitecture Code Name Sandy Bridge (Contd.)
Hex  Dec  Register Name  Scope  Bit Description
68EH  1678  MSR_LASTBRANCH_14_FROM_IP  Thread  Last Branch Record 14 From IP (R/W). See description of MSR_LASTBRANCH_0_FROM_IP.
68FH  1679  MSR_LASTBRANCH_15_FROM_IP  Thread  Last Branch Record 15 From IP (R/W). See description of MSR_LASTBRANCH_0_FROM_IP.
6C0H  1728  MSR_LASTBRANCH_0_TO_IP  Thread  Last Branch Record 0 To IP (R/W). One of sixteen pairs of last branch record registers on the last branch record stack. This part of the stack contains pointers to the destination instruction for one of the last sixteen branches, exceptions, or interrupts taken by the processor.
6C1H  1729  MSR_LASTBRANCH_1_TO_IP  Thread  Last Branch Record 1 To IP (R/W). See description of MSR_LASTBRANCH_0_TO_IP.
6C2H  1730  MSR_LASTBRANCH_2_TO_IP  Thread  Last Branch Record 2 To IP (R/W). See description of MSR_LASTBRANCH_0_TO_IP.
6C3H  1731  MSR_LASTBRANCH_3_TO_IP  Thread  Last Branch Record 3 To IP (R/W). See description of MSR_LASTBRANCH_0_TO_IP.
6C4H  1732  MSR_LASTBRANCH_4_TO_IP  Thread  Last Branch Record 4 To IP (R/W). See description of MSR_LASTBRANCH_0_TO_IP.
6C5H  1733  MSR_LASTBRANCH_5_TO_IP  Thread  Last Branch Record 5 To IP (R/W). See description of MSR_LASTBRANCH_0_TO_IP.
6C6H  1734  MSR_LASTBRANCH_6_TO_IP  Thread  Last Branch Record 6 To IP (R/W). See description of MSR_LASTBRANCH_0_TO_IP.
6C7H  1735  MSR_LASTBRANCH_7_TO_IP  Thread  Last Branch Record 7 To IP (R/W). See description of MSR_LASTBRANCH_0_TO_IP.
6C8H  1736  MSR_LASTBRANCH_8_TO_IP  Thread  Last Branch Record 8 To IP (R/W). See description of MSR_LASTBRANCH_0_TO_IP.
6C9H  1737  MSR_LASTBRANCH_9_TO_IP  Thread  Last Branch Record 9 To IP (R/W). See description of MSR_LASTBRANCH_0_TO_IP.
6CAH  1738  MSR_LASTBRANCH_10_TO_IP  Thread  Last Branch Record 10 To IP (R/W). See description of MSR_LASTBRANCH_0_TO_IP.
6CBH  1739  MSR_LASTBRANCH_11_TO_IP  Thread  Last Branch Record 11 To IP (R/W). See description of MSR_LASTBRANCH_0_TO_IP.
6CCH  1740  MSR_LASTBRANCH_12_TO_IP  Thread  Last Branch Record 12 To IP (R/W). See description of MSR_LASTBRANCH_0_TO_IP.
6CDH  1741  MSR_LASTBRANCH_13_TO_IP  Thread  Last Branch Record 13 To IP (R/W). See description of MSR_LASTBRANCH_0_TO_IP.
Table 35-11. MSRs Supported by Intel Processors Based on Intel Microarchitecture Code Name Sandy Bridge (Contd.)
Hex  Dec  Register Name  Scope  Bit Description
6CEH  1742  MSR_LASTBRANCH_14_TO_IP  Thread  Last Branch Record 14 To IP (R/W). See description of MSR_LASTBRANCH_0_TO_IP.
6CFH  1743  MSR_LASTBRANCH_15_TO_IP  Thread  Last Branch Record 15 To IP (R/W). See description of MSR_LASTBRANCH_0_TO_IP.
6E0H  1760  IA32_TSC_DEADLINE  Thread  See Table 35-2.
C000_0080H  IA32_EFER  Thread  Extended Feature Enables. See Table 35-2.
C000_0081H  IA32_STAR  Thread  System Call Target Address (R/W). See Table 35-2.
C000_0082H  IA32_LSTAR  Thread  IA-32e Mode System Call Target Address (R/W). See Table 35-2.
C000_0084H  IA32_FMASK  Thread  System Call Flag Mask (R/W). See Table 35-2.
C000_0100H  IA32_FS_BASE  Thread  Map of BASE Address of FS (R/W). See Table 35-2.
C000_0101H  IA32_GS_BASE  Thread  Map of BASE Address of GS (R/W). See Table 35-2.
C000_0102H  IA32_KERNEL_GSBASE  Thread  Swap Target of BASE Address of GS (R/W). See Table 35-2.
C000_0103H  IA32_TSC_AUX  Thread  AUXILIARY TSC Signature (R/W). See Table 35-2 and Section 17.13.2, IA32_TSC_AUX Register and RDTSCP Support.
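The sixteen last branch record pairs in Table 35-11 occupy two contiguous address ranges: the From IP registers start at 680H and the To IP registers start at 6C0H. The following is a minimal sketch of how software might compute the MSR addresses for a given LBR stack entry. The function and constant names are hypothetical and are not defined by this document; only the base addresses and the stack depth of sixteen come from the table.

```python
# Base addresses taken from Table 35-11; the names are illustrative only.
LBR_FROM_IP_BASE = 0x680  # MSR_LASTBRANCH_0_FROM_IP
LBR_TO_IP_BASE = 0x6C0    # MSR_LASTBRANCH_0_TO_IP
LBR_DEPTH = 16            # sixteen pairs on the last branch record stack


def lbr_msr_pair(n):
    """Return (from_ip_addr, to_ip_addr) for LBR stack entry n (0..15)."""
    if not 0 <= n < LBR_DEPTH:
        raise ValueError("LBR index must be in the range 0..15")
    return LBR_FROM_IP_BASE + n, LBR_TO_IP_BASE + n
```

For example, `lbr_msr_pair(15)` yields the addresses listed for MSR_LASTBRANCH_15_FROM_IP (68FH) and MSR_LASTBRANCH_15_TO_IP (6CFH).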
...
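The MSR_PKGC6_IRTL and MSR_PKGC7_IRTL entries of Table 35-11 share one bit layout: a limit in bits 9:0, a time-unit encoding in bits 12:10, and a Valid flag in bit 15. The sketch below decodes a raw 64-bit value into nanoseconds. It assumes that the effective limit is the product of the limit field and the selected time unit, which is the natural reading of the table but is not stated there explicitly. The helper name is hypothetical.

```python
# Time-unit encodings for bits 12:10, in nanoseconds, as listed in the table.
IRTL_TIME_UNIT_NS = {
    0b000: 1,
    0b001: 32,
    0b010: 1024,
    0b011: 32768,
    0b100: 1048576,
    0b101: 33554432,
}


def decode_pkgc_irtl(raw):
    """Decode a raw MSR_PKGCn_IRTL value (hypothetical helper).

    Returns None when the Valid bit (bit 15) is clear; otherwise returns
    the interrupt response time limit in nanoseconds, assuming the limit
    is (bits 9:0) multiplied by the time unit selected by bits 12:10.
    """
    if not (raw >> 15) & 0x1:
        return None
    limit = raw & 0x3FF        # bits 9:0
    unit = (raw >> 10) & 0x7   # bits 12:10
    if unit not in IRTL_TIME_UNIT_NS:
        raise ValueError("time unit encoding not listed in the table")
    return limit * IRTL_TIME_UNIT_NS[unit]
```

A value with Valid set, time unit 010b, and a limit of 100 decodes to 100 x 1024 ns under this assumption.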
Table 35-13. Selected MSRs Supported by Intel Xeon Processors E5 Family (Based on Intel Microarchitecture Code Name Sandy Bridge)
Hex  Dec  Register Name  Scope  Bit Description
17FH  383  MSR_ERROR_CONTROL  Package  MC Bank Error Configuration (R/W)
    0  Reserved.
    1  MemError Log Enable (R/W). When set, enables IMC status bank to log additional info in bits 36:32.
    63:2  Reserved.
Table 35-13. Selected MSRs Supported by Intel Xeon Processors E5 Family (Based on Intel Microarchitecture Code Name Sandy Bridge) (Contd.)
Hex  Dec  Register Name  Scope  Bit Description
285H  645  IA32_MC5_CTL2  Package  See Table 35-2.
286H  646  IA32_MC6_CTL2  Package  See Table 35-2.
287H  647  IA32_MC7_CTL2  Package  See Table 35-2.
288H  648  IA32_MC8_CTL2  Package  See Table 35-2.
289H  649  IA32_MC9_CTL2  Package  See Table 35-2.
28AH  650  IA32_MC10_CTL2  Package  See Table 35-2.
28BH  651  IA32_MC11_CTL2  Package  See Table 35-2.
28CH  652  IA32_MC12_CTL2  Package  See Table 35-2.
28DH  653  IA32_MC13_CTL2  Package  See Table 35-2.
28EH  654  IA32_MC14_CTL2  Package  See Table 35-2.
28FH  655  IA32_MC15_CTL2  Package  See Table 35-2.
290H  656  IA32_MC16_CTL2  Package  See Table 35-2.
291H  657  IA32_MC17_CTL2  Package  See Table 35-2.
292H  658  IA32_MC18_CTL2  Package  See Table 35-2.
293H  659  IA32_MC19_CTL2  Package  See Table 35-2.
39CH  924  MSR_PEBS_NUM_ALT  Package
    0  ENABLE_PEBS_NUM_ALT (RW). Write 1 to enable alternate PEBS counting logic for specific events requiring additional configuration; see Table 19-9.
    63:1  Reserved (must be zero).
414H  1044  MSR_MC5_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
415H  1045  MSR_MC5_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
416H  1046  MSR_MC5_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
417H  1047  MSR_MC5_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
418H  1048  MSR_MC6_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
419H  1049  MSR_MC6_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
41AH  1050  MSR_MC6_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
41BH  1051  MSR_MC6_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
41CH  1052  MSR_MC7_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
41DH  1053  MSR_MC7_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
41EH  1054  MSR_MC7_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
41FH  1055  MSR_MC7_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
420H  1056  MSR_MC8_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
421H  1057  MSR_MC8_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
Table 35-13. Selected MSRs Supported by Intel Xeon Processors E5 Family (Based on Intel Microarchitecture Code Name Sandy Bridge) (Contd.)
Hex  Dec  Register Name  Scope  Bit Description
422H  1058  MSR_MC8_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
423H  1059  MSR_MC8_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
424H  1060  MSR_MC9_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
425H  1061  MSR_MC9_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
426H  1062  MSR_MC9_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
427H  1063  MSR_MC9_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
428H  1064  MSR_MC10_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
429H  1065  MSR_MC10_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
42AH  1066  MSR_MC10_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
42BH  1067  MSR_MC10_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
42CH  1068  MSR_MC11_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
42DH  1069  MSR_MC11_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
42EH  1070  MSR_MC11_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
42FH  1071  MSR_MC11_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
430H  1072  MSR_MC12_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
431H  1073  MSR_MC12_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
432H  1074  MSR_MC12_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
433H  1075  MSR_MC12_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
434H  1076  MSR_MC13_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
435H  1077  MSR_MC13_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
436H  1078  MSR_MC13_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
437H  1079  MSR_MC13_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
438H  1080  MSR_MC14_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
439H  1081  MSR_MC14_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
43AH  1082  MSR_MC14_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
43BH  1083  MSR_MC14_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
43CH  1084  MSR_MC15_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
43DH  1085  MSR_MC15_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
43EH  1086  MSR_MC15_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
43FH  1087  MSR_MC15_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
440H  1088  MSR_MC16_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
441H  1089  MSR_MC16_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
442H  1090  MSR_MC16_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
Table 35-13. Selected MSRs Supported by Intel Xeon Processors E5 Family (Based on Intel Microarchitecture Code Name Sandy Bridge) (Contd.)
Hex  Dec  Register Name  Scope  Bit Description
443H  1091  MSR_MC16_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
444H  1092  MSR_MC17_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
445H  1093  MSR_MC17_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
446H  1094  MSR_MC17_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
447H  1095  MSR_MC17_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
448H  1096  MSR_MC18_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
449H  1097  MSR_MC18_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
44AH  1098  MSR_MC18_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
44BH  1099  MSR_MC18_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
44CH  1100  MSR_MC19_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
44DH  1101  MSR_MC19_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
44EH  1102  MSR_MC19_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
44FH  1103  MSR_MC19_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
613H  1555  MSR_PKG_PERF_STATUS  Package  Package RAPL Perf Status (R/O)
618H  1560  MSR_DRAM_POWER_LIMIT  Package  DRAM RAPL Power Limit Control (R/W). See Section 14.7.5, DRAM RAPL Domain.
619H  1561  MSR_DRAM_ENERGY_STATUS  Package  DRAM Energy Status (R/O). See Section 14.7.5, DRAM RAPL Domain.
61BH  1563  MSR_DRAM_PERF_STATUS  Package  DRAM Performance Throttling Status (R/O). See Section 14.7.5, DRAM RAPL Domain.
61CH  1564  MSR_DRAM_POWER_INFO  Package  DRAM RAPL Parameters (R/W). See Section 14.7.5, DRAM RAPL Domain.
...
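The machine-check bank registers listed in Table 35-13 follow a regular address pattern: each bank i has four consecutive MSRs (CTL, STATUS, ADDR, MISC) starting at 400H + 4*i, and its CTL2 register sits at 280H + i. The sketch below reproduces the listed addresses from the bank number. The pattern is inferred from the addresses shown in the table, and the function name and dictionary keys are hypothetical.

```python
def mc_bank_msrs(i):
    """Return the MSR addresses for machine-check bank i (hypothetical helper).

    The arithmetic mirrors the addresses listed in the table, for example
    bank 5: CTL2 = 285H, CTL = 414H, STATUS = 415H, ADDR = 416H, MISC = 417H.
    """
    if i < 0:
        raise ValueError("bank number must be non-negative")
    base = 0x400 + 4 * i
    return {
        "CTL2": 0x280 + i,
        "CTL": base,
        "STATUS": base + 1,
        "ADDR": base + 2,
        "MISC": base + 3,
    }
```

Whether a given bank number is actually implemented depends on the processor; the table for this family lists banks 5 through 19.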
Table 35-15. Selected MSRs Supported by Intel Xeon Processors E5 Family v2 (Based on Intel Microarchitecture Code Name Ivy Bridge)
Hex  Dec  Register Name  Scope  Bit Description
17FH  383  MSR_ERROR_CONTROL  Package  MC Bank Error Configuration (R/W)
    0  Reserved.
    1  MemError Log Enable (R/W). When set, enables IMC status bank to log additional info in bits 36:32.
    63:2  Reserved.
Table 35-15. Selected MSRs Supported by Intel Xeon Processors E5 Family v2 (Based on Intel Microarchitecture Code Name Ivy Bridge) (Contd.)
Hex  Dec  Register Name  Scope  Bit Description
285H  645  IA32_MC5_CTL2  Package  See Table 35-2.
286H  646  IA32_MC6_CTL2  Package  See Table 35-2.
287H  647  IA32_MC7_CTL2  Package  See Table 35-2.
288H  648  IA32_MC8_CTL2  Package  See Table 35-2.
289H  649  IA32_MC9_CTL2  Package  See Table 35-2.
28AH  650  IA32_MC10_CTL2  Package  See Table 35-2.
28BH  651  IA32_MC11_CTL2  Package  See Table 35-2.
28CH  652  IA32_MC12_CTL2  Package  See Table 35-2.
28DH  653  IA32_MC13_CTL2  Package  See Table 35-2.
28EH  654  IA32_MC14_CTL2  Package  See Table 35-2.
28FH  655  IA32_MC15_CTL2  Package  See Table 35-2.
290H  656  IA32_MC16_CTL2  Package  See Table 35-2.
291H  657  IA32_MC17_CTL2  Package  See Table 35-2.
292H  658  IA32_MC18_CTL2  Package  See Table 35-2.
293H  659  IA32_MC19_CTL2  Package  See Table 35-2.
414H  1044  MSR_MC5_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
415H  1045  MSR_MC5_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
416H  1046  MSR_MC5_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
417H  1047  MSR_MC5_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
418H  1048  MSR_MC6_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
419H  1049  MSR_MC6_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
41AH  1050  MSR_MC6_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
41BH  1051  MSR_MC6_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
41CH  1052  MSR_MC7_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
41DH  1053  MSR_MC7_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
41EH  1054  MSR_MC7_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
41FH  1055  MSR_MC7_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
420H  1056  MSR_MC8_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
421H  1057  MSR_MC8_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
422H  1058  MSR_MC8_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
423H  1059  MSR_MC8_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
424H  1060  MSR_MC9_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
425H  1061  MSR_MC9_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
Table 35-15. Selected MSRs Supported by Intel Xeon Processors E5 Family v2 (Based on Intel Microarchitecture Code Name Ivy Bridge) (Contd.)
Hex  Dec  Register Name  Scope  Bit Description
426H  1062  MSR_MC9_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
427H  1063  MSR_MC9_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
428H  1064  MSR_MC10_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
429H  1065  MSR_MC10_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
42AH  1066  MSR_MC10_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
42BH  1067  MSR_MC10_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
42CH  1068  MSR_MC11_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
42DH  1069  MSR_MC11_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
42EH  1070  MSR_MC11_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
42FH  1071  MSR_MC11_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
430H  1072  MSR_MC12_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
431H  1073  MSR_MC12_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
432H  1074  MSR_MC12_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
433H  1075  MSR_MC12_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
434H  1076  MSR_MC13_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
435H  1077  MSR_MC13_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
436H  1078  MSR_MC13_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
437H  1079  MSR_MC13_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
438H  1080  MSR_MC14_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
439H  1081  MSR_MC14_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
43AH  1082  MSR_MC14_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
43BH  1083  MSR_MC14_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
43CH  1084  MSR_MC15_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
43DH  1085  MSR_MC15_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
43EH  1086  MSR_MC15_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
43FH  1087  MSR_MC15_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
440H  1088  MSR_MC16_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
441H  1089  MSR_MC16_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
442H  1090  MSR_MC16_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
443H  1091  MSR_MC16_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
444H  1092  MSR_MC17_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
445H  1093  MSR_MC17_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
446H  1094  MSR_MC17_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
Table 35-15. Selected MSRs Supported by Intel Xeon Processors E5 Family v2 (Based on Intel Microarchitecture Code Name Ivy Bridge) (Contd.)
Hex  Dec  Register Name  Scope  Bit Description
447H  1095  MSR_MC17_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
448H  1096  MSR_MC18_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
449H  1097  MSR_MC18_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
44AH  1098  MSR_MC18_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
44BH  1099  MSR_MC18_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
44CH  1100  MSR_MC19_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
44DH  1101  MSR_MC19_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
44EH  1102  MSR_MC19_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
44FH  1103  MSR_MC19_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
450H  1104  MSR_MC20_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
451H  1105  MSR_MC20_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
452H  1106  MSR_MC20_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
453H  1107  MSR_MC20_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
454H  1108  MSR_MC21_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
455H  1109  MSR_MC21_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
456H  1110  MSR_MC21_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
457H  1111  MSR_MC21_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
458H  1112  MSR_MC22_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
459H  1113  MSR_MC22_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
45AH  1114  MSR_MC22_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
45BH  1115  MSR_MC22_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
45CH  1116  MSR_MC23_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
45DH  1117  MSR_MC23_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
45EH  1118  MSR_MC23_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
45FH  1119  MSR_MC23_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
460H  1120  MSR_MC24_CTL  Package  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
461H  1121  MSR_MC24_STATUS  Package  See Section 15.3.2.2, IA32_MCi_STATUS MSRs, and Chapter 16.
462H  1122  MSR_MC24_ADDR  Package  See Section 15.3.2.3, IA32_MCi_ADDR MSRs.
463H  1123  MSR_MC24_MISC  Package  See Section 15.3.2.4, IA32_MCi_MISC MSRs.
613H  1555  MSR_PKG_PERF_STATUS  Package  Package RAPL Perf Status (R/O)
618H  1560  MSR_DRAM_POWER_LIMIT  Package  DRAM RAPL Power Limit Control (R/W). See Section 14.7.5, DRAM RAPL Domain.
Table 35-15. Selected MSRs Supported by Intel Xeon Processors E5 Family v2 (Based on Intel Microarchitecture Code Name Ivy Bridge) (Contd.)
Hex  Dec  Register Name  Scope  Bit Description
619H  1561  MSR_DRAM_ENERGY_STATUS  Package  DRAM Energy Status (R/O). See Section 14.7.5, DRAM RAPL Domain.
61BH  1563  MSR_DRAM_PERF_STATUS  Package  DRAM Performance Throttling Status (R/O). See Section 14.7.5, DRAM RAPL Domain.
61CH  1564  MSR_DRAM_POWER_INFO  Package  DRAM RAPL Parameters (R/W). See Section 14.7.5, DRAM RAPL Domain.
...
35.10
The following MSRs are available in a future generation of the Intel Xeon Processor Family (CPUID DisplayFamily_DisplayModel = 06_3F) if CPUID.(EAX=07H, ECX=0):EBX.QoS[bit 12] = 1.
Table 35-17. Additional MSRs Supported by Future Generation Intel Xeon Processors
Hex  Dec  Register Name  Scope  Bit Description
C8DH  3113  IA32_QM_EVTSEL  THREAD  QoS Monitoring Event Select Register (R/W).
    7:0  EventID (RW)
    31:8  Reserved.
    41:32  RMID (RW)
    63:42  Reserved.
C8EH  3114  IA32_QM_CTR  THREAD  QoS Monitoring Counter Register (R/O).
    61:0  Resource Monitored Data
    62  Unavailable: If 1, indicates data for this RMID is not available or not monitored for this resource or RMID.
    63  Error: If 1, indicates an unsupported RMID or event type was written to IA32_PQR_QM_EVTSEL.
C8FH  3115  IA32_PQR_ASSOC  THREAD  QoS Resource Association Register (R/W).
    9:0  RMID
    63:10  Reserved.
...
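The bit layouts in Table 35-17 suggest a simple select-then-read flow: software places an EventID and an RMID in IA32_QM_EVTSEL, then interprets the Error, Unavailable, and data fields of IA32_QM_CTR. The sketch below only packs and unpacks the fields; it does not access any MSR, and it makes no assumption about which EventID values are defined. The function names are hypothetical.

```python
def encode_qm_evtsel(event_id, rmid):
    """Pack EventID (bits 7:0) and RMID (bits 41:32) into an IA32_QM_EVTSEL value.

    Hypothetical helper; reserved bits 31:8 and 63:42 are left as zero.
    """
    if not 0 <= event_id <= 0xFF:
        raise ValueError("EventID must fit in bits 7:0")
    if not 0 <= rmid <= 0x3FF:
        raise ValueError("RMID must fit in bits 41:32")
    return (rmid << 32) | event_id


def decode_qm_ctr(raw):
    """Split a raw IA32_QM_CTR value into its three fields (hypothetical helper)."""
    return {
        "error": bool((raw >> 63) & 0x1),        # bit 63
        "unavailable": bool((raw >> 62) & 0x1),  # bit 62
        "data": raw & ((1 << 62) - 1),           # bits 61:0
    }
```

A caller would typically check `error` and `unavailable` before trusting `data`, since the table describes both flags as indicating that the counter value is not usable.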
2AH  42  MSR_EBC_HARD_POWERON
Table 35-18. MSRs in the Pentium 4 and Intel Xeon Processors (Contd.)
Hex  Dec  Register Name (Fields and Flags)  Model Availability  Shared/Unique [1]  Bit Description
    3  MCERR# Observation Disabled (R). Indicates whether MCERR# observation is enabled (0) or disabled (1) as determined by the strapping of A9#. The value in this bit is written on the deassertion of RESET#; the bit is set to 1 when the address bus signal is asserted.
    4  BINIT# Observation Enabled (R). Indicates whether BINIT# observation is enabled (0) or disabled (1) as determined by the strapping of A10#. The value in this bit is written on the deassertion of RESET#; the bit is set to 1 when the address bus signal is asserted.
    6:5  APIC Cluster ID (R). Contains the logical APIC cluster ID value as set by the strapping of A12# and A11#. The logical cluster ID value is written into the field on the deassertion of RESET#; the field is set to 1 when the address bus signal is asserted.
    7  Bus Park Disable (R). Indicates whether bus park is enabled (0) or disabled (1) as set by the strapping of A15#. The value in this bit is written on the deassertion of RESET#; the bit is set to 1 when the address bus signal is asserted.
    11:8  Reserved.
    13:12  Agent ID (R). Contains the logical agent ID value as set by the strapping of BR[3:0]. The logical ID value is written into the field on the deassertion of RESET#; the field is set to 1 when the address bus signal is asserted.
    63:14  Reserved.
2BH  43  MSR_EBC_SOFT_POWERON  0, 1, 2, 3, 4, 6  Shared  Processor Soft Power-On Configuration (R/W). Enables and disables processor features.
    0  RCNT/SCNT On Request Encoding Enable (R/W). Controls the driving of RCNT/SCNT on the request encoding. Set to enable (1); clear to disable (0, default).
    1  Data Error Checking Disable (R/W). Set to disable system data bus parity checking; clear to enable parity checking.
Table 35-18. MSRs in the Pentium 4 and Intel Xeon Processors (Contd.)
Hex  Dec  Register Name (Fields and Flags)  Model Availability  Shared/Unique [1]  Bit Description
    2  Response Error Checking Disable (R/W). Set to disable (default); clear to enable.
    3  Address/Request Error Checking Disable (R/W). Set to disable (default); clear to enable.
    4  Initiator MCERR# Disable (R/W). Set to disable MCERR# driving for initiator bus requests (default); clear to enable.
    5  Internal MCERR# Disable (R/W). Set to disable MCERR# driving for initiator internal errors (default); clear to enable.
    6  BINIT# Driver Disable (R/W). Set to disable BINIT# driver (default); clear to enable driver.
    63:7  Reserved.
2CH  44  MSR_EBC_FREQUENCY_ID  2, 3, 4, 6  Shared  Processor Frequency Configuration (R). The bit field layout of this MSR varies according to the MODEL value in the CPUID version information. The following bit field layout applies to Pentium 4 and Xeon Processors with MODEL encoding equal to or greater than 2. The field indicates the current processor frequency configuration.
    15:0  Reserved.
    18:16  Scalable Bus Speed (R/W). Indicates the intended scalable bus speed:
        Encoding  Scalable Bus Speed
        000B  100 MHz (Model 2)
        000B  266 MHz (Model 3 or 4)
        001B  133 MHz
        010B  200 MHz
        011B  166 MHz
        100B  333 MHz (Model 6)
        133.33 MHz should be utilized if performing calculation with System Bus Speed when encoding is 001B.
        166.67 MHz should be utilized if performing calculation with System Bus Speed when encoding is 011B.
Table 35-18. MSRs in the Pentium 4 and Intel Xeon Processors (Contd.)
Hex  Dec  Register Name (Fields and Flags)  Model Availability  Shared/Unique [1]  Bit Description
        266.67 MHz should be utilized if performing calculation with System Bus Speed when encoding is 000B and model encoding = 3 or 4.
        333.33 MHz should be utilized if performing calculation with System Bus Speed when encoding is 100B and model encoding = 6.
        All other values are reserved.
    23:19  Reserved.
    31:24  Core Clock Frequency to System Bus Frequency Ratio (R). The processor core clock frequency to system bus frequency ratio observed at the de-assertion of the reset pin.
    63:25  Reserved.
2CH  44  MSR_EBC_FREQUENCY_ID  0, 1  Shared  Processor Frequency Configuration (R). The bit field layout of this MSR varies according to the MODEL value of the CPUID version information. This bit field layout applies to Pentium 4 and Xeon Processors with MODEL encoding less than 2. Indicates current processor frequency configuration.
    20:0  Reserved.
    23:21  Scalable Bus Speed (R/W). Indicates the intended scalable bus speed:
        Encoding  Scalable Bus Speed
        000B  100 MHz
        All other values reserved.
    63:24  Reserved.
3AH  58  IA32_FEATURE_CONTROL  3, 4, 6  Unique  Control Features in IA-32 Processor (R/W). See Table 35-2 (If CPUID.01H:ECX.[bit 5])
79H  121  IA32_BIOS_UPDT_TRIG  0, 1, 2, 3, 4, 6  Shared  BIOS Update Trigger Register (W). See Table 35-2.
8BH  139  IA32_BIOS_SIGN_ID  0, 1, 2, 3, 4, 6  Unique  BIOS Update Signature ID (R/W). See Table 35-2.
9BH  155  IA32_SMM_MONITOR_CTL  3, 4, 6  Unique  SMM Monitor Configuration (R/W). See Table 35-2.
Table 35-18. MSRs in the Pentium 4 and Intel Xeon Processors (Contd.)
Hex  Dec  Register Name (Fields and Flags)  Model Availability  Shared/Unique [1]  Bit Description
FEH  254  IA32_MTRRCAP  0, 1, 2, 3, 4, 6  Unique  MTRR Information. See Section 11.11.1, MTRR Feature Identification.
174H  372  IA32_SYSENTER_CS  0, 1, 2, 3, 4, 6  Unique  CS register target for CPL 0 code (R/W). See Table 35-2. See Section 5.8.7, Performing Fast Calls to System Procedures with the SYSENTER and SYSEXIT Instructions.
175H  373  IA32_SYSENTER_ESP  0, 1, 2, 3, 4, 6  Unique  Stack pointer for CPL 0 stack (R/W). See Table 35-2. See Section 5.8.7, Performing Fast Calls to System Procedures with the SYSENTER and SYSEXIT Instructions.
176H  374  IA32_SYSENTER_EIP  0, 1, 2, 3, 4, 6  Unique  CPL 0 code entry point (R/W). See Table 35-2. See Section 5.8.7, Performing Fast Calls to System Procedures with the SYSENTER and SYSEXIT Instructions.
179H  377  IA32_MCG_CAP  0, 1, 2, 3, 4, 6  Unique  Machine Check Capabilities (R). See Table 35-2. See Section 15.3.1.1, IA32_MCG_CAP MSR.
17AH  378  IA32_MCG_STATUS  0, 1, 2, 3, 4, 6  Unique  Machine Check Status (R). See Table 35-2. See Section 15.3.1.2, IA32_MCG_STATUS MSR.
17BH  379  IA32_MCG_CTL  Machine Check Feature Enable (R/W). See Table 35-2. See Section 15.3.1.3, IA32_MCG_CTL MSR.
180H  384  MSR_MCG_RAX  0, 1, 2, 3, 4, 6  Unique  Machine Check EAX/RAX Save State. See Section 15.3.2.6, IA32_MCG Extended Machine Check State MSRs.
    63:0  Contains register state at time of machine check error. When in non-64-bit modes at the time of the error, bits 63-32 do not contain valid data.
181H  385  MSR_MCG_RBX  0, 1, 2, 3, 4, 6  Unique  Machine Check EBX/RBX Save State. See Section 15.3.2.6, IA32_MCG Extended Machine Check State MSRs.
    63:0  Contains register state at time of machine check error. When in non-64-bit modes at the time of the error, bits 63-32 do not contain valid data.
182H  386  MSR_MCG_RCX  0, 1, 2, 3, 4, 6  Unique  Machine Check ECX/RCX Save State. See Section 15.3.2.6, IA32_MCG Extended Machine Check State MSRs.
Table 35-18. MSRs in the Pentium 4 and Intel Xeon Processors (Contd.)
Hex  Dec  Register Name (Fields and Flags)  Model Availability  Shared/Unique [1]  Bit Description
    63:0  Contains register state at time of machine check error. When in non-64-bit modes at the time of the error, bits 63-32 do not contain valid data.
183H  387  MSR_MCG_RDX  0, 1, 2, 3, 4, 6  Unique  Machine Check EDX/RDX Save State. See Section 15.3.2.6, IA32_MCG Extended Machine Check State MSRs.
    63:0  Contains register state at time of machine check error. When in non-64-bit modes at the time of the error, bits 63-32 do not contain valid data.
184H  388  MSR_MCG_RSI  0, 1, 2, 3, 4, 6  Unique  Machine Check ESI/RSI Save State. See Section 15.3.2.6, IA32_MCG Extended Machine Check State MSRs.
    63:0  Contains register state at time of machine check error. When in non-64-bit modes at the time of the error, bits 63-32 do not contain valid data.
185H  389  MSR_MCG_RDI  0, 1, 2, 3, 4, 6  Unique  Machine Check EDI/RDI Save State. See Section 15.3.2.6, IA32_MCG Extended Machine Check State MSRs.
    63:0  Contains register state at time of machine check error. When in non-64-bit modes at the time of the error, bits 63-32 do not contain valid data.
186H  390  MSR_MCG_RBP  0, 1, 2, 3, 4, 6  Unique  Machine Check EBP/RBP Save State. See Section 15.3.2.6, IA32_MCG Extended Machine Check State MSRs.
    63:0  Contains register state at time of machine check error. When in non-64-bit modes at the time of the error, bits 63-32 do not contain valid data.
187H  391  MSR_MCG_RSP  0, 1, 2, 3, 4, 6  Unique  Machine Check ESP/RSP Save State. See Section 15.3.2.6, IA32_MCG Extended Machine Check State MSRs.
    63:0  Contains register state at time of machine check error. When in non-64-bit modes at the time of the error, bits 63-32 do not contain valid data.
188H  392  MSR_MCG_RFLAGS  0, 1, 2, 3, 4, 6  Unique  Machine Check EFLAGS/RFLAGS Save State. See Section 15.3.2.6, IA32_MCG Extended Machine Check State MSRs.
    63:0  Contains register state at time of machine check error. When in non-64-bit modes at the time of the error, bits 63-32 do not contain valid data.
Table 35-18. MSRs in the Pentium 4 and Intel Xeon Processors (Contd.)
Hex  Dec  Register Name (Fields and Flags)  Model Availability  Shared/Unique [1]  Bit Description
189H  393  MSR_MCG_RIP  0, 1, 2, 3, 4, 6  Unique  Machine Check EIP/RIP Save State. See Section 15.3.2.6, IA32_MCG Extended Machine Check State MSRs.
    63:0  Contains register state at time of machine check error. When in non-64-bit modes at the time of the error, bits 63-32 do not contain valid data.
18AH  394  MSR_MCG_MISC  0, 1, 2, 3, 4, 6  Unique  Machine Check Miscellaneous. See Section 15.3.2.6, IA32_MCG Extended Machine Check State MSRs.
    0  DS. When set, the bit indicates that a page assist or page fault occurred during DS normal operation. The processor's response is to shut down. The bit is used as an aid for debugging DS handling code. It is the responsibility of the user (BIOS or operating system) to clear this bit for normal operation.
    63:1  Reserved.
18BH - 18FH  395  MSR_MCG_RESERVED1 - MSR_MCG_RESERVED5  Reserved.
190H  400  MSR_MCG_R8  0, 1, 2, 3, 4, 6  Unique  Machine Check R8. See Section 15.3.2.6, IA32_MCG Extended Machine Check State MSRs.
    63-0  Registers R8-15 (and the associated state-save MSRs) exist only in Intel 64 processors. These registers contain valid information only when the processor is operating in 64-bit mode at the time of the error.
191H  401  MSR_MCG_R9  0, 1, 2, 3, 4, 6  Unique  Machine Check R9D/R9. See Section 15.3.2.6, IA32_MCG Extended Machine Check State MSRs.
    63-0  Registers R8-15 (and the associated state-save MSRs) exist only in Intel 64 processors. These registers contain valid information only when the processor is operating in 64-bit mode at the time of the error.
192H  402  MSR_MCG_R10  0, 1, 2, 3, 4, 6  Unique  Machine Check R10. See Section 15.3.2.6, IA32_MCG Extended Machine Check State MSRs.
    63:0  Registers R8-15 (and the associated state-save MSRs) exist only in Intel 64 processors. These registers contain valid information only when the processor is operating in 64-bit mode at the time of the error.
193H  403  MSR_MCG_R11  0, 1, 2, 3, 4, 6  Unique  Machine Check R11. See Section 15.3.2.6, IA32_MCG Extended Machine Check State MSRs.
    63:0  Registers R8-15 (and the associated state-save MSRs) exist only in Intel 64 processors. These registers contain valid information only when the processor is operating in 64-bit mode at the time of the error.
194H  404  MSR_MCG_R12  0, 1, 2, 3, 4, 6  Unique  Machine Check R12. See Section 15.3.2.6, IA32_MCG Extended Machine Check State MSRs.
    63:0  Registers R8-15 (and the associated state-save MSRs) exist only in Intel 64 processors. These registers contain valid information only when the processor is operating in 64-bit mode at the time of the error.
195H  405  MSR_MCG_R13  0, 1, 2, 3, 4, 6  Unique  Machine Check R13. See Section 15.3.2.6, IA32_MCG Extended Machine Check State MSRs.
    63:0  Registers R8-15 (and the associated state-save MSRs) exist only in Intel 64 processors. These registers contain valid information only when the processor is operating in 64-bit mode at the time of the error.
196H  406  MSR_MCG_R14  0, 1, 2, 3, 4, 6  Unique  Machine Check R14. See Section 15.3.2.6, IA32_MCG Extended Machine Check State MSRs.
    63:0  Registers R8-15 (and the associated state-save MSRs) exist only in Intel 64 processors. These registers contain valid information only when the processor is operating in 64-bit mode at the time of the error.
197H  407  MSR_MCG_R15  0, 1, 2, 3, 4, 6  Unique  Machine Check R15. See Section 15.3.2.6, IA32_MCG Extended Machine Check State MSRs.
    63:0  Registers R8-15 (and the associated state-save MSRs) exist only in Intel 64 processors. These registers contain valid information only when the processor is operating in 64-bit mode at the time of the error.
3, 4, 6  Unique  See Table 35-2. See Section 14.1, Enhanced Intel SpeedStep Technology.
3, 4, 6  Unique  See Table 35-2. See Section 14.1, Enhanced Intel SpeedStep Technology.
0, 1, 2, 3, 4, 6  Unique  Thermal Monitor Control (R/W). See Table 35-2. See Section 14.5.3, Software Controlled Clock Modulation.
19BH  411  IA32_THERM_INTERRUPT  0, 1, 2, 3, 4, 6  Unique  Thermal Interrupt Control (R/W). See Section 14.5.2, Thermal Monitor, and see Table 35-2.
19CH  412  IA32_THERM_STATUS  0, 1, 2, 3, 4, 6  Shared  Thermal Monitor Status (R/W). See Section 14.5.2, Thermal Monitor, and see Table 35-2.
19DH  413  MSR_THERM2_CTL  3  Shared  Thermal Monitor 2 Control. For Family F, Model 3 processors: When read, specifies the value of the target TM2 transition last written. When set, it sets the next target value for TM2 transition.
19DH  413  MSR_THERM2_CTL  4, 6  Shared  Thermal Monitor 2 Control. For Family F, Model 4 and Model 6 processors: When read, specifies the value of the target TM2 transition last written. Writes may cause #GP exceptions.
1A0H  416  IA32_MISC_ENABLE  0, 1, 2, 3, 4, 6  Shared  Enable Miscellaneous Processor Features (R/W).
    0  Fast-Strings Enable. See Table 35-2.
    1  Reserved.
    2  x87 FPU Fopcode Compatibility Mode Enable.
    3  Thermal Monitor 1 Enable. See Section 14.5.2, Thermal Monitor, and see Table 35-2.
    4  Split-Lock Disable. When set, the bit causes an #AC exception to be issued instead of a split-lock cycle. Operating systems that set this bit must align system structures to avoid split-lock scenarios. When the bit is clear (default), normal split-locks are issued to the bus. This debug feature is specific to the Pentium 4 processor.
    5  Reserved.
    6  Third-Level Cache Disable (R/W). When set, the third-level cache is disabled; when clear (default) the third-level cache is enabled. This flag is reserved for processors that do not have a third-level cache. Note that the bit controls only the third-level cache; and only if overall caching is enabled through the CD flag of control register CR0, the page-level cache controls, and/or the MTRRs. See Section 11.5.4, Disabling and Enabling the L3 Cache.
    7  Performance Monitoring Available (R). See Table 35-2.
    8  Suppress Lock Enable. When set, assertion of LOCK on the bus is suppressed during a Split Lock access. When clear (default), LOCK is not suppressed.
    9  Prefetch Queue Disable. When set, disables the prefetch queue. When clear (default), enables the prefetch queue.
    10  FERR# Interrupt Reporting Enable (R/W). When set, interrupt reporting through the FERR# pin is enabled; when clear, this interrupt reporting function is disabled. When this flag is set and the processor is in the stop-clock state (STPCLK# is asserted), asserting the FERR# pin signals to the processor that an interrupt (such as INIT#, BINIT#, INTR, NMI, SMI#, or RESET#) is pending and that the processor should return to normal operation to handle the interrupt.
        This flag does not affect the normal operation of the FERR# pin (to indicate an unmasked floating-point error) when the STPCLK# pin is not asserted.
    11  Branch Trace Storage Unavailable (BTS_UNAVILABLE) (R). See Table 35-2. When set, the processor does not support branch trace storage (BTS); when clear, BTS is supported.
    12  PEBS_UNAVILABLE: Precise Event Based Sampling Unavailable (R). See Table 35-2. When set, the processor does not support precise event-based sampling (PEBS); when clear, PEBS is supported.
    13  (Model 3)  TM2 Enable (R/W). When this bit is set (1) and the thermal sensor indicates that the die temperature is at the predetermined threshold, the Thermal Monitor 2 mechanism is engaged. TM2 will reduce the bus to core ratio and voltage according to the value last written to MSR_THERM2_CTL bits 15:0. When this bit is clear (0, default), the processor does not change the VID signals or the bus to core ratio when the processor enters a thermal managed state. If the TM2 feature flag (ECX[8]) is not set to 1 after executing CPUID with EAX = 1, then this feature is not supported and BIOS must not alter the contents of this bit location. The processor is operating out of spec if both this bit and the TM1 bit are set to disabled states.
    17:14  Reserved.
    18  (Models 3, 4, 6)  ENABLE MONITOR FSM (R/W). See Table 35-2.
    19  Adjacent Cache Line Prefetch Disable (R/W). When set to 1, the processor fetches the cache line of the 128-byte sector containing currently required data. When set to 0, the processor fetches both cache lines in the sector.
        Single processor platforms should not set this bit. Server platforms should set or clear this bit based on platform performance observed in validation and testing. BIOS may contain a setup option that controls the setting of this bit.
    21:20  Reserved.
    22  (Models 3, 4, 6)  Limit CPUID MAXVAL (R/W). See Table 35-2. Setting this can cause unexpected behavior to software that depends on the availability of CPUID leaves greater than 3.
    23  (Shared)  xTPR Message Disable (R/W). See Table 35-2.
    24  L1 Data Cache Context Mode (R/W). When set, the L1 data cache is placed in shared mode; when clear (default), the cache is placed in adaptive mode. This bit is only enabled for IA-32 processors that support Intel Hyper-Threading Technology. See Section 11.5.6, L1 Data Cache Context Mode. When L1 is running in adaptive mode and CR3s are identical, data in L1 is shared across logical processors. Otherwise, L1 is not shared and cache use is competitive. If the Context ID feature flag (ECX[10]) is set to 0 after executing CPUID with EAX = 1, the ability to switch modes is not supported. BIOS must not alter the contents of IA32_MISC_ENABLE[24].
    33:25  Reserved.
    34  (Unique)  XD Bit Disable (R/W). See Table 35-2.
    63:35  Reserved.
1A1H  417  MSR_PLATFORM_BRV  3, 4, 6  Shared  Platform Feature Requirements (R).
    17:0  Reserved.
    18  PLATFORM Requirements. When set to 1, indicates the processor has specific platform requirements. The details of the platform requirements are listed in the respective data sheets of the processor.
    63:19  Reserved.
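As an illustration of the IA32_MISC_ENABLE (1A0H) bit layout listed in this table, the following sketch decodes an already-read 64-bit value into the names of the flags that are set and reports any reserved bits that are non-zero. How the value is read is outside the sketch; the names `MISC_ENABLE_FLAGS`, `decode_misc_enable`, and `reserved_bits_set` are hypothetical, and the flag strings are shortened from the table descriptions.

```python
# Bit positions as listed for IA32_MISC_ENABLE (1A0H) in this table.
MISC_ENABLE_FLAGS = {
    0: "Fast-Strings Enable",
    2: "x87 FPU Fopcode Compatibility Mode Enable",
    3: "Thermal Monitor 1 Enable",
    4: "Split-Lock Disable",
    6: "Third-Level Cache Disable",
    7: "Performance Monitoring Available",
    8: "Suppress Lock Enable",
    9: "Prefetch Queue Disable",
    10: "FERR# Interrupt Reporting Enable",
    11: "Branch Trace Storage Unavailable",
    12: "Precise Event Based Sampling Unavailable",
    13: "TM2 Enable",
    18: "ENABLE MONITOR FSM",
    19: "Adjacent Cache Line Prefetch Disable",
    22: "Limit CPUID MAXVAL",
    23: "xTPR Message Disable",
    24: "L1 Data Cache Context Mode",
    34: "XD Bit Disable",
}

# Every bit not named above is reserved (1, 5, 17:14, 21:20, 33:25, 63:35).
DEFINED_MASK = sum(1 << bit for bit in MISC_ENABLE_FLAGS)
RESERVED_MASK = ~DEFINED_MASK & (2**64 - 1)


def decode_misc_enable(value: int) -> list[str]:
    """Return the names of the defined flags set in value, lowest bit first."""
    return [name for bit, name in sorted(MISC_ENABLE_FLAGS.items())
            if (value >> bit) & 1]


def reserved_bits_set(value: int) -> int:
    """Return the subset of value that falls on reserved bit positions."""
    return value & RESERVED_MASK
```

Note that several of these flags are model-specific (for example, bit 13 is listed for model 3 only), so a fuller decoder would also take the processor model into account.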
1D7H  471  MSR_LER_FROM_LIP  0, 1, 2, 3, 4, 6  Unique  Last Exception Record From Linear IP (R). Contains a pointer to the last branch instruction that the processor executed prior to the last exception that was generated or the last interrupt that was handled. See Section 17.9.3, Last Exception Records.
    31:0  From Linear IP. Linear address of the last branch instruction.
    63:32  Reserved.
1D7H  471  Unique
    63:0  From Linear IP. Linear address of the last branch instruction (if IA-32e mode is active).
1D8H  472  MSR_LER_TO_LIP  0, 1, 2, 3, 4, 6  Unique  Last Exception Record To Linear IP (R). This area contains a pointer to the target of the last branch instruction that the processor executed prior to the last exception that was generated or the last interrupt that was handled. See Section 17.9.3, Last Exception Records.
    31:0  From Linear IP. Linear address of the target of the last branch instruction.
    63:32  Reserved.
1D8H  472  Unique
    63:0  From Linear IP. Linear address of the target of the last branch instruction (if IA-32e mode is active).
1D9H  473  MSR_DEBUGCTLA  0, 1, 2, 3, 4, 6  Unique  Debug Control (R/W). Controls how several debug features are used. Bit definitions are discussed in the referenced section. See Section 17.9.1, MSR_DEBUGCTLA MSR.
1DAH  474  MSR_LASTBRANCH_TOS  0, 1, 2, 3, 4, 6  Unique  Last Branch Record Stack TOS (R/W). Contains an index (0-3 or 0-15) that points to the top of the last branch record stack (that is, it points to the index of the MSR containing the most recent branch record). See Section 17.9.2, LBR Stack for Processors Based on Intel NetBurst Microarchitecture; and addresses 1DBH-1DEH and 680H-68FH.
1DBH  475  MSR_LASTBRANCH_0  0, 1, 2  Unique  Last Branch Record 0 (R/W). One of four last branch record registers on the last branch record stack. It contains pointers to the source and destination instruction for one of the last four branches, exceptions, or interrupts that the processor took. MSR_LASTBRANCH_0 through MSR_LASTBRANCH_3 at 1DBH-1DEH are available only on family 0FH, models 0H-02H. They have been replaced by the MSRs at 680H-68FH and 6C0H-6CFH. See Section 17.9, Last Branch, Interrupt, and Exception Recording (Processors based on Intel NetBurst Microarchitecture).
1DDH  477  MSR_LASTBRANCH_2  0, 1, 2  Unique  Last Branch Record 2. See description of the MSR_LASTBRANCH_0 MSR at 1DBH.
1DEH  478  MSR_LASTBRANCH_3  0, 1, 2  Unique  Last Branch Record 3. See description of the MSR_LASTBRANCH_0 MSR at 1DBH.
200H  512  IA32_MTRR_PHYSBASE0  0, 1, 2, 3, 4, 6  Shared  Variable Range Base MTRR. See Section 11.11.2.3, Variable Range MTRRs.
201H  513  IA32_MTRR_PHYSMASK0  0, 1, 2, 3, 4, 6  Shared  Variable Range Mask MTRR. See Section 11.11.2.3, Variable Range MTRRs.
202H  514  IA32_MTRR_PHYSBASE1  0, 1, 2, 3, 4, 6  Shared  Variable Range Base MTRR. See Section 11.11.2.3, Variable Range MTRRs.
203H  515  IA32_MTRR_PHYSMASK1  0, 1, 2, 3, 4, 6  Shared  Variable Range Mask MTRR. See Section 11.11.2.3, Variable Range MTRRs.
204H  516  IA32_MTRR_PHYSBASE2  0, 1, 2, 3, 4, 6  Shared  Variable Range Base MTRR. See Section 11.11.2.3, Variable Range MTRRs.
205H  517  IA32_MTRR_PHYSMASK2  0, 1, 2, 3, 4, 6  Shared  Variable Range Mask MTRR. See Section 11.11.2.3, Variable Range MTRRs.
206H  518  IA32_MTRR_PHYSBASE3  0, 1, 2, 3, 4, 6  Shared  Variable Range Base MTRR. See Section 11.11.2.3, Variable Range MTRRs.
207H  519  IA32_MTRR_PHYSMASK3  0, 1, 2, 3, 4, 6  Shared  Variable Range Mask MTRR. See Section 11.11.2.3, Variable Range MTRRs.
208H  520  IA32_MTRR_PHYSBASE4  0, 1, 2, 3, 4, 6  Shared  Variable Range Base MTRR. See Section 11.11.2.3, Variable Range MTRRs.
209H  521  IA32_MTRR_PHYSMASK4  0, 1, 2, 3, 4, 6  Shared  Variable Range Mask MTRR. See Section 11.11.2.3, Variable Range MTRRs.
20AH  522  IA32_MTRR_PHYSBASE5  0, 1, 2, 3, 4, 6  Shared  Variable Range Base MTRR. See Section 11.11.2.3, Variable Range MTRRs.
20BH  523  IA32_MTRR_PHYSMASK5  0, 1, 2, 3, 4, 6  Shared  Variable Range Mask MTRR. See Section 11.11.2.3, Variable Range MTRRs.
20CH  524  IA32_MTRR_PHYSBASE6  0, 1, 2, 3, 4, 6  Shared  Variable Range Base MTRR. See Section 11.11.2.3, Variable Range MTRRs.
20DH  525  IA32_MTRR_PHYSMASK6  0, 1, 2, 3, 4, 6  Shared  Variable Range Mask MTRR. See Section 11.11.2.3, Variable Range MTRRs.
20EH  526  IA32_MTRR_PHYSBASE7  0, 1, 2, 3, 4, 6  Shared  Variable Range Base MTRR. See Section 11.11.2.3, Variable Range MTRRs.
20FH  527  IA32_MTRR_PHYSMASK7  0, 1, 2, 3, 4, 6  Shared  Variable Range Mask MTRR. See Section 11.11.2.3, Variable Range MTRRs.
250H  592  IA32_MTRR_FIX64K_00000  0, 1, 2, 3, 4, 6  Shared  Fixed Range MTRR. See Section 11.11.2.2, Fixed Range MTRRs.
258H  600  IA32_MTRR_FIX16K_80000  0, 1, 2, 3, 4, 6  Shared  Fixed Range MTRR. See Section 11.11.2.2, Fixed Range MTRRs.
259H  601  IA32_MTRR_FIX16K_A0000  0, 1, 2, 3, 4, 6  Shared  Fixed Range MTRR. See Section 11.11.2.2, Fixed Range MTRRs.
268H  616  IA32_MTRR_FIX4K_C0000  0, 1, 2, 3, 4, 6  Shared  Fixed Range MTRR. See Section 11.11.2.2, Fixed Range MTRRs.
269H  617  IA32_MTRR_FIX4K_C8000  0, 1, 2, 3, 4, 6  Shared  Fixed Range MTRR. See Section 11.11.2.2, Fixed Range MTRRs.
26AH  618  IA32_MTRR_FIX4K_D0000  0, 1, 2, 3, 4, 6  Shared  Fixed Range MTRR. See Section 11.11.2.2, Fixed Range MTRRs.
26BH  619  IA32_MTRR_FIX4K_D8000  0, 1, 2, 3, 4, 6  Shared  Fixed Range MTRR. See Section 11.11.2.2, Fixed Range MTRRs.
26CH  620  IA32_MTRR_FIX4K_E0000  0, 1, 2, 3, 4, 6  Shared  Fixed Range MTRR. See Section 11.11.2.2, Fixed Range MTRRs.
26DH  621  IA32_MTRR_FIX4K_E8000  0, 1, 2, 3, 4, 6  Shared  Fixed Range MTRR. See Section 11.11.2.2, Fixed Range MTRRs.
26EH  622  IA32_MTRR_FIX4K_F0000  0, 1, 2, 3, 4, 6  Shared  Fixed Range MTRR. See Section 11.11.2.2, Fixed Range MTRRs.
26FH  623  IA32_MTRR_FIX4K_F8000  0, 1, 2, 3, 4, 6  Shared  Fixed Range MTRR. See Section 11.11.2.2, Fixed Range MTRRs.
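The variable-range MTRRs listed in this table come in base/mask pairs at consecutive addresses starting at 200H. A minimal sketch of that address arithmetic follows; the helper name `variable_mtrr_addresses` is hypothetical, and the limit of eight pairs reflects only the PHYSBASE0 through PHYSMASK7 rows shown here.

```python
def variable_mtrr_addresses(n: int) -> tuple[int, int]:
    """Return (PHYSBASEn, PHYSMASKn) MSR addresses for pair n.

    Follows the pattern visible in the table: IA32_MTRR_PHYSBASEn sits at
    200H + 2n and IA32_MTRR_PHYSMASKn at 201H + 2n, for n in 0..7.
    """
    if not 0 <= n <= 7:
        raise ValueError("this table lists variable-range pairs 0 through 7 only")
    base = 0x200 + 2 * n
    return base, base + 1
```

For example, pair 4 resolves to 208H (520) and 209H (521), matching the IA32_MTRR_PHYSBASE4 and IA32_MTRR_PHYSMASK4 rows.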
277H  631  IA32_PAT  0, 1, 2, 3, 4, 6  Unique  Page Attribute Table. See Section 11.11.2.2, Fixed Range MTRRs.
2FFH  767  IA32_MTRR_DEF_TYPE  0, 1, 2, 3, 4, 6  Shared  Default Memory Types (R/W). See Table 35-2. See Section 11.11.2.1, IA32_MTRR_DEF_TYPE MSR.
300H  768  MSR_BPU_COUNTER0  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.2, Performance Counters.
301H  769  MSR_BPU_COUNTER1  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.2, Performance Counters.
302H  770  MSR_BPU_COUNTER2  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.2, Performance Counters.
303H  771  MSR_BPU_COUNTER3  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.2, Performance Counters.
304H  772  MSR_MS_COUNTER0  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.2, Performance Counters.
305H  773  MSR_MS_COUNTER1  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.2, Performance Counters.
306H  774  MSR_MS_COUNTER2  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.2, Performance Counters.
307H  775  MSR_MS_COUNTER3  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.2, Performance Counters.
308H  776  MSR_FLAME_COUNTER0  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.2, Performance Counters.
309H  777  MSR_FLAME_COUNTER1  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.2, Performance Counters.
30AH  778  MSR_FLAME_COUNTER2  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.2, Performance Counters.
30BH  779  MSR_FLAME_COUNTER3  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.2, Performance Counters.
30CH  780  MSR_IQ_COUNTER0  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.2, Performance Counters.
30DH  781  MSR_IQ_COUNTER1  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.2, Performance Counters.
30EH  782  MSR_IQ_COUNTER2  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.2, Performance Counters.
30FH  783  MSR_IQ_COUNTER3  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.2, Performance Counters.
310H  784  MSR_IQ_COUNTER4  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.2, Performance Counters.
311H  785  MSR_IQ_COUNTER5  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.2, Performance Counters.
360H  864  MSR_BPU_CCCR0  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.3, CCCR MSRs.
361H  865  MSR_BPU_CCCR1  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.3, CCCR MSRs.
362H  866  MSR_BPU_CCCR2  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.3, CCCR MSRs.
363H  867  MSR_BPU_CCCR3  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.3, CCCR MSRs.
364H  868  MSR_MS_CCCR0  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.3, CCCR MSRs.
365H  869  MSR_MS_CCCR1  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.3, CCCR MSRs.
366H  870  MSR_MS_CCCR2  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.3, CCCR MSRs.
367H  871  MSR_MS_CCCR3  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.3, CCCR MSRs.
368H  872  MSR_FLAME_CCCR0  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.3, CCCR MSRs.
369H  873  MSR_FLAME_CCCR1  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.3, CCCR MSRs.
36AH  874  MSR_FLAME_CCCR2  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.3, CCCR MSRs.
36BH  875  MSR_FLAME_CCCR3  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.3, CCCR MSRs.
36CH  876  MSR_IQ_CCCR0  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.3, CCCR MSRs.
36DH  877  MSR_IQ_CCCR1  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.3, CCCR MSRs.
36EH  878  MSR_IQ_CCCR2  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.3, CCCR MSRs.
36FH  879  MSR_IQ_CCCR3  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.3, CCCR MSRs.
370H  880  MSR_IQ_CCCR4  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.3, CCCR MSRs.
371H  881  MSR_IQ_CCCR5  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.3, CCCR MSRs.
3A0H  928  MSR_BSU_ESCR0  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3A1H  929  MSR_BSU_ESCR1  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3A2H  930  MSR_FSB_ESCR0  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3A3H  931  MSR_FSB_ESCR1  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3A4H  932  MSR_FIRM_ESCR0  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3A5H  933  MSR_FIRM_ESCR1  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3A6H  934  MSR_FLAME_ESCR0  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3A7H  935  MSR_FLAME_ESCR1  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3A8H  936  MSR_DAC_ESCR0  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3A9H  937  MSR_DAC_ESCR1  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3AAH  938  MSR_MOB_ESCR0  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3ABH  939  MSR_MOB_ESCR1  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3ACH  940  MSR_PMH_ESCR0  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3ADH  941  MSR_PMH_ESCR1  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3AEH  942  MSR_SAAT_ESCR0  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3AFH  943  MSR_SAAT_ESCR1  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3B0H  944  MSR_U2L_ESCR0  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3B1H  945  MSR_U2L_ESCR1  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3B2H  946  MSR_BPU_ESCR0  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3B3H  947  MSR_BPU_ESCR1  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3B4H  948  MSR_IS_ESCR0  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3B5H  949  MSR_IS_ESCR1  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3B6H  950  MSR_ITLB_ESCR0  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3B7H  951  MSR_ITLB_ESCR1  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3B8H  952  MSR_CRU_ESCR0  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3B9H  953  MSR_CRU_ESCR1  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3BAH  954  MSR_IQ_ESCR0  0, 1, 2  Shared  See Section 18.11.1, ESCR MSRs. This MSR is not available on later processors. It is only available on processor family 0FH, models 01H-02H.
3BBH  955  MSR_IQ_ESCR1  0, 1, 2  Shared  See Section 18.11.1, ESCR MSRs. This MSR is not available on later processors. It is only available on processor family 0FH, models 01H-02H.
3BCH  956  MSR_RAT_ESCR0  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3BDH  957  MSR_RAT_ESCR1  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3BEH  958  MSR_SSU_ESCR0  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3C0H  960  MSR_MS_ESCR0  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3C1H  961  MSR_MS_ESCR1  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3C2H  962  MSR_TBPU_ESCR0  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3C3H  963  MSR_TBPU_ESCR1  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3C4H  964  MSR_TC_ESCR0  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3C5H  965  MSR_TC_ESCR1  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3C8H  968  MSR_IX_ESCR0  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3C9H  969  MSR_IX_ESCR1  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3CAH  970  MSR_ALF_ESCR0  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3CBH  971  MSR_ALF_ESCR1  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3CCH  972  MSR_CRU_ESCR2  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3CDH  973  MSR_CRU_ESCR3  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3E0H  992  MSR_CRU_ESCR4  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3E1H  993  MSR_CRU_ESCR5  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3F0H  1008  MSR_TC_PRECISE_EVENT  0, 1, 2, 3, 4, 6  Shared  See Section 18.11.1, ESCR MSRs.
3F1H  1009  MSR_PEBS_ENABLE  0, 1, 2, 3, 4, 6  Shared  Precise Event-Based Sampling (PEBS) (R/W). Controls the enabling of precise event sampling and replay tagging.
    12:0  See Table 19-25.
    23:13  Reserved.
    24  UOP Tag. Enables replay tagging when set.
    25  ENABLE_PEBS_MY_THR (R/W). Enables PEBS for the target logical processor when set; disables PEBS when clear (default). See Section 18.12.3, IA32_PEBS_ENABLE MSR, for an explanation of the target logical processor. This bit is called ENABLE_PEBS in IA-32 processors that do not support Intel Hyper-Threading Technology.
    26  ENABLE_PEBS_OTH_THR (R/W). Enables PEBS for the target logical processor when set; disables PEBS when clear (default). See Section 18.12.3, IA32_PEBS_ENABLE MSR, for an explanation of the target logical processor. This bit is reserved for IA-32 processors that do not support Intel Hyper-Threading Technology.
    63:27  Reserved.
3F2H  1010  MSR_PEBS_MATRIX_VERT  0, 1, 2, 3, 4, 6  Shared  See Table 19-25.
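The MSR_PEBS_ENABLE (3F1H) fields described in this table can be unpacked as in the sketch below. It assumes the 64-bit value has already been obtained; the function name `describe_pebs_enable` and the dictionary keys are hypothetical, and bits 12:0 are returned uninterpreted because their encoding is given in Table 19-25 rather than here.

```python
UOP_TAG = 1 << 24              # enables replay tagging when set
ENABLE_PEBS_MY_THR = 1 << 25   # called ENABLE_PEBS without Hyper-Threading
ENABLE_PEBS_OTH_THR = 1 << 26  # reserved without Hyper-Threading
LOW_FIELD_MASK = (1 << 13) - 1                          # bits 12:0
RESERVED_MASK = (((1 << 11) - 1) << 13) | (((1 << 37) - 1) << 27)  # 23:13, 63:27


def describe_pebs_enable(value: int) -> dict:
    """Split an MSR_PEBS_ENABLE value into the fields listed in the table."""
    return {
        "bits_12_0": value & LOW_FIELD_MASK,
        "uop_tag": bool(value & UOP_TAG),
        "enable_pebs_my_thr": bool(value & ENABLE_PEBS_MY_THR),
        "enable_pebs_oth_thr": bool(value & ENABLE_PEBS_OTH_THR),
        "reserved_clear": (value & RESERVED_MASK) == 0,
    }
```

A caller might use `reserved_clear` as a sanity check before writing a composed value back, since the reserved ranges are expected to stay zero.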
400H  1024  IA32_MC0_CTL  0, 1, 2, 3, 4, 6  Shared  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
401H  1025  IA32_MC0_STATUS  0, 1, 2, 3, 4, 6  Shared  See Section 15.3.2.2, IA32_MCi_STATUS MSRs.
402H  1026  IA32_MC0_ADDR  0, 1, 2, 3, 4, 6  Shared  See Section 15.3.2.3, IA32_MCi_ADDR MSRs. The IA32_MC0_ADDR register is either not implemented or contains no address if the ADDRV flag in the IA32_MC0_STATUS register is clear. When not implemented in the processor, all reads and writes to this MSR will cause a general-protection exception.
403H  1027  IA32_MC0_MISC  0, 1, 2, 3, 4, 6  Shared  See Section 15.3.2.4, IA32_MCi_MISC MSRs. The IA32_MC0_MISC MSR is either not implemented or does not contain additional information if the MISCV flag in the IA32_MC0_STATUS register is clear. When not implemented in the processor, all reads and writes to this MSR will cause a general-protection exception.
404H  1028  IA32_MC1_CTL  0, 1, 2, 3, 4, 6  Shared  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
405H  1029  IA32_MC1_STATUS  0, 1, 2, 3, 4, 6  Shared  See Section 15.3.2.2, IA32_MCi_STATUS MSRs.
406H  1030  IA32_MC1_ADDR  0, 1, 2, 3, 4, 6  Shared  See Section 15.3.2.3, IA32_MCi_ADDR MSRs. The IA32_MC1_ADDR register is either not implemented or contains no address if the ADDRV flag in the IA32_MC1_STATUS register is clear. When not implemented in the processor, all reads and writes to this MSR will cause a general-protection exception.
407H  1031  IA32_MC1_MISC  Shared  See Section 15.3.2.4, IA32_MCi_MISC MSRs. The IA32_MC1_MISC MSR is either not implemented or does not contain additional information if the MISCV flag in the IA32_MC1_STATUS register is clear. When not implemented in the processor, all reads and writes to this MSR will cause a general-protection exception.
408H  1032  IA32_MC2_CTL  0, 1, 2, 3, 4, 6  Shared  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
409H  1033  IA32_MC2_STATUS  0, 1, 2, 3, 4, 6  Shared  See Section 15.3.2.2, IA32_MCi_STATUS MSRs.
40AH  1034  IA32_MC2_ADDR  See Section 15.3.2.3, IA32_MCi_ADDR MSRs. The IA32_MC2_ADDR register is either not implemented or contains no address if the ADDRV flag in the IA32_MC2_STATUS register is clear. When not implemented in the processor, all reads and writes to this MSR will cause a general-protection exception.
40BH  1035  IA32_MC2_MISC  See Section 15.3.2.4, IA32_MCi_MISC MSRs. The IA32_MC2_MISC MSR is either not implemented or does not contain additional information if the MISCV flag in the IA32_MC2_STATUS register is clear. When not implemented in the processor, all reads and writes to this MSR will cause a general-protection exception.
40CH  1036  IA32_MC3_CTL  0, 1, 2, 3, 4, 6  Shared  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
40DH  1037  IA32_MC3_STATUS  0, 1, 2, 3, 4, 6  Shared  See Section 15.3.2.2, IA32_MCi_STATUS MSRs.
40EH  1038  IA32_MC3_ADDR  0, 1, 2, 3, 4, 6  Shared  See Section 15.3.2.3, IA32_MCi_ADDR MSRs. The IA32_MC3_ADDR register is either not implemented or contains no address if the ADDRV flag in the IA32_MC3_STATUS register is clear. When not implemented in the processor, all reads and writes to this MSR will cause a general-protection exception.
40FH  1039  IA32_MC3_MISC  0, 1, 2, 3, 4, 6  Shared  See Section 15.3.2.4, IA32_MCi_MISC MSRs. The IA32_MC3_MISC MSR is either not implemented or does not contain additional information if the MISCV flag in the IA32_MC3_STATUS register is clear. When not implemented in the processor, all reads and writes to this MSR will cause a general-protection exception.
410H  1040  IA32_MC4_CTL  0, 1, 2, 3, 4, 6  Shared  See Section 15.3.2.1, IA32_MCi_CTL MSRs.
411H  1041  IA32_MC4_STATUS  0, 1, 2, 3, 4, 6  Shared  See Section 15.3.2.2, IA32_MCi_STATUS MSRs.
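The machine-check bank registers in this table follow a regular four-register layout starting at 400H, and the ADDR and MISC registers are meaningful only when the matching flag in the STATUS register is set. The sketch below illustrates both points. The helper names are hypothetical, and the flag positions (ADDRV at bit 58, MISCV at bit 59 of IA32_MCi_STATUS) are an assumption taken from the architectural definition in Section 15.3.2.2, not from this table.

```python
# Assumed bit positions in IA32_MCi_STATUS (see Section 15.3.2.2).
ADDRV = 1 << 58  # IA32_MCi_ADDR contains a valid address
MISCV = 1 << 59  # IA32_MCi_MISC contains additional information


def mc_bank_msrs(bank: int) -> dict[str, int]:
    """Return the MSR addresses of bank i: CTL, STATUS, ADDR, MISC.

    Follows the pattern in the table: IA32_MCi_CTL is at 400H + 4*i and the
    other three registers follow at consecutive addresses. Banks 0-4 are the
    ones listed for these processors.
    """
    if not 0 <= bank <= 4:
        raise ValueError("this table lists banks 0 through 4 only")
    base = 0x400 + 4 * bank
    return {"CTL": base, "STATUS": base + 1, "ADDR": base + 2, "MISC": base + 3}


def registers_worth_reading(status: int) -> list[str]:
    """Return which of ADDR/MISC the STATUS value marks as valid."""
    names = []
    if status & ADDRV:
        names.append("ADDR")
    if status & MISCV:
        names.append("MISC")
    return names
```

Checking the flags first also avoids touching an unimplemented ADDR or MISC register, which the table notes would raise a general-protection exception.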
412H  1042  IA32_MC4_ADDR  See Section 15.3.2.3, IA32_MCi_ADDR MSRs. The IA32_MC4_ADDR register is either not implemented or contains no address if the ADDRV flag in the IA32_MC4_STATUS register is clear. When not implemented in the processor, all reads and writes to this MSR will cause a general-protection exception.
413H  1043  IA32_MC4_MISC  See Section 15.3.2.4, IA32_MCi_MISC MSRs. The IA32_MC4_MISC MSR is either not implemented or does not contain additional information if the MISCV flag in the IA32_MC4_STATUS register is clear. When not implemented in the processor, all reads and writes to this MSR will cause a general-protection exception.
480H  1152  IA32_VMX_BASIC  3, 4, 6  Unique  Reporting Register of Basic VMX Capabilities (R/O). See Table 35-2. See Appendix A.1, Basic VMX Information.
481H  1153  IA32_VMX_PINBASED_CTLS  3, 4, 6  Unique  Capability Reporting Register of Pin-based VM-execution Controls (R/O). See Table 35-2. See Appendix A.3, VM-Execution Controls.
482H  1154  IA32_VMX_PROCBASED_CTLS  3, 4, 6  Unique  Capability Reporting Register of Primary Processor-based VM-execution Controls (R/O). See Appendix A.3, VM-Execution Controls, and see Table 35-2.
483H  1155  IA32_VMX_EXIT_CTLS  3, 4, 6  Unique  Capability Reporting Register of VM-exit Controls (R/O). See Appendix A.4, VM-Exit Controls, and see Table 35-2.
484H  1156  IA32_VMX_ENTRY_CTLS  3, 4, 6  Unique  Capability Reporting Register of VM-entry Controls (R/O). See Appendix A.5, VM-Entry Controls, and see Table 35-2.
485H  1157  IA32_VMX_MISC  3, 4, 6  Unique  Reporting Register of Miscellaneous VMX Capabilities (R/O). See Appendix A.6, Miscellaneous Data, and see Table 35-2.
486H  1158  IA32_VMX_CR0_FIXED0  3, 4, 6  Unique  Capability Reporting Register of CR0 Bits Fixed to 0 (R/O). See Appendix A.7, VMX-Fixed Bits in CR0, and see Table 35-2.
487H  1159  IA32_VMX_CR0_FIXED1  3, 4, 6  Unique  Capability Reporting Register of CR0 Bits Fixed to 1 (R/O). See Appendix A.7, VMX-Fixed Bits in CR0, and see Table 35-2.
488H  1160  IA32_VMX_CR4_FIXED0  3, 4, 6  Unique  Capability Reporting Register of CR4 Bits Fixed to 0 (R/O). See Appendix A.8, VMX-Fixed Bits in CR4, and see Table 35-2.
489H  1161  IA32_VMX_CR4_FIXED1  3, 4, 6  Unique  Capability Reporting Register of CR4 Bits Fixed to 1 (R/O). See Appendix A.8, VMX-Fixed Bits in CR4, and see Table 35-2.
48AH  1162  IA32_VMX_VMCS_ENUM  3, 4, 6  Unique  Capability Reporting Register of VMCS Field Enumeration (R/O). See Appendix A.9, VMCS Enumeration, and see Table 35-2.
48BH  1163  IA32_VMX_PROCBASED_CTLS2  3, 4, 6  Unique  Capability Reporting Register of Secondary Processor-based VM-execution Controls (R/O). See Appendix A.3, VM-Execution Controls, and see Table 35-2.
600H  1536  IA32_DS_AREA  0, 1, 2, 3, 4, 6  Unique  DS Save Area (R/W). See Table 35-2. See Section 18.11.4, Debug Store (DS) Mechanism.
680H  1664  MSR_LASTBRANCH_0_FROM_IP  3, 4, 6  Unique  Last Branch Record 0 (R/W). One of 16 pairs of last branch record registers on the last branch record stack (680H-68FH). This part of the stack contains pointers to the source instruction for one of the last 16 branches, exceptions, or interrupts taken by the processor. The MSRs at 680H-68FH and 6C0H-6CFH are not available in processor releases before family 0FH, model 03H. These MSRs replace the MSRs previously located at 1DBH-1DEH, which performed the same function for early releases. See Section 17.9, Last Branch, Interrupt, and Exception Recording (Processors based on Intel NetBurst Microarchitecture).
681H  1665  MSR_LASTBRANCH_1_FROM_IP  3, 4, 6  Unique  Last Branch Record 1. See description of MSR_LASTBRANCH_0 at 680H.
682H  1666  MSR_LASTBRANCH_2_FROM_IP  3, 4, 6  Unique  Last Branch Record 2. See description of MSR_LASTBRANCH_0 at 680H.
683H  1667  MSR_LASTBRANCH_3_FROM_IP  3, 4, 6  Unique  Last Branch Record 3. See description of MSR_LASTBRANCH_0 at 680H.
684H  1668  MSR_LASTBRANCH_4_FROM_IP  3, 4, 6  Unique  Last Branch Record 4. See description of MSR_LASTBRANCH_0 at 680H.
685H  1669  MSR_LASTBRANCH_5_FROM_IP  3, 4, 6  Unique  Last Branch Record 5. See description of MSR_LASTBRANCH_0 at 680H.
686H  1670  MSR_LASTBRANCH_6_FROM_IP  3, 4, 6  Unique  Last Branch Record 6. See description of MSR_LASTBRANCH_0 at 680H.
687H  1671  MSR_LASTBRANCH_7_FROM_IP  3, 4, 6  Unique  Last Branch Record 7. See description of MSR_LASTBRANCH_0 at 680H.
688H  1672  MSR_LASTBRANCH_8_FROM_IP  3, 4, 6  Unique  Last Branch Record 8. See description of MSR_LASTBRANCH_0 at 680H.
689H  1673  MSR_LASTBRANCH_9_FROM_IP  3, 4, 6  Unique  Last Branch Record 9. See description of MSR_LASTBRANCH_0 at 680H.
68AH  1674  MSR_LASTBRANCH_10_FROM_IP  3, 4, 6  Unique  Last Branch Record 10. See description of MSR_LASTBRANCH_0 at 680H.
68BH  1675  MSR_LASTBRANCH_11_FROM_IP  3, 4, 6  Unique  Last Branch Record 11. See description of MSR_LASTBRANCH_0 at 680H.
68CH  1676  MSR_LASTBRANCH_12_FROM_IP  3, 4, 6  Unique  Last Branch Record 12. See description of MSR_LASTBRANCH_0 at 680H.
68DH  1677  MSR_LASTBRANCH_13_FROM_IP  3, 4, 6  Unique  Last Branch Record 13. See description of MSR_LASTBRANCH_0 at 680H.
68EH  1678  MSR_LASTBRANCH_14_FROM_IP  3, 4, 6  Unique  Last Branch Record 14. See description of MSR_LASTBRANCH_0 at 680H.
68FH  1679  MSR_LASTBRANCH_15_FROM_IP  3, 4, 6  Unique  Last Branch Record 15. See description of MSR_LASTBRANCH_0 at 680H.
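On family 0FH, model 03H and later, the index held in MSR_LASTBRANCH_TOS (1DAH) selects one of the 16 record pairs at 680H-68FH (source) and 6C0H-6CFH (destination). The sketch below shows the address arithmetic only. The helper names are hypothetical, and the assumption that older records sit at decreasing indexes, wrapping from 0 back to 15, is the usual circular-stack reading rather than something stated in this table; Section 17.9.2 is the authority.

```python
FROM_IP_BASE = 0x680  # MSR_LASTBRANCH_0_FROM_IP
TO_IP_BASE = 0x6C0    # MSR_LASTBRANCH_0_TO_IP
STACK_DEPTH = 16


def lbr_pair(index: int) -> tuple[int, int]:
    """Return the (FROM_IP, TO_IP) MSR addresses for one stack entry."""
    if not 0 <= index < STACK_DEPTH:
        raise ValueError("index must be in 0..15 for the 16-entry stack")
    return FROM_IP_BASE + index, TO_IP_BASE + index


def lbr_pairs_newest_first(tos: int) -> list[tuple[int, int]]:
    """List all 16 address pairs, starting at the entry TOS points to.

    Assumes older records sit at decreasing indexes, wrapping around.
    """
    return [lbr_pair((tos - age) % STACK_DEPTH) for age in range(STACK_DEPTH)]
```

For instance, with a TOS value of 1 the most recent record would be read from 681H/6C1H, the one before it from 680H/6C0H, and the one before that from 68FH/6CFH.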
Table 35-18. MSRs in the Pentium 4 and Intel Xeon Processors (Contd.)
Register Address  Register Name              Model         Shared/  Bit Description
Hex    Dec        Fields and Flags           Availability  Unique¹
6C0H   1728       MSR_LASTBRANCH_0_TO_IP     3, 4, 6       Unique   Last Branch Record 0 (R/W). One of 16 pairs of last branch record registers on the last branch record stack (6C0H-6CFH). This part of the stack contains pointers to the destination instruction for one of the last 16 branches, exceptions, or interrupts that the processor took. See Section 17.9, Last Branch, Interrupt, and Exception Recording (Processors based on Intel NetBurst Microarchitecture).
6C1H   1729       MSR_LASTBRANCH_1_TO_IP     3, 4, 6       Unique   Last Branch Record 1. See description of MSR_LASTBRANCH_0 at 6C0H.
6C2H   1730       MSR_LASTBRANCH_2_TO_IP     3, 4, 6       Unique   Last Branch Record 2. See description of MSR_LASTBRANCH_0 at 6C0H.
6C3H   1731       MSR_LASTBRANCH_3_TO_IP     3, 4, 6       Unique   Last Branch Record 3. See description of MSR_LASTBRANCH_0 at 6C0H.
6C4H   1732       MSR_LASTBRANCH_4_TO_IP     3, 4, 6       Unique   Last Branch Record 4. See description of MSR_LASTBRANCH_0 at 6C0H.
6C5H   1733       MSR_LASTBRANCH_5_TO_IP     3, 4, 6       Unique   Last Branch Record 5. See description of MSR_LASTBRANCH_0 at 6C0H.
6C6H   1734       MSR_LASTBRANCH_6_TO_IP     3, 4, 6       Unique   Last Branch Record 6. See description of MSR_LASTBRANCH_0 at 6C0H.
6C7H   1735       MSR_LASTBRANCH_7_TO_IP     3, 4, 6       Unique   Last Branch Record 7. See description of MSR_LASTBRANCH_0 at 6C0H.
6C8H   1736       MSR_LASTBRANCH_8_TO_IP     3, 4, 6       Unique   Last Branch Record 8. See description of MSR_LASTBRANCH_0 at 6C0H.
6C9H   1737       MSR_LASTBRANCH_9_TO_IP     3, 4, 6       Unique   Last Branch Record 9. See description of MSR_LASTBRANCH_0 at 6C0H.
6CAH   1738       MSR_LASTBRANCH_10_TO_IP    3, 4, 6       Unique   Last Branch Record 10. See description of MSR_LASTBRANCH_0 at 6C0H.
6CBH   1739       MSR_LASTBRANCH_11_TO_IP    3, 4, 6       Unique   Last Branch Record 11. See description of MSR_LASTBRANCH_0 at 6C0H.
6CCH   1740       MSR_LASTBRANCH_12_TO_IP    3, 4, 6       Unique   Last Branch Record 12. See description of MSR_LASTBRANCH_0 at 6C0H.
6CDH   1741       MSR_LASTBRANCH_13_TO_IP    3, 4, 6       Unique   Last Branch Record 13. See description of MSR_LASTBRANCH_0 at 6C0H.
6CEH   1742       MSR_LASTBRANCH_14_TO_IP    3, 4, 6       Unique   Last Branch Record 14. See description of MSR_LASTBRANCH_0 at 6C0H.
Table 35-18. MSRs in the Pentium 4 and Intel Xeon Processors (Contd.)
Register Address  Register Name              Model         Shared/  Bit Description
Hex         Dec   Fields and Flags           Availability  Unique¹
6CFH        1743  MSR_LASTBRANCH_15_TO_IP    3, 4, 6       Unique   Last Branch Record 15. See description of MSR_LASTBRANCH_0 at 6C0H.
C000_0080H        IA32_EFER                  3, 4, 6       Unique   Extended Feature Enables. See Table 35-2.
C000_0081H        IA32_STAR                  3, 4, 6       Unique   System Call Target Address (R/W). See Table 35-2.
C000_0082H        IA32_LSTAR                 3, 4, 6       Unique   IA-32e Mode System Call Target Address (R/W). See Table 35-2.
C000_0084H        IA32_FMASK                 3, 4, 6       Unique   System Call Flag Mask (R/W). See Table 35-2.
C000_0100H        IA32_FS_BASE               3, 4, 6       Unique   Map of BASE Address of FS (R/W). See Table 35-2.
C000_0101H        IA32_GS_BASE               3, 4, 6       Unique   Map of BASE Address of GS (R/W). See Table 35-2.
C000_0102H        IA32_KERNEL_GSBASE         3, 4, 6       Unique   Swap Target of BASE Address of GS (R/W). See Table 35-2.
NOTES
1. For HT-enabled processors, there may be more than one logical processor per physical unit. If an MSR is Shared, one MSR is shared between the logical processors. If an MSR is Unique, each logical processor has its own MSR.
...
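The address layout of the last branch record MSRs in Table 35-18 is regular: record n has its source pointer at 680H + n and its destination pointer at 6C0H + n. The following Python sketch shows that arithmetic only. The helper name `lbr_msr_pair` is hypothetical, and actually reading the MSRs (RDMSR at privilege level 0) is outside what it does.

```python
# Base addresses are taken from Table 35-18; the helper name is hypothetical.
LBR_FROM_BASE = 0x680  # MSR_LASTBRANCH_0_FROM_IP
LBR_TO_BASE = 0x6C0    # MSR_LASTBRANCH_0_TO_IP
LBR_DEPTH = 16         # 16 pairs on the last branch record stack


def lbr_msr_pair(index):
    """Return (FROM_IP address, TO_IP address) for last branch record `index`."""
    if not 0 <= index < LBR_DEPTH:
        raise ValueError("last branch record index must be in 0..15")
    return LBR_FROM_BASE + index, LBR_TO_BASE + index


if __name__ == "__main__":
    for i in (0, 9, 15):
        from_msr, to_msr = lbr_msr_pair(i)
        print(f"record {i:2d}: FROM_IP at {from_msr:03X}H, TO_IP at {to_msr:03X}H")
```

The range check mirrors the table: only indexes 0 through 15 map to defined MSR addresses.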
Table 35-21. MSRs in Intel Core Solo, Intel Core Duo Processors, and Dual-Core Intel Xeon Processor LV
Register Address  Register Name             Shared/  Bit Description
Hex    Dec                                  Unique
0H     0          P5_MC_ADDR                Unique   See Section 35.15, MSRs in Pentium Processors, and see Table 35-2.
1H     1          P5_MC_TYPE                Unique   See Section 35.15, MSRs in Pentium Processors, and see Table 35-2.
6H     6          IA32_MONITOR_FILTER_SIZE  Unique   See Section 8.10.5, Monitor/Mwait Address Range Determination, and see Table 35-2.
10H    16         IA32_TIME_STAMP_COUNTER   Unique   See Section 17.13, Time-Stamp Counter, and see Table 35-2.
Table 35-21. MSRs in Intel Core Solo, Intel Core Duo Processors, and Dual-Core Intel Xeon Processor LV (Contd.)
Register Address  Register Name             Shared/  Bit Description
Hex    Dec                                  Unique
17H    23         IA32_PLATFORM_ID          Shared   Platform ID (R). See Table 35-2. The operating system can use this MSR to determine slot information for the processor and the proper microcode update to load.
1BH    27         IA32_APIC_BASE            Unique   See Section 10.4.4, Local APIC Status and Location, and see Table 35-2.
2AH    42         MSR_EBL_CR_POWERON        Shared   Processor Hard Power-On Configuration (R/W). Enables and disables processor features; (R) indicates current processor configuration.
                    0                                Reserved.
                    1                                Data Error Checking Enable (R/W). 1 = Enabled; 0 = Disabled. Note: Not all processors implement R/W.
                    2                                Response Error Checking Enable (R/W). 1 = Enabled; 0 = Disabled. Note: Not all processors implement R/W.
                    3                                MCERR# Drive Enable (R/W). 1 = Enabled; 0 = Disabled. Note: Not all processors implement R/W.
                    4                                Address Parity Enable (R/W). 1 = Enabled; 0 = Disabled. Note: Not all processors implement R/W.
                    6:5                              Reserved.
                    7                                BINIT# Driver Enable (R/W). 1 = Enabled; 0 = Disabled. Note: Not all processors implement R/W.
                    8                                Output Tri-state Enabled (R/O). 1 = Enabled; 0 = Disabled.
                    9                                Execute BIST (R/O). 1 = Enabled; 0 = Disabled.
                    10                               MCERR# Observation Enabled (R/O). 1 = Enabled; 0 = Disabled.
                    11                               Reserved.
                    12                               BINIT# Observation Enabled (R/O). 1 = Enabled; 0 = Disabled.
                    13                               Reserved.
                    14                               1 MByte Power on Reset Vector (R/O). 1 = 1 MByte; 0 = 4 GBytes.
                    15                               Reserved.
                    17:16                            APIC Cluster ID (R/O).
                    18                               System Bus Frequency (R/O). 0 = 100 MHz; 1 = Reserved.
                    19                               Reserved.
                    21:20                            Symmetric Arbitration ID (R/O).
                    26:22                            Clock Frequency Ratio (R/O).
3AH    58         IA32_FEATURE_CONTROL      Unique   Control Features in IA-32 Processor (R/W). See Table 35-2.
40H    64         MSR_LASTBRANCH_0          Unique   Last Branch Record 0 (R/W). One of 8 last branch record registers on the last branch record stack: bits 31-0 hold the from address and bits 63-32 hold the to address. See also: Last Branch Record Stack TOS at 1C9H; Section 17.11, Last Branch, Interrupt, and Exception Recording (Pentium M Processors).
41H    65         MSR_LASTBRANCH_1          Unique   Last Branch Record 1 (R/W). See description of MSR_LASTBRANCH_0.
42H    66         MSR_LASTBRANCH_2          Unique   Last Branch Record 2 (R/W). See description of MSR_LASTBRANCH_0.
43H    67         MSR_LASTBRANCH_3          Unique   Last Branch Record 3 (R/W). See description of MSR_LASTBRANCH_0.
44H    68         MSR_LASTBRANCH_4          Unique   Last Branch Record 4 (R/W). See description of MSR_LASTBRANCH_0.
45H    69         MSR_LASTBRANCH_5          Unique   Last Branch Record 5 (R/W). See description of MSR_LASTBRANCH_0.
46H    70         MSR_LASTBRANCH_6          Unique   Last Branch Record 6 (R/W). See description of MSR_LASTBRANCH_0.
47H    71         MSR_LASTBRANCH_7          Unique   Last Branch Record 7 (R/W). See description of MSR_LASTBRANCH_0.
79H    121        IA32_BIOS_UPDT_TRIG       Unique   BIOS Update Trigger Register (W). See Table 35-2.
Table 35-21. MSRs in Intel Core Solo, Intel Core Duo Processors, and Dual-Core Intel Xeon Processor LV (Contd.)
Register Address  Register Name             Shared/  Bit Description
Hex    Dec                                  Unique
8BH    139        IA32_BIOS_SIGN_ID         Unique   BIOS Update Signature ID (RO). See Table 35-2.
C1H    193        IA32_PMC0                 Unique   Performance counter register. See Table 35-2.
C2H    194        IA32_PMC1                 Unique   Performance counter register. See Table 35-2.
CDH    205        MSR_FSB_FREQ              Shared
                    2:0                              Scaleable Bus Speed (RO). This field indicates the scaleable bus clock speed: 101B: 100 MHz (FSB 400); 001B: 133 MHz (FSB 533); 011B: 167 MHz (FSB 667). 133.33 MHz should be utilized if performing calculation with System Bus Speed when encoding is 001B. 166.67 MHz should be utilized if performing calculation with System Bus Speed when encoding is 011B.
                    63:3                             Reserved.
E7H    231        IA32_MPERF                Unique   Maximum Performance Frequency Clock Count (RW). See Table 35-2.
E8H    232        IA32_APERF                Unique   Actual Performance Frequency Clock Count (RW). See Table 35-2.
FEH    254        IA32_MTRRCAP              Unique   See Table 35-2.
11EH   281        MSR_BBL_CR_CTL3           Shared
                    0                                L2 Hardware Enabled (RO). 1 = L2 is hardware-enabled; 0 = L2 is hardware-disabled.
                    7:1                              Reserved.
                    8                                L2 Enabled (R/W). 1 = L2 cache has been initialized; 0 = Disabled (default). Until this bit is set, the processor will not respond to the WBINVD instruction or the assertion of the FLUSH# input.
                    22:9                             Reserved.
                    23
                    63:24                            Reserved.
Table 35-21. MSRs in Intel Core Solo, Intel Core Duo Processors, and Dual-Core Intel Xeon Processor LV (Contd.)
Register Address  Register Name             Shared/  Bit Description
Hex    Dec                                  Unique
174H   372        IA32_SYSENTER_CS          Unique   See Table 35-2.
175H   373        IA32_SYSENTER_ESP         Unique   See Table 35-2.
176H   374        IA32_SYSENTER_EIP         Unique   See Table 35-2.
179H   377        IA32_MCG_CAP              Unique   See Table 35-2.
17AH   378        IA32_MCG_STATUS           Unique
                    0                                RIPV. When set, this bit indicates that the instruction addressed by the instruction pointer pushed on the stack (when the machine check was generated) can be used to restart the program. If this bit is cleared, the program cannot be reliably restarted.
                    1                                EIPV. When set, this bit indicates that the instruction addressed by the instruction pointer pushed on the stack (when the machine check was generated) is directly associated with the error.
                    2                                MCIP. When set, this bit indicates that a machine check has been generated. If a second machine check is detected while this bit is still set, the processor enters a shutdown state. Software should write this bit to 0 after processing a machine check exception.
                    63:3                             Reserved.
186H   390        IA32_PERFEVTSEL0          Unique   See Table 35-2.
187H   391        IA32_PERFEVTSEL1          Unique   See Table 35-2.
198H   408        IA32_PERF_STATUS          Shared   See Table 35-2.
199H   409        IA32_PERF_CTL             Unique   See Table 35-2.
19AH   410        IA32_CLOCK_MODULATION     Unique   Clock Modulation (R/W). See Table 35-2.
19BH   411        IA32_THERM_INTERRUPT      Unique   Thermal Interrupt Control (R/W). See Table 35-2. See Section 14.5.2, Thermal Monitor.
19CH   412        IA32_THERM_STATUS         Unique   Thermal Monitor Status (R/W). See Table 35-2. See Section 14.5.2, Thermal Monitor.
19DH   413        MSR_THERM2_CTL            Unique
                    15:0                             Reserved.
Table 35-21. MSRs in Intel Core Solo, Intel Core Duo Processors, and Dual-Core Intel Xeon Processor LV (Contd.)
Register Address  Register Name             Shared/  Bit Description
Hex    Dec                                  Unique
                    16                               TM_SELECT (R/W). Mode of automatic thermal monitor: 0 = Thermal Monitor 1 (thermally-initiated on-die modulation of the stop-clock duty cycle); 1 = Thermal Monitor 2 (thermally-initiated frequency transitions). If bit 3 of the IA32_MISC_ENABLE register is cleared, TM_SELECT has no effect. Neither TM1 nor TM2 will be enabled.
                    63:17                            Reserved.
1A0H   416        IA32_MISC_ENABLE                   Enable Miscellaneous Processor Features (R/W). Allows a variety of processor functions to be enabled and disabled.
                    2:0                              Reserved.
                    3                       Unique   Automatic Thermal Control Circuit Enable (R/W). See Table 35-2.
                    6:4                              Reserved.
                    7                       Shared   Performance Monitoring Available (R). See Table 35-2.
                    9:8                              Reserved.
                    10                      Shared   FERR# Multiplexing Enable (R/W). 1 = FERR# asserted by the processor to indicate a pending break event within the processor; 0 = Indicates compatible FERR# signaling behavior. This bit must be set to 1 to support XAPIC interrupt model usage.
                    11                      Shared   Branch Trace Storage Unavailable (RO). See Table 35-2.
                    12                               Reserved.
                    13                      Shared   TM2 Enable (R/W). When this bit is set (1) and the thermal sensor indicates that the die temperature is at the pre-determined threshold, the Thermal Monitor 2 mechanism is engaged. TM2 will reduce the bus to core ratio and voltage according to the value last written to MSR_THERM2_CTL bits 15:0. When this bit is clear (0, default), the processor does not change the VID signals or the bus to core ratio when the processor enters a thermal managed state. If the TM2 feature flag (ECX[8]) is not set to 1 after executing CPUID with EAX = 1, then this feature is not supported and BIOS must not alter the contents of this bit location. The processor is operating out of spec if both this bit and the TM1 bit are set to disabled states.
                    15:14                            Reserved.
                    16                      Shared   Enhanced Intel SpeedStep Technology Enable (R/W). 1 = Enhanced Intel SpeedStep Technology enabled.
                    18                      Shared   ENABLE MONITOR FSM (R/W). See Table 35-2.
                    19                               Reserved.
                    22                      Shared   Limit CPUID Maxval (R/W). See Table 35-2. Setting this bit may cause unexpected behavior in software that depends on the availability of CPUID leaves greater than 3.
                    33:23                            Reserved.
                    34                      Shared   XD Bit Disable (R/W). See Table 35-2.
                    63:35                            Reserved.
1C9H   457        MSR_LASTBRANCH_TOS        Unique   Last Branch Record Stack TOS (R/W). Contains an index (bits 0-3) that points to the MSR containing the most recent branch record. See MSR_LASTBRANCH_0 (at 40H).
1D9H   473        IA32_DEBUGCTL             Unique   Debug Control (R/W). Controls how several debug features are used. Bit definitions are discussed in the referenced section.
1DDH   477        MSR_LER_FROM_LIP          Unique   Last Exception Record From Linear IP (R). Contains a pointer to the last branch instruction that the processor executed prior to the last exception that was generated or the last interrupt that was handled.
1DEH   478        MSR_LER_TO_LIP            Unique   Last Exception Record To Linear IP (R). This area contains a pointer to the target of the last branch instruction that the processor executed prior to the last exception that was generated or the last interrupt that was handled.
1E0H   480        ROB_CR_BKUPTMPDR6         Unique
Table 35-21. MSRs in Intel Core Solo, Intel Core Duo Processors, and Dual-Core Intel Xeon Processor LV (Contd.)
Register Address  Register Name             Shared/  Bit Description
Hex    Dec                                  Unique
                    1:0                              Reserved.
                    2                                Fast String Enable bit. (Default, enabled)
200H   512        MTRRphysBase0             Unique
201H   513        MTRRphysMask0             Unique
202H   514        MTRRphysBase1             Unique
203H   515        MTRRphysMask1             Unique
204H   516        MTRRphysBase2             Unique
205H   517        MTRRphysMask2             Unique
206H   518        MTRRphysBase3             Unique
207H   519        MTRRphysMask3             Unique
208H   520        MTRRphysBase4             Unique
209H   521        MTRRphysMask4             Unique
20AH   522        MTRRphysBase5             Unique
20BH   523        MTRRphysMask5             Unique
20CH   524        MTRRphysBase6             Unique
20DH   525        MTRRphysMask6             Unique
20EH   526        MTRRphysBase7             Unique
20FH   527        MTRRphysMask7             Unique
250H   592        MTRRfix64K_00000          Unique
258H   600        MTRRfix16K_80000          Unique
259H   601        MTRRfix16K_A0000          Unique
268H   616        MTRRfix4K_C0000           Unique
269H   617        MTRRfix4K_C8000           Unique
26AH   618        MTRRfix4K_D0000           Unique
26BH   619        MTRRfix4K_D8000           Unique
26CH   620        MTRRfix4K_E0000           Unique
26DH   621        MTRRfix4K_E8000           Unique
26EH   622        MTRRfix4K_F0000           Unique
26FH   623        MTRRfix4K_F8000           Unique
2FFH   767        IA32_MTRR_DEF_TYPE        Unique   Default Memory Types (R/W). See Table 35-2. See Section 11.11.2.1, IA32_MTRR_DEF_TYPE MSR.
400H   1024       IA32_MC0_CTL              Unique   See Section 15.3.2.1, IA32_MCi_CTL MSRs.
Table 35-21. MSRs in Intel Core Solo, Intel Core Duo Processors, and Dual-Core Intel Xeon Processor LV (Contd.)
Register Address  Register Name             Shared/  Bit Description
Hex    Dec                                  Unique
401H   1025       IA32_MC0_STATUS           Unique   See Section 15.3.2.2, IA32_MCi_STATUS MSRs.
402H   1026       IA32_MC0_ADDR             Unique   See Section 15.3.2.3, IA32_MCi_ADDR MSRs. The IA32_MC0_ADDR register is either not implemented or contains no address if the ADDRV flag in the IA32_MC0_STATUS register is clear. When not implemented in the processor, all reads and writes to this MSR will cause a general-protection exception.
404H   1028       IA32_MC1_CTL              Unique   See Section 15.3.2.1, IA32_MCi_CTL MSRs.
405H   1029       IA32_MC1_STATUS           Unique   See Section 15.3.2.2, IA32_MCi_STATUS MSRs.
406H   1030       IA32_MC1_ADDR             Unique   See Section 15.3.2.3, IA32_MCi_ADDR MSRs. The IA32_MC1_ADDR register is either not implemented or contains no address if the ADDRV flag in the IA32_MC1_STATUS register is clear. When not implemented in the processor, all reads and writes to this MSR will cause a general-protection exception.
408H   1032       IA32_MC2_CTL              Unique   See Section 15.3.2.1, IA32_MCi_CTL MSRs.
409H   1033       IA32_MC2_STATUS           Unique   See Section 15.3.2.2, IA32_MCi_STATUS MSRs.
40AH   1034       IA32_MC2_ADDR             Unique   See Section 15.3.2.3, IA32_MCi_ADDR MSRs. The IA32_MC2_ADDR register is either not implemented or contains no address if the ADDRV flag in the IA32_MC2_STATUS register is clear. When not implemented in the processor, all reads and writes to this MSR will cause a general-protection exception.
40CH   1036       MSR_MC4_CTL               Unique   See Section 15.3.2.1, IA32_MCi_CTL MSRs.
40DH   1037       MSR_MC4_STATUS            Unique   See Section 15.3.2.2, IA32_MCi_STATUS MSRs.
40EH   1038       MSR_MC4_ADDR              Unique   See Section 15.3.2.3, IA32_MCi_ADDR MSRs. The MSR_MC4_ADDR register is either not implemented or contains no address if the ADDRV flag in the MSR_MC4_STATUS register is clear. When not implemented in the processor, all reads and writes to this MSR will cause a general-protection exception.
410H   1040       MSR_MC3_CTL               Unique   See Section 15.3.2.1, IA32_MCi_CTL MSRs.
411H   1041       MSR_MC3_STATUS            Unique   See Section 15.3.2.2, IA32_MCi_STATUS MSRs.
412H   1042       MSR_MC3_ADDR              Unique   See Section 15.3.2.3, IA32_MCi_ADDR MSRs. The MSR_MC3_ADDR register is either not implemented or contains no address if the ADDRV flag in the MSR_MC3_STATUS register is clear. When not implemented in the processor, all reads and writes to this MSR will cause a general-protection exception.
413H   1043       MSR_MC3_MISC              Unique
414H   1044       MSR_MC5_CTL               Unique
415H   1045       MSR_MC5_STATUS            Unique
416H   1046       MSR_MC5_ADDR              Unique
Table 35-21. MSRs in Intel Core Solo, Intel Core Duo Processors, and Dual-Core Intel Xeon Processor LV (Contd.)
Register Address  Register Name             Shared/  Bit Description
Hex    Dec                                  Unique
417H   1047       MSR_MC5_MISC              Unique
480H   1152       IA32_VMX_BASIC            Unique   Reporting Register of Basic VMX Capabilities (R/O). See Table 35-2. See Appendix A.1, Basic VMX Information (If CPUID.01H:ECX.[bit 5])
481H   1153       IA32_VMX_PINBASED_CTLS    Unique   Capability Reporting Register of Pin-based VM-execution Controls (R/O). See Appendix A.3, VM-Execution Controls (If CPUID.01H:ECX.[bit 5])
482H   1154       IA32_VMX_PROCBASED_CTLS   Unique   Capability Reporting Register of Primary Processor-based VM-execution Controls (R/O). See Appendix A.3, VM-Execution Controls (If CPUID.01H:ECX.[bit 5])
483H   1155       IA32_VMX_EXIT_CTLS        Unique   Capability Reporting Register of VM-exit Controls (R/O). See Appendix A.4, VM-Exit Controls (If CPUID.01H:ECX.[bit 5])
484H   1156       IA32_VMX_ENTRY_CTLS       Unique   Capability Reporting Register of VM-entry Controls (R/O). See Appendix A.5, VM-Entry Controls (If CPUID.01H:ECX.[bit 5])
485H   1157       IA32_VMX_MISC             Unique   Reporting Register of Miscellaneous VMX Capabilities (R/O). See Appendix A.6, Miscellaneous Data (If CPUID.01H:ECX.[bit 5])
486H   1158       IA32_VMX_CR0_FIXED0       Unique   Capability Reporting Register of CR0 Bits Fixed to 0 (R/O). See Appendix A.7, VMX-Fixed Bits in CR0 (If CPUID.01H:ECX.[bit 5])
487H   1159       IA32_VMX_CR0_FIXED1       Unique   Capability Reporting Register of CR0 Bits Fixed to 1 (R/O). See Appendix A.7, VMX-Fixed Bits in CR0 (If CPUID.01H:ECX.[bit 5])
488H   1160       IA32_VMX_CR4_FIXED0       Unique   Capability Reporting Register of CR4 Bits Fixed to 0 (R/O). See Appendix A.8, VMX-Fixed Bits in CR4 (If CPUID.01H:ECX.[bit 5])
489H   1161       IA32_VMX_CR4_FIXED1       Unique   Capability Reporting Register of CR4 Bits Fixed to 1 (R/O). See Appendix A.8, VMX-Fixed Bits in CR4 (If CPUID.01H:ECX.[bit 5])
48AH   1162       IA32_VMX_VMCS_ENUM        Unique   Capability Reporting Register of VMCS Field Enumeration (R/O). See Appendix A.9, VMCS Enumeration (If CPUID.01H:ECX.[bit 5])
Table 35-21. MSRs in Intel Core Solo, Intel Core Duo Processors, and Dual-Core Intel Xeon Processor LV (Contd.)
Register Address  Register Name             Shared/  Bit Description
Hex         Dec                             Unique
48BH        1163  IA32_VMX_PROCBASED_CTLS2  Unique   Capability Reporting Register of Secondary Processor-based VM-execution Controls (R/O). See Appendix A.3, VM-Execution Controls (If CPUID.01H:ECX.[bit 5] and IA32_VMX_PROCBASED_CTLS[bit 63])
600H        1536  IA32_DS_AREA              Unique   DS Save Area (R/W). See Table 35-2. See Section 18.11.4, Debug Store (DS) Mechanism.
                    31:0                             DS Buffer Management Area. Linear address of the first byte of the DS buffer management area.
                    63:32                            Reserved.
C000_0080H        IA32_EFER                 Unique   See Table 35-2.
                    10:0                             Reserved.
                    11                               Execute Disable Bit Enable.
                    63:12                            Reserved.
...
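As an illustration of the bit descriptions Table 35-21 gives for IA32_MCG_STATUS (17AH), the following Python sketch decodes the three documented flags from a raw register value. The function names and the sample value are hypothetical; obtaining the value (RDMSR in a machine-check handler) is not shown.

```python
# Bit positions as listed for IA32_MCG_STATUS (17AH) in Table 35-21.
RIPV = 1 << 0  # interrupted program can be restarted at the pushed instruction pointer
EIPV = 1 << 1  # pushed instruction pointer is directly associated with the error
MCIP = 1 << 2  # a machine check has been generated


def decode_mcg_status(value):
    """Return the three documented flags of a raw IA32_MCG_STATUS value."""
    return {
        "RIPV": bool(value & RIPV),
        "EIPV": bool(value & EIPV),
        "MCIP": bool(value & MCIP),
    }


def clear_mcip(value):
    """Value to write back after processing a machine check (MCIP set to 0)."""
    return value & ~MCIP


if __name__ == "__main__":
    sample = 0b101  # made-up value: RIPV and MCIP set, EIPV clear
    print(decode_mcg_status(sample))
    print(bin(clear_mcip(sample)))
```

Clearing MCIP matters because, as the table states, a second machine check detected while the bit is still set puts the processor into a shutdown state.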
A.1
BASIC VMX INFORMATION
The IA32_VMX_BASIC MSR (index 480H) consists of the following fields:
Bits 30:0 contain the 31-bit VMCS revision identifier used by the processor. Processors that use the same VMCS revision identifier use the same size for VMCS regions (see subsequent item on bits 44:32).¹
Bit 31 is always 0.
Bits 44:32 report the number of bytes that software should allocate for the VMXON region and any VMCS region. It is a value greater than 0 and at most 4096 (bit 44 is set if and only if bits 43:32 are clear).
1. Earlier versions of this manual specified that the VMCS revision identifier was a 32-bit field in bits 31:0 of this MSR. For all processors produced prior to this change, bit 31 of this MSR was read as 0.
Bit 48 indicates the width of the physical addresses that may be used for the VMXON region, each VMCS, and data structures referenced by pointers in a VMCS (I/O bitmaps, virtual-APIC page, MSR areas for VMX transitions). If the bit is 0, these addresses are limited to the processor's physical-address width.¹ If the bit is 1, these addresses are limited to 32 bits. This bit is always 0 for processors that support Intel 64 architecture.
If bit 49 is read as 1, the logical processor supports the dual-monitor treatment of system-management interrupts and system-management mode. See Section 34.15 for details of this treatment.
Bits 53:50 report the memory type that the logical processor uses to access the VMCS for VMREAD and VMWRITE and to access the VMCS, data structures referenced by pointers in the VMCS (I/O bitmaps, virtual-APIC page, MSR areas for VMX transitions), and the MSEG header during VM entries, VM exits, and in VMX non-root operation.²
The first processors to support VMX operation use the write-back type. The values used are given in Table A-1.
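The field layout of IA32_VMX_BASIC described above can be sketched as a small Python decoder. It is a minimal illustration, not a definitive implementation: the function name, dictionary keys, and sample value are hypothetical, and the memory-type field is returned as a raw number because Table A-1 is not reproduced here.

```python
def decode_vmx_basic(value):
    """Split a raw 64-bit IA32_VMX_BASIC value into the fields listed in A.1."""
    return {
        "vmcs_revision_id": value & 0x7FFFFFFF,                    # bits 30:0
        "region_size_bytes": (value >> 32) & 0x1FFF,               # bits 44:32
        "addresses_limited_to_32_bits": bool((value >> 48) & 1),   # bit 48
        "dual_monitor_smm": bool((value >> 49) & 1),               # bit 49
        "vmcs_memory_type": (value >> 50) & 0xF,                   # bits 53:50 (Table A-1)
    }


if __name__ == "__main__":
    # Made-up sample: revision 12H, 4096-byte regions, bit 49 set, type field = 6.
    sample = (6 << 50) | (1 << 49) | (4096 << 32) | 0x12
    for name, field in decode_vmx_basic(sample).items():
        print(f"{name}: {field}")
```

Software would typically allocate `region_size_bytes` for the VMXON region and each VMCS region and write `vmcs_revision_id` into them, but those steps are outside this sketch.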
...
A.6
MISCELLANEOUS DATA
1. On processors that support Intel 64 architecture, the pointer must not set bits beyond the processor's physical address width.
2. If the MTRRs are disabled by clearing the E bit (bit 11) in the IA32_MTRR_DEF_TYPE MSR, the logical processor uses the UC memory type to access the indicated data structures, regardless of the value reported in bits 53:50 in the IA32_VMX_BASIC MSR. The processor will also use the UC memory type if the setting of CR0.CD on this logical processor (or another logical processor on the same physical processor) would cause it to do so for all memory accesses. The values of IA32_MTRR_DEF_TYPE.E and CR0.CD do not affect the value reported in IA32_VMX_BASIC[53:50].
3. Alternatively, software may map any of these regions or structures with the UC memory type. (This may be necessary for the MSEG header.) Doing so is discouraged unless necessary as it will cause the performance of software accesses to those structures to suffer. The processor will continue to use the memory type reported in the VMX capability MSR IA32_VMX_BASIC with the exceptions noted.
Bits 4:0 report a value X that specifies the relationship between the rate of the VMX-preemption timer and that of the timestamp counter (TSC). Specifically, the VMX-preemption timer (if it is active) counts down by 1 every time bit X in the TSC changes due to a TSC increment.
If bit 5 is read as 1, VM exits store the value of IA32_EFER.LMA into the IA-32e mode guest VM-entry control; see Section 27.2 for more details. This bit is read as 1 on any logical processor that supports the 1-setting of the unrestricted guest VM-execution control.
Bits 8:6 report, as a bitmap, the activity states supported by the implementation:
  Bit 6 reports (if set) the support for activity state 1 (HLT).
  Bit 7 reports (if set) the support for activity state 2 (shutdown).
  Bit 8 reports (if set) the support for activity state 3 (wait-for-SIPI).
If an activity state is not supported, the implementation causes a VM entry to fail if it attempts to establish that activity state. All implementations support VM entry to activity state 0 (active).
If bit 15 is read as 1, the RDMSR instruction can be used in system-management mode (SMM) to read the IA32_SMBASE MSR (MSR address 9EH). See Section 34.15.6.4.
Bits 24:16 indicate the number of CR3-target values supported by the processor. This number is a value between 0 and 256, inclusive (bit 24 is set if and only if bits 23:16 are clear).
Bits 27:25 are used to compute the recommended maximum number of MSRs that should appear in the VM-exit MSR-store list, the VM-exit MSR-load list, or the VM-entry MSR-load list. Specifically, if the value of bits 27:25 of IA32_VMX_MISC is N, then 512 * (N + 1) is the recommended maximum number of MSRs to be included in each list. If the limit is exceeded, undefined processor behavior may result (including a machine check during the VMX transition).
If bit 28 is read as 1, bit 2 of the IA32_SMM_MONITOR_CTL MSR can be set to 1. VMXOFF unblocks SMIs unless IA32_SMM_MONITOR_CTL[bit 2] is 1 (see Section 34.14.4).
If bit 29 is read as 1, software can use VMWRITE to write to any supported field in the VMCS; otherwise, VMWRITE cannot be used to modify VM-exit information fields.
Bits 63:32 report the 32-bit MSEG revision identifier used by the processor.
Bits 14:9 and bits 31:30 are reserved and are read as 0.
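The IA32_VMX_MISC fields described in this section can be unpacked as in the following Python sketch. The function name, dictionary keys, and sample value are hypothetical. The `tsc_increments_per_timer_tick` entry relies on the observation that bit X of a counter changes once every 2^X increments.

```python
def decode_vmx_misc(value):
    """Split a raw 64-bit IA32_VMX_MISC value into the fields listed in A.6."""
    x = value & 0x1F           # bits 4:0
    n = (value >> 25) & 0x7    # bits 27:25
    return {
        "preemption_timer_tsc_bit": x,
        "tsc_increments_per_timer_tick": 1 << x,
        "exit_stores_efer_lma": bool((value >> 5) & 1),
        "activity_hlt": bool((value >> 6) & 1),
        "activity_shutdown": bool((value >> 7) & 1),
        "activity_wait_for_sipi": bool((value >> 8) & 1),
        "rdmsr_smbase_in_smm": bool((value >> 15) & 1),
        "cr3_target_count": (value >> 16) & 0x1FF,           # bits 24:16
        "recommended_max_msr_list_len": 512 * (n + 1),
        "smm_monitor_ctl_bit2_settable": bool((value >> 28) & 1),
        "vmwrite_any_field": bool((value >> 29) & 1),
        "mseg_revision_id": (value >> 32) & 0xFFFFFFFF,      # bits 63:32
    }


if __name__ == "__main__":
    # Made-up sample: X = 5, all three activity states, 4 CR3 targets, N = 0.
    sample = 5 | (1 << 5) | (0b111 << 6) | (4 << 16) | (1 << 29) | (1 << 32)
    for name, field in decode_vmx_misc(sample).items():
        print(f"{name}: {field}")
```

A caller building MSR-load or MSR-store lists could compare each list length against `recommended_max_msr_list_len` before VM entry, since exceeding the limit may produce undefined behavior.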
...
B.1
16-BIT FIELDS
A value of 0 in bits 14:13 of an encoding indicates a 16-bit field. Only guest-state areas and the host-state area contain 16-bit fields. As noted in Section 24.11.2, each 16-bit field allows only full access, meaning that bit 0 of its encoding is 0. Each such encoding is thus an even number.
B.1.1
16-Bit Control Fields
A value of 0 in bits 11:10 of an encoding indicates a control field. These fields are distinguished by their index value in bits 9:1. Table B-1 enumerates the 16-bit control fields.
1. This field exists only on processors that support the 1-setting of the enable VPID VM-execution control.
2. This field exists only on processors that support the 1-setting of the process posted interrupts VM-execution control.
3. This field exists only on processors that support the 1-setting of the EPT-violation #VE VM-execution control.
...
B.2
64-BIT FIELDS
A value of 1 in bits 14:13 of an encoding indicates a 64-bit field. There are 64-bit fields only for controls and for guest state. As noted in Section 24.11.2, every 64-bit field has two encodings, which differ on bit 0, the access type. Thus, each such field has an even encoding for full access and an odd encoding for high access.
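The pairing of full and high encodings can be illustrated with a short Python sketch that builds both encodings of a 64-bit field from its index. It assumes the layout used throughout this appendix (access type in bit 0, index in bits 9:1, field type in bits 11:10, width in bits 14:13). The function name is hypothetical.

```python
WIDTH_64BIT = 1  # value of bits 14:13 for a 64-bit field


def encodings_64bit(index, field_type=0):
    """Return (full, high) encodings of a 64-bit VMCS field.

    index      -- field index, placed in bits 9:1
    field_type -- value for bits 11:10 (0 indicates a control field)
    """
    if not 0 <= index < 512:
        raise ValueError("index must fit in bits 9:1")
    if not 0 <= field_type < 4:
        raise ValueError("field type must fit in bits 11:10")
    full = (WIDTH_64BIT << 13) | (field_type << 10) | (index << 1)
    return full, full | 1  # bit 0 set selects high access


if __name__ == "__main__":
    for idx in (0, 6):
        full, high = encodings_64bit(idx)
        print(f"index {idx}: full {full:08X}H, high {high:08X}H")
```

The full encoding is always even and the high encoding is the next odd number, matching the rule stated above.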
B.2.1
64-Bit Control Fields
A value of 0 in bits 11:10 of an encoding indicates a control field. These fields are distinguished by their index value in bits 9:1. Table B-4 enumerates the 64-bit control fields.
Field Name                             Index        Encoding
Address of I/O bitmap A (full)         000000000B   00002000H
Address of I/O bitmap A (high)         000000000B   00002001H
Address of I/O bitmap B (full)         000000001B   00002002H
Address of I/O bitmap B (high)         000000001B   00002003H
Address of MSR bitmaps (full)¹         000000010B   00002004H
Address of MSR bitmaps (high)¹         000000010B   00002005H
VM-exit MSR-store address (full)       000000011B   00002006H
VM-exit MSR-store address (high)       000000011B   00002007H
VM-exit MSR-load address (full)        000000100B   00002008H
VM-exit MSR-load address (high)        000000100B   00002009H
VM-entry MSR-load address (full)       000000101B   0000200AH
VM-entry MSR-load address (high)       000000101B   0000200BH
Executive-VMCS pointer (full)          000000110B   0000200CH
Executive-VMCS pointer (high)          000000110B   0000200DH
Encoding 00002010H 00002011H 00002012H 00002013H 00002014H 00002015H 00002016H 00002017H 00002018H 00002019H 0000201AH 0000201BH 0000201CH 0000201DH 0000201EH 0000201FH 00002020H 00002021H 00002022H 00002023H 00002024H 00002025H 00002026H 00002027H 00002028H 00002029H 0000202AH 0000202BH
(high)3 (high)4
Posted-interrupt descriptor address (full) Posted-interrupt descriptor address VM-function controls EPT pointer (EPTP; EPT pointer (EPTP; (full)5
5
EOI-exit bitmap 0 (EOI_EXIT0; EOI-exit bitmap 0 (EOI_EXIT0; EOI-exit bitmap 1 (EOI_EXIT1; EOI-exit bitmap 2 (EOI_EXIT2; EOI-exit bitmap 2 (EOI_EXIT2; EOI-exit bitmap 3 (EOI_EXIT3; EOI-exit bitmap 3 (EOI_EXIT3; EPTP-list address (full)8
8
1. This field exists only on processors that support the 1-setting of the use MSR bitmaps VM-execution control.
2. This field exists only on processors that support the 1-setting of the use TPR shadow VM-execution control.
3. This field exists only on processors that support the 1-setting of the virtualize APIC accesses VM-execution control.
4. This field exists only on processors that support the 1-setting of the process posted interrupts VM-execution control.
5. This field exists only on processors that support the 1-setting of the enable VM functions VM-execution control.
6. This field exists only on processors that support the 1-setting of the enable EPT VM-execution control.
7. This field exists only on processors that support the 1-setting of the virtual-interrupt delivery VM-execution control.
8. This field exists only on processors that support the 1-setting of the EPTP switching VM-function control.
9. This field exists only on processors that support the 1-setting of the VMCS shadowing VM-execution control.
10. This field exists only on processors that support the 1-setting of the EPT-violation #VE VM-execution control.
...
B.3
32-BIT FIELDS
A value of 2 in bits 14:13 of an encoding indicates a 32-bit field. As noted in Section 24.11.2, each 32-bit field allows only full access, meaning that bit 0 of its encoding is 0. Each such encoding is thus an even number. ...
B.4
NATURAL-WIDTH FIELDS
A value of 3 in bits 14:13 of an encoding indicates a natural-width field. As noted in Section 24.11.2, each of these fields allows only full access, meaning that bit 0 of its encoding is 0. Each such encoding is thus an even number.
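Taken together, Sections B.1 through B.4 classify an encoding by bits 14:13 (width) and bit 0 (access type). The Python sketch below decodes those two properties from an arbitrary encoding. The function and table names are hypothetical, and the sample encodings are chosen only to exercise each width, not to name particular fields.

```python
# Width values for bits 14:13, as given in Sections B.1 through B.4.
WIDTH_NAMES = {0: "16-bit", 1: "64-bit", 2: "32-bit", 3: "natural-width"}


def describe_encoding(encoding):
    """Return (width, access type) for a VMCS field encoding."""
    width = (encoding >> 13) & 0x3
    high_access = bool(encoding & 1)
    if high_access and width != 1:
        # Only 64-bit fields have an odd (high-access) encoding.
        raise ValueError("odd encoding is only valid for a 64-bit field")
    return WIDTH_NAMES[width], "high" if high_access else "full"


if __name__ == "__main__":
    for enc in (0x0000, 0x2001, 0x4000, 0x6000):
        width, access = describe_encoding(enc)
        print(f"{enc:08X}H: {width} field, {access} access")
```

Rejecting odd encodings for the other three widths reflects the statement that those fields allow only full access.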
...