R600-Family Instruction Set Architecture
Trademarks
AMD, the AMD arrow logo, ATI, the ATI logo, AMD Athlon, AMD Opteron, and combinations thereof, are trademarks of Advanced Micro Devices, Inc.
Other product names used in this publication are for identification purposes only and may be trademarks of their
respective companies.
ProductID—Rev. 0.31—May 2007 AMD R600 Technology
Contents
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
Figures. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Revision History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Preface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
About This Book. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Audience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Contact Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi
Endian Order. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxi
Related Documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxi
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1
2 Program Organization and State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5
2.1 Program Types. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Data Flows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Geometry Program Absent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Geometry Shader Present . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Instruction Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.4 Control Flow and Clauses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.5 Instruction Types and Grouping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.6 Program State. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3 Control Flow (CF) Programs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .19
3.1 CF Microcode Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.2 Summary of Fields in CF Microcode Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.3 Clause-Initiation Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
ALU Clause Initiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Vertex-Fetch Clause Initiation and Execution. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Texture-Fetch Clause Initiation and Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.4 Allocation, Import, and Export Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Normal Exports (Pixel, Position, Parameter Cache) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Memory Reads and Writes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.5 Synchronization with Other Blocks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.6 Conditional Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Pixel State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
WHOLE_QUAD_MODE and VALID_PIXEL_MODE . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
The Condition (COND) Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Computation of Condition Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Stack Allocation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.7 Branch and Loop Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
ADDR Field. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Stack Operations and Jumps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
DirectX9 Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
DirectX10 Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Repeat Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Subroutines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
ALU Branch-Loop Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4 ALU Clauses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .39
4.1 ALU Microcode Formats. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.2 Overview of ALU Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.3 Encoding of ALU Instruction Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.4 Assignment to ALU.[X,Y,Z,W] and ALU.Trans Units . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.5 OP2 and OP3 Microcode Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.6 GPRs and Constants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Relative Addressing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Previous Vector (PV) and Previous Scalar (PS) Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Out-of-Bounds Addresses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
ALU Constants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.7 Scalar Operands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
GPR Read Port Restrictions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Constant Register Read Port Restrictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Literal Constant Restrictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Cycle Restrictions for ALU.[X,Y,Z,W] Units. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Cycle Restrictions for ALU.Trans . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Read-Port Mapping Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.8 ALU Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Instructions for All ALU Units . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Instructions for ALU.[X,Y,Z,W] Units Only . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Instructions for ALU.Trans Units Only. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.9 ALU Outputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Predicate Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
NOP Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
MOVA Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.10 Predication and Branch Counters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.11 Adjacent-Instruction Dependencies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5 Vertex-Fetch Clauses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .67
5.1 Vertex-Fetch Microcode Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6 Texture-Fetch Clauses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .69
6.1 Texture-Fetch Microcode Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.2 Constant-Fetch Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
7 Instruction Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .71
7.1 Control Flow (CF) Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
ALU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
ALU_BREAK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
ALU_CONTINUE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
ALU_ELSE_AFTER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
ALU_POP_AFTER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
ALU_POP2_AFTER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
ALU_PUSH_BEFORE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
CALL. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
CALL_FS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
CUT_VERTEX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
ELSE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
EMIT_CUT_VERTEX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
EMIT_VERTEX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
EXPORT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
EXPORT_DONE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
JUMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
KILL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
LOOP_BREAK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
LOOP_CONTINUE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
LOOP_END. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
LOOP_START . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
LOOP_START_DX10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
LOOP_START_NO_AL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
MEM_REDUCTION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
MEM_RING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
MEM_SCRATCH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
MEM_STREAM0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
MEM_STREAM1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
MEM_STREAM2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
MEM_STREAM3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
NOP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
POP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
PUSH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
PUSH_ELSE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
RETURN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
TEX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
VTX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
VTX_TC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
7.2 ALU Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
ADD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
ADD_INT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
AND_INT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
ASHR_INT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
CEIL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
CMOVE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
CMOVE_INT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
CMOVGE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
CMOVGE_INT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
CMOVGT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
CMOVGT_INT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
COS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
CUBE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
DOT4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
DOT4_IEEE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
EXP_IEEE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
FLOOR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
FLT_TO_INT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
FRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
INT_TO_FLT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
KILLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
KILLGE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
KILLGT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
KILLNE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
LOG_CLAMPED . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
LOG_IEEE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
LSHL_INT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
LSHR_INT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
MAX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
MAX_DX10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
MAX_INT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
MAX_UINT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
MAX4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
MIN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
MIN_DX10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
MIN_INT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
MIN_UINT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
MOV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
MOVA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
MOVA_FLOOR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
MOVA_INT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
MUL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
MUL_IEEE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
MUL_LIT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
MUL_LIT_D2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
MUL_LIT_M2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
MUL_LIT_M4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
MULADD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
MULADD_D2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
MULADD_M2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
MULADD_M4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
MULADD_IEEE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
MULADD_IEEE_D2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
MULADD_IEEE_M2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
MULADD_IEEE_M4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
MULHI_INT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
MULHI_UINT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
MULLO_INT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
MULLO_UINT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
NOP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
NOT_INT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
OR_INT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
PRED_SET_CLR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
PRED_SET_INV. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
PRED_SET_POP. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
PRED_SET_RESTORE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
PRED_SETE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
PRED_SETE_INT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
PRED_SETE_PUSH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
PRED_SETE_PUSH_INT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
PRED_SETGE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
PRED_SETGE_INT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
PRED_SETGE_PUSH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
PRED_SETGE_PUSH_INT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
PRED_SETGT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
PRED_SETGT_INT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
PRED_SETGT_PUSH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
PRED_SETGT_PUSH_INT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
PRED_SETLE_INT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
PRED_SETLE_PUSH_INT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
PRED_SETLT_INT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
PRED_SETLT_PUSH_INT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
PRED_SETNE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
PRED_SETNE_INT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
PRED_SETNE_PUSH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
PRED_SETNE_PUSH_INT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
RECIP_CLAMPED . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
RECIP_FF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
RECIP_IEEE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
RECIP_INT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
RECIP_UINT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
RECIPSQRT_CLAMPED. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
RECIPSQRT_FF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
RECIPSQRT_IEEE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
RNDNE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
SETE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
SETE_DX10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
SETE_INT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
SETGE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
SETGE_DX10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
SETGE_INT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
SETGE_UINT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
SETGT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
SETGT_DX10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
SETGT_INT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
SETGT_UINT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
SETNE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
SETNE_DX10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
SETNE_INT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
SIN. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
SQRT_IEEE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
SUB_INT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
TRUNC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
UINT_TO_FLT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
XOR_INT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
7.3 Vertex-Fetch Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
FETCH. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
SEMANTIC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
7.4 Texture-Fetch Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
GET_BORDER_COLOR_FRAC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
GET_COMP_TEX_LOD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
GET_GRADIENTS_H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
GET_GRADIENTS_V . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
GET_LERP_FACTORS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
GET_TEXTURE_RESINFO. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
GET_WEIGHTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
LD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
PASS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
SAMPLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
SAMPLE_C. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
SAMPLE_C_G . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
SAMPLE_C_G_L . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
SAMPLE_C_G_LB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
SAMPLE_C_G_LZ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
SAMPLE_C_L . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
SAMPLE_C_LB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
SAMPLE_C_LZ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
SAMPLE_G. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
SAMPLE_G_L . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
SAMPLE_G_LB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
SAMPLE_G_LZ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
SAMPLE_L . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
SAMPLE_LB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
SAMPLE_LZ. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
SET_GRADIENTS_H. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
SET_GRADIENTS_V. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
8 Microcode Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
8.1 Control Flow (CF) Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
CF_DWORD0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
CF_DWORD1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
CF_ALU_DWORD0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
CF_ALU_DWORD1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
CF_ALLOC_IMP_EXP_DWORD0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
CF_ALLOC_IMP_EXP_DWORD1_BUF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
CF_ALLOC_IMP_EXP_DWORD1_SWIZ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
8.2 ALU Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
ALU_DWORD0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
ALU_DWORD1_OP2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
ALU_DWORD1_OP3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
8.3 Vertex-Fetch Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
VTX_DWORD0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
VTX_DWORD1_SEM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
VTX_DWORD1_GPR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
VTX_DWORD2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
8.4 Texture-Fetch Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
TEX_DWORD0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
TEX_DWORD1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
TEX_DWORD2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
Figures
Figure 1-1. R600 Block Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Figure 1-2. Programmer’s View of R600 Dataflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Figure 4-1. ALU Microcode-Format Pair . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Figure 4-2. Organization of ALU Vector Elements in GPRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Figure 4-3. ALU Data Flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Figure 5-1. Vertex-Fetch Microcode-Format 4-Tuple. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Figure 6-1. Texture-Fetch Microcode-Format 4-Tuple . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
Tables
Table 2-1. Order of Program Execution (Geometry Program Absent) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Table 2-2. Order of Program Execution (Geometry Program Present) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Table 2-3. Basic Instruction-Related Terms. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Table 2-4. Flow of a Typical Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Table 2-5. Control-Flow State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Table 2-6. ALU State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Table 2-7. Vertex-Fetch State. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Table 2-8. Texture-Fetch and Constant-Fetch State. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Table 3-1. CF Microcode Field Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Table 3-2. Types of Clause-Initiation Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Table 3-3. Possible ARRAY_BASE Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Table 3-4. Condition Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Table 3-5. Stack Subentries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Table 3-6. Stack Space Required for Flow-Control Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Table 3-7. Branch-Loop Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Table 4-1. Index for Relative Addressing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Table 4-2. Example Function’s Loading Cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Table 4-3. ALU Instructions (ALU.[X,Y,Z,W] and ALU.Trans Units) . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Table 4-4. ALU Instructions (ALU.[X,Y,Z,W] Units Only) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Table 4-5. ALU Instructions (ALU.Trans Units Only) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Table 8-1. Summary of Microcode Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
Revision History
Date                Revision  Description
November 20, 2006   0.2       Warthman Associates edited and expanded Shader Instructions source document.
November 25, 2006   0.21      Warthman Associates edited and expanded the text.
November 26, 2006   0.22      Warthman Associates edited and expanded the text.
December 24, 2006   0.24      Warthman Associates edited and expanded the text.
December 28, 2006   0.25      Warthman Associates edited and expanded the text.
February 5, 2007    0.26      Warthman Associates edited and expanded the text.
April 5, 2007       0.27      Warthman Associates edited and expanded the text.
May 5, 2007         0.28      Warthman Associates edited and expanded the text.
May 17, 2007        0.29      Warthman Associates edited and expanded the text.
May 22, 2007        0.30      Warthman Associates edited and expanded the text.
May 30, 2007        0.31      Warthman Associates edited and expanded the text.
Preface
Audience
This document is intended for programmers writing application and system software, including
operating systems, compilers, loaders, linkers, device drivers, and system utilities. It assumes that
programmers are writing compute-intensive parallel applications, or streaming applications, including
both graphics and general-purpose computation. It assumes an understanding of general programming
practices for either graphics or general-purpose computing. See “Related Documents” on page xxxi
for descriptions of other relevant documents.
Contact Information
To submit questions or comments concerning this document, contact our technical documentation
staff at [email protected].
Organization
This document begins with an overview summarizing the R600 processor’s hardware and
programming environment for graphics computation and general-purpose computation. It then
describes the organization of an R600 program, and the program state that is maintained. Then it
describes the types of microcode instructions in detail, presenting a high-level description of the
instruction fields and discussing restrictions on the fields that must be observed. This is followed by
a chapter that contains instruction details, in alphabetic order within four broad categories. Finally, a
detailed specification of each microcode format is presented. The index at the end cross-references
topics within this volume.
The section that immediately follows defines key terms used in this document.
Definitions
Many of the following definitions assume knowledge of graphics and general-purpose programming.
*
An asterisk in a mnemonic indicates any number of alphanumeric characters in the name of a
microcode format, microcode parameter, or instruction, that define variants of the parameter.
0.0
A single-precision (32-bit) floating-point value.
1011b
A binary value, in this example a 4-bit value.
F0EAh
A hexadecimal value, in this example a 2-byte value.
[1,2]
A range that includes both the left-most and right-most values (in this case, 1 and 2).
[1,2)
A range that includes the left-most value (in this case, 1) but excludes the right-most value (in this
case, 2).
7:4
A bit range, from bit 7 to 4, inclusive. The high-order bit is shown first.
{BUF, SWIZ}
One of the multiple options listed. In this case, the string BUF or the string SWIZ.
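As an informal illustration of the notation defined above (not part of the R600 specification), the following Python sketch maps the binary and hexadecimal suffixes, the inclusive and half-open ranges, and the 7:4 bit range onto ordinary integer operations. The helper name bit_range is hypothetical.

```python
def bit_range(value: int, hi: int, lo: int) -> int:
    """Return bits hi:lo of value, inclusive (hypothetical helper)."""
    if lo < 0 or hi < lo:
        raise ValueError("expected hi >= lo >= 0")
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


binary_value = 0b1011      # 1011b, a 4-bit value
hex_value = 0xF0EA         # F0EAh, a 2-byte value

closed = list(range(1, 2 + 1))   # [1,2] includes both 1 and 2
half_open = list(range(1, 2))    # [1,2) includes 1 but excludes 2

field = bit_range(hex_value, 7, 4)   # bits 7:4 of F0EAh
```

Here field holds the second-lowest hexadecimal digit of F0EAh, because bit 7 is the high-order bit of the range and bit 4 is the low-order bit.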
A0
Same as “AR”.
absolute
Said of a displacement that references the base of a code segment rather than an instruction pointer.
Contrast with “relative”.
address stack
A stack that contains only addresses (no other state). It is used for flow control. Popping the
address stack overrides the instruction address field of a flow control instruction. The address stack
is only modified if the flow control instruction decides to jump.
aL
The “loop index”. Software can use its current value as an index by specifying this in the
INDEX_MODE field of the ALU_DWORD0 microcode format. Also called AL.
AL
Same as “aL”.
allocate
To reserve storage space for data in an output buffer (a “scratch buffer”, “ring buffer”, “stream
buffer”, or “reduction buffer”) or for data in an input buffer (a “scratch buffer” or “ring buffer”)
prior to exporting (writing) or importing (reading) data or addresses to or from that buffer. Space
is allocated only for data, not for addresses. After allocating space in a buffer, an “export”
operation can be performed.
ALU.[X,Y,Z,W] unit
An ALU unit that can perform four ALU.Trans operations in which the four operands (integers or
single-precision floating-point values) need not be related in any way. ALU.[X,Y,Z,W] units
perform “SIMD” operations. Thus, although the four operands need not be related, all four
operations execute the same instruction. The ability to operate on four unrelated operands
differentiates ALU.[X,Y,Z,W] operations from “vector” operations; in vector operations, all four
operands are typically assumed to be related. See “ALU.Trans unit” for more details.
ALU.Trans unit
An ALU unit that can perform one ALU.Trans, transcendental, or advanced integer operation on
one integer or single-precision floating-point value and replicate the result. A single instruction
can co-issue four ALU.Trans operations to an ALU.[X,Y,Z,W] unit and one (possibly complex)
operation to an ALU.Trans unit, which can then replicate its result across all four elements being
operated on in the associated ALU.[X,Y,Z,W] unit.
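The co-issue behavior described in the two entries above can be pictured with a small Python model. This is only a sketch under stated assumptions: the function names alu_xyzw and alu_trans are hypothetical, the four elements are modeled as list entries, and no real instruction encoding or timing is implied.

```python
import math
from typing import Callable, List, Sequence


def alu_xyzw(op: Callable[[float, float], float],
             src0: Sequence[float],
             src1: Sequence[float]) -> List[float]:
    """Apply one instruction to four unrelated operand pairs (X, Y, Z, W)."""
    if len(src0) != 4 or len(src1) != 4:
        raise ValueError("expected four elements per source")
    return [op(a, b) for a, b in zip(src0, src1)]


def alu_trans(op: Callable[[float], float], src: float) -> List[float]:
    """Compute one result and replicate it across all four elements."""
    return [op(src)] * 4


# Same instruction (add) on four unrelated operand pairs.
sums = alu_xyzw(lambda a, b: a + b, [1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])

# One transcendental-style operation whose result is replicated.
roots = alu_trans(math.sqrt, 16.0)
```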
AR
Address register. It is set by all MOVA* instructions and is used for constant-file relative
addressing. AR-relative addressing uses “constant waterfalling”; instructions in a clause using AR
must have their USES_WATERFALL bit set.
byte
Eight bits.
b
A bit, as in 1Mb for one megabit, or lsb for least-significant bit.
B
A byte, as in 1MB for one megabyte, or LSB for least-significant byte.
border color
Border color is specified by four 32-bit floating-point numbers (XYZW).
cache
A read-only or write-only on-chip or off-chip storage space.
CF
Control flow.
cfile
Same as “constant file” and “constant registers”.
channel
An element in a “vector”.
clamp
To hold within a stated range.
clause
A group of instructions that are of the same type (all ALU, all texture-fetch, etc.) executed as a
group. A clause is part of a “thread”.
clause size
The total number of slots required for an ALU clause. See “slot”.
clause temporaries
Temporary values stored at GPR[124,127] that do not need to be preserved past the end of a clause.
clear
To write a bit-value of 0. Compare “set”.
cleartype
A method for improving the quality of fonts on displays that contain repeating patterns of colored
sub-pixels.
command
A value written by the host processor directly to the R600. The commands contain information that
is not typically part of an application program, such as setting configuration registers, specifying
the data domain on which to operate, and initiating the start of data processing. See also, “event”.
command processor
A logic block in the R600 that receives host commands (see “command”), interprets them, and
performs the operations they indicate.
configuration registers
R600 registers that can be written and read only by the host processor through its command
interface to the R600. They are not accessible to software running on the R600.
constant cache
The extension of the “constant registers” to off-chip memory. The term
cache is a misnomer, because the storage is in off-chip memory.
constant file
Same as “constant registers”.
constant index register
Same as “AR” register.
constant registers
On-chip registers that contain constants. The registers are organized as four 32-bit elements of a
“vector”. There are 256 such registers, each one 128-bits wide. The registers can be extended in
off-chip memory, where the off-chip part is called the “kcache”. Also called “CR”, “constant
file”, “cfile”, or DirectX floating-point constant (F) registers.
constant waterfalling
Relative addressing of a constant file. Compare “waterfall”.
CP
See “command processor”.
CR
See “constant registers”.
CTM
The ATI Close-To-Metal architecture, on which implementations such as the R600 “device” are
based. For more information, see the CTM HAL Programming Guide published by AMD.
cut
Finish emitting one “primitive strip” of vertices and start emitting a new “primitive strip”. Cutting
is done in a “GS” program.
DC
See “DMA copy program”.
device
As used in the ATI Close To Metal (CTM) Guide, a device is an entire R600 GPU.
DMA
Direct-memory access.
export shader
(1) Export shader (ES). A type of program. When a “geometry shader” (GS) is active, an ES is
required; the ES is typically a vertex shader (“VS”), which can call a “fetch subroutine”.
An ES outputs only to memory, never to the “parameter cache”. (2) The ELEM_SIZE
field of the CF_ALLOC_IMP_EXP_DWORD0 microcode format. (3) The ENDIAN_SWAP field
of the VTX_DWORD2 microcode format.
F registers
DirectX floating-point constant registers. Same as “constant registers”.
FaceID
An identification number [0,5] for a D3DCUBEMAP_FACE defined in Direct3D.
fetch
To load data, using a vertex-fetch or texture-fetch instruction clause. Loads are not necessarily to
general-purpose registers (GPRs); specific types of loads may be confined to specific types of
storage destinations.
fetch program
See “FS”.
fetch subroutine
A global program for fetching vertex data. It can be called by a “vertex shader” (VS), and it runs in
the same thread context as the vertex program, and thus is treated for execution purposes as part of
the vertex program. The FS provides driver independence between the process of fetching data
required by a VS, and the VS itself. This includes having a semantic connection between the
outputs of the fetch process and the inputs of the VS.
flag
(1) A predicate bit that is modified by a CF or ALU operation and that can affect subsequent
operations. (2) An operation encoded in an instruction’s microcode format.
floating-point constant registers
Same as “constant registers”.
flush
An often ambiguous term meaning (1) writeback, if modified, and invalidate, as in flush the cache
line, or (2) invalidate, as in flush the pipeline, or (3) change a value, as in flush to zero.
fragment
A 2D (x,y) grid location and optional associated values that represent the properties of a surface. A
fragment is the result of rasterizing a “primitive”. A fragment has no vertices; instead, it is
represented by 2-dimensional (X-Y) coordinates in a raster buffer.
frame
A single two-dimensional screenful of data, or the storage space required for it.
frame buffer
Off-chip memory that stores a “frame”.
FS
See “fetch subroutine”.
GART
Graphics address remapping table. A table set up at initialization time that points to portions of
system memory that a GPU can see.
geometry program
See “geometry shader”.
geometry shader
A program that reads primitives from the VS “ring buffer”, and for each input primitive writes one
or more primitives as output to the GS ring buffer. When a geometry shader (GS) is active, an
“export shader” (ES) is required; the ES is typically a “vertex shader” (VS), which can call a
“fetch subroutine”.
GPGPU
General-purpose computing on graphics processing units.
GPR
General-purpose register. Each thread has access to 127 GPRs, 128-bits wide, four of which are
reserved as temporary registers that persist only for one ALU clause (and therefore are not
accessible to fetch or export operations). GPRs hold vectors of four 32-bit IEEE floating-point,
unsigned integer, or signed integer data elements.
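To make the 128-bit layout concrete, the sketch below packs four 32-bit elements into one 16-byte value with Python's struct module. The little-endian byte order and the X, Y, Z, W element order are assumptions chosen for illustration; the definition above does not specify them.

```python
import struct

# Four 32-bit IEEE floating-point elements (assumed order: X, Y, Z, W).
gpr_float = struct.pack("<4f", 1.0, 0.5, -2.0, 0.0)

# Four 32-bit unsigned integers and four 32-bit signed integers.
gpr_uint = struct.pack("<4I", 0, 1, 2, 0xFFFFFFFF)
gpr_int = struct.pack("<4i", -1, 0, 1, 2)

# Each packed register occupies 16 bytes, which is 128 bits.
bits_per_gpr = len(gpr_float) * 8

# Unpacking recovers the four elements of the vector.
x, y, z, w = struct.unpack("<4f", gpr_float)
```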
GPR count
The number of GPRs that a thread can use. The same count applies to all threads, and it is modified
by the host processor in a configuration register which is not accessible to R600 software.
GPU
Graphics processing unit. The R600 is a GPU.
GRB
Graphics register bus.
GRBM
Graphics register bus manager.
GS
See “geometry shader”.
HAL
Hardware abstraction layer.
iff
If and only if.
import
See “export”.
int(2)
A 2-bit field that specifies an integer value.
instruction
A computing function specified by the *_INST_ field of a microcode format. For example, the
mnemonic CF_INST_JUMP is a jump instruction specified by the CF_DWORD[0,1] microcode-
format pair. All instructions have an *_INST_ prefix in their mnemonic. To simplify reading, most
references to instructions throughout this manual omit the *_INST_ prefix. Compare “opcode”,
“operation”, “slot”, and “instruction group”.
instruction group
A set of one to seven instructions. Each instruction controls one of the five ALUs
(ALU.[X,Y,Z,W] and ALU.Trans), and up to two additional slots may be used for literal constants.
Compare “instruction”.
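The slot arithmetic in this definition can be checked with a short Python sketch. The function name group_slot_count is hypothetical, and the model counts only slots; it does not encode which ALU each instruction targets.

```python
MAX_ALU_INSTRUCTIONS = 5   # ALU.[X,Y,Z,W] plus ALU.Trans
MAX_LITERAL_SLOTS = 2      # additional slots for literal constants


def group_slot_count(alu_instructions: int, literal_slots: int = 0) -> int:
    """Return the total number of slots used by one instruction group."""
    if not 1 <= alu_instructions <= MAX_ALU_INSTRUCTIONS:
        raise ValueError("a group holds one to five ALU instructions")
    if not 0 <= literal_slots <= MAX_LITERAL_SLOTS:
        raise ValueError("a group holds at most two literal-constant slots")
    return alu_instructions + literal_slots


smallest = group_slot_count(1)      # one ALU instruction, no literals
largest = group_slot_count(5, 2)    # five ALU instructions, two literal slots
```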
ISA
Instruction set architecture.
kcache
A memory area containing “waterfall” (off-chip) constants. The cache lines of these constants
can be locked. The “constant registers” are the 256 on-chip constants.
kernel
A small program that is run repeatedly on a stream of data. A “shader” program is one type of
kernel. Unless otherwise specified, an R600 “program” is a kernel.
kill
To prevent rendering of a “pixel”.
lerp
Linear interpolation.
LI
See “loop index”.
LIT
An operation that computes diffuse and specular light components based on an input vector
containing information about shininess and normals to the light. It uses Blinn's lighting equation.
LOD
Level of detail.
loop counter
A hardware-maintained register that is initialized by hardware to zero at the beginning of a loop
and that counts in steps of one. Also called “loop iterator”. Compare “loop index”.
loop increment
The step value added to the “loop index” at each iteration of a loop. Software specifies it with the
CF_CONST field of the CF_DWORD1 microcode format.
loop index initializer
The beginning value of the “loop index”. Software specifies it with the CF_CONST field of the
CF_DWORD1 microcode format.
loop index
The “aL” register. A hardware-maintained register that is initialized by software to a beginning
value (see “loop index initializer”) with the CF_CONST field of the CF_DWORD1 microcode
format. Hardware increments the loop index in “loop increment” steps. Compare “loop counter”.
loop iterator
Same as “loop counter”.
loop register
Same as “aL” and “loop index”.
loop trip count
The maximum number of iterations in a loop. Software specifies it with the CF_CONST field of
the CF_DWORD1 microcode format.
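The relationship between the loop counter, the loop index, and the loop trip count can be sketched in Python as follows. The function and argument names are hypothetical, and the sketch assumes the loop always runs for the full trip count, with no early exit.

```python
from typing import List, Tuple


def simulate_loop(trip_count: int, index_init: int, index_inc: int) -> List[Tuple[int, int]]:
    """Return the (loop counter, loop index) pair seen by each iteration."""
    seen = []
    counter = 0          # loop counter: starts at zero, counts in steps of one
    index = index_init   # loop index (aL): starts at the loop index initializer
    while counter < trip_count:
        seen.append((counter, index))
        counter += 1
        index += index_inc   # loop increment
    return seen


iterations = simulate_loop(trip_count=3, index_init=10, index_inc=4)
```

With a trip count of 3, an initializer of 10, and an increment of 4, the counter runs 0, 1, 2 while the index runs 10, 14, 18.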
lsb
Least-significant bit.
LSB
Least-significant byte.
microcode format
An encoding format whose fields specify instructions and associated parameters. Microcode
formats are used in sets of two or four. For example, the two mnemonics, CF_DWORD[0,1]
indicate a microcode-format pair, CF_DWORD0 and CF_DWORD1. The microcode formats and
all of their fields are described in Section 8 on page 259.
mipmaps
A group of related texture maps (bitmaps) at various sizes. Each texture map is the same image,
optimized for the size of the map.
MRT
See “multiple render target”.
msb
Most-significant bit.
MSB
Most-significant byte.
multiple render target
One of multiple areas of local GPU memory, such as a “frame buffer”, to which a graphics pipeline
writes data.
octword
Eight words, or 16 bytes, or 128 bits. Same as “double quadword”.
opcode
The numeric value of the CF_INST field of an “instruction”. For example, the opcode for the
CF_INST_JUMP instruction is decimal 16 (10h).
operation
The function performed by an “instruction”.
page
A program-controlled cache, backing up processor-accessible memory.
PARAM
A parameter, or relating to the parameter cache.
parameter
(1) A graphics parameter stored in the “parameter cache”. (2) An attribute of an “instruction” and
specified in the same microcode format as the instruction.
parameter cache
An on-chip buffer that holds vertex parameters associated with entries in the “position buffer”.
pixel
(1) The result of placing a “fragment” in a “frame buffer”. (2) The smallest resolvable unit of a
graphic image. It has a specific luminescence and color.
PIXEL
Related to the pixel exports to a “frame buffer”.
pixel program
See “pixel shader”.
pixel shader
A program that (a) reads rasterized data from the “position buffer”, “parameter cache”, and “vertex
geometry translator” (VGT), (b) processes individual pixel quads (see “quad”), and (c) writes
output to up to eight local-memory buffers, called multiple render targets (see “MRT”), including
targets such as a “frame buffer”.
pop
Write “stack” entries to their associated hardware-maintained control-flow state. The
POP_COUNT field of the CF_DWORD1 microcode format specifies the number of stack entries
to pop for instructions that pop the stack. Compare “push”.
position buffer
An off-chip buffer that holds vertex-position data associated with entries in the “parameter cache”.
POS
A position of a vertex, or relating to the “position buffer”.
PRED_SET*
An OP2_INST_PRED_SET* instruction of the ALU_DWORD1_OP2 microcode format.
predicate counter
A counter associated with an “execute mask” that is set in the ALU clause but is used in CF
instructions.
predicate register
A register containing predicate bits. The bits are set or cleared by ALU instructions as the result of
evaluating some condition, and the bits are subsequently used either to mask writing an ALU result
or as a condition itself.
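One of the two uses named above, masking the write of an ALU result, can be sketched as follows. The function name masked_write and the one-bit-per-element layout are assumptions made for illustration only.

```python
from typing import List, Sequence


def masked_write(dest: Sequence[float],
                 result: Sequence[float],
                 predicate: Sequence[bool]) -> List[float]:
    """Write result[i] into dest[i] only where predicate[i] is set."""
    if not (len(dest) == len(result) == len(predicate)):
        raise ValueError("all inputs must have the same length")
    return [r if p else d for d, r, p in zip(dest, result, predicate)]


# A condition evaluated by an earlier instruction sets or clears each bit.
values = [3.0, -1.0, 0.0, 7.5]
predicate = [v > 0.0 for v in values]

# Elements with a cleared predicate bit keep their old destination value.
updated = masked_write([0.0, 0.0, 0.0, 0.0], [30.0, -10.0, 0.0, 75.0], predicate)
```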
predicate mask
A mask that is valid within a single ALU clause.
primitive
(1) A point, line segment, or polygon before rasterization. It has vertices specified by geometric
coordinates. Additional data can be associated with vertices by means of linear interpolation across
the primitive. (2) A group of one, two, or three vertices that covers some number of fragments or
pixels (points on an integer grid).
primitive strip
In DirectX, a series of connected triangles. Compare “cut”.
processor
Unless otherwise stated, the R600 “GPU”.
program
Unless otherwise specified, a program is a “kernel” that can run on the R600. A “shader” program
is a type of “kernel”.
PS
(1) Previous scalar register. It contains the previous result from an ALU.Trans unit within a given
ALU clause. (2) See “pixel shader”. (3) The PRED_SEL field of the ALU_DWORD0 microcode
format.
push
Read hardware-maintained control-flow state and write its contents onto the “stack”. Compare
“pop”.
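The push and pop definitions can be pictured with a minimal stack model. Everything here is a sketch: the class name and the active_mask field are hypothetical, the control-flow state is reduced to a plain dictionary, and the assumption that popping several entries leaves the state of the last entry popped is made for illustration, not taken from this manual.

```python
from typing import Dict, List


class ControlFlowModel:
    """Toy model of control-flow state with a push/pop stack."""

    def __init__(self, state: Dict[str, int]) -> None:
        self.state = dict(state)
        self.stack: List[Dict[str, int]] = []

    def push(self) -> None:
        """Read the current state and write a copy onto the stack."""
        self.stack.append(dict(self.state))

    def pop(self, pop_count: int = 1) -> None:
        """Write pop_count stack entries back to the state, newest first."""
        if not 0 <= pop_count <= len(self.stack):
            raise ValueError("pop_count exceeds the stack depth")
        for _ in range(pop_count):
            self.state = self.stack.pop()


cf = ControlFlowModel({"active_mask": 0b1111})
cf.push()                          # save the outer state
cf.state["active_mask"] = 0b0011
cf.push()                          # save the nested state
cf.state["active_mask"] = 0b0001
cf.pop(pop_count=2)                # unwind both levels in one step
```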
PV
Previous vector register. It contains the previous 4-element vector result from an ALU.[X,Y,Z,W]
unit within a given clause.
quad
(1) Four pixel-data elements arranged in a 2-by-2 array. (2) Four pixels representing the four
vertices of a quadrilateral. (3) Same as an independent quad in OpenGL v2.1.
quadword
Four words, or eight bytes, or 64 bits.
RB
See “ring buffer”.
reduction buffer
An off-chip buffer used to help compute results across multiple threads, such as accumulate
operations.
relative
Referencing with a displacement (also called offset) from an index register, rather than from the
base address of a program. Contrast with “absolute”.
repeat loop
A loop that does not maintain a loop index. Repeat loops are implemented with the
LOOP_START_NO_AL and LOOP_END instructions.
resource
DirectX 9 supports two kinds of resources: buffer and texture. Buffer resources hold a collection of
vectors (see “vector”). Texture resources hold a collection of texels (see “texel”).
ring buffer
An on-chip buffer that indexes itself automatically in a circle. There is a “VS” ring buffer and a “GS”
ring buffer.
Rsvd
Reserved.
SC
Scan converter.
scalar
A single data element, as opposed to a complete four-element “vector”.
scalar ALU
See “ALU.Trans unit”.
scratch buffer
A variable-sized space in off-chip memory that stores some of the “GPR”.
scratch memory
Same as “scratch buffer”.
semantic table
A table that specifies GPRs to which vertex data is to be written.
sequencer
R600 control logic.
set
To write a bit-value of 1. Compare “clear”.
shader
A program or hardware block that defines the graphical surface properties of an object. The
following types of shader programs are common: “vertex shader”, “fetch subroutine”, “export
shader”, “geometry shader”, and “pixel shader”.
SIMD
Single instruction, multiple data. See “ALU.[X,Y,Z,W] unit” and “SIMD pipeline”.
SIMD pipeline
A hardware block (also called a SIMD block or a slice) consisting of five ALUs, one ALU
instruction decoder and issuer, one ALU constant fetcher, and support logic. All parts of a SIMD
pipeline receive the same instruction and operate on a “thread group”. Each SIMD pipeline can
process a separate set of instructions, called a “kernel” or “shader”.
slice
Same as “SIMD pipeline”.
slot
A position, in an “instruction group”, for an “instruction” or an associated literal constant. An
ALU instruction group consists of between one and seven slots, each 64 bits wide. The size of an
ALU clause is the total number of slots required for the clause.
slot size
64 bits.
SMX
Shader memory exporter. A hardware block in the R600 processor.
software-visible
Readable and/or writable by a program running on an R600 processor or the host.
SP
Shader Pipeline. A set of arithmetic and logic units (ALUs) and associated logic. Compare “SIMD
pipeline”.
SPI
Shader pipe interpolator. A hardware block in the R600 processor. It is instrumental in loading
threads for execution.
stack
The R600 hardware maintains a single, multi-entry stack for saving and restoring control-flow
state during the execution of certain instructions that alter the control flow. The stack entries store
the state of nested loops, pixels, predicates, and other execution details. Compare “push” and
“pop”.
stream buffer
A variable-sized space in off-chip memory that holds output data. It is an output-only buffer,
configured by the host processor. It does not store inputs from off-chip memory to the R600
processor.
strip
See “primitive strip”.
swizzle
To copy or move any element in a source vector to any element-position in a result vector.
SX
Shader exporter.
TA
Texture address.
TB
Thread buffer.
TC
Texture cache.
texel
Texture element. A texel is the basic unit of texture. The smallest addressable unit of a texture
map.
texture buffer
A read-only portion of off-chip memory that contains texture data.
thread
One invocation of a program executing on a set of vectors. The set of vectors can represent one
vertex, one primitive, or one pixel. Each thread has its own unique state.
thread group
All of the threads (see “thread”) that are simultaneously executing on a “SIMD pipeline”.
TP
Texture pipe.
trip count
Same as “loop trip count”.
VC
Vertex cache.
vector
(1) A set of up to four values of the same data type, each of which is an “element”. One instruction
executing in a “SIMD pipeline” operates on vectors containing 64 vertices, primitives, pixels, or
other data, related or unrelated, in a fixed number of clock cycles. A vector operation is the basic
unit of R600 work. (2) See “ALU.[X,Y,Z,W] unit”.
vertex
A set of x,y (2D) coordinates.
vertex geometry translator
A hardware block that translates vertex geometry.
vertex program
See “vertex shader”.
vertex shader
A program that reads vertices, processes them, and outputs to either the VS “ring buffer” or the
“parameter cache” and “position buffer”, depending on whether a “geometry shader” (GS) is active. It does
not introduce new primitives. When a GS is active, a vertex shader is a type of “export shader”
(ES). A vertex shader can call a “fetch subroutine” (FS), which is a special global program for
fetching vertex data; the FS is treated, for execution purposes, as part of the VS. The FS provides
driver independence between the process of fetching data required by a VS, and the VS itself.
vfetch
Vertex fetch.
VGT
See “vertex geometry translator”.
VP
(1) Vector processor. (2) “vertex program”.
VS
See “vertex shader”.
waterfall
To use the address register (AR) for indexing the GPRs. Waterfall behavior is determined by the
“configuration registers”.
word
Two bytes, or 16 bits.
Endian Order
The R600 architecture addresses memory and registers using little-endian byte-ordering and bit-
ordering. Multi-byte values are stored with their least-significant (low-order) byte (LSB) at the lowest
byte address, and they are illustrated with their LSB at the right side. Byte values are stored with their
least-significant (low-order) bit (lsb) at the lowest bit address, and they are illustrated with their lsb at
the right side.
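As a small illustration of the byte ordering described above, the following Python sketch stores a 32-bit value in little-endian order using the standard `struct` module. The sample value 0x11223344 is arbitrary and is not taken from this manual.

```python
import struct

value = 0x11223344  # arbitrary 32-bit doubleword chosen for illustration

# "<I" packs an unsigned 32-bit integer in little-endian byte order.
stored = struct.pack("<I", value)

# The least-significant byte (0x44) lands at the lowest byte address (+0).
for offset, byte in enumerate(stored):
    print(f"byte address +{offset}: 0x{byte:02X}")

# Reading the bytes back with the same ordering recovers the original value.
(restored,) = struct.unpack("<I", stored)
```

The byte at offset +0 is 0x44 and the byte at offset +3 is 0x11, matching the rule that the low-order byte sits at the lowest address.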
Related Documents
• CTM HAL Programming Guide. Published by AMD.
• ATI Intermediate Language (IL) Compiler Reference Manual. Published by AMD.
1 Introduction
The R600 processor implements a parallel microarchitecture that provides an excellent platform not
only for computer graphics applications but also for general-purpose streaming applications. Any
data-intensive application that can be mapped to a 2D matrix is a potential candidate for running on the
R600.
Figure 1-1 shows a block diagram of the R600 processor. It includes a data-parallel processor (DPP)
array, a command processor, a memory controller, and other logic (not shown). The R600 command
processor reads commands that the host has written to memory-mapped R600 registers in the system-
memory address space, and the command processor sends hardware-generated interrupts to the host
when the command is completed. The R600 memory controller has direct access to all of R600 local
memory and the host-specified areas of system memory. In addition to satisfying read and write
requests, the memory controller performs the functions of a direct-memory access (DMA) controller,
including computing memory-address offsets based on the format of the requested data in memory.
[Figure 1-1: R600 block diagram. Labeled elements: host application; system-memory address space containing the memory-mapped R600 registers; R600 command processor; memory controller. Labeled flows: commands, instructions, constants, inputs, outputs, and interrupts to the host.]
A host application cannot write to R600 local memory directly, but it can command the R600 to copy
programs and data from system memory to R600 memory, or vice versa. A complete application for
the R600 includes two parts: a program running on the host processor, and programs—called kernels
or shaders—running on the R600 processor. The R600 programs are controlled by host commands,
which do such things as set R600-internal base-address and other configuration registers, specify the
data domain on which the R600 is to operate, invalidate and flush caches on the R600, and cause the
R600 to begin execution of a program. The R600 driver program runs on the host.
The DPP array is the heart of the R600 processor. The array is organized as a set of SIMD pipelines,
each independent of the others, that operate in parallel on streams of 32-bit floating-point or integer
data. The SIMD pipelines can process data or, via the memory controller, transfer data to or from
memory. Computation in a SIMD pipeline can be made subject to a condition. Outputs written to
memory can also be made subject to a condition. R600 software stores data to memory by first
allocating space in a memory buffer and then exporting data from GPRs to that buffer. The R600
export facility is also used to import (read) data from memory.
Host commands request a SIMD pipeline to execute a kernel by passing it an identifier pair (x, y), a
conditional value, and the location in memory of the kernel code. Upon receiving a request, a SIMD
pipeline loads instructions and data from memory, begins execution, and continues until the end of the
kernel. As kernels are running, the R600 hardware automatically fetches instructions and data from
memory into on-chip caches; R600 software plays no role in this. In addition, R600 software can load
data from off-chip memory into on-chip GPRs and caches.
Conceptually, each SIMD pipeline maintains a separate interface to memory, consisting of index pairs
and a field identifying the type of request (program instruction, floating-point constant, integer
constant, boolean constant, input read, or output write). The index pairs for inputs, outputs, and
constants are specified by the requesting R600 instructions from hardware-maintained program state
in the pipelines.
R600 programs do not support exceptions, interrupts, errors, or any other events that can interrupt their
pipeline operation. In particular, they do not support IEEE floating-point exceptions. The interrupts
shown in Figure 1-1 from the command processor to the host represent hardware-generated interrupts
for signalling command-completion and related management functions.
Figure 1-2 shows a programmer’s view of dataflow for three versions of an R600 application. The top
version (a) is a graphics application that includes a geometry shader program and a DMA copy
program. The middle version (b) is a graphics application without a geometry shader and DMA copy
program. The bottom version (c) is a general-purpose application. The square blocks represent
programs running on the DPP array. The circles and cloud represent non-programmable hardware
functions. For graphics applications, each block in the chain processes a particular kind of data and
passes its result on to the next block. For general-purpose applications, only one processing block
performs all computation.
[Figure 1-2: programmer’s view of R600 dataflow for (a) a graphics application with a geometry shader and DMA copy program, (b) a graphics application without them, and (c) a general-purpose application. Data sources and sinks shown: vertex data (local memory), texture data (local or system memory), input data (local or system memory), frame data (local memory), and output data (local memory).
Legend: DC = DMA Copy Program; GS = Geometry Shader Program; PaC = Parameter Cache; PoC = Position Cache; PS = Pixel Shader Program; RB = Ring Buffer; VS = Vertex Shader Program.]
The dataflow sequence starts by reading 2D vertices, 2D textures, or other 2D data from local R600
memory or system memory, and it ends by writing 2D pixels or other 2D data results to local R600
memory. The R600 processor hides memory latency by keeping track of potentially hundreds of
threads in different stages of execution, and by overlapping compute operations with memory-access
operations.
The remainder of this manual describes the instruction set architecture (ISA) supported by the R600
processor. For more information about the host commands used to control the R600 processor, see the
CTM HAL Programming Guide.
This processing configuration begins with the VS program sending a pointer to a buffer in local
memory containing up to 64 vertex indices. The R600 hardware then groups the vectors for these
vertices in its input buffers. When all vertices are ready to be processed, the R600 allocates GPRs and
thread space for the processing of each of the 64 vertices, based on compiler-provided sizes. The VS
program calls the fetch subroutine (FS) program, which fetches vertex data into GPRs and returns
control to the VS program. Then, the transform and lighting (and whatever else) part of the VS
program runs. The VS program allocates space in the position buffer and exports positions (XYZW).
Before exiting, the VS program allocates parameter-cache and position-buffer space and exports
parameters and positions for each vertex. The program exits, and the R600 deallocates its GPR space.
When the VS program completes, the pixel shader (PS) program begins. The R600 hardware
assembles primitives from data in the position buffer and the vertex geometry translator (VGT),
performs scan conversion and final pixel interpolation, and loads these values into GPRs. The PS
program then runs for each pixel. Upon completion, the program exports data to a frame buffer, and
the R600 deallocates its GPR space.
In this processing configuration, the R600 hardware loads input indices or primitive and vertex IDs
from the vertex geometry translator (VGT) into GPRs. Then, the VS program fetches the vertex or
vertices needed, and the transform and lighting (and whatever else) part of the VS program runs. The
VS program ends by writing vertices out to the VS ring buffer.
Next, the GS program reads multiple vertices from the VS ring buffer, executes its geometry functions,
and outputs one or more vertices per input vertex to the GS ring buffer. Whereas a VS program can
only write a single vertex per single input, a GS program can write a large number of vertices per
single input. Every time a GS program outputs a vertex, it indicates to the VGT that a new
vertex has been output (using EMIT_* instructions; see footnote 1). The VGT counts the total number of vertices
created by each GS program. The GS program divides primitive strips by issuing CUT_VERTEX
instructions. The GS program ends when all vertices have been output. No positions or parameters are
exported.
Then, the DC program reads the vertex data from the GS ring buffer and transfers this data to the
parameter cache and position buffer using one of the MEM* memory export instructions. The DC
program exits, and the R600 deallocates the GPR space.
Finally, the PS program runs. The R600 assembles primitives from data in the position buffer,
parameter cache, and VGT. The hardware performs scan conversion and final pixel interpolation, and
loads these values into GPRs. The PS program then runs. When the program reaches the end
of the data, it exports the data to a frame buffer or other render target (up to eight) using EXPORT
instructions. The program exits upon execution of an EXPORT_DONE instruction, and the processor
deallocates GPR space.
1. An asterisk (*) after a mnemonic string indicates that there are additional characters in the string that define variants.
Control flow instructions constitute the main program. Jump statements, loops, and subroutine calls
are expressed directly in the control flow part of the program. Control flow instructions also include
mechanisms to synchronize operations and indicate when a clause has completed. Finally, the control
flow instructions are required for buffer allocation in, and writing to, a program block’s output buffer.
Some program types (VS, GS, DC, PS) have specific control flow instructions for synchronization
with other blocks.
Each clause, invoked by a control flow instruction, is a sequential list of instructions of limited length
(for the maximum length, see sections on individual clauses, below). Clauses contain no flow control
statements, but ALU clause instructions can apply a predicate on a per-instruction basis. Instructions
within a single clause execute serially. Multiple clauses of a program may execute in parallel if they
contain instructions of different types and the clauses are independent of one another (such parallel
execution is invisible to the programmer except for increased performance).
ALU clauses contain instructions for performing operations in each of the five ALUs
(ALU.[X,Y,Z,W] and ALU.Trans) including setting and using predicates, and pixel kill operations
(see Section 4.8.1 on page 57). Texture-fetch clauses contain instructions for performing texture and
constant-fetch reads from memory. Vertex-fetch clauses are devoted to obtaining vertex data from
memory. Systems lacking a vertex cache can perform vertex-fetch operations in a texture clause
instead.
A predicate is a bit that is set or cleared as the result of evaluating some condition, and is subsequently
used either to mask writing an ALU result or as a condition itself. There are two kinds of predicates,
both of which are set in an ALU clause. The first is a single predicate local to the ALU clause itself.
Once computed, the predicate can be referred to in a subsequent instruction to conditionally write an
ALU result to the indicated general purpose register or registers. The second type is a bit in a predicate
stack. An ALU clause computes the predicate bits in the stack and manipulates the stack. A predicate
bit in the stack may be referred to in a control-flow instruction to induce conditional branching.
• Signal that the geometry shader (GS) has finished exporting a vertex, and optionally the end of a
primitive strip as well.
The end of the CF program is marked by setting the END_OF_PROGRAM bit in the last CF
instruction in the program. The CF program terminates after the end of this instruction, regardless of
whether the instruction is conditionally executed.
CF_ALU_DWORD1 +4
CF_ALU_DWORD0 +0
• CF microcode instructions that reserve storage space in an input or output buffer, write data from
GPRs into an output buffer, or read data from an input buffer into GPRs use the following memory
layout:
31 24 23 16 15 8 7 0
CF_ALLOC_IMP_EXP_DWORD1_{BUF, SWIZ} +4
CF_ALLOC_IMP_EXP_DWORD0 +0
CF_DWORD1 +4
CF_DWORD0 +0
A few fields are available in the majority of CF microcode formats. These include:
• END_OF_PROGRAM Field—A program will terminate after executing an instruction with the
END_OF_PROGRAM bit set, even if the instruction is conditional and no pixels are active during
the execution of the instruction. The stack must be empty when the program encounters this bit;
otherwise, results are undefined when the program restarts on new data or a new program starts.
Thus, instructions inside of loops or subroutines must not be marked with END_OF_PROGRAM.
• BARRIER Field—This expresses dependencies between instructions and allows parallel
execution. If the BARRIER bit is set, all prior instructions will complete before the current
instruction begins. If the BARRIER bit is cleared, the current instruction may co-issue with other
instructions. Instructions of the same clause type never co-issue, but instructions in a texture-fetch
clause and an ALU clause, for example, can co-issue if the BARRIER bit is cleared. If in doubt, set
the BARRIER bit; results are identical whether it is set or not, but using it only when required can
increase program performance.
• VALID_PIXEL_MODE Field—If set, instructions in the clause are executed as if invalid pixels are
inactive. This field is the complement to the WHOLE_QUAD_MODE field. Only one of
WHOLE_QUAD_MODE or VALID_PIXEL_MODE should be set at any one time.
• WHOLE_QUAD_MODE Field—If set, instructions in the clause are executed as if all pixels are
active and valid. This field is the complement to the VALID_PIXEL_MODE field. Only one of
WHOLE_QUAD_MODE or VALID_PIXEL_MODE should be set at any one time.
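The constraints on these common fields can be expressed as a simple consistency check. The following Python sketch is illustrative only: the `CfFlags` class, the `check_cf_flags` function, and the `nesting_depth` parameter (a count of enclosing loops and subroutines) are hypothetical names introduced here, not part of the R600 microcode or of any AMD tool.

```python
from dataclasses import dataclass


@dataclass
class CfFlags:
    """Hypothetical container for the common CF fields discussed above."""
    end_of_program: bool = False
    barrier: bool = True  # conservative default: "if in doubt, set the BARRIER bit"
    whole_quad_mode: bool = False
    valid_pixel_mode: bool = False


def check_cf_flags(flags, nesting_depth=0):
    """Return a list of rule violations for one CF instruction.

    nesting_depth is an assumed count of loops/subroutines enclosing the
    instruction; the hardware does not expose such a value in this form.
    """
    problems = []
    if flags.whole_quad_mode and flags.valid_pixel_mode:
        problems.append("WHOLE_QUAD_MODE and VALID_PIXEL_MODE are both set")
    if flags.end_of_program and nesting_depth > 0:
        problems.append("END_OF_PROGRAM set inside a loop or subroutine")
    return problems


print(check_cf_flags(CfFlags(whole_quad_mode=True, valid_pixel_mode=True)))
```

A check of this kind could run in an assembler or validator before microcode is emitted; it does not model how the hardware itself reacts to an invalid combination.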
a. These instructions use the CF_ALU_DWORD[0,1] microcode formats, described in Section 8.1 on page 261.
b. See Section 4.3 on page 40 for a description of ALU slots.
c. These instructions use the CF_DWORD[0,1] microcode formats, described in Section 8.1 on page 261.
d. These instructions use the CF_DWORD[0,1] microcode formats, described in Section 8.1 on page 261.
Each memory write may be swizzled with the fields SEL_[X,Y,Z,W]. To disable writing an element,
write SEL_[X,Y,Z,W] = SEL_MASK.
The RW_GPR and RW_REL fields indicate the GPR address (first_gpr) to read the first value from, or
write the first value to (the GPR address may be relative to the loop register). The value
(BURST_COUNT + 1) * (ELEM_SIZE + 1) is the number of outputs, in doublewords, being written.
The BURST_COUNT and ELEM_SIZE fields store the actual number minus one. ELEM_SIZE must
be three (representing four doublewords) for scratch and reduction buffers, and it is intended that
ELEM_SIZE = 0 (doubleword) for stream-out and ring buffers.
The memory address is based on the value in the ARRAY_BASE field (see Table 3-3 on page 26).
If the TYPE field is set to EXPORT_*_IND (use_index == 1), then the value contained in the register
specified by the INDEX_GPR field, multiplied by (ELEM_SIZE + 1), is added to this base. The final
equation for the first address in memory to read or write from (in doublewords) is:
first_mem = (ARRAY_BASE + use_index * GPR[INDEX_GPR]) * (ELEM_SIZE + 1)
The ARRAY_SIZE field specifies a point at which the burst will be clamped; no memory will be read
or written past (ARRAY_BASE + ARRAY_SIZE) * (ELEM_SIZE + 1) doublewords. The exact units
of ARRAY_BASE and ARRAY_SIZE differ depending on the memory type; for scratch and
reduction buffers, both are in units of four doublewords (128 bits); for stream and ring buffers, both are
in units of one doubleword (32 bits).
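The address arithmetic above can be worked through numerically. The following Python sketch evaluates the first_mem equation, the burst size, and the clamp limit exactly as written in the text. The function name `export_addresses`, its parameter names, and the way the clamped transfer count is derived from the limit are assumptions made for illustration; the text only states that nothing is accessed past the limit.

```python
def export_addresses(array_base, array_size, elem_size, burst_count,
                     use_index=0, index_value=0):
    """Return (first_mem, requested, limit, transferred), all in doublewords.

    elem_size and burst_count are the raw field values, which store the
    actual number minus one. index_value stands in for GPR[INDEX_GPR].
    """
    elem_dwords = elem_size + 1
    # first_mem = (ARRAY_BASE + use_index * GPR[INDEX_GPR]) * (ELEM_SIZE + 1)
    first_mem = (array_base + use_index * index_value) * elem_dwords
    # Number of doublewords the burst asks for.
    requested = (burst_count + 1) * elem_dwords
    # No access goes past (ARRAY_BASE + ARRAY_SIZE) * (ELEM_SIZE + 1).
    limit = (array_base + array_size) * elem_dwords
    # Assumption: the burst is simply truncated at the limit.
    transferred = max(0, min(first_mem + requested, limit) - first_mem)
    return first_mem, requested, limit, transferred


# Scratch-buffer style example: ELEM_SIZE = 3 (four doublewords per element).
print(export_addresses(array_base=16, array_size=8, elem_size=3,
                       burst_count=1, use_index=1, index_value=7))
```

With these sample values the burst starts at doubleword 92 and requests 8 doublewords, but the limit is doubleword 96, so under the truncation assumption only 4 doublewords are transferred.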
Indexed GPRs may stray out of bounds; if the index takes a GPR address out of bounds, then the rules
specified for ALU GPR reads and writes govern, except for a memory read in which the result is
written to GPR0. See Section 4.6.3 on page 44.
active state are overwritten with the stack contents on each pop, without regard for the current active
state, but when VALID_PIXEL_MODE is set the invalid pixels are deactivated even though they were
active going into the conditional scope.
The following steps loosely illustrate how the per-pixel state may be updated during a CF instruction
that does not unconditionally pop the stack:
1. Evaluate the condition test for each pixel using current state, COND, WHOLE_QUAD_MODE,
and VALID_PIXEL_MODE.
2. Execute the CF instruction for pixels passing the condition test.
3. If the CF instruction is a PUSH, push per-pixel active state onto the stack before updating the
state.
4. If the CF instruction updates the per-pixel state, update per-pixel state using results of condition
test.
ALU clauses that contain multiple PRED_SET* instructions may perform some of these operations
more than once. Such clause instructions push the stack once per PRED_SET* operation.
The following steps loosely illustrate how the active mask (per-pixel state) may be updated during a
CF instruction that pops the stack. These steps only apply to instructions that unconditionally pop the
stack; instructions that may jump or pop if all pixels fail the condition test do not use these steps:
1. Pop the per-pixel state from the stack (may pop zero or more times). Change the per-pixel state to
the result of the last POP.
2. Evaluate the condition test for each pixel using new state, COND, WHOLE_QUAD_MODE, and
VALID_PIXEL_MODE.
3. Update the per-pixel state again using results of condition test.
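The three pop steps above can be sketched in Python as follows. This is a loose model under stated assumptions, not a description of the hardware: per-pixel state is a list of booleans, the stack is a Python list of such lists, a pixel passes the condition test only if it is active and its condition is true, and WHOLE_QUAD_MODE and VALID_PIXEL_MODE are ignored. The function name `cf_pop_update` is hypothetical.

```python
def cf_pop_update(stack, active, pop_count, condition):
    """Model the pop sequence; mutates `stack` and returns the new active mask."""
    # Step 1: pop zero or more times; the state becomes the result of the last pop.
    for _ in range(pop_count):
        active = stack.pop()
    # Step 2: evaluate the condition test per pixel using the new state.
    passed = [a and c for a, c in zip(active, condition)]
    # Step 3: update the per-pixel state again using the test results
    # (assumption: the new state is exactly the set of passing pixels).
    return passed


stack = [[True, True, True, True], [True, True, False, False]]
current = [True, False, False, False]
new_state = cf_pop_update(stack, current, pop_count=1,
                          condition=[True, False, True, True])
print(new_state, len(stack))
```

Note that the state held before the pop (`current`) does not influence the result when at least one pop occurs, which mirrors the statement that the active state is overwritten with the stack contents.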
Each stack entry contains a number of subentries. The number of subentries per stack entry varies,
based on the number of thread groups (simultaneously executing threads on a SIMD pipeline) per
program type that are supported by the target processor. If a processor that supports 64 thread groups
per program type is configured logically to use only 48 thread groups per program type, the stack
requirements for a 64-item processor still apply. Table 3-5 shows the number of subentries per stack
entry, based on the physical thread-group width of the processor.
The CALL*, LOOP_START*, and PUSH* instructions each consume a certain number of stack
entries or subentries. These entries are released when the corresponding POP, LOOP_END, or
RETURN instruction is executed. The additional stack space required by each of these flow-control
instructions is described in Table 3-6.
At any point during the execution of a program, if A is the total number of full entries in use, and B is
the total number of subentries in use, then STACK_SIZE should satisfy:
A + B / (# of subentries per entry) <= STACK_SIZE
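A short Python sketch of this inequality follows. It assumes that STACK_SIZE is a whole number of entries, so a partially used entry is rounded up; that rounding is an assumption, as is the figure of 4 subentries per entry used in the example (the real values come from Table 3-5). The function name `required_stack_size` is hypothetical.

```python
def required_stack_size(full_entries, subentries, subentries_per_entry):
    """Smallest whole STACK_SIZE with A + B / subentries_per_entry <= STACK_SIZE."""
    if subentries_per_entry <= 0:
        raise ValueError("subentries_per_entry must be positive")
    # Integer ceiling division avoids floating-point rounding.
    partial = -(-subentries // subentries_per_entry)
    return full_entries + partial


# Example: A = 3 full entries and B = 5 subentries, with an assumed
# 4 subentries per entry.
print(required_stack_size(3, 5, 4))
```

Here 3 + 5/4 = 4.25, so the smallest whole-entry STACK_SIZE that satisfies the inequality is 5.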
branch-loop instructions are listed in Table 3-7, along with a summary of their operations. The
instructions listed in this table implicitly begin with “CF_INST_”.
matching LOOP_END instruction. If LOOP_START does not jump, hardware sets up the internal
loop state. Loop-index-relative addressing (as specified by the INDEX_MODE field of the
ALU_DWORD0 microcode format) is well-defined only within the loop. If multiple loops are nested,
relative addressing refers to the loop register of the innermost loop. The loop register of the next-outer
loop is automatically restored when the innermost loop exits.
The LOOP_END instruction jumps to the address specified in the instruction’s ADDR field if the loop
count is nonzero after it is decremented, and at least one pixel hasn’t been deactivated by a
LOOP_BREAK instruction. Software normally sets the ADDR field to the CF instruction following
the matching LOOP_START. The LOOP_END instruction will continue to the next CF instruction
when the processor exits the loop.
DirectX9-style break and continue instructions are supported. The LOOP_BREAK instruction
disables all pixels for which the condition test is true. The pixels remain disabled until the innermost
loop exits. LOOP_BREAK jumps to the end of the loop if all pixels have been disabled by this (or a
prior) LOOP_BREAK or LOOP_CONTINUE instruction. Software normally sets the ADDR field to
the address of the matching LOOP_END instruction. If at least one pixel hasn’t been disabled by
LOOP_BREAK or LOOP_CONTINUE yet, execution continues to the next CF instruction.
The LOOP_CONTINUE instruction disables all pixels for which the condition test is true. The pixels
remain disabled until the end of the current iteration of the loop, and are re-activated by the innermost
LOOP_END instruction. The LOOP_CONTINUE instruction jumps to the end of the loop if all pixels
have been disabled by this (or a prior) LOOP_BREAK or LOOP_CONTINUE instruction. The ADDR
field points to the address of the matching LOOP_END instruction. If at least one pixel hasn’t been
disabled by LOOP_BREAK or LOOP_CONTINUE yet, the program continues to the next CF
instruction.
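The difference between LOOP_BREAK and LOOP_CONTINUE is easiest to see in a small simulation. The Python sketch below is a simplified model: it tracks only per-pixel enable masks, places one break test and one continue test at the top of the loop body, and ignores the stack, the ADDR field, and jumps. The function `simulate_loop` and its callback parameters are hypothetical.

```python
def simulate_loop(trip_count, num_pixels, break_when, continue_when):
    """Count, per pixel, how many iterations ran the loop body to its end.

    break_when(i, p) and continue_when(i, p) are assumed per-pixel condition
    tests for iteration i and pixel p.
    """
    broken = [False] * num_pixels  # LOOP_BREAK: disabled until the loop exits
    completed = [0] * num_pixels
    for i in range(trip_count):
        skipped = [False] * num_pixels  # LOOP_CONTINUE: disabled this iteration only
        for p in range(num_pixels):
            if not broken[p] and break_when(i, p):
                broken[p] = True
        for p in range(num_pixels):
            if not broken[p] and continue_when(i, p):
                skipped[p] = True
        for p in range(num_pixels):
            # The remainder of the body runs only for pixels still enabled.
            if not broken[p] and not skipped[p]:
                completed[p] += 1
        # LOOP_END: skipped pixels are re-activated (a fresh list is built on
        # the next iteration); the loop exits once every pixel has broken out.
        if all(broken):
            break
    return completed


counts = simulate_loop(
    trip_count=4,
    num_pixels=3,
    break_when=lambda i, p: p == 0 and i == 2,
    continue_when=lambda i, p: p == 1 and i % 2 == 0,
)
print(counts)
```

Pixel 0 breaks in the third iteration and never resumes, pixel 1 skips the rest of the body on even iterations but returns on odd ones, and pixel 2 completes all four iterations.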
Each instruction is capable of manipulating the stack. LOOP_START pushes the current per-pixel
state and the prior loop state onto the stack. If LOOP_START does not enter the loop, it pops
POP_COUNT entries (may be zero) from the stack, similar to the behavior of the PUSH instruction
when all pixels fail. The LOOP_END instruction evaluates the condition test at the beginning of the
instruction. If all pixels fail the test it exits the loop. LOOP_END pops loop state and one set of per-
pixel state from the stack when it exits the loop. It ignores POP_COUNT. The LOOP_BREAK and
LOOP_CONTINUE instructions pop POP_COUNT entries (may be zero) from the stack if the jump is
taken.
Manipulations of the stack are the same for LOOP_{START_DX10,END} instructions as those for
LOOP_{START,END} instructions.
3.7.6 Subroutines
The CALL and RETURN instructions implement subroutine calls and the corresponding returns. For
CALL, the ADDR field specifies the address of the first CF instruction in the subroutine. The ADDR
field is ignored by the RETURN instruction (the return address is read from the stack). Calls have a
nesting depth associated with them that is incremented on each CALL instruction via the
CALL_COUNT field. The nesting depth is restored on a RETURN instruction. If the program would
exceed the maximum nesting depth (32) on the subroutine call (current nesting depth +
CALL_COUNT > 32), then the call is ignored. Setting CALL_COUNT to zero prevents the nesting
depth from being updated on a subroutine call. Execution of a RETURN instruction when the program
is not in a subroutine is illegal.
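The nesting-depth rule can be sketched as a small tracker. The following Python model is an illustration under assumptions: the class `CallDepthTracker` is hypothetical, "restored on a RETURN" is modeled by saving the caller's depth on each taken call, and an illegal RETURN is reported as a Python exception even though the text does not say how hardware responds.

```python
MAX_NESTING_DEPTH = 32  # maximum nesting depth stated in the text


class CallDepthTracker:
    """Hypothetical model of the CALL/RETURN nesting depth."""

    def __init__(self):
        self.depth = 0
        self._saved = []  # caller depths, restored on RETURN

    def call(self, call_count):
        """Return True if the call is taken, False if it is ignored."""
        if self.depth + call_count > MAX_NESTING_DEPTH:
            return False  # the call would exceed the maximum depth
        self._saved.append(self.depth)
        self.depth += call_count  # CALL_COUNT = 0 leaves the depth unchanged
        return True

    def ret(self):
        """Restore the depth that was current before the matching call."""
        if not self._saved:
            raise RuntimeError("RETURN outside a subroutine is illegal")
        self.depth = self._saved.pop()


tracker = CallDepthTracker()
first = tracker.call(30)    # taken: depth becomes 30
second = tracker.call(3)    # ignored: 30 + 3 > 32
print(first, second, tracker.depth)
```

In this example the second call is ignored because it would raise the depth to 33, so the depth stays at 30.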
The CALL_FS instruction calls a fetch subroutine (FS) whose address is relative to the address
specified in a host-configured register. The instruction also activates the fetch-program mode, which
affects other operations until the corresponding RETURN instruction is reached. Only a vertex shader
(VS) program can call an FS subroutine, as described in Section 2.1 on page 5.
The CALL and CALL_FS instructions may be conditional. The subroutine is skipped if and only if all
pixels fail the condition test or the nesting depth would exceed 32 after the call. The POP_COUNT
field should be zero for CALL and CALL_FS.
4 ALU Clauses
Software initiates an ALU clause with one of the CF_INST_ALU* control-flow instructions, all of
which use the CF_ALU_DWORD[0,1] microcode formats. Instructions within an ALU clause are
called “ALU instructions”. They perform operations using the scalar ALU.[X,Y,Z,W] and ALU.Trans
units, which are described in this chapter.
31 24 23 16 15 8 7 0
ALU_DWORD1_{OP2, OP3} +4
ALU_DWORD0 +0
127 96 95 64 63 32 31 0
The processor contains multiple sets of five scalar ALUs. Four ALUs in each set can perform scalar
operations on up to three 32-bit data elements each, with one 32-bit result. The ALUs are called
ALU.X, ALU.Y, ALU.Z, and ALU.W—or simply ALU.[X,Y,Z,W]. A fifth unit, called ALU.Trans,
performs one scalar operation, the same as those that the ALU.[X,Y,Z,W] units perform, plus
additional operations for transcendental and advanced integer functions, and it can replicate the result
across all four elements of a destination vector. Although the processor has multiple sets of these five
scalar ALUs, R600 software can assume that, within a given ALU clause, all instructions will be
processed by a single set of five ALUs.
Software issues ALU instructions in variable-length groups—called instruction groups—that perform
parallel operations on different elements of a vector, as described in Section 4.3 on page 40. The
ALU.[X,Y,Z,W] units are nearly identical in their functions. They differ only in which vector
elements they write their result to at the end of the instruction, and in certain reduction operations (see
Section 4.8.2 on page 60). The ALU.Trans unit can write to any vector element and can evaluate
additional functions.
ALU instructions can access 256 constants from the constant registers and 128 GPRs (each thread
accesses its own set of 128 GPRs). Constant-register addresses and GPR addresses can be absolute,
relative to the loop index (aL), or relative to an index GPR. In addition to reading constants from the
constant registers, an ALU instruction can refer to elements of a literal constant that is embedded in the
instruction group. Instructions also have access to two temporary registers that contain the results of
the previous instruction groups. The previous vector (PV) register contains a 4-element vector that is
the previous result from the ALU.[X,Y,Z,W] units, and the previous scalar (PS) register contains a
scalar that is the previous result from the ALU.Trans unit.
Each instruction has its own set of source operands—SRC0 and SRC1 for instructions using the
ALU_DWORD1_OP2 microcode format, and SRC0, SRC1, and SRC2 for instructions using the
ALU_DWORD1_OP3 microcode format. An instruction group that operates on a 4-element vector is
specified as (at a minimum) four independent scalar instructions, one for each vector element. As a
result, vector operations may perform a complex mix of vector-element and constant swizzles, and
even swizzles across GPR addresses (subject to read-port restrictions, see below). Traditional floating-
point and integer constants for common values (for example, 0, -1, 0.0, 0.5, and 1.0) may be specified
for any source operand.
Each ALU.[X,Y,Z,W] unit writes to an instruction-specified GPR at the end of the instruction. The
GPR address may be absolute, relative to the loop index, or relative to an index GPR. The
ALU.[X,Y,Z,W] units always write to their corresponding vector element, but each unit may write to a
different GPR address. The ALU.Trans unit may write to any vector element of any GPR address. The
outputs of each ALU unit may be clamped to the range [0.0, 1.0] prior to being written, and some
operations may multiply the output by a factor of 2.0 or 4.0.
and literal constants. The ALU clause size in the CF program is specified as the total number of slots
occupied by the ALU clause.
An ALU instruction group consists of up to five slots. Each instruction in the group has a LAST bit
that is set only for the last instruction in the group. The LAST bit delimits instruction groups from one
another, allowing the R600 hardware to implement parallel processing for each instruction group.
Each instruction has the same bit fields in its microcode format, and each instruction is distinguished
by the destination vector element to which it writes. An instruction is assigned to the ALU.Trans unit if
a prior instruction in the group writes to the same vector element of a GPR, or the instruction is a
transcendental operation.
Up to four of the five instruction slots in an instruction group may be omitted, and the instructions
must be in the following order:
1. Scalar instruction for ALU.X unit.
2. Scalar instruction for ALU.Y unit.
3. Scalar instruction for ALU.Z unit.
4. Scalar instruction for ALU.W unit.
5. Scalar instruction for ALU.Trans unit.
In addition, if any instructions refer to a literal constant by specifying the ALU_SRC_LITERAL value
for a source operand, the first, or both, of the following 2-element literal constant slots must be
provided (the second of these two slots cannot be specified alone):
6. X, Y elements of literal constant (X is the first doubleword).
7. Z, W elements of literal constant (Z is the first doubleword).
There is no LAST bit for literal constants. The number of literal constants is known from the
operations specified in the instruction.
Given the options described above, the size of an ALU instruction group can range from 64 bits to 448
bits, in increments of 64 bits.
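The size range above follows from counting 64-bit slots: one to five instruction slots, plus zero, one, or two literal-constant slots. The following is a minimal Python sketch of that arithmetic. The function name and its arguments are illustrative and are not part of the ISA.

```python
SLOT_BITS = 64  # one instruction slot (two doublewords) or one 2-element literal slot


def alu_group_size_bits(num_instructions: int, num_literal_slots: int = 0) -> int:
    """Return the size, in bits, of one ALU instruction group."""
    if not 1 <= num_instructions <= 5:
        raise ValueError("an instruction group holds 1 to 5 instructions")
    if not 0 <= num_literal_slots <= 2:
        raise ValueError("an instruction group holds 0 to 2 literal slots")
    return (num_instructions + num_literal_slots) * SLOT_BITS
```

A group with one instruction and no literals gives the 64-bit minimum; five instructions with both literal slots gives the 448-bit maximum.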
1. This ambiguity is resolved by a bit in the processor state, CONFIG.ALU_INST_PREFER_VECTOR, that is
programmable only by the host. When the bit is set, ambiguous slots are assigned to ALU.Trans. When cleared
(default), ambiguous slots are assigned to one of ALU.[X,Y,Z,W]. This setting applies to all thread types.
The following algorithm illustrates the assignment of instruction-group slots to ALUs. The instruction
order described in Section 4.3 on page 40 must be observed. As a consequence, if the ALU.Trans unit
is specified, it must be done with an instruction that has its LAST bit set.
begin
    ALU_[X,Y,Z,W] := undef;
    ALU_TRANS := undef;
    for $i = 0 to number of instructions - 1
        $elem := vector element written by instruction $i;
        if instruction $i is transcendental only instruction
            $trans := true;
        elsif instruction $i is vector-only instruction
            $trans := false;
        elsif defined(ALU_$elem) or (not CONFIG.ALU_INST_PREFER_VECTOR and
                                     instruction $i is LAST)
            $trans := true;
        else
            $trans := false;
        if $trans
            if defined(ALU_TRANS)
                assert "ALU.Trans has already been allocated, cannot give to instruction $i.";
            ALU_TRANS := $i;
        else
            if defined(ALU_$elem)
                assert "ALU.$elem has already been allocated, cannot give to instruction $i.";
            ALU_$elem := $i;
end
After all instructions in the instruction group are processed, any ALU.[X,Y,Z,W] or ALU.Trans
operation that is unspecified implicitly executes a NOP instruction, thus invalidating the values in the
corresponding elements of the PV and PS registers.
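The slot-assignment pseudocode can also be expressed as a short Python sketch. The `Inst` record, the `assign_slots` name, and the `prefer_vector` argument (standing in for CONFIG.ALU_INST_PREFER_VECTOR as the pseudocode uses it) are assumptions made for illustration. Units that end up as `None` are the ones that implicitly execute a NOP.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Inst:
    elem: str                  # destination vector element: "X", "Y", "Z", or "W"
    trans_only: bool = False   # operation is available only on ALU.Trans
    vector_only: bool = False  # operation is available only on ALU.[X,Y,Z,W]
    last: bool = False         # LAST bit of the instruction


def assign_slots(group: List[Inst], prefer_vector: bool) -> Dict[str, Optional[int]]:
    """Map each ALU unit to the index of the instruction it executes (None = NOP)."""
    units: Dict[str, Optional[int]] = {u: None for u in ("X", "Y", "Z", "W", "TRANS")}
    for i, inst in enumerate(group):
        if inst.trans_only:
            trans = True
        elif inst.vector_only:
            trans = False
        elif units[inst.elem] is not None or (not prefer_vector and inst.last):
            trans = True
        else:
            trans = False
        unit = "TRANS" if trans else inst.elem
        if units[unit] is not None:
            raise ValueError(
                f"ALU.{unit} has already been allocated, cannot give to instruction {i}."
            )
        units[unit] = i
    return units
```

For example, two instructions that both write element X are assigned to ALU.X and ALU.Trans respectively, because the second one finds ALU.X already taken.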
2. The number of clause temporaries can be programmed only by the host processor using the configuration-register field
GPR_RESOURCE_MGMT_1.NUM_CLAUSE_TEMP_GPRS. A typical setting for this field is 4. If the field has N >
0, then GPR[127 – N + 1, 127] are set aside as clause temporaries.
The term flow-control loop index refers to the DirectX9-style loop index. Each instruction gets its own
INDEX_MODE control, so a single instruction group may still refer to more than one type of index.
When using an AR index, the index must be initialized by a MOVA* operation that is present in a prior
instruction group of the same clause. As a consequence, AR indexing is never valid on the first
instruction of a clause.
An AR index cannot be used in an instruction group that executes a MOVA* instruction in any slot.
Any slot in an instruction group with a MOVA* instruction using relative constant addressing may use
only an INDEX_MODE of INDEX_LOOP. To issue a MOVA* from an AR-relative source, the
source must be split into two separate instruction groups, the first performing a MOV from the relative
source into a temporary GPR, and the second performing a MOVA* on the temporary GPR.
Only one AR element can be used per instruction group. For example, it is not legal for one slot in an
instruction group to use INDEX_AR_X, and another slot in the same instruction group to use
INDEX_AR_Y. Also, AR cannot be used to provide relative indexing for a kcache constant. kcache
constants may use only the INDEX_LOOP mode for relative indexing.
GPR clause temporaries may not be indexed.
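Two of the AR rules above are easy to check per instruction group: no more than one AR element may be used, and no slot may use an AR index if any slot executes a MOVA* instruction. The sketch below is a hypothetical validator. The tuple representation, the use of `None` for slots without relative addressing, and the opcode strings in the usage note are assumptions for illustration only.

```python
from typing import List, Optional, Tuple


def check_index_modes(group: List[Tuple[str, Optional[str]]]) -> List[str]:
    """Check one instruction group given (opcode, INDEX_MODE) per slot.

    INDEX_MODE is None for a slot that uses no relative addressing (assumption).
    Returns a list of rule violations; an empty list means no violation found.
    """
    errors = []
    ar_modes = {
        mode for _, mode in group
        if mode is not None and mode.startswith("INDEX_AR_")
    }
    if len(ar_modes) > 1:
        errors.append("more than one AR element used in one instruction group")
    if ar_modes and any(op.startswith("MOVA") for op, _ in group):
        errors.append("AR index used in an instruction group that executes MOVA*")
    return errors
```

A group such as `[("ADD", "INDEX_AR_X"), ("MUL", "INDEX_AR_Y")]` is reported as a violation, while mixing `INDEX_AR_X` with `INDEX_LOOP` is accepted.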
Assume N GPRs are declared per thread and K clause temporaries are also declared. The GPR base
address specified in SRC*_SEL must be in either the interval [0, N – 1] (normal clause GPR) or [128 –
K, 127] (clause temporary), before any relative index is applied. If SRC*_SEL is a GPR address and
does not fall into either of these intervals, the resulting behavior is undefined. You cannot, for
example, write code that generates GPRN[-1] to read from the last GPR in a program.
If a GPR read with base address in [0, N – 1] is indexed relatively, and the base plus the index is
outside the interval [0, N – 1], then the value read will always be GPR0 (including for texture- and
vertex-fetch instructions and imports and exports). If a GPR write with base address in [0, N – 1] is
indexed relatively, and the base plus the index is outside the interval [0, N – 1], then the write will be
inhibited (including for texture- and vertex-fetch instructions), unless the instruction is a memory read.
If the instruction is a memory read, the result will be written to GPR0. Relative addressing on GPR
clause temporaries is illegal. Therefore, the behavior is undefined if a GPR with base address in the
range [128 – K, 127] is used with a relative index.
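The out-of-range behavior for relatively indexed normal clause GPRs can be summarized in a small Python sketch. The function names are illustrative; `n` is the number of GPRs declared per thread, and clause temporaries are deliberately not modeled because indexing them is undefined.

```python
from typing import Optional


def resolve_gpr_read(base: int, index: int, n: int) -> int:
    """Return the GPR address actually read for a relatively indexed read."""
    if not 0 <= base < n:
        raise ValueError("base must be a normal clause GPR in [0, N - 1]")
    addr = base + index
    # An out-of-range relative read always returns GPR0.
    return addr if 0 <= addr < n else 0


def resolve_gpr_write(base: int, index: int, n: int,
                      is_memory_read: bool = False) -> Optional[int]:
    """Return the GPR address written, or None if the write is inhibited."""
    if not 0 <= base < n:
        raise ValueError("base must be a normal clause GPR in [0, N - 1]")
    addr = base + index
    if 0 <= addr < n:
        return addr
    # Out of range: inhibited, unless the instruction is a memory read,
    # in which case the result lands in GPR0.
    return 0 if is_memory_read else None
```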
A constant-register base address is always in-bounds. If a constant-register read is indexed
relatively, and the base plus the index is outside the interval [0, 255], then the value read is NaN
(7FFFFFFFh).
If a kcache base address refers to a cache line that is not locked, the result is undefined. You cannot
refer to kcache constants [0, 15] if the mode (as set by the CF instruction initiating the ALU clause) is
KCACHE_NOP, and you cannot refer to kcache constants [16, 31] if the mode is KCACHE_NOP or
KCACHE_LOCK_1. If a kcache read is indexed relatively and one cache line is locked with
KCACHE_LOCK_1, and the base plus the index is outside the interval [0, 15], then the value read is
NaN (7FFFFFFFh). If a kcache read is indexed relatively and two cache lines are locked, and the base
plus the index is outside the interval [0, 31], then the value read is NaN (7FFFFFFFh).
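The out-of-range rule for relatively indexed constant reads is the same in all three cases; only the upper limit differs (256 for constant registers, 16 or 32 for kcache). The sketch below models a single 32-bit element per address for brevity, which is a simplification, and the function names are illustrative. It also shows that the bit pattern 7FFFFFFFh decodes to a NaN as a 32-bit float.

```python
import math
import struct
from typing import List

NAN_BITS = 0x7FFFFFFF  # value returned by an out-of-range relative read


def read_constant(table: List[int], base: int, index: int, limit: int) -> int:
    """Return the 32-bit pattern read from base + index, or NaN bits if out of range.

    limit is 256 for constant registers, 16 for one locked kcache line,
    and 32 for two locked kcache lines.
    """
    addr = base + index
    if 0 <= addr < limit:
        return table[addr]
    return NAN_BITS


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```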
Constant Cache. Each ALU clause can lock up to four sets of constants into the constant cache.
Each set (one cache line) holds sixteen 128-bit constants. These are split into two groups. Each group can be
from a different constant buffer (out of 16 buffers). Each group of two constants consists of either
[line] and [line + 1] or [line + loop_ctr] and [line + loop_ctr + 1].
Literal (in-line) Constants. Literal constants are stored in the instruction store immediately after the
instruction that uses them, and they count against the 16-32 instruction maximum for a clause. Although
only one constant is supplied, multiple arguments in the instruction can reference this constant with
different swizzles. These constants are four 32-bit values and cannot be swizzled.
Statically-indexed Constant Access. The constant-file entries can be accessed either with absolute
addresses, or addresses relative to the current loop index (aL) (static indirect access). In both cases, all
pixels in the vector pick the same constant to use and there is no performance penalty. Swizzling is
allowed.
Each ALU.Trans operation may reference at most two constants of any type. For example, all of the
following are legal, and the four slots shown may occur as a single instruction group:
GPR0.X <= C0.X + GPR0.X
GPR0.Y <= 1.0 + C1.Y // Can mix cfile and non-cfile in one instruction group.
GPR0.Z <= C2.X + GPR0.Z // Multiple reads from cfile X bank are OK.
GPR0.W <= C3.Z + C0.X // Reads from four distinct cfile addresses are OK.
In this configuration, if an operand is referenced more than once in a scalar operation, it must be
loaded in two different cycles, sacrificing two read ports. For example:
However, as a special case, if src0 and src1 in an instruction refer to the same GPR element, only one
read port will actually be used, on the cycle corresponding to src0 in the bank swizzle. This
optimization exists to facilitate squaring operations (MUL* x, x, and DOT* v, v). The following
example illustrates the use of this optimization to perform square operations that do not consume more
than one read port per GPR element.
* src1 is shared and fetches its data on the same cycle that src0 fetches. No actual read port is used up in the marked
cycles.
In the above example, the swizzle selects for src0 are used to determine which cycle to load the shared
operand on. The swizzle selects for src1 are ignored. The following programming is legal, even though
at first glance the bank swizzles might suggest it is not.
* src1 is shared and fetches its data on the same cycle that src0 fetches. No actual read port is used up in the marked
cycles.
This optimization only applies when src0 and src1 share the same GPR element in an instruction. It
does not apply when src0 and src2, nor when src1 and src2, share a GPR element.
Software cannot read two or more values from the same GPR vector element on a single cycle. For
instance, software cannot read GPR1.X and GPR2.X on cycle 0 (this restriction does not apply to
constant registers or literal constants). The following programming is illegal:
Software can use BANK_SWIZZLE to work around this limitation, as shown below.
** The above examples illustrate that once a value is read into CYCLEN_DATA, multiple instructions can reference that
value.
The temporary registers PV and PS have no cycle restrictions. Any element in PV or PS can be
accessed on any cycle. Constant operands can be accessed on any cycle.
Multiple operands in ALU.Trans may read from the same cycle (this differs from the ALU.[X,Y,Z,W]
case). Not all possible permutations are available. If needed, the unspecified permutations can be
obtained by applying an appropriate inverse mapping on the ALU.[X,Y,Z,W] slots.
Here is an example illustrating how ALU.Trans operations may use unused read ports from GPR
instructions (in all of the following examples, the last instruction in an instruction group is always an
ALU.Trans operation):
When an operand is used by one of ALU.[X,Y,Z,W] units, it may also be used to load an operand into
the ALU.Trans unit:
Any element in PV or PS can be accessed by ALU.Trans, and generally it will be loaded as soon as
possible. PV or PS can be loaded on any cycle, but when constant operands are present the available
bank swizzles may be constrained (see below).
Bank Swizzle with Constant Operands. If the transcendental operation uses a single constant
operand (any type of constant), then the remaining GPR operands must not be loaded on cycle 0. The
instruction group:
GPR0.X <= GPR1.X * GPR2.Y + CFILE0.Z
The following procedure attempts to reserve the GPR read for address $sel and vector element $elem
on cycle number $cycle:
The following procedure attempts to reserve the constant file read for address $sel and vector element
$elem:
The following procedure is executed for each ALU.[X,Y,Z,W] operation specified in the instruction
group:
procedure check_vector
begin
    for $src in {0, ..., number_of_operands(ALU_INST) - 1}
        $sel := SRC$src_SEL;
        $elem := SRC$src_ELEM;
        if isgpr($sel)
            $cycle := cycle_for_bank_swizzle(BANK_SWIZZLE, $src);
            if $src == 1 and $sel == SRC0_SEL and $elem == SRC0_ELEM
                // Nothing to do; special-case optimization,
                // second source uses first source's reservation
            else
                reserve_gpr($sel, $elem, $cycle);
        elsif isconst($sel)
            // Any constant, including literal and inline constants
            if iscfile($sel)
                reserve_cfile($sel, $elem);
        else
            // No restrictions on PV, PS
end
Finally, the following procedure is executed for an ALU.Trans operation, if it is specified in the
instruction group. The ALU.Trans unit will attempt to reuse an existing reservation whenever
possible. An ALU.Trans operation cannot use cycle 0 for GPR loads if it has one constant operand, and
it must use cycle 2 for GPR loads if it has two constant operands.
procedure check_scalar
begin
    $const_count := 0;
    for $src in {0, ..., number_of_operands(ALU_INST) - 1}
        $sel := SRC$src_SEL;
        $elem := SRC$src_ELEM;
        if isconst($sel)
            // Any constant, including literal and inline constants
            if $const_count >= 2
                assert "More than two references to a constant in transcendental operation.";
            $const_count++;
            if iscfile($sel)
                reserve_cfile($sel, $elem);
    for $src in {0, ..., number_of_operands(ALU_INST) - 1}
        $sel := SRC$src_SEL;
        $elem := SRC$src_ELEM;
        if isgpr($sel)
            $cycle := cycle_for_bank_swizzle(BANK_SWIZZLE, $src);
            if $cycle < $const_count
                assert "Cycle $cycle for GPR load conflicts with constant load in transcendental operation.";
            reserve_gpr($sel, $elem, $cycle);
        elsif isconst($sel)
            // Constants already processed
        else
            // No restrictions on PV, PS
end
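The constant-count and cycle rule in check_scalar can be isolated in a few lines of Python. This sketch covers only that rule: it does not model read-port reservation, and it takes the GPR load cycle as an input instead of deriving it from BANK_SWIZZLE. The function name and the operand representation are assumptions for illustration.

```python
from typing import List, Tuple


def check_trans_operands(operands: List[Tuple[str, int]]) -> None:
    """Validate the operands of one ALU.Trans operation.

    Each operand is (kind, cycle), where kind is "gpr", "const", or "pv_ps".
    The cycle value is meaningful only for "gpr" operands.
    """
    const_count = sum(1 for kind, _ in operands if kind == "const")
    if const_count > 2:
        raise ValueError(
            "More than two references to a constant in transcendental operation."
        )
    for kind, cycle in operands:
        # With one constant, cycle 0 is unavailable; with two, only cycle 2 remains.
        if kind == "gpr" and cycle < const_count:
            raise ValueError(
                f"Cycle {cycle} for GPR load conflicts with constant load "
                "in transcendental operation."
            )
```

An operation with one constant and a GPR loaded on cycle 1 passes; the same operation with the GPR on cycle 0 is rejected.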
KILL and PRED_SET* Instruction Restrictions. Only a pixel shader (PS) program can execute a
pixel kill (KILL) instruction. This instruction is illegal in other program types. A KILL instruction
should always be the last instruction in an ALU clause, because the remaining instructions executed in
the clause will not reflect the updated valid state after the kill operation. Two KILL instructions cannot
be co-issued.
The term “PRED_SET*” is used to describe any instruction that computes a new predicate value that
may update the local predicate or execute mask. Two PRED_SET* instructions cannot be co-issued.
Also, PRED_SET* and KILL instructions cannot be co-issued. Behavior is undefined if any of these
co-issue restrictions are violated.
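Taken together, the three co-issue rules mean that an instruction group may contain at most one instruction drawn from the KILL and PRED_SET* families. A minimal Python sketch of that check follows. Matching on mnemonic prefixes and the specific opcode strings in the usage note are assumptions for illustration.

```python
from typing import List


def violates_coissue_rules(opcodes: List[str]) -> bool:
    """Return True if a group co-issues more than one KILL or PRED_SET* instruction."""
    restricted = [
        op for op in opcodes
        if op.startswith("KILL") or op.startswith("PRED_SET")
    ]
    return len(restricted) > 1
```

For example, `violates_coissue_rules(["KILLGT", "PRED_SETE"])` is true, while a single PRED_SET* alongside ordinary arithmetic is accepted.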
Reduction Instruction Restrictions. When any of the reduction instructions (DOT4, DOT4_IEEE,
CUBE, and MAX4) is used, it must be executed on all four elements of a single vector. Reduction
operations only compute one output, so the values in the OMOD and CLAMP fields should be the
same for all four instructions.
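The reduction rules can be checked mechanically for the four vector slots of a group. The sketch below is a hypothetical validator: the dictionary representation and function name are assumptions, and only the ALU.[X,Y,Z,W] slots are modeled.

```python
from typing import Dict, List, Tuple

REDUCTION_OPS = {"DOT4", "DOT4_IEEE", "CUBE", "MAX4"}


def check_reduction(slots: Dict[str, Tuple[str, int, bool]]) -> List[str]:
    """Check the vector slots of one group, given elem -> (opcode, OMOD, CLAMP)."""
    errors: List[str] = []
    opcodes = {op for op, _, _ in slots.values()}
    if not opcodes & REDUCTION_OPS:
        return errors  # no reduction in this group, nothing to check
    # The same reduction must occupy all four vector elements.
    if set(slots) != {"X", "Y", "Z", "W"} or len(opcodes) != 1:
        errors.append("reduction must execute on all four elements of one vector")
    # One output is computed, so the output modifiers must agree.
    if len({(omod, clamp) for _, omod, clamp in slots.values()}) != 1:
        errors.append("OMOD and CLAMP must be the same in all reduction slots")
    return errors
```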
MOVA* Restrictions. All MOVA* instructions, shown in Table 4-4, write vector elements of the
address register (AR). They do not need to execute on all of the ALU.[X,Y,Z,W] operands at the same
time. One ALU.[X,Y,Z,W] unit may execute a MOVA* operation while other ALU.[X,Y,Z,W] units
execute other operations. Software can issue up to four MOVA* instructions in a single instruction
group to change all four elements of the AR register. MOVA* issued in ALU.X will write AR.X
regardless of any GPR write mask used.
ALU.Trans Instruction Restrictions. At most one of the transcendental and integer instructions
shown in Table 4-5 may be specified in a given instruction group, and it must be specified in the last
instruction slot.
Each instruction for a reduction operation must use the same OMOD value (for instructions with two
source operands).
The second part of the output modification is to clamp the result to [0.0, 1.0]. This is controlled by the
instruction’s CLAMP field. The CLAMP modifier works only with floating-point values; it is not
valid and should be disabled for integer operations. For non-reduction operations, each instruction
may specify a different value for CLAMP. Reduction operations only compute one output. Each
instruction for a reduction operation must use the same CLAMP value.
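The two-step output modification can be sketched as a scale followed by a clamp. This sketch assumes the first part is the OMOD scaling and that it is applied before CLAMP, as the phrase "second part" suggests. The function name and the representation of OMOD as a plain multiplier are illustrative, not the encoded field values.

```python
def apply_output_modifier(value: float, omod_factor: float = 1.0,
                          clamp: bool = False) -> float:
    """Scale a floating-point result by the OMOD factor, then optionally clamp it."""
    result = value * omod_factor  # first part: output multiplier (for example 2.0 or 4.0)
    if clamp:
        result = min(max(result, 0.0), 1.0)  # second part: clamp to [0.0, 1.0]
    return result
```

With a factor of 2.0 and CLAMP enabled, an input of 0.25 yields 0.5, while an input of 0.75 saturates at 1.0.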
The results are written to PV or PS and to the destination GPR specified in the DST_GPR field of the
instruction. The destination GPR may be relative to an index. To enable this, set the DST_REL bit and
specify an appropriate INDEX_MODE. The INDEX_MODE parameter is shared with the input
operands for the instruction. If the resulting GPR address is neither in [0, GPR_COUNT – 1] (the GPRs
declared for this thread) nor in [127 – N + 1, 127] (the N clause-temporary GPRs),
then no GPR write is performed; only PV and PS are updated.
Instructions with two source operands have a write mask, WRITE_MASK, that controls whether the
result is written to a GPR. The PV or PS result is updated even if WRITE_MASK is 0. Instructions
with three source operands have no write mask. However, you can specify an out-of-bounds GPR
destination to inhibit their write. For example, if the thread is using four clause temporaries and fewer
than 124 GPRs, then it is safe to use DST_GPR = 123 to discard the result. Otherwise, you need to
sacrifice one of the temporary GPRs for instructions with three source operands. The PV or PS result is
updated for instructions with three source operands even if the destination GPR address is invalid.
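The conditions under which a GPR write actually happens can be collected into one predicate. The sketch below is illustrative: the function and parameter names are assumptions, and instructions with three source operands are modeled by leaving `write_mask` at its default of `True`, since they have no write mask.

```python
def gpr_write_enabled(dst_addr: int, gpr_count: int, num_clause_temps: int,
                      write_mask: bool = True) -> bool:
    """Return True if the result is written to the destination GPR.

    PV or PS is updated regardless of the value returned here.
    """
    in_declared = 0 <= dst_addr <= gpr_count - 1
    in_temps = (num_clause_temps > 0
                and 127 - num_clause_temps + 1 <= dst_addr <= 127)
    return write_mask and (in_declared or in_temps)
```

With 100 declared GPRs and four clause temporaries, `gpr_write_enabled(123, 100, 4)` is false, which matches the DST_GPR = 123 example in the text.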
Two instructions running on the ALU.[X,Y,Z,W] units cannot write to the same GPR element.
However, it is possible for ALU.Trans to write to the same GPR element as one of the operations
running in ALU.[X,Y,Z,W]. This can be done either explicitly, as in:
GPR0.X <= GPR1.X
...
GPR0.X <= GPR2.X
or implicitly via relative addressing. If the ALU.Trans unit and one of the ALU.[X,Y,Z,W] units try to
write to the same GPR element, the transcendental operation dominates, and the ALU.Trans result is
written to the GPR element. This affects the GPR write only; PV will still reflect only the vector result.
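The write-conflict rule can be modeled by committing the vector results first and the ALU.Trans result last. This is a sketch of the observable outcome only, not of the hardware mechanism. The function name and the `(address, element)` key representation are assumptions.

```python
from typing import Dict, Optional, Tuple

GprKey = Tuple[int, str]  # (GPR address, vector element)


def commit_writes(gprs: Dict[GprKey, float],
                  vector_writes: Dict[GprKey, float],
                  trans_write: Optional[Tuple[GprKey, float]] = None
                  ) -> Dict[GprKey, float]:
    """Apply one instruction group's GPR writes; the ALU.Trans write dominates."""
    gprs.update(vector_writes)      # results from ALU.[X,Y,Z,W]
    if trans_write is not None:
        key, value = trans_write
        gprs[key] = value           # applied last, so it wins on a conflict
    return gprs
```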
5 Vertex-Fetch Clauses
Software initiates a vertex-fetch clause with the VTX or VTX_TC control-flow instructions, both of
which use the CF_DWORD[0,1] microcode formats. Vertex-fetch instructions within the clause use
the VTX_DWORD0, VTX_DWORD1_{SEM, GPR}, and VTX_DWORD2 microcode formats, with
a fourth (high-order) doubleword of zeros.
A vertex-fetch clause consists of instructions that fetch vertices from the vertex buffer based on a GPR
address. A vertex-fetch clause can be at most eight instructions long. Vertex fetches using a semantic
table use the VTX_DWORD1_SEM microcode format to specify the 9-bit semantic ID. This ID is
looked up in the semantic table to determine which GPR to write data to. All other vertex fetches use
the VTX_DWORD1_GPR microcode format, which specifies the destination GPR directly.
Each vertex-fetch instruction within the vertex-fetch clause has a BUFFER_ID field that specifies the
buffer containing the vertex-fetch constants and an OFFSET field for the offset into the buffer at
which reading is to begin. The instruction reads the starting index from SRC_GPR, the
address of which may be absolute or relative to the loop index (aL), using the SRC_REL bit. The result
of non-semantic fetches is written to DST_GPR, the address of which may be absolute or relative to
the loop index (aL), using the DST_REL bit. Semantic fetches determine the destination GPR by
reading the entry in the semantic table that is specified by the instruction’s SEMANTIC_ID field. The
source index and the 4-element result from memory may be swizzled.
The source value can be fetched from any element of the source GPR using the instruction’s
SRC_SEL_X field. Unlike texture instructions, the SRC_SEL_X field may not be a constant; it must
refer to a vector element of a GPR. The destination swizzle is specified in the DST_SEL_[X,Y,Z,W]
fields; the swizzle may write any of the fetched elements, the value 0.0, or the value 1.0. To disable an
element write, set the DST_SEL_[X,Y,Z,W] fields to the SEL_MASK value.
Individual vertex-fetch instructions cannot be predicated; predicated vertex fetches must be done at
the CF level by making the vertex-fetch clause instruction conditional. All vertex instructions in the
clause are executed with the conditional constraint specified by the CF instruction.
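The destination swizzle described above can be illustrated with a short Python sketch. The selectors are written as symbolic strings ("X", "Y", "Z", "W", "0", "1", "MASK") and not as the encoded DST_SEL field values, which are not given in this chapter. The function name is an assumption.

```python
from typing import List, Sequence


def apply_dst_swizzle(fetched: Sequence[float], dst_sel: Sequence[str],
                      dest: Sequence[float]) -> List[float]:
    """Write fetched data into a 4-element destination according to DST_SEL_[X,Y,Z,W]."""
    source = {
        "X": fetched[0], "Y": fetched[1], "Z": fetched[2], "W": fetched[3],
        "0": 0.0, "1": 1.0,
    }
    out = list(dest)
    for i, sel in enumerate(dst_sel):
        if sel != "MASK":   # SEL_MASK leaves the destination element untouched
            out[i] = source[sel]
    return out
```

For example, selectors `["W", "Z", "1", "MASK"]` applied to fetched data `[1.0, 2.0, 3.0, 4.0]` write 4.0, 3.0, and 1.0 to the first three elements and leave the fourth unchanged.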
[Figure: vertex-fetch microcode layout. VTX_DWORD0 at +0; VTX_DWORD1_{SEM, GPR} at +4; VTX_DWORD2 at +8; all-zero doubleword at +12.]
6 Texture-Fetch Clauses
Software initiates a texture-fetch clause with the TEX control-flow instruction, which uses the
CF_DWORD[0,1] microcode formats. Texture-fetch instructions within the clause use the
TEX_DWORD[0,1,2] microcode formats, with a fourth (high-order) doubleword of zeros.
A texture-fetch clause consists of instructions that look up texture elements, called texels, based on a
GPR address. Texture instructions are used for both texture-fetch and constant-fetch operations. A
texture clause can be at most eight instructions long.
Each texture instruction has a RESOURCE_ID field, which specifies an ID for the buffer address, size,
and format to read, and a SAMPLER_ID field, which specifies an ID for filter and other options. The
instruction reads the texture coordinate from the SRC_GPR, the address of which may be absolute or
relative to the loop index (aL), using the SRC_REL bit. The result is written to the DST_GPR, the
address of which may be absolute or relative to the loop index (aL), using the DST_REL bit. Both the
fetch coordinate and the resulting 4-element data from memory may be swizzled. The source elements
for the swizzle are specified with the SRC_SEL_[X,Y,Z,W] fields; a source element may also use the
swizzle constants 0.0 and 1.0. The destination elements for the swizzle are specified with the
DST_SEL_[X,Y,Z,W] fields; it may write any of the fetched elements, the value 0.0, or the value 1.0.
To disable an element write, set the DST_SEL_[X,Y,Z,W] fields to the SEL_MASK value.
Individual texture instructions cannot be predicated; predicated texture fetches must be done at the CF
level, by making the texture-clause instruction conditional. All texture instructions in the clause are
executed with the conditional constraint specified by the CF instruction.
[Figure: texture-fetch microcode layout. TEX_DWORD0 at +0; TEX_DWORD1 at +4; TEX_DWORD2 at +8; all-zero doubleword at +12.]
7 Instruction Set
This section summarizes the instruction set used by assemblers. The instructions are organized
alphabetically, by mnemonic, according to the clauses in which they are used. All of the instructions
have mnemonic prefixes, such as “CF_INST_”, “OP2_INST_”, or “OP3_INST_”. In this section’s
instruction list, only the portion of the mnemonic following the prefix is shown, although the full
prefix is described in the text. The opcode and microcode formats for each instruction are also given.
The microcode formats are described in Section 8 on page 259, where the instructions are ordered by
their microcode formats rather than alphabetically by mnemonic. The microcode field-name acronyms
are also defined in that chapter.
Microcode
[Figure: microcode layout. +4: CF_INST, COUNT, KCACHE_ADDR1, KCACHE_ADDR0; +0: ADDR.]
Microcode
[Figure: microcode layout. +4: CF_INST, Rsvd, CALL_COUNT, COUNT, COND, CF_CONST; +0: ADDR.]
ELSE Else
Pop POP_COUNT entries (may be zero) from the stack, then invert the status of active and branch-
inactive pixels for pixels that are both active (as of the last surviving PUSH operation) and pass the
condition test. Control then jumps to the specified address if all pixels are inactive.
The operation may be conditional.
Microcode
[Figure: microcode layout. +4: CF_INST, Rsvd, CALL_COUNT, COUNT, COND, CF_CONST; +0: ADDR.]
Microcode
[Figure: microcode layout, two variants. +4: CF_INST, COMP_MASK, ARRAY_SIZE; or +4: CF_INST, Reserved, SEL_W, SEL_Z, SEL_Y, SEL_X. +0 (both variants): INDEX_GPR, RW_GPR, TYPE, ARRAY_BASE.]
Microcode
[Figure: microcode layout. +4: CF_INST, Rsvd, CALL_COUNT, COUNT, COND, CF_CONST; +0: ADDR.]
Microcode
[Figure: microcode layout, two variants. +4: CF_INST, COMP_MASK, ARRAY_SIZE; or +4: CF_INST, Reserved, SEL_W, SEL_Z, SEL_Y, SEL_X. +0 (both variants): INDEX_GPR, RW_GPR, TYPE, ARRAY_BASE.]
NOP No Operation
No operation. It ignores all fields in the CF_DWORD[0,1] microcode formats, except the CF_INST,
BARRIER, and END_OF_PROGRAM fields. The instruction does not preserve the current PV or PS
value in the slot in which it executes. Instruction slots that are omitted implicitly execute NOPs in the
corresponding ALU. As a consequence, slots that are unspecified do not preserve PV or PS for the next
instruction. To preserve PV or PS and perform no other operation in an ALU clause, use a MOV
instruction with a disabled write mask.
See the ALU version of NOP on page 171.
Microcode
[Figure: microcode layout. +4: CF_INST, Rsvd, CALL_COUNT, COUNT, COND, CF_CONST; +0: ADDR.]
Microcode
[Format diagram, two doublewords. +4: C, DE, DR, DST_GPR, BS, ALU_INST, OMOD, FM, WM, UP, UEM, S1A, S0A. +0: L, PS, IM, S1N, S1E, S1R, SRC1_SEL, S0N, S0E, S0R, SRC0_SEL.]
Compares the first source operand with floating-point zero, and copies either the second or third
source operand to the destination operand based on the result. Execution can be conditioned on a
predicate set by the previous ALU instruction group. If the condition is not satisfied, the instruction
has no effect and control is passed to the next instruction.
The instruction specifies which one of four data elements in a 4-element vector is operated on, and the
result can be stored in any of the four elements of the destination GPR. Operands can be accessed
using absolute addresses or an index in a GPR or the address register (AR).
The source operands are 32-bit data elements in a GPR, in a constant register, in the previous vector
(PV) or previous scalar (PS) register, or they can be a standard constant (0, -1, 0.0, 0.5, or 1.0), a literal
constant included in the instruction group, or the absolute value or negated value of the source. The
elements of each source-operand vector can be swizzled prior to computation.
The destination operand is a 32-bit data element in a GPR. Output to the destination can be masked, or
it can be modified by multiplying by 2.0 or 4.0, dividing by 2.0, or clamped to the range [0.0, 1.0]. A
fog value can be exported by merging a transcendental ALU result into the low-order bits of the vector
destination. The execute mask and predicate bit can be updated by the result.
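The select-and-modify behavior described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the hardware definition: the function names `conditional_select` and `apply_output_modifier` are hypothetical, the comparison against zero is passed in as a parameter because the exact test depends on the opcode, the second operand is assumed to be the one chosen when the test succeeds, and scaling is assumed to be applied before clamping.

```python
import operator


def conditional_select(src0, src1, src2, compare=operator.eq):
    """Pick src1 when compare(src0, 0.0) is true, otherwise src2.

    Assumption: the second operand is chosen when the test succeeds.
    """
    return src1 if compare(src0, 0.0) else src2


def apply_output_modifier(value, scale=1.0, clamp=False):
    """Scale the result and optionally clamp it to the range [0.0, 1.0].

    Allowed scales mirror the text: multiply by 2.0 or 4.0, or divide by 2.0.
    Assumption: scaling is applied before clamping.
    """
    if scale not in (1.0, 2.0, 4.0, 0.5):
        raise ValueError("scale must be 1.0, 2.0, 4.0, or 0.5")
    result = value * scale
    if clamp:
        result = min(max(result, 0.0), 1.0)
    return result
```

For example, `conditional_select(0.0, 1.5, 2.5)` selects the second operand under the default equality test, and passing `compare=operator.gt` models a greater-than-zero variant.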
Microcode
[Format diagram, two doublewords. +4: C, DE, DR, DST_GPR, BS, ALU_INST (11000), S2N, S2E, S2R, SRC2_SEL. +0: L, PS, IM, S1N, S1E, S1R, SRC1_SEL, S0N, S0E, S0R, SRC0_SEL.]
Microcode
[Format diagram, two doublewords. +4: C, DE, DR, DST_GPR, BS, ALU_INST, OMOD, FM, WM, UP, UEM, S1A, S0A. +0: L, PS, IM, S1N, S1E, S1R, SRC1_SEL, S0N, S0E, S0R, SRC0_SEL.]
Result = Undefined;
ResultI = Src0;
If (ResultI < -256) {
    ResultI = 0x800; // -256
}
If (ResultI > 0xff) {
    ResultI = 0x800; // -256
}
Export(ResultI); // signed 9-bit integer
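The range check in the pseudocode above can be illustrated with a short Python sketch. It rests on several assumptions: the helper names are hypothetical, out-of-range inputs are mapped to -256 as the pseudocode comments state (the literal 0x800 is not reproduced here), and the two's-complement packing is inferred only from the "signed 9-bit integer" comment.

```python
def clamp_to_signed9(value):
    """Return value if it lies in [-256, 255]; otherwise return -256.

    Assumption: both out-of-range cases map to -256, following the
    comments in the pseudocode rather than the 0x800 literal.
    """
    if value < -256 or value > 0xFF:
        return -256
    return value


def pack_signed9(value):
    """Encode an in-range value as a 9-bit two's-complement bit pattern."""
    if not -256 <= value <= 255:
        raise ValueError("value does not fit in a signed 9-bit integer")
    return value & 0x1FF
```

Under these assumptions, an input of 300 is replaced by -256, which packs to the bit pattern 0x100.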
Microcode
[Format diagram, two doublewords. +4: C, DE, DR, DST_GPR, BS, ALU_INST, OMOD, FM, WM, UP, UEM, S1A, S0A. +0: L, PS, IM, S1N, S1E, S1R, SRC1_SEL, S0N, S0E, S0R, SRC0_SEL.]
Microcode
[Format diagram, two doublewords. +4: C, DE, DR, DST_GPR, BS, ALU_INST (11000), S2N, S2E, S2R, SRC2_SEL. +0: L, PS, IM, S1N, S1E, S1R, SRC1_SEL, S0N, S0E, S0R, SRC0_SEL.]
Microcode
[Format diagram, two doublewords. +4: C, DE, DR, DST_GPR, BS, ALU_INST, OMOD, FM, WM, UP, UEM, S1A, S0A. +0: L, PS, IM, S1N, S1E, S1R, SRC1_SEL, S0N, S0E, S0R, SRC0_SEL.]
NOP No Operation
No operation. The instruction slot is not used. NOP instructions perform no writes to GPRs, and they
invalidate PV and PS.
After all instructions in an instruction group are processed, any ALU.[X,Y,Z,W] or ALU.Trans operation that is
unspecified implicitly executes a NOP instruction, thus invalidating the values in the corresponding elements of
the PV and PS registers.
Microcode
[Format diagram, two doublewords. +4: C, DE, DR, DST_GPR, BS, ALU_INST, OMOD, FM, WM, UP, UEM, S1A, S0A. +0: L, PS, IM, S1N, S1E, S1R, SRC1_SEL, S0N, S0E, S0R, SRC0_SEL.]
OR_INT Bit-Wise OR
Logical bit-wise OR.
Result = Src0 | Src1
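As a worked illustration of the formula above, the following Python sketch ORs two 32-bit values. The function name `or_int` and the explicit mask are assumptions made for the example; the mask only models a 32-bit operand width on top of Python's unbounded integers.

```python
MASK32 = 0xFFFFFFFF  # models a 32-bit data element


def or_int(src0, src1):
    """Bit-wise OR of two operands, truncated to 32 bits."""
    return (src0 | src1) & MASK32
```

For instance, `or_int(0x0F0F0000, 0x00FF00FF)` sets every bit that is set in either operand.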
Microcode
[Format diagram, two doublewords. +4: C, DE, DR, DST_GPR, BS, ALU_INST, OMOD, FM, WM, UP, UEM, S1A, S0A. +0: L, PS, IM, S1N, S1E, S1R, SRC1_SEL, S0N, S0E, S0R, SRC0_SEL.]
Microcode
Formats: ALU_DWORD0, page 278.
[Format diagram, two doublewords. +4: C, DE, DR, DST_GPR, BS, ALU_INST, OMOD, FM, WM, UP, UEM, S1A, S0A. +0: L, PS, IM, S1N, S1E, S1R, SRC1_SEL, S0N, S0E, S0R, SRC0_SEL.]
Microcode
[Format diagram: +12 all zeros; +8 Reserved, MF, CBNS, ES, OFFSET; +4 SMA, FCA, NFA, DATA_FORMAT, UCF, DSW, DSZ, DSY, DSX, DR, DST_GPR; +0 MFC, SSX, SR, SRC_GPR, BUFFER_ID, FWQ, FT, VTX_INST.]
Microcode
[Format diagram: +12 all zeros; +8 Reserved, MF, CBNS, ES, OFFSET; +4 SMA, FCA, NFA, DATA_FORMAT, UCF, DSW, DSZ, DSY, DSX, SEMANTIC_ID; +0 MFC, SSX, SR, SRC_GPR, BUFFER_ID, FWQ, FT, VTX_INST.]
Microcode
[Format diagram: +12 all zeros; +8 SSW, SSZ, SSY, SSX, SAMPLER_ID, OFFSET_Z, OFFSET_Y, OFFSET_X; +4 CTW, CTZ, CTY, CTX, LOD_BIAS, DSW, DSZ, DSY, DSX, DR, DST_GPR; +0 Reserved, SR, SRC_GPR, RESOURCE_ID, FWQ, BFM, TEX_INST.]
Formats: TEX_DWORD0 (page 298), TEX_DWORD1 (page 301), and TEX_DWORD2 (page 303).
Instruction Field: TEX_INST == TEX_INST_GET_BORDER_COLOR_FRAC, opcode 5 (5h).
Microcode
[Format diagram: +12 all zeros; +8 SSW, SSZ, SSY, SSX, SAMPLER_ID, OFFSET_Z, OFFSET_Y, OFFSET_X; +4 CTW, CTZ, CTY, CTX, LOD_BIAS, DSW, DSZ, DSY, DSX, DR, DST_GPR; +0 Reserved, SR, SRC_GPR, RESOURCE_ID, FWQ, BFM, TEX_INST.]
Formats: TEX_DWORD0 (page 298), TEX_DWORD1 (page 301), and TEX_DWORD2 (page 303).
Instruction Field: TEX_INST == TEX_INST_GET_COMP_TEX_LOD, opcode 6 (6h).
Microcode
[Format diagram: +12 all zeros; +8 SSW, SSZ, SSY, SSX, SAMPLER_ID, OFFSET_Z, OFFSET_Y, OFFSET_X; +4 CTW, CTZ, CTY, CTX, LOD_BIAS, DSW, DSZ, DSY, DSX, DR, DST_GPR; +0 Reserved, SR, SRC_GPR, RESOURCE_ID, FWQ, BFM, TEX_INST.]
Formats: TEX_DWORD0 (page 298), TEX_DWORD1 (page 301), and TEX_DWORD2 (page 303).
Instruction Field: TEX_INST == TEX_INST_GET_GRADIENTS_H, opcode 7 (7h).
Microcode
[Format diagram: +12 all zeros; +8 SSW, SSZ, SSY, SSX, SAMPLER_ID, OFFSET_Z, OFFSET_Y, OFFSET_X; +4 CTW, CTZ, CTY, CTX, LOD_BIAS, DSW, DSZ, DSY, DSX, DR, DST_GPR; +0 Reserved, SR, SRC_GPR, RESOURCE_ID, FWQ, BFM, TEX_INST.]
Formats: TEX_DWORD0 (page 298), TEX_DWORD1 (page 301), and TEX_DWORD2 (page 303).
Instruction Field: TEX_INST == TEX_INST_GET_GRADIENTS_V, opcode 8 (8h).
Microcode
[Format diagram: +12 all zeros; +8 SSW, SSZ, SSY, SSX, SAMPLER_ID, OFFSET_Z, OFFSET_Y, OFFSET_X; +4 CTW, CTZ, CTY, CTX, LOD_BIAS, DSW, DSZ, DSY, DSX, DR, DST_GPR; +0 Reserved, SR, SRC_GPR, RESOURCE_ID, FWQ, BFM, TEX_INST.]
Formats: TEX_DWORD0 (page 298), TEX_DWORD1 (page 301), and TEX_DWORD2 (page 303).
Instruction Field: TEX_INST == TEX_INST_GET_LERP_FACTORS, opcode 9 (9h).
Microcode
[Format diagram: +12 all zeros; +8 SSW, SSZ, SSY, SSX, SAMPLER_ID, OFFSET_Z, OFFSET_Y, OFFSET_X; +4 CTW, CTZ, CTY, CTX, LOD_BIAS, DSW, DSZ, DSY, DSX, DR, DST_GPR; +0 Reserved, SR, SRC_GPR, RESOURCE_ID, FWQ, BFM, TEX_INST.]
Formats: TEX_DWORD0 (page 298), TEX_DWORD1 (page 301), and TEX_DWORD2 (page 303).
Instruction Field: TEX_INST == TEX_INST_GET_TEXTURE_RESINFO, opcode 4 (4h).
Microcode
[Format diagram: +12 all zeros; +8 SSW, SSZ, SSY, SSX, SAMPLER_ID, OFFSET_Z, OFFSET_Y, OFFSET_X; +4 CTW, CTZ, CTY, CTX, LOD_BIAS, DSW, DSZ, DSY, DSX, DR, DST_GPR; +0 Reserved, SR, SRC_GPR, RESOURCE_ID, FWQ, BFM, TEX_INST.]
Formats: TEX_DWORD0 (page 298), TEX_DWORD1 (page 301), and TEX_DWORD2 (page 303).
Instruction Field: TEX_INST == TEX_INST_GET_WEIGHTS, opcode 10 (Ah).
Microcode
[Format diagram: +12 all zeros; +8 SSW, SSZ, SSY, SSX, SAMPLER_ID, OFFSET_Z, OFFSET_Y, OFFSET_X; +4 CTW, CTZ, CTY, CTX, LOD_BIAS, DSW, DSZ, DSY, DSX, DR, DST_GPR; +0 Reserved, SR, SRC_GPR, RESOURCE_ID, FWQ, BFM, TEX_INST.]
Formats: TEX_DWORD0 (page 298), TEX_DWORD1 (page 301), and TEX_DWORD2 (page 303).
Instruction Field: TEX_INST == TEX_INST_LD, opcode 3 (3h).
Microcode
[Format diagram: +12 all zeros; +8 SSW, SSZ, SSY, SSX, SAMPLER_ID, OFFSET_Z, OFFSET_Y, OFFSET_X; +4 CTW, CTZ, CTY, CTX, LOD_BIAS, DSW, DSZ, DSY, DSX, DR, DST_GPR; +0 Reserved, SR, SRC_GPR, RESOURCE_ID, FWQ, BFM, TEX_INST.]
Formats: TEX_DWORD0 (page 298), TEX_DWORD1 (page 301), and TEX_DWORD2 (page 303).
Instruction Field: TEX_INST == TEX_INST_PASS, opcode 13 (Dh).
Microcode
[Format diagram: +12 all zeros; +8 SSW, SSZ, SSY, SSX, SAMPLER_ID, OFFSET_Z, OFFSET_Y, OFFSET_X; +4 CTW, CTZ, CTY, CTX, LOD_BIAS, DSW, DSZ, DSY, DSX, DR, DST_GPR; +0 Reserved, SR, SRC_GPR, RESOURCE_ID, FWQ, BFM, TEX_INST.]
Formats: TEX_DWORD0 (page 298), TEX_DWORD1 (page 301), and TEX_DWORD2 (page 303).
Instruction Field: TEX_INST == TEX_INST_SAMPLE, opcode 16 (10h).
Microcode
[Format diagram: +12 all zeros; +8 SSW, SSZ, SSY, SSX, SAMPLER_ID, OFFSET_Z, OFFSET_Y, OFFSET_X; +4 CTW, CTZ, CTY, CTX, LOD_BIAS, DSW, DSZ, DSY, DSX, DR, DST_GPR; +0 Reserved, SR, SRC_GPR, RESOURCE_ID, FWQ, BFM, TEX_INST.]
Formats: TEX_DWORD0 (page 298), TEX_DWORD1 (page 301), and TEX_DWORD2 (page 303).
Instruction Field: TEX_INST == TEX_INST_SAMPLE_C, opcode 24 (18h).
Microcode
[Format diagram: +12 all zeros; +8 SSW, SSZ, SSY, SSX, SAMPLER_ID, OFFSET_Z, OFFSET_Y, OFFSET_X; +4 CTW, CTZ, CTY, CTX, LOD_BIAS, DSW, DSZ, DSY, DSX, DR, DST_GPR; +0 Reserved, SR, SRC_GPR, RESOURCE_ID, FWQ, BFM, TEX_INST.]
Formats: TEX_DWORD0 (page 298), TEX_DWORD1 (page 301), and TEX_DWORD2 (page 303).
Instruction Field: TEX_INST == TEX_INST_SAMPLE_C_G, opcode 28 (1Ch).
Microcode
[Format diagram: +12 all zeros; +8 SSW, SSZ, SSY, SSX, SAMPLER_ID, OFFSET_Z, OFFSET_Y, OFFSET_X; +4 CTW, CTZ, CTY, CTX, LOD_BIAS, DSW, DSZ, DSY, DSX, DR, DST_GPR; +0 Reserved, SR, SRC_GPR, RESOURCE_ID, FWQ, BFM, TEX_INST.]
Formats: TEX_DWORD0 (page 298), TEX_DWORD1 (page 301), and TEX_DWORD2 (page 303).
Instruction Field: TEX_INST == TEX_INST_SAMPLE_C_G_L, opcode 29 (1Dh).
Microcode
[Format diagram: +12 all zeros; +8 SSW, SSZ, SSY, SSX, SAMPLER_ID, OFFSET_Z, OFFSET_Y, OFFSET_X; +4 CTW, CTZ, CTY, CTX, LOD_BIAS, DSW, DSZ, DSY, DSX, DR, DST_GPR; +0 Reserved, SR, SRC_GPR, RESOURCE_ID, FWQ, BFM, TEX_INST.]
Formats: TEX_DWORD0 (page 298), TEX_DWORD1 (page 301), and TEX_DWORD2 (page 303).
Instruction Field: TEX_INST == TEX_INST_SAMPLE_C_G_LB, opcode 30 (1Eh).
Microcode
[Format diagram: +12 all zeros; +8 SSW, SSZ, SSY, SSX, SAMPLER_ID, OFFSET_Z, OFFSET_Y, OFFSET_X; +4 CTW, CTZ, CTY, CTX, LOD_BIAS, DSW, DSZ, DSY, DSX, DR, DST_GPR; +0 Reserved, SR, SRC_GPR, RESOURCE_ID, FWQ, BFM, TEX_INST.]
Formats: TEX_DWORD0 (page 298), TEX_DWORD1 (page 301), and TEX_DWORD2 (page 303).
Instruction Field: TEX_INST == TEX_INST_SAMPLE_C_G_LZ, opcode 31 (1Fh).
Microcode
[Format diagram: +12 all zeros; +8 SSW, SSZ, SSY, SSX, SAMPLER_ID, OFFSET_Z, OFFSET_Y, OFFSET_X; +4 CTW, CTZ, CTY, CTX, LOD_BIAS, DSW, DSZ, DSY, DSX, DR, DST_GPR; +0 Reserved, SR, SRC_GPR, RESOURCE_ID, FWQ, BFM, TEX_INST.]
Formats: TEX_DWORD0 (page 298), TEX_DWORD1 (page 301), and TEX_DWORD2 (page 303).
Instruction Field: TEX_INST == TEX_INST_SAMPLE_C_L, opcode 25 (19h).
Microcode
[Format diagram: +12 all zeros; +8 SSW, SSZ, SSY, SSX, SAMPLER_ID, OFFSET_Z, OFFSET_Y, OFFSET_X; +4 CTW, CTZ, CTY, CTX, LOD_BIAS, DSW, DSZ, DSY, DSX, DR, DST_GPR; +0 Reserved, SR, SRC_GPR, RESOURCE_ID, FWQ, BFM, TEX_INST.]
Formats: TEX_DWORD0 (page 298), TEX_DWORD1 (page 301), and TEX_DWORD2 (page 303).
Instruction Field: TEX_INST == TEX_INST_SAMPLE_C_LB, opcode 26 (1Ah).
Microcode
[Format diagram: +12 all zeros; +8 SSW, SSZ, SSY, SSX, SAMPLER_ID, OFFSET_Z, OFFSET_Y, OFFSET_X; +4 CTW, CTZ, CTY, CTX, LOD_BIAS, DSW, DSZ, DSY, DSX, DR, DST_GPR; +0 Reserved, SR, SRC_GPR, RESOURCE_ID, FWQ, BFM, TEX_INST.]
Formats: TEX_DWORD0 (page 298), TEX_DWORD1 (page 301), and TEX_DWORD2 (page 303).
Instruction Field: TEX_INST == TEX_INST_SAMPLE_C_LZ, opcode 27 (1Bh).
Microcode
[Format diagram: +12 all zeros; +8 SSW, SSZ, SSY, SSX, SAMPLER_ID, OFFSET_Z, OFFSET_Y, OFFSET_X; +4 CTW, CTZ, CTY, CTX, LOD_BIAS, DSW, DSZ, DSY, DSX, DR, DST_GPR; +0 Reserved, SR, SRC_GPR, RESOURCE_ID, FWQ, BFM, TEX_INST.]
Formats: TEX_DWORD0 (page 298), TEX_DWORD1 (page 301), and TEX_DWORD2 (page 303).
Instruction Field: TEX_INST == TEX_INST_SAMPLE_G, opcode 20 (14h).
Microcode
[Format diagram: +12 all zeros; +8 SSW, SSZ, SSY, SSX, SAMPLER_ID, OFFSET_Z, OFFSET_Y, OFFSET_X; +4 CTW, CTZ, CTY, CTX, LOD_BIAS, DSW, DSZ, DSY, DSX, DR, DST_GPR; +0 Reserved, SR, SRC_GPR, RESOURCE_ID, FWQ, BFM, TEX_INST.]
Formats: TEX_DWORD0 (page 298), TEX_DWORD1 (page 301), and TEX_DWORD2 (page 303).
Instruction Field: TEX_INST == TEX_INST_SAMPLE_G_L, opcode 21 (15h).
Microcode
[Format diagram: +12 all zeros; +8 SSW, SSZ, SSY, SSX, SAMPLER_ID, OFFSET_Z, OFFSET_Y, OFFSET_X; +4 CTW, CTZ, CTY, CTX, LOD_BIAS, DSW, DSZ, DSY, DSX, DR, DST_GPR; +0 Reserved, SR, SRC_GPR, RESOURCE_ID, FWQ, BFM, TEX_INST.]
Formats: TEX_DWORD0 (page 298), TEX_DWORD1 (page 301), and TEX_DWORD2 (page 303).
Instruction Field: TEX_INST == TEX_INST_SAMPLE_G_LB, opcode 22 (16h).
Microcode
[Format diagram: +12 all zeros; +8 SSW, SSZ, SSY, SSX, SAMPLER_ID, OFFSET_Z, OFFSET_Y, OFFSET_X; +4 CTW, CTZ, CTY, CTX, LOD_BIAS, DSW, DSZ, DSY, DSX, DR, DST_GPR; +0 Reserved, SR, SRC_GPR, RESOURCE_ID, FWQ, BFM, TEX_INST.]
Formats: TEX_DWORD0 (page 298), TEX_DWORD1 (page 301), and TEX_DWORD2 (page 303).
Instruction Field: TEX_INST == TEX_INST_SAMPLE_G_LZ, opcode 23 (17h).
Microcode
[Format diagram: +12 all zeros; +8 SSW, SSZ, SSY, SSX, SAMPLER_ID, OFFSET_Z, OFFSET_Y, OFFSET_X; +4 CTW, CTZ, CTY, CTX, LOD_BIAS, DSW, DSZ, DSY, DSX, DR, DST_GPR; +0 Reserved, SR, SRC_GPR, RESOURCE_ID, FWQ, BFM, TEX_INST.]
Formats: TEX_DWORD0 (page 298), TEX_DWORD1 (page 301), and TEX_DWORD2 (page 303).
Instruction Field: TEX_INST == TEX_INST_SAMPLE_L, opcode 17 (11h).
Microcode
[Format diagram: +12 all zeros; +8 SSW, SSZ, SSY, SSX, SAMPLER_ID, OFFSET_Z, OFFSET_Y, OFFSET_X; +4 CTW, CTZ, CTY, CTX, LOD_BIAS, DSW, DSZ, DSY, DSX, DR, DST_GPR; +0 Reserved, SR, SRC_GPR, RESOURCE_ID, FWQ, BFM, TEX_INST.]
Formats: TEX_DWORD0 (page 298), TEX_DWORD1 (page 301), and TEX_DWORD2 (page 303).
Instruction Field: TEX_INST == TEX_INST_SAMPLE_LB, opcode 18 (12h).
Microcode
[Format diagram: +12 all zeros; +8 SSW, SSZ, SSY, SSX, SAMPLER_ID, OFFSET_Z, OFFSET_Y, OFFSET_X; +4 CTW, CTZ, CTY, CTX, LOD_BIAS, DSW, DSZ, DSY, DSX, DR, DST_GPR; +0 Reserved, SR, SRC_GPR, RESOURCE_ID, FWQ, BFM, TEX_INST.]
Formats: TEX_DWORD0 (page 298), TEX_DWORD1 (page 301), and TEX_DWORD2 (page 303).
Instruction Field: TEX_INST == TEX_INST_SAMPLE_LZ, opcode 19 (13h).
Microcode
[Format diagram: +12 all zeros; +8 SSW, SSZ, SSY, SSX, SAMPLER_ID, OFFSET_Z, OFFSET_Y, OFFSET_X; +4 CTW, CTZ, CTY, CTX, LOD_BIAS, DSW, DSZ, DSY, DSX, DR, DST_GPR; +0 Reserved, SR, SRC_GPR, RESOURCE_ID, FWQ, BFM, TEX_INST.]
Formats: TEX_DWORD0 (page 298), TEX_DWORD1 (page 301), and TEX_DWORD2 (page 303).
Instruction Field: TEX_INST == TEX_INST_SET_GRADIENTS_H, opcode 11 (Bh).
Microcode
[Format diagram: +12 all zeros; +8 SSW, SSZ, SSY, SSX, SAMPLER_ID, OFFSET_Z, OFFSET_Y, OFFSET_X; +4 CTW, CTZ, CTY, CTX, LOD_BIAS, DSW, DSZ, DSY, DSX, DR, DST_GPR; +0 Reserved, SR, SRC_GPR, RESOURCE_ID, FWQ, BFM, TEX_INST.]
Formats: TEX_DWORD0 (page 298), TEX_DWORD1 (page 301), and TEX_DWORD2 (page 303).
Instruction Field: TEX_INST == TEX_INST_SET_GRADIENTS_V, opcode 12 (Ch).
8 Microcode Formats
This section specifies the microcode formats. The definitions may be used to simplify compilation by
providing standard templates and enumeration names for the various instruction formats. Table 8-1
summarizes the microcode formats and their widths. The sections that follow provide details.
The field-definition tables that accompany the descriptions in the sections below use the following
notation:
• int(2)—A 2-bit field that specifies an integer value.
• enum(7)—A 7-bit field that specifies an enumerated set of values (in this case, a set of up to 2^7, or 128,
values). The number of valid values may be less than the maximum.
• VALID_PIXEL_MODE (VPM)—Refers to a field named “VALID_PIXEL_MODE” that is
indicated in the accompanying format diagram by the abbreviated symbol “VPM”.
Unless otherwise stated, all fields are readable and writable (the CF_INST fields of the
CF_ALLOC_IMP_EXP_DWORD1_BUF or the CF_ALLOC_IMP_EXP_DWORD1_SWIZ formats
are the only exceptions). The default value of all fields is zero.
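The int(n) and enum(n) notation maps directly onto shift-and-mask operations on a 32-bit doubleword. The following is a minimal sketch in Python, assuming fields are given as inclusive hi:lo bit ranges as in the field tables of this section. The helper names get_field, set_field, and max_enum_values are hypothetical and are not part of the R600 definitions.

```python
MASK32 = 0xFFFFFFFF

def max_enum_values(width: int) -> int:
    """An enum(width) field can distinguish at most 2**width values."""
    return 1 << width

def get_field(word: int, hi: int, lo: int) -> int:
    """Return the unsigned value stored in bits hi..lo (inclusive) of word."""
    width = hi - lo + 1
    return (word >> lo) & ((1 << width) - 1)

def set_field(word: int, hi: int, lo: int, value: int) -> int:
    """Return word with bits hi..lo replaced by value; other bits are kept."""
    width = hi - lo + 1
    mask = (1 << width) - 1
    if not 0 <= value <= mask:
        raise ValueError(f"{value} does not fit in a {width}-bit field")
    return ((word & ~(mask << lo)) | (value << lo)) & MASK32

# Starting from 0 mirrors the stated default of zero for every field.
word = set_field(0, 29, 23, 13)   # store 13 in a 7-bit field at bits 29:23
print(hex(word), get_field(word, 29, 23), max_enum_values(7))
```

Rejecting out-of-range values in set_field, instead of truncating them silently, is a design choice of this sketch and not a requirement stated by the format definitions.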
Bits    Field
31:0    ADDR

Bits    Field
31      B
30      WQM
29:23   CF_INST
22      VPM
21      EOP
20:19   Rsvd
18:13   CALL_COUNT
12:10   COUNT
9:8     COND
7:3     CF_CONST
2:0     PC
CF_INST 29:23 enum(7) Instruction:
0 CF_INST_NOP: perform no operation.
1 CF_INST_TEX: execute texture-fetch or constant-fetch clause.
2 CF_INST_VTX: execute vertex-fetch clause.
3 CF_INST_VTX_TC: execute vertex-fetch clause through the texture cache (for systems lacking VC).
4 CF_INST_LOOP_START: execute DirectX9 loop start instruction (push onto stack if loop body executes).
5 CF_INST_LOOP_END: execute DirectX9 loop end instruction (pop stack if loop is finished).
6 CF_INST_LOOP_START_DX10: execute DirectX10 loop start instruction (push onto stack if loop body executes).
7 CF_INST_LOOP_START_NO_AL: same as LOOP_START but don't push the loop index (aL) onto the stack or update aL.
8 CF_INST_LOOP_CONTINUE: execute continue statement (jump to end of loop if all pixels ready to continue).
9 CF_INST_LOOP_BREAK: execute a break statement (pop stack if all pixels ready to break).
10 CF_INST_PUSH: push current per-pixel active state onto the stack.
11 CF_INST_PUSH_ELSE: execute push/else statement. Always pushes per-pixel state onto the stack.
12 CF_INST_POP: pop current per-pixel state from the stack.
13 CF_INST_CALL: execute subroutine call instruction (push onto stack).
14 CF_INST_RETURN: execute subroutine return instruction (pop stack). Pair with CF_INST_CALL only.
15 CF_INST_CALL_FS: call fetch program. The address to call is stored in a host-written register.

WHOLE_QUAD_MODE (WQM) 30 int(1) Active pixels:
0 Do not execute this instruction as if all pixels are active and valid.
1 Execute this instruction as if all pixels are active and valid.
This is the antonym of the VALID_PIXEL_MODE field. Only one of these bits, WHOLE_QUAD_MODE or
VALID_PIXEL_MODE, should be set at any one time.

BARRIER (B) 31 int(1) Synchronization barrier:
0 This instruction may run in parallel with prior instructions.
1 All prior instructions must complete before this instruction executes.
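As an illustration of how the second control-flow doubleword can be taken apart, the following Python sketch decodes it with a table of bit ranges read off the format diagram above. The dictionary layout and the function names decode_cf_dword1 and pixel_modes_ok are hypothetical; only the field names and bit positions come from this section.

```python
# (hi, lo) bit ranges transcribed from the format diagram; bits 20:19 are reserved.
CF_DWORD1_LAYOUT = {
    "B": (31, 31), "WQM": (30, 30), "CF_INST": (29, 23), "VPM": (22, 22),
    "EOP": (21, 21), "CALL_COUNT": (18, 13), "COUNT": (12, 10),
    "COND": (9, 8), "CF_CONST": (7, 3), "PC": (2, 0),
}

def decode_cf_dword1(word: int) -> dict:
    """Split a 32-bit doubleword into its named fields."""
    fields = {}
    for name, (hi, lo) in CF_DWORD1_LAYOUT.items():
        fields[name] = (word >> lo) & ((1 << (hi - lo + 1)) - 1)
    return fields

def pixel_modes_ok(fields: dict) -> bool:
    """True if at most one of WQM and VPM is set, as the text recommends."""
    return not (fields["WQM"] and fields["VPM"])

CF_INST_CALL = 13                           # value from the CF_INST table
example = (1 << 31) | (CF_INST_CALL << 23)  # BARRIER set, CF_INST_CALL
decoded = decode_cf_dword1(example)
print(hex(example), decoded["B"], decoded["CF_INST"], pixel_modes_ok(decoded))
```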
Bits    Field
31:30   KM0
29:26   KB1
25:22   KB0
21:0    ADDR

Bits    Field
31      B
30      WQM
29:26   CF_INST
25      UW
24:18   COUNT
17:10   KCACHE_ADDR1
9:2     KCACHE_ADDR0
1:0     KM1
CF_INST 29:26 enum(4) Instruction:
8 CF_INST_ALU: each PRED_SET* instruction updates the active state but does not update the stack.
9 CF_INST_ALU_PUSH_BEFORE: each PRED_SET* causes a stack push first; then updates the active state.
10 CF_INST_ALU_POP_AFTER: pop the stack after the clause completes execution.
11 CF_INST_ALU_POP2_AFTER: pop the stack twice after the clause completes execution.
12 Reserved
13 CF_INST_ALU_CONTINUE: each PRED_SET* causes a continue operation on the unmasked pixels.
14 CF_INST_ALU_BREAK: each PRED_SET* causes a break operation on the unmasked pixels.
15 CF_INST_ALU_ELSE_AFTER: behaves like PUSH_BEFORE, but also performs an ELSE operation after the
clause completes execution, which inverts the pixel state.

WHOLE_QUAD_MODE (WQM) 30 int(1) Active pixels:
0 Do not execute this clause as if all pixels are active and valid.
1 Execute this clause as if all pixels are active and valid.
This is the antonym of the VALID_PIXEL_MODE field. Only one of these bits, WHOLE_QUAD_MODE or
VALID_PIXEL_MODE, should be set at any one time.

BARRIER (B) 31 int(1) Synchronization barrier:
0 This instruction may run in parallel with prior instructions.
1 All prior instructions must complete before this instruction executes.
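Going in the other direction, the two ALU control-flow doublewords can be assembled from field values. This is a sketch only: the function pack_cf_alu and its parameter names are hypothetical, field values are treated as raw unsigned integers, and no meaning beyond the bit positions shown in the diagrams above is assumed (for example, how COUNT relates to the number of instructions in the clause is not modeled).

```python
def _fit(value: int, width: int, name: str) -> int:
    """Check that value fits in an unsigned field of the given width."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{name}={value} does not fit in {width} bits")
    return value

def pack_cf_alu(addr, count, cf_inst, barrier=1, wqm=0, uw=0,
                kb0=0, kb1=0, km0=0, km1=0, kaddr0=0, kaddr1=0):
    """Return (dword0, dword1) for an ALU clause control-flow instruction."""
    dword0 = (_fit(km0, 2, "KM0") << 30
              | _fit(kb1, 4, "KB1") << 26
              | _fit(kb0, 4, "KB0") << 22
              | _fit(addr, 22, "ADDR"))
    dword1 = (_fit(barrier, 1, "B") << 31
              | _fit(wqm, 1, "WQM") << 30
              | _fit(cf_inst, 4, "CF_INST") << 26
              | _fit(uw, 1, "UW") << 25
              | _fit(count, 7, "COUNT") << 18
              | _fit(kaddr1, 8, "KCACHE_ADDR1") << 10
              | _fit(kaddr0, 8, "KCACHE_ADDR0") << 2
              | _fit(km1, 2, "KM1"))
    return dword0, dword1

CF_INST_ALU_PUSH_BEFORE = 9   # value from the CF_INST table above
d0, d1 = pack_cf_alu(addr=0x10, count=3, cf_inst=CF_INST_ALU_PUSH_BEFORE)
print(hex(d0), hex(d1))
```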
Bits    Field
31:30   ES
29:23   INDEX_GPR
22      RR
21:15   RW_GPR
14:13   TYPE
12:0    ARRAY_BASE
RW_GPR 21:15 int(7) GPR register to read data from or write data to.
Bits    Field
31      B
30      WQM
29:23   CF_INST
22      VPM
21      EOP
20:17   BC
16      EL
15:12   COMP_MASK
11:0    ARRAY_SIZE
BARRIER (B) 31 int(1) Synchronization barrier:
0 This instruction may run in parallel with prior instructions.
1 All prior instructions must complete before this instruction executes.
Bits    Field
31      B
30      WQM
29:23   CF_INST
22      VPM
21      EOP
20:17   BC
16      EL
15:12   Reserved
11:9    SEL_W
8:6     SEL_Z
5:3     SEL_Y
2:0     SEL_X
WHOLE_QUAD_MODE (WQM) 30 int(1) Active pixels:
0 Do not execute this clause as if all pixels are active and valid.
1 Execute this clause as if all pixels are active and valid.
This is the antonym of the VALID_PIXEL_MODE field. Set at most one of these bits.

BARRIER (B) 31 int(1) Synchronization barrier:
0 This instruction may run in parallel with prior instructions.
1 All prior instructions must complete before this instruction executes.
Bits    Field
31      L
30:29   PS
28:26   IM
25      S1N
24:23   S1E
22      S1R
21:13   SRC1_SEL
12      S0N
11:10   S0E
9       S0R
8:0     SRC0_SEL
SRC0_NEG (S0N) 12 int(1), SRC1_NEG (S1N) 25 int(1) Negation:
0 Do not negate input for this operand.
1 Negate input for this operand. Use only for floating-point inputs.
Bits    Field
31      C
30:29   DE
28      DR
27:21   DST_GPR
20:18   BS
17:8    ALU_INST
7:6     OMOD
5       FM
4       WM
3       UP
2       UEM
1       S1A
0       S0A
SRC0_ABS (S0A) 0 int(1), SRC1_ABS (S1A) 1 int(1) Absolute value:
0 Use the actual value of the input for this operand.
1 Use the absolute value of the input for this operand. Use only for floating-point inputs. This
function is performed before negation.

UPDATE_PRED (UP) 3 int(1) Update predicate:
0 Do not update the stored predicate.
1 Update the stored predicate based on the predicate operation computed here.

OMOD 7:6 enum(2) Output modifier:
0 ALU_OMOD_OFF: identity. This value must be used for operations that produce an integer result.
1 ALU_OMOD_M2: multiply by 2.0.
2 ALU_OMOD_M4: multiply by 4.0.
3 ALU_OMOD_D2: divide by 2.0.
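The ordering of the input modifiers and the effect of the output modifier can be modeled in a few lines. This is a behavioral sketch in Python that uses ordinary Python floats, so it does not reproduce the hardware's floating-point rounding or special-value handling. The function names are hypothetical; the absolute-value-before-negation order and the OMOD scale factors are taken from the field descriptions above.

```python
def apply_source_modifiers(x: float, abs_bit: int, neg_bit: int) -> float:
    """Apply SRCn_ABS first, then SRCn_NEG, to a floating-point operand."""
    if abs_bit:
        x = abs(x)
    if neg_bit:
        x = -x
    return x

# Scale factor for each OMOD encoding (0 = ALU_OMOD_OFF ... 3 = ALU_OMOD_D2).
OMOD_SCALE = {0: 1.0, 1: 2.0, 2: 4.0, 3: 0.5}

def apply_omod(result: float, omod: int) -> float:
    """Scale a floating-point result according to the OMOD field."""
    return result * OMOD_SCALE[omod]

# With both bits set the operand becomes -|x|, whatever its original sign.
print(apply_source_modifiers(-3.0, 1, 1), apply_source_modifiers(3.0, 1, 1))
print(apply_omod(5.0, 3))
```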
44 OP2_INST_KILLE
45 OP2_INST_KILLGT
46 OP2_INST_KILLGE
47 OP2_INST_KILLNE
48 OP2_INST_AND_INT
49 OP2_INST_OR_INT
50 OP2_INST_XOR_INT
51 OP2_INST_NOT_INT
52 OP2_INST_ADD_INT
53 OP2_INST_SUB_INT
54 OP2_INST_MAX_INT
55 OP2_INST_MIN_INT
56 OP2_INST_MAX_UINT
57 OP2_INST_MIN_UINT
58 OP2_INST_SETE_INT
59 OP2_INST_SETGT_INT
60 OP2_INST_SETGE_INT
61 OP2_INST_SETNE_INT
62 OP2_INST_SETGT_UINT
63 OP2_INST_SETGE_UINT
ALU_INST 17:8 enum(10)
66 OP2_INST_PRED_SETE_INT
67 OP2_INST_PRED_SETGT_INT
68 OP2_INST_PRED_SETGE_INT
69 OP2_INST_PRED_SETNE_INT
70 OP2_INST_PRED_SETLT_INT
71 OP2_INST_PRED_SETLE_INT
74 OP2_INST_PRED_SETE_PUSH_INT
75 OP2_INST_PRED_SETGT_PUSH_INT
76 OP2_INST_PRED_SETGE_PUSH_INT
77 OP2_INST_PRED_SETNE_PUSH_INT
78 OP2_INST_PRED_SETLT_PUSH_INT
79 OP2_INST_PRED_SETLE_PUSH_INT
80 OP2_INST_DOT4
81 OP2_INST_DOT4_IEEE
82 OP2_INST_CUBE
83 OP2_INST_MAX4
96 reserved
97 OP2_INST_EXP_IEEE
98 OP2_INST_LOG_CLAMPED
99 OP2_INST_LOG_IEEE
100 OP2_INST_RECIP_CLAMPED
101 OP2_INST_RECIP_FF
102 OP2_INST_RECIP_IEEE
103 OP2_INST_RECIPSQRT_CLAMPED
104 OP2_INST_RECIPSQRT_FF
105 OP2_INST_RECIPSQRT_IEEE
106 OP2_INST_SQRT_IEEE
107 OP2_INST_FLT_TO_INT
108 OP2_INST_INT_TO_FLT
109 OP2_INST_UINT_TO_FLT
ALU_INST 17:8 enum(10)
110 OP2_INST_SIN
111 OP2_INST_COS
112 OP2_INST_ASHR_INT
113 OP2_INST_LSHR_INT
114 OP2_INST_LSHL_INT
115 OP2_INST_MULLO_INT
116 OP2_INST_MULHI_INT
117 OP2_INST_MULLO_UINT
118 OP2_INST_MULHI_UINT
119 OP2_INST_RECIP_INT
120 OP2_INST_RECIP_UINT
CLAMP (C) 31 int(1) Clamp result:
0 Do not clamp the result.
1 Clamp the result to [0.0, 1.0]. Not mathematically defined for instructions that produce integer
results.
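To tie the two-operand format together, the sketch below decodes a handful of fields from the second ALU doubleword and models the clamp. It is illustrative only: the function names are hypothetical, the opcode-name table contains just a few entries copied from the ALU_INST list above, and clamp01 uses Python floats and says nothing about how the hardware treats special values.

```python
# A small subset of the ALU_INST encodings listed above.
OP2_NAMES = {
    52: "OP2_INST_ADD_INT",
    80: "OP2_INST_DOT4",
    110: "OP2_INST_SIN",
    111: "OP2_INST_COS",
}

def decode_alu_dword1_op2(word: int) -> dict:
    """Extract selected fields of the two-operand ALU doubleword."""
    alu_inst = (word >> 8) & 0x3FF           # bits 17:8
    return {
        "CLAMP": (word >> 31) & 0x1,         # bit 31
        "DST_GPR": (word >> 21) & 0x7F,      # bits 27:21
        "ALU_INST": alu_inst,
        "ALU_INST_NAME": OP2_NAMES.get(alu_inst, "unknown"),
        "OMOD": (word >> 6) & 0x3,           # bits 7:6
        "UP": (word >> 3) & 0x1,             # bit 3
        "S1A": (word >> 1) & 0x1,            # bit 1
        "S0A": word & 0x1,                   # bit 0
    }

def clamp01(x: float) -> float:
    """Clamp a floating-point result to the range [0.0, 1.0]."""
    return min(max(x, 0.0), 1.0)

# CLAMP set, destination GPR 5, opcode 110 (OP2_INST_SIN).
example = (1 << 31) | (5 << 21) | (110 << 8)
fields = decode_alu_dword1_op2(example)
print(hex(example), fields["ALU_INST_NAME"], clamp01(1.7))
```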
Bits    Field
31      C
30:29   DE
28      DR
27:21   DST_GPR
20:18   BS
17:13   ALU_INST
12      S2N
11:10   S2E
9       S2R
8:0     SRC2_SEL
SRC2_NEG 12 int(1) Negation:
0 Do not negate input for this operand.
1 Negate input for this operand. Use only for floating-point inputs.

CLAMP (C) 31 int(1) Clamp result:
0 Do not clamp the result.
1 Clamp the result to [0.0, 1.0]. Not mathematically defined for instructions that produce integer
results.
Bits    Field
31:26   MFC
25:24   SSX
23      SR
22:16   SRC_GPR
15:8    BUFFER_ID
7       FWQ
6:5     FT
4:0     VTX_INST
VTX_INST 4:0 enum(5) Instruction:
0 VTX_INST_FETCH: vertex fetch (X = uint32 index). Use VTX_DWORD1_GPR (page 294).
1 VTX_INST_SEMANTIC: semantic vertex fetch. Use VTX_DWORD1_SEM (page 292).
SRC_GPR 22:16 int(7) Source GPR address to get fetch address from.
Bits    Field
31      SMA
30      FCA
29:28   NFA
27:22   DATA_FORMAT
21      UCF
20:18   DSW
17:15   DSZ
14:12   DSY
11:9    DSX
8       Reserved
7:0     SEMANTIC_ID
Bits    Field
31      SMA
30      FCA
29:28   NFA
27:22   DATA_FORMAT
21      UCF
20:18   DSW
17:15   DSZ
14:12   DSY
11:9    DSX
8       Reserved
7       DR
6:0     DST_GPR
Bits    Field
31:20   Reserved
19      MF
18      CBNS
17:16   ES
15:0    OFFSET
Bits    Field
31:24   Reserved
23      SR
22:16   SRC_GPR
15:8    RESOURCE_ID
7       FWQ
6       Reserved
5       BFM
4:0     TEX_INST
TEX_INST 4:0 enum(5) Instruction:
0 Reserved.
1 Reserved.
2 Reserved.
3 TEX_INST_LD: fetch texel, XYZL are uint32.
4 TEX_INST_GET_TEXTURE_RESINFO: retrieve width, height, depth, number of mipmap levels.
5 TEX_INST_GET_BORDER_COLOR_FRAC: X = border color fraction.
6 TEX_INST_GET_COMP_TEX_LOD: X = computed LOD for all pixels in quad.
7 TEX_INST_GET_GRADIENTS_H: slopes relative to horizontal: X = dx/dh, Y = dy/dh, Z = dz/dh, W = dw/dh.
8 TEX_INST_GET_GRADIENTS_V: slopes relative to vertical: X = dx/dv, Y = dy/dv, Z = dz/dv, W = dw/dv.
9 TEX_INST_GET_LERP_FACTORS: retrieve weights used for bilinear fetch, X = horizontal lerp, Y = vertical
lerp.
10 TEX_INST_GET_WEIGHTS: retrieve weights used for bilinear fetch, X = TL weight, Y = TR weight, Z = BL
weight, W = BR weight.
11 TEX_INST_SET_GRADIENTS_H: XYZ set horizontal gradients.
12 TEX_INST_SET_GRADIENTS_V: XYZ set vertical gradients.
13 TEX_INST_PASS: returns the address read in memory.
14 Reserved.
15 Reserved.
16 TEX_INST_SAMPLE
17 TEX_INST_SAMPLE_L
18 TEX_INST_SAMPLE_LB
19 TEX_INST_SAMPLE_LZ
20 TEX_INST_SAMPLE_G
21 TEX_INST_SAMPLE_G_L
22 TEX_INST_SAMPLE_G_LB
23 TEX_INST_SAMPLE_G_LZ
24 TEX_INST_SAMPLE_C
25 TEX_INST_SAMPLE_C_L
26 TEX_INST_SAMPLE_C_LB
27 TEX_INST_SAMPLE_C_LZ
28 TEX_INST_SAMPLE_C_G
29 TEX_INST_SAMPLE_C_G_L
30 TEX_INST_SAMPLE_C_G_LB
31 TEX_INST_SAMPLE_C_G_LZ
Bits    Field
31      CTW
30      CTZ
29      CTY
28      CTX
27:21   LOD_BIAS
20:18   DSW
17:15   DSZ
14:12   DSY
11:9    DSX
8       Reserved
7       DR
6:0     DST_GPR
Bits    Field
31:29   SSW
28:26   SSZ
25:23   SSY
22:20   SSX
19:15   SAMPLER_ID
14:10   OFFSET_Z
9:5     OFFSET_Y
4:0     OFFSET_X
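The three texture doublewords, followed by an all-zero fourth doubleword at offset +12 as shown in the texture-instruction microcode diagrams, can be assembled as sketched below. Several things here are assumptions for illustration: the function names pack_tex and tex_bytes are hypothetical, the little-endian byte order used by tex_bytes is assumed and not taken from this section, out-of-range values are silently masked to the field width, and the select, coordinate-type, and remaining flag fields are left at their default of zero for brevity.

```python
import struct

TEX_INST_SAMPLE_LZ = 19   # value from the TEX_INST table above

def pack_tex(tex_inst, resource_id, src_gpr, dst_gpr, sampler_id,
             lod_bias=0, offset_x=0, offset_y=0, offset_z=0):
    """Return the four doublewords (+0, +4, +8, +12) of a texture instruction."""
    dword0 = ((src_gpr & 0x7F) << 16        # SRC_GPR, bits 22:16
              | (resource_id & 0xFF) << 8   # RESOURCE_ID, bits 15:8
              | (tex_inst & 0x1F))          # TEX_INST, bits 4:0
    dword1 = ((lod_bias & 0x7F) << 21       # LOD_BIAS, bits 27:21 (raw bits)
              | (dst_gpr & 0x7F))           # DST_GPR, bits 6:0
    dword2 = ((sampler_id & 0x1F) << 15     # SAMPLER_ID, bits 19:15
              | (offset_z & 0x1F) << 10     # OFFSET_Z, bits 14:10
              | (offset_y & 0x1F) << 5      # OFFSET_Y, bits 9:5
              | (offset_x & 0x1F))          # OFFSET_X, bits 4:0
    return (dword0, dword1, dword2, 0)      # +12 is all zeros

def tex_bytes(dwords) -> bytes:
    """Serialize four doublewords; little-endian order is an assumption."""
    return struct.pack("<4I", *dwords)

dwords = pack_tex(TEX_INST_SAMPLE_LZ, resource_id=2, src_gpr=1,
                  dst_gpr=3, sampler_id=1)
print([hex(d) for d in dwords], tex_bytes(dwords).hex())
```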