Instruction Scheduling: List Scheduling, Trace Scheduling, Loop Unrolling & Software Pipelining
Outline
Overview of Instruction Scheduling
List Scheduling
Resource Constraints
Interaction with Register Allocation
Scheduling across Basic Blocks
Trace Scheduling
Scheduling for Loops
Loop Unrolling
Software Pipelining
Pedro Diniz 2
pedro@isi.edu
CSCI 565 - Compiler Design Spring 2016
[Figure: five-stage pipeline (IF, DE, EXE, MEM, WB); successive instructions overlap, each one stage behind the previous.]
[Figure: an instruction preceded by two pipeline slots marked "???": which instructions, if any, can fill them?]
[Figure: after a branch, the next sequential instructions enter the pipeline before the branch target instruction.]
Constraints On Scheduling
Data Dependences
Inherent in the code
Control Dependences
Inherent in the code
Resource Constraints
Representing Dependences
Using a dependence DAG, one per Basic Block
Nodes are instructions, edges represent dependences
1: r2 = *(r1 + 4)
2: r3 = *(r1 + 8)
3: r4 = r2 + r3
4: r5 = r2 - 1
Dependence DAG for this block (each edge labeled with its latency):
1 → 3 (2), 2 → 3 (2), 1 → 4 (2)
Edge is labeled with latency:
$$v(i \to j) = \bigl(\text{delay required between the initiation times of } i \text{ and } j\bigr) - \bigl(\text{execution time required by } i\bigr)$$
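The sketch below shows, in Python, one way to derive the edges of this DAG. For every register an instruction reads, it adds an edge from the most recent instruction that wrote that register.

- The `Instr` record is hypothetical.
- It assumes loads take 2 cycles and ALU operations take 1, as in the resource model that follows.
- It uses the producer's latency as the edge label instead of the full v(i → j) formula.
- It finds only true (read-after-write) dependences, which are the only kind in this block.

```python
from dataclasses import dataclass

# Hypothetical instruction record: destination register, registers read,
# and latency in cycles (assumed: loads 2, ALU ops 1).
@dataclass(frozen=True)
class Instr:
    num: int
    dst: str
    srcs: tuple
    latency: int

def true_dependence_edges(block):
    """Return (producer, consumer, latency) for every read-after-write edge."""
    last_writer = {}  # register -> most recent instruction that wrote it
    edges = set()
    for ins in block:
        for reg in ins.srcs:
            producer = last_writer.get(reg)
            if producer is not None:
                # The consumer must wait for the producer's result.
                edges.add((producer.num, ins.num, producer.latency))
        last_writer[ins.dst] = ins
    return edges

block = [
    Instr(1, "r2", ("r1",), 2),       # 1: r2 = *(r1 + 4)
    Instr(2, "r3", ("r1",), 2),       # 2: r3 = *(r1 + 8)
    Instr(3, "r4", ("r2", "r3"), 1),  # 3: r4 = r2 + r3
    Instr(4, "r5", ("r2",), 1),       # 4: r5 = r2 - 1
]
print(sorted(true_dependence_edges(block)))  # [(1, 3, 2), (1, 4, 2), (2, 3, 2)]
```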
Resource Constraints
Modern Machines Have Many Resource Constraints
Superscalar Architectures:
Can execute a few operations concurrently
But with constraints on which operations can issue together
Example:
1 integer operation
ALUop dest, src1, src2 # in 1 clock cycle
In parallel with
1 memory operation
LD dst, addr # in 2 clock cycles
ST src, addr # in 1 clock cycle
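One way to check such constraints is a reservation table, which records which resource is busy in which cycle. The Python sketch below makes two assumptions:

- The memory unit is a two-stage pipeline (MEM1, MEM2). A 2-cycle LD occupies MEM1 and then MEM2; a 1-cycle ST occupies only MEM1.
- `USAGE` and `ReservationTable` are hypothetical names.

```python
# Assumed resource usage per opcode: (resource, cycle offset from issue).
# The memory unit is modeled as two pipeline stages, MEM1 and MEM2.
USAGE = {
    "ALUop": [("ALU", 0)],             # 1 cycle on the integer unit
    "LD": [("MEM1", 0), ("MEM2", 1)],  # 2 cycles: MEM1, then MEM2
    "ST": [("MEM1", 0)],               # 1 cycle: MEM1 only
}

class ReservationTable:
    """Tracks which (resource, cycle) slots are already taken."""

    def __init__(self):
        self.busy = set()

    def fits(self, op, cycle):
        """True if every slot `op` needs when issued at `cycle` is free."""
        return all((res, cycle + off) not in self.busy for res, off in USAGE[op])

    def reserve(self, op, cycle):
        if not self.fits(op, cycle):
            raise ValueError(f"{op} conflicts with an earlier op at cycle {cycle}")
        for res, off in USAGE[op]:
            self.busy.add((res, cycle + off))

table = ReservationTable()
table.reserve("LD", 0)      # uses MEM1 at cycle 0 and MEM2 at cycle 1
table.reserve("ALUop", 0)   # the integer op issues in parallel with the load
print(table.fits("LD", 1))  # True: MEM1 is free at 1, MEM2 is free at 2
print(table.fits("ST", 0))  # False: MEM1 is already taken at cycle 0
```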
Example
Dependence DAG (edge labels are latencies):
1 → 2 (1), 2 → 7 (1), 6 → 7 (4), 4 → 5 (3), 7 → 8 (3), 7 → 9 (3); node 3 has no edges
Priorities (d = longest latency path to a leaf, f = number of immediate successors):
node  1  2  3  4  5  6  7  8  9
d     5  4  0  3  0  7  3  0  0
f     1  1  0  1  0  1  2  0  0
READY list (sorted by d) as the schedule is built, one instruction per cycle:
Start: the roots 1, 3, 4, 6 enter: READY = { 6, 1, 4, 3 }
Schedule 6: READY = { 1, 4, 3 }
Schedule 1: 2 becomes ready: READY = { 2, 4, 3 }
Schedule 2: 7 becomes ready: READY = { 7, 4, 3 }
7 is still waiting on the latency from 6, so schedule 4: 5 becomes ready: READY = { 7, 3, 5 }
Schedule 7: 8 and 9 become ready: READY = { 3, 5, 8, 9 }
Schedule 3: READY = { 5, 8, 9 }
Schedule 5: READY = { 8, 9 }
Schedule 8: READY = { 9 }
Schedule 9: READY = { }
Final schedule: 6 1 2 4 7 3 5 8 9
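The same run can be reproduced with a short Python sketch of list scheduling. It assumes the following timing rule, which is our assumption and is not stated on the slides:

- One instruction issues per cycle.
- Instruction j may start once start(i) + latency(i → j) ≤ cycle for each predecessor i.
- Ties are broken by d, then f, then node number.

With that rule it yields the schedule shown. `EDGES`, `priorities`, and `list_schedule` are illustrative names.

```python
# Hypothetical encoding of the example DAG: (predecessor, successor) -> latency.
EDGES = {(1, 2): 1, (2, 7): 1, (6, 7): 4, (4, 5): 3, (7, 8): 3, (7, 9): 3}
NODES = list(range(1, 10))

def priorities(nodes, edges):
    """d = longest latency path to a leaf; f = number of immediate successors."""
    succs = {n: [s for (p, s) in edges if p == n] for n in nodes}
    d = {}

    def delay(n):
        if n not in d:
            d[n] = max((edges[(n, s)] + delay(s) for s in succs[n]), default=0)
        return d[n]

    for n in nodes:
        delay(n)
    f = {n: len(succs[n]) for n in nodes}
    return d, f

def list_schedule(nodes, edges):
    """Issue one instruction per cycle, picking by (d, f) among available ones."""
    d, f = priorities(nodes, edges)
    preds = {n: [p for (p, s) in edges if s == n] for n in nodes}
    start = {}  # node -> issue cycle
    order = []
    cycle = 0
    while len(order) < len(nodes):
        # READY: unscheduled nodes whose predecessors are all scheduled.
        ready = [n for n in nodes
                 if n not in start and all(p in start for p in preds[n])]
        # Available: ready nodes whose predecessors' latencies have elapsed.
        avail = [n for n in ready
                 if all(start[p] + edges[(p, n)] <= cycle for p in preds[n])]
        if avail:
            best = min(avail, key=lambda n: (-d[n], -f[n], n))
            start[best] = cycle
            order.append(best)
        cycle += 1  # a cycle with nothing available would be a stall
    return order, start

order, start = list_schedule(NODES, EDGES)
print(order)  # [6, 1, 2, 4, 7, 3, 5, 8, 9]
```

A node enters READY as soon as all of its predecessors are scheduled (as 7 does after 2). Under this rule it is only chosen once its operands are available, which is why 4 issues before 7.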
[Figure: list scheduling under the resource constraints above, shown on a reservation table with rows ALUop, MEM 1, and MEM 2. Instructions are placed in the order 1, 2, 4, 3, 5, 6, 7. In the final table, ALUop holds 2, 5, 6; MEM 1 holds 1, 4, 3, 7; MEM 2 holds 1 and 4 (the two-cycle memory operations).]
Example
1: LD r2, 0(r1)
2: ADD r3,r3,r2
3: LD r2,4(r5)
4: ADD r6,r6,r2
[Figure: dependence DAG for instructions 1 to 4, and the resulting schedule on the ALUop / MEM 1 / MEM 2 reservation table: the loads 1 and 3 occupy MEM 1 and MEM 2; the adds 2 and 4 occupy ALUop.]
Anti-dependence between 3 and 2.
There is really no data flowing from 2 to 3; 3 only overwrites r2 after 2 has read it.
How to fix this?
How about using a different register?
Pedro Diniz 91
pedro@isi.edu
CSCI 565 - Compiler Design Spring 2016
Example
1: LD r2, 0(r1) 1
2: ADD r3,r3,r2
3: LD r4,4(r5) 3
4: ADD r6,r6,r4
2
Pedro Diniz 92
pedro@isi.edu
CSCI 565 - Compiler Design Spring 2016
Example
1: LD r2, 0(r1) 1
2: ADD r3,r3,r2
3: LD r4,4(r5) 3
4: ADD r6,r6,r4
2
3
ALUop 2 4 4
MEM 1 1 3
MEM 2 1 3
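The Python sketch below classifies the dependences in both versions of the block:

- true (read after write)
- anti (write after read)
- output (write after write)

Each instruction is written as a hypothetical `(number, destination, sources)` tuple. Besides the anti-dependence 2 → 3, the sketch also reports the output dependence 1 → 3, since both loads write r2. Renaming the second load's destination to r4 removes both, leaving only 1 → 2 and 3 → 4.

```python
def dependences(block):
    """Return (from, to, kind) edges for a straight-line block.

    block: list of (number, dst, srcs) tuples, in program order.
    """
    last_writer = {}  # register -> number of the latest instruction writing it
    readers = {}      # register -> instructions that read it since that write
    deps = set()
    for num, dst, srcs in block:
        for reg in srcs:
            if reg in last_writer:
                deps.add((last_writer[reg], num, "true"))   # read after write
            readers.setdefault(reg, []).append(num)
        for r in readers.get(dst, []):
            if r != num:                                    # ignore self-reads
                deps.add((r, num, "anti"))                  # write after read
        if dst in last_writer:
            deps.add((last_writer[dst], num, "output"))     # write after write
        last_writer[dst] = num
        readers[dst] = []  # the new value has no readers yet
    return deps

# LD r2, 0(r1) reads r1 and writes r2; ADD r3,r3,r2 reads r3, r2 and writes r3.
original = [(1, "r2", ["r1"]), (2, "r3", ["r3", "r2"]),
            (3, "r2", ["r5"]), (4, "r6", ["r6", "r2"])]
renamed = [(1, "r2", ["r1"]), (2, "r3", ["r3", "r2"]),
           (3, "r4", ["r5"]), (4, "r6", ["r6", "r4"])]
print(sorted(dependences(original)))
# [(1, 2, 'true'), (1, 3, 'output'), (2, 3, 'anti'), (3, 4, 'true')]
print(sorted(dependences(renamed)))
# [(1, 2, 'true'), (3, 4, 'true')]
```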
[Figure: control-flow graph containing blocks B and C, illustrating scheduling across basic blocks.]
Trace Scheduling
Find the most common Trace of Basic Blocks
Use profiling information
Combine the Basic Blocks in the trace and schedule
them as one Block
Create clean-up code if the execution goes off-trace
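The Python sketch below shows the trace-selection step. Starting at the entry block, it keeps following the most frequently taken outgoing edge according to the profile. The CFG and the edge counts are made up for illustration. This greedy rule is only one simple heuristic; practical trace selectors are more elaborate.

```python
# Hypothetical profile: how often each CFG edge was taken.
EDGE_COUNTS = {
    ("A", "B"): 90, ("A", "C"): 10,
    ("B", "D"): 90, ("C", "D"): 10,
    ("D", "E"): 100,
}

def pick_trace(entry, edge_counts):
    """Greedily follow the hottest outgoing edge, never revisiting a block."""
    trace = [entry]
    block = entry
    while True:
        candidates = [(count, dst) for (src, dst), count in edge_counts.items()
                      if src == block and dst not in trace]
        if not candidates:
            return trace
        _, block = max(candidates)  # most frequently taken successor
        trace.append(block)

print(pick_trace("A", EDGE_COUNTS))  # ['A', 'B', 'D', 'E']
```

Here the trace is A, B, D, E. The off-trace path through C is where clean-up code is needed.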
[Figure: example control-flow graph including blocks B, C, F, G, and H; the most frequently executed trace through it is combined and scheduled as one block.]
[Figure: clean-up code. For a control-flow graph A → {B, C} → D → E, scheduling a trace through one side of the branch gives the other side its own copies of D and E.]
Scheduling Loops
Loop bodies are small
But a lot of time is spent in loops due to the large number of iterations
Need better ways to schedule loops
Loop Example
Machine Model
One load/store unit
load 2 cycles
store 2 cycles
Two arithmetic units
add 2 cycles
branch 2 cycles (no delay slot)
multiply 3 cycles
All units are pipelined (each can initiate one op per cycle)
Source Code
for i = 1 to N
A[i] = A[i] * b
Assembly Code
loop:
ld r6, (r2)
mul r6, r6, r3
st r6, (r2)
add r2, r2, 4
ble r2, r5, loop
Loop Unrolling
Unroll the Loop Body a few times
Pros:
Create a much larger basic block for the body
Eliminate some of the loop-bound checks
Cons:
Much larger program
Needs setup code (# of iterations < unroll factor)
The beginning and end of the schedule can still have unused slots
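At the source level, unrolling the example loop `A[i] = A[i] * b` by a factor of 4 looks like the Python sketch below. A cleanup loop placed after the unrolled body handles the leftover iterations (all of them when N < 4). The function names are hypothetical.

```python
def scale(a, b):
    """Original loop (0-based): for i in 0..N-1: A[i] = A[i] * b."""
    for i in range(len(a)):
        a[i] = a[i] * b

def scale_unrolled4(a, b):
    """The same loop unrolled by 4, with a cleanup loop for the remainder."""
    n = len(a)
    i = 0
    # Unrolled body: four copies, one loop-bound check per four elements.
    while i + 4 <= n:
        a[i] = a[i] * b
        a[i + 1] = a[i + 1] * b
        a[i + 2] = a[i + 2] * b
        a[i + 3] = a[i + 3] * b
        i += 4
    # Cleanup: the remaining n % 4 iterations.
    while i < n:
        a[i] = a[i] * b
        i += 1

# Both versions agree for every length, including N < 4.
for n in range(10):
    x, y = list(range(n)), list(range(n))
    scale(x, 3)
    scale_unrolled4(y, 3)
    assert x == y
```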
Loop Example
loop:
ld r6, (r2)
mul r6, r6, r3
st r6, (r2)
add r2, r2, 4
ble r2, r5, loop
Loop Example
loop:
ld r6, (r2)
mul r6, r6, r3
st r6, (r2)
add r2, r2, 4
ld r6, (r2)
mul r6, r6, r3
st r6, (r2)
add r2, r2, 4
ble r2, r5, loop
Loop Unrolling
Rename Registers
Use Different Registers in Different Loop Iterations
Loop Example
loop:
ld r6,(r2)
mul r6, r6, r3
st r6,(r2)
add r2, r2, 4
ld r7,(r2)
mul r7, r7, r3
st r7,(r2)
add r2, r2, 4
ble r2, r5, loop
Loop Example
loop:
ld r6,(r1)
mul r6, r6, r3
st r6,(r1)
add r2, r1, 4
ld r7,(r2)
mul r7, r7, r3
st r7,(r2)
add r1, r2, 4
ble r1, r5, loop
Loop Example
loop:
ld r6,(r1)
mul r6, r6, r3
st r6,(r1)
add r2, r1, 4
ld r7,(r2)
mul r7, r7, r3
st r7,(r2)
add r1, r1, 8
ble r1, r5, loop
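The two steps applied to this loop can be sketched as a small Python generator of the unrolled loop text:

- rename the data register in each copy;
- compute each copy's address directly from r1, so the copies no longer wait on one another's address increment.

It assumes the register roles used on the slides: r1 holds the current address, r3 holds b, and r5 holds the loop bound. `unroll_scale_loop` is a hypothetical helper. With two data registers it reproduces the final loop above.

```python
def unroll_scale_loop(data_regs, addr_regs, step=4):
    """Emit the A[i] = A[i] * b loop unrolled len(data_regs) times.

    Copy k uses its own data register data_regs[k]; each copy k >= 1 uses a
    fresh address register addr_regs[k - 1] computed directly from r1.
    """
    copies = len(data_regs)
    if len(addr_regs) != copies - 1:
        raise ValueError("need one address register per extra copy")
    code = ["loop:"]
    for k, data in enumerate(data_regs):
        addr = "r1" if k == 0 else addr_regs[k - 1]
        if k > 0:
            code.append(f"add {addr}, r1, {step * k}")  # address of copy k
        code += [f"ld {data}, ({addr})",
                 f"mul {data}, {data}, r3",
                 f"st {data}, ({addr})"]
    code.append(f"add r1, r1, {step * copies}")  # one pointer bump per group
    code.append("ble r1, r5, loop")
    return code

print("\n".join(unroll_scale_loop(["r6", "r7"], ["r2"])))
```

Like the slide version, it does not handle an odd number of elements; that is the setup-code problem listed under Cons.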
Software Pipelining
Overlap multiple iterations so that the empty issue slots are filled
Find the Steady-State Window so that:
All the instructions of the loop body are executed
But from different iterations
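How short can the steady-state window be? The modulo-scheduling literature calls this lower bound the minimum initiation interval (MII). It is the larger of two bounds:

- a resource bound: operations per unit class, divided by the number of such units;
- a recurrence bound: latency around each loop-carried dependence cycle, divided by its iteration distance.

The slides do not use this term. The Python sketch below applies it to the example under the machine model given earlier, assuming that A[i] accesses in different iterations never alias.

```python
from math import ceil

# Machine model from the slides: one load/store unit, two arithmetic units.
UNITS = {"mem": 1, "alu": 2}
# Loop body: each operation and the unit class it needs.
BODY = {"ld": "mem", "mul": "alu", "st": "mem", "add": "alu", "ble": "alu"}

def res_mii(body, units):
    """Resource bound: ceil(ops using a unit class / number of such units)."""
    uses = {}
    for unit in body.values():
        uses[unit] = uses.get(unit, 0) + 1
    return max(ceil(count / units[unit]) for unit, count in uses.items())

def rec_mii(recurrences):
    """Recurrence bound over (total latency, iteration distance) cycles."""
    return max((ceil(lat / dist) for lat, dist in recurrences), default=1)

# add r2, r2, 4 uses its own result from the previous iteration:
# latency 2 (add takes 2 cycles), distance 1 iteration.
mii = max(res_mii(BODY, UNITS), rec_mii([(2, 1)]))
print(mii)  # 2
```

Both bounds give 2 cycles:

- two memory operations share one load/store unit;
- three arithmetic operations share two units;
- add r2, r2, 4 depends on its own result from the previous iteration, with a 2-cycle latency.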
Loop Example
Assembly Code
loop:
ld r6, (r2)
mul r6, r6, r3
st r6, (r2)
add r2, r2, 4
ble r2, r5, loop
Schedule (issue order on each unit; suffix k marks iteration k; the original figure repeated each operation on extra rows to show its latency):
Load/store unit: ld ld1 ld2 st ld3 st1 ld4 st2 ld5 st3 ld6
Arithmetic unit 1: mul mul1 mul2 ble mul3 ble1 mul4 ble2 mul5
Arithmetic unit 2: add add1 add2 add3
Loop Example
4 iterations are overlapped:
Values of r3 and r5 don't change
4 registers for &A[i] (r2); each address is incremented by 4*4
4 registers to keep the value of A[i] (r6)
Same registers can be reused after 4 of these blocks; generate code for 4 blocks, otherwise values must be moved between registers
[Figure: steady-state window, in which operations from different iterations (e.g., st and ld3, mul2 and ble) execute side by side]
Software Pipelining
Optimal use of resources
Needs a lot of registers
Values from different iterations need to be kept separate
Issues with dependences:
A store from one iteration may execute before the branch of a previous iteration has executed (writing when it should not have)
Loads and stores are issued out of order (need to figure out the dependences before doing this)
Code generation issues:
Generate preamble and postamble code
Generate multiple blocks so no register copy is needed
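The preamble/postamble structure can be illustrated with a small Python sketch. When a loop body is split into `n_stages` stages, it lists which stage of which iteration runs at each step:

- the first steps fill the pipeline (preamble);
- the middle steps repeat the kernel;
- the last steps drain it (postamble).

The stage split is a hypothetical input; real software pipeliners derive it from the schedule. Loops with too few iterations are rejected here and would fall back to the plain loop.

```python
def pipeline_steps(n_iters, n_stages):
    """List, for each time step, the (stage, iteration) pairs that execute.

    Step t runs stage s of iteration t - s. Steps before every stage is busy
    form the preamble; steps after the last iteration has started form the
    postamble; the steps in between repeat the kernel.
    """
    if n_iters < n_stages:
        raise ValueError("too few iterations for a full kernel; use the plain loop")
    steps = []
    for t in range(n_iters + n_stages - 1):
        active = [(s, t - s) for s in range(n_stages) if 0 <= t - s < n_iters]
        if t < n_stages - 1:
            phase = "preamble"
        elif t < n_iters:
            phase = "kernel"
        else:
            phase = "postamble"
        steps.append((phase, active))
    return steps

for phase, active in pipeline_steps(4, 3):
    print(phase, active)
```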
Summary
Overview of Instruction Scheduling
List Scheduling
Resource Constraints
Interaction with Register Allocation
Scheduling across Basic Blocks
Trace Scheduling
Scheduling for Loops
Loop Unrolling
Software Pipelining