Cuda Synchronization
Cuda Synchronization
Cuda Synchronization
• __syncthreads()
• cudaThreadSynchronize()
2
CUDA synchronization
__syncthreads()
.
.
.
__syncthreads()
Barrier Barrier
.
. Continue Continue
.
}
Separate barriers
4
__syncthreads() constraints
All threads must reach a particular __syncthreads() routine or
deadlock occurs.
if { ...
__syncthreads();
}
else { …
__syncthreads();
}
So could do:
for (i= 0; i < n; i++) {
myKernel<<<B,T>>>( … );
cudaThreadSynchronize();
}
Recursion -- not allowed within kernel but can be used in host code
to launch kernels
7
Code Example
N-body problem
Need to compute forces on each body in each time interval and
then update positions and velocities of bodies and then repeat.
for (t = 0; t < tmax; t++) { // for each time period, force calculation on all bodies
9
Other ways to achieve global
synchronization (if it cannot be avoided)
10
Discussion points
• Using writing to global memory to enforce
synchronization expensive
11
Questions