A Framework for Accelerating Bottlenecks in GPU Execution with Assist Warps
Modern Graphics Processing Units (GPUs) are well provisioned to support the concurrent
execution of thousands of threads. Unfortunately, different bottlenecks during execution and
heterogeneous application requirements create imbalances in utilization of resources in the
cores. For example, when a GPU is bottlenecked by the available off-chip memory
bandwidth, its computational resources are often overwhelmingly idle, waiting for data from
memory to arrive. This work describes the Core-Assisted Bottleneck Acceleration (CABA) …
NVGPA Jog, SGA Bhowmick, RAC Das, M Kandemir… - research.ece.cmu.edu