Escalating memory accesses to shared memory by profiling reuse
Proceedings of the 10th International Conference on Ubiquitous Information …, 2016•dl.acm.org
Many recent studies have sought to improve the performance of CUDA and OpenCL applications, and one of the key techniques is to use the shared memory in GPUs. Shared memory is on-chip memory that can be accessed and shared among threads running on the same multiprocessor. Because it can be accessed as fast as the L1 cache, data in shared memory can be accessed much faster than data in global or local memory, both of which reside in off-chip device memory. However, programmers can exploit these architectural features only if they have detailed knowledge of the GPU memory hierarchy. Therefore, we analyze data access patterns by profiling applications and transform the code to use shared memory effectively. We also propose a metric that represents data reusability based on reuse distance theory. We achieve performance improvements in six applications from the Rodinia benchmark suite of 8.6% on the Kepler architecture and 9.6% on the Maxwell architecture.
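The abstract does not define its reusability metric, so the following is only a minimal sketch of the classic reuse distance it builds on: for each access, the number of distinct addresses touched since the previous access to the same address. The function names `reuse_distances` and `reuse_fraction`, the `capacity` parameter, and the idea of scoring a trace by the fraction of short-distance reuses are illustrative assumptions, not the authors' metric or profiler.

```python
from collections import OrderedDict


def reuse_distances(trace):
    """Return the reuse distance of every access in `trace`.

    The reuse distance is the number of distinct addresses accessed since
    the previous access to the same address. First-time accesses have no
    previous access and are reported as None.
    """
    stack = OrderedDict()  # addresses ordered from least to most recently used
    distances = []
    for addr in trace:
        if addr in stack:
            keys = list(stack)
            # Addresses positioned after `addr` were used more recently.
            distances.append(len(keys) - 1 - keys.index(addr))
            stack.move_to_end(addr)
        else:
            distances.append(None)
            stack[addr] = True
    return distances


def reuse_fraction(trace, capacity):
    """Hypothetical score: share of accesses reused within `capacity`
    distinct addresses (an assumption, not the paper's metric)."""
    distances = reuse_distances(trace)
    if not distances:
        return 0.0
    short = sum(1 for d in distances if d is not None and d < capacity)
    return short / len(distances)


if __name__ == "__main__":
    trace = ["a", "b", "c", "a", "b", "b"]
    print(reuse_distances(trace))
    print(reuse_fraction(trace, capacity=3))
```

Under this sketch, a buffer whose accesses mostly have small reuse distances would be a plausible candidate for staging in shared memory, since the reused data could stay on-chip between accesses. The list scan makes each lookup linear in the number of distinct addresses, which is acceptable for a short illustrative trace but not for a real profiler.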