
Escalating memory accesses to shared memory by profiling reuse

Y Ko, H Kim, H Han - Proceedings of the 10th International Conference …, 2016 - dl.acm.org

Proceedings of the 10th International Conference on Ubiquitous Information …, 2016•dl.acm.org

Many recent studies have sought to improve the performance of CUDA and OpenCL applications, and one of the key techniques is using the shared memory of GPUs. Shared memory is an on-chip memory that can be accessed and shared among threads in the same multiprocessor. Because it can be accessed as fast as the L1 cache, data in shared memory can be accessed much faster than data in global or local memory, both of which reside in off-chip device memory. However, programmers can exploit these architectural features only if they have detailed knowledge of the GPU memory hierarchy. Therefore, we analyze data access patterns by profiling applications and transform the code to use shared memory effectively. We also propose a metric, based on reuse distance theory, that represents data reusability. We improve the performance of six applications from the Rodinia benchmark suite by 8.6% on the Kepler architecture and 9.6% on the Maxwell architecture.
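The abstract does not define the proposed metric, so the following is only a minimal sketch of the underlying idea of reuse distance: for each access in a trace, count the distinct addresses touched since the previous access to the same address. The names `reuse_distances`, `reusable_fraction`, and `kColdAccess`, as well as the capacity-based score, are assumptions made for illustration and are not the authors' metric or profiler.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <list>
#include <unordered_map>
#include <vector>

// Sentinel for a first-time (cold) access, which has no previous use.
constexpr std::size_t kColdAccess = std::numeric_limits<std::size_t>::max();

// Returns the reuse distance of every access in `trace`: the number of
// distinct addresses touched since the previous access to the same address.
// Simple O(N * M) LRU-stack version, written for clarity rather than speed.
std::vector<std::size_t> reuse_distances(const std::vector<std::uint64_t>& trace) {
    std::list<std::uint64_t> lru;  // most recently used address at the front
    std::unordered_map<std::uint64_t, std::list<std::uint64_t>::iterator> pos;
    std::vector<std::size_t> out;
    out.reserve(trace.size());

    for (std::uint64_t addr : trace) {
        auto it = pos.find(addr);
        if (it == pos.end()) {
            out.push_back(kColdAccess);
        } else {
            // Depth in the LRU stack equals the number of distinct
            // addresses accessed since the last use of `addr`.
            std::size_t depth = 0;
            for (auto p = lru.begin(); p != it->second; ++p) ++depth;
            out.push_back(depth);
            lru.erase(it->second);
        }
        lru.push_front(addr);
        pos[addr] = lru.begin();
    }
    return out;
}

// Hypothetical reusability score (an assumption, not the paper's metric):
// the fraction of accesses whose reuse distance is below `capacity`,
// i.e. accesses that would hit in a fully associative LRU buffer
// holding `capacity` distinct elements.
double reusable_fraction(const std::vector<std::uint64_t>& trace,
                         std::size_t capacity) {
    if (trace.empty()) return 0.0;
    std::size_t hits = 0;
    for (std::size_t d : reuse_distances(trace)) {
        if (d < capacity) ++hits;
    }
    return static_cast<double>(hits) / static_cast<double>(trace.size());
}
```

For the trace `{1, 2, 3, 1, 2, 3}`, the first three accesses are cold and the last three each have a reuse distance of 2, so a buffer of three elements would capture half of the accesses. Under these assumptions, an array whose accesses mostly have short reuse distances would be a plausible candidate for staging in shared memory.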

