Extending OP2 framework to support portable parallel programming of complex applications

Z Chen, K Huang, Y Che, C Xu, J Zhang, Z Dai, M Li
CCF Transactions on High Performance Computing, 2024, Springer
Abstract
Current HPC hardware is heterogeneous and diverse, which makes portable parallel programming technologies attractive to application developers. OP2 is a domain-specific programming framework for unstructured mesh applications that supports unified programming and automatic code generation for multiple hardware platforms. However, the current OP2 implementation has difficulty handling applications with complex data structures and function calls. To address this issue, this paper improves the implementation of the OP2 framework. We modify OP2's source-to-source translator and runtime library so that applications with complex data structures and function calls are supported automatically when the serial, OpenMP, CUDA, and MPI versions of the code are generated. This spares OP2 application developers the tedious process of rewriting code by hand. HOUR2D, a complex, high-order unstructured CFD application, is used as an example to verify the applicability of our extension to the OP2 framework. The results show that our extension enables OP2 to support portable programming of complex unstructured applications without changing its programming model, preserves the correctness of the results, and achieves performance comparable to, or better than, manual parallelizations on Intel Xeon Gold CPUs, HUAWEI Kunpeng CPUs, and NVIDIA V100 GPUs.