On the Scalability of an Automatically Parallelized Irregular Application

Burtscher, Martin; Kulkarni, Milind; Prountzos, Dimitrios; Pingali, Keshav

doi:10.1007/978-3-540-89740-8_8

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5335)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing

Abstract

Irregular applications, i.e., programs that manipulate pointer-based data structures such as graphs and trees, constitute a challenging target for parallelization because the amount of parallelism is input dependent and changes dynamically. Traditional dependence analysis techniques are too conservative to expose this parallelism. Even manual parallelization is difficult, time consuming, and error prone. The Galois system parallelizes such applications using an optimistic approach that exploits higher-level semantics of abstract data types.
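The optimistic idea can be illustrated with a minimal sketch. This is not the Galois API or the authors' implementation; it is a sequential simulation under stated assumptions: each work item touches a hypothetical "neighborhood" (a set of element ids), items in the same round run speculatively, and an item aborts and is retried in a later round if its neighborhood overlaps one already claimed in that round. The function name `optimistic_rounds` and the data layout are illustrative only.

```python
from typing import List, Set


def optimistic_rounds(neighborhoods: List[Set[int]]) -> List[List[int]]:
    """Simulate optimistic execution of work items (hypothetical sketch).

    neighborhoods[i] is the set of element ids that item i reads or writes.
    Returns one list of committed item indices per round.
    """
    pending = list(range(len(neighborhoods)))
    rounds: List[List[int]] = []
    while pending:
        claimed: Set[int] = set()   # elements owned by items committed this round
        committed: List[int] = []
        aborted: List[int] = []
        for i in pending:
            hood = neighborhoods[i]
            if claimed.isdisjoint(hood):
                # No conflict: the item claims its neighborhood and commits.
                claimed |= hood
                committed.append(i)
            else:
                # Conflict detected: the speculative item is rolled back
                # and retried in a later round.
                aborted.append(i)
        rounds.append(committed)
        pending = aborted
    return rounds


# Items 0/1 and 2/3 overlap pairwise, so only three items commit in round one.
print(optimistic_rounds([{1, 2}, {2, 3}, {4, 5}, {5, 6}, {7}]))
# prints [[0, 2, 4], [1, 3]]
```

The number of items that commit per round depends entirely on how the neighborhoods overlap, which is one way to see why the available parallelism is input dependent and changes as the data structure evolves.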

In this paper, we study the performance and scalability of a Galoised, that is, automatically parallelized, version of Delaunay mesh refinement (DR) on a shared-memory system with 128 CPUs. DR is an important irregular application that is used, e.g., in graphics and finite-element codes. The parallelized program scales to 64 threads, where it reaches a speedup of 25.8. For large numbers of threads, the performance is hampered by the load imbalance and the nonuniform memory latency, both of which grow as the number of threads increases. While these two issues will have to be addressed in future work, we believe our results already show the Galois approach to be very promising.
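To put the reported figure in context, parallel efficiency is conventionally defined as speedup divided by thread count. The short calculation below applies that standard definition to the abstract's numbers (speedup 25.8 at 64 threads); the function name is illustrative, and the paper itself may present the result differently.

```python
def parallel_efficiency(speedup: float, threads: int) -> float:
    """Return speedup divided by thread count (standard definition)."""
    if threads <= 0:
        raise ValueError("threads must be positive")
    return speedup / threads


# Figures taken from the abstract: speedup of 25.8 at 64 threads.
print(f"{parallel_efficiency(25.8, 64):.3f}")  # prints 0.403
```

An efficiency of roughly 40% at 64 threads is consistent with the abstract's observation that load imbalance and nonuniform memory latency limit performance at high thread counts.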

Author information

Authors and Affiliations

Center for Grid and Distributed Computing, Institute for Computational Engineering and Sciences, The University of Texas at Austin, Austin, TX 78712
Martin Burtscher, Milind Kulkarni, Dimitrios Prountzos & Keshav Pingali

Editor information

Editors and Affiliations

Department of Computing Science, University of Alberta, T6G-2E8, Edmonton, AB, Canada
José Nelson Amaral

Cite this paper

Burtscher, M., Kulkarni, M., Prountzos, D., Pingali, K. (2008). On the Scalability of an Automatically Parallelized Irregular Application. In: Amaral, J.N. (eds) Languages and Compilers for Parallel Computing. LCPC 2008. Lecture Notes in Computer Science, vol 5335. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89740-8_8

DOI: https://doi.org/10.1007/978-3-540-89740-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89739-2
Online ISBN: 978-3-540-89740-8
eBook Packages: Computer Science (R0)
