
Tasking framework for adaptive speculative parallel mesh generation

The Journal of Supercomputing

Abstract

Handling the ever-increasing complexity of mesh generation codes along with the intricacies of newer hardware often results in codes that are difficult both to comprehend and to maintain. Different facets of a code, such as thread management and load balancing, are often intertwined, resulting in efficient but highly complex software. In this work, we present a framework that helps establish a core principle, separation of concerns, in which functionality is separated from the performance aspects of various mesh operations. In particular, thread management and scheduling decisions are elevated into a generic and reusable tasking framework. The results indicate that our approach can successfully abstract the load balancing aspects of two case studies while providing access to a wide range of execution back-ends. One would expect this new flexibility to come at some additional cost. However, for the configurations studied in this work, we observed up to \(13\%\) speedup for some meshing operations and up to \(5.8\%\) speedup over the entire application runtime compared to hand-optimized code. Moreover, we show that by using different task creation strategies, the overhead compared to straightforward task execution models can be improved by as much as \(1200\%\) without compromising portability or functionality.
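The separation of concerns described in the abstract can be illustrated with a minimal C++ sketch. It is not the framework presented in the article: the names `SerialBackend`, `ThreadBackend`, `Triangle`, and `triangle_areas` are hypothetical, and the mesh operation is a simple non-speculative stand-in in which every task writes only its own output slot. The point is only that the operation is written once against a generic `run` interface, while thread management lives entirely in the interchangeable back-end.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical back-end: runs every task on the calling thread.
struct SerialBackend {
    void run(std::size_t n, const std::function<void(std::size_t)>& task) const {
        for (std::size_t i = 0; i < n; ++i) task(i);
    }
};

// Hypothetical back-end: splits [0, n) into contiguous chunks, one thread per chunk.
struct ThreadBackend {
    unsigned num_threads = 4;

    void run(std::size_t n, const std::function<void(std::size_t)>& task) const {
        const std::size_t workers = std::max<std::size_t>(1, num_threads);
        const std::size_t chunk = (n + workers - 1) / workers;  // ceiling division
        std::vector<std::thread> pool;
        for (std::size_t begin = 0; begin < n; begin += chunk) {
            const std::size_t end = std::min(n, begin + chunk);
            pool.emplace_back([begin, end, &task] {
                for (std::size_t i = begin; i < end; ++i) task(i);
            });
        }
        for (auto& t : pool) t.join();  // all tasks finish before run() returns
    }
};

// Stand-in mesh element: a 2D triangle with vertices a, b, c.
struct Triangle { double ax, ay, bx, by, cx, cy; };

// Functional part only: no threads, locks, or scheduling decisions appear here.
template <class Backend>
std::vector<double> triangle_areas(const std::vector<Triangle>& mesh,
                                   const Backend& backend) {
    std::vector<double> areas(mesh.size());
    backend.run(mesh.size(), [&](std::size_t i) {
        const Triangle& t = mesh[i];
        // Half the absolute cross product of the edge vectors (b - a) and (c - a).
        areas[i] = 0.5 * std::abs((t.bx - t.ax) * (t.cy - t.ay) -
                                  (t.cx - t.ax) * (t.by - t.ay));
    });
    return areas;
}
```

Calling `triangle_areas(mesh, SerialBackend{})` and `triangle_areas(mesh, backend)` with a `ThreadBackend` instance should give identical results, since only the execution strategy changes. A speculative operation would additionally need conflict detection and rollback, which this sketch deliberately leaves out.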

Notes

  1. Recently, Intel® Threading Building Blocks was renamed to Intel® oneAPI Threading Building Blocks (oneTBB) to highlight that the tool is part of the oneAPI ecosystem.

  2. \(h=\log_2(n/\mathit{grainsize})\) is the depth of a perfect binary tree with \(n/\mathit{grainsize}\) terminal nodes. \(2^{h+1}-1\) is the number of nodes of a perfect binary tree with depth \(h\).

  3. https://www.khronos.org/sycl (Accessed 27th April 2021).

  4. https://kokkos.org (Accessed 27th April 2021).
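The node count in note 2 can be checked with a short C++ sketch. It assumes, since the surrounding text is not shown here, that the tree in question arises from recursively halving a range of \(n\) items until each piece holds at most \(\mathit{grainsize}\) items, and that \(n/\mathit{grainsize}\) is a power of two. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Counts the nodes of the tree obtained by recursively halving a range of
// n items until a piece holds at most `grainsize` items (grainsize >= 1).
std::size_t count_split_nodes(std::size_t n, std::size_t grainsize) {
    if (n <= grainsize) return 1;  // leaf: small enough to run as one task
    const std::size_t left = n / 2;
    return 1 + count_split_nodes(left, grainsize) +
           count_split_nodes(n - left, grainsize);
}

// Closed form from note 2: h = log2(n / grainsize), nodes = 2^(h+1) - 1.
// Assumes n / grainsize is a power of two.
std::size_t perfect_tree_nodes(std::size_t n, std::size_t grainsize) {
    std::size_t leaves = n / grainsize;
    std::size_t h = 0;
    while (leaves > 1) {  // integer log2 for a power of two
        leaves /= 2;
        ++h;
    }
    return (static_cast<std::size_t>(1) << (h + 1)) - 1;
}
```

For example, with \(n = 1024\) and \(\mathit{grainsize} = 128\) there are 8 terminal nodes, so \(h = 3\) and both functions return \(2^{4} - 1 = 15\).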

Acknowledgements

We would like to thank the reviewers for providing helpful comments on earlier drafts of the manuscript. This research was sponsored in part by the NASA Transformational Tools and Technologies Project (NNX15AU39A) of the Transformative Aeronautics Concepts Program under the Aeronautics Research Mission Directorate, NSF grant no. CCF-1439079, the Richard T. Cheng Endowment, the Modeling and Simulation fellowship of Old Dominion University and the Dominion Scholar fellowship of Old Dominion University. Experiments were supported by the Research Computing clusters at Old Dominion University. The authors would like to thank Kevin Garner for the corrections of the English text in the manuscript.

Author information

Corresponding author

Correspondence to Christos Tsolakis.

Cite this article

Tsolakis, C., Thomadakis, P. & Chrisochoides, N. Tasking framework for adaptive speculative parallel mesh generation. J Supercomput 78, 1–32 (2022). https://doi.org/10.1007/s11227-021-04158-9

  • DOI: https://doi.org/10.1007/s11227-021-04158-9
