ABSTRACT
As the complexity of hardware architectures and software stacks grows, development productivity, performance, and software portability quickly evolve from desirable features into actual needs. SHAD, the Scalable High-performance Algorithms and Data-structures C++ library, is designed to mitigate these issues: it provides general-purpose building blocks as well as high-level custom utilities, and offers a shared-memory programming abstraction that facilitates the programming of complex systems, scaling up to High Performance Computing clusters. SHAD achieves portability through an abstract runtime interface, which decouples the upper layers of the library from the underlying architecture and hides its low-level details. This layer enables SHAD to interface with different runtime/threading systems, e.g., Intel TBB and Global Memory and Threading (GMT). However, the current backends targeting distributed systems rely on a centralized controller, which may limit scalability to hundreds of nodes and creates a network hot spot, due to all-to-one synchronization traffic, that can degrade performance at high process counts. In this research, we explore HPX, the C++ standard library for parallelism and concurrency, as an additional backend for the SHAD library, and present the methodologies that support local and remote task execution in SHAD on top of HPX. Finally, we evaluate the proposed system by comparing it against the existing SHAD backends and analyzing their performance on C++ Standard Template Library algorithms.
- V. G. Castellana and M. Minutoli, "SHAD: The scalable high-performance algorithms and data-structures library," in 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID). IEEE, 2018, pp. 442--451.
- C. Pheatt, "Intel® Threading Building Blocks," Journal of Computing Sciences in Colleges, vol. 23, no. 4, pp. 298--298, 2008.
- A. Morari, A. Tumeo, D. Chavarría-Miranda, O. Villa, and M. Valero, "Scaling irregular applications through data aggregation and software multithreading," in 2014 IEEE 28th International Parallel and Distributed Processing Symposium. IEEE, 2014, pp. 1126--1135.
- B. Wagle, "Managing overheads in asynchronous many-task runtime systems," LSU Doctoral Dissertations, 2019.
- H. Kaiser, P. Diehl, A. S. Lemoine, B. A. Lelbach, P. Amini, A. Berge, J. Biddiscombe, S. R. Brandt, N. Gupta, T. Heller et al., "HPX - the C++ standard library for parallelism and concurrency," Journal of Open Source Software, vol. 5, no. 53, 2020.
- T. Heller, H. Kaiser, P. Diehl, D. Fey, and M. A. Schweitzer, "Closing the performance gap with modern C++," in International Conference on High Performance Computing. Springer, 2016, pp. 18--31.
- H. Kaiser, T. Heller, D. Bourgeois, and D. Fey, "Higher-level parallelization for local and distributed asynchronous task-based programming," in Proceedings of the First International Workshop on Extreme Scale Programming Models and Middleware, 2015, pp. 29--37.
- T. Heller, H. Kaiser, A. Schäfer, and D. Fey, "Using HPX and LibGeoDecomp for scaling HPC applications on heterogeneous supercomputers," in Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2013, pp. 1--8.
- T. Heller, H. Kaiser, and K. Iglberger, "Application of the ParalleX execution model to stencil-based problems," Computer Science - Research and Development, vol. 28, no. 2--3, pp. 253--261, 2013.
- M. Drocco, V. G. Castellana, and M. Minutoli, "Practical distributed programming in C++," in Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing, 2020, pp. 35--39.
- "Intel Parallel STL," 2022. [Online]. Available: https://github.com/oneapi-src/oneDPL
- P. An, A. Jula, S. Rus, S. Saunders, T. Smith, G. Tanase, N. Thomas, N. Amato, and L. Rauchwerger, "STAPL: An adaptive, generic parallel C++ library," in International Workshop on Languages and Compilers for Parallel Computing. Springer, 2001, pp. 193--208.
- K. Fürlinger, T. Fuchs, and R. Kowalewski, "DASH: A C++ PGAS library for distributed data structures and parallel algorithms," in 2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS). IEEE, 2016, pp. 983--990.
- N. Bell and J. Hoberock, "Thrust: A productivity-oriented library for CUDA," in GPU Computing Gems Jade Edition. Elsevier, 2012, pp. 359--371.
Index Terms
- Towards superior software portability with SHAD and HPX C++ libraries