Abstract
Many HPC applications have memory requirements that exceed the memory typically available on compute nodes. Although many HPC installations include nodes with very large memory, a more portable solution for these applications is to implement an out-of-core method. Such a mechanism offloads part of the data, typically to disk, while that data is not needed. However, this is problematic for parallel codes, because the scalability of the approach is limited by disk latency and bandwidth. Moreover, on parallel file systems this design can place a heavy load on the file system and even cause failures. We present a library that provides out-of-core functionality by using the main memory of dedicated compute nodes. The library delivers good performance and scalability, and it reduces the impact on the parallel file system by using only the local disk of each node. We have implemented an OpenSHMEM version of this library and compared its performance with MPI. OpenSHMEM, together with other Partitioned Global Address Space (PGAS) approaches, represents one of the paths toward improving the performance of parallel applications at exascale. In this paper we show that OpenSHMEM is an excellent approach for this type of application.
Acknowledgment
Research reported in this publication was supported by the National Science Foundation under award number 1213084, "Unified Runtime for Supporting Hybrid Programming Models on Heterogeneous Architectures."
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Gómez-Iglesias, A., Vienne, J., Hamidouche, K., Simmons, C.S., Barth, W.L., Panda, D. (2015). Scalable Out-of-core OpenSHMEM Library for HPC. In: Gorentla Venkata, M., Shamis, P., Imam, N., Lopez, M. (eds) OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies. OpenSHMEM 2014. Lecture Notes in Computer Science, vol 9397. Springer, Cham. https://doi.org/10.1007/978-3-319-26428-8_9
DOI: https://doi.org/10.1007/978-3-319-26428-8_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26427-1
Online ISBN: 978-3-319-26428-8
eBook Packages: Computer Science, Computer Science (R0)