
ACPdl: data-structure and global memory allocator library over a thin PGAS-layer

Published: 15 November 2015

Abstract

HPC systems contain ever more processor cores as they approach the exascale era. As the number of parallel processes on a system grows, so does the number of point-to-point connections each process maintains, and the memory consumed by those connections becomes a problem. A new communication library called Advanced Communication Primitives (ACP) is being developed to address this problem by providing communication functions based on the Partitioned Global Address Space (PGAS) model, which is potentially connection-less. The ACP library is designed to underlie domain-specific languages or parallel language runtimes. The ACP basic layer (ACPbl) comprises a minimal set of functions that abstract interconnect devices and provide an address translation mechanism. With ACPbl alone, a global address can be assigned only to local memory. This paper introduces a new set of functions called the ACP data library (ACPdl), which includes a global memory allocator and a data-structure library, to improve the productivity of the ACP library. The global memory allocator allocates a memory region in a remote process and assigns a global address to it without involving the remote process. The data-structure library uses the global memory allocator internally and provides functions to create, read, update, and delete distributed data structures. Evaluation results for the global memory allocator and the associative-array data-structure functions show that overhead between the main and communication threads may become a bottleneck when an implementation of ACPbl uses a low-latency, HPC-dedicated interconnect device.
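The two ideas in the abstract, allocating memory on a remote process without that process taking part and building an associative array on top of such an allocator, can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the ACP API: every name here (`Fabric`, `DistMap`, `make_ga`, `GA_NULL`, the 16/48-bit address split, the bump-pointer allocator, and CRC32 key placement) is hypothetical and chosen for illustration. The abstract does not describe how ACPdl encodes global addresses or manages remote heaps.

```python
import zlib

# Hypothetical global-address layout: (rank + 1) in the high bits, byte offset
# in the low 48 bits. Storing rank + 1 keeps 0 free to act as a null address.
OFFSET_BITS = 48
OFFSET_MASK = (1 << OFFSET_BITS) - 1
GA_NULL = 0


def make_ga(rank, offset):
    return ((rank + 1) << OFFSET_BITS) | offset


def ga_rank(ga):
    return (ga >> OFFSET_BITS) - 1


def ga_offset(ga):
    return ga & OFFSET_MASK


class Fabric:
    """Stands in for a one-sided communication layer: the caller can read and
    write any rank's heap directly, and the target rank executes no code."""

    def __init__(self, nprocs, heap_size=1024):
        self.heaps = [bytearray(heap_size) for _ in range(nprocs)]
        # Per-rank bump pointer. In a real system this word would live in the
        # target's memory and be advanced with a remote atomic operation.
        self.brk = [0] * nprocs

    def malloc(self, size, target_rank):
        """Reserve `size` bytes on `target_rank`; return a global address."""
        start = self.brk[target_rank]
        if start + size > len(self.heaps[target_rank]):
            return GA_NULL  # remote heap exhausted
        self.brk[target_rank] = start + size
        return make_ga(target_rank, start)

    def write(self, ga, data):
        heap, off = self.heaps[ga_rank(ga)], ga_offset(ga)
        heap[off:off + len(data)] = data

    def read(self, ga, size):
        heap, off = self.heaps[ga_rank(ga)], ga_offset(ga)
        return bytes(heap[off:off + size])


class DistMap:
    """Toy associative array: each key is owned by one rank, and its value is
    stored in memory obtained from the global allocator on that rank."""

    def __init__(self, fabric):
        self.fabric = fabric
        # Per-rank directory: key -> (global address, length).
        self.index = [dict() for _ in fabric.heaps]

    def _owner(self, key):
        # CRC32 is deterministic across runs, unlike Python's built-in hash().
        return zlib.crc32(key.encode()) % len(self.index)

    def insert(self, key, value):
        rank = self._owner(key)
        ga = self.fabric.malloc(len(value), rank)
        if ga == GA_NULL:
            return False
        self.fabric.write(ga, value)
        self.index[rank][key] = (ga, len(value))
        return True

    def find(self, key):
        entry = self.index[self._owner(key)].get(key)
        if entry is None:
            return None
        ga, length = entry
        return self.fabric.read(ga, length)

    def remove(self, key):
        # A bump allocator cannot reclaim space; only the directory entry goes.
        return self.index[self._owner(key)].pop(key, None) is not None
```

For example, `Fabric(4).malloc(16, 2)` returns an address whose rank field is 2, and a `DistMap` built on the same fabric supports `insert`, `find`, and `remove` by key. The simulation ignores latency and concurrency entirely, so it says nothing about the thread-overhead bottleneck the evaluation reports.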


Cited By

  • (2018) Approaches for Memory-Efficient Communication Library and Runtime Communication Optimization. Advanced Software Technologies for Post-Peta Scale Computing, 121-138. DOI: 10.1007/978-981-13-1924-2_7. Online publication date: 7-Dec-2018
  • (2017) The Design of Advanced Communication to Reduce Memory Usage for Exa-scale Systems. High Performance Computing for Computational Science – VECPAR 2016, 149-161. DOI: 10.1007/978-3-319-61982-8_15. Online publication date: 14-Jul-2017
  • (2016) Reducing Manipulation Overhead of Remote Data-Structure by Controlling Remote Memory Access Order. High Performance Computing, 85-97. DOI: 10.1007/978-3-319-46079-6_7. Online publication date: 6-Oct-2016

    Published In

    ESPM '15: Proceedings of the First International Workshop on Extreme Scale Programming Models and Middleware
    November 2015
    58 pages
    ISBN:9781450339964
    DOI:10.1145/2832241

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. PGAS
    2. communication library
    3. data structure
    4. global memory allocator

    Qualifiers

    • Research-article

    Funding Sources

    • JST (Japan Science and Technology Agency)

    Conference

    SC15
    Acceptance Rates

    ESPM '15 Paper Acceptance Rate 5 of 10 submissions, 50%;
    Overall Acceptance Rate 5 of 10 submissions, 50%
