Skip to main content

Enabling Support for Zero Copy Semantics in an Asynchronous Task-Based Programming Model

  • Conference paper
  • First Online:
Euro-Par 2021: Parallel Processing Workshops (Euro-Par 2021)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13098))

Included in the following conference series:

  • 900 Accesses

Abstract

Communication is critical to the scalable and efficient performance of scientific simulations on extreme scale computing systems. Part of the promise of task-based programming models is that they can naturally overlap communication with computation and exploit locality between tasks. Copy-based semantics using eager communication protocols easily enable such asynchrony by alleviating the responsibility of buffer management from the user, both on the sender and the receiver. However, these semantics increase memory allocations and copies and in turn affect application memory footprint and performance, especially with large message buffers.

In this work we describe how the so-called “zero copy” messaging semantics can be supported in Converse, the message-driven parallel programming framework that is used by Charm++, by implementing support for user-owned buffer transfers in its lower level runtime system, LRTS. These semantics work on user-provided buffers and do not semantically require copies by either the user or the runtime system. We motivate our work by reviewing the existing messaging model in Converse/Charm++, identify its semantic shortcomings, and define new LRTS and Converse APIs to support zero copy communication based on RDMA capabilities. We demonstrate the utility of our new communication interfaces with benchmarks written in Converse. The result is up to 91% of message latency improvement and improved memory usage. These advances will enable future work on user-facing APIs in Charm++.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. Acun, B., et al.: Parallel Programming with Migratable Objects: Charm++ in Practice. SC (2014)

    Google Scholar 

  2. Bauer, M., Treichler, S., Slaughter, E., Aiken, A.: Legion: expressing locality and independence with logical regions. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, p. 66. IEEE Computer Society Press (2012)

    Google Scholar 

  3. Chamberlain, B., Callahan, D., Zima, H.: Parallel programmability and the chapel language. Int. J. High Perform. Comput. Appl. 21, 291–312 (2007)

    Article  Google Scholar 

  4. Grun, P., et al.: A brief introduction to the openfabrics interfaces - a new network API for maximizing high performance application efficiency. In: 2015 IEEE 23rd Annual Symposium on High-Performance Interconnects, pp. 34–39 (2015). https://doi.org/10.1109/HOTI.2015.19

  5. Kaiser, H., Brodowicz, M., Sterling, T.: Parallex an advanced parallel execution model for scaling-impaired applications. In: ICPPW 2009: Proceedings of the 2009 International Conference on Parallel Processing Workshops, pp. 394–401. IEEE Computer Society, Washington (2009)

    Google Scholar 

  6. Kalé, L., Bhandarkar, M., Jagathesan, N., Krishnan, S., Yelon, J.: Converse: an interoperable framework for parallel programming. In: International Parallel Processing Symposium (1996)

    Google Scholar 

  7. Liu, J., Wu, J., Panda, D.K.: High performance RDMA-based MPI implementation over infiniband. Int. J. Parallel Program. 32, 167198 (2004)

    Article  Google Scholar 

  8. Shamis, P., et al.: UCX: an open source framework for HPC network APIS and beyond. In: 2015 IEEE 23rd Annual Symposium on High-Performance Interconnects, pp. 40–43 (2015). https://doi.org/10.1109/HOTI.2015.13

  9. El-Ghazawi, T., Carlson, W., Sterling, T., Yelick, K.: UPC: Distributed Shared Memory Programming. John Wiley & Sons Inc., Hoboken (2005). https://doi.org/10.1002/0471478369.app1

  10. Vienne, J.: Benefits of cross memory attach for MPI libraries on HPC clusters. In: Proceedings of the 2014 Annual Conference on Extreme Science and Engineering Discovery Environment, XSEDE 2014. Association for Computing Machinery, New York (2014). https://doi.org/10.1145/2616498.2616532

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Nitin Bhat .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Bhat, N., White, S., Kale, L.V. (2022). Enabling Support for Zero Copy Semantics in an Asynchronous Task-Based Programming Model. In: Chaves, R., et al. Euro-Par 2021: Parallel Processing Workshops. Euro-Par 2021. Lecture Notes in Computer Science, vol 13098. Springer, Cham. https://doi.org/10.1007/978-3-031-06156-1_39

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-06156-1_39

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06155-4

  • Online ISBN: 978-3-031-06156-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics