Runtime Techniques for Automatic Process Virtualization

ABSTRACT
Asynchronous many-task runtimes look promising for the next generation of high performance computing systems. However, these runtimes are usually based on new programming models, and porting existing applications to them requires extensive programmer effort. An alternative approach is to reimagine the execution model of widely used programming APIs, such as MPI, so that they can be executed more asynchronously. Virtualization is a powerful technique for executing a bulk synchronous parallel program in an asynchronous manner. Moreover, if the virtualized entities can be migrated between address spaces, the runtime can optimize execution with dynamic load balancing, fault tolerance, and other adaptive techniques.
Previous work on automating process virtualization has explored compiler approaches, source-to-source refactoring tools, and runtime methods. These approaches achieve virtualization with different tradeoffs in portability (across architectures, operating systems, compilers, and linkers), programmer effort required, and the ability to handle all kinds of global state and programming languages. We implement support for three related runtime methods, discuss their shortcomings and their applicability to user-level virtualized process migration, and compare their performance to that of existing approaches. One of our new methods achieves what we consider the best overall functionality in terms of portability, automation, support for migration, and runtime performance.