The Effect of Asymmetric Performance on Asynchronous Task Based Runtimes

ABSTRACT
It is generally accepted that future supercomputing workloads will consist of application compositions made up of coupled simulations and in-situ analytics. While these components have commonly been deployed in a space-shared configuration to minimize cross-workload interference, it is likely that not all workload components will require the full processing capacity of the CPU cores they run on. For instance, an analytics workload often does not need to run continuously and is generally not given the same priority as the simulation codes. In a space-shared configuration, this arrangement leads to wasted resources in the form of periodically idle CPUs, which traditional bulk synchronous parallel (BSP) applications are generally unable to exploit. As a result, many have started to reconsider task-based runtimes owing to their ability to dynamically utilize available CPU resources. While the dynamic behavior of task-based runtimes has historically targeted application-induced load imbalances, the same basic situation arises from the asymmetric performance that results when a CPU is time shared with other workloads. Many have assumed that task-based runtimes would adapt easily to these new environments without significant modification. In this paper, we present a preliminary set of experiments measuring how well asynchronous task-based runtimes respond to load imbalances caused by the asymmetric performance of time-shared CPUs. Our work focuses on a set of experiments using benchmarks running on both Charm++ and HPX-5 in the presence of a competing workload. The results show that while these runtimes handle such scenarios better than traditional runtimes, they can so far tolerate only a fairly minimal level of CPU contention effectively.
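The core intuition behind the abstract can be illustrated with a toy scheduling model (not taken from the paper; the function names, task counts, and speed values below are illustrative assumptions). A BSP-style step with a static equal partition is gated by the slowest core at every barrier, while a task pool lets faster cores absorb extra work when one core loses capacity to a time-shared competing workload:

```python
import heapq

def bsp_makespan(n_tasks, speeds):
    # Static equal partition with a barrier: the step ends only when the
    # slowest (most contended) worker finishes its fixed chunk.
    chunk = n_tasks / len(speeds)
    return max(chunk / s for s in speeds)

def taskpool_makespan(n_tasks, speeds):
    # Greedy dynamic scheduling: each idle worker pulls the next unit task.
    # Simulated with a heap keyed on each worker's next-free time.
    heap = [(0.0, s) for s in speeds]
    heapq.heapify(heap)
    finish = 0.0
    for _ in range(n_tasks):
        t, s = heapq.heappop(heap)
        t += 1.0 / s  # one unit of work at this worker's effective speed
        finish = max(finish, t)
        heapq.heappush(heap, (t, s))
    return finish

# Four cores, one at half speed because it time-shares with an analytics task.
speeds = [1.0, 1.0, 1.0, 0.5]
print(bsp_makespan(400, speeds))       # 200.0: the contended core dominates
print(taskpool_makespan(400, speeds))  # ~114.5: work flows to the fast cores
```

In this idealized model the task pool approaches the aggregate-throughput bound (400 / 3.5 ≈ 114.3). The paper's point is that real runtimes such as Charm++ and HPX-5 fall short of this ideal once contention grows, because scheduling, migration, and synchronization costs are not free as they are in this sketch.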