Research article
DOI: 10.1145/3095770.3095778

The Effect of Asymmetric Performance on Asynchronous Task Based Runtimes

Published: 27 June 2017

ABSTRACT

It is generally accepted that future supercomputing workloads will consist of application compositions made up of coupled simulations as well as in-situ analytics. While these components have commonly been deployed in a space-shared configuration to minimize cross-workload interference, it is likely that not all of the workload components will require the full processing capacity of the CPU cores on which they run. For instance, an analytics workload often does not need to run continuously and is generally not considered to have the same priority as simulation codes. In a space-shared configuration, this arrangement leads to wasted resources in the form of periodically idle CPUs, which are generally unusable by traditional bulk synchronous parallel (BSP) applications. As a result, many have started to reconsider task-based runtimes owing to their ability to dynamically utilize available CPU resources. While the dynamic behavior of task-based runtimes has historically targeted application-induced load imbalances, the same basic situation arises from the asymmetric performance that results from time sharing a CPU with other workloads. Many have assumed that task-based runtimes would adapt easily to these new environments without significant modifications. In this paper, we present a preliminary set of experiments that measure how well asynchronous task-based runtimes respond to load imbalances caused by the asymmetric performance of time-shared CPUs. Our work focuses on a set of experiments using benchmarks running on both Charm++ and HPX-5 in the presence of a competing workload. The results show that while these runtimes are better suited to handling such scenarios than traditional runtimes, they are not yet capable of effectively addressing anything beyond a fairly minimal level of CPU contention.


Published in
ROSS '17: Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers
June 2017, 62 pages
ISBN: 9781450350860
DOI: 10.1145/3095770
Copyright © 2017 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers: research-article, refereed limited

Overall acceptance rate: 58 of 169 submissions, 34%
