Abstract
As hardware becomes increasingly parallel and the availability of scalable parallel software improves, the problem of managing multiple multithreaded applications (processes) becomes important. Malleable processes, which can vary the number of threads they use as they run, enable sophisticated and flexible resource management. Although many existing applications parallelized for SMPs with parallel runtimes are in fact already malleable, deployed runtime environments provide neither an interface for intelligently allocating hardware threads nor any strategy for preventing oversubscription. Prior research methods either depend on profiling applications ahead of time to make good allocation decisions or do not account for process efficiency at all, leading to poor performance. None of these prior methods has been adopted widely in practice. This article presents the Scheduling and Allocation with Feedback (SCAF) system: a drop-in runtime solution that supports existing malleable applications in making intelligent allocation decisions based on observed efficiency, without any changes to semantics, program modification, offline profiling, or even recompilation. Our existing implementation can control most unmodified OpenMP applications. Other malleable threading libraries can also be supported with small modifications, without requiring application modification or recompilation.
In this work, we present the SCAF daemon and a SCAF-aware port of the GNU OpenMP runtime. We present a new technique for estimating process efficiency purely at runtime using available hardware counters and demonstrate its effectiveness in aiding allocation decisions.
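To make the efficiency-based allocation idea concrete, the following is a minimal sketch of estimating a process's parallel efficiency from hardware-counter readings at runtime. The counter quantities, function names, and the IPC-based formula are illustrative assumptions for this sketch, not SCAF's actual implementation:

```python
# Hypothetical sketch: approximate a process's parallel efficiency as
# achieved speedup per thread, derived from instructions-per-cycle (IPC)
# readings such as those exposed by PAPI. All names here are assumptions.

def estimate_efficiency(serial_ipc, parallel_ipc_per_thread):
    """Estimate parallel efficiency from hardware-counter measurements.

    serial_ipc: IPC measured during a brief serial execution of the work.
    parallel_ipc_per_thread: list of per-thread IPC values measured while
        the same work runs in parallel.
    Returns efficiency in (0, 1], where 1.0 means perfect scaling.
    """
    nthreads = len(parallel_ipc_per_thread)
    # Aggregate useful work rate across all threads, relative to the
    # serial rate, as a stand-in for achieved speedup.
    speedup = sum(parallel_ipc_per_thread) / serial_ipc
    return speedup / nthreads

# Example: 4 threads each sustain 0.6 IPC vs. 1.0 IPC serially,
# giving an estimated speedup of 2.4 and efficiency of 0.6.
eff = estimate_efficiency(1.0, [0.6, 0.6, 0.6, 0.6])
```

A runtime scheduler could then favor processes with higher estimated efficiency when dividing hardware threads, which is the kind of feedback-driven decision SCAF automates.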
We evaluated SCAF using the NAS Parallel Benchmarks (NPB) on five commodity parallel platforms, enumerating architectural features and their effects on our scheme. We measured the benefit of SCAF in terms of improvement in sum of speedups (a common metric for multiprogrammed environments) when running all benchmark pairs concurrently, compared to equipartitioning, the best existing competing scheme in the literature. We found that SCAF improves on equipartitioning on four of the five machines, with a mean improvement factor in sum of speedups of 1.04x to 1.11x for benchmark pairs, depending on the machine, and 1.09x on average.
Since we are not aware of any widely available tool for equipartitioning, we also compare SCAF against multiprogramming with unmodified OpenMP, which is the only environment available to end users today. SCAF improves on the unmodified OpenMP runtimes on all five machines, with a mean improvement of 1.08x to 2.07x, depending on the machine, and 1.59x on average.
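The sum-of-speedups metric used above can be illustrated with a small worked example. The runtimes below are invented purely for illustration and do not reflect measured results:

```python
# Illustrative computation of the sum-of-speedups metric for two
# benchmarks sharing a machine. All runtimes are made-up numbers.

def sum_of_speedups(runtimes_alone, runtimes_shared):
    """Each process's speedup is its runtime on the dedicated machine
    divided by its runtime while sharing the machine; the metric for a
    multiprogrammed workload is the sum over all processes."""
    return sum(alone / shared
               for alone, shared in zip(runtimes_alone, runtimes_shared))

# Two benchmarks, each taking 100s alone on the full machine.
# Suppose a static equal partition yields shared runtimes of 160s and
# 220s, while a feedback-driven allocation yields 150s and 200s.
equi = sum_of_speedups([100, 100], [160, 220])   # about 1.08
feedback = sum_of_speedups([100, 100], [150, 200])  # about 1.17
improvement = feedback / equi                    # about 1.08x
```

Comparing schemes by the ratio of their sums of speedups, as in the final line, is how the per-machine improvement factors above are expressed.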