research-article

Portable performance on asymmetric multicore processors

Authors:

Stephen M. Blackburn,

Kathryn S. McKinley

CGO '16: Proceedings of the 2016 International Symposium on Code Generation and Optimization

Pages 24 - 35

https://doi.org/10.1145/2854038.2854047

Published: 29 February 2016

Abstract

Static and dynamic power constraints are steering chip manufacturers to build single-ISA Asymmetric Multicore Processors (AMPs) with big and small cores. To deliver on their energy-efficiency potential, schedulers must consider core sensitivity, load balance, and the critical path. Applying these criteria effectively is challenging, especially for complex and non-scalable multithreaded applications. We demonstrate that runtimes for managed languages, which are now ubiquitous, provide a unique opportunity to abstract over AMP complexity and to inform scheduling with rich semantics such as thread priorities, locks, and parallelism: information not directly available to the hardware, OS, or application. We present the WASH AMP scheduler, which (1) automatically identifies and accelerates critical threads in concurrent but non-scalable applications; (2) respects thread priorities; (3) considers core availability and thread sensitivity; and (4) proportionally schedules threads on big and small cores to optimize performance and energy. We introduce new dynamic analyses that identify critical threads and classify applications as sequential, scalable, or non-scalable. Compared to prior work, WASH improves performance by 20% and energy by 9% or more on frequency-scaled AMP hardware (not simulation). The performance advantage grows to 27% when asymmetry increases, and it is robust to a complex multithreaded adversary independently scheduled by the OS. WASH effectively identifies and optimizes a wider class of workloads than prior work.
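The abstract names WASH's ingredients (workload classification, critical-thread acceleration, priorities, core availability, and proportional big/small scheduling) but not its concrete rules. The Python sketch below is only an illustration under stated assumptions, not the paper's algorithm: the ThreadSample record, its lock_wait_caused metric, the 0.1 contention threshold, and the classify and place functions are all hypothetical. It labels a workload sequential, scalable, or non-scalable, sends the most lock-critical threads of a non-scalable workload to the available big cores, and rotates a scalable workload's threads through the big cores each quantum so they share big-core time roughly equally.

```python
# Hypothetical illustration only: not WASH's actual classification or scheduling rules.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ThreadSample:
    """Hypothetical per-thread statistics a managed runtime might sample each interval."""
    name: str
    runnable: bool           # the thread wanted CPU time during the interval
    lock_wait_caused: float  # assumed metric: fraction of the interval other threads
                             # spent blocked on locks held by this thread
    priority: int = 0        # runtime-visible priority; higher means more important


def classify(threads: List[ThreadSample], contention_threshold: float = 0.1) -> str:
    """Label a workload 'sequential', 'scalable', or 'non-scalable' (illustrative rule)."""
    runnable = [t for t in threads if t.runnable]
    if len(runnable) <= 1:
        return "sequential"
    # A thread that keeps others waiting on its locks limits scaling, so
    # accelerating that thread helps more than adding cores.
    if max(t.lock_wait_caused for t in runnable) > contention_threshold:
        return "non-scalable"
    return "scalable"


def place(threads: List[ThreadSample], n_big: int, quantum: int) -> Dict[str, str]:
    """Map each runnable thread to 'big' or 'small' for one scheduling quantum."""
    runnable = [t for t in threads if t.runnable]
    kind = classify(threads)
    if kind == "non-scalable":
        # Most critical threads first; priority, then name, break ties.
        order = sorted(runnable, key=lambda t: (-t.lock_wait_caused, -t.priority, t.name))
    elif kind == "scalable":
        # Rotate a fixed order every quantum so each thread gets a roughly
        # equal share of big-core time (priority is ignored in this branch).
        order = sorted(runnable, key=lambda t: t.name)
        shift = (quantum * n_big) % len(order)
        order = order[shift:] + order[:shift]
    else:
        # Sequential: at most one runnable thread, placed on a big core if one is available.
        order = runnable
    # The first n_big threads in the order get the available big cores.
    return {t.name: ("big" if i < n_big else "small") for i, t in enumerate(order)}
```

For example, with four equally weighted workers and two big cores, quantum 0 places the first two workers (by name) on big cores and quantum 1 places the other two there. In a managed runtime, counters like these could plausibly come from the lock implementation and thread library; the fixed threshold is a placeholder.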

Index Terms

Portable performance on asymmetric multicore processors
1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published In

CGO '16: Proceedings of the 2016 International Symposium on Code Generation and Optimization

February 2016

283 pages

ISBN:9781450337786

DOI:10.1145/2854038

General Chair: Bjoern Franke, University of Edinburgh, UK

Program Chairs: Youfeng Wu, Intel, USA; Fabrice Rastello, Inria, France

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

In-Cooperation

IEEE-CS: Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 February 2016

Qualifiers

Research-article

Funding Sources

National High Technology Research and Development Program of China
China Postdoctoral Science Foundation
National Natural Science Foundation of China
National Science Foundation of the United States

Conference

CGO '16: 14th Annual IEEE/ACM International Symposium on Code Generation and Optimization

March 12–18, 2016

Barcelona, Spain

Acceptance Rates

CGO '16 Paper Acceptance Rate: 25 of 108 submissions, 23%

Overall Acceptance Rate 312 of 1,061 submissions, 29%

Article Metrics

Total citations: 19

Total downloads: 426

Downloads (last 12 months): 15

Downloads (last 6 weeks): 3

Reflects downloads up to 17 Jan 2025

Cited By

B. Heerekar, C. Philippidis, H. Chuang, P. Olivier, A. Barbalace, and B. Ravindran. Offloading Datacenter Jobs to RISC-V Hardware for Improved Performance and Power Efficiency. In Proceedings of the 17th ACM International Systems and Storage Conference, pages 39–52, 2024. Online publication date: 16 Sep. 2024.
https://dl.acm.org/doi/10.1145/3688351.3689152

X. Xu, Q. Zhang, R. Ning, C. Xin, and H. Wu. SPOT: Structure Patching and Overlap Tweaking for Effective Pipelining in Privacy-Preserving MLaaS with Tiny Clients. In 2024 IEEE 44th International Conference on Distributed Computing Systems (ICDCS), pages 1318–1329, 2024. Online publication date: 23 Jul. 2024.
https://doi.org/10.1109/ICDCS60910.2024.00124

J. Chen, M. Manivannan, B. Goel, and M. Pericàs. JOSS: Joint Exploration of CPU-Memory DVFS and Task Scheduling for Energy Efficiency. In Proceedings of the 52nd International Conference on Parallel Processing, pages 828–838, 2023. Online publication date: 7 Aug. 2023.
https://dl.acm.org/doi/10.1145/3605573.3605586

Adnan. Performance evaluation on work-stealing featured parallel programs on asymmetric performance multicore processors. Array, 19:100311, 2023. Online publication date: Sep. 2023.
https://doi.org/10.1016/j.array.2023.100311

C. Li, Z. Lin, L. Tian, and B. Zhang. A scheduling algorithm based on critical factors for heterogeneous multicore processors. Concurrency and Computation: Practice and Experience, 36(7), 2023. Online publication date: 20 Nov. 2023.
https://doi.org/10.1002/cpe.7969

R. Lyerly, C. Bilbao, C. Min, C. Rossbach, and B. Ravindran. An OpenMP Runtime for Transparent Work Sharing across Cache-Incoherent Heterogeneous Nodes. ACM Transactions on Computer Systems, 39(1–4):1–30, 2022. Online publication date: 5 Jul. 2022.
https://dl.acm.org/doi/10.1145/3505224

J. Chen, M. Manivannan, B. Goel, M. Abduljabbar, and M. Pericas. STEER: Asymmetry-aware Energy Efficient Task Scheduler for Cluster-based Multicore Architectures. In 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pages 326–335, 2022. Online publication date: Nov. 2022.
https://doi.org/10.1109/SBAC-PAD55451.2022.00043

E. Horta, H. Chuang, N. VSathish, C. Philippidis, A. Barbalace, P. Olivier, B. Ravindran, K. Zhang, A. Gherbi, N. Venkatasubramanian, and L. Veiga. Xar-trek. In Proceedings of the 22nd International Middleware Conference, pages 104–118, 2021. Online publication date: 6 Dec. 2021.
https://dl.acm.org/doi/10.1145/3464298.3493388

M. Wang, S. Ding, T. Cao, Y. Liu, and F. Xu. AsyMo. In Proceedings of the 27th Annual International Conference on Mobile Computing and Networking, pages 215–228, 2021. Online publication date: 25 Oct. 2021.
https://dl.acm.org/doi/10.1145/3447993.3448625

T. Yu, R. Zhong, V. Janjic, P. Petoumenos, J. Zhai, H. Leather, and J. Thomson. Collaborative Heterogeneity-Aware OS Scheduler for Asymmetric Multicore Processors. IEEE Transactions on Parallel and Distributed Systems, 32(5):1224–1237, 2021. Online publication date: 1 May 2021.
https://doi.org/10.1109/TPDS.2020.3045279