DOI: 10.1145/2854038.2854047
research-article

Portable performance on asymmetric multicore processors

Published: 29 February 2016

Abstract

Static and dynamic power constraints are steering chip manufacturers to build single-ISA Asymmetric Multicore Processors (AMPs) with big and small cores. To deliver on their energy efficiency potential, schedulers must consider core sensitivity, load balance, and the critical path. Applying these criteria effectively is challenging, especially for complex and non-scalable multithreaded applications. We demonstrate that runtimes for managed languages, which are now ubiquitous, provide a unique opportunity to abstract over AMP complexity and inform scheduling with rich semantics such as thread priorities, locks, and parallelism, information that is not directly available to the hardware, OS, or application. We present the WASH AMP scheduler, which (1) automatically identifies and accelerates critical threads in concurrent but non-scalable applications; (2) respects thread priorities; (3) considers core availability and thread sensitivity; and (4) proportionally schedules threads on big and small cores to optimize performance and energy. We introduce new dynamic analyses that identify critical threads and classify applications as sequential, scalable, or non-scalable. Compared to prior work, WASH improves performance by 20% and energy by 9% or more on frequency-scaled AMP hardware (not simulation). Performance advantages grow to 27% when asymmetry increases. Performance advantages are robust to a complex multithreaded adversary independently scheduled by the OS. WASH effectively identifies and optimizes a wider class of workloads than prior work.
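
The classification and critical-thread ideas in the abstract can be pictured with a small sketch. It is not the WASH implementation, which the abstract does not spell out. The `ThreadStats` record, the `busy_share` and `wait_threshold` values, and the rule "the thread that others wait on longest is the most critical" are all assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class ThreadStats:
    """Hypothetical per-thread counters a managed runtime could collect."""
    name: str
    run_cycles: int        # cycles spent executing
    lock_wait_cycles: int  # cycles spent blocked on contended locks
    waited_on_cycles: int  # cycles other threads waited on locks this thread held


def classify(threads, busy_share=0.1, wait_threshold=0.3):
    """Label a workload 'sequential', 'scalable', or 'non-scalable'.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    total_run = sum(t.run_cycles for t in threads)
    if total_run == 0:
        return "sequential"
    # Count threads that perform a non-trivial share of the total work.
    busy = [t for t in threads if t.run_cycles / total_run >= busy_share]
    if len(busy) <= 1:
        return "sequential"
    # Heavy lock waiting among several busy threads suggests poor scaling.
    total_wait = sum(t.lock_wait_cycles for t in threads)
    wait_fraction = total_wait / (total_wait + total_run)
    return "non-scalable" if wait_fraction >= wait_threshold else "scalable"


def rank_critical(threads):
    """Order threads so those that block others the longest come first."""
    return sorted(threads, key=lambda t: t.waited_on_cycles, reverse=True)


def assign_big_cores(threads, big_cores):
    """Give the available big cores to the most critical threads (assumed policy)."""
    return [t.name for t in rank_critical(threads)[:big_cores]]
```

Under these assumptions, a workload where several busy threads spend a large share of their time waiting on locks would be labeled non-scalable, and the thread holding the contended lock would be the first candidate for a big core. A real scheduler would also need to weigh thread priorities, core sensitivity, and load balance, which this sketch leaves out.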

    Published In

    CGO '16: Proceedings of the 2016 International Symposium on Code Generation and Optimization
    February 2016
    283 pages
    ISBN:9781450337786
    DOI:10.1145/2854038
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    In-Cooperation

    • IEEE-CS: Computer Society

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Asymmetric
    2. Energy
    3. Heterogeneous
    4. Managed Software
    5. Multicore
    6. Performance
    7. Scheduling

    Qualifiers

    • Research-article

    Conference

    CGO '16

    Acceptance Rates

    CGO '16 Paper Acceptance Rate 25 of 108 submissions, 23%;
    Overall Acceptance Rate 312 of 1,061 submissions, 29%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 15
    • Downloads (Last 6 weeks): 3
    Reflects downloads up to 17 Jan 2025

    Citations

    Cited By

    • (2024) Offloading Datacenter Jobs to RISC-V Hardware for Improved Performance and Power Efficiency. Proceedings of the 17th ACM International Systems and Storage Conference, pages 39-52. DOI: 10.1145/3688351.3689152. Online publication date: 16-Sep-2024
    • (2024) SPOT: Structure Patching and Overlap Tweaking for Effective Pipelining in Privacy-Preserving MLaaS with Tiny Clients. 2024 IEEE 44th International Conference on Distributed Computing Systems (ICDCS), pages 1318-1329. DOI: 10.1109/ICDCS60910.2024.00124. Online publication date: 23-Jul-2024
    • (2023) JOSS: Joint Exploration of CPU-Memory DVFS and Task Scheduling for Energy Efficiency. Proceedings of the 52nd International Conference on Parallel Processing, pages 828-838. DOI: 10.1145/3605573.3605586. Online publication date: 7-Aug-2023
    • (2023) Performance evaluation on work-stealing featured parallel programs on asymmetric performance multicore processors. Array, 19 (100311). DOI: 10.1016/j.array.2023.100311. Online publication date: Sep-2023
    • (2023) A scheduling algorithm based on critical factors for heterogeneous multicore processors. Concurrency and Computation: Practice and Experience, 36:7. DOI: 10.1002/cpe.7969. Online publication date: 20-Nov-2023
    • (2022) An OpenMP Runtime for Transparent Work Sharing across Cache-Incoherent Heterogeneous Nodes. ACM Transactions on Computer Systems, 39:1-4, pages 1-30. DOI: 10.1145/3505224. Online publication date: 5-Jul-2022
    • (2022) STEER: Asymmetry-aware Energy Efficient Task Scheduler for Cluster-based Multicore Architectures. 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pages 326-335. DOI: 10.1109/SBAC-PAD55451.2022.00043. Online publication date: Nov-2022
    • (2021) Xar-trek. Proceedings of the 22nd International Middleware Conference, pages 104-118. DOI: 10.1145/3464298.3493388. Online publication date: 6-Dec-2021
    • (2021) AsyMo. Proceedings of the 27th Annual International Conference on Mobile Computing and Networking, pages 215-228. DOI: 10.1145/3447993.3448625. Online publication date: 25-Oct-2021
    • (2021) Collaborative Heterogeneity-Aware OS Scheduler for Asymmetric Multicore Processors. IEEE Transactions on Parallel and Distributed Systems, 32:5, pages 1224-1237. DOI: 10.1109/TPDS.2020.3045279. Online publication date: 1-May-2021
