DOI: 10.1145/3372799.3394370
research-article

Performance Optimization on big.LITTLE Architectures: A Memory-latency Aware Approach

Published: 16 June 2020

Abstract

The energy demands of modern mobile devices have driven a trend towards heterogeneous multi-core systems, which combine several types of core tuned for either performance or energy efficiency and so offer a rich optimization space for software. On such systems, data coherency between cores is ensured automatically by an interconnect between the processors. On some chip designs the performance of this interconnect, and by extension of the entire CPU cluster, depends heavily on the software's memory access characteristics and on the combination of frequencies at which the CPU cores run. Existing frequency scaling mechanisms in operating systems use a simple load-based heuristic to tune CPU frequencies, and so fail to reach a holistically good configuration across such diverse clusters. To solve this problem we propose a new adaptive governor, which uses a simple trained hardware model of cache interconnect characteristics, together with real-time hardware monitors, to continually adjust core frequencies and maximize system performance. We evaluate our governor on the Exynos5422 SoC, as used in the Samsung Galaxy S5, across a range of standard benchmarks. The evaluation shows that our approach achieves a speedup of up to 40% and a 70% energy saving, including a 30% speedup in common mobile applications such as video decoding and web browsing.
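The idea of a model-driven governor can be illustrated with a minimal sketch. This is not the authors' governor or their trained model: the frequency steps, the latency coefficients, the toy power cost, and the power budget below are all hypothetical values chosen for illustration, and the budget constraint is an added assumption so that the trade-off between clusters is visible. The sketch takes one monitored quantity (cache misses per instruction), predicts the time per instruction for each feasible pair of cluster frequencies, and picks the fastest pair.

```python
from itertools import product

# Hypothetical frequency steps in MHz (NOT the real Exynos5422 tables).
BIG_FREQS = [1000, 1400, 1800]
LITTLE_FREQS = [600, 1000, 1400]

# Hypothetical "trained" coefficients of a toy interconnect latency model.
BASE_LATENCY_NS = 50.0
LITTLE_COEFF = 60000.0   # latency contribution that shrinks as the LITTLE clock rises

POWER_BUDGET = 7.0       # hypothetical budget, arbitrary units


def snoop_latency_ns(little_mhz):
    """Toy model: miss latency falls as the LITTLE cluster clock rises."""
    return BASE_LATENCY_NS + LITTLE_COEFF / little_mhz


def power_cost(big_mhz, little_mhz):
    """Toy quadratic power cost in arbitrary units (an assumption)."""
    return 2.0 * (big_mhz / 1000.0) ** 2 + (little_mhz / 1000.0) ** 2


def predicted_ns_per_instr(big_mhz, little_mhz, misses_per_instr):
    """Compute time on the big core plus memory stall time from the model."""
    compute_ns = 1000.0 / big_mhz  # assumes one cycle per instruction
    memory_ns = misses_per_instr * snoop_latency_ns(little_mhz)
    return compute_ns + memory_ns


def choose_frequencies(misses_per_instr):
    """Return the (big, little) pair with the lowest predicted time in budget."""
    feasible = [
        (b, l) for b, l in product(BIG_FREQS, LITTLE_FREQS)
        if power_cost(b, l) <= POWER_BUDGET
    ]
    return min(
        feasible,
        key=lambda p: predicted_ns_per_instr(p[0], p[1], misses_per_instr),
    )


if __name__ == "__main__":
    print(choose_frequencies(0.0001))  # compute-bound sample
    print(choose_frequencies(0.01))    # memory-bound sample
```

Under these made-up numbers, a compute-bound sample favours the highest big-core frequency, while a memory-bound sample shifts budget towards the LITTLE cluster because the modelled miss latency dominates. A load-based heuristic that looks only at utilization would not see this distinction. A real governor would repeat this choice periodically using live hardware counter readings.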

Supplementary Material

MP4 File (3372799.3394370.mp4)
Presentation Video




    Published In

    LCTES '20: The 21st ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems
    June 2020
    163 pages
    ISBN:9781450370943
    DOI:10.1145/3372799

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. asymmetric multi-processing
    2. dvfs governor
    3. operating systems
    4. snooping latency

    Qualifiers

    • Research-article

    Conference

    LCTES '20

    Acceptance Rates

    Overall Acceptance Rate 116 of 438 submissions, 26%


    Cited By

    • (2024) Competitive Analysis of Online Elastic Caching of Transient Data in Multi-Tiered Content Delivery Network. IEEE Transactions on Parallel and Distributed Systems 35:12 (2449-2462). DOI: 10.1109/TPDS.2024.3475412. Online publication date: Dec-2024
    • (2024) CStream: Parallel Data Stream Compression on Multicore Edge Devices. IEEE Transactions on Knowledge and Data Engineering 36:11 (5889-5904). DOI: 10.1109/TKDE.2024.3386862. Online publication date: Nov-2024
    • (2023) Parallelizing Stream Compression for IoT Applications on Asymmetric Multicores. 2023 IEEE 39th International Conference on Data Engineering (ICDE) (950-964). DOI: 10.1109/ICDE55515.2023.00078. Online publication date: Apr-2023
    • (2023) A prefetch control strategy based on improved hill-climbing method in asymmetric multi-core architecture. The Journal of Supercomputing 79:10 (10570-10588). DOI: 10.1007/s11227-023-05078-6. Online publication date: 11-Feb-2023
    • (2022) A computational resources scheduling algorithm in edge cloud computing: from the energy efficiency of users’ perspective. The Journal of Supercomputing 78:7 (9355-9376). DOI: 10.1007/s11227-021-04146-z. Online publication date: 17-Jan-2022
