Performance Optimization on big.LITTLE Architectures: A Memory-latency Aware Approach

Authors:

Barry Porter

LCTES '20: The 21st ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems

Pages 51 - 61

https://doi.org/10.1145/3372799.3394370

Published: 16 June 2020

Abstract

The energy demands of modern mobile devices have driven a trend towards heterogeneous multi-core systems which include various types of core tuned for performance or energy efficiency, offering a rich optimization space for software. On such systems, data coherency between cores is automatically ensured by an interconnect between processors. On some chip designs the performance of this interconnect, and by extension of the entire CPU cluster, is highly dependent on the software's memory access characteristics and on the set of frequencies of each CPU core. Existing frequency scaling mechanisms in operating systems use a simple load-based heuristic to tune CPU frequencies, and so fail to achieve a holistically good configuration across such diverse clusters. We propose a new adaptive governor to solve this problem, which uses a simple trained hardware model of cache interconnect characteristics, along with real-time hardware monitors, to continually adjust core frequencies to maximize system performance. We evaluate our governor on the Exynos5422 SoC, as used in the Samsung Galaxy S5, across a range of standard benchmarks. This shows that our approach achieves a speedup of up to 40%, and a 70% energy saving, including a 30% speedup in common mobile applications such as video decoding and web browsing.
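As a rough illustration of the idea in the abstract, the sketch below picks a pair of cluster frequencies by consulting a performance model rather than CPU load alone. Everything in it is an assumption made for illustration: the frequency tables, the `predicted_throughput` formula, and the `miss_rate` input are hypothetical placeholders and are not the trained model, the monitors, or the values used in the paper.

```python
from itertools import product

# Hypothetical frequency steps in MHz; real tables depend on the SoC.
BIG_FREQS = [800, 1200, 1600, 2000]
LITTLE_FREQS = [600, 1000, 1400]


def predicted_throughput(big_mhz, little_mhz, miss_rate):
    """Toy stand-in for a trained interconnect model (an assumption).

    Compute throughput scales with the big-cluster frequency, while
    memory-heavy phases (high miss_rate, in [0, 1]) are penalized when
    the two cluster frequencies are far apart.
    """
    mismatch = abs(big_mhz - little_mhz) / max(big_mhz, little_mhz)
    latency_penalty = 1.0 + 4.0 * miss_rate * mismatch
    return big_mhz / latency_penalty


def choose_frequencies(miss_rate):
    """Return the (big, LITTLE) pair with the best predicted throughput."""
    return max(
        product(BIG_FREQS, LITTLE_FREQS),
        key=lambda pair: predicted_throughput(pair[0], pair[1], miss_rate),
    )


if __name__ == "__main__":
    # In a real governor, miss_rate would come from hardware counters
    # sampled periodically; here two values are supplied by hand.
    for rate in (0.0, 1.0):
        print(rate, choose_frequencies(rate))
```

Under this toy model, a compute-bound phase selects the highest big-cluster frequency, while a memory-bound phase selects a lower, better-matched pair. That is the kind of configuration a purely load-based heuristic would not choose.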

Supplementary Material

MP4 File (3372799.3394370.mp4)

Presentation Video

Index Terms

1. Computer systems organization
  1. Embedded and cyber-physical systems
    1. System on a chip

Published In

LCTES '20: The 21st ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems

June 2020

163 pages

ISBN:9781450370943

DOI:10.1145/3372799

General Chair: Jingling Xue, UNSW Sydney, Australia

Program Chair: Changhee Jung, Purdue University, USA

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

LCTES '20

June 16, 2020

London, United Kingdom

Acceptance Rates

Overall Acceptance Rate 116 of 438 submissions, 26%

Bibliometrics & Citations

Article Metrics

Total Citations: 5

Total Downloads: 294

Downloads (Last 12 months): 24

Downloads (Last 6 weeks): 1

Reflects downloads up to 07 Mar 2025

Cited By

Wu B, Bao W, Zhou B. (2024). Competitive Analysis of Online Elastic Caching of Transient Data in Multi-Tiered Content Delivery Network. IEEE Transactions on Parallel and Distributed Systems 35:12 (2449-2462). Online publication date: Dec-2024. https://doi.org/10.1109/TPDS.2024.3475412

Zeng X, Zhang S. (2024). CStream: Parallel Data Stream Compression on Multicore Edge Devices. IEEE Transactions on Knowledge and Data Engineering 36:11 (5889-5904). Online publication date: Nov-2024. https://doi.org/10.1109/TKDE.2024.3386862

Zeng X, Zhang S. (2023). Parallelizing Stream Compression for IoT Applications on Asymmetric Multicores. 2023 IEEE 39th International Conference on Data Engineering (ICDE) (950-964). Online publication date: Apr-2023. https://doi.org/10.1109/ICDE55515.2023.00078

Fang J, Xu Y, Kong H, Cai M. (2023). A prefetch control strategy based on improved hill-climbing method in asymmetric multi-core architecture. The Journal of Supercomputing 79:10 (10570-10588). Online publication date: 11-Feb-2023. https://doi.org/10.1007/s11227-023-05078-6

Zhang J, Zheng R, Zhao X, Zhu J, Xu J, Wu Q. (2022). A computational resources scheduling algorithm in edge cloud computing: from the energy efficiency of users’ perspective. The Journal of Supercomputing 78:7 (9355-9376). Online publication date: 17-Jan-2022. https://doi.org/10.1007/s11227-021-04146-z
