DOI: 10.1145/3640537.3641572

Fast and Accurate Context-Aware Basic Block Timing Prediction using Transformers

Published: 20 February 2024

Abstract

This paper introduces ORXESTRA, a context-aware execution time prediction model based on Transformer-XL, designed to accurately estimate performance in embedded system applications. Unlike traditional machine learning models that often overlook contextual information, resulting in biased predictions for individual isolated basic blocks, ORXESTRA overcomes this limitation by incorporating execution context awareness. By doing so, ORXESTRA effectively accounts for the processor micro-architecture without explicitly modeling micro-architectural elements such as caches, pipelines, and branch predictors. Our evaluations demonstrate ORXESTRA's ability to provide precise timing estimations for different ARM targets (Cortex-M4, M7, A53, and A72), surpassing existing machine learning-based approaches in both prediction accuracy and prediction speed.
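The abstract describes the general idea without implementation details. As an illustration only, the sketch below shows one way a context-aware timing regressor of this kind could be structured: tokenized predecessor basic blocks (the execution context) are concatenated with the target block and fed to a Transformer encoder that regresses an execution-time estimate. All names, dimensions, the toy input, and the use of a vanilla PyTorch TransformerEncoder (rather than the Transformer-XL architecture with segment-level recurrence that ORXESTRA actually builds on) are assumptions for illustration, not the authors' code.

```python
# Hypothetical sketch (not the ORXESTRA implementation): a context-aware
# basic-block timing regressor. A plain TransformerEncoder stands in for
# Transformer-XL; vocab size, dimensions, and the toy input are assumptions.
import torch
import torch.nn as nn

class BasicBlockTimingModel(nn.Module):
    def __init__(self, vocab_size=4096, d_model=256, n_heads=8, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, 1)  # regress a single timing value

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) = context basic blocks ++ target block
        h = self.encoder(self.embed(token_ids))
        # Pool over the sequence, then predict the target block's execution time.
        return self.head(h.mean(dim=1)).squeeze(-1)

# Toy usage: each "sequence" is the preceding blocks followed by the block
# whose execution time we want to predict.
model = BasicBlockTimingModel()
fake_tokens = torch.randint(0, 4096, (2, 64))   # 2 samples, 64 tokens each
predicted_time = model(fake_tokens)
print(predicted_time.shape)                      # torch.Size([2])
```

In a Transformer-XL-based model, segment-level recurrence would let the encoder carry hidden state across long instruction sequences instead of re-encoding the full context for every prediction, which is what makes long execution contexts tractable.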



Published In

CC 2024: Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction
February 2024, 261 pages
ISBN: 9798400705076
DOI: 10.1145/3640537
Publisher: Association for Computing Machinery, New York, NY, United States


Author Tags

1. Execution time estimation
2. Machine learning
3. Long short-term memory
4. Transformer models

