DOI: 10.1145/3640537.3641572

Fast and Accurate Context-Aware Basic Block Timing Prediction using Transformers

Published: 20 February 2024

Abstract

This paper introduces ORXESTRA, a context-aware execution time prediction model based on Transformer-XL, designed to accurately estimate performance in embedded system applications. Unlike traditional machine learning models that often overlook contextual information, resulting in biased predictions for individual isolated basic blocks, ORXESTRA overcomes this limitation by incorporating execution context awareness. By doing so, ORXESTRA effectively accounts for the processor micro-architecture without explicitly modeling micro-architectural elements such as caches, pipelines, and branch predictors. Our evaluations demonstrate ORXESTRA's ability to provide precise timing estimations for different ARM targets (Cortex-M4, M7, A53, and A72), surpassing existing machine learning-based approaches in both prediction accuracy and prediction speed.
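The abstract describes the general idea without implementation details. As an illustration only, the sketch below shows one way a context-aware timing regressor of this kind could be structured: tokenized predecessor basic blocks (the execution context) are concatenated with the target block and fed to a Transformer encoder that regresses an execution-time estimate. All names, dimensions, the toy input, and the use of a vanilla PyTorch TransformerEncoder (rather than the Transformer-XL architecture with segment-level recurrence that ORXESTRA actually builds on) are assumptions for illustration, not the authors' code.

```python
# Hypothetical sketch (not the ORXESTRA implementation): a context-aware
# basic-block timing regressor. A plain TransformerEncoder stands in for
# Transformer-XL; vocab size, dimensions, and the toy input are assumptions.
import torch
import torch.nn as nn

class BasicBlockTimingModel(nn.Module):
    def __init__(self, vocab_size=4096, d_model=256, n_heads=8, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, 1)  # regress a single timing value

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) = context basic blocks ++ target block
        h = self.encoder(self.embed(token_ids))
        # Pool over the sequence, then predict the target block's execution time.
        return self.head(h.mean(dim=1)).squeeze(-1)

# Toy usage: each "sequence" is the preceding blocks followed by the block
# whose execution time we want to predict.
model = BasicBlockTimingModel()
fake_tokens = torch.randint(0, 4096, (2, 64))   # 2 samples, 64 tokens each
predicted_time = model(fake_tokens)
print(predicted_time.shape)                      # torch.Size([2])
```

In a Transformer-XL-based model, segment-level recurrence would let the encoder carry hidden state across long instruction sequences instead of re-encoding the full context for every prediction, which is what makes long execution contexts tractable.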



Published In

CC 2024: Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction
February 2024, 261 pages
ISBN: 9798400705076
DOI: 10.1145/3640537
Publisher: Association for Computing Machinery, New York, NY, United States


Author Tags

1. Execution time estimation
2. Machine learning
3. Long short-term memory
4. Transformer models

