research-article

Predictable GPU Wavefront Splitting for Safety-Critical Systems

Published: 09 September 2023

Abstract

We present predictable wavefront splitting (PWS), a technique for graphics processing units (GPUs). PWS improves the performance of GPU applications by reducing the impact of branch divergence while ensuring that worst-case execution time (WCET) estimates can be computed. This makes PWS suitable for safety-critical applications, such as autonomous driving, avionics, and space systems, that require strict temporal guarantees. We develop PWS on an AMD-based GPU, proposing microarchitectural enhancements to the GPU and a compiler pass that eliminates branch serializations to reduce the WCET of a wavefront. Our analysis shows that PWS improves performance by 11% over existing architectures and achieves a lower WCET than prior work on wavefront splitting.

    Published In

    ACM Transactions on Embedded Computing Systems, Volume 22, Issue 5s
    Special Issue ESWEEK 2023
    October 2023
    1394 pages
    ISSN: 1539-9087
    EISSN: 1558-3465
    DOI: 10.1145/3614235
    Editor: Tulika Mitra

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 09 September 2023
    Accepted: 30 June 2023
    Revised: 02 June 2023
    Received: 23 March 2023
    Published in TECS Volume 22, Issue 5s

    Author Tags

    1. GPU
    2. safety-critical systems
