PISCOT: A Pipelined Split-Transaction COTS-Coherent Bus for Multi-Core Real-Time Systems

Published: 29 October 2022

Abstract

Tasks in modern embedded systems, such as automotive and avionics systems, communicate with each other through shared data to achieve the desired functionality of the whole system. In commodity platforms, cores communicate data through the shared memory hierarchy, and correctness is maintained by a cache coherence protocol. Recent works investigated the deployment of coherence protocols in real-time systems and showed significant performance improvements. Nonetheless, we find that these works require modifications to commodity coherence protocols, assume simple in-order pipelines, and, most importantly, suffer from significant latency due to coherence interference along with degraded average performance. In this work, we propose PISCOT, a predictable and coherent bus architecture that (i) provides considerably tighter bounds than state-of-the-art predictable coherent solutions (4× tighter in a quad-core system); (ii) does so with negligible performance loss relative to the coherence delays of a conventional high-performance architecture (less than 4% for SPLASH-3 benchmarks), which improves average performance by up to 5× (2.8× on average) compared to its predictable coherence counterpart; and (iii) achieves this without requiring any modifications to conventional coherence protocols. We show this by integrating PISCOT on top of two protocols, MSI and MESI, each implemented in detail with complete transient states.

Cited By

  • (2024) A Dynamic Priority-aware Coherent Cache Architecture for Reactive Real-Time System. Proceedings of the 32nd International Conference on Real-Time Networks and Systems, 142-152. DOI: 10.1145/3696355.3699700. Online publication date: 6-Nov-2024
  • (2024) High Performance and Predictable Shared Last-level Cache for Safety-Critical Systems. ACM Transactions on Embedded Computing Systems 23:6, 1-30. DOI: 10.1145/3687308. Online publication date: 11-Sep-2024
  • (2024) Exclusive Hierarchies for Predictable Sharing in Last-Level Cache. 2024 IEEE 30th Real-Time and Embedded Technology and Applications Symposium (RTAS), 186-198. DOI: 10.1109/RTAS61025.2024.00023. Online publication date: 13-May-2024

Published In

ACM Transactions on Embedded Computing Systems, Volume 22, Issue 1
January 2023
512 pages
ISSN:1539-9087
EISSN:1558-3465
DOI:10.1145/3567467
Editor: Tulika Mitra

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 October 2022
Online AM: 22 August 2022
Accepted: 23 July 2022
Revised: 09 June 2022
Received: 24 December 2021
Published in TECS Volume 22, Issue 1

Qualifiers

  • Research-article
  • Refereed
