CoD: Coherence-on-Demand – Runtime Adaptable Working Set Coherence for DSM-Based Manycore Architectures

  • Conference paper
Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11733)

Abstract

Embedded system applications, with their inherently limited parallelism, rarely exploit all available processing resources in large DSM-based manycore architectures. In addition, global coherence spanning all tiles does not scale well. Therefore, we have proposed a region-based cache coherence (RBCC) approach that enables coherence among a selectable cluster of tiles in accordance with application requirements. In this paper, we present a novel RBCC-malloc() extension that transparently tailors coherence to the application working sets that are actually shared at runtime. Further, the design and hardware implementation of a flexibly configurable coherency region manager (CRM) supporting RBCC-malloc() are introduced. We synthesized the CRM on an FPGA for a 64-core system and observed a 57% reduction in BRAM utilization compared to a global coherence directory for regions with up to 16 cores. Experiments reveal an application acceleration of up to 42% compared to a message-passing-based implementation. We also demonstrate the advantage of RBCC-malloc() compared to standalone RBCC.
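
As a rough illustration of why restricting coherence to a region can shrink directory storage, the sketch below compares the sharer-vector width of a full-map (one presence bit per tracked core) directory entry for a 64-core global scope and for a 16-core region. This is only back-of-the-envelope arithmetic under that assumption: the function names are hypothetical, the CRM's actual entry layout is not described in the abstract, and the result is not expected to reproduce the reported 57% BRAM figure, which is a synthesis measurement covering the whole design.

```python
def sharer_vector_bits(tracked_cores: int) -> int:
    """Presence bits per entry in a full-map directory (assumed: one bit per core)."""
    if tracked_cores <= 0:
        raise ValueError("tracked_cores must be positive")
    return tracked_cores


def relative_saving(global_cores: int, region_cores: int) -> float:
    """Fraction of sharer-vector bits saved by tracking only a region."""
    global_bits = sharer_vector_bits(global_cores)
    region_bits = sharer_vector_bits(region_cores)
    return 1.0 - region_bits / global_bits


if __name__ == "__main__":
    # 64-core system, regions of up to 16 cores (numbers taken from the abstract).
    saving = relative_saving(64, 16)
    print(f"sharer-vector bits saved per entry: {saving:.0%}")
```

The sharer vector is only one part of a directory entry; tags, state bits, and region configuration logic also consume storage, which is one plausible reason a measured saving would differ from this simple ratio.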

Acknowledgements

This work was partly funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - Project Number 146371743 - TRR 89: Invasive Computing. We would also like to thank the Computer Science 4 department at FAU, Erlangen for their valuable OS support.

Author information

Corresponding author

Correspondence to Akshay Srivatsa.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Srivatsa, A., Rheindt, S., Gabriel, D., Wild, T., Herkersdorf, A. (2019). CoD: Coherence-on-Demand – Runtime Adaptable Working Set Coherence for DSM-Based Manycore Architectures. In: Pnevmatikatos, D., Pelcat, M., Jung, M. (eds) Embedded Computer Systems: Architectures, Modeling, and Simulation. SAMOS 2019. Lecture Notes in Computer Science, vol 11733. Springer, Cham. https://doi.org/10.1007/978-3-030-27562-4_2

  • DOI: https://doi.org/10.1007/978-3-030-27562-4_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27561-7

  • Online ISBN: 978-3-030-27562-4

  • eBook Packages: Computer Science, Computer Science (R0)