Abstract
Embedded system applications, with their inherently limited parallelism, rarely exploit all available processing resources in large DSM-based manycore architectures. In addition, global coherence spanning all tiles does not scale well. Therefore, we have proposed a region-based cache coherence (RBCC) approach that enables coherence among a selectable cluster of tiles in accordance with application requirements. In this paper, we present a novel RBCC-malloc() extension that transparently tailors coherence at runtime to the application working sets that are actually shared. Further, we introduce the design and hardware implementation of a flexibly configurable coherency region manager (CRM) supporting RBCC-malloc(). We synthesized the CRM on an FPGA for a 64-core system and observed a 57% reduction in BRAM utilization compared to a global coherence directory for regions with up to 16 cores. Experiments reveal an application acceleration of up to 42% compared to a message-passing-based implementation. We also demonstrate the advantage of RBCC-malloc() compared to standalone RBCC.
Acknowledgements
This work was partly funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - Project Number 146371743 - TRR 89: Invasive Computing. We would also like to thank the Computer Science 4 department at FAU, Erlangen for their valuable OS support.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Srivatsa, A., Rheindt, S., Gabriel, D., Wild, T., Herkersdorf, A. (2019). CoD: Coherence-on-Demand – Runtime Adaptable Working Set Coherence for DSM-Based Manycore Architectures. In: Pnevmatikatos, D., Pelcat, M., Jung, M. (eds) Embedded Computer Systems: Architectures, Modeling, and Simulation. SAMOS 2019. Lecture Notes in Computer Science, vol 11733. Springer, Cham. https://doi.org/10.1007/978-3-030-27562-4_2
DOI: https://doi.org/10.1007/978-3-030-27562-4_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27561-7
Online ISBN: 978-3-030-27562-4
eBook Packages: Computer Science, Computer Science (R0)