Concerning with On-Chip Network Features to Improve Cache Coherence Protocols for CMPs

Zeng, Hongbo; Huang, Kun; Wu, Ming; Hu, Weiwu

doi:10.1007/978-3-540-74309-5_29

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4697)

Included in the following conference series:

Asia-Pacific Conference on Advances in Computer Systems Architecture

Abstract

Chip multiprocessors (CMPs) with an on-chip network connecting the processor cores have been widely accepted as a promising technology for efficiently utilizing the ever-increasing density of transistors on a chip. Communication in CMPs requires invalidating cached copies of a shared data block. This coherence traffic incurs increasingly significant overhead as the number of cores in a CMP grows. Conventional cache coherence protocol designs do not take the characteristics of the underlying network into account, for flexibility reasons. In CMPs, however, the processor cores and the on-chip network are tightly integrated, and exposing network features to the cache coherence protocol reveals optimization opportunities. In this paper, we propose a distance-aware protocol and multi-target invalidations, which exploit network characteristics to reduce invalidation traffic overhead at negligible hardware cost. Experimental results on a 16-core CMP simulator show that the two mechanisms reduce invalidation traffic latency by 5% on average and by up to 8%.
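
The abstract does not spell out how the two mechanisms work, so the following is not the authors' protocol. It is only a minimal sketch of why network distance matters for invalidation latency, under assumptions that are not in the source: the 16 cores are laid out as a 4x4 mesh, routing is dimension-ordered (XY), and the writer must wait for an acknowledgement from every sharer. The names `MESH_WIDTH`, `core_xy`, `hop_distance`, and `invalidation_round_trip_hops` are hypothetical.

```python
from typing import Iterable, Tuple

MESH_WIDTH = 4  # assumption: 16 cores arranged as a 4x4 mesh (not stated in the abstract)


def core_xy(core_id: int, width: int = MESH_WIDTH) -> Tuple[int, int]:
    """Map a row-major core id to (x, y) mesh coordinates."""
    return core_id % width, core_id // width


def hop_distance(src: int, dst: int, width: int = MESH_WIDTH) -> int:
    """Manhattan distance, i.e. the hop count under XY routing (assumed)."""
    sx, sy = core_xy(src, width)
    dx, dy = core_xy(dst, width)
    return abs(sx - dx) + abs(sy - dy)


def invalidation_round_trip_hops(
    home: int, sharers: Iterable[int], width: int = MESH_WIDTH
) -> int:
    """Hops for an invalidation plus its acknowledgement to the farthest sharer.

    If the writer waits for all acknowledgements, the farthest sharer
    bounds the invalidation latency. Returns 0 when there are no sharers.
    """
    return max((2 * hop_distance(home, s, width) for s in sharers), default=0)


if __name__ == "__main__":
    # Home node at core 0; sharers at a neighbouring core and the far corner.
    print(invalidation_round_trip_hops(0, [1, 15]))
```

In this toy model the far-corner sharer dominates the round trip, which is the kind of network-level detail a distance-oblivious protocol cannot see.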

Author information

Authors and Affiliations

Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Hongbo Zeng, Kun Huang, Ming Wu & Weiwu Hu
Graduate University of the Chinese Academy of Sciences, Beijing, China
Hongbo Zeng, Kun Huang & Ming Wu

Editor information

Lynn Choi, Yunheung Paek, Sangyeun Cho

About this paper

Cite this paper

Zeng, H., Huang, K., Wu, M., Hu, W. (2007). Concerning with On-Chip Network Features to Improve Cache Coherence Protocols for CMPs. In: Choi, L., Paek, Y., Cho, S. (eds) Advances in Computer Systems Architecture. ACSAC 2007. Lecture Notes in Computer Science, vol 4697. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74309-5_29

DOI: https://doi.org/10.1007/978-3-540-74309-5_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74308-8
Online ISBN: 978-3-540-74309-5
eBook Packages: Computer Science, Computer Science (R0)
