Abstract
Chip multiprocessors (CMPs) with an on-chip network connecting the processor cores have been widely accepted as a promising technology for efficiently utilizing the ever-increasing transistor density on a chip. Communication in CMPs requires invalidating cached copies of shared data blocks, and the resulting coherence traffic incurs increasingly significant overhead as the number of cores in a CMP grows. For flexibility, conventional cache coherence protocols are designed without regard to the characteristics of the underlying network. In CMPs, however, processor cores and the on-chip network are tightly integrated, so exposing network features to the cache coherence protocol reveals optimization opportunities. In this paper, we propose a distance-aware protocol and multi-target invalidations, two mechanisms that exploit network characteristics to reduce invalidation traffic overhead at negligible hardware cost. Experimental results on a 16-core CMP simulator show that the two mechanisms reduce invalidation traffic latency by 5% on average and by up to 8%.
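The abstract does not describe how the two mechanisms work internally. As a rough, hypothetical illustration of why network distance matters for invalidation traffic, the following sketch assumes a 4x4 mesh with XY dimension-order routing (a topology the abstract does not specify). It compares the links traversed when a home node sends one invalidation packet per sharer with the links traversed by a single packet that visits every sharer in turn. The function names, the row-major node numbering, and the nearest-first visiting order are assumptions for this sketch, not the authors' design.

```python
# Hypothetical illustration only: a 4x4 mesh (16 cores) with XY routing.
# Node ids are assumed row-major: node n sits at (n % WIDTH, n // WIDTH).
WIDTH = 4


def coords(node: int) -> tuple[int, int]:
    """Return the (x, y) position of a node in the mesh."""
    return node % WIDTH, node // WIDTH


def hops(src: int, dst: int) -> int:
    """Hop count under XY dimension-order routing (Manhattan distance)."""
    (sx, sy), (dx, dy) = coords(src), coords(dst)
    return abs(sx - dx) + abs(sy - dy)


def unicast_link_traversals(home: int, sharers: list[int]) -> int:
    """Links traversed when the home node sends one invalidation per sharer."""
    return sum(hops(home, s) for s in sharers)


def multi_target_link_traversals(home: int, sharers: list[int]) -> int:
    """Links traversed by one packet that visits all sharers, nearest first.

    The greedy nearest-neighbour ordering is an assumption of this sketch.
    """
    remaining = list(sharers)
    current, total = home, 0
    while remaining:
        nxt = min(remaining, key=lambda s: hops(current, s))
        total += hops(current, nxt)
        remaining.remove(nxt)
        current = nxt
    return total


if __name__ == "__main__":
    home, sharers = 0, [3, 7, 11, 15]  # sharers along the mesh's east column
    print(unicast_link_traversals(home, sharers))       # 18
    print(multi_target_link_traversals(home, sharers))  # 6
```

In this example one packet crossing 6 links replaces four packets crossing 18 links in total. The sketch ignores acknowledgements, router pipeline delay, and contention, and visiting sharers serially can delay the last ones, so fewer link traversals do not by themselves imply lower invalidation latency.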
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zeng, H., Huang, K., Wu, M., Hu, W. (2007). Concerning with On-Chip Network Features to Improve Cache Coherence Protocols for CMPs. In: Choi, L., Paek, Y., Cho, S. (eds) Advances in Computer Systems Architecture. ACSAC 2007. Lecture Notes in Computer Science, vol 4697. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74309-5_29
DOI: https://doi.org/10.1007/978-3-540-74309-5_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74308-8
Online ISBN: 978-3-540-74309-5
eBook Packages: Computer Science, Computer Science (R0)