Dealing with Traffic-Area Trade-Off in Direct Coherence Protocols for Many-Core CMPs

Ros, Alberto; Acacio, Manuel E.; García, José M.

doi:10.1007/978-3-642-03644-6_2

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5737)

Included in the following conference series:

International Workshop on Advanced Parallel Processing Technologies

Abstract

In many-core CMP architectures, the cache coherence protocol is a key component: it can add area and power requirements to the final design and can therefore severely restrict its scalability. Area constraints limit the use of precise sharing codes to small- or medium-scale CMPs. Power constraints make it impractical to use broadcast-based protocols in large-scale CMPs.
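To make the area argument concrete, the following is a minimal sketch of how the per-entry storage of a sharing code grows with the number of tiles. The scheme names (`full_map`, `coarse_vector`, `single_pointer`, `none`), the 64-byte block size, and the group size are illustrative assumptions drawn from textbook directory designs, not the specific DiCo-CMP implementations evaluated in the paper.

```python
def sharing_code_bits(num_tiles: int, scheme: str, group_size: int = 4) -> int:
    """Bits of sharing information stored per cache entry (illustrative only)."""
    if scheme == "full_map":
        # Precise code: one presence bit per tile.
        return num_tiles
    if scheme == "coarse_vector":
        # Imprecise code: one bit per group of `group_size` tiles.
        return -(-num_tiles // group_size)  # ceiling division
    if scheme == "single_pointer":
        # Enough bits to name one tile: ceil(log2(num_tiles)).
        return (num_tiles - 1).bit_length()
    if scheme == "none":
        # No sharing information stored; the protocol must broadcast.
        return 0
    raise ValueError(f"unknown scheme: {scheme}")


def overhead_percent(num_tiles: int, scheme: str,
                     block_bytes: int = 64, group_size: int = 4) -> float:
    """Sharing-code size relative to the data block size (assumed 64 bytes)."""
    bits = sharing_code_bits(num_tiles, scheme, group_size)
    return 100.0 * bits / (block_bytes * 8)


if __name__ == "__main__":
    for tiles in (16, 64, 256, 512):
        row = {s: overhead_percent(tiles, s)
               for s in ("full_map", "coarse_vector", "single_pointer", "none")}
        print(tiles, row)
```

Under these assumptions a full-map code costs as many bits as the data block itself at 512 tiles, which is why precise codes are usually considered viable only at small or medium scale.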

Token-CMP and DiCo-CMP are cache coherence protocols that were recently proposed to avoid the indirection problem of traditional directory-based protocols. However, Token-CMP broadcasts requests to all tiles, while DiCo-CMP adds a precise sharing code to each cache entry. In this work, we address the traffic-area trade-off for these indirection-aware protocols. In particular, we propose and evaluate several implementations of DiCo-CMP that differ in the amount of coherence information they must store. Our evaluation results show that our proposals achieve a good traffic-area trade-off: they halve the traffic requirements of Token-CMP and considerably reduce the storage area required by DiCo-CMP.
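The traffic side of the trade-off can be sketched in the same spirit. The function below is a deliberately simplified model, not the protocols evaluated in the paper: it only counts which tiles would receive a coherence message for one request, given the true set of sharers. The scheme names, the group size, and the omission of acknowledgements and requester exclusion are all assumptions made for illustration.

```python
def message_targets(sharers: set[int], num_tiles: int,
                    scheme: str, group_size: int = 4) -> set[int]:
    """Tiles that receive a message for one request (simplified model)."""
    if scheme == "full_map":
        # Precise information: only the actual sharers are contacted.
        return set(sharers)
    if scheme == "coarse_vector":
        # One bit per group: every tile in a marked group is contacted.
        marked = {tile // group_size for tile in sharers}
        return {tile for tile in range(num_tiles)
                if tile // group_size in marked}
    if scheme == "broadcast":
        # No sharing information: every tile is contacted.
        return set(range(num_tiles))
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    sharers = {1, 9}
    for scheme in ("full_map", "coarse_vector", "broadcast"):
        targets = message_targets(sharers, num_tiles=16, scheme=scheme)
        print(scheme, len(targets))
```

In this toy model, less stored information always means a superset of the precise target set, so traffic rises as area falls; the paper's proposals sit between the two extremes.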

Author information

Authors and Affiliations

Departamento de Ingeniería y Tecnología de Computadores, Universidad de Murcia, 30100, Murcia, Spain
Alberto Ros, Manuel E. Acacio & José M. García

Editor information

Editors and Affiliations

National University of Defense Technology, Department of Computer Science, 410073, Changsha, P.R. China
Yong Dou
Ecole Polytechnique Fédérale de Lausanne (EPFL), Dépt. Physique, 1015, Lausanne, Switzerland
Ralf Gruber
HSR - Hochschule für Technik Rapperswil, Oberseestr. 10, 8640, Rapperswil, Switzerland
Josef M. Joller

About this paper

Cite this paper

Ros, A., Acacio, M.E., García, J.M. (2009). Dealing with Traffic-Area Trade-Off in Direct Coherence Protocols for Many-Core CMPs. In: Dou, Y., Gruber, R., Joller, J.M. (eds) Advanced Parallel Processing Technologies. APPT 2009. Lecture Notes in Computer Science, vol 5737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03644-6_2

DOI: https://doi.org/10.1007/978-3-642-03644-6_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03643-9
Online ISBN: 978-3-642-03644-6
eBook Packages: Computer Science, Computer Science (R0)