Facing prefetching challenges in distributed shared memories for CMPs

Torrents, Martí; Martínez, Raul; Molina, Carlos

doi:10.1007/s11227-016-1675-1

Published: 17 February 2016

Volume 72, pages 1453–1476 (2016)

The Journal of Supercomputing

Martí Torrents¹,
Raul Martínez² &
Carlos Molina³

Abstract

Prefetch engines working on distributed memory systems behave independently, each analyzing the memory accesses addressed to its attached slice of the cache. Depending on the computed address, they may generate prefetch requests targeted at any other tile in the system. This distributed behavior raises several challenges that are not present when the cache is unified. In this paper, we identify, analyze, and quantify the effects of these challenges and suggest how to address them, thus paving the way for future research on implementing prefetching mechanisms at all levels of the cache hierarchy of systems with shared distributed caches.
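To make the abstract's point about independent, per-tile prefetch engines more concrete, the following Python sketch models a shared cache split into per-tile slices. Everything in it is a hypothetical illustration rather than the configuration evaluated in the paper: the tile count, the line size, the static line-interleaved `home_tile` mapping, and the simple next-line engine are all assumptions. Under those assumptions it shows two effects: the tile that generates a prefetch is generally not the tile that must serve it, and each engine sees only the part of an access stream that is homed on its own slice, so a sequential stream appears to it with a much larger stride.

```python
from collections import defaultdict

# Hypothetical parameters chosen for illustration; not taken from the paper.
LINE_SIZE = 64   # bytes per cache line
NUM_TILES = 16   # tiles, each holding one slice of the shared cache


def home_tile(addr: int) -> int:
    """Assumed static line-interleaved mapping: consecutive lines go to consecutive tiles."""
    return (addr // LINE_SIZE) % NUM_TILES


def next_line_prefetches(addr: int, degree: int = 2) -> list[tuple[int, int]]:
    """Prefetches a simple (hypothetical) next-line engine would issue for addr.

    Returns (prefetch_address, target_tile) pairs. The target tile is decided
    by the computed prefetch address, not by the tile that generated it.
    """
    line_addr = addr - addr % LINE_SIZE
    return [(line_addr + k * LINE_SIZE, home_tile(line_addr + k * LINE_SIZE))
            for k in range(1, degree + 1)]


def observed_strides(stream: list[int]) -> dict[int, set[int]]:
    """Strides (in bytes) that each tile observes among the accesses homed on it."""
    last: dict[int, int] = {}
    strides: dict[int, set[int]] = defaultdict(set)
    for addr in stream:
        tile = home_tile(addr)
        if tile in last:
            strides[tile].add(addr - last[tile])
        last[tile] = addr
    return dict(strides)


if __name__ == "__main__":
    # A purely sequential stream of 64 cache lines starting at address 0.
    stream = [i * LINE_SIZE for i in range(64)]

    remote = 0
    total = 0
    for addr in stream:
        src = home_tile(addr)  # tile whose engine sees this access
        for _, dst in next_line_prefetches(addr):
            total += 1
            remote += int(dst != src)
    print(f"{remote}/{total} prefetch requests target another tile")

    # Tile 0 sees a stride of NUM_TILES lines, not the program's one-line stride.
    print(observed_strides(stream)[0])
```

With line interleaving and 16 tiles, every next-line prefetch in this toy run leaves the generating tile (128 of 128 requests), and tile 0 observes a 1024-byte stride for a stream whose true stride is 64 bytes. A different mapping (for example, page-interleaved) would change both numbers; the sketch only illustrates why decisions made locally at one slice have effects elsewhere in the system.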

Acknowledgments

This work has been partially supported by the Spanish Ministry of Science and Innovation (MCI) and FEDER funds of the EU under the contracts TIN201018368 and TIN201347245C22R, and the Generalitat of Catalunya under Grants 2009SGR1250 and 2013FIB100127.

Author information

Authors and Affiliations

Computer Architecture Department, UPC-BarcelonaTech, Barcelona, Spain
Martí Torrents
Oracle Labs, Oracle Corporation, Vancouver, BC, Canada
Raul Martínez
Computer Engineering Department, Universitat Rovira i Virgili, Tarragona, Spain
Carlos Molina

Corresponding author

Correspondence to Martí Torrents.

About this article

Cite this article

Torrents, M., Martínez, R. & Molina, C. Facing prefetching challenges in distributed shared memories for CMPs. J Supercomput 72, 1453–1476 (2016). https://doi.org/10.1007/s11227-016-1675-1

Issue Date: April 2016
DOI: https://doi.org/10.1007/s11227-016-1675-1
