
Facing prefetching challenges in distributed shared memories for CMPs

The Journal of Supercomputing

Abstract

Prefetch engines in distributed memory systems operate independently: each one analyzes only the memory accesses addressed to the cache slice it is attached to. The prefetch requests they generate may target any other tile in the system, depending on the computed address. This distributed behavior raises several challenges that do not arise when the cache is unified. In this paper, we identify, analyze, and quantify the effects of these challenges and suggest how to address them, paving the way for future research on implementing prefetching mechanisms at all levels of the cache hierarchy in systems with shared distributed caches.
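
To make the distributed behavior concrete, the following is a minimal sketch rather than the mechanism evaluated in the paper. It assumes a line-interleaved mapping of addresses to tiles, a 64-byte line, 16 tiles, and a simple next-line prefetcher. The names `home_tile`, `next_line_prefetch`, and `observed_by` are hypothetical and chosen only for illustration.

```python
# Illustrative model only: the mapping, the sizes, and the prefetcher are assumptions.
LINE_BYTES = 64   # assumed cache-line size
NUM_TILES = 16    # assumed number of tiles, each holding one slice of the shared cache


def home_tile(addr):
    """Tile whose cache slice owns the line containing addr (line-interleaved)."""
    return (addr // LINE_BYTES) % NUM_TILES


def next_line_prefetch(addr, degree=1):
    """Addresses a simple next-line prefetcher would request after accessing addr."""
    base = addr - addr % LINE_BYTES
    return [base + i * LINE_BYTES for i in range(1, degree + 1)]


def observed_by(tile, accesses):
    """Part of the global access stream that the engine attached to `tile` sees."""
    return [a for a in accesses if home_tile(a) == tile]


if __name__ == "__main__":
    # A core walks 32 consecutive cache lines.
    stream = [i * LINE_BYTES for i in range(32)]

    # The engine at tile 0 sees only the lines whose home is tile 0,
    # so the unit-stride pattern looks like a stride of NUM_TILES lines.
    print("tile 0 observes:", observed_by(0, stream))

    # The line it would prefetch next is homed on a different tile.
    target = next_line_prefetch(stream[0])[0]
    print("prefetch address", target, "is homed on tile", home_tile(target))
```

Under these assumptions, the engine attached to tile 0 sees only a fraction of a sequential stream, and the line it would prefetch next belongs to another tile's slice, so the request has to cross the on-chip network. With a different address mapping the details change, but each engine still sees only part of the access stream.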





Acknowledgments

This work has been partially supported by the Spanish Ministry of Science and Innovation (MCI) and FEDER funds of the EU under the contracts TIN201018368 and TIN201347245C22R, and the Generalitat of Catalunya under Grants 2009SGR1250 and 2013FIB100127.

Author information

Corresponding author

Correspondence to Martí Torrents.

About this article

Cite this article

Torrents, M., Martínez, R. & Molina, C. Facing prefetching challenges in distributed shared memories for CMPs. J Supercomput 72, 1453–1476 (2016). https://doi.org/10.1007/s11227-016-1675-1


  • DOI: https://doi.org/10.1007/s11227-016-1675-1
