The Performance Optimization of Threaded Prefetching for Linked Data Structures

International Journal of Parallel Programming

Abstract

Helper-threaded prefetching on chip multiprocessors (CMPs) is a well-known approach to reducing memory latency and has been explored for accesses to linked data structures. However, conventional helper-threaded prefetching often suffers from useless prefetches and cache thrashing, which limit its effectiveness. In this paper, we first analyze the shortcomings of conventional helper-threaded prefetching for linked data structures. We then propose an improved scheme, Skip Helper Threaded Prefetching, for hotspots with two-level data traversals. Our solution profiles the applications and balances delinquent loads between the main thread and the prefetching thread based on the characteristics of the operations in their hotspots. Evaluations show that the proposed solution improves average performance by 8.9% (-O2) and 8.5% (-O3) over conventional helper-threaded prefetching, which greedily prefetches all delinquent loads. We also compare our proposal with active threaded prefetching, which synchronizes with the main thread through a semaphore, and find that our proposal provides better performance for the targeted applications.
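To make the baseline concrete, the following is a minimal sketch of conventional (greedy) helper-threaded prefetching over a two-level linked structure. It is not the paper's Skip Helper Threaded Prefetching scheme, whose details the abstract does not give. The types `outer_t` and `inner_t`, the functions `helper_prefetch`, `main_traverse`, and `run_demo`, and the use of the GCC/Clang builtin `__builtin_prefetch` are all assumptions made for illustration.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical two-level structure: each outer node owns an inner list. */
typedef struct inner { struct inner *next; long value; } inner_t;
typedef struct outer { struct outer *next; inner_t *items; } outer_t;

/* Helper thread: walks the whole structure and only touches/prefetches nodes.
 * It greedily covers every load, with no synchronization with the main thread. */
static void *helper_prefetch(void *arg) {
    for (outer_t *o = arg; o != NULL; o = o->next) {
        __builtin_prefetch(o->next, 0, 1);          /* hint: next outer node */
        for (inner_t *i = o->items; i != NULL; i = i->next)
            __builtin_prefetch(i->next, 0, 1);      /* hint: next inner node */
    }
    return NULL;
}

/* Main thread: the actual computation over the same structure. */
static long main_traverse(const outer_t *head) {
    long sum = 0;
    for (const outer_t *o = head; o != NULL; o = o->next)
        for (const inner_t *i = o->items; i != NULL; i = i->next)
            sum += i->value;
    return sum;
}

/* Build n_outer outer nodes, each with n_inner inner nodes of value 1. */
static outer_t *build(size_t n_outer, size_t n_inner) {
    outer_t *head = NULL;
    for (size_t a = 0; a < n_outer; a++) {
        outer_t *o = malloc(sizeof *o);
        if (o == NULL) abort();
        o->items = NULL;
        for (size_t b = 0; b < n_inner; b++) {
            inner_t *i = malloc(sizeof *i);
            if (i == NULL) abort();
            i->value = 1;
            i->next = o->items;
            o->items = i;
        }
        o->next = head;
        head = o;
    }
    return head;
}

static void free_all(outer_t *head) {
    while (head != NULL) {
        outer_t *next_outer = head->next;
        for (inner_t *i = head->items; i != NULL; ) {
            inner_t *next_inner = i->next;
            free(i);
            i = next_inner;
        }
        free(head);
        head = next_outer;
    }
}

/* Run the main traversal alongside a helper thread and return the sum. */
long run_demo(size_t n_outer, size_t n_inner) {
    outer_t *head = build(n_outer, n_inner);
    pthread_t helper;
    int started = pthread_create(&helper, NULL, helper_prefetch, head) == 0;
    long sum = main_traverse(head);
    if (started) pthread_join(helper, NULL);   /* join before freeing nodes */
    free_all(head);
    return sum;
}
```

Both threads only read the structure, so no locking is needed in this sketch. Because the helper is unsynchronized, it may run far ahead of the main thread (evicting lines before they are used) or fall behind it (issuing useless prefetches), which are the shortcomings the abstract attributes to the conventional approach. Any speedup depends on the two threads sharing a cache level and is not guaranteed here.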

Author information

Corresponding author

Correspondence to Yan Huang.

About this article

Cite this article

Huang, Y., Tang, J., Gu, Zm. et al. The Performance Optimization of Threaded Prefetching for Linked Data Structures. Int J Parallel Prog 40, 141–163 (2012). https://doi.org/10.1007/s10766-011-0172-7

  • DOI: https://doi.org/10.1007/s10766-011-0172-7
