Abstract
Helper-threaded prefetching on chip multiprocessors (CMPs) is a well-known approach to reducing memory latency and has been explored for accesses to linked data structures. However, conventional helper-threaded prefetching often suffers from useless prefetches and cache thrashing, which limit its effectiveness. In this paper, we first analyze the shortcomings of conventional helper-threaded prefetching for linked data structures. We then propose an improved scheme, Skip Helper Threaded Prefetching, for hotspots with two-level data traversals. Our solution profiles the application and balances the delinquent loads between the main thread and the prefetching thread according to the characteristics of the operations in its hotspots. Evaluations show that the proposed solution improves average performance by 8.9% (-O2) and 8.5% (-O3) over conventional helper-threaded prefetching, which greedily prefetches all delinquent loads. We also compare our proposal with active threaded prefetching, which synchronizes with the main thread through a semaphore, and find that our proposal provides better performance for the targeted applications.
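The idea of balancing delinquent loads between the two threads can be illustrated with a toy model. The abstract does not spell out the mechanism, so the sketch below rests on an assumption: in a two-level traversal, the helper thread prefetches the inner traversal for a run of outer nodes and then skips a run, leaving the skipped nodes to be loaded on demand by the main thread. The function name `split_delinquent_loads` and the parameters `prefetch_run` and `skip_run` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def split_delinquent_loads(
    num_outer: int, prefetch_run: int, skip_run: int
) -> Tuple[List[int], List[int]]:
    """Partition outer-node indices between helper and main thread.

    Assumed policy (not taken from the paper): the helper prefetches the
    inner traversal of `prefetch_run` consecutive outer nodes, then skips
    `skip_run` outer nodes, and repeats. Skipped nodes are loaded on
    demand by the main thread.
    """
    if num_outer < 0 or prefetch_run < 1 or skip_run < 0:
        raise ValueError("need num_outer >= 0, prefetch_run >= 1, skip_run >= 0")

    period = prefetch_run + skip_run
    helper_nodes: List[int] = []
    main_nodes: List[int] = []
    for i in range(num_outer):
        # Position inside the repeating prefetch/skip pattern.
        if i % period < prefetch_run:
            helper_nodes.append(i)
        else:
            main_nodes.append(i)
    return helper_nodes, main_nodes


if __name__ == "__main__":
    # Alternate: helper covers even outer nodes, main thread covers odd ones.
    print(split_delinquent_loads(8, prefetch_run=1, skip_run=1))
    # skip_run=0 models the greedy scheme that prefetches every delinquent load.
    print(split_delinquent_loads(8, prefetch_run=1, skip_run=0))
```

In this model, `skip_run=0` corresponds to the conventional greedy helper thread, while a larger `skip_run` shifts more of the delinquent loads back to the main thread. How the ratio is chosen from profiling data is outside the scope of this sketch.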
About this article
Cite this article
Huang, Y., Tang, J., Gu, Zm. et al. The Performance Optimization of Threaded Prefetching for Linked Data Structures. Int J Parallel Prog 40, 141–163 (2012). https://doi.org/10.1007/s10766-011-0172-7
DOI: https://doi.org/10.1007/s10766-011-0172-7