Abstract
Helper-threaded prefetching on chip multiprocessors has been shown to reduce memory latency and improve overall system performance, and it has been explored for accesses to linked data structures. In our earlier work, we proposed a threaded prefetching technique that balances delinquent loads between the main thread and the helper thread to improve prefetching effectiveness. In this paper, we analyze the memory access characteristics of a specific application to estimate the effective prefetch distance range for that technique. We also examine the effect of hardware prefetchers on the estimate. We discuss key design issues of the proposed method and present preliminary experimental results. Our experimental evaluation indicates that the method can determine a bounded range of effective prefetch distances, and that the optimal prefetch distance can then be found within the estimated range with a few trial runs.
Cite this article
Huang, Y., Gu, ZM., Tang, J. et al. Estimating Effective Prefetch Distance in Threaded Prefetching for Linked Data Structures. Int J Parallel Prog 40, 465–487 (2012). https://doi.org/10.1007/s10766-012-0194-9
DOI: https://doi.org/10.1007/s10766-012-0194-9