Abstract
As the memory hierarchy becomes deeper and is shared by more processors, locality increasingly determines system performance. As a rigorous and precise locality model, reuse distance has been used in program optimization, performance prediction, memory disambiguation, and locality phase prediction. However, its high measurement cost has severely impeded its use in scenarios that require high efficiency, such as production compilers, performance debugging, and run-time optimization.
We recently discovered a statistical connection between time and reuse distance, which led to an efficient way to approximate reuse distance using time. However, some algorithmic and implementation techniques that are vital to the efficiency and scalability of the approximation model have not yet been described. This paper presents these techniques. It describes an algorithm that approximates reuse distance on arbitrary scales; it explains a portable scheme that employs the memory controller to accelerate the measurement of time distance; and it presents the algorithm and proof of a trace generator that can facilitate various locality studies.
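To make the two quantities concrete, the sketch below computes both for a small access trace. It is only an illustration of the definitions, not the approximation algorithm this paper describes: it measures reuse distance exactly and naively, in quadratic time, which is the kind of cost the approximation is meant to avoid. The function name `distances` and the convention that time distance is the difference between access indices are assumptions made for this example.

```python
def distances(trace):
    """Return (time_distances, reuse_distances), one entry per access.

    Assumed conventions for this sketch:
      - time distance: difference between the indices of an access and
        the previous access to the same element;
      - reuse distance: number of distinct elements accessed strictly
        between those two accesses.
    None marks a first access, which has no previous use.
    """
    last_seen = {}            # element -> index of its most recent access
    time_d, reuse_d = [], []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            prev = last_seen[addr]
            time_d.append(i - prev)
            # Naive exact count of distinct elements in between.
            reuse_d.append(len(set(trace[prev + 1:i])))
        else:
            time_d.append(None)
            reuse_d.append(None)
        last_seen[addr] = i
    return time_d, reuse_d


time_d, reuse_d = distances(list("abba"))
print(time_d)   # [None, None, 1, 3]
print(reuse_d)  # [None, None, 0, 1]
```

In the trace `a b b a`, the second access to `a` has time distance 3 but reuse distance 1, because the two intervening accesses touch the same element. Time distance needs only a timestamp per element, whereas exact reuse distance requires tracking the distinct elements in between.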
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shen, X., Shaw, J. (2008). Scalable Implementation of Efficient Locality Approximation. In: Amaral, J.N. (eds) Languages and Compilers for Parallel Computing. LCPC 2008. Lecture Notes in Computer Science, vol 5335. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89740-8_14
DOI: https://doi.org/10.1007/978-3-540-89740-8_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89739-2
Online ISBN: 978-3-540-89740-8
eBook Packages: Computer Science, Computer Science (R0)