DOI: 10.1145/1375634.1375648
research-article

Sampling-based program locality approximation

Published: 07 June 2008

Abstract

Reuse signature, or reuse distance pattern, is an accurate model of a program's memory access behavior. Many recent works have studied it and shown it to be effective in program analysis and optimization. However, the high overhead of measuring reuse distance limits the scope of its application. This paper explores the use of sampling in reuse signature collection to reduce that time overhead. We compare several sampling strategies and show that an enhanced form of systematic sampling, with uniform coverage of all distance ranges, can be used to extrapolate the reuse distance distribution. Based on that analysis, we present a novel sampling method with a measurement accuracy of more than 99%. It speeds up reuse signature collection by a factor of 7.5 on average, with a best observed speedup of 34. This is the first attempt to use sampling to measure reuse signatures. Experiments with a variety of programs and instrumentation tools show that sampling has great potential to make reuse signatures practical to use and to enable more optimization opportunities.
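To make the terms above concrete, here is a minimal sketch in Python. It is not the paper's method. It assumes the usual definition of reuse distance: the number of distinct data elements accessed between two consecutive accesses to the same element. It also assumes the reuse signature is a normalized histogram of these distances in power-of-two bins. `exact_reuse_signature` measures every access with an LRU stack. `sampled_reuse_signature` applies plain systematic sampling, measuring the distance only from every k-th access. All function names, the binning scheme and the parameter `k` are hypothetical. The enhanced sampling described in the abstract, with uniform coverage of all distance ranges, is not reproduced here.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def log2_bin(distance: int) -> int:
    """Hypothetical power-of-two binning: 0 -> 0, 1 -> 1, 2-3 -> 2, 4-7 -> 3, ..."""
    return distance.bit_length()


def normalize(hist: Counter) -> Dict[int, float]:
    """Turn bin counts into fractions so exact and sampled signatures compare directly."""
    total = sum(hist.values())
    return {b: n / total for b, n in sorted(hist.items())} if total else {}


def exact_reuse_signature(trace: Sequence[Hashable]) -> Dict[int, float]:
    """Full measurement with an LRU stack (stack processing).

    An element's depth in the stack (0 = most recently used) equals the number
    of distinct elements touched since its last access, i.e. its reuse distance.
    First-time accesses (infinite distance) are not counted.
    """
    stack: list = []
    hist: Counter = Counter()
    for x in trace:
        if x in stack:
            depth = stack.index(x)          # linear search: fine for a sketch only
            hist[log2_bin(depth)] += 1
            stack.pop(depth)
        stack.insert(0, x)                  # x becomes the most recently used element
    return normalize(hist)


def sampled_reuse_signature(trace: Sequence[Hashable], k: int) -> Dict[int, float]:
    """Plain systematic sampling (hypothetical helper): measure only every k-th access.

    From a sampled access at position i, count the distinct elements seen until
    the same element is accessed again (its forward reuse distance). Each reuse
    pair is the same one the exact method measures from its other end. Samples
    that are never reused before the trace ends are dropped.
    """
    hist: Counter = Counter()
    for i in range(0, len(trace), k):
        target, seen = trace[i], set()
        for y in trace[i + 1:]:
            if y == target:
                hist[log2_bin(len(seen))] += 1
                break
            seen.add(y)
    return normalize(hist)


if __name__ == "__main__":
    # Hypothetical trace: a cyclic sweep over 8 elements, then a sweep over 3 elements.
    trace = [i % 8 for i in range(64)] + [i % 3 for i in range(30)]
    print(exact_reuse_signature(trace))
    print(sampled_reuse_signature(trace, k=4))
```

Raising `k` cuts the number of forward scans, and therefore the cost, at the price of fewer samples. With a simple every-k-th rule, rarely occurring distance ranges may receive few or no samples. The abstract says the paper's enhanced scheme addresses this by covering all distance ranges uniformly. Practical tools also replace the linear LRU-stack search with tree-based structures.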

Cited By

  • (2023) DJXPerf: Identifying Memory Inefficiencies via Object-Centric Profiling for Java. Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization, pp. 81-94. DOI: 10.1145/3579990.3580010. Online publication date: 17-Feb-2023.
  • (2023) DroidPerf: Profiling Memory Objects on Android Devices. Proceedings of the 29th Annual International Conference on Mobile Computing and Networking, pp. 1-15. DOI: 10.1145/3570361.3592503. Online publication date: 2-Oct-2023.
  • (2022) Predicting reuse interval for optimized web caching. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1-15. DOI: 10.5555/3571885.3571999. Online publication date: 13-Nov-2022.

    Published In

    ISMM '08: Proceedings of the 7th international symposium on Memory management
    June 2008
    170 pages
    ISBN:9781605581347
    DOI:10.1145/1375634
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. profiling
    2. program locality
    3. reuse distance
    4. reuse signature
    5. sampling

    Conference

    ISMM '08

    Acceptance Rates

    Overall Acceptance Rate 72 of 156 submissions, 46%

    Article Metrics

    • Downloads (last 12 months): 8
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 24 Jan 2025

    Cited By (continued)

    • (2022) Poirot. Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, pp. 937-950. DOI: 10.1145/3548606.3560710. Online publication date: 7-Nov-2022.
    • (2022) MemSweeper: virtualizing cluster memory management for high memory utilization and isolation. Proceedings of the 2022 ACM SIGPLAN International Symposium on Memory Management, pp. 15-28. DOI: 10.1145/3520263.3534651. Online publication date: 14-Jun-2022.
    • (2022) Predicting Reuse Interval for Optimized Web Caching: An LSTM-Based Machine Learning Approach. SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-15. DOI: 10.1109/SC41404.2022.00091. Online publication date: Nov-2022.
    • (2022) STAFF: A Model for Structure Layout Optimization. 2022 7th International Conference on Computer and Communication Systems (ICCCS), pp. 115-122. DOI: 10.1109/ICCCS55155.2022.9846314. Online publication date: 22-Apr-2022.
    • (2022) Efficient Stack Distance Approximation Based on Workload Characteristics. IEEE Access 10, pp. 59792-59805. DOI: 10.1109/ACCESS.2022.3180327. Online publication date: 2022.
    • (2021) Measuring Cache Complexity Using Data Movement Distance (DMD). 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 417-419. DOI: 10.1109/IPDPSW52791.2021.00070. Online publication date: Jun-2021.
    • (2020) Efficient miss ratio curve computation for heterogeneous content popularity. Proceedings of the 2020 USENIX Conference on Usenix Annual Technical Conference, pp. 741-751. DOI: 10.5555/3489146.3489197. Online publication date: 15-Jul-2020.
