DOI: 10.1145/2338965.2336756

Multi-slicing: a compiler-supported parallel approach to data dependence profiling

Published: 15 July 2012

Abstract

Retrofitting existing software for the increasingly dominant multicore microprocessors has a strong appeal from the economic point of view. One of the key issues in such an effort is to fully understand the data dependences in the existing software. Unfortunately, current compilers have quite limited ability to analyze data dependences. Therefore, execution-driven data dependence profiling has gained significant interest because it can resolve memory access ambiguity exactly during program execution, which allows data dependences to be analyzed exactly. Although such dependence profiling is valid for specific inputs only, the insight it provides can be highly valuable to software engineers in their parallelization effort. On the other hand, dependence profiling itself can take tremendous memory and machine time. In this paper, we propose a novel dependence profiling method which, with the support of several new compiler and runtime techniques, partitions the profiling task into many independent slices, each requiring significantly less memory. Different slices can be profiled in parallel, producing subgraphs which are eventually combined automatically into the complete data dependence graph by the compiler. The slices can be extracted with different degrees of granularity. Experiments show that, for several well-known benchmark programs, our parallel scheme shortens the profiling time by a few orders of magnitude.
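
The idea in the abstract (split the profiling task into independent slices, profile each one separately, then merge the resulting subgraphs) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Access` record, the function names `profile_slice` and `multi_slice_profile`, and in particular the slicing criterion (address modulo the number of slices). The paper's slices are extracted with compiler support and may be defined quite differently.

```python
from collections import namedtuple

# Hypothetical record for one dynamic memory access:
# the static statement id, 'R' or 'W', and the address touched.
Access = namedtuple("Access", ["stmt", "kind", "addr"])


def profile_slice(trace, in_slice):
    """Profile only the addresses selected by in_slice.

    Returns a set of edges (source_stmt, sink_stmt, dep_type),
    where dep_type is "RAW", "WAR", or "WAW".
    """
    last_write = {}   # addr -> statement of the most recent write
    reads_since = {}  # addr -> statements that read since that write
    edges = set()
    for acc in trace:
        if not in_slice(acc.addr):
            continue  # this address belongs to another slice
        if acc.kind == "R":
            if acc.addr in last_write:
                edges.add((last_write[acc.addr], acc.stmt, "RAW"))
            reads_since.setdefault(acc.addr, set()).add(acc.stmt)
        else:
            for reader in reads_since.get(acc.addr, ()):
                edges.add((reader, acc.stmt, "WAR"))
            if acc.addr in last_write:
                edges.add((last_write[acc.addr], acc.stmt, "WAW"))
            last_write[acc.addr] = acc.stmt
            reads_since[acc.addr] = set()
    return edges


def multi_slice_profile(trace, num_slices):
    """Profile each slice on its own, then merge the subgraphs.

    Assumed slicing criterion: addr % num_slices. The slices share no
    state, so each call to profile_slice could run in its own process.
    """
    subgraphs = [
        profile_slice(trace, lambda addr, k=k: addr % num_slices == k)
        for k in range(num_slices)
    ]
    merged = set()
    for edges in subgraphs:
        merged |= edges
    return merged


trace = [
    Access("S1", "W", 0), Access("S2", "R", 0),
    Access("S3", "W", 1), Access("S4", "W", 1),
    Access("S5", "R", 1), Access("S6", "W", 0),
]
full_graph = profile_slice(trace, lambda addr: True)
sliced_graph = multi_slice_profile(trace, 2)
```

In this sketch each slice tracks only its own addresses, so its bookkeeping tables are smaller than those of a whole-program run, and because a dependence always relates two accesses to the same address, the union of the per-slice edge sets equals the graph from a single full run. The slices are profiled one after another here for simplicity; dispatching them to separate worker processes would be the parallel variant.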

Published In

ISSTA 2012: Proceedings of the 2012 International Symposium on Software Testing and Analysis
July 2012
341 pages
ISBN: 9781450314541
DOI: 10.1145/2338965
Publisher

Association for Computing Machinery, New York, NY, United States

Conference

ISSTA '12

Acceptance Rates

Overall Acceptance Rate: 58 of 213 submissions, 27%
