
Fast Data-Dependence Profiling by Skipping Repeatedly Executed Memory Operations

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9531)

Abstract

Nowadays, more and more program analysis tools adopt profiling approaches to obtain data dependences because of their ability to track dynamically allocated memory, pointers, and array indices. However, dependence profiling suffers from high time overhead. To lower the overhead, previous dependence-profiling techniques either exploit features of the specific program analyses they are designed for or run the profiling process in parallel. Although they reduce the time overhead of dependence profiling to some extent, none of them addresses the fundamental cause of the high overhead: memory operations that are repeatedly executed in loops. Most of the time, these memory operations lead to exactly the same data dependences. Nevertheless, a profiler has to profile all of them over and over again so as not to miss a single dependence that may occur only once. In this paper, we present a method that allows a dependence profiler to skip memory operations that are repeatedly executed in loops without missing any data dependence. Our method works with all types of loops and does not require any preprocessing, such as source annotation of the input code. Experimental results show that our method lowers the time overhead of data-dependence profiling by up to 52%.
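
The abstract describes the approach only at a high level. As a rough illustration of the general idea, the C++ sketch below shows a toy shadow-memory dependence profiler that stops instrumenting a memory operation once it has repeatedly produced no new dependence. Everything here is hypothetical: the class name ToyProfiler, the threshold kSkipThreshold, and the skip heuristic itself are not taken from the paper, and unlike the authors' method this simplified heuristic could in principle miss a dependence that appears only after an operation has been skipped.

// Minimal sketch only, not the paper's algorithm: a toy shadow-memory
// dependence profiler that skips a memory operation after it has produced
// no new dependence for several consecutive executions.
#include <cstdint>
#include <iostream>
#include <set>
#include <tuple>
#include <unordered_map>

struct Dependence {                      // RAW dependence: sink reads what source wrote
    int sinkLine, sourceLine;
    bool operator<(const Dependence& o) const {
        return std::tie(sinkLine, sourceLine) < std::tie(o.sinkLine, o.sourceLine);
    }
};

class ToyProfiler {
public:
    void onRead(int line, std::uintptr_t addr) {   // called for every executed load
        State& s = ops_[line];
        if (s.skipped) return;                     // repeatedly executed op: skip it
        auto it = lastWriter_.find(addr);
        if (it == lastWriter_.end()) return;       // no prior write, no RAW dependence
        bool isNew = deps_.insert({line, it->second}).second;
        if (isNew) s.sameCount = 0;                // new dependence: keep profiling
        else if (++s.sameCount >= kSkipThreshold) s.skipped = true;
    }

    void onWrite(int line, std::uintptr_t addr) {  // called for every executed store
        // WAR/WAW detection omitted for brevity; writes always update the
        // shadow memory so later reads still see the correct last writer.
        lastWriter_[addr] = line;
    }

    void report() const {
        for (const Dependence& d : deps_)
            std::cout << "RAW: line " << d.sinkLine
                      << " <- line " << d.sourceLine << '\n';
    }

private:
    struct State { int sameCount = 0; bool skipped = false; };
    static constexpr int kSkipThreshold = 8;             // hypothetical tuning knob
    std::unordered_map<std::uintptr_t, int> lastWriter_; // shadow memory: addr -> writer
    std::unordered_map<int, State> ops_;                 // per static memory operation
    std::set<Dependence> deps_;
};

int main() {
    ToyProfiler p;
    int a[64];
    for (int i = 0; i < 64; ++i) {                 // simulated loop body:
        p.onWrite(10, reinterpret_cast<std::uintptr_t>(&a[i]));  // "line 10" writes a[i]
        p.onRead(11, reinterpret_cast<std::uintptr_t>(&a[i]));   // "line 11" reads a[i]
    }
    p.report();                                    // prints: RAW: line 11 <- line 10
}

In this toy run, the read labeled "line 11" produces the same read-after-write dependence on every iteration, so the profiler soon stops consulting the shadow memory for it, while writes continue to keep the shadow memory up to date for any reads that are still being profiled.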



Author information

Correspondence to Zhen Li.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Li, Z., Beaumont, M., Jannesari, A., Wolf, F. (2015). Fast Data-Dependence Profiling by Skipping Repeatedly Executed Memory Operations. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol. 9531. Springer, Cham. https://doi.org/10.1007/978-3-319-27140-8_40

  • DOI: https://doi.org/10.1007/978-3-319-27140-8_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27139-2

  • Online ISBN: 978-3-319-27140-8

  • eBook Packages: Computer Science, Computer Science (R0)
