Abstract
Manual parallelization of programs is known to be difficult and error-prone, and there are currently few ways to measure how much potential parallelism the original sequential code contains.
We present an extension of Embla, a Valgrind-based dependence profiler that links dynamic dependences back to source code. The extended tool estimates the potential task-level parallelism in a sequential program and helps programmers exploit it at the source level. Using the popular fork-join model, it provides a realistic estimate of the speed-up achievable with frameworks such as Cilk, TBB or OpenMP 3.0. Estimates can be given for several parallelization models that vary in the programmer effort and the underlying implementation support they require. The tool also outputs source-level dependence information to aid the parallelization of programs with abundant inherent parallelism, and critical paths to suggest algorithmic rewrites of programs with little of it.
We validate these claims by running the tool on serial elisions of sample Cilk programs, where it finds inherent parallelism beyond what the Cilk code exploits, and on serial C benchmarks, where the profiling results suggest parallelism-enhancing algorithmic rewrites.
References
Aho, A.V., Sethi, R., Ullman, J.D.: Compilers: principles, techniques, and tools. Addison-Wesley Longman Publishing Co. Inc, Boston (1986)
Blume, B., Eigenmann, R., Faigin, K., Grout, J., Hoe, J., Padua, D., Petersen, P., Pottenger, B., Rauchwerger, L., Tu, P., Weatherford, S.: Polaris: The next generation in parallelizing compilers. In: Proc. Workshop on Languages and Compilers for Parallel Computing. Springer, Heidelberg (1994)
Blumofe, R., Joerg, C., Kuszmaul, B., Leiserson, C., Randall, K., Zhou, Y.: Cilk: An efficient multithreaded runtime system. Journal of Parallel and Distributed Computing 37(1), 55–69 (1996)
Dagum, L., Menon, R.: OpenMP: An industry-standard API for shared-memory programming. IEEE Comput. Sci. Eng. 5(1), 46–55 (1998)
Faxén, K.F., Popov, K., Jansson, S., Albertsson, L.: Embla—data dependence profiling for parallel programming. In: CISIS 2008: Proc. 2nd Int’l Conf. on Complex, Intelligent and Software Intensive Systems. IEEE, Los Alamitos (2008)
Guthaus, M.R., Ringenberg, J.S., Ernst, D., Austin, T.M., Mudge, T., Brown, R.B.: MiBench: A free, commercially representative embedded benchmark suite. In: WWC 2001: Proc. Int’l Workshop on Workload Characterization. IEEE, Los Alamitos (2001)
Henning, J.: SPEC CPU2000: measuring CPU performance in the new millennium. Computer 33(7), 28–35 (2000)
Kennedy, K., Allen, J.R.: Optimizing compilers for modern architectures: a dependence-based approach. Morgan Kaufmann Publishers Inc., San Francisco (2002)
KleinOsowski, A., Lilja, D.J.: MinneSPEC: A new SPEC benchmark workload for simulation-based computer architecture research. Comp. Arch. Letters 1 (2002)
Kreaseck, B., Tullsen, D., Calder, B.: Limits of task-based parallelism in irregular applications. In: Proc. Int’l Symp. on High Performance Computing (2000)
Larus, J.R.: Loop-level parallelism in numeric and symbolic programs. IEEE Trans. Parallel Distrib. Syst. 4(7), 812–826 (1993)
Lea, D.: A Java fork/join framework. In: Proc. ACM Conf. on Java Grande (2000)
Leijen, D., Hall, J.: Parallel performance: Optimize managed code for multi-core machines. MSDN Magazine (October 2007)
Leiserson, C.E.: The Cilk++ concurrency platform. In: DAC 2009: Proc. 46th Annual Design Automation Conference. ACM, New York (2009)
Liu, W., Tuck, J., Ceze, L., Ahn, W., Strauss, K., Renau, J., Torrellas, J.: POSH: a TLS compiler that exploits program structure. In: PPoPP 2006: Proc. 11th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (2006)
Mellor-Crummey, J.: On-the-fly detection of data races for programs with nested fork-join parallelism. In: Proc. of Supercomputing 1991, pp. 24–33. ACM Press, New York (1991)
Nethercote, N., Seward, J.: Valgrind: a framework for heavyweight dynamic binary instrumentation. SIGPLAN Not. 42(6), 89–100 (2007)
Nguyen, H., Taura, K., Yonezawa, A.: Parallelizing programs using access traces. In: LCR 2002: Proc. 6th Int’l Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers (2002)
Oplinger, J.T., Heine, D.L., Lam, M.S.: In search of speculative thread-level parallelism. In: Proc. 1999 Int’l Conf. on Parallel Architectures and Compilation Techniques. IEEE, Los Alamitos (1999)
Perez, J., Badia, R., Labarta, J.: A dependency-aware task-based programming environment for multi-core architectures. In: 2008 IEEE Int’l Conf. on Cluster Computing (2008)
Reinders, J.: Intel threading building blocks: outfitting C++ for multi-core processor parallelism. O’Reilly Media, Inc., Sebastopol (2007)
Savage, S., Burrows, M., Nelson, G., Sobalvarro, P., Anderson, T.: Eraser: a dynamic data race detector for multithreaded programs. ACM Transactions on Computer Systems 15(4), 391–411 (1997)
Steffan, J.G., Colohan, C., Zhai, A., Mowry, T.C.: The STAMPede approach to thread-level speculation. ACM Trans. Comput. Syst. 23(3), 253–300 (2005)
Tournavitis, G., Wang, Z., Franke, B., O’Boyle, M.F.: Towards a holistic approach to auto-parallelization: integrating profile-driven parallelism detection and machine-learning based mapping. In: PLDI 2009: Proc. 2009 ACM SIGPLAN Conf. on Programming Language Design and Implementation (2009)
Wall, D.W.: Limits of instruction-level parallelism. In: Proc. 4th Int’l Conf. on Architectural Support for Programming Languages and Operating System. ACM, New York (1991)
Warg, F., Stenström, P.: Limits on speculative module-level parallelism in imperative and object-oriented programs on CMP platforms. In: PACT 2001: Proc. 2001 Int’l Conf. on Parallel Architectures and Compilation Techniques. IEEE, Los Alamitos (2001)
Wu, P., Kejariwal, A., Caşcaval, C.: Compiler-driven dependence profiling to guide program parallelization. In: Amaral, J.N. (ed.) LCPC 2008. LNCS, vol. 5335, pp. 232–248. Springer, Heidelberg (2008)
Zhang, X., Navabi, A., Jagannathan, S.: Alchemist: A transparent dependence distance profiling infrastructure. In: CGO 2009: Proc. 2009 Int’l Symp. on Code Generation and Optimization. IEEE, Los Alamitos (2009)
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Mak, J., Faxén, K.F., Janson, S., Mycroft, A. (2010). Estimating and Exploiting Potential Parallelism by Source-Level Dependence Profiling. In: D’Ambra, P., Guarracino, M., Talia, D. (eds) Euro-Par 2010 - Parallel Processing. Euro-Par 2010. Lecture Notes in Computer Science, vol 6271. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15277-1_4
Print ISBN: 978-3-642-15276-4
Online ISBN: 978-3-642-15277-1