skip to main content
10.1145/2134243.2134253acmconferencesArticle/Chapter ViewAbstractPublication PagesicseConference Proceedingsconference-collections
research-article

Limits of parallelism using dynamic dependency graphs

Published: 20 July 2009 Publication History

Abstract

The advance of multi-core processors has led to renewed interest in extracting parallelism from programs. It is sometimes useful to know how much parallelism is exploitable in the limit for general programs, to put into perspective the speedups of various parallelisation techniques. Wall's study [19] was one of the first to examine limits of parallelism in detail. We present an extension of Wall's analysis of limits of parallelism, by constructing Dynamic Dependency Graphs from execution traces of a number of benchmark programs, allowing us better visualisation of the types of dependencies which limit parallelism, as well as flexibility in transforming graphs when exploring possible optimisations. Some of the results of Wall and subsequent studies are confirmed, including the fact that average available parallelism is often above 100, but requires effective measures to resolve control dependencies, as well as memory renaming. We also study how certain compiler artifacts affect the limits of parallelism. In particular we show that the use of a spaghetti stack, as a technique to implicitly rename stack memory and break chains on true dependencies on the stack pointer, can lead to a doubling of potential parallelism.

References

[1]
A. W. Appel and Z. Shao. An empirical and analytic study of stack vs. heap cost for languages with closures. Technical Report CS-TR-450-94, 1994.
[2]
T. M. Austin and G. S. Sohi. Dynamic dependency analysis of ordinary programs. In Nineteenth International Symposium on Computer Architecture, pages 342--351, Gold Coast, Australia, 1992. ACM and IEEE Computer Society.
[3]
F. Bellard. Qemu, a fast and portable dynamic translator. In ATEC '05: Proceedings of the annual conference on USENIX Annual Technical Conference, pages 41--41, Berkeley, CA, USA, 2005. USENIX Association.
[4]
B. Blume, R. Eigenmann, K. Faigin, J. Grout, J. Hoe, D. Padua, P. Petersen, B. Pottenger, L. Rauchwerger, P. Tu, and S. Weatherford. Polaris: The next generation in parallelizing compilers. In Proceedings of the Workshop on Languages and Compilers for Parallel Computing, pages 10--1. Springer-Verlag, Berlin/Heidelberg, 1994.
[5]
H. J. Curnow and B. A. Wichmann. A synthetic benchmark. Computer Journal, 19(1):43--49, 1976.
[6]
J. Ferrante, K. J. Ottenstein, and J. D. Warren. The program dependence graph and its use in optimization. ACM Trans. Program. Lang. Syst., 9(3):319--349, 1987.
[7]
S. C. Goldstein, K. E. Schauser, and D. E. Culler. Lazy threads: Implementing a fast parallel call. Journal of Parallel and Distributed Computing, 37(1):5--20, 1996.
[8]
M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown. Mibench: A free, commercially representative embedded benchmark suite. In WWC '01: Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop, pages 3--14, Washington, DC, USA, 2001. IEEE Computer Society.
[9]
K. Kennedy and J. R. Allen. Optimizing compilers for modern architectures: a dependence-based approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2002.
[10]
M. S. Lam and R. P. Wilson. Limits of control flow on parallelism. In Nineteenth International Symposium on Computer Architecture, pages 46--57, Gold Coast, Australia, 1992. ACM and IEEE Computer Society.
[11]
M. H. Lipasti, C. B. Wilkerson, and J. P. Shen. Value locality and load value prediction. SIGOPS Oper. Syst. Rev., 30(5):138--147, 1996.
[12]
G. Ottoni and D. I. August. Global multi-threaded instruction scheduling. In Proceedings of the 40th annual IEEE/ACM International Symposium on Microarchitecture, pages 56--68, 2007.
[13]
G. Ottoni, R. Rangan, A. Stoler, and D. I. August. Automatic thread extraction with decoupled software pipelining. In MICRO 38: Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, pages 105--118, Washington, DC, USA, 2005. IEEE Computer Society.
[14]
M. A. Postiff, D. A. Greene, G. S. Tyson, and T. N. Mudge. The limits of instruction level parallelism in SPEC95 applications. Computer Architecture News, 217(1):31--34, 1999.
[15]
J. E. Smith. A study of branch prediction strategies. In ISCA '98: 25 years of the international symposia on Computer architecture (selected papers), pages 202--215, New York, NY, USA, 1998. ACM.
[16]
D. Stefanović and M. Martonosi. Limits and graph structure of available instruction-level parallelism (research note). Lecture Notes in Computer Science, 1900:1018--1022, 2001.
[17]
J. G. Steffan, C. B. Colohan, A. Zhai, and T. C. Mowry. A scalable approach to thread-level speculation. SIGARCH Comput. Archit. News, 28(2):1--12, 2000.
[18]
H. Sutter. A fundamental turn toward concurrency in software. Dr. Dobb's Journal, 30(3):16--20, March 2005.
[19]
D. W. Wall. Limits of instruction-level parallelism. In Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating System (ASPLOS), volume 26, pages 176--189, New York, NY, 1991. ACM Press.
[20]
R. P. Weicker. Dhrystone: a synthetic systems programming benchmark. Commun. ACM, 27(10):1013--1030, 1984.

Cited By

View all
  • (2019)Improving Digital Circuit Simulation with Batch-Parallel Logic Evaluation2019 22nd Euromicro Conference on Digital System Design (DSD)10.1109/DSD.2019.00031(144-151)Online publication date: Aug-2019
  • (2016)Parallel Performance Problems on Shared-Memory Multicore SystemsIEEE Transactions on Software Engineering10.1109/TSE.2016.251934642:8(764-785)Online publication date: 1-Aug-2016
  • (2015)Value State Flow GraphACM Transactions on Reconfigurable Technology and Systems10.1145/28077029:2(1-22)Online publication date: 4-Dec-2015
  • Show More Cited By

Index Terms

  1. Limits of parallelism using dynamic dependency graphs

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    WODA '09: Proceedings of the Seventh International Workshop on Dynamic Analysis
    July 2009
    52 pages
    ISBN:9781605586564
    DOI:10.1145/2134243
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    In-Cooperation

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 20 July 2009

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. dynamic dependency graph
    2. limits of parallelism
    3. spaghetti stack

    Qualifiers

    • Research-article

    Conference

    ISSTA '09
    Sponsor:

    Upcoming Conference

    ICSE 2025

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)19
    • Downloads (Last 6 weeks)2
    Reflects downloads up to 16 Feb 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2019)Improving Digital Circuit Simulation with Batch-Parallel Logic Evaluation2019 22nd Euromicro Conference on Digital System Design (DSD)10.1109/DSD.2019.00031(144-151)Online publication date: Aug-2019
    • (2016)Parallel Performance Problems on Shared-Memory Multicore SystemsIEEE Transactions on Software Engineering10.1109/TSE.2016.251934642:8(764-785)Online publication date: 1-Aug-2016
    • (2015)Value State Flow GraphACM Transactions on Reconfigurable Technology and Systems10.1145/28077029:2(1-22)Online publication date: 4-Dec-2015
    • (2014)A New Dataflow Compiler IR for Accelerating Control-Intensive Code in Spatial HardwareProceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops10.1109/IPDPSW.2014.18(122-131)Online publication date: 19-May-2014
    • (2014)Detecting SIMDization Opportunities through Static/Dynamic Dependence AnalysisEuro-Par 2013: Parallel Processing Workshops10.1007/978-3-642-54420-0_62(637-646)Online publication date: 2014
    • (2013)Beyond reuse distance analysisACM Transactions on Architecture and Code Optimization10.1145/2541228.255530910:4(1-29)Online publication date: 1-Dec-2013
    • (2013)A Multiple-ISA Reconfigurable ArchitectureProceedings of the 2013 III Brazilian Symposium on Computing Systems Engineering10.1109/SBESC.2013.23(71-76)Online publication date: 4-Nov-2013
    • (2013)Spatial computing architecture using randomness of memory cell stability under voltage control2013 European Conference on Circuit Theory and Design (ECCTD)10.1109/ECCTD.2013.6662276(1-4)Online publication date: Sep-2013
    • (2012)Mixing static and dynamic strategies for high performance and low area reconfigurable systemsInternational Journal of High Performance Systems Architecture10.1504/IJHPSA.2012.0475674:1(13-24)Online publication date: 1-Jun-2012
    • (2012)Dynamic trace-based analysis of vectorization potential of applicationsACM SIGPLAN Notices10.1145/2345156.225410847:6(371-382)Online publication date: 11-Jun-2012
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media