research-article

Limits of parallelism using dynamic dependency graphs

Authors:

Alan Mycroft

WODA '09: Proceedings of the Seventh International Workshop on Dynamic Analysis

Pages 42 - 48

https://doi.org/10.1145/2134243.2134253

Published: 20 July 2009

Abstract

The advance of multi-core processors has led to renewed interest in extracting parallelism from programs. It is sometimes useful to know how much parallelism is exploitable in the limit for general programs, to put into perspective the speedups of various parallelisation techniques. Wall's study [19] was one of the first to examine limits of parallelism in detail. We present an extension of Wall's analysis of limits of parallelism by constructing Dynamic Dependency Graphs from execution traces of a number of benchmark programs. These graphs allow better visualisation of the types of dependencies that limit parallelism, as well as flexibility in transforming graphs when exploring possible optimisations. Some of the results of Wall and subsequent studies are confirmed, including the fact that average available parallelism is often above 100, but that reaching it requires effective measures to resolve control dependencies, as well as memory renaming. We also study how certain compiler artifacts affect the limits of parallelism. In particular, we show that the use of a spaghetti stack, as a technique to implicitly rename stack memory and break chains of true dependencies on the stack pointer, can lead to a doubling of potential parallelism.
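
The limit-study idea in the abstract can be illustrated with a small sketch. Under the usual oracle assumptions of such studies (perfect renaming of registers and memory, so only true read-after-write dependencies remain; perfect control prediction; unit latency), average available parallelism is the number of dynamic instructions divided by the length of the longest path through the Dynamic Dependency Graph. The Python below is a hypothetical illustration, not the paper's tool: the `Instr` trace format, `build_ddg`, `available_parallelism` and the toy `make_trace` workload are all invented here. The `spaghetti` flag crudely models a spaghetti stack by making each frame push and pop independent of the previous stack-pointer value; it mimics the effect the abstract describes (breaking the chain of true dependencies on the stack pointer), not how a spaghetti stack is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    """One dynamic instruction in a hypothetical execution-trace format."""
    reads: list = field(default_factory=list)   # locations read: register names or memory cells
    writes: list = field(default_factory=list)  # locations written


def build_ddg(trace):
    """Build a DDG as predecessor lists: preds[i] holds the earlier
    instructions that instruction i truly (read-after-write) depends on.
    Write-after-read and write-after-write conflicts are ignored, which
    models perfect register and memory renaming."""
    last_writer = {}  # location -> index of the most recent instruction writing it
    preds = []
    for i, ins in enumerate(trace):
        preds.append(sorted({last_writer[loc] for loc in ins.reads if loc in last_writer}))
        for loc in ins.writes:  # update after reading, so there are no self-edges
            last_writer[loc] = i
    return preds


def available_parallelism(preds):
    """Return (instructions, critical-path length, average parallelism),
    assuming unit latency and perfect control prediction."""
    level = []
    for ps in preds:  # trace order is already a topological order of the DDG
        level.append(1 + max((level[p] for p in ps), default=0))
    depth = max(level, default=0)
    n = len(preds)
    return n, depth, (n / depth if depth else 0.0)


def make_trace(calls, spaghetti=False):
    """Hypothetical toy trace: `calls` independent calls, each pushing a frame,
    spilling and reloading an argument, doing one operation, popping the frame."""
    sp_dep = [] if spaghetti else ["sp"]  # crude spaghetti model: push/pop ignore the old sp
    trace = []
    for k in range(calls):
        trace += [
            Instr(reads=list(sp_dep), writes=["sp"]),            # push frame
            Instr(reads=["sp", f"a{k}"], writes=["mem:slot"]),   # spill argument to the frame
            Instr(reads=["sp", "mem:slot"], writes=[f"t{k}"]),   # reload it
            Instr(reads=[f"t{k}"], writes=[f"t{k}"]),            # some work on it
            Instr(reads=list(sp_dep), writes=["sp"]),            # pop frame
        ]
    return trace


if __name__ == "__main__":
    print(available_parallelism(build_ddg(make_trace(4))))                  # (20, 10, 2.0)
    print(available_parallelism(build_ddg(make_trace(4, spaghetti=True))))  # (20, 4, 5.0)
```

On this toy workload of four calls, the contiguous-stack trace serialises the calls through `sp` (critical path 10, parallelism 2.0), while the spaghetti variant leaves them independent (critical path 4, parallelism 5.0). The gap grows with the number of calls, so the ratio here says nothing about the doubling measured in the paper; it only shows how a single chained register can dominate the critical path of a DDG.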

References

[1]

A. W. Appel and Z. Shao. An empirical and analytic study of stack vs. heap cost for languages with closures. Technical Report CS-TR-450-94, 1994.

[2]

T. M. Austin and G. S. Sohi. Dynamic dependency analysis of ordinary programs. In Nineteenth International Symposium on Computer Architecture, pages 342--351, Gold Coast, Australia, 1992. ACM and IEEE Computer Society.

[3]

F. Bellard. QEMU, a fast and portable dynamic translator. In ATEC '05: Proceedings of the annual conference on USENIX Annual Technical Conference, pages 41--41, Berkeley, CA, USA, 2005. USENIX Association.

[4]

B. Blume, R. Eigenmann, K. Faigin, J. Grout, J. Hoe, D. Padua, P. Petersen, B. Pottenger, L. Rauchwerger, P. Tu, and S. Weatherford. Polaris: The next generation in parallelizing compilers. In Proceedings of the Workshop on Languages and Compilers for Parallel Computing, pages 10--1. Springer-Verlag, Berlin/Heidelberg, 1994.

[5]

H. J. Curnow and B. A. Wichmann. A synthetic benchmark. Computer Journal, 19(1):43--49, 1976.

[6]

J. Ferrante, K. J. Ottenstein, and J. D. Warren. The program dependence graph and its use in optimization. ACM Trans. Program. Lang. Syst., 9(3):319--349, 1987.

[7]

S. C. Goldstein, K. E. Schauser, and D. E. Culler. Lazy threads: Implementing a fast parallel call. Journal of Parallel and Distributed Computing, 37(1):5--20, 1996.

[8]

M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown. MiBench: A free, commercially representative embedded benchmark suite. In WWC '01: Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop, pages 3--14, Washington, DC, USA, 2001. IEEE Computer Society.

[9]

K. Kennedy and J. R. Allen. Optimizing compilers for modern architectures: a dependence-based approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2002.

[10]

M. S. Lam and R. P. Wilson. Limits of control flow on parallelism. In Nineteenth International Symposium on Computer Architecture, pages 46--57, Gold Coast, Australia, 1992. ACM and IEEE Computer Society.

[11]

M. H. Lipasti, C. B. Wilkerson, and J. P. Shen. Value locality and load value prediction. SIGOPS Oper. Syst. Rev., 30(5):138--147, 1996.

[12]

G. Ottoni and D. I. August. Global multi-threaded instruction scheduling. In Proceedings of the 40th annual IEEE/ACM International Symposium on Microarchitecture, pages 56--68, 2007.

[13]

G. Ottoni, R. Rangan, A. Stoler, and D. I. August. Automatic thread extraction with decoupled software pipelining. In MICRO 38: Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, pages 105--118, Washington, DC, USA, 2005. IEEE Computer Society.

[14]

M. A. Postiff, D. A. Greene, G. S. Tyson, and T. N. Mudge. The limits of instruction level parallelism in SPEC95 applications. Computer Architecture News, 27(1):31--34, 1999.

[15]

J. E. Smith. A study of branch prediction strategies. In ISCA '98: 25 years of the international symposia on Computer architecture (selected papers), pages 202--215, New York, NY, USA, 1998. ACM.

[16]

D. Stefanović and M. Martonosi. Limits and graph structure of available instruction-level parallelism (research note). Lecture Notes in Computer Science, 1900:1018--1022, 2001.

[17]

J. G. Steffan, C. B. Colohan, A. Zhai, and T. C. Mowry. A scalable approach to thread-level speculation. SIGARCH Comput. Archit. News, 28(2):1--12, 2000.

[18]

H. Sutter. A fundamental turn toward concurrency in software. Dr. Dobb's Journal, 30(3):16--20, March 2005.

[19]

D. W. Wall. Limits of instruction-level parallelism. In Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), volume 26, pages 176--189, New York, NY, 1991. ACM Press.

[20]

R. P. Weicker. Dhrystone: a synthetic systems programming benchmark. Commun. ACM, 27(10):1013--1030, 1984.


Index Terms

1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published In

WODA '09: Proceedings of the Seventh International Workshop on Dynamic Analysis

July 2009

52 pages

ISBN:9781605586564

DOI:10.1145/2134243

Program Chairs:
Ben Liblit, University of Wisconsin-Madison
Andy Podgurski, Case Western Reserve University

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

In-Cooperation

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ISSTA '09

Sponsor:

SIGSOFT

ISSTA '09: International Symposium on Software Testing and Analysis

July 20, 2009

Chicago, Illinois

Bibliometrics

Article Metrics

Total Citations: 15
Total Downloads: 240
Downloads (last 12 months): 19
Downloads (last 6 weeks): 2

Reflects downloads up to 16 Feb 2025

Citations

Cited By

Patrou M, Legault J, Graham A, Kent K (2019). Improving Digital Circuit Simulation with Batch-Parallel Logic Evaluation. 2019 22nd Euromicro Conference on Digital System Design (DSD), pp. 144-151. Online publication date: Aug-2019.
https://doi.org/10.1109/DSD.2019.00031
Atachiants R, Doherty G, Gregg D (2016). Parallel Performance Problems on Shared-Memory Multicore Systems. IEEE Transactions on Software Engineering, 42(8):764-785. Online publication date: 1-Aug-2016.
https://dl.acm.org/doi/10.1109/TSE.2016.2519346
Zaidi A, Greaves D (2015). Value State Flow Graph. ACM Transactions on Reconfigurable Technology and Systems, 9(2):1-22. Online publication date: 4-Dec-2015.
https://dl.acm.org/doi/10.1145/2807702
Zaidi A, Greaves D (2014). A New Dataflow Compiler IR for Accelerating Control-Intensive Code in Spatial Hardware. Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, pp. 122-131. Online publication date: 19-May-2014.
https://dl.acm.org/doi/10.1109/IPDPSW.2014.18
Aumage O, Barthou D, Haine C, Meunier T (2014). Detecting SIMDization Opportunities through Static/Dynamic Dependence Analysis. Euro-Par 2013: Parallel Processing Workshops, pp. 637-646. Online publication date: 2014.
https://doi.org/10.1007/978-3-642-54420-0_62
Fauzia N, Elango V, Ravishankar M, Ramanujam J, Rastello F, Rountev A, Pouchet L, Sadayappan P (2013). Beyond reuse distance analysis. ACM Transactions on Architecture and Code Optimization, 10(4):1-29. Online publication date: 1-Dec-2013.
https://dl.acm.org/doi/10.1145/2541228.2555309
Capella F, Brandalero M, Junior J, Beck A, Carro L (2013). A Multiple-ISA Reconfigurable Architecture. Proceedings of the 2013 III Brazilian Symposium on Computing Systems Engineering, pp. 71-76. Online publication date: 4-Nov-2013.
https://dl.acm.org/doi/10.1109/SBESC.2013.23
Yoshimura C, Yamaoka M, Aoki H, Mizuno H (2013). Spatial computing architecture using randomness of memory cell stability under voltage control. 2013 European Conference on Circuit Theory and Design (ECCTD), pp. 1-4. Online publication date: Sep-2013.
https://doi.org/10.1109/ECCTD.2013.6662276
Rutzig M, Beck A (2012). Mixing static and dynamic strategies for high performance and low area reconfigurable systems. International Journal of High Performance Systems Architecture, 4(1):13-24. Online publication date: 1-Jun-2012.
https://dl.acm.org/doi/10.1504/IJHPSA.2012.047567
Holewinski J, Ramamurthi R, Ravishankar M, Fauzia N, Pouchet L, Rountev A, Sadayappan P (2012). Dynamic trace-based analysis of vectorization potential of applications. ACM SIGPLAN Notices, 47(6):371-382. Online publication date: 11-Jun-2012.
https://dl.acm.org/doi/10.1145/2345156.2254108