DOI: 10.1145/1989493.1989506

Parallelism and data movement characterization of contemporary application classes

Published: 04 June 2011

Abstract

This paper presents a framework for characterizing the distribution of fine-grained parallelism, data movement, and communication-minimizing code partitions. Understanding the spectrum of parallelism available in applications, and how much data movement might result if that parallelism is exploited, is essential to the hardware design process, because these properties will limit the performance scaling of future computing systems. The framework is applied to 26 applications and kernels, classified according to their dominant components in the Berkeley dwarf/computational motif classification.
The distributions of instruction-level parallelism (ILP) and thread-level parallelism (TLP) over execution time are studied, and it is shown that, although mean ILP is high, the available ILP is significantly lower for most of the execution. The results from this framework are complemented by hardware performance counter data from two RISC platforms (IBM Power7 and Freescale P2020) and one CISC platform (Intel Atom D510), spanning a broad range of real machine characteristics. Employing a combination of these new techniques, and building on previous proposals, it is demonstrated that available ideal-case parallelism and data movement show only limited similarity within and across the dwarf classes.
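
The ILP characterization summarized above is in the spirit of the idealized dynamic-dependence limit studies the paper builds on (e.g., Austin and Sohi [3]; Wall [29]): per-window ideal ILP can be estimated as the number of instructions in a window divided by the length of its dataflow critical path. The sketch below illustrates that general technique only; the trace format ((destination, sources) tuples), the window size, unit latency, ideal renaming, and independently scored windows are simplifying assumptions for illustration, not the authors' actual framework.

from collections import defaultdict

def ilp_distribution(trace, window=10000):
    """Per-window ideal ILP: instructions executed / dataflow critical path.

    trace: iterable of (dest, srcs) pairs, where dest is the location an
    instruction writes and srcs are the locations it reads. Only true (RAW)
    dependences constrain issue, i.e. ideal renaming and perfect branch
    prediction are assumed; every instruction has unit latency.
    """
    ilps = []
    ready = defaultdict(int)          # location -> cycle its latest producer completes
    count, critical_path = 0, 0
    for dest, srcs in trace:
        issue = max((ready[s] for s in srcs), default=0)   # wait for all inputs
        finish = issue + 1
        ready[dest] = finish
        critical_path = max(critical_path, finish)
        count += 1
        if count == window:           # close this window; windows are scored independently
            ilps.append(count / critical_path)
            ready.clear()
            count, critical_path = 0, 0
    if count:
        ilps.append(count / critical_path)
    return ilps

For example, ilp_distribution([('r1', []), ('r2', ['r1']), ('r3', ['r1'])], window=3) returns [1.5]: three instructions complete along a two-cycle critical path.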

References

[1]
G. M. Amdahl. Validity of the single processor approach to achieving large scale computing capabilities. In Proceedings of the April 18-20, 1967, spring joint computer conference, AFIPS '67 (Spring), pages 483--485. ACM, 1967.
[2]
K. Asanovic, R. Bodik, B. C. Catanzaro, J. J. Gebis, P. Husbands, K. Keutzer, D. A. Patterson, W. L. Plishker, J. Shalf, S. W. Williams, and K. A. Yelick. The landscape of parallel computing research: a view from Berkeley. Technical report, University of California at Berkeley, December 2006.
[3]
T. M. Austin and G. S. Sohi. Dynamic dependency analysis of ordinary programs. SIGARCH Comput. Archit. News, 20(2):342--351, 1992.
[4]
S. E. Breach, T. N. Vijaykumar, and G. S. Sohi. Multiscalar processors. In ISCA '95: Proceedings of the 22nd Annual International Symposium on Computer Architecture, pages 414--425, Los Alamitos, CA, USA, 1995.
[5]
M. Bridges, N. Vachharajani, Y. Zhang, T. Jablin, and D. August. Revisiting the sequential programming model for multi-core. In MICRO 40: Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, pages 69--84, Washington, DC, USA, 2007.
[6]
J. A. Brown and D. M. Tullsen. The shared-thread multiprocessor. In ICS '08: Proceedings of the 22nd annual international conference on Supercomputing, pages 73--82. ACM, 2008.
[7]
D. Burger and T. M. Austin. The SimpleScalar tool set, version 2.0. SIGARCH Comput. Archit. News, 25(3):13--25, 1997.
[8]
D. Burger, J. R. Goodman, and A. Kagi. Memory bandwidth limitations of future microprocessors. In Proceedings of the 23rd annual international symposium on Computer architecture, ISCA '96, pages 78--89. ACM, 1996.
[9]
T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to algorithms, third edition, 2009.
[10]
J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Commun. ACM, 51(1):107--113, 2008.
[11]
M. J. Flynn. Toward more efficient computer organizations. In AFIPS '72 (Spring): Proceedings of the May 16-18, 1972, spring joint computer conference, pages 1211--1217. ACM, 1972.
[12]
M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown. MiBench: A free, commercially representative embedded benchmark suite. In WWC '01: Proceedings of the IEEE International Workshop on Workload Characterization, pages 3--14, Washington, DC, USA, 2001. IEEE Computer Society.
[13]
Y. He, C. E. Leiserson, and W. M. Leiserson. The Cilkview scalability analyzer. In Proceedings of the 22nd ACM symposium on Parallelism in algorithms and architectures, SPAA '10, pages 145--156. ACM, 2010.
[14]
J. L. Hennessy and D. A. Patterson. Computer architecture: a quantitative approach. Morgan Kaufmann Publishers Inc., 2002.
[15]
M. Iyer, C. Ashok, J. Stone, N. Vachharajani, D. A. Connors, and M. Vachharajani. Finding parallelism for future EPIC machines. In Proceedings of the 4th Workshop on Explicitly Parallel Instruction Computing Techniques, 2005.
[16]
R. Kalla, B. Sinharoy, W. J. Starke, and M. Floyd. Power7: IBM's next-generation server processor. IEEE Micro, 30:7--15, 2010.
[17]
K. Kennedy and U. Kremer. Automatic data layout for distributed-memory machines. ACM Trans. Program. Lang. Syst., 20(4):869--916, 1998.
[18]
M. S. Lam and R. P. Wilson. Limits of control flow on parallelism. SIGARCH Comput. Archit. News, 20(2):46--57, 1992.
[19]
A. Nakajima, R. Kobayashi, H. Ando, and T. Shimada. Limits of thread-level parallelism in non-numerical programs. In IPSJ Transactions on Advanced Computing Systems, pages 12--20, 2006.
[20]
G. Ottoni, R. Rangan, A. Stoler, and D. I. August. Automatic thread extraction with decoupled software pipelining. In MICRO 38: Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, pages 105--118, Washington, DC, USA, 2005.
[21]
M. A. Postiff, D. A. Greene, G. S. Tyson, and T. N. Mudge. The limits of instruction level parallelism in SPEC95 applications. SIGARCH Comput. Archit. News, 27(1):31--34, 1999.
[22]
C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, and C. Kozyrakis. Evaluating MapReduce for multi-core and multiprocessor systems. In HPCA '07: Proceedings of the 13th International Symposium on High-Performance Computer Architecture, pages 13--24, 2007.
[23]
E. Riseman and C. Foster. The inhibition of potential parallelism by conditional jumps. IEEE Transactions on Computers, 21:1405--1411, 1972.
[24]
K. Scott and J. Davidson. Exploring the limits of sub-word level parallelism. In PACT '00: Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques, page 81, Washington, DC, USA, 2000. IEEE Computer Society.
[25]
R. Simar and R. Tatge. How TI adopted VLIW in digital signal processors. IEEE Solid-State Circuits Magazine, 1(3):10--14, Summer 2009.
[26]
K. B. Theobald, G. R. Gao, and L. J. Hendren. On the limits of program parallelism and its smoothability. In Proceedings of the 25th annual international symposium on Microarchitecture, MICRO 25, pages 10--19, Los Alamitos, CA, USA, 1992. IEEE Computer Society Press.
[27]
G. S. Tjaden and M. J. Flynn. Detection and parallel execution of independent instructions. IEEE Trans. Comput., 19(10):889--895, 1970.
[28]
J. S. Vetter and F. Mueller. Communication characteristics of large-scale scientific applications for contemporary cluster architectures. J. Parallel Distrib. Comput., 63(9):853--865, 2003.
[29]
D. Wall. Limits of instruction-level parallelism. In ASPLOS-IV: Proceedings of the fourth international conference on Architectural support for programming languages and operating systems, pages 176--188. ACM, 1991.



        Published In

        SPAA '11: Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
        June 2011
        404 pages
        ISBN:9781450307437
        DOI:10.1145/1989493
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        In-Cooperation

        • EATCS: European Association for Theoretical Computer Science

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Author Tags

        1. basic-block-level parallelism
2. Berkeley computational motifs
        3. data movement
        4. instruction-level parallelism

        Qualifiers

        • Research-article

        Conference

        SPAA '11

        Acceptance Rates

        Overall Acceptance Rate 447 of 1,461 submissions, 31%


        Cited By

• (2023) At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads. ACM Transactions on Architecture and Code Optimization, 20(4):1-26. DOI: 10.1145/3629520. Online publication date: 25-Oct-2023.
• (2020) A System for Generating Non-Uniform Random Variates using Graphene Field-Effect Transistors. 2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP), pages 101-108. DOI: 10.1109/ASAP49362.2020.00026. Online publication date: Jul-2020.
• (2018) AIWC: OpenCL-Based Architecture-Independent Workload Characterization. 2018 IEEE/ACM 5th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC), pages 81-91. DOI: 10.1109/LLVM-HPC.2018.8639381. Online publication date: Nov-2018.
• (2018) A Review of Near-Memory Computing Architectures: Opportunities and Challenges. 2018 21st Euromicro Conference on Digital System Design (DSD), pages 608-617. DOI: 10.1109/DSD.2018.00106. Online publication date: Aug-2018.
• (2018) Introduction. Thread and Data Mapping for Multicore Systems, pages 1-8. DOI: 10.1007/978-3-319-91074-1_1. Online publication date: 5-Jul-2018.
• (2017) The End of Moore's Law. Computing in Science and Engineering, 19(2):41-50. DOI: 10.1109/MCSE.2017.29. Online publication date: 1-Mar-2017.
• (2014) MIPT: Rapid exploration and evaluation for migrating sequential algorithms to multiprocessing systems with multi-port memories. 2014 International Conference on High Performance Computing & Simulation (HPCS), pages 776-783. DOI: 10.1109/HPCSim.2014.6903767. Online publication date: Jul-2014.
• (2012) DOME. Proceedings of the 2012 workshop on High-Performance Computing for Astronomy, pages 1-4. DOI: 10.1145/2286976.2286978. Online publication date: 18-Jun-2012.
• (2011) Quantitative analysis of parallelism and data movement properties across the Berkeley computational motifs. Proceedings of the 8th ACM International Conference on Computing Frontiers, pages 1-2. DOI: 10.1145/2016604.2016625. Online publication date: 3-May-2011.
