DOI: 10.1145/1375527.1375579

Analyzing memory access intensity in parallel programs on multicore

Published: 07 June 2008

Abstract

As the shared memory bus becomes a major performance bottleneck for many numerical applications on multicore chips, it is critical to understand how increased on-chip parallelism strains memory bandwidth and, in turn, affects the efficiency of parallel codes. This paper introduces the notion of memory access intensity to facilitate quantitative analysis of a program's memory behavior on multicores that employ state-of-the-art prefetching hardware. Three numerical solvers for large-scale sparse linear systems are used to demonstrate the estimation of memory access intensity and its effect on program performance.
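
The abstract does not give the paper's formal definition of memory access intensity, so the following is only a minimal sketch under a stand-in assumption: intensity is taken to be the memory traffic a single core generates per second, and the shared bus is assumed to cap speedup once the aggregate demand of all cores reaches its bandwidth. The function names, parameters, and the cap model are hypothetical illustrations, not the authors' method.

```python
def access_intensity(bytes_moved: float, seconds: float) -> float:
    """Assumed stand-in metric: bytes moved to/from memory per second on one core."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return bytes_moved / seconds


def estimated_speedup(intensity: float, cores: int, bus_bandwidth: float) -> float:
    """Ideal linear speedup, capped where aggregate demand saturates the shared bus.

    Assumption: every core runs the same code, so total demand is
    cores * intensity, and the bus can sustain at most bus_bandwidth bytes/s.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if intensity <= 0:
        return float(cores)  # no memory traffic: the bus never limits scaling
    return min(float(cores), bus_bandwidth / intensity)


if __name__ == "__main__":
    # Hypothetical numbers: 4 GB moved in 2 s on one core, 8 GB/s shared bus.
    intensity = access_intensity(4e9, 2.0)
    for n in (1, 2, 4, 8):
        print(n, estimated_speedup(intensity, n, 8e9))
```

Under these made-up numbers the estimate stops improving beyond four cores, which is the kind of bandwidth-bound behavior the abstract describes; a real analysis would also need to account for caches and prefetching.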

    Published In

    ICS '08: Proceedings of the 22nd annual international conference on Supercomputing
    June 2008
    390 pages
    ISBN:9781605581583
    DOI:10.1145/1375527

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. intensity
    2. linear system
    3. memory bandwidth
    4. multicore
    5. parallel

    Qualifiers

    • Research-article

    Conference

    ICS08: International Conference on Supercomputing
    June 7 - 12, 2008
    Island of Kos, Greece

    Acceptance Rates

    Overall Acceptance Rate 629 of 2,180 submissions, 29%

    Cited By

    • (2020) Parallel Hybrid Sparse Linear System Solvers. Parallel Algorithms in Computational Science and Engineering, pp. 95-120. DOI: 10.1007/978-3-030-43736-7_4. Online publication date: 7-Jul-2020
    • (2016) Trace-based analysis methodology of program flash contention in embedded multicore systems. Proceedings of the 2016 Conference on Design, Automation & Test in Europe, pp. 199-204. DOI: 10.5555/2971808.2971852. Online publication date: 14-Mar-2016
    • (2016) An Approach to Parallelization of SIFT Algorithm on GPUs for Real-Time Applications. Journal of Computer and Communications 04:17, pp. 18-50. DOI: 10.4236/jcc.2016.417002. Online publication date: 2016
    • (2016) A Data Locality and Memory Contention Analysis Method in Embedded NUMA Multi-core Systems. 2016 IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSOC), pp. 85-92. DOI: 10.1109/MCSoC.2016.15. Online publication date: Sep-2016
    • (2015) Multiprocessor Capacity Metric and Analysis. IEEE Transactions on Computers 64:11, pp. 3181-3196. DOI: 10.1109/TC.2015.2389831. Online publication date: 1-Nov-2015
    • (2015) Estimating graph distance and centrality on shared nothing architectures. Concurrency and Computation: Practice & Experience 27:14, pp. 3587-3613. DOI: 10.1002/cpe.3354. Online publication date: 25-Sep-2015
    • (2014) Loop scheduling with memory access reduction subject to register constraints for DSP applications. Software: Practice & Experience 44:8, pp. 999-1026. DOI: 10.1002/spe.2186. Online publication date: 1-Aug-2014
    • (2013) Towards Memory-Load Balanced Fast Fourier Transformations in Fine-Grain Execution Models. Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, pp. 1607-1617. DOI: 10.1109/IPDPSW.2013.47. Online publication date: 20-May-2013
    • (2012) Computational Capacity-Based Codesign of Computer Systems. High-Performance Scientific Computing, pp. 45-73. DOI: 10.1007/978-1-4471-2437-5_2. Online publication date: 2012
    • (2012) Block-adaptive quantum mechanics: An adaptive divide-and-conquer approach to interactive quantum chemistry. Journal of Computational Chemistry 34:6, pp. 492-504. DOI: 10.1002/jcc.23157. Online publication date: 29-Oct-2012
