DOI: 10.1145/2989081.2989090
research-article

Characterizing the Performance of Hybrid Memory Cube Using ApexMAP Application Probes

Published: 03 October 2016

ABSTRACT

Fully characterizing the performance of a new memory technology is typically a subtle process, because it is difficult to subject the memory to different access patterns before a full system exists. Simple performance metrics, such as raw bandwidth, do not give enough information about the suitability of the memory for different architectural design choices, such as its suitability for processing in memory, how much its performance relies on relaxed ordering semantics, or how to implement atomics.

This paper discusses the use of the ApexMAP synthetic benchmarks to assess the Hybrid Memory Cube (HMC) technology. Through a simple model of spatial and temporal locality, ApexMAP allows the creation of many application probes that can be used to subject the memory to different access patterns. We use a Verilog implementation of ApexMAP to show the impact of contending requests, flow control, and access granularity on HMC performance. We show a wide variation (up to 20×) in the observed performance, depending on the application locality parameters and the HMC architectural configuration.
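
The abstract does not spell out the locality model, so the following is only a minimal sketch of how an Apex-Map-style probe is commonly described in the Apex-Map literature by Strohmaier and Shan: block start addresses follow a power-law distribution controlled by a temporal-locality parameter, and each access touches a contiguous block whose length sets the spatial locality. The function name `apexmap_addresses` and its parameters (`mem_words`, `alpha`, `block_len`, `num_blocks`, `seed`) are hypothetical names chosen for illustration. This is not the paper's Verilog implementation.

```python
import random

def apexmap_addresses(mem_words, alpha, block_len, num_blocks, seed=0):
    """Yield word indices for a stream of Apex-Map-style block accesses.

    Assumed model (not taken from the abstract):
      alpha     -- temporal locality, 0 < alpha <= 1; alpha = 1 gives
                   uniform random starts, smaller alpha gives more reuse
      block_len -- spatial locality: contiguous words touched per access
    """
    rng = random.Random(seed)
    span = mem_words - block_len  # exclusive upper bound on a block start
    for _ in range(num_blocks):
        r = rng.random()  # uniform in [0, 1)
        # Power-law skew: small alpha pushes starts toward index 0.
        start = int(span * r ** (1.0 / alpha))
        for offset in range(block_len):
            yield start + offset

# Example: 100 blocks of 8 words over a 1024-word memory, high reuse.
addresses = list(apexmap_addresses(1024, alpha=0.1, block_len=8, num_blocks=100))
```

Sweeping `alpha` and `block_len` over a grid gives a family of probes ranging from streaming-like to random-access-like behavior, which is the kind of parameter space over which the abstract reports up to 20× variation.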

    • Published in

      MEMSYS '16: Proceedings of the Second International Symposium on Memory Systems
      October 2016
      463 pages
      ISBN:9781450343053
      DOI:10.1145/2989081

      Copyright © 2016 ACM

      Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 3 October 2016

      Qualifiers

      • research-article
      • Research
      • Refereed limited
