Principles of Memory-Centric Programming for High Performance Computing

ABSTRACT
The memory wall -- the growing disparity between processor speed and memory speed -- has long been one of the most critical challenges in computing. In high performance computing, achieving efficient execution of parallel applications often demands more tuning and optimization effort for data placement and memory access than for managing parallelism itself. The situation is further complicated by the recent expansion of the memory hierarchy, which is becoming deeper and more diversified with the adoption of new memory technologies and architectures such as 3D-stacked memory, non-volatile random-access memory (NVRAM), and hybrid software/hardware caches.
The authors believe it is important to elevate the notion of memory-centric programming, alongside the compute-centric and data-centric programming paradigms, in order to exploit modern memory systems of unprecedented depth and diversity. Memory-centric programming refers to the notion and techniques of exposing the hardware memory system and its hierarchy -- which may include DRAM and NUMA regions, shared and private caches, scratchpad memory, 3D-stacked memory, non-volatile memory, and remote memory -- to the programmer via portable programming abstractions and APIs. These interfaces seek to improve the dialogue between programmers and system software, and to enable compiler optimizations, runtime adaptation, and hardware reconfiguration with regard to data movement, beyond what can be achieved with existing parallel programming APIs. In this paper, we provide an overview of memory-centric programming concepts and principles for high performance computing.