ABSTRACT
Both energy efficiency and system reliability are significant concerns on the path to exascale high-performance computing (HPC). In such large HPC systems, applications must perform massive I/O to local storage devices (e.g., NAND flash memory) to achieve scalable checkpoint/restart. However, checkpoint/restart can occupy a large portion of the runtime, during which non-I/O subsystems such as the CPU and memory continue to consume a great deal of energy. Energy-aware optimization of checkpoint/restart, including its I/O operations to storage, is therefore required. In this paper, we present a profile-based I/O optimization technique for checkpoint/restart on NAND flash memory devices, built on a Markov model. Our performance studies show that this profile lookup approach saves 4.1% of the energy consumed by an application execution with checkpoint/restart. In particular, it reduces the energy consumption of write operations by 67.4% and of read operations by 40.2% on a PCIe-attached NAND flash memory device.
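The abstract does not describe the model in detail, so the following is only a minimal sketch of what a Markov-model "profile lookup" can look like, assuming a first-order Markov chain over coarse execution phases. The phase labels (`compute`, `write`, `read`) and the class `MarkovIOProfile` are hypothetical illustrations, not the paper's implementation. A profiling run records the phase sequence, transition counts are stored, and at run time a table lookup returns the most likely next phase.

```python
from collections import Counter, defaultdict


class MarkovIOProfile:
    """Hypothetical first-order Markov model over execution phases.

    Phase names and this API are illustrative assumptions only.
    """

    def __init__(self):
        # transitions[prev][nxt] = number of times nxt followed prev
        self.transitions = defaultdict(Counter)

    def train(self, trace):
        # Count every observed transition prev -> next in the profile trace.
        for prev, nxt in zip(trace, trace[1:]):
            self.transitions[prev][nxt] += 1

    def probability(self, prev, nxt):
        # Estimated P(next phase = nxt | current phase = prev); 0.0 if unseen.
        counts = self.transitions.get(prev)
        if not counts:
            return 0.0
        return counts[nxt] / sum(counts.values())

    def predict(self, current):
        # Profile lookup: most frequently observed successor, or None if unseen.
        counts = self.transitions.get(current)
        if not counts:
            return None
        return counts.most_common(1)[0][0]


if __name__ == "__main__":
    profile = MarkovIOProfile()
    profile.train(["compute", "write", "write", "compute",
                   "write", "write", "compute", "read"])
    print(profile.predict("compute"))                 # write
    print(profile.probability("compute", "write"))    # 2/3
```

Under this assumption, a runtime could consult `predict` before each phase and, for example, pick a lower-power CPU setting when an I/O-bound phase is expected; how the paper maps predictions to actions is not stated in the abstract.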
Index Terms
- Energy-aware I/O optimization for checkpoint and restart on a NAND flash memory system