DOI: 10.1145/2465813.2465822
research-article

Energy-aware I/O optimization for checkpoint and restart on a NAND flash memory system

Published: 18 June 2013

ABSTRACT

Energy efficiency and system reliability are both significant concerns on the path to exascale high-performance computing. In such large HPC systems, applications must perform massive I/O operations to local storage devices (e.g., NAND flash memory) for scalable checkpoint and restart. However, checkpoint/restart can account for a large portion of runtime, during which non-I/O subsystems such as the CPU and memory consume an enormous amount of energy. Energy-aware optimization that includes I/O operations to storage is therefore required for checkpoint/restart. In this paper, we present a profile-based I/O optimization technique for NAND flash memory devices, based on a Markov model, for checkpoint/restart. Performance studies show that our profile lookup approach can save 4.1% of the energy consumed by an application execution with checkpoint/restart. In particular, our approach reduces the energy consumption of write operations by 67.4% and of read operations by 40.2% on a PCIe-attached NAND flash memory device.

Published in

FTXS '13: Proceedings of the 3rd Workshop on Fault-tolerance for HPC at extreme scale
June 2013
64 pages
ISBN: 9781450319836
DOI: 10.1145/2465813

Copyright © 2013 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

FTXS '13 paper acceptance rate: 7 of 10 submissions, 70%. Overall acceptance rate: 16 of 25 submissions, 64%.
