Abstract
As the computing power of high-performance computing (HPC) systems advances toward exascale, storage systems are stretched to their limits to handle the growing I/O traffic. Researchers are building storage systems on top of compute-node-local fast storage devices (such as NVMe SSDs) to alleviate the I/O bottleneck. However, user jobs vary widely in their I/O bandwidth requirements; equipping every compute node with these expensive devices and merging them into a single global storage system therefore wastes much of their capacity. In addition, current node-local storage systems must cope with the challenging small I/O and rank 0 I/O patterns of HPC workloads. In this paper, we present a workload-aware temporary cache (WatCache) to meet these challenges. We design a workload-aware node allocation method that assigns fast storage devices to jobs according to their I/O requirements and merges each job's devices into a separate temporary cache space. We implement a metadata caching strategy that reduces the metadata overhead of I/O requests to improve small-I/O performance. We design a data layout strategy that distributes consecutive data exceeding a threshold across multiple devices to achieve higher aggregate bandwidth for rank 0 I/O. Through extensive tests with several I/O benchmarks and applications, we validate that WatCache offers linearly scalable performance and brings significant performance improvements to small I/O and rank 0 I/O patterns.
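The threshold-based data layout described above can be illustrated with a minimal sketch: a contiguous request smaller than the threshold stays on one device, while a larger request is striped across the job's devices so rank 0 I/O can draw on their aggregate bandwidth. All names and values here (`THRESHOLD`, `STRIPE_SIZE`, `layout_extents`) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a threshold-based striping layout (assumed
# parameters; WatCache's real design may differ in units and policy).

THRESHOLD = 4 * 1024 * 1024   # requests above 4 MiB are striped (assumed)
STRIPE_SIZE = 1024 * 1024     # 1 MiB stripe unit (assumed)

def layout_extents(offset, length, num_devices):
    """Map a contiguous byte range to (device, file_offset, size) extents."""
    if length <= THRESHOLD:
        # Small request: keep it on a single device to avoid striping overhead.
        return [(0, offset, length)]
    extents = []
    pos = offset
    end = offset + length
    while pos < end:
        stripe_index = pos // STRIPE_SIZE
        device = stripe_index % num_devices        # round-robin over devices
        # Bytes remaining in the current stripe unit.
        size = min(end - pos, (stripe_index + 1) * STRIPE_SIZE - pos)
        extents.append((device, pos, size))
        pos += size
    return extents

if __name__ == "__main__":
    # An 8 MiB rank 0 write from offset 0, spread over 4 node-local devices:
    ext = layout_extents(0, 8 * 1024 * 1024, 4)
    print(len(ext), sorted({d for d, _, _ in ext}))  # 8 extents across devices 0-3
```

A large sequential write from rank 0 thus turns into per-device extents that can be serviced in parallel, while small I/O bypasses the striping path entirely.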
Acknowledgements
We thank Jian Zhang and Fuxing Sun at NSCC-TJ for their generous help in setting up the experimental environments on Tianhe-1A.
Additional information
This work was supported by the National Natural Science Foundation of China (Grant No. 61502511).
Cite this article
Yu, J., Liu, G., Dong, W. et al. WatCache: a workload-aware temporary cache on the compute side of HPC systems. J Supercomput 75, 554–586 (2019). https://doi.org/10.1007/s11227-017-2167-7