Cost-intelligent application-specific data layout optimization for parallel file systems

Song, Huaiming; Yin, Yanlong; Chen, Yong; Sun, Xian-He

doi:10.1007/s10586-012-0200-4

Published: 15 February 2012

Cluster Computing, Volume 16, pages 285–298 (2013)

Abstract

Parallel file systems have been developed in recent years to ease the I/O bottleneck of high-end computing systems. These file systems offer several data layout strategies to meet the performance goals of specific I/O workloads. However, a layout policy that performs well on one I/O workload may perform poorly on another. Peak I/O performance is rarely achieved because data access patterns are complex and application dependent. This study proposes a cost-intelligent data access strategy, based on the principle of application-specific optimization, that improves the I/O performance of parallel file systems. We first present examples that illustrate how performance differs under different data layouts. We then develop a cost model that estimates the completion time of data accesses under various data layouts, so that the layout can be chosen to better match the application. Static layout optimization can be used for applications with dominant data access patterns, and dynamic layout selection with hybrid replications can be used for applications with complex I/O patterns. Theoretical analysis and experimental testing have been conducted to verify the proposed cost-intelligent layout approach. The analytical and experimental results show that the proposed cost model is effective and that the application-specific data layout approach can provide up to a 74% performance improvement for data-intensive applications.
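To make the idea of cost-based layout selection concrete, the following is a minimal sketch and not the cost model from the article, which is not reproduced on this page. It assumes a deliberately simplified setting: every process issues one request of the same size, each storage server handles its requests one after another, and a request costs a fixed startup time plus a per-byte transfer time. The two layout names ("dedicated" and "striped"), all function names, and the default parameter values are hypothetical and chosen only for illustration.

```python
import math

# Toy cost model (an assumption for illustration, not the article's model).
# A request to a server costs: startup_s + bytes * s_per_byte.
# Each server processes the requests it receives sequentially.

def dedicated_time(n_procs, n_servers, req_bytes, startup_s, s_per_byte):
    """Each process's data lives on a single server.

    Processes are spread evenly, so the busiest server handles
    ceil(n_procs / n_servers) full-size requests.
    """
    requests_per_server = math.ceil(n_procs / n_servers)
    return requests_per_server * (startup_s + req_bytes * s_per_byte)


def striped_time(n_procs, n_servers, req_bytes, startup_s, s_per_byte):
    """Each process's data is striped evenly over all servers.

    Every server handles one sub-request per process, each carrying
    req_bytes / n_servers bytes.
    """
    sub_request_bytes = req_bytes / n_servers
    return n_procs * (startup_s + sub_request_bytes * s_per_byte)


# Hypothetical layout names mapped to their cost functions.
COST_MODELS = {
    "dedicated": dedicated_time,
    "striped": striped_time,
}


def cheapest_layout(n_procs, n_servers, req_bytes,
                    startup_s=1e-3, s_per_byte=1e-8):
    """Return (best_layout_name, {layout_name: estimated_seconds})."""
    costs = {
        name: model(n_procs, n_servers, req_bytes, startup_s, s_per_byte)
        for name, model in COST_MODELS.items()
    }
    best = min(costs, key=costs.get)
    return best, costs


if __name__ == "__main__":
    # One process reading a large block: striping spreads the transfer.
    print(cheapest_layout(n_procs=1, n_servers=4, req_bytes=4_000_000))
    # Many processes issuing small requests: startup cost dominates.
    print(cheapest_layout(n_procs=16, n_servers=4, req_bytes=4_000))
```

Even under these simplified assumptions the preferred layout changes with the access pattern: a single process reading a large block favors the striped layout because the transfer is spread over all servers, while many processes issuing small requests favor the dedicated layout because striping multiplies the per-request startup cost.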

Acknowledgement

The authors thank Dr. Rajeev Thakur and Dr. Robert Ross of Argonne National Laboratory for their constructive and thoughtful suggestions on this study. The authors are also grateful to the anonymous reviewers for their valuable comments and suggestions. This research was supported in part by the US National Science Foundation under grants CCF-0621435 and CCF-0937877.

Author information

Authors and Affiliations

R&D Center, Dawning Information Industry Co., Ltd., Beijing, 100084, China
Huaiming Song
Department of Computer Science, Illinois Institute of Technology, Chicago, IL, 60616, USA
Yanlong Yin & Xian-He Sun
Department of Computer Science, Texas Tech University, Lubbock, TX, 79409, USA
Yong Chen

Corresponding author

Correspondence to Huaiming Song.

About this article

Cite this article

Song, H., Yin, Y., Chen, Y. et al. Cost-intelligent application-specific data layout optimization for parallel file systems. Cluster Comput 16, 285–298 (2013). https://doi.org/10.1007/s10586-012-0200-4

Received: 12 September 2011
Accepted: 23 January 2012
Published: 15 February 2012
Issue Date: June 2013
DOI: https://doi.org/10.1007/s10586-012-0200-4
