I/O and File Systems for Data-Intensive Applications

  • Chapter in Handbook on Data Centers

Abstract

Large […] and many other knowledge discoveries. Parallel computing has evolved into two major camps: high-performance computing (HPC, or supercomputing) and cloud computing. HPC is computation-oriented, with typical applications such as scientific simulation and numerical computation; these rely on low-latency networks for message passing and use parallel programming paradigms such as MPI to enable parallelism [1]. Cloud computing is usually data-processing-oriented, and its typical frameworks are designed for large-scale batch data processing.
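The batch-processing model used by cloud frameworks can be illustrated with a minimal sketch in plain Python: a toy word count expressed as the map, shuffle, and reduce phases of the MapReduce model [9]. This is an illustrative assumption-free sketch of the programming model only; it uses no Hadoop or MapReduce API.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit (word, 1) pairs from each input record
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group intermediate values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["a b a", "b c"]
result = reduce_phase(shuffle(map_phase(docs)))
# result == {"a": 2, "b": 2, "c": 1}
```

In a real framework each phase runs in parallel across the cluster and the shuffle moves data over the network; the file system underneath (e.g., GFS or HDFS) is what makes that data movement efficient, which is the subject of this chapter.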

Notes

  1. We eliminated the I/O phase in the SiCortex experiments and measured only the communication-phase cost for overhead analysis. The lack of local disks and the SiCortex job scheduler make it impractical, if not impossible, to deploy the Kosmos file system on SiCortex.

References

  1. “The Message Passing Interface (MPI) standard” [Online]. Available: http://www.mcs.anl.gov/research/projects/mpi/.

  2. F. Schmuck and R. Haskin, “GPFS: A Shared-Disk File System for Large Computing Clusters,” in Proceedings of the 1st USENIX Conference on File and Storage Technologies (FAST), 2002.

  3. “Lustre File Systems Website,” [Online]. Available: http://wiki.lustre.org/index.php/Main_Page.

  4. P. J. Braam., “The Lustre Storage Architecture,” [Online]. Available: http://www.lustre.org/documentation.html.

  5. “OrangeFS Website,” [Online]. Available: orangefs.org.

  6. Carns, P.H., Ligon, W.B. III, and Ross, R.B., “PVFS: A Parallel File System for Linux Clusters,” in Proceedings of the 4th Annual Linux Showcase and Conference, 2000.

  7. “MPI-2: Extensions to the Message-Passing Interface,” [Online]. Available: http://www.mpi-forum.org/docs/mpi-20-html/mpi2-report.html.

  8. R. Thakur, W. Gropp, and E. Lusk, “Data Sieving and Collective I/O in ROMIO,” in FRONTIERS ’99: Proceedings of the 7th Symposium on the Frontiers of Massively Parallel Computation, 1999.

  9. J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” in Sixth Symposium on Operating System Design and Implementation, 2004.

  10. S. Ghemawat, H. Gobioff, and S.-T. Leung, “The Google File System,” in 19th ACM Symposium on Operating Systems Principles, 2003.

  11. “Hadoop Distributed Filesystem Website,” [Online]. Available: http://hadoop.apache.org/hdfs/.

  12. “Kosmos Distributed Filesystem” [Online]. Available: http://code.google.com/p/kosmosfs/.

  13. “libHDFS Source Code” [Online]. Available: http://github.com/apache/hadoop-hdfs/blob/trunk/src/c++/libhdfs/hdfs.h.

  14. Brewer, E, “PODC Keynote Presentation,” 2000. [Online]. Available: http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf.

  15. H. Song, Y. Yin, Y. Chen, and X.-H. Sun, “A Cost-Intelligent Application-Specific Data Layout Scheme for Parallel File Systems,” in Proc. of the 20th International ACM Symposium on High Performance Distributed Computing, 2011.

  16. J.-P. Prost, R. Treumann, R. Hedges, B. Jia, and A. Koniges, “MPI-IO/GPFS, an Optimized Implementation of MPI-IO on Top of GPFS,” in Proc. of the International Conference for High Performance Computing, Networking, Storage and Analysis (Supercomputing), 2001.

  17. W.-k. Liao and A. Choudhary, “Dynamically Adapting File Domain Partitioning Methods for Collective I/O Based on Underlying Parallel File System Locking Protocols,” in International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008, 2008.

  18. H. Jin, J. Ji, X.-H. Sun, Y. Chen and R. Thakur, “CHAIO: Enabling HPC Applications on Data-Intensive File Systems,” in 41st International Conference on Parallel Processing, 2012.

  19. “TOP500 Supercomputer Sites” [Online]. Available: http://www.top500.org/.

  20. “Magellan Project: A Cloud for Science,” [Online]. Available: http://magellan.alcf.anl.gov/.

  21. Walker, E., “Benchmarking Amazon EC2 for High-Performance Scientific Computing,” Usenix Login, 2008.

  22. He, Q.; Zhou, S.; Kobler, B.; Duffy, D.; McGlynn, T., “Case Study for Running HPC Applications in Public Clouds,” in Proc. of 1st Workshop on Scientific Cloud Computing (ScienceCloud), 2010.

  23. “HPC in the Cloud,” [Online]. Available: http://www.hpcinthecloud.com/.

  24. A. Moody, G. Bronevetsky, K. Mohror, and B. R. de Supinski, “Design, Modeling, and Evaluation of a Scalable Multi-Level Checkpointing System,” in Proc. of the International Conference for High Performance Computing, Networking, Storage and Analysis (Supercomputing), 2010.

  25. R. Oldfield, P. Widener, A. B. Maccabe, L. Ward, and T. Kordenbrock, “Lightweight I/O for Scientific Applications,” in Proc. of IEEE International Conference on Cluster Computing (Cluster), 2006.

  26. C. Mitchell, J. Ahrens, and J. Wang, “VisIO: Enabling Interactive Visualization of Ultra-Scale, Time Series Data via High-Bandwidth Distributed I/O Systems,” in IEEE International Parallel & Distributed Processing Symposium, 2011.

  27. J. Bent, G. Gibson, G. Grider, B. McClelland, P. Nowoczynski, J. Nunez, M. Polte, and M. Wingate, “PLFS: A Checkpoint Filesystem for Parallel Applications,” in Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2009.

  28. S. Sehrish, G. Mackey, J. Wang, and J. Bent, “MRAP: A Novel MapReduce-based Framework to Support HPC Analytics Applications with Access Patterns,” in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010.

  29. Al-Kiswany, S.; Ripeanu, M.; Vazhkudai, S. S.; Gharaibeh, A., “stdchk: A Checkpoint Storage System for Desktop Grid Computing,” in Proc. of The 28th International Conference on Distributed Computing Systems (ICDCS), 2008.

  30. “IOR HPC Benchmark,” [Online]. Available: http://sourceforge.net/projects/ior-sio/.

  31. B. Nicolae, G. Antoniu, L. Bougé, D. Moise, and A. Carpen-Amarie, “BlobSeer: Next-Generation Data Management for Large Scale Infrastructures,” Journal of Parallel and Distributed Computing, vol. 71, no. 2, pp. 169–184, 2011.

  32. E. Molina-Estolano, M. Gokhale, C. Maltzahn, J. May, J. Bent, and S. Brandt, “Mixing Hadoop and HPC Workloads on Parallel Filesystems,” in the 2009 ACM Petascale Data Storage Workshop (PDSW 09), 2009.

  33. W. Tantisiriroj, S. Patil, G. Gibson, S. W. Son, S. J. Lang and R. B. Ross, “On the Duality of Data-Intensive File System Design: Reconciling HDFS and PVFS,” in International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2011.

  34. “Hamster: Hadoop And Mpi on the same cluSTER,” [Online]. Available: http://issues.apache.org/jira/browse/MAPREDUCE-2911.

  35. “Apache Mesos” [Online]. Available: http://mesos.apache.org/.

  36. B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. Katz, S. Shenker and I. Stoica, “Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center,” in the 8th USENIX conference on Networked systems design and implementation, 2011.

  37. “MapR Direct Access NFS” [Online]. Available: http://www.mapr.com/products/only-with-mapr/direct-access-nfs.

Author information

Correspondence to Yanlong Yin.

Copyright information

© 2015 Springer Science+Business Media New York

About this chapter

Cite this chapter

Yin, Y., Jin, H., Sun, XH. (2015). I/O and File Systems for Data-Intensive Applications. In: Khan, S., Zomaya, A. (eds) Handbook on Data Centers. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2092-1_18


  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4939-2091-4

  • Online ISBN: 978-1-4939-2092-1
