Abstract
Large … many other knowledge discoveries. During its evolution, parallel computing has formed two major camps: high-performance computing (HPC, or supercomputing) and cloud computing. HPC is computation-oriented, and its typical applications include scientific simulation and numerical computation. HPC applications rely on low-latency networks for message passing and use parallel programming paradigms such as MPI to achieve parallelism [1]. Cloud computing is usually data-processing-oriented, and its typical frameworks are designed for large-scale batch data processing.
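To make the batch data-processing model concrete, here is a minimal single-process sketch of the map/shuffle/reduce pattern popularized by MapReduce (Dean and Ghemawat, listed in the references). It uses word count as the workload. This is an illustration under stated assumptions, not the chapter's method. The function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical. The sequential loop over input splits stands in for map tasks that a real framework would schedule in parallel across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(records: Iterable[str]) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every input record."""
    pairs: List[Tuple[str, int]] = []
    for record in records:
        for word in record.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key. A distributed framework
    does this while moving data from map tasks to reduce tasks."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits: List[List[str]]) -> Dict[str, int]:
    """Run the map phase on each input split independently (sequentially
    here; one task per split in a real framework), then shuffle and reduce."""
    intermediate: List[Tuple[str, int]] = []
    for split in splits:
        intermediate.extend(map_phase(split))
    return reduce_phase(shuffle_phase(intermediate))


if __name__ == "__main__":
    splits = [
        ["HPC is computing oriented", "cloud is data oriented"],
        ["data data everywhere"],
    ]
    print(word_count(splits))
```

This differs from the message-passing style described above. In this style, the programmer supplies only the per-record map logic and the per-key reduce logic. Partitioning and grouping, and in a real framework also scheduling and fault tolerance, are left to the runtime rather than coded explicitly with sends and receives.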
Notes
- 1.
We eliminated the I/O phase in the SiCortex experiments and measured only the cost of the communication phase for the overhead analysis. The lack of local disks on SiCortex, together with its job scheduler, makes it impractical, if not impossible, to deploy the Kosmos file system there.
References
“The Message Passing Interface (MPI) Standard,” [Online]. Available: http://www.mcs.anl.gov/research/projects/mpi/.
F. Schmuck and R. Haskin, “GPFS: A Shared-Disk File System for Large Computing Clusters,” in Proc. of the 1st USENIX Conference on File and Storage Technologies (FAST), 2002.
“Lustre File Systems Website,” [Online]. Available: http://wiki.lustre.org/index.php/Main_Page.
P. J. Braam, “The Lustre Storage Architecture,” [Online]. Available: http://www.lustre.org/documentation.html.
“OrangeFS Website,” [Online]. Available: orangefs.org.
P. H. Carns, W. B. Ligon III, and R. B. Ross, “PVFS: A Parallel File System for Linux Clusters,” in Proc. of the 4th Annual Linux Showcase and Conference, 2000.
“MPI-2: Extensions to the Message-Passing Interface,” [Online]. Available: http://www.mpi-forum.org/docs/mpi-20-html/mpi2-report.html.
R. Thakur, W. Gropp, and E. Lusk, “Data Sieving and Collective I/O in ROMIO,” in Proc. of the 7th Symposium on the Frontiers of Massively Parallel Computation (FRONTIERS ’99), 1999.
J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” in Proc. of the 6th Symposium on Operating Systems Design and Implementation (OSDI), 2004.
S. Ghemawat, H. Gobioff, and S.-T. Leung, “The Google File System,” in Proc. of the 19th ACM Symposium on Operating Systems Principles (SOSP), 2003.
“Hadoop Distributed File System Website,” [Online]. Available: http://hadoop.apache.org/hdfs/.
“Kosmos Distributed Filesystem,” [Online]. Available: http://code.google.com/p/kosmosfs/.
“libHDFS Source Code,” [Online]. Available: http://github.com/apache/hadoop-hdfs/blob/trunk/src/c++/libhdfs/hdfs.h.
E. Brewer, “PODC Keynote Presentation,” 2000. [Online]. Available: http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf.
H. Song, Y. Yin, Y. Chen, and X.-H. Sun, “A Cost-Intelligent Application-Specific Data Layout Scheme for Parallel File Systems,” in Proc. of the 20th International ACM Symposium on High Performance Distributed Computing, 2011.
J.-P. Prost, R. Treumann, R. Hedges, B. Jia, and A. Koniges, “MPI-IO/GPFS, an Optimized Implementation of MPI-IO on top of GPFS,” in Proc. of the International Conference for High Performance Computing, Networking, Storage and Analysis (Supercomputing), 2001.
W.-k. Liao and A. Choudhary, “Dynamically Adapting File Domain Partitioning Methods for Collective I/O Based on Underlying Parallel File System Locking Protocols,” in Proc. of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2008.
H. Jin, J. Ji, X.-H. Sun, Y. Chen, and R. Thakur, “CHAIO: Enabling HPC Applications on Data-Intensive File Systems,” in Proc. of the 41st International Conference on Parallel Processing (ICPP), 2012.
“TOP500 Supercomputer Sites,” [Online]. Available: http://www.top500.org/.
“Magellan Project: A Cloud for Science,” [Online]. Available: http://magellan.alcf.anl.gov/.
E. Walker, “Benchmarking Amazon EC2 for High-Performance Scientific Computing,” USENIX ;login:, 2008.
Q. He, S. Zhou, B. Kobler, D. Duffy, and T. McGlynn, “Case Study for Running HPC Applications in Public Clouds,” in Proc. of the 1st Workshop on Scientific Cloud Computing (ScienceCloud), 2010.
“HPC in the Cloud,” [Online]. Available: http://www.hpcinthecloud.com/.
A. Moody, G. Bronevetsky, K. Mohror, and B. R. de Supinski, “Design, Modeling and Evaluation of a Scalable Multi-Level Checkpointing System,” in Proc. of the International Conference for High Performance Computing, Networking, Storage and Analysis (Supercomputing), 2010.
R. Oldfield, L. Ward, R. Riesen, A. Maccabe, P. Widener, and T. Kordenbrock, “Lightweight I/O for Scientific Applications,” in Proc. of IEEE Cluster Computing (Cluster), 2006.
C. Mitchell, J. Ahrens, and J. Wang, “VisIO: Enabling Interactive Visualization of Ultra-Scale, Time Series Data via High-Bandwidth Distributed I/O Systems,” in Proc. of the IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2011.
J. Bent, G. Gibson, G. Grider, B. McClelland, P. Nowoczynski, J. Nunez, M. Polte, and M. Wingate, “PLFS: A Checkpoint Filesystem for Parallel Applications,” in Proc. of the Conference on High Performance Computing Networking, Storage and Analysis (SC), 2009.
S. Sehrish, G. Mackey, J. Wang, and J. Bent, “MRAP: A Novel MapReduce-based Framework to Support HPC Analytics Applications with Access Patterns,” in Proc. of the 19th ACM International Symposium on High Performance Distributed Computing (HPDC), 2010.
S. Al-Kiswany, M. Ripeanu, S. S. Vazhkudai, and A. Gharaibeh, “stdchk: A Checkpoint Storage System for Desktop Grid Computing,” in Proc. of the 28th International Conference on Distributed Computing Systems (ICDCS), 2008.
“IOR HPC Benchmark,” [Online]. Available: http://sourceforge.net/projects/ior-sio/.
B. Nicolae, G. Antoniu, L. Bougé, D. Moise and A. Carpen-Amarie, “BlobSeer: Next-Generation Data Management for Large Scale Infrastructures,” Journal of Parallel and Distributed Computing, vol. 2, pp. 169–184, 2011.
E. Molina-Estolano, M. Gokhale, C. Maltzahn, J. Bent, and S. Brandt, “Mixing Hadoop and HPC Workloads on Parallel Filesystems,” in Proc. of the 2009 ACM Petascale Data Storage Workshop (PDSW ’09), 2009.
W. Tantisiriroj, S. Patil, G. Gibson, S. W. Son, S. J. Lang, and R. B. Ross, “On the Duality of Data-Intensive File System Design: Reconciling HDFS and PVFS,” in Proc. of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2011.
“Hamster: Hadoop And Mpi on the same cluSTER,” [Online]. Available: http://issues.apache.org/jira/browse/MAPREDUCE-2911.
“Apache Mesos,” [Online]. Available: http://mesos.apache.org/.
B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. Katz, S. Shenker, and I. Stoica, “Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center,” in Proc. of the 8th USENIX Conference on Networked Systems Design and Implementation (NSDI), 2011.
“MapR Direct Access NFS,” [Online]. Available: http://www.mapr.com/products/only-with-mapr/direct-access-nfs.
Copyright information
© 2015 Springer Science+Business Media New York
About this chapter
Cite this chapter
Yin, Y., Jin, H., Sun, XH. (2015). I/O and File Systems for Data-Intensive Applications. In: Khan, S., Zomaya, A. (eds) Handbook on Data Centers. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2092-1_18
DOI: https://doi.org/10.1007/978-1-4939-2092-1_18
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-2091-4
Online ISBN: 978-1-4939-2092-1
eBook Packages: Computer Science, Computer Science (R0)