I/O and File Systems for Data-Intensive Applications

  • Chapter in Handbook on Data Centers

Abstract

Large […] and many other knowledge discoveries. Parallel computing has evolved into two major camps: high-performance computing (HPC, or supercomputing) and cloud computing. HPC is computation-oriented, with typical applications such as scientific simulation and numerical computation; these rely on low-latency networks for message passing and use parallel programming paradigms such as MPI to enable parallelism [1]. Cloud computing is usually data-processing-oriented, and its typical frameworks are designed for large-scale batch data processing.
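The batch-processing model used by cloud frameworks can be illustrated with a minimal sketch in plain Python: a toy word count expressed as the map, shuffle, and reduce phases of the MapReduce model [9]. This is an illustrative assumption-free sketch of the programming model only; it uses no Hadoop or MapReduce API.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit (word, 1) pairs from each input record
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group intermediate values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["a b a", "b c"]
result = reduce_phase(shuffle(map_phase(docs)))
# result == {"a": 2, "b": 2, "c": 1}
```

In a real framework each phase runs in parallel across the cluster and the shuffle moves data over the network; the file system underneath (e.g., GFS or HDFS) is what makes that data movement efficient, which is the subject of this chapter.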

Notes

  1. We eliminated the I/O phase in the SiCortex experiments and measured only the communication-phase cost for overhead analysis. The lack of local disks and the SiCortex job scheduler make it impractical, if not impossible, to deploy the Kosmos file system on SiCortex.

References

  1. “The Message Passing Interface (MPI) standard” [Online]. Available: http://www.mcs.anl.gov/research/projects/mpi/.

  2. F. Schmuck and R. Haskin, “GPFS: A Shared-Disk File System for Large Computing Clusters,” in Proceedings of the 1st USENIX Conference on File and Storage Technologies (FAST), 2002.

  3. “Lustre File Systems Website,” [Online]. Available: http://wiki.lustre.org/index.php/Main_Page.

  4. P. J. Braam., “The Lustre Storage Architecture,” [Online]. Available: http://www.lustre.org/documentation.html.

  5. “OrangeFS Website,” [Online]. Available: orangefs.org.

  6. Carns, P.H., Ligon, W.B. III, and Ross, R.B., “PVFS: A Parallel File System for Linux Clusters,” in Proceedings of the 4th Annual Linux Showcase and Conference, 2000.

  7. “MPI-2: Extensions to the Message-Passing Interface,” [Online]. Available: http://www.mpi-forum.org/docs/mpi-20-html/mpi2-report.html.

  8. R. Thakur, W. Gropp, and E. Lusk, “Data Sieving and Collective I/O in ROMIO,” in FRONTIERS ’99: Proceedings of the 7th Symposium on the Frontiers of Massively Parallel Computation, 1999.

  9. J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” in Sixth Symposium on Operating System Design and Implementation, 2004.

  10. S. Ghemawat, H. Gobioff, and S.-T. Leung, “The Google File System,” in 19th ACM Symposium on Operating Systems Principles, 2003.

  11. “Hadoop Distributed Filesystem Website,” [Online]. Available: http://hadoop.apache.org/hdfs/.

  12. “Kosmos Distributed Filesystem” [Online]. Available: http://code.google.com/p/kosmosfs/.

  13. “libHDFS Source Code” [Online]. Available: http://github.com/apache/hadoop-hdfs/blob/trunk/src/c++/libhdfs/hdfs.h.

  14. Brewer, E, “PODC Keynote Presentation,” 2000. [Online]. Available: http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf.

  15. H. Song, Y. Yin, Y. Chen, and X.-H. Sun, “A Cost-Intelligent Application-Specific Data Layout Scheme for Parallel File Systems,” in Proc. of the 20th International ACM Symposium on High Performance Distributed Computing, 2011.

  16. J.-P. Prost, R. Treumann, R. Hedges, B. Jia, and A. Koniges, “MPI-IO/GPFS, an Optimized Implementation of MPI-IO on Top of GPFS,” in Proc. of the International Conference for High Performance Computing, Networking, Storage and Analysis (Supercomputing), 2001.

  17. W.-k. Liao and A. Choudhary, “Dynamically Adapting File Domain Partitioning Methods for Collective I/O Based on Underlying Parallel File System Locking Protocols,” in International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008, 2008.

  18. H. Jin, J. Ji, X.-H. Sun, Y. Chen and R. Thakur, “CHAIO: Enabling HPC Applications on Data-Intensive File Systems,” in 41st International Conference on Parallel Processing, 2012.

  19. “TOP500 Supercomputer Sites” [Online]. Available: http://www.top500.org/.

  20. “Magellan Project: A Cloud for Science,” [Online]. Available: http://magellan.alcf.anl.gov/.

  21. Walker, E., “Benchmarking Amazon EC2 for High-Performance Scientific Computing,” Usenix Login, 2008.

  22. He, Q.; Zhou, S.; Kobler, B.; Duffy, D.; McGlynn, T., “Case Study for Running HPC Applications in Public Clouds,” in Proc. of 1st Workshop on Scientific Cloud Computing (ScienceCloud), 2010.

  23. “HPC in the Cloud,” [Online]. Available: http://www.hpcinthecloud.com/.

  24. A. Moody, G. Bronevetsky, K. Mohror, and B. R. de Supinski, “Design, Modeling, and Evaluation of a Scalable Multi-Level Checkpointing System,” in Proc. of the International Conference for High Performance Computing, Networking, Storage and Analysis (Supercomputing), 2010.

  25. R. Oldfield, P. Widener, A. B. Maccabe, L. Ward, and T. Kordenbrock, “Lightweight I/O for Scientific Applications,” in Proc. of IEEE International Conference on Cluster Computing (Cluster), 2006.

  26. C. Mitchell, J. Ahrens, and J. Wang, “VisIO: Enabling Interactive Visualization of Ultra-Scale, Time Series Data via High-Bandwidth Distributed I/O Systems,” in IEEE International Parallel & Distributed Processing Symposium, 2011.

  27. J. Bent, G. Gibson, G. Grider, B. McClelland, P. Nowoczynski, J. Nunez, M. Polte, and M. Wingate, “PLFS: A Checkpoint Filesystem for Parallel Applications,” in Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2009.

  28. S. Sehrish, G. Mackey, J. Wang, and J. Bent, “MRAP: A Novel MapReduce-based Framework to Support HPC Analytics Applications with Access Patterns,” in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010.

  29. Al-Kiswany, S.; Ripeanu, M.; Vazhkudai, S. S.; Gharaibeh, A., “stdchk: A Checkpoint Storage System for Desktop Grid Computing,” in Proc. of The 28th International Conference on Distributed Computing Systems (ICDCS), 2008.

  30. “IOR HPC Benchmark,” [Online]. Available: http://sourceforge.net/projects/ior-sio/.

  31. B. Nicolae, G. Antoniu, L. Bougé, D. Moise, and A. Carpen-Amarie, “BlobSeer: Next-Generation Data Management for Large Scale Infrastructures,” Journal of Parallel and Distributed Computing, vol. 71, no. 2, pp. 169–184, 2011.

  32. E. Molina-Estolano, M. Gokhale, C. Maltzahn, J. May, J. Bent, and S. Brandt, “Mixing Hadoop and HPC Workloads on Parallel Filesystems,” in the 2009 ACM Petascale Data Storage Workshop (PDSW 09), 2009.

  33. W. Tantisiriroj, S. Patil, G. Gibson, S. W. Son, S. J. Lang and R. B. Ross, “On the Duality of Data-Intensive File System Design: Reconciling HDFS and PVFS,” in International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2011.

  34. “Hamster: Hadoop And Mpi on the same cluSTER,” [Online]. Available: http://issues.apache.org/jira/browse/MAPREDUCE-2911.

  35. “Apache Mesos” [Online]. Available: http://mesos.apache.org/.

  36. B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. Katz, S. Shenker and I. Stoica, “Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center,” in the 8th USENIX conference on Networked systems design and implementation, 2011.

  37. “MapR Direct Access NFS” [Online]. Available: http://www.mapr.com/products/only-with-mapr/direct-access-nfs.

Author information

Correspondence to Yanlong Yin.

Copyright information

© 2015 Springer Science+Business Media New York

About this chapter

Cite this chapter

Yin, Y., Jin, H., Sun, XH. (2015). I/O and File Systems for Data-Intensive Applications. In: Khan, S., Zomaya, A. (eds) Handbook on Data Centers. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2092-1_18


  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4939-2091-4

  • Online ISBN: 978-1-4939-2092-1
