ABSTRACT
Parallel dataflow programs generate enormous amounts of distributed data that are short-lived, yet critical for completion of the job and for good run-time performance. We call this class of data intermediate data. This paper is the first to address intermediate data as a first-class citizen, specifically targeting and minimizing the effect of run-time server failures on the availability of intermediate data, and thus on performance metrics such as job completion time. We propose new design techniques for a new storage system called ISS (Intermediate Storage System), implement these techniques within Hadoop, and experimentally evaluate the resulting system. Without failures, the performance (i.e., job completion time) of Hadoop augmented with ISS is comparable to that of base Hadoop. Under a failure, Hadoop with ISS outperforms base Hadoop and incurs up to 18% overhead compared to base Hadoop without failures, depending on the testbed setup.