DOI: 10.1145/2806777.2806937
research-article

Understanding issue correlations: a case study of the Hadoop system

Published: 27 August 2015

ABSTRACT

Over the last decade, Hadoop has evolved into a widely used platform for Big Data applications. Given its widespread use, we present a comprehensive analysis of the solved issues, with applied patches, in the Hadoop ecosystem. The analysis focuses on Hadoop's two essential components, HDFS (storage) and MapReduce (computation). It covers a total of 4,218 solved issues from the last six years: 2,180 from HDFS and 2,038 from MapReduce. The insights derived from the study concern system design and development, particularly correlated issues and the correlations between the root causes of issues and the characteristics of the Hadoop subsystems. These findings shed light on the future development and testing of Big Data systems and on the design of bug-finding tools.
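The abstract does not say how the correlations between root causes and subsystems were measured. As an illustration only, the sketch below cross-tabulates issues by subsystem and root-cause label and reports the lift of each pair. Lift is a standard association measure that we chose here as an assumption; it is not presented as the authors' method. The records, the labels ("concurrency", "configuration") and the `lift_table` helper are all hypothetical, and none of the numbers come from the study.

```python
from collections import Counter

# Hypothetical, hand-made issue records (NOT data from the study):
# each record pairs a Hadoop subsystem with an assumed root-cause label.
issues = [
    ("HDFS", "concurrency"),
    ("HDFS", "concurrency"),
    ("HDFS", "configuration"),
    ("MapReduce", "configuration"),
    ("MapReduce", "configuration"),
    ("MapReduce", "concurrency"),
]


def lift_table(records):
    """Return {(subsystem, cause): lift}, where
    lift = P(subsystem, cause) / (P(subsystem) * P(cause)).
    A lift above 1 means the pair co-occurs more often than
    independence would predict; below 1 means less often."""
    n = len(records)
    joint = Counter(records)                      # counts of each (subsystem, cause) pair
    by_sub = Counter(s for s, _ in records)       # marginal counts per subsystem
    by_cause = Counter(c for _, c in records)     # marginal counts per root cause
    return {
        (s, c): (k / n) / ((by_sub[s] / n) * (by_cause[c] / n))
        for (s, c), k in joint.items()
    }


for pair, lift in sorted(lift_table(issues).items()):
    print(pair, round(lift, 2))
```

With these toy records, the pairs (HDFS, concurrency) and (MapReduce, configuration) get a lift of about 1.33, and the other two pairs get about 0.67. On real issue data, you would also test significance, for example with a chi-square test on the same contingency table, before drawing any conclusions from such ratios.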

      Published in

      SoCC '15: Proceedings of the Sixth ACM Symposium on Cloud Computing
      August 2015
      446 pages
      ISBN: 9781450336512
      DOI: 10.1145/2806777

      Copyright © 2015 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      SoCC '15 paper acceptance rate: 34 of 157 submissions, 22%
      Overall acceptance rate: 169 of 722 submissions, 23%
