Comparison and Improvement of Hadoop MapReduce Performance Prediction Models in the Private Cloud

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10065)

Abstract

Performance modeling for MapReduce applications that process large-scale data is an important problem in the optimization, evaluation, prediction, and resource scheduling of jobs on big data and cloud computing platforms. In this paper, we study the Hadoop distributed computing framework, a widely adopted Big Data solution. We use the locally weighted linear regression (LWLR) algorithm and the linear regression (LR) algorithm to build three models, each based on different characteristics, that estimate the execution time of large-scale data applications running on Hadoop, and we compare and improve the three models. Experiments across different environments and job types indicate that all three models, using only small-scale data, perform well at predicting the execution time and evaluating the performance of large-scale data applications.
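
The abstract names LR and LWLR but this preview does not show the paper's features or model definitions. As a minimal sketch only, the two regressions could be contrasted as follows, assuming a single hypothetical feature (input size in GB) and made-up timings; the function names, the Gaussian kernel, and the bandwidth `tau` are illustrative assumptions, not the authors' method.

```python
import math

def fit_line(xs, ys, weights=None):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    if weights is None:
        weights = [1.0] * len(xs)  # plain LR: every sample counts equally
    sw = sum(weights)
    mx = sum(w * x for w, x in zip(weights, xs)) / sw
    my = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def predict_lr(xs, ys, x_query):
    """Global linear regression: one line fitted to all samples."""
    a, b = fit_line(xs, ys)
    return a + b * x_query

def predict_lwlr(xs, ys, x_query, tau=2.0):
    """Locally weighted linear regression with a Gaussian kernel.

    Samples near x_query get larger weights; tau is the bandwidth.
    A very small tau with a distant query can drive all weights to
    zero, so tau has to be chosen with the data range in mind.
    """
    weights = [math.exp(-((x - x_query) ** 2) / (2.0 * tau ** 2)) for x in xs]
    a, b = fit_line(xs, ys, weights)
    return a + b * x_query

# Hypothetical small-scale profiling runs: input size (GB) -> job time (s).
sizes_gb = [1, 2, 3, 4, 5, 6, 7, 8]
times_s = [41, 55, 66, 80, 95, 112, 131, 152]

# Estimate a larger job from the small-scale samples.
query_gb = 12
print("LR estimate  :", predict_lr(sizes_gb, times_s, query_gb))
print("LWLR estimate:", predict_lwlr(sizes_gb, times_s, query_gb))
```

In this sketch LR fits one line to all samples, while LWLR refits a line for each query and so follows local trends in the samples nearest to it. The two agree when the data are exactly linear.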

Acknowledgments

This work is supported by the Shanghai 2016 Innovation Action Project under Grant 16DZ1100200 (Data-trade-supporting Big data Testbed). It is also supported by the 2016–2019 National Natural Science Foundation of China under Grant No. 61572137 (Multiple Clouds based CDN as a Service Key Technology Research), the Shanghai 2015 Innovation Action Project under Grant No. 1551110700 (New media-oriented Big data analysis and content delivery key technology and application), and the Fudan-Hitachi Innovative Software Technology Joint Project “Cloud Platform Design for Big data”.

Author information

Corresponding author

Correspondence to Zhihui Lu.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Wang, N., Yang, J., Lu, Z., Li, X., Wu, J. (2016). Comparison and Improvement of Hadoop MapReduce Performance Prediction Models in the Private Cloud. In: Wang, G., Han, Y., Martínez Pérez, G. (eds) Advances in Services Computing. APSCC 2016. Lecture Notes in Computer Science, vol 10065. Springer, Cham. https://doi.org/10.1007/978-3-319-49178-3_6

  • DOI: https://doi.org/10.1007/978-3-319-49178-3_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49177-6

  • Online ISBN: 978-3-319-49178-3

  • eBook Packages: Computer Science, Computer Science (R0)