Comparison and Improvement of Hadoop MapReduce Performance Prediction Models in the Private Cloud

Wang, Nini; Yang, Jian; Lu, Zhihui; Li, Xiaoyan; Wu, Jie

doi:10.1007/978-3-319-49178-3_6

Conference paper
First Online: 10 November 2016

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10065)

Abstract

Performance modeling of MapReduce applications that process large-scale data is an important issue in the optimization, evaluation, prediction, and resource scheduling of jobs on big data and cloud computing platforms. In this paper, we study the Hadoop distributed computing framework, which is a current trend among Big Data solutions. We use the locally weighted linear regression (LWLR) algorithm and the linear regression (LR) algorithm to build three models, each based on different characteristics, that estimate the execution time of large-scale data applications running on the Hadoop framework, and we compare and improve the three models. By building different types of experimental environments and running different types of jobs, we conclude that all three models perform well at predicting the execution time and evaluating the performance of large-scale data applications using small-scale data.
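The difference between the two regression algorithms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper's three models and their input characteristics are not described in this preview, so the single feature (input data size), the Gaussian kernel, the bandwidth parameter `tau`, the function names, and the sample measurements are all assumptions made for illustration.

```python
import math


def weighted_linear_fit(xs, ys, weights):
    """Solve weighted least squares for y ~ a + b*x and return (a, b)."""
    sw = sum(weights)
    sx = sum(w * x for w, x in zip(weights, xs))
    sy = sum(w * y for w, y in zip(weights, ys))
    sxx = sum(w * x * x for w, x in zip(weights, xs))
    sxy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    denom = sw * sxx - sx * sx
    if abs(denom) < 1e-12:
        raise ValueError("degenerate fit: need two distinct, non-negligible points")
    b = (sw * sxy - sx * sy) / denom
    a = (sy - b * sx) / sw
    return a, b


def predict_lr(xs, ys, x_query):
    """Ordinary linear regression: every sample gets the same weight."""
    a, b = weighted_linear_fit(xs, ys, [1.0] * len(xs))
    return a + b * x_query


def predict_lwlr(xs, ys, x_query, tau=1.0):
    """Locally weighted linear regression: refit for each query point,
    weighting samples by a Gaussian kernel of their distance to it."""
    weights = [math.exp(-((x - x_query) ** 2) / (2.0 * tau ** 2)) for x in xs]
    a, b = weighted_linear_fit(xs, ys, weights)
    return a + b * x_query


# Hypothetical small-scale measurements: input size (GB) -> job time (s).
sizes_gb = [1.0, 2.0, 3.0, 4.0, 5.0]
times_s = [30.0, 50.0, 70.0, 90.0, 110.0]

# Estimate the time of a larger job from the small-scale runs.
lr_estimate = predict_lr(sizes_gb, times_s, 8.0)
lwlr_estimate = predict_lwlr(sizes_gb, times_s, 8.0, tau=2.0)
```

Because the sample data here lie exactly on a line, both estimates coincide; on real measurements LWLR gives more weight to runs close to the queried size, so the two can diverge, and the choice of `tau` controls how local the fit is.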

Acknowledgments

This work is supported by the Shanghai 2016 Innovation Action Project under Grant 16DZ1100200 (Data-trade-supporting Big data Testbed). This work is also supported by the 2016–2019 National Natural Science Foundation of China under Grant No. 61572137 (Multiple Clouds based CDN as a Service Key Technology Research), the Shanghai 2015 Innovation Action Project under Grant No. 1551110700 (New media-oriented Big data analysis and content delivery key technology and application), and the Fudan-Hitachi Innovative Software Technology Joint Project “Cloud Platform Design for Big data”.

Author information

Authors and Affiliations

School of Computer Science, Fudan University, Shanghai, 200433, China
Nini Wang, Jian Yang, Zhihui Lu & Xiaoyan Li
Engineering Research Center of Cyber Security, Auditing and Monitoring, Ministry of Education, Shanghai, 200433, China
Jie Wu

Corresponding author

Correspondence to Zhihui Lu.

Editor information

Editors and Affiliations

Guangzhou University, Guangzhou, China
Guojun Wang
North China University of Technology, Beijing, China
Yanbo Han
University of Murcia, Murcia, Spain
Gregorio Martínez Pérez

About this paper

Cite this paper

Wang, N., Yang, J., Lu, Z., Li, X., Wu, J. (2016). Comparison and Improvement of Hadoop MapReduce Performance Prediction Models in the Private Cloud. In: Wang, G., Han, Y., Martínez Pérez, G. (eds) Advances in Services Computing. APSCC 2016. Lecture Notes in Computer Science, vol 10065. Springer, Cham. https://doi.org/10.1007/978-3-319-49178-3_6

DOI: https://doi.org/10.1007/978-3-319-49178-3_6
Published: 10 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49177-6
Online ISBN: 978-3-319-49178-3
eBook Packages: Computer Science, Computer Science (R0)
