Abstract
Performance modeling of MapReduce applications that process large-scale data is an important problem for the optimization, evaluation, prediction, and resource scheduling of jobs on big data and cloud computing platforms. In this paper, we study the Hadoop distributed computing framework, which represents the current trend in Big Data solutions. Using the locally weighted linear regression (LWLR) and linear regression (LR) algorithms, we build three kinds of models, each based on different characteristics, to estimate the execution time of large-scale data applications running on Hadoop, and we compare and improve the three models. Experiments in several types of environments with several types of jobs show that all three models, built from small-scale data, perform very well at predicting the execution time and evaluating the performance of large-scale data applications.
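The abstract does not state which features, kernel, or bandwidth the three models use, so the following is only a minimal sketch of the two named techniques, assuming a single feature (input data size), an intercept term, and a Gaussian kernel with a hypothetical bandwidth `tau`. The function names and the data are illustrative assumptions, not the authors' models or measurements. LR fits one line to all samples with equal weight; LWLR refits a weighted line at every query point so that samples near the query dominate.

```python
import math
from typing import Sequence, Tuple


def weighted_linear_fit(xs: Sequence[float], ys: Sequence[float],
                        ws: Sequence[float]) -> Tuple[float, float]:
    """Closed-form weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    denom = sw * sxx - sx * sx
    if sw <= 0 or abs(denom) < 1e-12:
        raise ValueError("degenerate weights: cannot fit a line")
    b = (sw * sxy - sx * sy) / denom
    a = (sy - b * sx) / sw
    return a, b


def predict_lr(xs: Sequence[float], ys: Sequence[float], x0: float) -> float:
    """Ordinary linear regression (LR): every sample gets weight 1."""
    a, b = weighted_linear_fit(xs, ys, [1.0] * len(xs))
    return a + b * x0


def predict_lwlr(xs: Sequence[float], ys: Sequence[float], x0: float,
                 tau: float) -> float:
    """LWLR: refit at the query point x0, weighting each sample by a
    Gaussian kernel exp(-(x - x0)^2 / (2 tau^2)); tau is a hypothetical
    bandwidth that must be tuned in practice."""
    ws = [math.exp(-((x - x0) ** 2) / (2.0 * tau ** 2)) for x in xs]
    a, b = weighted_linear_fit(xs, ys, ws)
    return a + b * x0


if __name__ == "__main__":
    # Synthetic, illustrative data (not measurements): input size in GB and
    # job execution time in seconds, generated as time = 30 + 12 * size.
    sizes_gb = [1.0, 2.0, 4.0, 8.0, 16.0]
    times_s = [30.0 + 12.0 * s for s in sizes_gb]
    # Predict the execution time of a larger 32 GB job from the small runs.
    print(predict_lr(sizes_gb, times_s, 32.0))
    print(predict_lwlr(sizes_gb, times_s, 32.0, tau=8.0))
```

On perfectly linear data both predictors agree; they diverge when execution time grows non-linearly with input size, which is where the local weighting of LWLR matters. Note that with a Gaussian kernel, a query point far outside the training range can drive all weights toward zero, which is why the fit raises an error on degenerate weights instead of silently dividing by zero.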
Acknowledgments
This work is supported by the Shanghai 2016 Innovation Action Project under Grant No. 16DZ1100200 (Data-trade-supporting Big data Testbed). This work is also supported by the National Natural Science Foundation of China (2016–2019) under Grant No. 61572137 (Multiple Clouds based CDN as a Service Key Technology Research), the Shanghai 2015 Innovation Action Project under Grant No. 1551110700 (New media-oriented Big data analysis and content delivery key technology and application), and the Fudan-Hitachi Innovative Software Technology Joint Project “Cloud Platform Design for Big data”.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Wang, N., Yang, J., Lu, Z., Li, X., Wu, J. (2016). Comparison and Improvement of Hadoop MapReduce Performance Prediction Models in the Private Cloud. In: Wang, G., Han, Y., Martínez Pérez, G. (eds) Advances in Services Computing. APSCC 2016. Lecture Notes in Computer Science, vol 10065. Springer, Cham. https://doi.org/10.1007/978-3-319-49178-3_6
DOI: https://doi.org/10.1007/978-3-319-49178-3_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49177-6
Online ISBN: 978-3-319-49178-3
eBook Packages: Computer Science, Computer Science (R0)