Abstract
Hadoop/MapReduce has emerged as a de facto programming framework for exploiting cloud-computing resources. Hadoop has many configuration parameters, some of which are crucial to the performance of MapReduce jobs. In practice, these parameters are usually left at their defaults or set to inappropriate values, which severely limits system performance (e.g., execution time). It is therefore essential, but also challenging, to tune these parameters automatically to optimize MapReduce job performance. In this paper, we propose an automatic MapReduce configuration optimization framework named MR-COF. The framework monitors and analyzes runtime behavior and uses a cost-based model to predict MapReduce job performance. In addition, we design a genetic search algorithm that iteratively tunes the parameters to find the best configuration. Testbed-based experimental results show that MR-COF improves average MapReduce job performance by 35% compared to the default configuration.
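The genetic search described in the abstract can be illustrated with a minimal sketch. This is not the MR-COF implementation: the parameter ranges, the `predicted_cost` function (a synthetic stand-in for the cost-based prediction model), and all function names are assumptions chosen for illustration. Only the three property names are real Hadoop 1.x settings.

```python
import random

# Property names are real Hadoop 1.x settings; the ranges are
# illustrative assumptions, not values taken from the paper.
SEARCH_SPACE = {
    "io.sort.mb": (50, 400),
    "io.sort.factor": (10, 100),
    "mapred.reduce.tasks": (1, 64),
}


def predicted_cost(config):
    """Synthetic stand-in for a cost-based performance model (lower is better).

    Assumption: it simply penalizes distance from an arbitrary "ideal" point.
    """
    ideal = {"io.sort.mb": 200, "io.sort.factor": 50, "mapred.reduce.tasks": 16}
    return sum(
        ((config[k] - ideal[k]) / (hi - lo)) ** 2
        for k, (lo, hi) in SEARCH_SPACE.items()
    )


def random_config(rng):
    # Draw every parameter uniformly from its range.
    return {k: rng.randint(lo, hi) for k, (lo, hi) in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    # Uniform crossover: each parameter is inherited from one of the parents.
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SEARCH_SPACE}


def mutate(config, rng, rate=0.2):
    # With probability `rate`, resample a parameter within its range.
    return {
        k: (rng.randint(*SEARCH_SPACE[k]) if rng.random() < rate else v)
        for k, v in config.items()
    }


def genetic_search(generations=30, population_size=20, elite=4, seed=0):
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(population_size)]
    for _ in range(generations):
        # Rank by predicted cost and keep the best `elite` configurations.
        population.sort(key=predicted_cost)
        parents = population[:elite]
        children = []
        while len(children) < population_size - elite:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return min(population, key=predicted_cost)


if __name__ == "__main__":
    best = genetic_search()
    print(best, predicted_cost(best))
```

Because the elite configurations are carried over unchanged, the best predicted cost never gets worse from one generation to the next. In a real tuner, `predicted_cost` would be replaced by a model fitted to profiled job behavior, so that candidate configurations can be ranked without running the job for each one.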
Acknowledgments
This work is supported by the National Natural Science Foundation of China under grant Nos. 61272470, 61305087, 61402425, 61440060, 41404076 and 61501412; the China Postdoctoral Science Foundation funded project under grant No. 2014M562086; the key projects of Hubei Provincial Natural Science Foundation under grant No. 2015CFA065; and the Fundamental Research Funds for the Central Universities, China University of Geosciences, Wuhan under grant No. CUGL130233.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Liu, C., Zeng, D., Yao, H., Hu, C., Yan, X., Fan, Y. (2015). MR-COF: A Genetic MapReduce Configuration Optimization Framework. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol 9531. Springer, Cham. https://doi.org/10.1007/978-3-319-27140-8_24
DOI: https://doi.org/10.1007/978-3-319-27140-8_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27139-2
Online ISBN: 978-3-319-27140-8
eBook Packages: Computer Science (R0)