MR-COF: A Genetic MapReduce Configuration Optimization Framework

Liu, Chao; Zeng, Deze; Yao, Hong; Hu, Chengyu; Yan, Xuesong; Fan, Yuanyuan

doi:10.1007/978-3-319-27140-8_24

Conference paper
First Online: 16 December 2015

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9531)

Abstract

Hadoop/MapReduce has emerged as a de facto programming framework for exploiting cloud-computing resources. Hadoop exposes many configuration parameters, some of which are crucial to the performance of MapReduce jobs. In practice, these parameters are usually left at their default values or set to inappropriate ones, which severely limits system performance (e.g., job execution time). It is therefore essential, but also challenging, to tune these parameters automatically so as to optimize MapReduce job performance. In this paper, we propose an automatic MapReduce configuration optimization framework named MR-COF. By monitoring and analyzing the runtime behavior of jobs, the framework uses a cost-based performance model to predict MapReduce job performance. In addition, we design a genetic search algorithm that iteratively tunes the parameters to find the best configuration. Testbed-based experimental results show that MR-COF improves average MapReduce job performance by 35% compared with the default configuration.
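The abstract names two cooperating components: a cost-based model that predicts job performance, and a genetic search that iteratively tunes parameters against that prediction. The sketch below shows how such a loop could fit together, assuming a hand-written toy cost model in place of MR-COF's profile-driven one (the paper's actual model, parameter set and genetic operators are not given here). The four parameter names are real Hadoop 2.x properties with their stock defaults; their candidate values, every constant in `predict_cost`, and the choice of tournament selection, uniform crossover and elitism are illustrative assumptions.

```python
import math
import random

# Real Hadoop 2.x property names; the candidate values are illustrative
# assumptions, not the search space used by MR-COF.
SORT_MB = "mapreduce.task.io.sort.mb"
SORT_FACTOR = "mapreduce.task.io.sort.factor"
REDUCES = "mapreduce.job.reduces"
COMPRESS = "mapreduce.map.output.compress"

SPACE = {
    SORT_MB: list(range(50, 401, 50)),
    SORT_FACTOR: [5, 10, 20, 40, 80],
    REDUCES: [1, 2, 4, 8, 16, 32],
    COMPRESS: [False, True],
}
DEFAULT = {SORT_MB: 100, SORT_FACTOR: 10, REDUCES: 1, COMPRESS: False}


def predict_cost(cfg, maps=64, slots=16, map_out_mb=200.0):
    """Hypothetical toy cost model: predicted job time in seconds.

    Stands in for a profile-driven, cost-based model; every constant is assumed.
    """
    # Number of spill files per map task, assuming an 80% spill threshold.
    spills = math.ceil(map_out_mb / (0.8 * cfg[SORT_MB]))
    # Merge passes needed to reduce the spill files to a single run.
    passes, runs = 0, spills
    while runs > 1:
        runs = math.ceil(runs / cfg[SORT_FACTOR])
        passes += 1
    compress = cfg[COMPRESS]
    out_mb = map_out_mb * (0.5 if compress else 1.0)  # assumed 2:1 compression
    map_task = (10.0 + 0.5 * spills + 0.01 * out_mb * passes
                + (3.0 if compress else 0.0))  # assumed CPU cost of compression
    map_phase = math.ceil(maps / slots) * map_task
    # Shuffle volume is split across reducers; each reducer adds fixed overhead.
    reduce_phase = 0.05 * maps * out_mb / cfg[REDUCES] + 2.0 * cfg[REDUCES]
    return map_phase + reduce_phase


def genetic_search(pop_size=20, generations=30, p_mut=0.1, elite=2, seed=0):
    """Minimise predict_cost with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    keys = list(SPACE)
    # Seed the population with the default configuration plus random ones.
    pop = [dict(DEFAULT)] + [
        {k: rng.choice(SPACE[k]) for k in keys} for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        pop.sort(key=predict_cost)
        nxt = [dict(c) for c in pop[:elite]]  # elitism: keep the best unchanged
        while len(nxt) < pop_size:
            # Tournament selection (size 3) of two parents.
            a = min(rng.sample(pop, 3), key=predict_cost)
            b = min(rng.sample(pop, 3), key=predict_cost)
            # Uniform crossover, then per-gene random-reset mutation.
            child = {k: (a if rng.random() < 0.5 else b)[k] for k in keys}
            for k in keys:
                if rng.random() < p_mut:
                    child[k] = rng.choice(SPACE[k])
            nxt.append(child)
        pop = nxt
    return min(pop, key=predict_cost)


if __name__ == "__main__":
    best = genetic_search()
    base, tuned = predict_cost(DEFAULT), predict_cost(best)
    print(best)
    print(f"predicted: default {base:.1f}s, tuned {tuned:.1f}s, "
          f"{100 * (base - tuned) / base:.1f}% lower")
```

Because every candidate is scored by the model rather than by running a real job, the search can evaluate hundreds of configurations without touching the cluster; the elitist step guarantees the best predicted cost never gets worse than the default's. The printed figures come from the toy model only and say nothing about the 35% improvement reported in the paper.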

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grant Nos. 61272470, 61305087, 61402425, 61440060, 41404076 and 61501412; the China Postdoctoral Science Foundation under Grant No. 2014M562086; the Key Project of the Hubei Provincial Natural Science Foundation under Grant No. 2015CFA065; and the Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan), under Grant No. CUGL130233.

Author information

Authors and Affiliations

Hubei Key Laboratory of Intelligent Geo-Information Processing, China University of Geosciences, Wuhan, 430074, China
Chao Liu, Deze Zeng, Hong Yao, Chengyu Hu, Xuesong Yan & Yuanyuan Fan
China Services Computing Technology and System Lab and Cluster and Grid Computing Lab, Huazhong University of Science and Technology, Wuhan, 430074, China
Chao Liu

Corresponding author

Correspondence to Deze Zeng.

Editor information

Editors and Affiliations

Central South University, Changsha, China
Guojun Wang
The University of Sydney, Sydney, New South Wales, Australia
Albert Zomaya
University of Murcia, Murcia, Murcia, Spain
Gregorio Martinez
Hunan University, Changsha, China
Kenli Li

About this paper

Cite this paper

Liu, C., Zeng, D., Yao, H., Hu, C., Yan, X., Fan, Y. (2015). MR-COF: A Genetic MapReduce Configuration Optimization Framework. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol 9531. Springer, Cham. https://doi.org/10.1007/978-3-319-27140-8_24

DOI: https://doi.org/10.1007/978-3-319-27140-8_24
Published: 16 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27139-2
Online ISBN: 978-3-319-27140-8
eBook Packages: Computer Science, Computer Science (R0)