Abstract
MapReduce is a programming model for processing massive amounts of data in cloud computing environments. It processes data in two phases and must transfer intermediate data among computers between the phases. MapReduce lets programmers aggregate intermediate data with a function called a combiner before transferring it. Because the choice of whether to use a combiner is left to programmers, MapReduce risks performance degradation: aggregating intermediate data benefits some applications but harms others. We propose the Adaptive Combiner for MapReduce (ACMR), which lets MapReduce automatically and adaptively obtain better performance without any intervention by programmers. In experiments on seven applications, MapReduce with ACMR achieves performance comparable to that of the system that is optimal for each application.
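To make the role of a combiner concrete, the following is a minimal single-process sketch in Python of a word-count job with and without local aggregation. It is not ACMR and not Hadoop code; the function names (`map_phase`, `combine`, `reduce_phase`) and the two-node input are hypothetical and chosen only for illustration. It counts how many intermediate records each map node would have to transfer.

```python
from collections import Counter

def map_phase(lines):
    """Emit one (word, 1) pair per word, as a word-count mapper would."""
    return [(word, 1) for line in lines for word in line.split()]

def combine(pairs):
    """Sum the counts per key locally, before the pairs leave the map node."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())

def reduce_phase(pairs_from_all_nodes):
    """Sum the counts per key over the records received from every node."""
    totals = Counter()
    for key, value in pairs_from_all_nodes:
        totals[key] += value
    return dict(totals)

# Two hypothetical map nodes (assumed data, not from the paper).
node_inputs = [
    ["a b a a", "b a"],  # many repeated keys: combining shrinks the output
    ["c d e f"],         # all keys distinct: combining removes nothing
]

raw = [map_phase(lines) for lines in node_inputs]
combined = [combine(pairs) for pairs in raw]

# Number of intermediate records that would cross the network.
records_without = sum(len(pairs) for pairs in raw)
records_with = sum(len(pairs) for pairs in combined)

result_without = reduce_phase(p for node in raw for p in node)
result_with = reduce_phase(p for node in combined for p in node)

print(records_without, records_with)
print(result_with == result_without)
```

In this sketch the first node benefits from combining because its keys repeat, while the second node spends the extra aggregation pass without transferring fewer records. The final result is the same either way, which is why the choice affects only performance.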
Acknowledgments
This paper presents results of Project NSC 100-2628-E-262-001-MY2, supported by the National Science Council of Taiwan. We appreciate the cooperation of our colleagues at Tamkang University. We thank Lunghwa University of Science and Technology for strongly supporting our cloud computing research and kindly providing experimental equipment. We also thank the editor and reviewers for their valuable comments on this paper.
Cite this article
Huang, TC., Chu, KC., Lee, WT. et al. Adaptive Combiner for MapReduce on cloud computing. Cluster Comput 17, 1231–1252 (2014). https://doi.org/10.1007/s10586-014-0362-3
DOI: https://doi.org/10.1007/s10586-014-0362-3