Adaptive Combiner for MapReduce on cloud computing

Cluster Computing

Abstract

MapReduce is a programming model for processing massive amounts of data in cloud computing. It processes data in two phases and must transfer intermediate data among computers between the phases. MapReduce allows programmers to aggregate intermediate data with a function named a combiner before transferring it. Leaving the choice of using a combiner to programmers risks performance degradation, because aggregating intermediate data benefits some applications but harms others. MapReduce can now work with our proposal, the Adaptive Combiner for MapReduce (ACMR), to obtain better performance automatically and smartly, without any intervention by programmers. In experiments on seven applications, MapReduce with ACMR achieves performance comparable to that of the system that is optimal for each application.
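As a rough illustration of the combiner idea described in the abstract, the following Python sketch simulates a word-count job in a single process and counts how many intermediate pairs would be transferred with and without a combiner. It is not the ACMR mechanism and not a real MapReduce API; the function names (`map_words`, `combine`, `reduce_counts`, `run_job`) and the sample inputs are hypothetical and chosen only for this example.

```python
from collections import defaultdict

def map_words(line):
    """Map phase: emit one (word, 1) pair per word."""
    return [(word, 1) for word in line.split()]

def combine(pairs):
    """Combiner: sum values per key locally, before any transfer."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

def reduce_counts(all_pairs):
    """Reduce phase: sum values per key across all mappers."""
    totals = defaultdict(int)
    for key, value in all_pairs:
        totals[key] += value
    return dict(totals)

def run_job(splits, use_combiner):
    """Return (result, number of intermediate pairs transferred)."""
    transferred = []
    for split in splits:  # each split stands in for one mapper's input
        pairs = [p for line in split for p in map_words(line)]
        if use_combiner:
            pairs = combine(pairs)
        transferred.extend(pairs)
    return reduce_counts(transferred), len(transferred)

# Hypothetical inputs: one with many repeated keys, one with none.
repetitive = [["a a a b", "a b b"], ["b b a"]]
distinct = [["a b c"], ["d e f"]]

for name, splits in [("repetitive", repetitive), ("distinct", distinct)]:
    plain, n_plain = run_job(splits, use_combiner=False)
    combined, n_combined = run_job(splits, use_combiner=True)
    assert plain == combined  # the combiner must not change the result
    print(name, n_plain, n_combined)
```

On the repetitive input the combiner shrinks the transferred data, while on the distinct input it transfers the same number of pairs and only adds local work. This is the kind of application-dependent trade-off the abstract refers to.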

Acknowledgments

This paper presents results of Project NSC 100-2628-E-262-001-MY2, supported by the National Science Council of Taiwan. We appreciate the cooperation of our colleagues at Tamkang University. We thank Lunghwa University of Science and Technology for strongly supporting our cloud computing research and kindly providing experimental equipment. We also thank the editor and reviewers for their valuable comments on this paper.

Author information

Corresponding author

Correspondence to Tzu-Chi Huang.

About this article

Cite this article

Huang, TC., Chu, KC., Lee, WT. et al. Adaptive Combiner for MapReduce on cloud computing. Cluster Comput 17, 1231–1252 (2014). https://doi.org/10.1007/s10586-014-0362-3

  • DOI: https://doi.org/10.1007/s10586-014-0362-3
