HadoopM: A Message-Enabled Data Processing System on Large Clusters

Pan, Wei; Li, Zhanhuai; Suo, Bo; Wang, Zhuo

doi:10.1007/978-3-662-43984-5_18

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8505)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

MapReduce, a popular platform for solving embarrassingly parallel problems, has been used extensively on large commodity clusters. However, constrained by the embarrassingly parallel assumption, some computation patterns are not easy to express in MapReduce, and in some cases, such as iteration and map-phase filtering from a holistic perspective, performance and efficiency cannot be achieved without communication between tasks. This paper presents HadoopM, a message-enhanced version of the Hadoop MapReduce architecture that breaks the key embarrassingly parallel assumption and can execute MR jobs in a more efficient and elegant way. HadoopM allows user-defined messages to be passed between mappers or reducers through two message passing mechanisms, lightweight and heavyweight, and the system supports both asynchronous and synchronous message passing. HadoopM retains the scalability and fault tolerance of Hadoop and is binary compatible with Hadoop MapReduce. Our experimental results demonstrate the superiority of the modified version over the original Hadoop MapReduce on a range of algorithms. In some cases, such as PageRank and Skyline, HadoopM improves job performance by up to 50%.
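
To make the idea of map-phase filtering through messages more concrete, the following is a minimal single-process sketch of a skyline computation. It is not HadoopM code: the abstract does not describe the system's API, so the names MessageBoard, mapper, reducer, and run are hypothetical, the mappers run one after another instead of concurrently, and smaller values are assumed to be better in every dimension. The sketch only illustrates why letting mappers exchange a few strong points can reduce the data sent to the reducer without changing the result.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no worse in every dimension, strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def local_skyline(points: List[Point]) -> List[Point]:
    """Keep the points that no other point in the same list dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


class MessageBoard:
    """Hypothetical stand-in for a message channel shared by mappers."""

    def __init__(self) -> None:
        self._messages: List[Point] = []

    def publish(self, point: Point) -> None:
        self._messages.append(point)

    def read(self) -> List[Point]:
        return list(self._messages)


def mapper(partition: List[Point], board: Optional[MessageBoard]) -> List[Point]:
    """Compute a local skyline, optionally pruned by points from other mappers."""
    candidates = local_skyline(partition)
    if board is None:
        # Plain MapReduce: mappers are isolated, so nothing more can be pruned.
        return candidates
    received = board.read()
    # A point dominated by any real data point cannot be in the global skyline.
    survivors = [p for p in candidates if not any(dominates(q, p) for q in received)]
    if survivors:
        # Share one strong point (smallest coordinate sum) as a filter for others.
        board.publish(min(survivors, key=sum))
    return survivors


def reducer(map_outputs: List[List[Point]]) -> List[Point]:
    """Merge all mapper outputs and compute the global skyline."""
    merged = [p for output in map_outputs for p in output]
    return local_skyline(merged)


def run(partitions: List[List[Point]], use_messages: bool):
    """Return the sorted global skyline and the number of points sent to the reducer."""
    board = MessageBoard() if use_messages else None
    outputs = [mapper(part, board) for part in partitions]
    shuffled = sum(len(output) for output in outputs)
    return sorted(reducer(outputs)), shuffled


if __name__ == "__main__":
    parts = [[(1, 1), (2, 3)], [(3, 4), (5, 2), (4, 5)], [(6, 6), (0, 7)]]
    print(run(parts, use_messages=False))  # ([(0, 7), (1, 1)], 5)
    print(run(parts, use_messages=True))   # ([(0, 7), (1, 1)], 2)
```

In this toy input both runs return the same skyline, but the message-enabled run forwards 2 points to the reducer instead of 5. In a real cluster the mappers would run concurrently and messages would arrive asynchronously, so the amount of pruning would depend on timing; the result stays correct either way because a point is only discarded when an actual data point dominates it.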

This work is sponsored by the National Basic Research Program (973 Program) of China (No. 2012CB316203), the National Natural Science Foundation of China (Nos. 61033007, 61303037, 61332006), and the National High Technology Research and Development Program (863 Program) of China (No. 2012AA011004).

Author information

Authors and Affiliations

School of Computer Science and Technology, Northwestern Polytechnical University, Xi’an, 710072, China
Wei Pan, Zhanhuai Li, Bo Suo & Zhuo Wang
Guangdong Key Laboratory of Popular High Performance Computers, Shenzhen Key Laboratory of Service Computing and Applications, Shen’zhen, 518060, China
Wei Pan, Zhanhuai Li, Bo Suo & Zhuo Wang

Corresponding author

Correspondence to Wei Pan.

Editor information

Editors and Affiliations

Pohang University of Science and Technology (POSTECH), Pohang, Korea, Republic of (South Korea)
Wook-Shin Han
National University of Singapore, Singapore, Singapore
Mong Li Lee
Udayana University, Badung, Indonesia
Agus Muliantara
Udayana University, Badung, Indonesia
Ngurah Agus Sanjaya
Christian-Albrechts-Universität zu Kiel Institut für Informatik, Kiel, Germany
Bernhard Thalheim
Fudan University, Shanghai, China
Shuigeng Zhou

About this paper

Cite this paper

Pan, W., Li, Z., Suo, B., Wang, Z. (2014). HadoopM: A Message-Enabled Data Processing System on Large Clusters. In: Han, WS., Lee, M., Muliantara, A., Sanjaya, N., Thalheim, B., Zhou, S. (eds) Database Systems for Advanced Applications. DASFAA 2014. Lecture Notes in Computer Science, vol 8505. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43984-5_18

DOI: https://doi.org/10.1007/978-3-662-43984-5_18
Published: 11 July 2014
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-43983-8
Online ISBN: 978-3-662-43984-5
eBook Packages: Computer Science, Computer Science (R0)
