HIP: Information Passing for Optimizing Join-Intensive Data Processing Workloads on Hadoop

Hong, Seokyong; Anyanwu, Kemafor

doi:10.1007/978-3-642-32597-7_33

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7447)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

Hadoop-based data processing platforms translate join-intensive queries into multiple “jobs” (MapReduce cycles). Such multi-job workflows move a significant amount of data through the disk, network, and memory fabric of a Hadoop cluster, which can hurt performance and scalability. Techniques that minimize the size of intermediate results are therefore useful in this context. In this paper, we present an information passing technique (HIP) that can minimize the size of intermediate data on Hadoop-based data processing platforms.
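
The abstract does not describe how HIP works internally, so the following is only a minimal sketch of the general idea behind information passing, not the authors' algorithm: a compact summary of the join keys seen on one input is passed to the other input, so that rows that cannot join are dropped before they become intermediate data. All names here (`build_key_summary`, `prune`, `join`, and the sample relations) are hypothetical, and a plain in-memory set stands in for whatever summary structure a real system would ship between jobs.

```python
from typing import Dict, Iterable, List, Set, Tuple

Row = Tuple[str, str]  # (join_key, payload); a hypothetical row layout


def build_key_summary(rows: Iterable[Row]) -> Set[str]:
    """Collect the join keys present in one input (the 'passed' information)."""
    return {key for key, _ in rows}


def prune(rows: Iterable[Row], summary: Set[str]) -> List[Row]:
    """Drop rows whose key cannot match, before they are shuffled or stored."""
    return [row for row in rows if row[0] in summary]


def join(left: Iterable[Row], right: Iterable[Row]) -> List[Tuple[str, str, str]]:
    """Plain hash join on the key, used here to check that pruning is safe."""
    index: Dict[str, List[str]] = {}
    for key, payload in right:
        index.setdefault(key, []).append(payload)
    return [(key, lp, rp) for key, lp in left for rp in index.get(key, [])]


# Hypothetical sample data: only u1 and u2 appear on both sides.
orders = [("u1", "o1"), ("u2", "o2"), ("u9", "o3")]
users = [("u1", "alice"), ("u2", "bob"), ("u3", "carol"), ("u4", "dave")]

summary = build_key_summary(orders)   # information gathered from one input
pruned_users = prune(users, summary)  # smaller intermediate input
result = join(orders, pruned_users)   # same answer as joining the full input
```

In this sketch the pruned input has two rows instead of four, while the join result is unchanged. In a multi-job workflow the same kind of filtering would apply to data written between jobs, which is where the savings the abstract refers to would arise.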

Author information

Authors and Affiliations

Department of Computer Science, North Carolina State University, Raleigh, USA
Seokyong Hong & Kemafor Anyanwu

Editor information

Editors and Affiliations

Marriott School, Brigham Young University, 784 TNRB, 84602, Provo, UT, USA
Stephen W. Liddle
Software Competence Center Hagenberg, Softwarepark 21, 4232, Hagenberg, Austria
Klaus-Dieter Schewe
Institute of Software Technology & Interactive Systems, Vienna University of Technology, Favoritenstr. 9-11/188, 1040, Vienna, Austria
A Min Tjoa
School of Information Technology and Electrical Engineering, University of Queensland, 4072, Brisbane, QLD, Australia
Xiaofang Zhou

About this paper

Cite this paper

Hong, S., Anyanwu, K. (2012). HIP: Information Passing for Optimizing Join-Intensive Data Processing Workloads on Hadoop. In: Liddle, S.W., Schewe, KD., Tjoa, A.M., Zhou, X. (eds) Database and Expert Systems Applications. DEXA 2012. Lecture Notes in Computer Science, vol 7447. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32597-7_33

DOI: https://doi.org/10.1007/978-3-642-32597-7_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32596-0
Online ISBN: 978-3-642-32597-7
eBook Packages: Computer Science, Computer Science (R0)