The Design of the Efficient Theta-Join in Map-Reduce Environment

Penar, Maciej; Wilczek, Artur

doi:10.1007/978-3-319-34099-9_15


Part of the book series: Communications in Computer and Information Science (CCIS, volume 613)


Abstract

When analysing data, users often want to join the input data sources. At first glance, the Map-Reduce programming model seems to limit the developer to equi-joins, because these can be implemented easily with the grouping operation. However, some techniques have been developed to support joins with non-equality conditions. In this paper, we propose an enhancement to cross-join-based algorithms, such as Strict-Even Join, that handles the equality and non-equality conditions separately.
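The abstract contrasts two cases. Equi-joins come almost for free from Map-Reduce's grouping (shuffle) step, while theta-joins need a cross-join-based approach. The Python sketch below illustrates both ideas in memory. It is an illustrative sketch under stated assumptions, not the authors' algorithm, and it does not implement Strict-Even Join.

- `map_reduce` is a hypothetical stand-in for one Map-Reduce job: map, group by key, reduce.
- `theta_join` spreads the cross product over a hypothetical `rows * cols` grid of reducers. Each R tuple picks a random row and is replicated along it, and each S tuple picks a random column and is replicated down it. Every (R, S) pair therefore meets in exactly one reducer, which evaluates the predicate.
- Passing `r_key`/`s_key` is our assumed reading of "handling equality and non-equality conditions separately". The equality attributes join the shuffle key, the grid is built per equality group, and the reducer evaluates only the non-equality predicate.
- With `rows=cols=1` and an always-true predicate, the same code reduces to a plain grouping-based equi-join.

```python
import random
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Hypothetical in-memory stand-in for one Map-Reduce job:
    map every record, group emitted values by key (the shuffle), reduce each group."""
    groups = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):
            groups[key].append(value)
    results = []
    for key, values in groups.items():
        results.extend(reducer(key, values))
    return results


def pair_up(values, theta):
    """Reducer body: cross the R and S tuples that met in this group
    and keep only the pairs that satisfy the predicate theta."""
    left = [t for tag, t in values if tag == "R"]
    right = [t for tag, t in values if tag == "S"]
    return [(a, b) for a in left for b in right if theta(a, b)]


def theta_join(r, s, theta, r_key=None, s_key=None, rows=2, cols=2, seed=0):
    """Grid-partitioned cross-join evaluating theta (illustrative sketch).
    If r_key/s_key are given, the equality attributes become part of the
    shuffle key, so the grid is built separately inside each equality group."""
    rng = random.Random(seed)
    tagged = [("R", t) for t in r] + [("S", t) for t in s]

    def mapper(rec):
        tag, t = rec
        if tag == "R":
            eq = r_key(t) if r_key is not None else None
            i = rng.randrange(rows)  # random grid row for this R tuple
            for j in range(cols):    # replicate it to every column of that row
                yield (eq, i, j), rec
        else:
            eq = s_key(t) if s_key is not None else None
            j = rng.randrange(cols)  # random grid column for this S tuple
            for i in range(rows):    # replicate it to every row of that column
                yield (eq, i, j), rec

    return map_reduce(tagged, mapper, lambda key, values: pair_up(values, theta))


# Example: R(k, a) JOIN S(k, b) ON R.k = S.k AND R.a < S.b
R = [(1, 10), (1, 25), (2, 5)]
S = [(1, 20), (2, 1), (2, 7)]


def same_key(x, y):
    return x[0] == y[0]


def a_less_b(x, y):
    return x[1] < y[1]


def first(t):
    return t[0]


# 1) Both conditions evaluated inside the grid reducers (pure cross-join approach).
combined = theta_join(R, S, lambda x, y: same_key(x, y) and a_less_b(x, y))
# 2) Equality handled by the shuffle key; reducers evaluate only R.a < S.b.
split = theta_join(R, S, a_less_b, r_key=first, s_key=first)
# 3) Plain grouping-based equi-join: one grid cell per key, always-true predicate.
equi = theta_join(R, S, lambda x, y: True, r_key=first, s_key=first, rows=1, cols=1)

print(sorted(split))  # [((1, 10), (1, 20)), ((2, 5), (2, 7))]
```

Both `combined` and `split` return the same pairs. The difference is cost:

- In the pure grid, every tuple is shipped to `cols` or `rows` reducers, and each reducer tests every R-S pair it receives.
- With the equality part in the shuffle key, tuples with different keys never meet, so the reducers typically compare fewer pairs.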



Author information

Authors and Affiliations

Faculty of Computer Science and Management, Wroclaw University of Technology, Wybrzeże Wyspiańskiego 27, 50-370, Wrocław, Poland
Maciej Penar & Artur Wilczek


Corresponding authors

Correspondence to Maciej Penar or Artur Wilczek.

Editor information

Editors and Affiliations

Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Stanisław Kozielski, Dariusz Mrozek, Paweł Kasprowski, Bożena Małysiak-Mrozek & Daniel Kostrzewa


About this paper

Cite this paper

Penar, M., Wilczek, A. (2016). The Design of the Efficient Theta-Join in Map-Reduce Environment. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery. BDAS 2015, BDAS 2016. Communications in Computer and Information Science, vol 613. Springer, Cham. https://doi.org/10.1007/978-3-319-34099-9_15


DOI: https://doi.org/10.1007/978-3-319-34099-9_15
Published: 28 April 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-34098-2
Online ISBN: 978-3-319-34099-9
eBook Packages: Computer Science, Computer Science (R0)
