Abstract
When analysing data, users often want to join the input data sources. At first glance, the Map-Reduce programming model limits the developer to equi-joins, since these are easily implemented with the grouping operation. However, several techniques have been developed to support joins with non-equality conditions. In this paper, we propose an enhancement to cross-join based algorithms, such as Strict-Even Join, that handles the equality and non-equality conditions separately.
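As a rough illustration of why treating the two kinds of condition separately can help, the single-process Python sketch below groups rows by the equality key first (standing in for the map and shuffle phases) and evaluates the non-equality predicate only inside each group (standing in for the reduce phase). It is not the algorithm proposed in the paper and it is not Strict-Even Join; the function names `mixed_join` and `naive_cross_join`, the sample rows, and the `earlier` predicate are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def mixed_join(left, right, eq_key, theta):
    """Return pairs (l, r) with l[eq_key] == r[eq_key] and theta(l, r) true."""
    # "Map" + "shuffle": bucket rows of both inputs by the equality key.
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[row[eq_key]][0].append(row)
    for row in right:
        groups[row[eq_key]][1].append(row)

    # "Reduce": cross-join only inside each bucket, then filter by theta.
    result = []
    for l_rows, r_rows in groups.values():
        for l, r in product(l_rows, r_rows):
            if theta(l, r):
                result.append((l, r))
    return result


def naive_cross_join(left, right, eq_key, theta):
    """Baseline: full cross product, checking both conditions on every pair."""
    return [
        (l, r)
        for l, r in product(left, right)
        if l[eq_key] == r[eq_key] and theta(l, r)
    ]


# Hypothetical sample data: join on "cust" where the left timestamp is earlier.
orders = [{"cust": 1, "ts": 10}, {"cust": 1, "ts": 30}, {"cust": 2, "ts": 5}]
visits = [{"cust": 1, "ts": 20}, {"cust": 2, "ts": 1}, {"cust": 3, "ts": 7}]


def earlier(l, r):
    return l["ts"] < r["ts"]


pairs = mixed_join(orders, visits, "cust", earlier)
baseline = naive_cross_join(orders, visits, "cust", earlier)
```

In this toy input the grouped version compares only the three pairs that share a `cust` value, whereas the baseline inspects all nine pairs of the cross product; both return the same result.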
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Penar, M., Wilczek, A. (2016). The Design of the Efficient Theta-Join in Map-Reduce Environment. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery. BDAS 2015, BDAS 2016. Communications in Computer and Information Science, vol 613. Springer, Cham. https://doi.org/10.1007/978-3-319-34099-9_15
DOI: https://doi.org/10.1007/978-3-319-34099-9_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-34098-2
Online ISBN: 978-3-319-34099-9
eBook Packages: Computer Science, Computer Science (R0)