Abstract

When analysing data, users often need to join the input data sources. At first glance, the Map-Reduce programming model appears to limit the developer to equi-joins, since these are easily implemented with the grouping operation. However, techniques have been developed to support joins with non-equality conditions. In this paper, we propose an enhancement to cross-join-based algorithms, such as Strict-Even Join, that handles the equality and non-equality conditions separately.
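The idea of treating the two kinds of condition separately can be illustrated with a small in-memory sketch. It is not the algorithm proposed in the paper and does not use Hadoop; it only imitates the map, shuffle, and reduce steps in plain Python. The function names (`map_phase`, `shuffle`, `reduce_phase`, `mixed_join`), the relation tags, and the sample tuples are all assumptions made for illustration. The equality condition is handled by grouping on a key, and the non-equality condition is then checked by a cross-join restricted to each group.

```python
from collections import defaultdict
from itertools import product


def map_phase(records, tag, key_fn):
    """Emit (equality_key, (tag, record)) pairs, as a mapper would."""
    for rec in records:
        yield key_fn(rec), (tag, rec)


def shuffle(pairs):
    """Group mapper output by key, imitating the shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, theta):
    """Cross-join only inside each equality group, then filter by theta."""
    for key in sorted(groups):
        values = groups[key]
        left = [rec for tag, rec in values if tag == "R"]
        right = [rec for tag, rec in values if tag == "S"]
        for r, s in product(left, right):
            if theta(r, s):
                yield r, s


def mixed_join(R, S, key_r, key_s, theta):
    """Join R and S on key_r(r) == key_s(s) AND theta(r, s)."""
    pairs = list(map_phase(R, "R", key_r)) + list(map_phase(S, "S", key_s))
    return list(reduce_phase(shuffle(pairs), theta))


# Hypothetical sample data: (id, amount) tuples.
R = [(1, 10), (1, 50), (2, 30)]
S = [(1, 20), (2, 30), (2, 40), (3, 5)]

# Equality condition: ids match. Non-equality condition: R.amount < S.amount.
result = mixed_join(R, S, lambda r: r[0], lambda s: s[0],
                    lambda r, s: r[1] < s[1])
print(result)
```

In this sketch the grouping step prunes every pair whose keys differ before the non-equality predicate is evaluated, so the cross-join runs over each group rather than over the full product of the two inputs.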

Author information

Corresponding authors

Correspondence to Maciej Penar or Artur Wilczek.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Penar, M., Wilczek, A. (2016). The Design of the Efficient Theta-Join in Map-Reduce Environment. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery. BDAS 2015, BDAS 2016. Communications in Computer and Information Science, vol 613. Springer, Cham. https://doi.org/10.1007/978-3-319-34099-9_15

  • DOI: https://doi.org/10.1007/978-3-319-34099-9_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-34098-2

  • Online ISBN: 978-3-319-34099-9

  • eBook Packages: Computer Science (R0)