Flexible and Adaptive Stream Join Algorithm

  • Conference paper
Web Technologies and Applications (APWeb 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9932)

Abstract

Flexibility and self-adaptivity are important for real-time join processing in a parallel shared-nothing environment. Join-Matrix is a high-performance model for distributed stream joins that supports arbitrary join predicates. It is resilient to data skew because it routes tuples to cells at random, with each stream corresponding to one side of the matrix. The design of the matrix partitioning scheme is a determining factor in maximizing system throughput while economizing computing resources. In this paper, we propose a novel flexible and adaptive scheme partitioning algorithm for the stream join operator, which ensures high throughput with economical resource usage by allocating resources on demand. Specifically, a lightweight scheme generator, which requires a sample of each stream's volume and the processing resource quota of each physical machine, generates a join scheme; a migration plan generator then decides how to migrate data among machines so as to minimize migration cost while ensuring correctness. Extensive experiments on different kinds of join workloads show that the approach is highly competitive with baseline systems on benchmark workloads.
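
To make the join-matrix idea concrete, the following is a minimal single-process sketch, not the paper's implementation. It assumes the usual join-matrix routing: a tuple from stream R is sent to one randomly chosen row and replicated across that row, and a tuple from stream S is sent to one randomly chosen column and replicated down that column, so every (r, s) pair meets in exactly one cell whatever the join predicate. The class `JoinMatrix` and the function `smallest_scheme` are hypothetical names introduced here for illustration. In particular, `smallest_scheme` is only a brute-force stand-in for a scheme generator: it assumes a per-cell load of |R|/n + |S|/m and a single uniform `capacity` per machine, neither of which is stated in the abstract.

```python
import random
from itertools import product


class JoinMatrix:
    """Toy n x m join-matrix; each cell stands in for one worker (illustrative only)."""

    def __init__(self, rows, cols, predicate, seed=None):
        self.rows, self.cols = rows, cols
        self.predicate = predicate  # arbitrary (theta) join condition on (r, s)
        self.rng = random.Random(seed)
        # Each cell keeps the R-tuples and S-tuples it has received so far.
        self.cells = {(i, j): ([], [])
                      for i, j in product(range(rows), range(cols))}

    def insert_r(self, r):
        """Send r to a random row, replicate it across the row, return new matches."""
        i = self.rng.randrange(self.rows)
        matches = []
        for j in range(self.cols):
            r_store, s_store = self.cells[(i, j)]
            matches.extend((r, s) for s in s_store if self.predicate(r, s))
            r_store.append(r)
        return matches

    def insert_s(self, s):
        """Send s to a random column, replicate it down the column, return new matches."""
        j = self.rng.randrange(self.cols)
        matches = []
        for i in range(self.rows):
            r_store, s_store = self.cells[(i, j)]
            matches.extend((r, s) for r in r_store if self.predicate(r, s))
            s_store.append(s)
        return matches


def smallest_scheme(r_volume, s_volume, capacity, max_cells):
    """Hypothetical scheme chooser: the (rows, cols) grid with the fewest cells
    such that the assumed per-cell load r_volume/rows + s_volume/cols fits
    within capacity. Returns None if no grid within max_cells fits."""
    best_key, best_scheme = None, None
    for n in range(1, max_cells + 1):
        for m in range(1, max_cells // n + 1):
            # r/n + s/m <= capacity, multiplied through by n*m to stay in integers.
            load_times_cells = r_volume * m + s_volume * n
            if load_times_cells <= capacity * n * m:
                key = (n * m, load_times_cells)
                if best_key is None or key < best_key:
                    best_key, best_scheme = key, (n, m)
    return best_scheme


if __name__ == "__main__":
    scheme = smallest_scheme(r_volume=900, s_volume=100, capacity=400, max_cells=16)
    rows, cols = scheme
    matrix = JoinMatrix(rows, cols, predicate=lambda r, s: r < s, seed=1)
    for r, s in zip([5, 1, 9, 3], [4, 8, 2, 6]):
        print(matrix.insert_r(r), matrix.insert_s(s))
```

Because the row or column is chosen at random rather than by key, a heavily skewed join attribute does not concentrate load on any one cell; the price is replication, which is why the shape of the grid matters for resource usage. The sketch also leaves out data migration between schemes, which the abstract treats as a separate step.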

Acknowledgments

This work is partially supported by the National High Technology Research and Development Program of China (863 Project) under grant No. 2015AA015307, the National Science Foundation of China under grants No. 61232002 and No. 61332006, and the National Science Foundation of Shanghai under grant No. 14ZR1412600. The corresponding author is Rong Zhang.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Fang, J., Wang, X., Zhang, R., Zhou, A. (2016). Flexible and Adaptive Stream Join Algorithm. In: Li, F., Shim, K., Zheng, K., Liu, G. (eds) Web Technologies and Applications. APWeb 2016. Lecture Notes in Computer Science, vol 9932. Springer, Cham. https://doi.org/10.1007/978-3-319-45817-5_1

  • DOI: https://doi.org/10.1007/978-3-319-45817-5_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45816-8

  • Online ISBN: 978-3-319-45817-5

  • eBook Packages: Computer Science, Computer Science (R0)
