Abstract
Distributed joins have gained importance in the past decade, mainly due to the increased number of data sources available on the Internet. In this work we extend Bloomjoin, the state-of-the-art algorithm for distributed joins, so that it uses database statistics to minimize network usage during query execution. We present four extensions of the algorithm and construct a query optimizer that selects the best extension for each query. Our theoretical analysis and experimental evaluation show significant network cost savings compared to the original Bloomjoin algorithm.
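To make the baseline concrete, here is a minimal Python sketch of the basic Bloomjoin idea. It is not the authors' four extensions or their optimizer. It assumes two sites: the site holding R builds a Bloom filter over R's join keys and ships it, the site holding S returns only the tuples that pass the filter, and R's site performs the exact join, discarding false positives. The cost model (`estimated_cost`, `choose_filter_bits`) is a hypothetical illustration of a statistics-driven choice. It assumes the optimizer knows the number of distinct R keys, |S|, how many S tuples actually match, and a fixed tuple size in bits, and it uses the standard false-positive approximation (1 - e^(-kn/m))^k. All function names and parameters are illustrative.

```python
import hashlib
import math


class BloomFilter:
    """Bit-array Bloom filter; k positions derived by double hashing one SHA-256 digest."""

    def __init__(self, m_bits: int, k_hashes: int) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key):
        digest = hashlib.sha256(repr(key).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key) -> bool:
        # False positives are possible; false negatives are not.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def bloomjoin(r_local, s_remote, key_r, key_s, m_bits, k_hashes):
    """Join R (site 1) with S (site 2); returns (result, number of S tuples shipped)."""
    # Site 1: summarize R's join keys in a Bloom filter (only m_bits cross the network).
    bf = BloomFilter(m_bits, k_hashes)
    for r in r_local:
        bf.add(key_r(r))
    # Site 2: send back only the S tuples whose key passes the filter.
    shipped = [s for s in s_remote if bf.might_contain(key_s(s))]
    # Site 1: an exact hash join on the shipped tuples removes the false positives.
    index = {}
    for r in r_local:
        index.setdefault(key_r(r), []).append(r)
    result = [(r, s) for s in shipped for r in index.get(key_s(s), [])]
    return result, len(shipped)


def false_positive_rate(m_bits, k, n_keys):
    # Standard approximation (1 - e^{-kn/m})^k.
    return (1.0 - math.exp(-k * n_keys / m_bits)) ** k


def estimated_cost(m_bits, n_r_keys, s_tuples, s_matching, tuple_bits):
    """Estimated network bits: the filter itself plus every S tuple expected to pass it."""
    k = max(1, round(m_bits / n_r_keys * math.log(2)))  # near-optimal integer k for this m
    fp = false_positive_rate(m_bits, k, n_r_keys)
    expected_shipped = s_matching + fp * (s_tuples - s_matching)
    return m_bits + expected_shipped * tuple_bits


def choose_filter_bits(n_r_keys, s_tuples, s_matching, tuple_bits, candidates):
    # Hypothetical statistics-driven choice: the filter size with the lowest estimated cost.
    return min(candidates,
               key=lambda m: estimated_cost(m, n_r_keys, s_tuples, s_matching, tuple_bits))
```

For example, with 500 distinct R keys, 1,000 S tuples of which 500 match, and 256-bit tuples, this cost model prefers a 4,096-bit filter over the other powers of two from 2^10 to 2^16. A smaller filter lets too many false positives through, while a larger one costs more to ship than it saves.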
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ramesh, S., Papapetrou, O., Siberski, W. (2008). Optimizing Distributed Joins with Bloom Filters. In: Parashar, M., Aggarwal, S.K. (eds) Distributed Computing and Internet Technology. ICDCIT 2008. Lecture Notes in Computer Science, vol 5375. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89737-8_15
DOI: https://doi.org/10.1007/978-3-540-89737-8_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89736-1
Online ISBN: 978-3-540-89737-8
eBook Packages: Computer Science, Computer Science (R0)