Abstract
Many emerging applications collect large amounts of uncertain data from multiple sources in a distributed manner. Previous work on querying uncertain data in distributed environments has focused only on ranking and skyline queries; join queries have not been addressed, despite their importance in databases. In this paper, we address the distributed probabilistic threshold join query, which retrieves from distributed sites the results that satisfy the join condition and whose combined probabilities meet the threshold requirement. We propose a new kind of Bloom filter, called the Probability Bloom Filter (PBF), to represent sets with a probabilistic attribute, and design a PBF-based Bloomjoin algorithm that executes distributed probabilistic threshold join queries in a communication-efficient manner. Furthermore, we provide a theoretical analysis of the network cost of our algorithm and validate it by simulation. The experimental results show that, in most scenarios, our algorithm reduces network cost compared with the original Bloomjoin algorithm.
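The abstract does not spell out how a PBF is laid out, so the following is only a minimal sketch of the general idea, not the authors' design. It assumes that each filter cell stores the largest probability of any key hashed to it, that tuple probabilities are independent (so the combined probability of a join result is a product), and that relations are lists of (key, probability) pairs. The names ProbabilityBloomFilter, upper_bound, and threshold_bloomjoin are hypothetical.

```python
import hashlib


class ProbabilityBloomFilter:
    """Illustrative filter: each cell keeps the max probability hashed to it.

    This layout is an assumption made for the sketch; the paper's actual
    PBF structure is not described in the abstract.
    """

    def __init__(self, num_cells, num_hashes):
        self.cells = [0.0] * num_cells  # 0.0 means "no key hashed here"
        self.num_hashes = num_hashes

    def _positions(self, key):
        # Derive num_hashes cell indices from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.cells)

    def add(self, key, prob):
        for pos in self._positions(key):
            self.cells[pos] = max(self.cells[pos], prob)

    def upper_bound(self, key):
        # Never smaller than the true probability of an inserted key;
        # 0.0 proves the key was never inserted.
        return min(self.cells[pos] for pos in self._positions(key))


def threshold_bloomjoin(r_tuples, s_tuples, threshold,
                        num_cells=1024, num_hashes=3):
    """Simulate both sites in one process and return (shipped, results)."""
    # Site holding R: summarise join keys and probabilities in a filter.
    pbf = ProbabilityBloomFilter(num_cells, num_hashes)
    for key, prob in r_tuples:
        pbf.add(key, prob)

    # Site holding S: ship only tuples that could still reach the threshold
    # (independence assumed, so the combined probability is a product).
    shipped = [(key, prob) for key, prob in s_tuples
               if prob * pbf.upper_bound(key) >= threshold]

    # Site holding R: exact join on the shipped tuples, exact threshold check.
    results = [(r_key, r_prob * s_prob)
               for r_key, r_prob in r_tuples
               for s_key, s_prob in shipped
               if r_key == s_key and r_prob * s_prob >= threshold]
    return shipped, results


if __name__ == "__main__":
    r = [("a", 0.9), ("b", 0.3)]
    s = [("a", 0.8), ("b", 0.9), ("c", 0.95)]
    shipped, results = threshold_bloomjoin(r, s, threshold=0.5)
    print("shipped:", shipped)
    print("results:", results)
```

Because a cell value never underestimates the probability of a key inserted into it, the pruning step can produce false positives (tuples shipped unnecessarily) but no false negatives, which is the property that makes a Bloomjoin-style filter safe to use before the exact join.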
This work was supported by the National Natural Science Foundation of China (NSFC) under grant No. 61001070. We would like to thank the anonymous reviewers for their insightful comments.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Deng, L., Wang, F., Huang, B. (2011). Probabilistic Threshold Join over Distributed Uncertain Data. In: Wang, H., Li, S., Oyama, S., Hu, X., Qian, T. (eds) Web-Age Information Management. WAIM 2011. Lecture Notes in Computer Science, vol 6897. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23535-1_8
DOI: https://doi.org/10.1007/978-3-642-23535-1_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23534-4
Online ISBN: 978-3-642-23535-1
eBook Packages: Computer Science, Computer Science (R0)