Abstract
This paper studies a new query on uncertain data, called the k-selection query. Given an uncertain dataset of N objects, where each object is associated with a preference score and a presence probability, a k-selection query returns k objects such that the expected score of the “best available” objects is maximized. This query is useful in application domains such as entity web search and decision making. Evaluating a k-selection query requires computing the expected best score (EBS) of candidate k-selection sets and searching for the selection set with the highest EBS. These operations are costly because the search space is extremely large. In this paper, we identify several important properties of k-selection queries, including EBS decomposition, query recursion, and EBS bounding. Based on these properties, we first present a dynamic programming (DP) algorithm that answers the query in O(k·N) time. We then propose a Bounding-and-Pruning (BP) algorithm that exploits effective search space pruning strategies to find the optimal selection without accessing all objects. We evaluate the DP and BP algorithms using both synthetic and real data. The results show that the proposed algorithms outperform the baseline approach by several orders of magnitude.
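To make the query concrete, the following Python sketch shows one way the EBS and an O(k·N) dynamic program could be written. It is an illustration under stated assumptions, not the authors' algorithm: objects are taken to be independent, scores are non-negative, the EBS of a selection is read as the expected score of its highest-scored present object (0 if none is present), and the objects are sorted by score first, which adds O(N log N). The function names `expected_best_score`, `k_selection_dp`, and `k_selection_brute_force` are hypothetical.

```python
from itertools import combinations


def expected_best_score(selection):
    """EBS of (score, probability) pairs, assuming independent presence."""
    ebs = 0.0
    all_better_absent = 1.0  # probability that every higher-scored object is absent
    for score, prob in sorted(selection, reverse=True):
        ebs += score * prob * all_better_absent
        all_better_absent *= 1.0 - prob
    return ebs


def k_selection_dp(objects, k):
    """Return (best EBS, chosen objects) by filling an (N+1) x (k+1) table."""
    objs = sorted(objects, reverse=True)  # descending score
    n = len(objs)
    k = min(k, n)
    # best[i][j]: highest EBS using at most j objects drawn from objs[i:]
    best = [[0.0] * (k + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        score, prob = objs[i]
        for j in range(1, k + 1):
            skip = best[i + 1][j]
            # If objs[i] is present it is the best; otherwise fall back to the rest.
            take = prob * score + (1.0 - prob) * best[i + 1][j - 1]
            best[i][j] = max(skip, take)
    # Walk the table forward to recover one optimal selection.
    chosen, j = [], k
    for i in range(n):
        if j == 0:
            break
        score, prob = objs[i]
        take = prob * score + (1.0 - prob) * best[i + 1][j - 1]
        if take >= best[i + 1][j]:
            chosen.append(objs[i])
            j -= 1
    return best[0][k], chosen


def k_selection_brute_force(objects, k):
    """Exhaustive check over all k-subsets; for validation on tiny inputs only."""
    return max(expected_best_score(c) for c in combinations(objects, k))


if __name__ == "__main__":
    data = [(10, 0.1), (9, 0.9), (8, 0.9), (5, 0.5)]  # made-up (score, probability) pairs
    value, chosen = k_selection_dp(data, 2)
    print(value, chosen)
    print(k_selection_brute_force(data, 2))
```

On this made-up input the highest-scored object is left out because its presence probability is low, which illustrates why simply picking the k highest scores is not enough. The exhaustive version enumerates every k-subset and is included only as a correctness check on small inputs.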
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, X., Ye, M., Xu, J., Tian, Y., Lee, WC. (2010). k-Selection Query over Uncertain Data. In: Kitagawa, H., Ishikawa, Y., Li, Q., Watanabe, C. (eds) Database Systems for Advanced Applications. DASFAA 2010. Lecture Notes in Computer Science, vol 5981. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12026-8_34
DOI: https://doi.org/10.1007/978-3-642-12026-8_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12025-1
Online ISBN: 978-3-642-12026-8
eBook Packages: Computer Science (R0)