Abstract
A top-K aggregate query ranks groups of tuples by their aggregate values and returns the K groups with the highest aggregates. Such queries are a crucial requirement in many domains, including information extraction, data integration, and sensor data processing. In this paper, we formulate top-K aggregate queries for the setting in which tuple scores are given as continuous probability distributions, and we present algorithms for answering them. To further improve performance, we develop pruning techniques and an adaptive strategy that avoid computing the exact aggregate values of groups that are guaranteed not to be in the top-K. Our experimental study demonstrates the efficiency of these techniques on several datasets with continuous attribute uncertainty.
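The abstract does not spell out the query semantics or the pruning rules, so the following is only an illustrative sketch under stated assumptions. It assumes that each tuple score is a continuous distribution with bounded support [lo, hi], that groups are ranked by their expected SUM, and that the "exact" per-tuple expectation is obtained by numerical integration (adaptive Simpson quadrature). The names `UncertainTuple`, `uniform`, `expected_score` and `top_k_expected_sum` are hypothetical. Pruning uses a generic threshold-style bound: a group's expected SUM cannot exceed the sum of its support maxima. Groups are visited in descending bound order, and evaluation stops once the next bound cannot beat the K-th best exact value. This is not necessarily the pruning technique or adaptive strategy developed in the paper.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class UncertainTuple:
    """Assumed model: a score that is a continuous random variable on [lo, hi]."""
    lo: float
    hi: float
    pdf: Callable[[float], float]


def uniform(lo: float, hi: float) -> UncertainTuple:
    """Hypothetical helper: a score uniformly distributed on [lo, hi] (requires lo < hi)."""
    density = 1.0 / (hi - lo)
    return UncertainTuple(lo, hi, lambda x: density)


def adaptive_simpson(f: Callable[[float], float], a: float, b: float,
                     eps: float = 1e-9, max_depth: int = 50) -> float:
    """Integrate f over [a, b] with recursive adaptive Simpson quadrature."""
    def simpson(a, b, fa, fm, fb):
        return (b - a) / 6.0 * (fa + 4.0 * fm + fb)

    def recurse(a, b, fa, fm, fb, whole, eps, depth):
        m = (a + b) / 2.0
        flm, frm = f((a + m) / 2.0), f((m + b) / 2.0)
        left = simpson(a, m, fa, flm, fm)
        right = simpson(m, b, fm, frm, fb)
        delta = left + right - whole
        # Accept when the two-panel estimate agrees with the one-panel estimate.
        if depth <= 0 or abs(delta) <= 15.0 * eps:
            return left + right + delta / 15.0  # Richardson correction
        return (recurse(a, m, fa, flm, fm, left, eps / 2.0, depth - 1)
                + recurse(m, b, fm, frm, fb, right, eps / 2.0, depth - 1))

    fa, fm, fb = f(a), f((a + b) / 2.0), f(b)
    return recurse(a, b, fa, fm, fb, simpson(a, b, fa, fm, fb), eps, max_depth)


def expected_score(t: UncertainTuple) -> float:
    """E[score] = integral of x * pdf(x) over the support (the costly 'exact' step)."""
    return adaptive_simpson(lambda x: x * t.pdf(x), t.lo, t.hi)


def top_k_expected_sum(groups: Dict[str, List[UncertainTuple]],
                       k: int) -> Tuple[List[Tuple[str, float]], int]:
    """Return the k groups with the largest expected SUM and how many were evaluated exactly."""
    if k <= 0:
        return [], 0
    # Cheap bound: E[SUM] can never exceed the sum of the support upper ends.
    upper = {g: sum(t.hi for t in ts) for g, ts in groups.items()}
    heap: List[Tuple[float, str]] = []  # min-heap holding the current top-k (value, group)
    evaluated = 0
    for g in sorted(groups, key=upper.get, reverse=True):
        # Every remaining group has an upper bound <= upper[g]; if that cannot beat
        # the current k-th best exact value, none of them can, so stop here.
        if len(heap) == k and upper[g] <= heap[0][0]:
            break
        value = sum(expected_score(t) for t in groups[g])
        evaluated += 1
        if len(heap) < k:
            heapq.heappush(heap, (value, g))
        elif value > heap[0][0]:
            heapq.heapreplace(heap, (value, g))
    ranked = sorted(heap, reverse=True)
    return [(g, v) for v, g in ranked], evaluated


if __name__ == "__main__":
    groups = {
        "a": [uniform(8, 10), uniform(7, 9)],
        "b": [uniform(5, 7), uniform(6, 8)],
        "c": [uniform(0, 2), uniform(1, 3)],
        "d": [uniform(2, 4)],
    }
    top, evaluated = top_k_expected_sum(groups, k=2)
    # Groups "c" and "d" are never integrated: their upper bounds (5 and 4)
    # cannot beat the second-best exact value (about 13).
    print(top, evaluated)
```

With K = 2 in the usage above, only groups "a" and "b" are integrated. The bound check runs before any quadrature, so the expensive step is paid only for groups that could still enter the top-K. For expected SUM over distributions with closed-form means, the integration could be replaced by those formulas. The quadrature stands in for distributions whose moments must be computed numerically.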
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, J., Feng, L., Zhang, J. (2013). Top-K Aggregate Queries on Continuous Probabilistic Datasets. In: Wang, J., Xiong, H., Ishikawa, Y., Xu, J., Zhou, J. (eds) Web-Age Information Management. WAIM 2013. Lecture Notes in Computer Science, vol 7923. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38562-9_22
DOI: https://doi.org/10.1007/978-3-642-38562-9_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38561-2
Online ISBN: 978-3-642-38562-9
eBook Packages: Computer Science, Computer Science (R0)