Abstract
Data uncertainty widely exists in many web applications, financial applications and sensor networks. Ranking queries that return a number of tuples with maximal ranking scores are important in the field of database management. Most existing work focuses on proposing static solutions for various ranking semantics over uncertain data. Our focus is to handle continuous ranking queries on uncertain data streams: testing each new tuple to output highly-ranked tuples. The main challenge comes from not only the fact that the possible world space will grow exponentially when new tuples arrive, but also the requirement for low space- and time-complexity to adapt to the streaming environments. This paper aims at handling continuous ranking queries on uncertain data streams. We first study how to handle this issue exactly, then we propose a novel method (exponential sampling) to estimate the expected rank of a tuple with high quality. Analysis in theory and detailed experimental reports evaluate the proposed methods.
Similar content being viewed by others
References
Aggarwal C C. Managing and Mining Uncertain Data. Springer, 2009
Antova L, Koch C, Olteanu D. From complete to incomplete information and back. In: Proceedings of ACM SIGMOD. 2007, 713–724
Dalvi N, Suciu D. Efficient query evaluation on probabilistic databases. The VLDB Journal, 2007, 16(4): 523–544
Agrawal P, Benjelloun O, Sarma A, Hayworth C, Nabar S, Sugihara T, Widom J. Trio: a system for data, uncertainty, and lineage. In: Proceedings of VLDB. 2006, 1151–1154
Soliman M, Ilyas I, Chen-Chuan Chang K. Top-k query processing in uncertain databases. In: ICDE. 2007, 896–905
Benjelloun O, Sarma A, Halevy A, Widom J. ULDBs: databases with uncertainty and lineage. In: Proceedings of VLDB. 2006, 953–964
Jiang L X. Learning random forests for ranking. Frontiers of Computer Science in China. 2011, 5(1): 79–86
Geng X B, Cheng X Q. Learning multiple metrics for ranking. Frontiers of Computer Science in China. 2011, 5(3): 259–267
Hua M, Pei J, Zhang W, Lin X. Ranking queries on uncertain data: a probabilistic threshold approach. In: Proceedings of ACM SIGMOD. 2008, 673–686
Zhang X, Chomicki J. On the semantics and evaluation of top-k queries in probabilistic databases. In: Proceedings of DBRank. 2008, 556–563
Cormode G, Li F, Yi K. Semantics of ranking queries for probabilistic data and expected ranks. In: Proceedings of ICDE. 2009, 305–316
Ge T, Zdonik S, Madden S. Top-k queries on uncertain data: on score distribution and typical answers. In: Proceedings of ACM SIGMOD. 2009, 375–388
Yan D, Ng W. Robust ranking of uncertain data. In: Proceedings of DASFAA. 2011, 254–268
Jin C, Yi K, Chen L, Yu J, Lin X. Sliding-window top-k queries on uncertain streams. Proceedings of the VLDB Endowment, 2008, 1(1): 301–312
Jin C, Gao M, Zhou A. Handling ER-topk query on uncertain streams. In: Proceedings of DASFAA. 2011, 326–340
Motwani R, Raghavan P. Randomized Algorithms. Cambridge University Press, 1995, 67–73
Dalvi N, Suciu D. Management of probabilistic data: foundations and challenges. In: Proceedings of PODS. 2007, 1–12
Jayram T, Kale S, Vee E. Efficient aggregation algorithms for probabilistic data. In: Proceedings of SODA. 2007, 346–355
Cormode G, Garofalakis M. Sketching probabilistic data streams. In: Proceedings of ACM SIGMOD. 2007, 281–292
Jin C, Zhou M, Zhou A. Computing rarity on uncertain data. SCIENCE CHINA Information Sciences, 2011, 54(10): 2028–2039
Aggarwal C, Yu P. A framework for clustering uncertain data streams. In: Proceedings of ICDE. 2008, 150–159
Zhang Q, Li F, Yi K. Finding frequent items in probabilistic data. In: Proceedings of ACM SIGMOD. 2008, 819–832
Zhang W, Lin X, Zhang Y, Wang W, Yu J. Probabilistic skyline operator over sliding windows. In: Proceedings of ICDE. 2009, 1060–1071
Tran T, Peng L, Li B, Diao Y, Liu A. PODS: a new model and processing algorithms for uncertain data streams. In: Proceedings of SIGMOD. 2010, 159–170
Tran T, McGregor A, Diao Y, Peng L, Liu A. Conditioning and aggregating uncertain data streams: going beyond expectations. Proceedings of the VLDB Endowment, 2010, 3(1–2): 1302–1313
Soliman M, Ilyas I. Ranking with uncertain scores. In: Proceedings of ICDE. 2009, 317–328
Li J, Saha B, Deshpande A. A unified approach to ranking in probabilistic databases. Proceedings of the VLDB Endowment, 2009, 2(1): 502–513
Hua M, Pei J. Continuously monitoring top-k uncertain data streams: a probabilistic threshold method. Distributed and Parallel Databases, 2009, 26(1): 29–65
Tang M, Li F, Phillips J M, Jestes J. Efficient threshold monitoring for distributed probabilistic data. In: Proceedings of ICDE. 2012
Author information
Authors and Affiliations
Corresponding author
Additional information
Cheqing Jin received his BS and MS in computer science from Zhejiang University, China, in 1999 and 2002, respectively. He received his PhD degree in computer science from Fudan University, China, in 2005. Currently, he is a professor of the Software Engineering Institute, East China Normal University, China. His current research interests include streaming data, uncertain databases, location-based services, and data quality.
Jingwei Zhang received his MS in computer science from Guilin University of Electronic Technology, Guilin, China, in 2004. He is currently a PhD candidate in computer science at East China Normal University, Shanghai. His research interests include web data management and analysis, massive data management, and data stream mining.
Aoying Zhou is a professor and deputy dean of the Software School at East China Normal University, Shanghai, where he also heads the Institute of Massive Computing. He is the winner of the National Science Fund for Distinguished Young Scholars supported by NSFC and also of the professorship appointment under Chang Jiang Scholars Program sponsored by the Ministry of Education. He acts as the vice-director of ACM SIGMOD China and the Database Technology Committee of China Computer Federation. He servs as the associate editor-in-chief of the China Journal of Computer, and member of the editorial boards of some prestigious academic journals, such as the VLDB Journal, www Journal, and FCS. His research interests include Web data management, data management for data-intensive computing, management of uncertain data, data mining and data streams, distributed storage, and P2P computing.
Rights and permissions
About this article
Cite this article
Jin, C., Zhang, J. & Zhou, A. Continuous ranking on uncertain streams. Front. Comput. Sci. 6, 686–699 (2012). https://doi.org/10.1007/s11704-012-1227-7
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11704-012-1227-7