Continuous ranking on uncertain streams

  • Research Article
Frontiers of Computer Science

Abstract

Data uncertainty is widespread in web applications, financial applications, and sensor networks. Ranking queries, which return the tuples with the highest ranking scores, are important in database management. Most existing work proposes static solutions for various ranking semantics over uncertain data. This paper instead addresses continuous ranking queries on uncertain data streams, in which each newly arriving tuple is tested so that highly ranked tuples can be output. The main challenges are that the possible world space grows exponentially as new tuples arrive, and that a streaming environment demands low space and time complexity. We first study how to answer such queries exactly, and then propose a novel method (exponential sampling) to estimate the expected rank of a tuple with high quality. Theoretical analysis and detailed experimental results evaluate the proposed methods.
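To make the possible-world blow-up concrete, here is a minimal sketch, not the paper's method. It assumes a tuple-level uncertainty model in which every tuple exists independently with a given probability, and it assumes the commonly used expected-rank convention: a tuple's rank in a world is the number of present tuples with a higher score, or the size of the world if the tuple itself is absent. The estimator shown is plain Monte Carlo sampling of possible worlds, swapped in for illustration; it is not the exponential sampling technique proposed in the article. The data and all function names are hypothetical.

```python
import itertools
import random

# Hypothetical toy data: (score, existence probability) per tuple.
# Assumption: tuples exist independently of each other.
TUPLES = [(90, 0.5), (80, 0.8), (70, 0.4)]


def rank_in_world(target, world, tuples):
    """Rank of tuple `target` in one possible world (a set of present indices)."""
    if target not in world:
        # Assumed convention: an absent tuple is ranked after every present one.
        return len(world)
    return sum(1 for j in world if tuples[j][0] > tuples[target][0])


def exact_expected_rank(target, tuples):
    """Enumerate all 2^n possible worlds; only feasible for very small n."""
    total = 0.0
    for mask in itertools.product([False, True], repeat=len(tuples)):
        world = {i for i, present in enumerate(mask) if present}
        prob = 1.0
        for i, (_, p) in enumerate(tuples):
            prob *= p if mask[i] else 1.0 - p
        total += prob * rank_in_world(target, world, tuples)
    return total


def sampled_expected_rank(target, tuples, samples=20000, seed=0):
    """Plain Monte Carlo estimate: draw worlds at random and average the rank."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        world = {i for i, (_, p) in enumerate(tuples) if rng.random() < p}
        total += rank_in_world(target, world, tuples)
    return total / samples


if __name__ == "__main__":
    for t in range(len(TUPLES)):
        print(t, exact_expected_rank(t, TUPLES), sampled_expected_rank(t, TUPLES))
```

The exact routine visits 2^n worlds, which is why enumeration cannot keep up as tuples keep arriving on a stream, while the sampled estimate costs time proportional to the number of samples times n. Under these assumptions the tuple with score 80 gets a slightly lower (better) expected rank than the tuple with score 90, because its higher existence probability outweighs its lower score.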

Author information

Corresponding author

Correspondence to Cheqing Jin.

Additional information

Cheqing Jin received his BS and MS in computer science from Zhejiang University, China, in 1999 and 2002, respectively. He received his PhD degree in computer science from Fudan University, China, in 2005. He is currently a professor at the Software Engineering Institute, East China Normal University, China. His research interests include streaming data, uncertain databases, location-based services, and data quality.

Jingwei Zhang received his MS in computer science from Guilin University of Electronic Technology, Guilin, China, in 2004. He is currently a PhD candidate in computer science at East China Normal University, Shanghai. His research interests include web data management and analysis, massive data management, and data stream mining.

Aoying Zhou is a professor and deputy dean of the Software School at East China Normal University, Shanghai, where he also heads the Institute of Massive Computing. He is a recipient of the National Science Fund for Distinguished Young Scholars supported by NSFC and holds a professorship under the Chang Jiang Scholars Program sponsored by the Ministry of Education. He is vice-director of ACM SIGMOD China and of the Database Technology Committee of the China Computer Federation. He serves as associate editor-in-chief of the China Journal of Computer and as a member of the editorial boards of several prestigious academic journals, including the VLDB Journal, WWW Journal, and FCS. His research interests include Web data management, data management for data-intensive computing, management of uncertain data, data mining and data streams, distributed storage, and P2P computing.

About this article

Cite this article

Jin, C., Zhang, J. & Zhou, A. Continuous ranking on uncertain streams. Front. Comput. Sci. 6, 686–699 (2012). https://doi.org/10.1007/s11704-012-1227-7

  • DOI: https://doi.org/10.1007/s11704-012-1227-7
