A survey of queries over uncertain data

  • Survey Paper
  • Knowledge and Information Systems

Abstract

Uncertain data arise in many practical applications, such as sensor networks, RFID networks, location-based services, and mobile object management. Query processing over uncertain data, an important aspect of uncertain data management, has received increasing attention in the database field. Because of data uncertainty, uncertain query processing poses inherent challenges and demands non-traditional techniques. This paper surveys this interesting and still-evolving research area in the database community, so that readers can easily obtain an overview of the state-of-the-art techniques. We first provide an overview of data uncertainty, including uncertainty types, probability representation models, and sources of probabilities. We next outline the major types of uncertain queries and summarize their main features. In particular, we present and analyze several typical uncertain queries in detail, such as skyline queries, top-\(k\) queries, nearest-neighbor queries, aggregate queries, join queries, range queries, and threshold queries over uncertain data. Finally, we present many interesting research topics on uncertain queries that have not yet been explored.

Notes

  1. http://movielens.umn.edu.

Acknowledgments

This work was supported by the National Grand Fundamental Research 973 Program of China (Grant No. 2011CB302601), the National High Technology Research and Development 863 Program of China (Grant No. 2013AA01A213), the National Natural Science Foundation of China (Grant No. 60873215), the Natural Science Foundation for Distinguished Young Scholars of Hunan Province (Grant No. S2010J5050), and the Specialized Research Fund for the Doctoral Program of Higher Education (Grant No. 20124307110015).

Author information

Corresponding author

Correspondence to Xiaoyong Li.

About this article

Cite this article

Wang, Y., Li, X., Li, X. et al. A survey of queries over uncertain data. Knowl Inf Syst 37, 485–530 (2013). https://doi.org/10.1007/s10115-013-0638-6

  • DOI: https://doi.org/10.1007/s10115-013-0638-6
