Abstract
Uncertain data are now widespread in many practical applications, such as sensor networks, RFID networks, location-based services, and mobile object management. Query processing over uncertain data, an important aspect of uncertain data management, has received increasing attention in the database field. Because of the uncertainty in the data, uncertain query processing poses inherent challenges and demands non-traditional techniques. This paper surveys this still-evolving research area of the database community so that readers can readily obtain an overview of the state-of-the-art techniques. We first provide an overview of data uncertainty, including uncertainty types, probability representation models, and sources of probabilities. We next outline the current major types of uncertain queries and summarize their main features. In particular, we present and analyze several typical uncertain queries in detail: skyline queries, top-\(k\) queries, nearest-neighbor queries, aggregate queries, join queries, range queries, and threshold queries over uncertain data. Finally, we present a number of open research topics on uncertain queries that have not yet been explored.






Acknowledgments
This work was supported by the National Grand Fundamental Research 973 Program of China (Grant No. 2011CB302601), the National High Technology Research and Development 863 Program of China (Grant No. 2013AA01A213), the National Natural Science Foundation of China (Grant No. 60873215), the Natural Science Foundation for Distinguished Young Scholars of Hunan Province (Grant No. S2010J5050), and the Specialized Research Fund for the Doctoral Program of Higher Education (Grant No. 20124307110015).
Cite this article
Wang, Y., Li, X., Li, X. et al. A survey of queries over uncertain data. Knowl Inf Syst 37, 485–530 (2013). https://doi.org/10.1007/s10115-013-0638-6
DOI: https://doi.org/10.1007/s10115-013-0638-6