Abstract
Sensor fusion combines sensory data from disparate sources so that the resulting information is in some sense better than would be possible if these sources were used individually. Such data are inherently uncertain because sensors have limited precision, so an uncertain database is a natural way to store them. Finding the top-k entities according to one or more attributes is a powerful technique when the uncertain database contains a large quantity of data. However, compared with top-k queries in traditional databases, queries over uncertain databases are more complicated because the number of possible worlds is exponential. We propose a method to process entity-based global top-k aggregate queries in uncertain databases, which return the top-k entities that have the highest aggregate values. Our method has two levels: entity state generation and G-topk-E query processing. In the former, entity states, which satisfy the properties of x-tuples, are generated one after another according to their aggregate values; in the latter, dynamic programming-based global top-k entity query processing is employed to return the answers. Comprehensive experiments on different data sets demonstrate the effectiveness of the proposed solutions.
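To make the possible-worlds difficulty concrete, the following is a minimal brute-force sketch and not the dynamic programming method proposed in the article. It assumes that each entity has mutually exclusive states, each a pair of aggregate value and probability (as in an x-tuple), that entities are independent, and that "global top-k" means the k entities with the highest probability of appearing among the k largest aggregate values of a possible world. The function name `global_topk_entities` and the sample data are hypothetical.

```python
from itertools import product


def global_topk_entities(entities, k):
    """Brute-force global top-k over all possible worlds (illustrative only).

    entities: dict mapping an entity name to a list of mutually exclusive
    (aggregate_value, probability) states. If the probabilities of an
    entity sum to less than 1, the entity may be absent from a world.
    """
    names = list(entities)
    choices = []
    for name in names:
        states = list(entities[name])
        missing = 1.0 - sum(p for _, p in states)
        if missing > 1e-12:
            # None marks the alternative in which the entity does not appear.
            states.append((None, missing))
        choices.append(states)

    topk_prob = {name: 0.0 for name in names}
    # One possible world = one chosen state per entity (exponentially many).
    for world in product(*choices):
        world_prob = 1.0
        for _, p in world:
            world_prob *= p
        present = [(v, n) for n, (v, _) in zip(names, world) if v is not None]
        # Ties in value keep the input order (assumed tie-breaking rule).
        present.sort(key=lambda t: t[0], reverse=True)
        for _, n in present[:k]:
            topk_prob[n] += world_prob

    ranked = sorted(names, key=lambda n: topk_prob[n], reverse=True)
    return [(n, topk_prob[n]) for n in ranked[:k]]


# Hypothetical fused sensor entities with uncertain aggregate values.
sample = {
    "A": [(10, 0.5), (2, 0.5)],
    "B": [(8, 0.6)],          # absent with probability 0.4
    "C": [(5, 1.0)],
}
print(global_topk_entities(sample, 1))
```

The loop visits every combination of entity states, so its cost grows exponentially with the number of entities; this is the blow-up that a dynamic programming formulation is meant to avoid.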
Acknowledgments
This work is supported by the Natural Science Foundation of China (No. 60803105) and the Science & Technology Project of the Department of Education of Jiangxi Province (No. GJJ08508). The author is grateful to the anonymous reviewers of the 4th International Symposium on Security and Multimodality in Pervasive Environments (SMPE2010), who made constructive comments.
About this article
Cite this article
Liu, D., Wan, C., Xiong, N. et al. Top-k entities query processing on uncertainly fused multi-sensory data. Pers Ubiquit Comput 17, 951–963 (2013). https://doi.org/10.1007/s00779-012-0542-1
DOI: https://doi.org/10.1007/s00779-012-0542-1