Abstract
Applications that must handle uncertain data have led to database management systems that extend relational databases with uncertain (probabilistic) data as a native data type. Like existing databases, these systems need automatic query optimization that can estimate the execution cost of a given query plan. For probabilistic data, this requires selectivity estimation that handles multiple possible values for each attribute as well as new query types with threshold values. This paper presents novel selectivity estimation functions for uncertain data and shows how they can be integrated into PostgreSQL to optimize probabilistic queries over uncertain data. The proposed methods handle both attribute and tuple uncertainty. Our experimental results show that the algorithms are efficient and give good selectivity estimates with low space and time overhead.
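To make the notion of selectivity for a threshold query concrete, the following is a minimal sketch and not the estimation functions proposed in the paper. It assumes attribute uncertainty only, with each uncertain attribute stored as a discrete distribution (a hypothetical `dict` from value to probability), and it defines selectivity as the fraction of tuples whose probability of falling in a query range is at least a threshold `tau`. All names (`UncertainValue`, `range_probability`, `threshold_selectivity`, `sampled_selectivity`) are illustrative assumptions.

```python
import random
from typing import Dict, List

# Hypothetical representation: an uncertain attribute is a discrete
# distribution mapping each possible value to its probability.
UncertainValue = Dict[int, float]


def range_probability(dist: UncertainValue, lo: int, hi: int) -> float:
    """Probability that the uncertain attribute lies in [lo, hi]."""
    return sum(p for v, p in dist.items() if lo <= v <= hi)


def threshold_selectivity(
    rows: List[UncertainValue], lo: int, hi: int, tau: float
) -> float:
    """Exact fraction of rows whose probability of lying in [lo, hi] is >= tau."""
    if not rows:
        return 0.0
    hits = sum(1 for dist in rows if range_probability(dist, lo, hi) >= tau)
    return hits / len(rows)


def sampled_selectivity(
    rows: List[UncertainValue],
    lo: int,
    hi: int,
    tau: float,
    sample_size: int,
    seed: int = 0,
) -> float:
    """Estimate the same fraction from a uniform random sample of the rows."""
    if sample_size >= len(rows):
        sample = rows
    else:
        sample = random.Random(seed).sample(rows, sample_size)
    return threshold_selectivity(sample, lo, hi, tau)
```

A query optimizer would normally rely on a compact summary rather than a scan or a sample of the table; the sampling variant here only illustrates the quantity being estimated.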
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Singh, S., Mayfield, C., Shah, R., Prabhakar, S., Hambrusch, S. (2008). Query Selectivity Estimation for Uncertain Data. In: Ludäscher, B., Mamoulis, N. (eds) Scientific and Statistical Database Management. SSDBM 2008. Lecture Notes in Computer Science, vol 5069. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69497-7_7
DOI: https://doi.org/10.1007/978-3-540-69497-7_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69476-2
Online ISBN: 978-3-540-69497-7
eBook Packages: Computer Science, Computer Science (R0)