Selectivity Estimation of High Dimensional Window Queries via Clustering

Böhm, Christian; Kriegel, Hans-Peter; Kröger, Peer; Linhart, Petra

doi:10.1007/11535331_1

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3633)

Included in the following conference series:

International Symposium on Spatial and Temporal Databases

Abstract

Query optimization is an important functionality of modern database systems and is often based on estimating the selectivity of queries before actually executing them. Well-known techniques for estimating the result set size of a query are sampling and histogram-based solutions. Sampling-based approaches depend heavily on the size of the drawn sample, which causes a trade-off between the quality of the estimation and the time in which the estimation can be executed for large data sets. Histogram-based techniques eliminate this problem but are limited to low-dimensional data sets: they either assume that all attributes are independent, which is rarely true for real-world data, or become very inefficient for high-dimensional data. In this paper, we present the first multivariate parametric method for estimating the selectivity of window queries on large, high-dimensional data sets. We use clustering to compress the data, generating a precise model of the data from multivariate Gaussian distributions. Additionally, we show efficient techniques for evaluating a window query against the generated Gaussian distributions. Our experimental evaluation shows that this approach is significantly more efficient for multidimensional data than all previous approaches.
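
To make the idea in the abstract concrete, the following is a minimal sketch of how a window query could be evaluated against a set of Gaussian clusters. It is not the authors' algorithm: it assumes each cluster has a diagonal covariance matrix (axis-aligned Gaussians), so that the probability mass inside a hyper-rectangle factorizes into a product of one-dimensional CDF differences. The function names and the cluster parameters in the example are hypothetical and chosen only for illustration.

```python
import math

def normal_cdf(x, mean, std):
    """CDF of a univariate normal distribution, computed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def cluster_window_mass(means, stds, lower, upper):
    """Probability mass of one axis-aligned Gaussian inside the window.

    Assumption: diagonal covariance, so the dimensions are independent
    within a cluster and the mass is a product over dimensions.
    """
    mass = 1.0
    for m, s, lo, hi in zip(means, stds, lower, upper):
        mass *= normal_cdf(hi, m, s) - normal_cdf(lo, m, s)
    return mass

def estimate_selectivity(clusters, lower, upper):
    """Estimate the fraction of points falling into the window [lower, upper].

    clusters: list of (weight, means, stds); weights are the relative
    cluster sizes and are expected to sum to 1.
    """
    return sum(w * cluster_window_mass(mu, sd, lower, upper)
               for w, mu, sd in clusters)

if __name__ == "__main__":
    # Hypothetical 3-dimensional model with two clusters.
    clusters = [
        (0.7, [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]),
        (0.3, [5.0, 5.0, 5.0], [0.5, 2.0, 1.0]),
    ]
    window_lower = [-1.0, -1.0, -1.0]
    window_upper = [1.0, 1.0, 1.0]
    sel = estimate_selectivity(clusters, window_lower, window_upper)
    print(f"estimated selectivity: {sel:.4f}")
```

Multiplying the estimated selectivity by the table cardinality gives an estimated result set size. Clusters with correlated attributes (full covariance matrices) would need a different evaluation of the window integral, which this sketch does not attempt.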

Author information

Authors and Affiliations

Institute for Computer Science, University of Munich, Germany
Christian Böhm, Hans-Peter Kriegel, Peer Kröger & Petra Linhart

Editor information

Editors and Affiliations

Institute of Computing, CP 6176, University of Campinas, 13084-971, Campinas, Brazil
Claudia Bauzer Medeiros
National Center for Geographic Information and Analysis and Department of Spatial Information Science and Engineering, University of Maine, Boardman Hall, ME 04469-5711, Orono, USA
Max J. Egenhofer
Purdue University
Elisa Bertino

About this paper

Cite this paper

Böhm, C., Kriegel, HP., Kröger, P., Linhart, P. (2005). Selectivity Estimation of High Dimensional Window Queries via Clustering. In: Bauzer Medeiros, C., Egenhofer, M.J., Bertino, E. (eds) Advances in Spatial and Temporal Databases. SSTD 2005. Lecture Notes in Computer Science, vol 3633. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11535331_1

DOI: https://doi.org/10.1007/11535331_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28127-6
Online ISBN: 978-3-540-31904-7
eBook Packages: Computer Science, Computer Science (R0)