DOI: 10.1145/1559845.1559899
research-article

Kernel-based skyline cardinality estimation

Published: 29 June 2009

ABSTRACT

The skyline of a d-dimensional dataset consists of all points that are not dominated by any other point. Incorporating the skyline operator into practical database systems requires an efficient and accurate cardinality estimation module. However, existing theoretical work on this problem is limited to the case where all d dimensions are independent of each other, which rarely holds for real datasets. The state-of-the-art Log Sampling (LS) technique applies the theoretical results for independent dimensions to non-independent data regardless, which sometimes leads to large estimation errors. To solve this problem, we propose a novel Kernel-Based (KB) approach that approximates the skyline cardinality with nonparametric methods. Extensive experiments with various real datasets demonstrate that KB achieves high accuracy, even in cases where LS fails. At the same time, despite its numerical nature, the efficiency of KB is comparable to that of LS. Furthermore, we extend both LS and KB to the k-dominant skyline, which is commonly used instead of the conventional skyline for high-dimensional data.
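To make the quantities in the abstract concrete, the following is a minimal Python sketch of the definitions only; it is not the LS or KB estimator, neither of which is specified in the abstract. It assumes that smaller values are preferred in every dimension, uses the usual definition of k-dominance (no worse in at least k dimensions and strictly better in at least one), and includes the classical expected skyline size for independent dimensions without ties as a baseline. All function names are hypothetical and chosen for illustration.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(p: Point, q: Point) -> bool:
    """True if p is no worse than q everywhere and strictly better somewhere.

    Assumption: smaller values are better in every dimension.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def k_dominates(p: Point, q: Point, k: int) -> bool:
    """True if p is no worse than q in at least k dimensions and strictly
    better in at least one dimension."""
    no_worse = sum(1 for a, b in zip(p, q) if a <= b)
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse >= k and strictly_better


def k_dominant_skyline(points: List[Point], k: int) -> List[Point]:
    """Points that are not k-dominated by any other point."""
    return [p for p in points if not any(k_dominates(q, p, k) for q in points)]


def expected_skyline_size_independent(n: int, d: int) -> float:
    """Expected skyline size of n points with d independent dimensions and
    no ties, via the recurrence V(n, 1) = 1, V(n, d) = sum_{i=1..n} V(i, d-1) / i.
    """
    v = [1.0] * (n + 1)  # v[i] holds V(i, 1); index 0 is unused
    for _ in range(d - 1):
        acc = 0.0
        nxt = [0.0] * (n + 1)
        for i in range(1, n + 1):
            acc += v[i] / i
            nxt[i] = acc
        v = nxt
    return v[n]
```

The skyline cardinality that an estimator would try to predict is simply `len(skyline(points))`. With k equal to d, `k_dominant_skyline` coincides with the conventional skyline; for smaller k, k-dominance can be cyclic, so the k-dominant skyline may shrink sharply or even be empty.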

      • Published in

        SIGMOD '09: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
        June 2009
        1168 pages
        ISBN:9781605585512
        DOI:10.1145/1559845

        Copyright © 2009 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 785 of 4,003 submissions, 20%
