ABSTRACT
The skyline of a d-dimensional dataset consists of all points that are not dominated by any other point. Incorporating the skyline operator into practical database systems requires an efficient and accurate cardinality estimation module. However, existing theoretical work on this problem is limited to the case where all d dimensions are independent of each other, which rarely holds for real datasets. The state-of-the-art Log Sampling (LS) technique applies the theoretical results for independent dimensions to non-independent data regardless, which sometimes leads to large estimation errors. To solve this problem, we propose a novel Kernel-Based (KB) approach that approximates the skyline cardinality with nonparametric methods. Extensive experiments with various real datasets demonstrate that KB achieves high accuracy, even in cases where LS fails. At the same time, despite its numerical nature, the efficiency of KB is comparable to that of LS. Furthermore, we extend both LS and KB to the k-dominant skyline, which is commonly used in place of the conventional skyline for high-dimensional data.
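To make the quantity being estimated concrete, the following is a minimal sketch of exact skyline and k-dominant skyline computation by brute force. It is not the LS or KB estimator, whose details the abstract does not give. The function names are hypothetical, and the sketch assumes that smaller values are preferred in every dimension. For k-dominance it assumes the usual definition: a point k-dominates another if it is no worse in at least k dimensions and strictly better in at least one of them.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(p: Point, q: Point) -> bool:
    """True if p is no worse than q everywhere and strictly better somewhere.

    Assumption: smaller values are better in every dimension.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def k_dominates(p: Point, q: Point, k: int) -> bool:
    """True if p is no worse than q in at least k dimensions and strictly
    better in at least one dimension (assumed definition of k-dominance)."""
    no_worse = sum(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse >= k and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Points not dominated by any other point; O(n^2 * d) brute force."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def k_dominant_skyline(points: List[Point], k: int) -> List[Point]:
    """Points not k-dominated by any other point; O(n^2 * d) brute force."""
    return [p for p in points if not any(k_dominates(q, p, k) for q in points)]


if __name__ == "__main__":
    data = [(1, 4), (2, 2), (4, 1), (3, 3), (5, 5)]
    print(len(skyline(data)))                # exact skyline cardinality
    print(len(k_dominant_skyline(data, 1)))  # k-dominant skyline cardinality
```

The exact computation compares every pair of points, which is the cost a cardinality estimator is meant to avoid. With k equal to d, k-dominance coincides with ordinary dominance, so the two functions return the same result. For smaller k the k-dominant skyline shrinks and can even be empty.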