Abstract
The skyline clause, also called the Pareto clause, has recently been proposed as an extension to SQL. It selects the tuples that are Pareto optimal with respect to a set of designated skyline attributes. This is the maximal vector problem in a relational context, but it represents a powerful extension to SQL that allows for the natural expression of on-line analytic processing (OLAP) queries and preferences in queries.
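As a rough illustration of what "Pareto optimal" means here, the following is a minimal sketch of a naive quadratic skyline filter. It is not the paper's algorithm; the function names, the sample rows, and the convention that larger values are better on every attribute are assumptions made only for this example.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is at least as good as b on every attribute
    and strictly better on at least one (assumption: larger is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(rows: List[Row]) -> List[Row]:
    """Keep the rows that no other row dominates (naive O(n^2) comparison)."""
    return [r for r in rows if not any(dominates(other, r) for other in rows)]


# Hypothetical restaurants scored on (food, service, value).
rows = [(9, 7, 5), (8, 8, 6), (7, 7, 5), (9, 8, 4), (6, 9, 9)]
result = skyline(rows)
print(result)  # (7, 7, 5) is dropped because (9, 7, 5) dominates it
```

Only one of the five sample rows is dominated, which hints at why the size of the skyline matters: with several attributes, a large share of the tuples can survive the filter.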
Cardinality estimation of skyline sets is the focus of this work. A better understanding of skyline cardinality, and of other properties of the skyline, is useful for designing better skyline algorithms, is necessary for extending a query optimizer’s cost model to accommodate skyline queries, and clarifies how to use skyline effectively for OLAP and preference queries.
Within a basic model that assumes sparseness of values on attributes’ domains and statistical independence across attributes, we establish the expected skyline cardinality for skyline queries. While asymptotic bounds have been established previously, they are neither widely known nor applied in skyline work. We show concrete estimates, as would be needed in a cost model, and consider the nature of the distribution of the skyline. We next establish the effects on skyline cardinality as the constraints of our basic model are relaxed. Some of the results are quite counter-intuitive, and understanding them is critical to skyline’s use in OLAP and preference queries. We consider the case in which attributes’ values repeat on their domains, and show that the number of skyline tuples is diminished. We consider the effects of having Zipfian distributions on the attributes’ domains, and generalize the expectation for other distributions. Finally, we consider the ramifications of correlation across the attributes.
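The abstract does not state the estimate itself. As a hedged sketch, the code below uses the classical result from the maximal-vector literature: for n tuples over d independent attributes with no repeated values, the expected number of maxima is the generalized harmonic number H_{d-1,n}, defined by H_{0,n} = 1 and H_{k,n} = sum over i = 1..n of H_{k-1,i} / i. That this is the concrete estimate the paper has in mind is an assumption, and the function name is hypothetical.

```python
def expected_skyline_size(n: int, d: int) -> float:
    """Expected number of skyline tuples among n tuples with d attributes,
    assuming independent attributes and no repeated values (the basic model).

    Computes the generalized harmonic number H_{d-1,n} using the recurrence
    H_{0,i} = 1 and H_{k,i} = H_{k,i-1} + H_{k-1,i} / i.
    """
    if n < 1 or d < 1:
        raise ValueError("n and d must be positive")
    h = [1.0] * (n + 1)  # h[i] holds H_{0,i}; index 0 is unused
    for _ in range(d - 1):
        running = 0.0
        nxt = [0.0] * (n + 1)
        for i in range(1, n + 1):
            running += h[i] / i  # accumulate H_{k-1,i} / i
            nxt[i] = running     # nxt[i] is now H_{k,i}
        h = nxt
    return h[n]


# Illustrative use: how the estimate grows with the number of attributes.
for dims in (2, 3, 5):
    print(dims, round(expected_skyline_size(10_000, dims), 1))
```

A quick sanity check for the recurrence: with two tuples and three attributes, one tuple dominates the other with probability 2 × (1/2)³ = 1/4, so the expected skyline size is 2 − 1/4 = 1.75, which matches H_{2,2}.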
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Godfrey, P. (2004). Skyline Cardinality for Relational Processing. In: Seipel, D., Turull-Torres, J.M. (eds) Foundations of Information and Knowledge Systems. FoIKS 2004. Lecture Notes in Computer Science, vol 2942. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24627-5_7
DOI: https://doi.org/10.1007/978-3-540-24627-5_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20965-2
Online ISBN: 978-3-540-24627-5
eBook Packages: Springer Book Archive