Skyline Cardinality for Relational Processing

How Many Vectors Are Maximal?

  • Conference paper
Foundations of Information and Knowledge Systems (FoIKS 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2942)

Abstract

The skyline clause, also called the Pareto clause, has recently been proposed as an extension to SQL. It selects the tuples that are Pareto optimal with respect to a set of designated skyline attributes. This is the maximal vector problem in a relational context, and it represents a powerful extension to SQL that allows for the natural expression of on-line analytic processing (OLAP) queries and preferences in queries.
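To make the selection concrete, the following is a minimal sketch of Pareto-optimal filtering, not the paper's algorithm. It assumes every skyline attribute is to be maximized and uses a quadratic pairwise comparison; the names `dominates` and `skyline` and the sample data are illustrative only.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every attribute
    and strictly better on at least one (larger is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(rows: List[Row]) -> List[Row]:
    """Return the rows that no other row dominates (naive O(n^2) scan)."""
    return [r for r in rows if not any(dominates(o, r) for o in rows)]


# Hypothetical two-attribute data, e.g. (quality score, value score).
candidates = [(3, 7), (5, 5), (4, 4), (7, 1), (2, 2)]
result = skyline(candidates)
print(result)
```

Here (4, 4) and (2, 2) are dropped because (5, 5) is at least as good on both attributes and strictly better on one; the remaining three tuples are mutually incomparable.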

Cardinality estimation of skyline sets is the focus of this work. A better understanding of skyline cardinality, and of other properties of the skyline, is useful for designing better skyline algorithms, is necessary for extending a query optimizer’s cost model to accommodate skyline queries, and helps clarify how to use skyline effectively for OLAP and preference queries.

Within a basic model that assumes sparseness of values on the attributes’ domains and statistical independence across attributes, we establish the expected skyline cardinality for skyline queries. While asymptotic bounds have been established previously, they are neither widely known nor applied in skyline work. We show concrete estimates, as would be needed in a cost model, and consider the nature of the skyline’s distribution. We next establish the effects on skyline cardinality as the constraints of our basic model are relaxed. Some of the results are quite counter-intuitive, and understanding them is critical to skyline’s use in OLAP and preference queries. We consider the case in which attributes’ values repeat on their domains, and show that the number of skyline tuples is diminished. We consider the effects of Zipfian distributions on the attributes’ domains, and generalize the expectation for other distributions. Last, we consider the ramifications of correlation across the attributes.
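As an illustration of what a concrete estimate under the basic model can look like, the sketch below evaluates a standard recurrence from the maximal-vector literature: with n tuples, d independent attributes, and no repeated values, the expected number of maxima is the generalized harmonic number H(d-1, n), where H(0, n) = 1 and H(j, n) is the sum of H(j-1, i) / i for i from 1 to n. The abstract does not state this formula, so treat it as an assumption about the setting rather than a quotation of the paper; the function name is hypothetical.

```python
def expected_skyline_size(n: int, d: int) -> float:
    """Expected number of maximal vectors among n tuples with d
    independent attributes and no ties, via the recurrence
    H(0, i) = 1,  H(j, i) = H(j, i - 1) + H(j - 1, i) / i."""
    if n < 1 or d < 1:
        raise ValueError("n and d must be positive")
    prev = [1.0] * (n + 1)  # order 0: H(0, i) = 1 (index 0 unused)
    for _ in range(d - 1):
        cur = [0.0] * (n + 1)
        for i in range(1, n + 1):
            cur[i] = cur[i - 1] + prev[i] / i
        prev = cur
    return prev[n]


# Estimates for a hypothetical table of 10,000 tuples.
for dims in (1, 2, 3, 4):
    print(dims, round(expected_skyline_size(10_000, dims), 2))
```

With one attribute the estimate is exactly 1 (the single maximum), and with two it is the ordinary harmonic number of n. The value grows roughly like (ln n)^(d-1) / (d-1)!, which is why the number of skyline attributes matters so much in a cost model.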

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Godfrey, P. (2004). Skyline Cardinality for Relational Processing. In: Seipel, D., Turull-Torres, J.M. (eds) Foundations of Information and Knowledge Systems. FoIKS 2004. Lecture Notes in Computer Science, vol 2942. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24627-5_7

  • DOI: https://doi.org/10.1007/978-3-540-24627-5_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20965-2

  • Online ISBN: 978-3-540-24627-5

  • eBook Packages: Springer Book Archive
