Skip to main content

On Making Skyline Queries Resistant to Outliers

  • Chapter
  • First Online:
Advances in Knowledge Discovery and Management

Part of the book series: Studies in Computational Intelligence ((SCI,volume 665))

  • 816 Accesses

Abstract

This paper deals with the issue of retrieving the most preferred objects (in the sense of Skyline queries, i.e., of Pareto ordering) from a collection involving outliers. Indeed, many real-world datasets, for instance from ad sales websites, contain odd data and it is important to limit the impact of such odd data (outliers) on the result of skyline queries, and prevent them from hiding more interesting points. The approach we propose relies on the notion of fuzzy typicality and makes it possible to compute a graded skyline where each answer is associated with both a degree of membership to the skyline and a typicality degree. A GPU-based parallel implementation of the algorithm is described and experimental results are presented, which show the scalability of the approach.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 109.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    http://docs.nvidia.com/cuda/thrust/.

  2. 2.

    www.leboncoin.fr.

References

  • Bartolini, I., Ciaccia, P., & Patella, M. (2008). Efficient sort-based skyline evaluation. ACM Transactions on Database Systems, 33(4), 1–49.

    Article  Google Scholar 

  • Bøgh, K. S., Assent, I., & Magnani, M. (2013). Efficient GPU-based skyline computation. In Proceedings of the Ninth International Workshop on Data Management on New Hardware, DaMoN 2013 (pp. 5:1–5:6). New York: ACM.

    Google Scholar 

  • Börzsönyi, S., Kossmann, D., & Stocker, K. (2001). The skyline operator. In Proceedings of ICDE 2001 (pp. 421–430).

    Google Scholar 

  • Chan, C., Jagadish, H., Tan, K., Tung, A., & Zhang, Z. (2006a). Finding k-dominant skylines in high dimensional space. In Proceedings of SIGMOD 2006 (pp. 503–514).

    Google Scholar 

  • Chan, C., Jagadish, H., Tan, K., Tung, A., & Zhang, Z. (2006b). On high dimensional skylines. In Proceedings of EDBT 2006. LNCS (Vol. 3896, pp. 478–495).

    Google Scholar 

  • Choi, W., Liu, L., & Yu, B. (2012). Multi-criteria decision making with skyline computation. In 2012 IEEE 13th International Conference on Information Reuse and Integration (IRI) (pp. 316–323).

    Google Scholar 

  • Chomicki, J., Godfrey, P., Gryz, J., & Liang, D. (2003). Skyline with presorting. In Proceedings of ICDE 2003 (pp. 717–719).

    Google Scholar 

  • Chomicki, J., Godfrey, P., Gryz, J., & Liang, D. (2005). Skyline with presorting: Theory and optimizations. In Proceedings of IIS 2005 (pp. 595–604).

    Google Scholar 

  • Cole, M. (1991). Algorithmic skeletons: Structured management of parallel computation. Cambridge: MIT Press.

    MATH  Google Scholar 

  • Dubois, D. & Prade, H. (1984). On data summarization with fuzzy sets. In Proceedings of IFSA 1993 (pp. 465–468).

    Google Scholar 

  • Ester, M., Kriegel, H., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In E. Simoudis, J. Han & U. M. Fayyad (Eds.), Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD 1996), Portland, Oregon, USA (pp. 226–231). AAAI Press.

    Google Scholar 

  • Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2), 179–188.

    Article  Google Scholar 

  • Friedman, M., Ming, M., & Kandel, A. (1995). On the theory of typicality. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 3(2), 127–142.

    Article  MathSciNet  MATH  Google Scholar 

  • Gabel, M., Keren, D., & Schuster, A. (2013). Communication-efficient outlier detection for scale-out systems. In Proceedings of the First International Workshop on Big Dynamic Distributed Data, Riva del Garda, Italy, 30 August 2013 (pp. 19–24).

    Google Scholar 

  • Goncalves, M., & Tineo, L. (2007). Fuzzy dominance skyline queries. In R. Wagner, N. Revell & G. Pernul (Eds.), Proceedings 18th International Conference on Database and Expert Systems Applications, DEXA 2007, Regensburg, Germany, 3–7 September 2007 (Vol. 4653, pp. 469–478). Lecture notes in computer science. Heidelberg: Springer.

    Google Scholar 

  • Gupta, M., Gao, J., and Han, J. (2013). Community distribution outlier detection in heterogeneous information networks. In H. Blockeel, K. Kersting, S. Nijssen & F. Z̆elezný (eds.), Machine learning and knowledge discovery in databases (Vol. 8188, pp. 557–573). Lecture notes in computer science. Berlin: Springer.

    Google Scholar 

  • Hadjali, A., Pivert, O., & Prade, H. (2011). On different types of fuzzy skylines. In Proceedings of ISMIS 2011 (pp. 581–591).

    Google Scholar 

  • Hampton, J. (1988). Overextension of conjunctive concepts: evidence for a unitary model of concept typicality and class inclusion. Journal of Experimental Psychology: Learning Memory and Cognition, 14(1), 12–32.

    Google Scholar 

  • Harris, M. (2007). Optimizing parallel reduction in cuda. Technical report, nVidia.

    Google Scholar 

  • Hodge, V., & Austin, J. (2004). A survey of outlier detection methodologies. Artificial Intelligence Review, 22(2), 85–126.

    Article  MATH  Google Scholar 

  • Ji, T., Yang, D., & Gao, J. (2013). Incremental local evolutionary outlier detection for dynamic social networks. In H. Blockeel, K. Kersting, S. Nijssen & F. Z̆elezný (Eds.), Machine learning and knowledge discovery in databases (Vol. 8189, pp. 1–15). Lecture notes in computer science. Berlin: Springer.

    Google Scholar 

  • Lee, K. C. K., Zheng, B., Li, H., & Lee, W.-C. (2007). Approaching the skyline in z order. In VLDB (pp. 279–290).

    Google Scholar 

  • Lesot, M. (2006). Typicality-based clustering. International Journal of Information technology and Intelligent Computing, 12, 279–292.

    Google Scholar 

  • Lin, X., Yuan, Y., Zhang, Q., & Zhang, Y. (2007). Selecting stars: The k most representative skyline operator. In Proceedings of the ICDE 2007 (pp. 86–95).

    Google Scholar 

  • Niu, Z., Shi, S., Sun, J., & He, X. (2011). A survey of outlier detection methodologies and their applications. In Proceedings of Third International Conference Artificial Intelligence and Computational Intelligence, Part I, AICI 2011, Taiyuan, China, September 24–25, 2011 (pp. 380–387).

    Google Scholar 

  • Osherson, D., & Smith, E. (1997). On typicality and vagueness. Cognition, 64, 189–206.

    Article  Google Scholar 

  • Papadias, D., Tao, Y., Fu, G., & Seeger, B. (2005). Progressive skyline computation in database systems. ACM Transactions on Database Systems, 30(1), 2005.

    Article  Google Scholar 

  • Park, S., Kim, T., Park, J., Kim, J., & Im, H. (2009). Parallel skyline computation on multicore architectures. In ICDE (pp. 760–771).

    Google Scholar 

  • Pivert, O., & Bosc, P. (2012). Fuzzy preference queries to relational databases. London: Imperial College Press.

    Book  MATH  Google Scholar 

  • Rojas, W. U., Boizumault, P., Loudni, S., Crémilleux, B., and Lepailleur, A. (2014). Mining (soft-) skypatterns using dynamic CSP. In H. Simonis (ed.), Proceedings of 11th International Conference on Integration of AI and OR Techniques in Constraint Programming, CPAIOR 2014, Cork, Ireland, 19–23 May 2014 (Vol. 8451, pp. 71–87). Lecture notes in computer science. Cham: Springer.

    Google Scholar 

  • Rosch, E., & Mervis, C. (1975). Family resemblance: Studies in the internal structure of categories. Cogn. Psychol., 7, 573–605.

    Article  Google Scholar 

  • Tan, K.-L., Eng, P.-K., & Ooi, B. C. (2001). Efficient progressive skyline computation. In Proceedings of VLDB 2001 (pp. 301–310).

    Google Scholar 

  • Tao, Y., Ding, L., Lin, X., & Pei, J. (2009). Distance-based representative skyline. In Y. E. Ioannidis, D. L. Lee & R. T. Ng (Eds.), ICDE (pp. 892–903). IEEE.

    Google Scholar 

  • Yager, R. (1997). A note on a fuzzy measure of typicality. International Journal of Intelligent Systems, 12(3), 233–249.

    Article  MATH  Google Scholar 

  • Zadeh, L. A. (1982). A note on prototype theory and fuzzy sets. Cognition, 12, 291–298.

    Google Scholar 

  • Zadeh, L. A. (1984). A computational theory of dispositions. In Y. Wilks (Ed.), COLING (pp. 312–318). ACL.

    Google Scholar 

  • Zadeh, L. (1987). A computational theory of dispositions. International Journal of Intelligent Systems, 2, 39–63.

    MATH  Google Scholar 

  • Zhang, J. (2013). Advancements of outlier detection: A survey. EAI Endorsed Transactions on Scalable Information Systems, 1, e2.

    Article  Google Scholar 

  • Zimek, A., Gaudet, M., Campello, R. J. G. B., & Sander, J. (2013). Subsampling for efficient and effective unsupervised outlier detection ensembles. In The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013, Chicago, IL, USA, 11–14 August 2013 (pp. 428–436).

    Google Scholar 

  • Zimek, A., Schubert, E., & Kriegel, H. (2012). A survey on unsupervised outlier detection in high-dimensional numerical data. Statistical Analysis and Data Mining, 5(5), 363–387.

    Article  MathSciNet  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Hélène Jaudoin .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Jaudoin, H., Nerzic, P., Pivert, O., Rocacher, D. (2017). On Making Skyline Queries Resistant to Outliers. In: Guillet, F., Pinaud, B., Venturini, G. (eds) Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, vol 665. Springer, Cham. https://doi.org/10.1007/978-3-319-45763-5_2

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-45763-5_2

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45762-8

  • Online ISBN: 978-3-319-45763-5

  • eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics