Abstract
This paper deals with the issue of retrieving the most preferred objects (in the sense of Skyline queries, i.e., of Pareto ordering) from a collection involving outliers. Indeed, many real-world datasets, for instance from ad sales websites, contain odd data and it is important to limit the impact of such odd data (outliers) on the result of skyline queries, and prevent them from hiding more interesting points. The approach we propose relies on the notion of fuzzy typicality and makes it possible to compute a graded skyline where each answer is associated with both a degree of membership to the skyline and a typicality degree. A GPU-based parallel implementation of the algorithm is described and experimental results are presented, which show the scalability of the approach.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
References
Bartolini, I., Ciaccia, P., & Patella, M. (2008). Efficient sort-based skyline evaluation. ACM Transactions on Database Systems, 33(4), 1–49.
Bøgh, K. S., Assent, I., & Magnani, M. (2013). Efficient GPU-based skyline computation. In Proceedings of the Ninth International Workshop on Data Management on New Hardware, DaMoN 2013 (pp. 5:1–5:6). New York: ACM.
Börzsönyi, S., Kossmann, D., & Stocker, K. (2001). The skyline operator. In Proceedings of ICDE 2001 (pp. 421–430).
Chan, C., Jagadish, H., Tan, K., Tung, A., & Zhang, Z. (2006a). Finding k-dominant skylines in high dimensional space. In Proceedings of SIGMOD 2006 (pp. 503–514).
Chan, C., Jagadish, H., Tan, K., Tung, A., & Zhang, Z. (2006b). On high dimensional skylines. In Proceedings of EDBT 2006. LNCS (Vol. 3896, pp. 478–495).
Choi, W., Liu, L., & Yu, B. (2012). Multi-criteria decision making with skyline computation. In 2012 IEEE 13th International Conference on Information Reuse and Integration (IRI) (pp. 316–323).
Chomicki, J., Godfrey, P., Gryz, J., & Liang, D. (2003). Skyline with presorting. In Proceedings of ICDE 2003 (pp. 717–719).
Chomicki, J., Godfrey, P., Gryz, J., & Liang, D. (2005). Skyline with presorting: Theory and optimizations. In Proceedings of IIS 2005 (pp. 595–604).
Cole, M. (1991). Algorithmic skeletons: Structured management of parallel computation. Cambridge: MIT Press.
Dubois, D. & Prade, H. (1984). On data summarization with fuzzy sets. In Proceedings of IFSA 1993 (pp. 465–468).
Ester, M., Kriegel, H., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In E. Simoudis, J. Han & U. M. Fayyad (Eds.), Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD 1996), Portland, Oregon, USA (pp. 226–231). AAAI Press.
Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2), 179–188.
Friedman, M., Ming, M., & Kandel, A. (1995). On the theory of typicality. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 3(2), 127–142.
Gabel, M., Keren, D., & Schuster, A. (2013). Communication-efficient outlier detection for scale-out systems. In Proceedings of the First International Workshop on Big Dynamic Distributed Data, Riva del Garda, Italy, 30 August 2013 (pp. 19–24).
Goncalves, M., & Tineo, L. (2007). Fuzzy dominance skyline queries. In R. Wagner, N. Revell & G. Pernul (Eds.), Proceedings 18th International Conference on Database and Expert Systems Applications, DEXA 2007, Regensburg, Germany, 3–7 September 2007 (Vol. 4653, pp. 469–478). Lecture notes in computer science. Heidelberg: Springer.
Gupta, M., Gao, J., and Han, J. (2013). Community distribution outlier detection in heterogeneous information networks. In H. Blockeel, K. Kersting, S. Nijssen & F. Z̆elezný (eds.), Machine learning and knowledge discovery in databases (Vol. 8188, pp. 557–573). Lecture notes in computer science. Berlin: Springer.
Hadjali, A., Pivert, O., & Prade, H. (2011). On different types of fuzzy skylines. In Proceedings of ISMIS 2011 (pp. 581–591).
Hampton, J. (1988). Overextension of conjunctive concepts: evidence for a unitary model of concept typicality and class inclusion. Journal of Experimental Psychology: Learning Memory and Cognition, 14(1), 12–32.
Harris, M. (2007). Optimizing parallel reduction in cuda. Technical report, nVidia.
Hodge, V., & Austin, J. (2004). A survey of outlier detection methodologies. Artificial Intelligence Review, 22(2), 85–126.
Ji, T., Yang, D., & Gao, J. (2013). Incremental local evolutionary outlier detection for dynamic social networks. In H. Blockeel, K. Kersting, S. Nijssen & F. Z̆elezný (Eds.), Machine learning and knowledge discovery in databases (Vol. 8189, pp. 1–15). Lecture notes in computer science. Berlin: Springer.
Lee, K. C. K., Zheng, B., Li, H., & Lee, W.-C. (2007). Approaching the skyline in z order. In VLDB (pp. 279–290).
Lesot, M. (2006). Typicality-based clustering. International Journal of Information technology and Intelligent Computing, 12, 279–292.
Lin, X., Yuan, Y., Zhang, Q., & Zhang, Y. (2007). Selecting stars: The k most representative skyline operator. In Proceedings of the ICDE 2007 (pp. 86–95).
Niu, Z., Shi, S., Sun, J., & He, X. (2011). A survey of outlier detection methodologies and their applications. In Proceedings of Third International Conference Artificial Intelligence and Computational Intelligence, Part I, AICI 2011, Taiyuan, China, September 24–25, 2011 (pp. 380–387).
Osherson, D., & Smith, E. (1997). On typicality and vagueness. Cognition, 64, 189–206.
Papadias, D., Tao, Y., Fu, G., & Seeger, B. (2005). Progressive skyline computation in database systems. ACM Transactions on Database Systems, 30(1), 2005.
Park, S., Kim, T., Park, J., Kim, J., & Im, H. (2009). Parallel skyline computation on multicore architectures. In ICDE (pp. 760–771).
Pivert, O., & Bosc, P. (2012). Fuzzy preference queries to relational databases. London: Imperial College Press.
Rojas, W. U., Boizumault, P., Loudni, S., Crémilleux, B., and Lepailleur, A. (2014). Mining (soft-) skypatterns using dynamic CSP. In H. Simonis (ed.), Proceedings of 11th International Conference on Integration of AI and OR Techniques in Constraint Programming, CPAIOR 2014, Cork, Ireland, 19–23 May 2014 (Vol. 8451, pp. 71–87). Lecture notes in computer science. Cham: Springer.
Rosch, E., & Mervis, C. (1975). Family resemblance: Studies in the internal structure of categories. Cogn. Psychol., 7, 573–605.
Tan, K.-L., Eng, P.-K., & Ooi, B. C. (2001). Efficient progressive skyline computation. In Proceedings of VLDB 2001 (pp. 301–310).
Tao, Y., Ding, L., Lin, X., & Pei, J. (2009). Distance-based representative skyline. In Y. E. Ioannidis, D. L. Lee & R. T. Ng (Eds.), ICDE (pp. 892–903). IEEE.
Yager, R. (1997). A note on a fuzzy measure of typicality. International Journal of Intelligent Systems, 12(3), 233–249.
Zadeh, L. A. (1982). A note on prototype theory and fuzzy sets. Cognition, 12, 291–298.
Zadeh, L. A. (1984). A computational theory of dispositions. In Y. Wilks (Ed.), COLING (pp. 312–318). ACL.
Zadeh, L. (1987). A computational theory of dispositions. International Journal of Intelligent Systems, 2, 39–63.
Zhang, J. (2013). Advancements of outlier detection: A survey. EAI Endorsed Transactions on Scalable Information Systems, 1, e2.
Zimek, A., Gaudet, M., Campello, R. J. G. B., & Sander, J. (2013). Subsampling for efficient and effective unsupervised outlier detection ensembles. In The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013, Chicago, IL, USA, 11–14 August 2013 (pp. 428–436).
Zimek, A., Schubert, E., & Kriegel, H. (2012). A survey on unsupervised outlier detection in high-dimensional numerical data. Statistical Analysis and Data Mining, 5(5), 363–387.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Jaudoin, H., Nerzic, P., Pivert, O., Rocacher, D. (2017). On Making Skyline Queries Resistant to Outliers. In: Guillet, F., Pinaud, B., Venturini, G. (eds) Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, vol 665. Springer, Cham. https://doi.org/10.1007/978-3-319-45763-5_2
Download citation
DOI: https://doi.org/10.1007/978-3-319-45763-5_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45762-8
Online ISBN: 978-3-319-45763-5
eBook Packages: EngineeringEngineering (R0)