Abstract
Outlier detection in mixed-type data, which contain both discrete and continuous features, remains a challenging problem. We introduce concept-based outlierness, defined on the concept lattice, a hierarchy of clusters of data points and features obtained by formal concept analysis (FCA). Intuitively, this outlierness is the degree of isolation of clusters in the hierarchy. We also investigate the discretization of continuous features, which embeds the original continuous (Euclidean) space into the concept lattice. Our experiments show that the proposed method, which detects concept-based outliers, is more effective than other popular distance-based outlier detection methods that ignore the discreteness of features and do not take cluster relationships into account.
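To make the notion of a concept lattice concrete, the following is a minimal sketch of how formal concepts can be enumerated from a small mixed-type table. It is not the chapter's algorithm or its outlierness score: the toy data, the helper names (`discretize`, `common_attributes`, `common_objects`, `formal_concepts`), and the equal-width binning used to turn the continuous feature into a discrete attribute are all assumptions made for illustration. The only part taken from standard FCA is the definition of a concept as a pair (extent, intent) in which the extent is exactly the set of objects sharing the intent and the intent is exactly the set of attributes shared by the extent.

```python
from itertools import combinations


def discretize(value, low, high, bins):
    """Index of the equal-width bin containing `value` (assumed scheme)."""
    if high <= low:
        raise ValueError("high must exceed low")
    idx = int((value - low) / (high - low) * bins)
    return min(max(idx, 0), bins - 1)  # clamp so that value == high stays in range


def common_attributes(context, objects):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)


def common_objects(context, attrs):
    """Objects that have every attribute in `attrs`."""
    return frozenset(o for o, a in context.items() if attrs <= a)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by brute force over object subsets.

    Exponential in the number of objects, so only suitable for toy inputs.
    """
    concepts = set()
    objs = sorted(context)
    for r in range(len(objs) + 1):
        for subset in combinations(objs, r):
            intent = common_attributes(context, subset)
            extent = common_objects(context, intent)
            concepts.add((extent, intent))
    return concepts


# Hypothetical toy data: one discrete feature and one continuous feature in [0, 1].
rows = {"x1": ("red", 0.10), "x2": ("red", 0.20), "x3": ("blue", 0.90)}

# Binary context: each object maps to the set of attributes it has.
context = {
    name: {f"color={color}", f"height_bin={discretize(height, 0.0, 1.0, 2)}"}
    for name, (color, height) in rows.items()
}

concepts = formal_concepts(context)
for extent, intent in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

In this toy context, `x3` forms a concept on its own, while `x1` and `x2` share one. A lattice-based score would be computed from how such concepts relate to each other in the hierarchy, a step this sketch does not attempt.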
Acknowledgments
This work is supported by the Alexander von Humboldt Foundation.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Sugiyama, M. (2014). Outliers on Concept Lattices. In: Nakano, Y., Satoh, K., Bekki, D. (eds) New Frontiers in Artificial Intelligence. JSAI-isAI 2013. Lecture Notes in Computer Science, vol 8417. Springer, Cham. https://doi.org/10.1007/978-3-319-10061-6_23
DOI: https://doi.org/10.1007/978-3-319-10061-6_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10060-9
Online ISBN: 978-3-319-10061-6
eBook Packages: Computer Science (R0)