Parameter-Free Anomaly Detection for Categorical Data

Wu, Shu; Wang, Shengrui

doi:10.1007/978-3-642-23199-5_9

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6871)

Abstract

Outlier detection is usually treated as a preprocessing step that locates, in a data set, the objects that do not conform to well-defined notions of expected behavior. It is a major issue in data mining for discovering novel or rare events, actions, and phenomena. We investigate outlier detection in categorical data sets. The problem is especially challenging because of the difficulty of defining a meaningful similarity measure for categorical data. In this paper, we propose a formal definition of outliers and formulate outlier detection as an optimization problem. To solve the optimization problem, we design a practical and parameter-free method, named ITB. Experimental results show that the ITB method is much more effective and efficient than existing mainstream methods.
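
The abstract does not state the ITB objective, so the following is not the authors' method. It is a minimal sketch of how a categorical record can be scored without any distance or similarity measure, assuming a simple per-attribute surprisal score. The function name `surprisal_scores` and the toy records are hypothetical, and the score treats attributes as independent, which is a simplification.

```python
import math
from collections import Counter


def surprisal_scores(rows):
    """Score each row by the summed surprisal, -log2 p(value), of its values.

    Illustrative assumption only: attributes are treated as independent,
    and p(value) is the empirical frequency of the value in its column.
    """
    n = len(rows)
    n_attrs = len(rows[0])
    # Count value frequencies separately for each attribute (column).
    counts = [Counter(row[j] for row in rows) for j in range(n_attrs)]
    scores = []
    for row in rows:
        score = sum(-math.log2(counts[j][row[j]] / n) for j in range(n_attrs))
        scores.append(score)
    return scores


# Hypothetical toy data: three identical records and one that differs
# in every attribute.
rows = [
    ("tcp", "http", "SF"),
    ("tcp", "http", "SF"),
    ("tcp", "http", "SF"),
    ("udp", "dns", "REJ"),
]
scores = surprisal_scores(rows)
# Index of the record with the highest score, i.e. the rarest values.
most_anomalous = max(range(len(rows)), key=scores.__getitem__)
```

In this sketch the last record gets the highest score because each of its values occurs only once. Turning scores into a set of outliers would still need a cutoff or a requested number of outliers, which is the kind of user-supplied parameter the paper aims to avoid.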

Author information

Authors and Affiliations

Department of Computer Science, University of Sherbrooke, Quebec, J1K2R1, Canada
Shu Wu & Shengrui Wang

Editor information

Editors and Affiliations

Institute of Computer Vision and Applied Computer Sciences, IBaI, Kohlenstraße 2, 04107, Leipzig, Germany
Petra Perner

Cite this paper

Wu, S., Wang, S. (2011). Parameter-Free Anomaly Detection for Categorical Data. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2011. Lecture Notes in Computer Science, vol 6871. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23199-5_9

DOI: https://doi.org/10.1007/978-3-642-23199-5_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23198-8
Online ISBN: 978-3-642-23199-5
eBook Packages: Computer Science, Computer Science (R0)
