Parameter-Free Anomaly Detection for Categorical Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6871)

Abstract

Outlier detection is often a preprocessing step that locates the objects in a data set that do not conform to well-defined notions of expected behavior. It is a major topic in data mining, used to discover novel or rare events, actions, and phenomena. We investigate outlier detection in categorical data sets. The problem is especially challenging because it is difficult to define a meaningful similarity measure for categorical data. In this paper, we propose a formal definition of outliers and formulate outlier detection as an optimization problem. To solve the optimization problem, we design a practical, parameter-free method named ITB. Experimental results show that the ITB method is much more effective and efficient than existing mainstream methods.


Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wu, S., Wang, S. (2011). Parameter-Free Anomaly Detection for Categorical Data. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2011. Lecture Notes in Computer Science, vol 6871. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23199-5_9

  • DOI: https://doi.org/10.1007/978-3-642-23199-5_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23198-8

  • Online ISBN: 978-3-642-23199-5

  • eBook Packages: Computer Science, Computer Science (R0)
