All or In-cloud: How the Identification of Six Types of Anomalies Is Affected by the Discretization Method

  • Conference paper
Artificial Intelligence (BNAIC 2018)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1021)

Abstract

Anomaly detection is the process of identifying cases, or groups of cases, that are in some way unusual and do not fit the general patterns present in the dataset. Numerous algorithms use discretization of numerical data in their detection processes. This study investigates the effect of the employed discretization method on the unsupervised detection of each of the six anomaly types acknowledged in a recent typology of data anomalies. To this end, experiments are conducted with various datasets and SECODA, a general-purpose algorithm for unsupervised non-parametric anomaly detection in datasets with numerical and categorical attributes. This algorithm employs discretization of continuous attributes, exponentially increasing weights and discretization cut points, and a pruning heuristic to detect anomalies with an optimal number of iterations. The empirical results of experiments with synthetic and real-world data demonstrate that standard SECODA can detect all six types, but that different discretization methods favor the discovery of certain anomaly types. These main findings also hold for other detection techniques using discretization.
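
As a rough illustration of why the choice of discretization method can matter, the sketch below bins a small one-dimensional sample in two common ways and uses the number of cases sharing a bin as a naive rarity score. This is not SECODA and does not reproduce the paper's experiments; the function names (`equal_width_edges`, `equal_frequency_edges`, `bin_counts`) and the sample values are assumptions made for the example.

```python
from bisect import bisect_right
from collections import Counter


def equal_width_edges(values, k):
    """Interior cut points that split the value range into k equally wide bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_edges(values, k):
    """Interior cut points that put roughly the same number of cases in each bin."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]


def bin_counts(values, edges):
    """For every case, the number of cases that fall into the same bin.

    A low count marks the case as rare under this discretization.
    """
    bins = [bisect_right(edges, v) for v in values]
    counts = Counter(bins)
    return [counts[b] for b in bins]


# Hypothetical sample: nine tightly clustered values and one isolated value.
values = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 9.0]
k = 3

width_scores = bin_counts(values, equal_width_edges(values, k))
freq_scores = bin_counts(values, equal_frequency_edges(values, k))

print("equal width:    ", width_scores)
print("equal frequency:", freq_scores)
```

With equal-width bins the isolated value sits alone in its bin, so its count stands out. With equal-frequency bins every bin holds a similar number of cases by construction, so a count-based score no longer singles it out in this toy sample.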

Author information

Corresponding author

Correspondence to Ralph Foorthuis.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Foorthuis, R. (2019). All or In-cloud: How the Identification of Six Types of Anomalies Is Affected by the Discretization Method. In: Atzmueller, M., Duivesteijn, W. (eds) Artificial Intelligence. BNAIC 2018. Communications in Computer and Information Science, vol 1021. Springer, Cham. https://doi.org/10.1007/978-3-030-31978-6_3

  • DOI: https://doi.org/10.1007/978-3-030-31978-6_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-31977-9

  • Online ISBN: 978-3-030-31978-6

  • eBook Packages: Computer Science, Computer Science (R0)
