Outlier Detection Algorithms in Data Mining Systems

Programming and Computer Software

Abstract

The paper discusses outlier detection algorithms used in data mining systems. It reviews the main approaches currently used to solve this problem and discusses their advantages and disadvantages. A new outlier detection algorithm is proposed. It is based on fuzzy set theory and kernel functions and has a number of advantages over existing methods. The performance of the proposed algorithm is studied on an applied anomaly detection problem that arises in computer protection systems, namely intrusion detection systems.
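The abstract does not specify the proposed algorithm, so the following is only a minimal sketch of one generic way kernel functions can be used to score outliers. It is not the paper's method. Each point is scored by its squared distance, in the kernel-induced feature space, from the mean of all points. The Gaussian kernel, the `gamma` value, the function names, and the sample data are all assumptions made for illustration.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an illustrative choice, not from the paper."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kernel_outlier_scores(points, kernel=rbf_kernel):
    """Squared feature-space distance from each point to the mean of all points.

    Uses the kernel expansion
        d^2(x_i) = K(x_i, x_i) - (2/n) * sum_j K(x_i, x_j)
                   + (1/n^2) * sum_{j,k} K(x_j, x_k)
    so the feature map itself is never computed explicitly.
    """
    n = len(points)
    gram = [[kernel(p, q) for q in points] for p in points]
    mean_all = sum(sum(row) for row in gram) / (n * n)
    scores = []
    for i in range(n):
        mean_row = sum(gram[i]) / n
        scores.append(gram[i][i] - 2.0 * mean_row + mean_all)
    return scores


# Hypothetical data: four nearby points and one distant point.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = kernel_outlier_scores(points)

# Index of the point with the largest score, i.e. the strongest outlier candidate.
most_anomalous = max(range(len(points)), key=scores.__getitem__)
print(most_anomalous, [round(s, 3) for s in scores])
```

A higher score means the point lies farther from the bulk of the data in feature space. A fuzzy variant could map such scores to membership degrees instead of thresholding them, but how the paper combines the two ideas cannot be inferred from the abstract.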

Cite this article

Petrovskiy, M.I. Outlier Detection Algorithms in Data Mining Systems. Programming and Computer Software 29, 228–237 (2003). https://doi.org/10.1023/A:1024974810270
