Outlier Detection Algorithms in Data Mining Systems

Petrovskiy, M. I.

doi:10.1023/A:1024974810270

Published: July 2003

Programming and Computer Software, Volume 29, pages 228–237 (2003)

Abstract

This paper discusses outlier detection algorithms used in data mining systems. The basic approaches currently used to solve this problem are reviewed, together with their advantages and disadvantages. A new outlier detection algorithm is proposed. It is based on fuzzy set theory and kernel functions, and it has several advantages over existing methods. The performance of the proposed algorithm is evaluated on an applied anomaly detection problem from computer security, namely intrusion detection.
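The article's own algorithm is not reproduced here. As a rough illustration of how kernel functions and a fuzzy membership degree can be combined into an outlier score, the sketch below works in two steps. First, for each point x it computes the squared distance between its image φ(x) in the feature space of a Gaussian (RBF) kernel and the mean of all images, using only kernel evaluations (the kernel trick). Second, it rescales those distances into [0, 1] as a degree of "outlierness". The RBF kernel, the `gamma` parameter, and the min-max membership function are assumptions chosen for illustration. This is not the author's method.

```python
import math
from typing import List, Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 0.5) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def feature_space_distances(data: List[Sequence[float]], gamma: float = 0.5) -> List[float]:
    """Squared distance from each mapped point phi(x_i) to the mean of all mapped points.

    Uses only kernel evaluations (the kernel trick):
    ||phi(x_i) - mu||^2 = k(x_i, x_i) - (2/n) * sum_j k(x_i, x_j)
                          + (1/n^2) * sum_{j,l} k(x_j, x_l)
    """
    n = len(data)
    gram = [[rbf_kernel(a, b, gamma) for b in data] for a in data]
    # The last term is the same for every point, so compute it once.
    mean_term = sum(sum(row) for row in gram) / (n * n)
    return [gram[i][i] - 2.0 * sum(gram[i]) / n + mean_term for i in range(n)]


def fuzzy_outlier_membership(data: List[Sequence[float]], gamma: float = 0.5) -> List[float]:
    """Hypothetical membership function: min-max scale the distances into [0, 1].

    1.0 marks the point farthest from the feature-space mean, 0.0 the closest.
    """
    dists = feature_space_distances(data, gamma)
    lo, hi = min(dists), max(dists)
    if hi == lo:
        # All points are equally far from the mean, so none is singled out.
        return [0.0] * len(dists)
    return [(d - lo) / (hi - lo) for d in dists]


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (3.0, 3.0)]
    for p, m in zip(points, fuzzy_outlier_membership(points)):
        print(p, round(m, 3))
```

On this toy data, the isolated point (3.0, 3.0) receives membership 1.0, and the four clustered points receive values close to 0. A practical detector would also need a threshold on the membership degree for deciding what counts as an anomaly. That choice is left open in this sketch.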

Author information

Authors and Affiliations

Department of Computational Mathematics and Cybernetics, Moscow State University, Vorob'evy gory, Moscow, 119992, Russia
M. I. Petrovskiy

About this article

Cite this article

Petrovskiy, M.I. Outlier Detection Algorithms in Data Mining Systems. Programming and Computer Software 29, 228–237 (2003). https://doi.org/10.1023/A:1024974810270

Issue Date: July 2003
DOI: https://doi.org/10.1023/A:1024974810270