DOI: 10.1145/1150402.1150459
Article

Outlier detection by active learning

Published: 20 August 2006

Abstract

Most existing approaches to outlier detection are based on density estimation. These methods have two notable drawbacks: they offer no explanation for why a point is flagged as an outlier, and they are computationally expensive. In this paper, we present a novel approach to outlier detection based on classification that attempts to address both issues. Our approach is based on two key ideas. First, we present a simple reduction of outlier detection to classification: a classifier is trained on a labeled data set in which artificially generated examples play the role of potential outliers. Second, once the task has been reduced to classification, we apply a selective sampling mechanism based on active learning to the reduced classification problem. We empirically evaluate the proposed approach on a number of data sets and find that it is superior to other methods that use the same reduction to classification with standard classification methods. We also show that it is competitive with state-of-the-art density-estimation-based outlier detection methods in the literature, while significantly reducing computational requirements and improving explanatory power.
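The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names neither a base classifier nor a sampling rule, so a k-nearest-neighbour vote stands in for the classifier and margin-based rejection sampling stands in for the selective sampling step. All function names, the uniform bounding-box generator, and the parameters (`k`, `floor`, `rounds`) are assumptions made for illustration.

```python
import random

def reduce_to_classification(real, n_synthetic, rng):
    """Label real points 0 and uniformly drawn artificial points 1."""
    dims = range(len(real[0]))
    lo = [min(p[d] for p in real) for d in dims]
    hi = [max(p[d] for p in real) for d in dims]
    synthetic = [tuple(rng.uniform(lo[d], hi[d]) for d in dims)
                 for _ in range(n_synthetic)]
    return [(p, 0) for p in real] + [(p, 1) for p in synthetic]

def knn_predict(train, x, k=5):
    """Majority label among the k training points nearest to x (assumed base learner)."""
    def sq_dist(example):
        return sum((a - b) ** 2 for a, b in zip(example[0], x))
    nearest = sorted(train, key=sq_dist)[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def outlier_score(committee, x, k=5):
    """Fraction of committee members that classify x as artificial."""
    return sum(knn_predict(member, x, k) for member in committee) / len(committee)

def train_committee(data, rounds, rng, k=5, floor=0.2):
    """Build one member per round, keeping low-margin examples more often."""
    committee = []
    for _ in range(rounds):
        sample = []
        for x, y in data:
            if committee:
                # Margin is 0 when the committee is evenly split, 1 when unanimous.
                margin = abs(2 * outlier_score(committee, x, k) - 1)
            else:
                margin = 0.0  # no committee yet: treat every example as uncertain
            if rng.random() < max(floor, 1.0 - margin):
                sample.append((x, y))
        committee.append(sample)  # a k-NN member is just its training sample
    return committee

# Toy data: a tight cluster of inliers plus one distant point.
rng = random.Random(0)
inliers = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(150)]
real = inliers + [(6.0, 6.0)]
data = reduce_to_classification(real, n_synthetic=150, rng=rng)
committee = train_committee(data, rounds=5, rng=rng)
score_far = outlier_score(committee, (6.0, 6.0))
score_center = outlier_score(committee, (0.0, 0.0))
```

A point whose neighbourhood is dominated by artificial examples receives a score near 1, so `score_far` should exceed `score_center` on this toy data. The selective sampling step means later committee members are trained on smaller samples concentrated where earlier members disagree, which is the intuition behind the computational savings the abstract claims.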


Published In

KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2006
986 pages
ISBN:1595933395
DOI:10.1145/1150402

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. active learning
  2. ensemble method
  3. outlier detection

Qualifiers

  • Article

Conference

KDD06

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
