ABSTRACT
Outlier detection has been extensively researched in the context of unsupervised learning. However, unsupervised results are not always satisfactory, and they can be significantly improved with supervision from a few labeled points. In this paper, we use a limited amount of label information to detect outliers more accurately. The key to our approach is an objective function that penalizes poor clustering results and deviation from known labels, and that restricts the number of outliers. The outliers are found as a solution to the discrete optimization problem defined by this objective function. In this way, the method can detect meaningful outliers that cannot be identified by existing unsupervised methods.
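The abstract does not give the functional form of the objective, so the following Python sketch is only an illustration of the three ingredients it names. It assumes a k-means-style squared-distance clustering cost, a linear penalty on the number of outliers, and a per-point penalty for contradicting a known label. The names `objective`, `assign`, `gamma_count`, and `gamma_label` are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Optional, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def objective(points: List[Point], centroids: List[Point],
              assignment: List[Optional[int]], labels: Dict[int, bool],
              gamma_count: float, gamma_label: float) -> float:
    """Assumed objective: clustering cost + outlier-count penalty + label penalty.

    assignment[i] is a cluster index, or None if point i is declared an outlier.
    labels maps a point index to True (known outlier) or False (known normal).
    """
    clustering_cost = sum(sq_dist(points[i], centroids[c])
                          for i, c in enumerate(assignment) if c is not None)
    n_outliers = sum(1 for c in assignment if c is None)
    violations = sum(1 for i, is_outlier in labels.items()
                     if (assignment[i] is None) != is_outlier)
    return clustering_cost + gamma_count * n_outliers + gamma_label * violations


def assign(points: List[Point], centroids: List[Point], labels: Dict[int, bool],
           gamma_count: float, gamma_label: float) -> List[Optional[int]]:
    """Minimize the assumed objective over assignments for fixed centroids.

    Because every penalty here is additive per point, each point can be
    decided independently: join the nearest cluster or become an outlier,
    whichever is cheaper.
    """
    assignment: List[Optional[int]] = []
    for i, p in enumerate(points):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
        cost_in = sq_dist(p, centroids[nearest])
        cost_out = gamma_count
        if i in labels:
            if labels[i]:
                cost_in += gamma_label   # known outlier placed in a cluster
            else:
                cost_out += gamma_label  # known normal point declared an outlier
        assignment.append(nearest if cost_in <= cost_out else None)
    return assignment


# Toy data: two tight clusters, one far point, one borderline point (index 5).
points = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.1, 5.0), (2.5, 2.5), (1.5, 0.0)]
centroids = [(0.1, 0.0), (5.05, 5.0)]

unsupervised = assign(points, centroids, {}, gamma_count=4.0, gamma_label=3.0)
supervised = assign(points, centroids, {5: True}, gamma_count=4.0, gamma_label=3.0)
```

In this toy setup the borderline point at index 5 stays in a cluster when no labels are given, and it is flagged as an outlier once it is labeled as one. That is the kind of outlier the abstract says unsupervised methods miss. A full method would also update the centroids, which this sketch leaves out.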
Index Terms
- Semi-supervised outlier detection