Neurocomputing

Volume 72, Issues 7–9, March 2009, Pages 1859-1869

Minimum spanning tree based one-class classifier

https://doi.org/10.1016/j.neucom.2008.05.003

Abstract

In the problem of one-class classification, one of the classes, called the target class, has to be distinguished from all other possible objects, which are considered non-targets. The need for solving such a task arises in many practical applications, e.g. in machine fault detection, face recognition, authorship verification, fraud recognition or person identification based on biometric data.

This paper proposes a new one-class classifier, the minimum spanning tree class descriptor (MST_CD). This classifier builds on the structure of the minimum spanning tree constructed on the target training set only. The classification of test objects relies on their distances to the closest edge of that tree; hence the proposed method is an example of a distance-based one-class classifier. Our experiments show that the MST_CD performs especially well in the case of small-sample-size problems and in high-dimensional spaces.

Introduction

In the problem of one-class classification [29], [17], [41], [27], [14], [19], one of the classes, called the target class, has to be distinguished from all other possible objects, also called non-targets. The need for solving such a task arises in many practical applications. Examples are any type of fault detection [46] or target detection, such as face detection in images, abnormal-behaviour or disease detection [40], person identification based on biometric data, or authorship verification [23]. The problem of one-class classification is characterised by the presence of a target class, e.g. a collection of face images of a particular person. The goal is to determine a proximity function of a test object to the target class such that resembling objects are accepted as targets and non-targets are rejected. It is assumed that a well-sampled training set of target objects is available, while no (or very few) non-target examples are present. The reason for this assumption is practical: non-targets may occur only occasionally, or their measurements might be very costly. Moreover, even when non-targets are available in the training stage, they may not always be trusted: they may be badly sampled, with unknown priors and ill-defined distributions. In essence, non-targets are weakly defined, as they may appear as any kind of deviation or anomaly from the target examples, e.g. images of faces of non-target people or images of arbitrary (non-face) objects. Still, one-class classifiers need to be trained in such a way that the errors on both the target and non-target classes are taken into account.

Many one-class classifiers have been proposed so far; see [41], [19] for a survey. They often rely on strong assumptions concerning the distribution of objects, such as a normal distribution of the target class [6], [2], [37] or a uniform distribution of the non-target class [41]. Following the latter assumption, the training of classifiers is based on a minimisation of the volume of a one-class classifier (which is the volume captured by the classifier's boundary) such that the error on the target class does not increase [42], [39], [4], [33]. Usually such classifiers can be applied to any distribution, as they make no assumptions on the target distribution, but they may need to estimate many parameters. Examples are the support vector data description (SVDD) [42] or auto-encoder neural networks [17], [28].
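As an aside, a boundary-minimising one-class classifier of this kind is easy to sketch with standard tooling. The snippet below is our illustration, not from the paper: it uses scikit-learn's OneClassSVM, which with an RBF kernel is closely related to the SVDD; the data and the nu value are arbitrary choices.

```python
# Illustrative sketch only: a boundary-based one-class classifier fitted on
# target examples alone. With an RBF kernel the one-class SVM is closely
# related to the SVDD; the toy data and nu=0.1 are arbitrary assumptions.
import numpy as np
from sklearn.svm import OneClassSVM

X_train = np.random.default_rng(0).normal(size=(50, 3))  # target examples only
ocsvm = OneClassSVM(kernel="rbf", nu=0.1).fit(X_train)   # nu bounds the fraction of rejected targets
print(ocsvm.predict(X_train))                            # +1 = accepted as target, -1 = rejected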

In this paper we propose a non-parametric classifier which is based on a graph representation of the target training data, aiming to capture the underlying data structure. The basic elements of the proposed one-class classifier are the edges of the graph. Graph edges can be considered as an additional set of virtual target objects. These additional objects, in turn, can help to model a target distribution in high-dimensional spaces and in small-sample-size problems. This enriches the representation of relations in the data. Additionally, we can look at graph edges as a set of possible transformation paths that allow one to transform one target object into another within the domain of the target class.

The layout of this paper is as follows. Section 2 presents the formal notation and describes the framework of one-class classification. In Section 3, a data descriptor based on the minimum spanning tree (MST) is introduced. Section 3.2 discusses a possible complexity parameter which gives a handle on the data complexity and a way to simplify the classifier. Section 4 discusses related work. Section 5 explores both advantages and disadvantages of the proposed classifier based on a set of experiments conducted on both artificial and real-world data. The final conclusions are presented in Section 6.

Section snippets

One-class classifiers

One-class classifiers are trained to accept target examples and reject non-targets. It is assumed that during training no, or only a few, non-target objects are available. In part of the further discussion we will also assume the presence of outliers during training. Outliers may arise, e.g., from measurement errors and can be considered mislabelled target objects in the training set.

Let $X = \{\mathbf{x}_i \mid \mathbf{x}_i \in \mathbb{R}^N,\ i = 1, \ldots, n\}$ be a training set in an $N$-dimensional vector space, drawn from the target distribution.

Description of a target class by an MST

Let $\mathbf{x}_i, \mathbf{x}_j \in X \subset \mathbb{R}^N$ be two examples from a target class. If these two examples describe two similar objects in reality, they should be neighbours in the representation space $\mathbb{R}^N$. We assume that not only these examples but also points from their proper neighbourhoods belong to the target class. For example, if we assume continuity within the target class in $\mathbb{R}^N$, then there exists a continuous transformation between these two examples. This means that we can find a transformation for which all …
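To make the construction concrete, here is a minimal sketch, our illustration rather than the authors' code, of an MST-based descriptor: the tree is built on the target training set with SciPy, and a test object is scored by its Euclidean distance to the nearest tree edge, each edge treated as a finite line segment between two targets. The class and method names are hypothetical.

```python
# Minimal sketch of an MST-based class descriptor in the spirit of MST_CD.
# MSTClassDescriptor, fit and distance are illustrative names, not the
# authors' implementation.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree


class MSTClassDescriptor:
    def fit(self, X):
        # Build the minimum spanning tree on the target training set only.
        D = squareform(pdist(X))                # pairwise Euclidean distances
        mst = minimum_spanning_tree(D).tocoo()  # the n-1 edges of the MST
        self.edges_ = list(zip(mst.row, mst.col))
        self.X_ = X
        return self

    def distance(self, z):
        # Distance from test object z to the closest MST edge, with each
        # edge treated as a line segment between two target objects.
        best = np.inf
        for i, j in self.edges_:
            a, b = self.X_[i], self.X_[j]
            ab = b - a
            # Projection of z onto the segment, clipped to its endpoints.
            t = np.clip(np.dot(z - a, ab) / np.dot(ab, ab), 0.0, 1.0)
            best = min(best, np.linalg.norm(z - (a + t * ab)))
        return best
```

Note the clipping of the projection parameter to [0, 1]: points are compared against the finite edge, not the infinite line through its endpoints.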

Related work

The proposed classifier can be related to other known methods. In particular, a multi-class classifier, the nearest feature line method (NFLM), was introduced in [26]. In the NFLM, one describes a training set by a set of lines between all pairs of objects from a particular class. A new object is classified into one of the classes from the training set based on its distance to the nearest line from the set. It has been shown there that the NFLM performs well in face recognition problems in …
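For contrast with the segment-based distance sketched in Section 3, the NFLM distance projects onto the infinite line through two prototypes. A small hypothetical sketch of that distance:

```python
# Hypothetical sketch of the NFLM feature-line distance: the line through
# prototypes a and b extends infinitely, so the projection parameter t is
# NOT clipped to [0, 1] (the MST_CD, by contrast, uses finite edges).
import numpy as np

def feature_line_distance(z, a, b):
    ab = b - a
    t = np.dot(z - a, ab) / np.dot(ab, ab)  # unconstrained projection onto the line
    return np.linalg.norm(z - (a + t * ab))
```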

Experiments

To study the performance of one-class classifiers, a receiver operating characteristic (ROC) curve is often used [3]. It plots the true positive ratio (target acceptance) against the false positive ratio (non-target acceptance). Of course, examples of non-target objects are necessary to evaluate it; these are available in a validation stage only. In order to compare the performance of various classifiers, the area under the ROC curve (AUC) measure can be used [3]. It computes the AUC, …
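As an illustration, with made-up Gaussian data rather than the paper's experiments, the AUC of the descriptor sketched earlier can be computed with scikit-learn's roc_auc_score once a labelled validation set, including non-targets, is available:

```python
# Illustrative AUC evaluation; reuses the hypothetical MSTClassDescriptor
# from the earlier sketch. The Gaussian toy data here is an assumption,
# not the paper's experimental setup.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(30, 5))              # target training set only
Z_val = np.vstack([rng.normal(0.0, 1.0, size=(20, 5)),    # validation targets
                   rng.normal(4.0, 1.0, size=(20, 5))])   # validation non-targets
y_true = np.array([1] * 20 + [0] * 20)                    # 1 = target, 0 = non-target

clf = MSTClassDescriptor().fit(X_train)
scores = -np.array([clf.distance(z) for z in Z_val])      # smaller distance = more target-like
print("AUC:", roc_auc_score(y_true, scores))
```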

Conclusions

This paper proposes a new one-class classifier based on the minimum spanning tree (MST). The complexity of the classifier equals the complexity of the MST, and the threshold is determined as a fraction of the distribution of edge lengths in the MST. The basic elements of the classifier are the edges of the tree, which can be considered as additional virtual elements that capture more characteristics of the training objects. As an extension, we also propose a way to reduce the complexity of the …
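One plausible reading of this thresholding rule, as a sketch under our own assumptions (the 90th percentile is an arbitrary illustrative fraction, and clf and z_test carry over from the earlier hypothetical sketches):

```python
# A possible reading of the thresholding rule (our illustration): accept a
# test object when its distance to the nearest MST edge does not exceed a
# chosen percentile of the tree's own edge lengths.
import numpy as np

edge_lengths = np.array([np.linalg.norm(clf.X_[i] - clf.X_[j])
                         for i, j in clf.edges_])
threshold = np.percentile(edge_lengths, 90)   # fraction of the edge-length distribution
accept = clf.distance(z_test) <= threshold    # True -> accept z_test as a target
```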

Acknowledgement

This work was partly supported by the Dutch Organisation for Scientific Research (NWO).


References (46)

  • T.H. Cormen et al., Introduction to Algorithms, 1990.
  • T.F. Cox et al., Multidimensional Scaling, 1994.
  • F. Dehne, S. Gotz, Practical parallel algorithms for minimum spanning trees, in: The 17th IEEE Symposium on Reliable...
  • R.P.W. Duin, On the choice of the smoothing parameters for Parzen estimators of probability density functions, IEEE Trans. Comput., 1976.
  • T.R. Golub et al., Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science, 1999.
  • J.C. Gower, Metric and Euclidean properties of dissimilarity coefficients, J. Classification, 1986.
  • R.L. Graham et al., On the history of the minimum spanning tree problem, Ann. Hist. Comput., 1985.
  • S. Hettich, C.L. Blake, C.J. Merz, UCI repository of machine learning databases, 1998...
  • D. Hochbaum et al., A best possible heuristic for the k-center problem, Math. Oper. Res., 1985.
  • N. Japkowicz, Concept-learning in the absence of counter-examples: an autoassociation-based approach to classification,...
  • P. Juszczak, Learning to recognise. A study on one-class classification and active learning, Ph.D. Thesis, Delft...
  • P. Juszczak et al., Uncertainty sampling for one-class classifiers.
  • G. Karypis et al., Chameleon: a hierarchical clustering using dynamic modeling, IEEE Comput., 1999.

Piotr Juszczak studied physics at the Wrocław University of Technology, Poland, and computer science at the National University of Ireland, Galway. He received his Master's degree in 2001 with the thesis “Automatic recognition of arrhythmia from ECG signal”. He received his PhD in 2006 from the Information and Communication Theory Group at Delft University of Technology, under the supervision of R.P.W. Duin. The title of his PhD thesis is “Learning to recognise. Study on one-class classification and active learning”. Since 2006 he has been a research associate at the Institute for Mathematical Sciences at Imperial College London. His main research interests involve theoretical and practical aspects of machine learning, especially one-class classification, domain-based classification and the enhancement of classification models by unlabelled data.

David M.J. Tax studied physics at the University of Nijmegen, The Netherlands, where he received his Master's degree in 1996 with the thesis “Learning of structure by Many-take-all Neural Networks”. After that he pursued his PhD at the Delft University of Technology in the Pattern Recognition group, under the supervision of R.P.W. Duin, and received the degree in 2001 with the thesis “One-class classification”. After working for two years as a Marie Curie Fellow in the Intelligent Data Analysis group in Berlin, he is at present a postdoc in the Information and Communication Theory group at the Delft University of Technology. His main research interest is the learning and development of outlier detection algorithms and systems, using techniques from machine learning and pattern recognition. His particular focus is on problems concerning the representation of data, simple and elegant classifiers, and the evaluation of methods.

Elżbieta Pękalska is a purpose-driven, creative and inspiring researcher, teacher, facilitator and mentor. She studied computer science at the University of Wrocław, Poland. In the years 1998–2004 she worked on both applied and fundamental projects in pattern recognition and machine learning at the Delft University of Technology, The Netherlands. In 2005 she obtained a cum laude PhD degree, under the supervision of R.P.W. Duin, for her seminal work on dissimilarity-based learning methods, also known as generalised kernel approaches. Currently, she is an Engineering and Physical Sciences Research Council postdoctoral fellow at the University of Manchester, UK. She is passionate about the learning process and learning strategies. This includes not only intelligent learning from data and sensors, but also humans on their development paths. Some of her key questions refer to the issues of representation, learning and combining paradigms, and the use of proximity and kernels in learning from examples.

Robert P.W. Duin studied applied physics at Delft University of Technology in the Netherlands. In 1978 he received the Ph.D. degree for a thesis on the accuracy of statistical pattern recognizers. His research has included various aspects of the automatic interpretation of measurements, learning systems and classifiers. Between 1980 and 1990 he studied and developed hardware architectures and software configurations for interactive image analysis. After this period his interest was redirected to neural networks and pattern recognition.

He is currently an associate professor in the Faculty of Electrical Engineering, Mathematics and Computer Science of Delft University of Technology. His present research is in the design, evaluation and application of algorithms that learn from examples. This includes neural network classifiers, support vector machines and classifier combining strategies. Recently he started to investigate alternative object representations for classification and thereby became interested in dissimilarity-based pattern recognition and in the possibilities of learning domain descriptions. Additionally, he is interested in the relation between pattern recognition and consciousness.
