Neurocomputing

Volume 72, Issues 7–9, March 2009, Pages 1859-1869

Minimum spanning tree based one-class classifier

https://doi.org/10.1016/j.neucom.2008.05.003

Abstract

In the problem of one-class classification, one of the classes, called the target class, has to be distinguished from all other possible objects, which are considered non-targets. The need for solving such a task arises in many practical applications, e.g. in machine fault detection, face recognition, authorship verification, fraud recognition or person identification based on biometric data.

This paper proposes a new one-class classifier, the minimum spanning tree class descriptor (MST_CD). This classifier builds on the structure of the minimum spanning tree constructed on the target training set only. The classification of test objects relies on their distances to the closest edge of that tree; hence the proposed method is an example of a distance-based one-class classifier. Our experiments show that the MST_CD performs especially well in the case of small-sample-size problems and in high-dimensional spaces.

Introduction

In the problem of one-class classification [29], [17], [41], [27], [14], [19], one of the classes, called the target class, has to be distinguished from all other possible objects, also called non-targets. The need for solving such a task arises in many practical applications. Examples are any type of fault detection [46] or target detection, such as face detection in images, abnormal-behaviour or disease detection [40], person identification based on biometric data, or authorship verification [23]. The problem of one-class classification is characterised by the presence of a target class, e.g. a collection of face images of a particular person. The goal is to determine a proximity function of a test object to the target class such that resembling objects are accepted as targets and non-targets are rejected. It is assumed that a well-sampled training set of target objects is available, while no (or very few) non-target examples are present. The reason for this assumption is practical: non-targets may occur only occasionally, or their measurements might be very costly. Moreover, even when non-targets are available in the training stage, they may not always be trusted: they may be badly sampled, with unknown priors and ill-defined distributions. In essence, non-targets are weakly defined, as they may appear as any kind of deviation or anomaly from the target examples, e.g. images of faces of non-target people or images of arbitrary (non-face) objects. Still, one-class classifiers need to be trained in such a way that the errors on both the target and non-target classes are taken into account.

Many one-class classifiers have been proposed so far; see [41], [19] for a survey. They often rely on strong assumptions concerning the distribution of objects, such as a normal distribution of the target class [6], [2], [37] or a uniform distribution of the non-target class [41]. Following the latter assumption, the training of classifiers is based on a minimisation of the volume of a one-class classifier (which is the volume captured by the classifier's boundary) such that the error on the target class does not increase [42], [39], [4], [33]. Usually such classifiers can be applied to any distribution, as they make no assumptions on the target distribution, but they may need to estimate many parameters. Examples are the support vector data description (SVDD) [42] or auto-encoder neural networks [17], [28].
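As an aside, a boundary-minimising one-class classifier of this kind is easy to sketch with standard tooling. The snippet below is our illustration, not from the paper: it uses scikit-learn's OneClassSVM, which with an RBF kernel is closely related to the SVDD; the data and the nu value are arbitrary choices.

```python
# Illustrative sketch only: a boundary-based one-class classifier fitted on
# target examples alone. With an RBF kernel the one-class SVM is closely
# related to the SVDD; the toy data and nu=0.1 are arbitrary assumptions.
import numpy as np
from sklearn.svm import OneClassSVM

X_train = np.random.default_rng(0).normal(size=(50, 3))  # target examples only
ocsvm = OneClassSVM(kernel="rbf", nu=0.1).fit(X_train)   # nu bounds the fraction of rejected targets
print(ocsvm.predict(X_train))                            # +1 = accepted as target, -1 = rejected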

In this paper we propose a non-parametric classifier which is based on a graph representation of the target training data, aiming to capture the underlying data structure. The basic elements of the proposed one-class classifier are the edges of the graph. Graph edges can be considered as an additional set of virtual target objects. These additional objects, in turn, can help to model a target distribution in high-dimensional spaces and in small-sample-size problems. This enriches the representation of relations in the data. Additionally, we can look at graph edges as a set of possible transformation paths that allow one to transform one target object into another within the domain of the target class.

The layout of this paper is as follows. Section 2 presents the formal notation and describes the framework of one-class classification. In Section 3, a data descriptor based on the minimum spanning tree (MST) is introduced. Section 3.2 discusses a possible complexity parameter which gives a handle on the data complexity and a way to simplify the classifier. Section 4 discusses related work. Section 5 explores both advantages and disadvantages of the proposed classifier based on a set of experiments conducted on both artificial and real-world data. The final conclusions are presented in Section 6.

Section snippets

One-class classifiers

One-class classifiers are trained to accept target examples and reject non-targets. It is assumed that during training no, or only a few, non-target objects are available. In part of the further discussion we will also assume the presence of outliers during training. Outliers may arise, e.g., from measurement errors and can be considered mislabelled target objects in the training set.

Let $X = \{\mathbf{x}_i \mid \mathbf{x}_i \in \mathbb{R}^N,\ i = 1, \ldots, n\}$ be a training set in an $N$-dimensional vector space, drawn from the target distribution.

Description of a target class by an MST

Let $\mathbf{x}_i, \mathbf{x}_j \in X \subset \mathbb{R}^N$ be two examples from a target class. If these two examples describe two similar objects in reality, they should be neighbours in the representation space $\mathbb{R}^N$. We assume that not only these examples but also points from their proper neighbourhoods belong to the target class. For example, if we assume continuity within the target class in $\mathbb{R}^N$, then there exists a continuous transformation between these two examples. This means that we can find a transformation for which all …
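To make the construction concrete, here is a minimal sketch, our illustration rather than the authors' code, of an MST-based descriptor: the tree is built on the target training set with SciPy, and a test object is scored by its Euclidean distance to the nearest tree edge, each edge treated as a finite line segment between two targets. The class and method names are hypothetical.

```python
# Minimal sketch of an MST-based class descriptor in the spirit of MST_CD.
# MSTClassDescriptor, fit and distance are illustrative names, not the
# authors' implementation.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree


class MSTClassDescriptor:
    def fit(self, X):
        # Build the minimum spanning tree on the target training set only.
        D = squareform(pdist(X))                # pairwise Euclidean distances
        mst = minimum_spanning_tree(D).tocoo()  # the n-1 edges of the MST
        self.edges_ = list(zip(mst.row, mst.col))
        self.X_ = X
        return self

    def distance(self, z):
        # Distance from test object z to the closest MST edge, with each
        # edge treated as a line segment between two target objects.
        best = np.inf
        for i, j in self.edges_:
            a, b = self.X_[i], self.X_[j]
            ab = b - a
            # Projection of z onto the segment, clipped to its endpoints.
            t = np.clip(np.dot(z - a, ab) / np.dot(ab, ab), 0.0, 1.0)
            best = min(best, np.linalg.norm(z - (a + t * ab)))
        return best
```

Note the clipping of the projection parameter to [0, 1]: points are compared against the finite edge, not the infinite line through its endpoints.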

Related work

The proposed classifier can be related to other known methods. In particular, a multi-class classifier, the nearest feature line method (NFLM), was introduced in [26]. In the NFLM, one describes a training set by a set of lines between all pairs of objects from a particular class. A new object is classified into one of the classes from the training set based on its distance to the nearest line from the set. It has been shown there that the NFLM performs well in face recognition problems in …
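For contrast with the segment-based distance sketched in Section 3, the NFLM distance projects onto the infinite line through two prototypes. A small hypothetical sketch of that distance:

```python
# Hypothetical sketch of the NFLM feature-line distance: the line through
# prototypes a and b extends infinitely, so the projection parameter t is
# NOT clipped to [0, 1] (the MST_CD, by contrast, uses finite edges).
import numpy as np

def feature_line_distance(z, a, b):
    ab = b - a
    t = np.dot(z - a, ab) / np.dot(ab, ab)  # unconstrained projection onto the line
    return np.linalg.norm(z - (a + t * ab))
```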

Experiments

To study the performance of one-class classifiers, a receiver operating characteristic (ROC) curve is often used [3]. It plots the true positive ratio (target acceptance) against the false positive ratio (non-target acceptance). Of course, examples of non-target objects are necessary to evaluate it; these are available in a validation stage only. In order to compare the performance of various classifiers, the area under the ROC curve (AUC) measure can be used [3]. It computes the AUC, …
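As an illustration, with made-up Gaussian data rather than the paper's experiments, the AUC of the descriptor sketched earlier can be computed with scikit-learn's roc_auc_score once a labelled validation set, including non-targets, is available:

```python
# Illustrative AUC evaluation; reuses the hypothetical MSTClassDescriptor
# from the earlier sketch. The Gaussian toy data here is an assumption,
# not the paper's experimental setup.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(30, 5))              # target training set only
Z_val = np.vstack([rng.normal(0.0, 1.0, size=(20, 5)),    # validation targets
                   rng.normal(4.0, 1.0, size=(20, 5))])   # validation non-targets
y_true = np.array([1] * 20 + [0] * 20)                    # 1 = target, 0 = non-target

clf = MSTClassDescriptor().fit(X_train)
scores = -np.array([clf.distance(z) for z in Z_val])      # smaller distance = more target-like
print("AUC:", roc_auc_score(y_true, scores))
```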

Conclusions

This paper proposes a new one-class classifier based on the minimum spanning tree (MST). The complexity of the classifier equals the complexity of the MST, and the threshold is determined as a fraction of the distribution of edge lengths in the MST. The basic elements of the classifier are the edges of the tree, which can be considered as additional virtual elements that capture more characteristics of the training objects. As an extension, we also propose a way to reduce the complexity of the …
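One plausible reading of this thresholding rule, as a sketch under our own assumptions (the 90th percentile is an arbitrary illustrative fraction, and clf and z_test carry over from the earlier hypothetical sketches):

```python
# A possible reading of the thresholding rule (our illustration): accept a
# test object when its distance to the nearest MST edge does not exceed a
# chosen percentile of the tree's own edge lengths.
import numpy as np

edge_lengths = np.array([np.linalg.norm(clf.X_[i] - clf.X_[j])
                         for i, j in clf.edges_])
threshold = np.percentile(edge_lengths, 90)   # fraction of the edge-length distribution
accept = clf.distance(z_test) <= threshold    # True -> accept z_test as a target
```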

Acknowledgement

This work was partly supported by the Dutch Organisation for Scientific Research (NWO).


References (46)

  • T.H. Cormen et al., Introduction to Algorithms, 1990.
  • T.F. Cox et al., Multidimensional Scaling, 1994.
  • F. Dehne, S. Gotz, Practical parallel algorithms for minimum spanning trees, in: The 17th IEEE Symposium on Reliable...
  • R.P.W. Duin, On the choice of the smoothing parameters for Parzen estimators of probability density functions, IEEE Trans. Comput., 1976.
  • T.R. Golub et al., Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science, 1999.
  • J.C. Gower, Metric and Euclidean properties of dissimilarity coefficients, J. Classification, 1986.
  • R.L. Graham et al., On the history of the minimum spanning tree problem, Ann. Hist. Comput., 1985.
  • S. Hettich, C.L. Blake, C.J. Merz, UCI repository of machine learning databases, 1998...
  • D. Hochbaum et al., A best possible heuristic for the k-center problem, Math. Oper. Res., 1985.
  • N. Japkowicz, Concept-learning in the absence of counter-examples: an autoassociation-based approach to classification,...
  • P. Juszczak, Learning to recognise. A study on one-class classification and active learning, Ph.D. Thesis, Delft...
  • P. Juszczak et al., Uncertainty sampling for one-class classifiers.
  • G. Karypis et al., Chameleon: a hierarchical clustering using dynamic modeling, IEEE Comput., 1999.

Piotr Juszczak studied physics at the Wrocław University of Technology, Poland, and computer science at the National University of Ireland, Galway. He received his Master's degree in 2001 with the thesis “Automatic recognition of arrhythmia from ECG signal”. He received his PhD in 2006 from the Information and Communication Theory Group at Delft University of Technology, under the supervision of R.P.W. Duin. The title of his PhD thesis is “Learning to recognise. Study on one-class classification and active learning”. Since 2006 he has been a research associate at the Institute for Mathematical Sciences at Imperial College London. His main research interests involve theoretical and practical aspects of machine learning, especially one-class classification, domain-based classification and the enhancement of classification models by unlabelled data.

David M.J. Tax studied physics at the University of Nijmegen, The Netherlands, where he received his Master's degree in 1996 with the thesis “Learning of structure by Many-take-all Neural Networks”. After that he pursued his PhD at the Delft University of Technology in the Pattern Recognition group, under the supervision of R.P.W. Duin, and received the degree in 2001 with the thesis “One-class classification”. After working for two years as a Marie Curie Fellow in the Intelligent Data Analysis group in Berlin, he is at present a postdoc in the Information and Communication Theory group at the Delft University of Technology. His main research interest is the learning and development of outlier detection algorithms and systems, using techniques from machine learning and pattern recognition. His particular focus is on problems concerning the representation of data, simple and elegant classifiers, and the evaluation of methods.

Elżbieta Pękalska is a purpose-driven, creative and inspiring researcher, teacher, facilitator and mentor. She studied computer science at the University of Wrocław, Poland. In the years 1998–2004 she worked on both applied and fundamental projects in pattern recognition and machine learning at the Delft University of Technology, The Netherlands. In 2005 she obtained a cum laude PhD degree, under the supervision of R.P.W. Duin, for her seminal work on dissimilarity-based learning methods, also known as generalised kernel approaches. Currently, she is an Engineering and Physical Sciences Research Council postdoctoral fellow at the University of Manchester, UK. She is passionate about the learning process and learning strategies. This includes not only intelligent learning from data and sensors, but also humans on their development paths. Some of her key questions refer to the issues of representation, learning and combining paradigms, and the use of proximity and kernels in learning from examples.

Robert P.W. Duin studied applied physics at Delft University of Technology in the Netherlands. In 1978 he received the Ph.D. degree for a thesis on the accuracy of statistical pattern recognizers. His research has included various aspects of the automatic interpretation of measurements, learning systems and classifiers. Between 1980 and 1990 he studied and developed hardware architectures and software configurations for interactive image analysis. After this period his interest was redirected to neural networks and pattern recognition.

He is currently an associate professor in the Faculty of Electrical Engineering, Mathematics and Computer Science of Delft University of Technology. His present research is in the design, evaluation and application of algorithms that learn from examples. This includes neural network classifiers, support vector machines and classifier combining strategies. Recently he started to investigate alternative object representations for classification and thereby became interested in dissimilarity-based pattern recognition and in the possibilities of learning domain descriptions. Additionally, he is interested in the relation between pattern recognition and consciousness.
