Elsevier

Pattern Recognition

Volume 37, Issue 12, December 2004, Pages 2281-2291
Similarity-based classification of sequences using hidden Markov models

https://doi.org/10.1016/j.patcog.2004.04.005

Abstract

Hidden Markov models (HMMs) are a widely used tool for sequence modelling. In the sequence classification case, the standard approach consists of training one HMM for each class and then using a standard Bayesian classification rule. In this paper, we introduce a novel classification scheme for sequences based on HMMs, which is obtained by extending the recently proposed similarity-based classification paradigm to HMM-based classification. In this approach, each object is described by the vector of its similarities with respect to a predetermined set of other objects, where these similarities are derived from HMMs. A central problem is the high dimensionality of the resulting space, and, to deal with it, three alternatives are investigated. Experiments on synthetic and real data show that the similarity-based approach outperforms standard HMM classification schemes.

Introduction

The analysis of sequential data is an interesting and important research area. Probabilistic modelling and classification is intrinsically more difficult when each observation is a sequence, compared to the standard scenario where each observation is a set (vector) of features. In fact, since the length of the sequences may vary, it is not possible to directly use standard pattern recognition techniques. Moreover, sequence classification problems usually involve very large data sets.

Hidden Markov models (HMMs) are commonly employed probabilistic models of sequential data [1]. HMMs can be viewed as stochastic generalizations of finite-state automata, in which both the transitions between states and the generation of output symbols are governed by probabilistic mechanisms [1]. Although the basic theory and inference tools were developed in the late 1960s [2], [3], HMMs have only been extensively applied in the last decade. Speech recognition [1], DNA and protein modelling [4], [5], handwritten character recognition [6], gesture recognition [7], and behavior analysis and synthesis [8] are examples of problems for which HMMs have been exploited.

The standard HMM-based approach to sequence classification consists of training one HMM for each class; the trained models are then used as class-conditional densities in a standard Bayes classification paradigm. For example, assuming a priori equiprobable classes, an unknown sequence is assigned to the class whose model shows the highest probability (likelihood) of having generated it; this is the well-known maximum-likelihood (ML) classification rule.
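In symbols, the rule just described can be written as follows. The notation is introduced here only for illustration and is an assumption, not taken from the paper: C classes, one trained HMM λ_c per class, and class priors P(c).

```latex
% Bayes rule with HMMs as class-conditional densities
\hat{c}(O) \;=\; \arg\max_{c \in \{1,\dots,C\}} P(c \mid O)
           \;=\; \arg\max_{c \in \{1,\dots,C\}} P(O \mid \lambda_c)\, P(c)

% Equiprobable classes, P(c) = 1/C: the prior drops out and the ML rule remains
\hat{c}_{\mathrm{ML}}(O) \;=\; \arg\max_{c \in \{1,\dots,C\}} P(O \mid \lambda_c)
```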

In this paper, an alternative classification scheme is proposed, by extending the similarity-based paradigm [9], [10], [11], [12], [13], [14] to HMM-based classification. This paradigm, which has been introduced recently, differs from typical pattern recognition approaches where objects to be classified are represented by sets (vectors) of features. In the similarity-based paradigm, objects are described using pairwise (dis)similarities, i.e. distances from other objects in the data set. In this way, objects are not constrained to be explicitly represented in a feature space, and all that is necessary is a way to compute (dis)similarities between pairs of objects. The goal is then to learn a classifier only from these relational data.

The literature on similarity-based classification is not vast [9], [10], [11], [12], [13], [14] (a brief review is given in Section 2.1). The general idea behind all these approaches is basically the same: given a set of pairwise dissimilarity values, a new representation space can be built, in which each object is described by these values. In Ref. [13], a simple synthetic experiment shows that a complex problem in a 2D space (requiring a quadratic classifier to achieve almost correct separation), becomes a linearly separable problem in a dissimilarity space.

In this paper, we extend this dissimilarity-based classification paradigm to HMM-based sequence classification problems. We propose to build a similarity space, representing each object (sequence) by the vector of its similarities with respect to a predetermined set of objects, called the representatives set (in the simplest approach, this can be the whole data set); the classification is then performed in this new representation space. The similarities are derived by considering the likelihood P(O|λ) as a measure of the similarity between the sequence O and the HMM specified by the set of parameters λ. This similarity measure was previously used in sequence clustering applications [15], [16].
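A minimal sketch of how such a similarity vector could be computed for discrete-output HMMs. It makes several assumptions that are not in the text above: the HMMs are already trained (the two parameter sets below are made-up toy values), the log-likelihood is used instead of the raw likelihood, and it is divided by the sequence length. The function names log_likelihood and similarity_vector are hypothetical.

```python
import numpy as np

def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: returns log P(obs | lambda) for a discrete HMM.

    pi: initial state probabilities, shape (N,)
    A:  state transition matrix, shape (N, N)
    B:  emission probabilities, shape (N, M), B[i, k] = P(symbol k | state i)
    """
    alpha = pi * B[:, obs[0]]          # forward variable at t = 1
    scale = alpha.sum()
    log_p = np.log(scale)
    alpha = alpha / scale              # rescale to avoid numerical underflow
    for symbol in obs[1:]:
        alpha = (alpha @ A) * B[:, symbol]
        scale = alpha.sum()
        log_p += np.log(scale)         # log-likelihood is the sum of log scales
        alpha = alpha / scale
    return log_p

def similarity_vector(obs, models, normalize=True):
    """One similarity per representative HMM; length normalization is an assumption."""
    sims = np.array([log_likelihood(obs, *model) for model in models])
    return sims / len(obs) if normalize else sims

# Toy, hand-picked HMMs standing in for models trained on representative sequences.
model_a = (np.array([0.6, 0.4]),
           np.array([[0.7, 0.3], [0.4, 0.6]]),
           np.array([[0.9, 0.1], [0.2, 0.8]]))
model_b = (np.array([1.0]),
           np.array([[1.0]]),
           np.array([[0.5, 0.5]]))

sequence = [0, 1, 1, 0]
vector = similarity_vector(sequence, [model_a, model_b])
print(vector)   # fixed-length representation: one entry per representative
```

Whatever the length of the input sequence, the resulting vector has as many entries as there are representatives, which is what makes standard fixed-length classifiers applicable.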

The similarity-based classification paradigm seems to be particularly well suited to HMMs, as it can be seen as a natural extension of the standard HMM classification scheme. Specifically, the standard ML approach assigns an unknown sequence O to the class whose model shows the highest likelihood. To do so, the likelihoods of O with respect to the HMMs of all classes are evaluated, each providing a likelihood-based measure of the similarity between that class and the observed sequence. In other words, HMMs are used to compute similarities between sequences and classes, with each class being represented by a single HMM. Subsequently, only the maximum of these values is used to make the classification decision. In the similarity-based approach, the classification decision is made using the whole set of similarities between each observed sequence and all the other sequences. We will show that this strategy results in a substantial improvement in the classification performance, compared to standard HMM-based approaches. Moreover, with the use of HMMs and the similarity representation, the problem of classifying sequences is reduced to a more standard classification task (where each object is described by a fixed-length feature vector), for which arbitrarily sophisticated techniques can be used, making further gains in classification performance possible.
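The contrast between the two decision rules can be sketched as follows. The numbers are made up for illustration, and the nearest-neighbour rule is only one possible choice of classifier in the similarity space, picked here for brevity; the function names ml_decision and nn_decision are hypothetical.

```python
import numpy as np

def ml_decision(class_log_likelihoods):
    """Standard ML rule: keep only the largest class likelihood."""
    return int(np.argmax(class_log_likelihoods))

def nn_decision(x, train_vectors, train_labels):
    """Similarity-space rule (1-NN as a stand-in): use the whole similarity vector."""
    distances = np.linalg.norm(train_vectors - x, axis=1)
    return int(train_labels[int(np.argmin(distances))])

# Made-up similarity vectors of four training sequences w.r.t. two representatives.
train_vectors = np.array([[-1.0, -3.0],
                          [-1.2, -2.8],
                          [-3.1, -0.9],
                          [-2.9, -1.1]])
train_labels = np.array([0, 0, 1, 1])

x = np.array([-2.8, -1.0])             # similarity vector of an unknown sequence
print(ml_decision(x), nn_decision(x, train_vectors, train_labels))
```

Any classifier that accepts fixed-length vectors could replace nn_decision here.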

The proposed approach was successfully tested on both synthetic and real data, involving 2D shape recognition and face recognition problems. In comparison with the standard HMM-based ML classification approach, our method showed a significant performance improvement, confirming all the potential of the similarity-based classification approach.

The main problem of the similarity-based approach, of particular relevance in practical applications, is the high dimensionality of the resulting similarity space. In the basic approach, this dimensionality is equal to the cardinality of the whole training data set, possibly leading to a huge computational burden. In the literature, two types of solutions to this problem can be identified; they are summarized in Section 2.2. In this paper, three methods to address this problem are proposed. The first one aims at removing redundancy from the data by applying linear dimensionality reduction techniques, such as Fisher discriminant analysis (FDA) [17] and principal component analysis (PCA) [19]. The second proposed method is based on a greedy strategy known as matching pursuit [20], which selects the subset of representatives with respect to which the similarity values are computed. These two approaches are very general, and can be applied in all distance-based classification contexts. The third proposed approach is more specific to the HMM case, and is based on a simple adaptation of the similarity-based classification approach to the standard HMM learning procedure. All these approaches were experimentally evaluated, confirming the discriminative power of the similarity space, even when the dimensionality is reduced to more manageable numbers.
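A minimal sketch of the first option, linear reduction by PCA, assuming the similarity vectors are stacked into a matrix S with one row per sequence and one column per representative. The matrix below is random placeholder data, not results from the paper, and pca_fit and pca_transform are hypothetical helper names.

```python
import numpy as np

def pca_fit(X, k):
    """Return the column mean and the top-k principal directions of X."""
    mean = X.mean(axis=0)
    # Rows of Vt are orthonormal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def pca_transform(X, mean, components):
    """Project (centred) similarity vectors onto the retained directions."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
S = rng.normal(size=(20, 20))          # placeholder: 20 sequences x 20 representatives

mean, components = pca_fit(S, k=3)
Z = pca_transform(S, mean, components)
print(Z.shape)                         # each sequence is now described by 3 numbers
```

A new sequence would be mapped with the same mean and components, so the reduction has to be fitted on training data only.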

In summary, the main contribution of this paper is the introduction of the similarity-based recognition paradigm in an HMM context, resulting in a significant performance improvement with respect to standard HMM-based classification. The mapping to the similarity space proposed in this paper allows us to reduce complex problems of sequence classification to a more standard point classification problem, for which arbitrarily sophisticated techniques can be used. From the point of view of similarity-based recognition, we propose two different approaches for dealing with the high dimensionality of the similarity space, which is one of the main problems of the method. First, the potential of linear reduction techniques, such as PCA and FDA, is exploited, showing that they are able to reduce the impact of the curse of dimensionality on the classification process. Second, we address the choice of a set of appropriate representatives using the matching pursuit algorithm, which proves to be a robust and effective approach.

The rest of the paper is organized as follows. In Section 2, the state of the art related to similarity-based classification and to the dimensionality issue is summarized. In Section 3, HMMs are introduced, together with the standard classification scheme. The proposed strategy is described in Section 4, and Section 5 reports the experiments and discusses the related results. Finally, in Section 6, conclusions are drawn and future perspectives are outlined.

Section snippets

Similarity-based classification

The literature on similarity-based classification is not vast. The approach seems to have been first introduced by Jain and Zongker [9], who obtained a dissimilarity measure, based on deformable templates, for the hand-written digit recognition problem. A multidimensional scaling approach was then used to project this dissimilarity space onto a low-dimensional space, where a 1-nearest-neighbor (1-NN) classifier was employed to classify new objects. In Ref. [10], Graepel et al.

Hidden Markov models

A discrete-time HMM is a probabilistic model that describes a random sequence $O = O_1, O_2, \ldots, O_T$ as being an indirect observation of an underlying (hidden) random sequence $Q = Q_1, Q_2, \ldots, Q_T$, where this hidden process is Markovian, even though the observed process may not be. Due to lack of space, HMM theory will not be covered in detail here; for a comprehensive tutorial, see Ref. [1]. Basically, an HMM λ is a 4-tuple $\lambda = (S, A, \pi, B)$, where S is the set of states, A is the transition matrix (representing

Results and discussion

In this section, experimental results are reported, in order to validate the proposed approach. Firstly, we investigate the discriminative power of the space $S_R$ with $R = T$, i.e. using as reference set the whole training set T. The standard ML classification scheme and the proposed approach are compared, with both synthetic and real data. The use of PCA and FDA is investigated in this context, also with the aim of visualizing the data. Secondly, experimental results concerning the two different

Conclusions

In this paper we have proposed a novel sequence classification scheme by combining hidden Markov models (HMMs) with the similarity-based paradigm. This approach creates a representation space for sequences in which standard feature-based classification techniques can be used. We showed that a simple classifier in such a space outperforms standard HMM-based classification schemes. Three approaches to deal with the high dimensionality of the resulting space were also considered and investigated,

References (29)

  • E. Pekalska et al.

    Dissimilarity representations allow for building good classifiers

    Pattern Recogn. Lett.

    (2002)
  • L.R. Rabiner

    A tutorial on hidden Markov models and selected applications in speech recognition

    Proc. IEEE

    (1989)
  • L.E. Baum et al.

    A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains

    Ann. Math. Stat.

    (1970)
  • L.E. Baum

    An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes

    Inequality

    (1970)
  • R. Durbin et al.

    Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids

    (1998)
  • R. Hughey et al.

    Hidden Markov model for sequence analysis: extension and analysis of the basic method

    Comput. Appl. Biosci.

    (1996)
  • J. Hu et al.

    HMM based online handwriting recognition

    IEEE Trans. Pattern Anal. Mach. Intell.

    (1996)
  • S. Eickeler et al.

    Hidden Markov model based continuous online gesture recognition

  • T. Jebara et al.

    Action reaction learning: automatic visual analysis and synthesis of interactive behavior

  • A.K. Jain et al.

    Representation and recognition of handwritten digits using deformable templates

    IEEE Trans. Pattern Anal. Mach. Intell.

    (1997)
  • T. Graepel et al.

    Classification on pairwise proximity data

  • D.W. Jacobs et al.

    Classification with nonmetric distances: image retrieval and class representation

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2000)
  • E. Pekalska et al.

    Automatic pattern recognition by similarity representations

    Electron. Lett.

    (2001)
  • E. Pekalska et al.

    A generalized kernel approach to dissimilarity-based classification

    J. Mach. Learning Res.

    (2002)

    About the Author—MANUELE BICEGO received his Laurea degree and Ph.D. degree in Computer Science from the University of Verona in 1999 and 2003, respectively. Since 2000 he has been working in the VIPS (Vision, Image Processing and Sound) laboratory of the Computer Science Department of the University of Verona. His research interests include statistical pattern recognition, artificial vision, electronic noses, neural networks, hidden Markov models and video analysis. Manuele Bicego is a member of the IAPR-IC society and a student member of the IEEE Systems, Man, and Cybernetics society.

    About the Author—VITTORIO MURINO is professor and chairman of the Department of Computer Science of the University of Verona, Italy. He received the Laurea degree in Electronic Engineering in 1989 and the Ph.D. in Electronic Engineering and Computer Science in 1993, both at the University of Genoa. He was a Post-Doctoral Fellow at the University of Genoa, working in the Signal Processing and Understanding Group of the Department of Biophysical and Electronic Engineering, as supervisor of research activities on signal and image processing in underwater environments. From 1995 to 1998, he was assistant professor at the Department of Mathematics and Computer Science of the University of Udine, Italy, supervising research activities regarding multisensorial underwater vision for object recognition and virtual reality applications. Since 1998 he has been at the University of Verona, where he founded the Vision, Image Processing and Sound (VIPS) laboratory. He has worked on several national and European projects, especially in the context of the MAST (MArine Science and Technology) programme concerned with the investigation of underwater scenes by visual and acoustical sensors. He is an evaluator for the European Commission of project proposals related to several programmes. His main research interests include: 3D computer vision and pattern recognition, acoustic and optical underwater vision, probabilistic techniques for image processing (specifically, hidden Markov models, Markov random fields, Bayesian networks), data fusion, and neural networks with applications to surveillance, autonomous driving, visual inspection, and robotics. He is also interested in the integration of image analysis and synthesis methodologies for object recognition and 3D modelling. Dr. Murino is author of more than 100 papers on the above subjects, and associate editor of the Pattern Recognition and IEEE Transactions on Systems, Man, and Cybernetics journals, and the electronic journal ELCVIA (Electronic Letters on Computer Vision and Image Analysis). He is also a referee for many international journals, a member of IAPR, and a senior member of IEEE.

    About the Author—MÁRIO A.T. FIGUEIREDO received the E.E., M.Sc. and Ph.D. degrees in electrical and computer engineering, all from “Instituto Superior Tecnico” (IST), the engineering school of the Technical University of Lisbon, Portugal, in 1985, 1990, and 1994, respectively. He has been an Assistant Professor with the Department of Electrical and Computer Engineering of IST, since 1994. He is also a researcher with the Communication Theory and Pattern Recognition Group at the Institute of Telecommunications, Lisbon. In 1998, he was a visiting professor with the Department of Computer Science and Engineering, at Michigan State University. His research interests are in the fields of image analysis, computer vision, and statistical pattern recognition. He is currently an Associate Editor of the IEEE Transactions on Image Processing, the IEEE Transactions on Mobile Computing, and Pattern Recognition Letters. He is co-chair of the 2001 and 2003 editions of the International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition. Mário Figueiredo is a Senior Member of the IEEE and received the 1995 Portuguese IBM Scientific Prize for his work on unsupervised image restoration.
