Similarity-based classification of sequences using hidden Markov models

doi:10.1016/j.patcog.2004.04.005

Pattern Recognition

Volume 37, Issue 12, December 2004, Pages 2281-2291

https://doi.org/10.1016/j.patcog.2004.04.005

Abstract

Hidden Markov models (HMMs) are a widely used tool for sequence modelling. In sequence classification, the standard approach consists of training one HMM for each class and then applying a standard Bayesian classification rule. In this paper, we introduce a novel HMM-based classification scheme for sequences, obtained by extending the recently proposed similarity-based classification paradigm to HMM-based classification. In this approach, each object is described by the vector of its similarities with respect to a predetermined set of other objects, where these similarities are computed using HMMs. A central problem is the high dimensionality of the resulting space; to deal with it, three alternatives are investigated. Experiments on synthetic and real data show that the similarity-based approach outperforms standard HMM classification schemes.

Introduction

The analysis of sequential data is an interesting and important research area. Probabilistic modelling and classification are intrinsically more difficult when each observation is a sequence than in the standard scenario where each observation is a set (vector) of features. Since the length of the sequences may vary, standard pattern recognition techniques cannot be applied directly. Moreover, sequence classification problems usually involve very large data sets.

Hidden Markov models (HMMs) are commonly employed probabilistic models of sequential data [1]. HMMs can be viewed as stochastic generalizations of finite-state automata, in which both the transitions between states and the generation of output symbols are governed by probabilistic mechanisms [1]. Although the basic theory and inference tools were developed in the late 1960s [2], [3], HMMs have only been extensively applied in the last decade. Speech recognition [1], DNA and protein modelling [4], [5], handwritten character recognition [6], gesture recognition [7], and behavior analysis and synthesis [8] are examples of problems to which HMMs have been applied.

The standard HMM-based approach to sequence classification consists of training one HMM for each class; these models are subsequently used as class-conditional densities in a standard Bayes classification scheme. For example, assuming a priori equiprobable classes, an unknown sequence is assigned to the class whose model shows the highest probability (likelihood) of having generated it. This is the well-known maximum-likelihood (ML) classification rule.
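As a compact restatement of this rule, the Bayes decision and its equiprobable special case can be written as below. The symbols $C$ (number of classes), $λ_c$ (the HMM trained on class $c$), $P(c)$ (class prior) and $\hat{c}$ (predicted class) are notation introduced here for illustration and are not taken from the paper.

$$\hat{c} = \arg\max_{c \in \{1,\dots,C\}} P(O \mid \lambda_c)\, P(c), \qquad \text{which reduces to} \qquad \hat{c} = \arg\max_{c \in \{1,\dots,C\}} P(O \mid \lambda_c) \quad \text{when } P(c) = 1/C .$$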

In this paper, an alternative classification scheme is proposed, by extending the similarity-based paradigm [9], [10], [11], [12], [13], [14] to HMM-based classification. This paradigm, which has been introduced recently, differs from typical pattern recognition approaches where objects to be classified are represented by sets (vectors) of features. In the similarity-based paradigm, objects are described using pairwise (dis)similarities, i.e. distances from other objects in the data set. In this way, objects are not constrained to be explicitly represented in a feature space, and all that is necessary is a way to compute (dis)similarities between pairs of objects. The goal is then to learn a classifier only from these relational data.

The literature on similarity-based classification is not vast [9], [10], [11], [12], [13], [14] (a brief review is given in Section 2.1). The general idea behind all these approaches is the same: given a set of pairwise dissimilarity values, a new representation space can be built in which each object is described by these values. In Ref. [13], a simple synthetic experiment shows that a complex problem in a 2D space (requiring a quadratic classifier to achieve almost correct separation) becomes a linearly separable problem in a dissimilarity space.
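The effect described for Ref. [13] can be illustrated with a toy construction. The sketch below is not the experiment from that reference: the two classes (a small disc and a surrounding ring), the Euclidean distance, the choice of the whole data set as representatives and the threshold value are all assumptions made here. In the original 2D space the class boundary is a circle, while in the dissimilarity space a threshold on a single coordinate is enough.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_ring(n, r_min, r_max):
    """Draw n 2D points with radius in [r_min, r_max] and a random angle."""
    radius = rng.uniform(r_min, r_max, size=n)
    angle = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])


inner = sample_ring(50, 0.0, 0.5)   # class 0: small disc around the origin
outer = sample_ring(50, 2.0, 3.0)   # class 1: ring surrounding the disc
points = np.vstack([inner, outer])
labels = np.array([0] * 50 + [1] * 50)

# Dissimilarity space: one coordinate per representative object.
representatives = points
dissimilarities = np.linalg.norm(
    points[:, None, :] - representatives[None, :, :], axis=2
)

# The first representative belongs to class 0. By the triangle inequality,
# class-0 points lie within 1.0 of it and class-1 points at least 1.5 away,
# so a single threshold (a linear rule in the new space) separates the classes.
first_coordinate = dissimilarities[:, 0]
threshold = 1.25
predicted = (first_coordinate > threshold).astype(int)
print("errors:", int(np.sum(predicted != labels)))
```

Each row of `dissimilarities` is the new description of one object; any vector-based classifier can be trained on these rows.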

In this paper, we extend this dissimilarity-based classification paradigm to HMM-based sequence classification problems. We propose to build a similarity space, representing each object (sequence) by the vector of its similarities with respect to a predetermined set of objects, called the representatives set (in the simplest approach, this can be the whole data set); the classification is then performed in this new representation space. The similarities are derived by considering the likelihood $P(O | λ)$ as a measure of the similarity between the sequence $O$ and the HMM specified by the set of parameters $λ$. This similarity measure was previously used in sequence clustering applications [15], [16].
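A minimal sketch of this mapping for discrete-observation HMMs follows. It rests on several assumptions that are not stated in this excerpt: the representative HMMs are given as hand-set `(pi, A, B)` tuples rather than trained on representative sequences, the log-likelihood is used instead of the raw likelihood, and the score is divided by the sequence length. The function names are hypothetical.

```python
import numpy as np


def log_likelihood(obs, pi, A, B):
    """Return log P(O | lambda) for a discrete HMM (scaled forward algorithm).

    obs: sequence of integer symbols; pi: (N,) initial state probabilities;
    A: (N, N) transition matrix; B: (N, M) emission matrix.
    """
    alpha = pi * B[:, obs[0]]
    scale = alpha.sum()
    log_p = np.log(scale)
    alpha = alpha / scale
    for symbol in obs[1:]:
        # Propagate through the transitions, then weight by the emission.
        alpha = (alpha @ A) * B[:, symbol]
        scale = alpha.sum()
        log_p += np.log(scale)
        alpha = alpha / scale
    return log_p


def similarity_vector(obs, models, normalize=True):
    """Describe one sequence by its log-likelihood under each representative HMM."""
    scores = np.array([log_likelihood(obs, *model) for model in models])
    # Length normalization is an assumption made in this sketch.
    return scores / len(obs) if normalize else scores


# Two hypothetical representative models over a binary alphabet.
model_a = (np.array([0.6, 0.4]),
           np.array([[0.9, 0.1], [0.2, 0.8]]),
           np.array([[0.8, 0.2], [0.3, 0.7]]))
model_b = (np.array([0.5, 0.5]),
           np.array([[0.5, 0.5], [0.5, 0.5]]),
           np.array([[0.1, 0.9], [0.9, 0.1]]))

sequence = [0, 0, 1, 0, 0, 0, 1, 0]
vector = similarity_vector(sequence, [model_a, model_b])
print(vector)  # one coordinate per representative model
```

Whatever the length of `sequence`, the resulting vector has one entry per representative, which is what makes fixed-length classifiers applicable afterwards.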

The similarity-based classification paradigm seems particularly well suited to HMMs, as it can be seen as a natural extension of the standard HMM classification scheme. Specifically, the standard ML approach assigns an unknown sequence $O$ to the class whose model shows the highest likelihood. To do so, the likelihoods of $O$ with respect to the HMMs of all classes are evaluated, each providing a likelihood-based measure of the similarity between that class and the observed sequence. In other words, HMMs are used to compute similarities between sequences and classes, with each class being represented by a single HMM. Subsequently, only the maximum of these values is used to make the classification decision. In the similarity-based approach, by contrast, the classification decision is made using the whole set of similarities between each observed sequence and all the other sequences. We will show that this strategy results in a substantial improvement in classification performance compared to standard HMM-based approaches. Moreover, with the use of HMMs and the similarity representation, the problem of classifying sequences is reduced to a more standard classification task (where each object is described by a fixed-length feature vector), for which arbitrarily sophisticated techniques can be used, potentially increasing the classification performance even further.
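The contrast between the two decision rules can be sketched as follows. The similarity values are made up for illustration, and the nearest-neighbour rule with Euclidean distance is only a stand-in for "a classifier in the similarity space"; this excerpt does not say which classifier the authors use, and the function names are hypothetical.

```python
import numpy as np


def ml_rule(class_scores):
    """Standard rule: keep only the largest class-model log-likelihood."""
    return int(np.argmax(class_scores))


def nn_in_similarity_space(train_vectors, train_labels, query_vector):
    """Similarity-based rule: compare whole similarity vectors (1-NN, Euclidean)."""
    distances = np.linalg.norm(train_vectors - query_vector, axis=1)
    return int(train_labels[int(np.argmin(distances))])


# Made-up similarity vectors: rows are training sequences, columns are
# log-likelihoods with respect to four representative models.
train_vectors = np.array([
    [-1.0, -1.2, -3.0, -3.1],
    [-1.1, -0.9, -2.8, -3.3],
    [-3.2, -2.9, -1.0, -1.1],
    [-3.0, -3.1, -1.2, -0.8],
])
train_labels = np.array([0, 0, 1, 1])

query = np.array([-1.2, -1.1, -2.9, -3.0])
predicted = nn_in_similarity_space(train_vectors, train_labels, query)
print("similarity-space decision:", predicted)

# The ML rule would instead see one score per class model and keep the maximum.
print("ML decision:", ml_rule(np.array([-1.3, -2.9])))
```

The ML rule discards everything except the winning score, whereas the second rule uses every coordinate of the vector, which is the point made in the paragraph above.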

The proposed approach was successfully tested on both synthetic and real data, involving 2D shape recognition and face recognition problems. In comparison with the standard HMM-based ML classification approach, our method showed a significant performance improvement, confirming the potential of the similarity-based classification approach.

The main problem of the similarity-based approach, of particular relevance in practical applications, is the high dimensionality of the resulting similarity space. In the basic approach, this dimensionality is equal to the cardinality of the whole training data set, possibly leading to a huge computational burden. Two types of solutions to this problem can be identified in the literature; they are summarized in Section 2.2. In this paper, three methods to address this problem are proposed. The first one aims at removing redundancy from the data by applying linear dimensionality reduction techniques, such as Fisher discriminant analysis (FDA) [17] and principal component analysis (PCA) [19]. The second proposed method is based on a greedy strategy known as matching pursuit [20], which selects a subset of representatives with respect to which the similarity values are computed. These two approaches are very general, and can be applied in all distance-based classification contexts. The third proposed approach is more specific to the HMM case, and is based on a simple adaptation of the similarity-based classification approach to the standard HMM learning procedure. All these approaches were experimentally evaluated, confirming the discriminative power of the similarity space, even when the dimensionality is reduced to more manageable numbers.
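The first of these options can be sketched with plain PCA computed through a singular value decomposition. The data below are synthetic (30 objects whose 30 similarities depend on only 2 latent factors plus small noise) and the function name `pca_reduce` is hypothetical; FDA and matching pursuit are not shown, and nothing here reproduces the authors' settings.

```python
import numpy as np


def pca_reduce(similarity_matrix, n_components):
    """Project rows of an (n_objects, n_representatives) matrix onto the top principal axes."""
    mean = similarity_matrix.mean(axis=0)
    centered = similarity_matrix - mean
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    explained = singular_values**2 / np.sum(singular_values**2)
    return centered @ components.T, components, mean, explained[:n_components]


rng = np.random.default_rng(0)
# Hypothetical similarity matrix with strong redundancy between columns.
latent = rng.normal(size=(30, 2))
mixing = rng.normal(size=(2, 30))
similarity_matrix = latent @ mixing + 0.01 * rng.normal(size=(30, 30))

reduced, components, mean, explained = pca_reduce(similarity_matrix, n_components=2)
print(reduced.shape)    # (30, 2)
print(explained.sum())  # fraction of variance kept by the two axes

# A new similarity vector is reduced with the same mean and axes.
new_vector = similarity_matrix[0] + 0.01
new_reduced = (new_vector - mean) @ components.T
```

Note that PCA shortens the vector handed to the classifier but still requires all the original similarities to be computed first; selecting a subset of representatives avoids that cost.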

In summary, the main contribution of this paper is the introduction of the similarity-based recognition paradigm in an HMM context, resulting in a significant performance improvement with respect to standard HMM-based classification. The mapping to the similarity space proposed in this paper allows us to reduce complex sequence classification problems to a more standard point classification problem, for which arbitrarily sophisticated techniques can be used. From the point of view of similarity-based recognition, we propose two different approaches for dealing with the high dimensionality of the similarity space, which is one of the main problems of the method. First, the potential of linear reduction techniques, such as PCA and FDA, is exploited, showing that they are able to reduce the impact of the curse of dimensionality on the classification process. Second, we address the choice of a set of appropriate representatives using the matching pursuit algorithm, which proves to be a robust and effective approach.

The rest of the paper is organized as follows. Section 2 summarizes the state of the art related to similarity-based classification and to the dimensionality issue. Section 3 introduces HMMs, together with the standard classification scheme. The proposed strategy is described in Section 4, and Section 5 reports and discusses the experimental results. Finally, Section 6 draws conclusions and outlines future perspectives.

Section snippets

Similarity-based classification

The literature on similarity-based classification is not vast. The approach seems to have been first introduced by Jain and Zongker [9], who obtained a dissimilarity measure, based on deformable templates, for the handwritten digit recognition problem. A multidimensional scaling approach was then used to project this dissimilarity space onto a low-dimensional space, where a 1-nearest-neighbor (1-NN) classifier was employed to classify new objects. In Ref. [10], Graepel et al.

Hidden Markov models

A discrete-time HMM is a probabilistic model that describes a random sequence $O = O_{1}, O_{2}, \ldots, O_{T}$ as being an indirect observation of an underlying (hidden) random sequence $Q = Q_{1}, Q_{2}, \ldots, Q_{T}$, where this hidden process is Markovian, even though the observed process may not be. Due to lack of space, HMM theory will not be covered in detail here; for a comprehensive tutorial, see Ref. [1]. Basically, an HMM $λ$ is a 4-tuple $λ = (S, A, π, B)$, where $S$ is the set of states, $A$ is the transition matrix (representing

Results and discussion

In this section, experimental results are reported in order to validate the proposed approach. First, we investigate the discriminative power of the space $S_{R}$ with $R = T$, i.e. using the whole training set $T$ as the reference set. The standard ML classification scheme and the proposed approach are compared on both synthetic and real data. The use of PCA and FDA is investigated in this context, also with the aim of visualizing the data. Second, experimental results concerning the two different

Conclusions

In this paper we have proposed a novel sequence classification scheme by combining hidden Markov models (HMMs) with the similarity-based paradigm. This approach creates a representation space for sequences in which standard feature-based classification techniques can be used. We showed that a simple classifier in such a space outperforms standard HMM-based classification schemes. Three approaches to deal with the high dimensionality of the resulting space were also considered and investigated,

About the Author—MANUELE BICEGO received his Laurea degree and Ph.D. degree in Computer Science from the University of Verona in 1999 and 2003, respectively. Since 2000 he has been working in the VIPS (Vision, Image Processing and Sound) laboratory of the Computer Science Department of the University of Verona. His research interests include statistical pattern recognition, artificial vision, electronic noses, neural networks, hidden Markov models and video analysis. Manuele Bicego is member of the IAPR-IC

References (29)

E. Pekalska et al., Dissimilarity representations allow for building good classifiers, Pattern Recogn. Lett. (2002)
L.R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proc. IEEE (1989)
L.E. Baum et al., A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains, Ann. Math. Stat. (1970)
L.E. Baum, An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes, Inequality (1970)
R. Durbin et al., Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids (1998)
R. Hughey et al., Hidden Markov model for sequence analysis: extension and analysis of the basic method, Comput. Appl. Biosci. (1996)
J. Hu et al., HMM based online handwriting recognition, IEEE Trans. Pattern Anal. Mach. Intell. (1996)
S. Eickeler et al., Hidden Markov model based continuous online gesture recognition
T. Jebara et al., Action reaction learning: automatic visual analysis and synthesis of interactive behavior
A.K. Jain et al., Representation and recognition of handwritten digits using deformable templates, IEEE Trans. Pattern Anal. Mach. Intell. (1997)
T. Graepel et al., Classification on pairwise proximity data
D.W. Jacobs et al., Classification with nonmetric distances: image retrieval and class representation, IEEE Trans. Pattern Anal. Mach. Intell. (2000)
E. Pekalska et al., Automatic pattern recognition by similarity representations, Electron. Lett. (2001)
E. Pekalska et al., A generalized kernel approach to dissimilarity-based classification, J. Mach. Learning Res. (2002)

About the Author—VITTORIO MURINO is professor and chairman of the Department of Computer Science of the University of Verona, Italy. He received the Laurea degree in Electronic Engineering in 1989 and the Ph.D. in Electronic Engineering and Computer Science in 1993, both at the University of Genoa. He was a Post-Doctoral Fellow at the University of Genoa, working in the Signal Processing and Understanding Group of the Department of Biophysical and Electronic Engineering, as supervisor of research activities on signal and image processing in underwater environments. From 1995 to 1998, he was assistant professor at the Department of Mathematics and Computer Science of the University of Udine, Italy, supervising research activities regarding multisensorial underwater vision for object recognition and virtual reality applications. Since 1998 he has been at the University of Verona, where he founded the Vision, Image Processing and Sound (VIPS) laboratory. He has worked on several national and European projects, especially in the context of the MAST (MArine Science and Technology) programme concerning the investigation of underwater scenes by visual and acoustical sensors. He is an evaluator for the European Commission of project proposals related to several programmes. His main research interests include: 3D computer vision and pattern recognition, acoustic and optical underwater vision, probabilistic techniques for image processing (specifically, hidden Markov models, Markov random fields, Bayesian networks), data fusion, and neural networks with applications to surveillance, autonomous driving, visual inspection, and robotics. He is also interested in the integration of image analysis and synthesis methodologies for object recognition and 3D modelling. Dr. Murino is the author of more than 100 papers on the above subjects, and associate editor of the Pattern Recognition and IEEE Transactions on Systems, Man, and Cybernetics journals, and the electronic journal ELCVIA (Electronic Letters on Computer Vision and Image Analysis). He is also a referee for many international journals, a member of IAPR, and a senior member of IEEE.

About the Author—MÁRIO A.T. FIGUEIREDO received the E.E., M.Sc. and Ph.D. degrees in electrical and computer engineering, all from “Instituto Superior Tecnico” (IST), the engineering school of the Technical University of Lisbon, Portugal, in 1985, 1990, and 1994, respectively. He has been an Assistant Professor with the Department of Electrical and Computer Engineering of IST since 1994. He is also a researcher with the Communication Theory and Pattern Recognition Group at the Institute of Telecommunications, Lisbon. In 1998, he was a visiting professor with the Department of Computer Science and Engineering at Michigan State University. His research interests are in the fields of image analysis, computer vision, and statistical pattern recognition. He is currently an Associate Editor of the IEEE Transactions on Image Processing, the IEEE Transactions on Mobile Computing, and Pattern Recognition Letters. He was co-chair of the 2001 and 2003 editions of the International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition. Mário Figueiredo is a Senior Member of the IEEE and received the 1995 Portuguese IBM Scientific Prize for his work on unsupervised image restoration.
