Sequence classification via large margin hidden Markov models

Kim, Minyoung; Pavlovic, Vladimir

doi:10.1007/s10618-010-0206-6

Published: 25 November 2010

Volume 23, pages 322–344 (2011)

Data Mining and Knowledge Discovery

Minyoung Kim¹ & Vladimir Pavlovic²

Abstract

We address the sequence classification problem using a probabilistic model based on hidden Markov models (HMMs). In contrast to commonly used likelihood-based learning methods, such as the joint or conditional maximum likelihood estimator, we introduce a discriminative learning algorithm that focuses on maximizing the class margin. Our approach has two main advantages. (i) As an extension of support vector machines (SVMs) to sequential, non-Euclidean data, the approach inherits the benefits of margin-based classifiers, such as provable generalization error bounds. (ii) Unlike many algorithms based on non-parametric estimation of similarity measures, which impose only weak constraints on the data domain, our approach uses the HMM’s latent Markov structure to regularize the model in the high-dimensional sequence space. In an extensive set of evaluations on time-series sequence data of the kind that frequently appears in data mining and computer vision, we demonstrate significant improvements in classification performance with the proposed method.
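The abstract contrasts likelihood-based training with class margin maximization. The Python sketch below illustrates only the quantities involved, under stated assumptions: each class has its own discrete-emission HMM (`DiscreteHMM` is a hypothetical helper), the class score is the forward-algorithm log-likelihood log p(x | class) with equal class priors, and the margin is the true-class score minus the best competing class score, penalized by a hinge loss with a hypothetical target of 1.0. It is not the authors' formulation or training algorithm; it evaluates the margin for fixed parameters rather than optimizing them.

```python
import math
from typing import Dict, List, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


class DiscreteHMM:
    """Hypothetical discrete-emission HMM; parameters are stored in log space.

    pi[s]: initial state probabilities, A[r][s]: transition r -> s,
    B[s][o]: probability of emitting symbol o in state s.
    All probabilities must be strictly positive (math.log(0) raises ValueError).
    """

    def __init__(self, pi: List[float], A: List[List[float]], B: List[List[float]]):
        self.log_pi = [math.log(p) for p in pi]
        self.log_A = [[math.log(a) for a in row] for row in A]
        self.log_B = [[math.log(b) for b in row] for row in B]

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """Forward algorithm: log p(obs | model), summed over all hidden state paths."""
        n = len(self.log_pi)
        alpha = [self.log_pi[s] + self.log_B[s][obs[0]] for s in range(n)]
        for o in obs[1:]:
            alpha = [
                logsumexp([alpha[r] + self.log_A[r][s] for r in range(n)]) + self.log_B[s][o]
                for s in range(n)
            ]
        return logsumexp(alpha)


def classify(models: Dict[str, DiscreteHMM], obs: Sequence[int]) -> str:
    """Predict the class whose HMM gives the highest log-likelihood (equal priors assumed)."""
    return max(models, key=lambda c: models[c].log_likelihood(obs))


def class_margin(models: Dict[str, DiscreteHMM], obs: Sequence[int], label: str) -> float:
    """True-class score minus the best competing class score; > 0 means correctly classified."""
    scores = {c: m.log_likelihood(obs) for c, m in models.items()}
    best_other = max(s for c, s in scores.items() if c != label)
    return scores[label] - best_other


def hinge_loss(margin: float, target: float = 1.0) -> float:
    """Penalize margins below a (hypothetical) target; zero once the target is met."""
    return max(0.0, target - margin)


if __name__ == "__main__":
    # Toy two-class problem over symbols {0, 1}; all parameter values are made up.
    A = [[0.9, 0.1], [0.1, 0.9]]
    models = {
        "a": DiscreteHMM([0.5, 0.5], A, [[0.9, 0.1], [0.8, 0.2]]),  # mostly emits 0
        "b": DiscreteHMM([0.5, 0.5], A, [[0.1, 0.9], [0.2, 0.8]]),  # mostly emits 1
    }
    obs = [0, 0, 1, 0, 0]
    print(classify(models, obs))  # prints "a"
    m = class_margin(models, obs, "a")
    print(m, hinge_loss(m))
```

A likelihood-based estimator would fit each class HMM to its own sequences; a margin-based criterion instead looks at how far the correct class score exceeds its strongest competitor, which is what `class_margin` and `hinge_loss` expose here.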

Author information

Authors and Affiliations

Department of Electronic & Information Engineering, Seoul National University of Science & Technology, Seoul, 139-743, Korea
Minyoung Kim
Department of Computer Science, Rutgers University, Piscataway, NJ, USA
Vladimir Pavlovic

Corresponding author

Correspondence to Minyoung Kim.

Additional information

Responsible editor: Charles Elkan.

About this article

Cite this article

Kim, M., Pavlovic, V. Sequence classification via large margin hidden Markov models. Data Min Knowl Disc 23, 322–344 (2011). https://doi.org/10.1007/s10618-010-0206-6

Received: 15 March 2010
Accepted: 01 November 2010
Issue Date: September 2011
DOI: https://doi.org/10.1007/s10618-010-0206-6