Sequence classification via large margin hidden Markov models

Data Mining and Knowledge Discovery

Abstract

We address the sequence classification problem using a probabilistic model based on hidden Markov models (HMMs). In contrast to commonly used likelihood-based learning methods, such as the joint or conditional maximum likelihood estimator, we introduce a discriminative learning algorithm that focuses on maximizing the class margin. Our approach has two main advantages: (i) as an extension of support vector machines (SVMs) to sequential, non-Euclidean data, it inherits the benefits of margin-based classifiers, such as provable generalization error bounds; (ii) unlike many algorithms that rely on non-parametric estimation of similarity measures and impose only weak constraints on the data domain, our approach uses the HMM's latent Markov structure to regularize the model in the high-dimensional sequence space. In an extensive set of evaluations on time-series sequence data of the kind that frequently appears in data mining and computer vision, the proposed method yields significant improvements in classification performance.
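The abstract does not spell out the training objective, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' formulation. It assumes one discrete-emission HMM per class, a class score of log p(x | c) + log p(c) computed with the forward algorithm, and a Crammer–Singer-style multiclass hinge loss on the gap between the true-class score and the best rival score. The names `DiscreteHMM`, `class_scores`, and `multiclass_hinge` are hypothetical. The paper's exact margin definition, its regularizer (which the abstract says exploits the latent Markov structure), and its optimizer are not shown.

```python
# Hypothetical illustration, not the authors' implementation: per-class HMM
# scores via the forward algorithm, plus a multiclass hinge (margin) loss.
import math
from typing import List, Sequence


def safe_log(p: float) -> float:
    """Log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a non-empty sequence."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


class DiscreteHMM:
    """HMM with discrete emissions; parameters are stored in log space."""

    def __init__(self, start: List[float], trans: List[List[float]], emit: List[List[float]]):
        # start[i]    = P(s_1 = i)
        # trans[i][j] = P(s_{t+1} = j | s_t = i)
        # emit[i][k]  = P(x_t = k | s_t = i)
        self.n = len(start)
        self.log_start = [safe_log(p) for p in start]
        self.log_trans = [[safe_log(p) for p in row] for row in trans]
        self.log_emit = [[safe_log(p) for p in row] for row in emit]

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """log p(obs) via the forward algorithm (sums over all hidden paths)."""
        n = self.n
        alpha = [self.log_start[i] + self.log_emit[i][obs[0]] for i in range(n)]
        for x in obs[1:]:
            alpha = [
                logsumexp([alpha[i] + self.log_trans[i][j] for i in range(n)])
                + self.log_emit[j][x]
                for j in range(n)
            ]
        return logsumexp(alpha)


def class_scores(models: List[DiscreteHMM], log_priors: List[float], obs: Sequence[int]) -> List[float]:
    """Assumed score of class c: log p(obs | c) + log p(c), one HMM per class."""
    return [m.log_likelihood(obs) + lp for m, lp in zip(models, log_priors)]


def multiclass_hinge(scores: Sequence[float], label: int, margin: float = 1.0) -> float:
    """Zero once the true class beats every rival class by at least `margin`."""
    rival = max(s for c, s in enumerate(scores) if c != label)
    return max(0.0, margin + rival - scores[label])


if __name__ == "__main__":
    # Two hypothetical 2-state classes over the binary alphabet {0, 1}.
    hmm_a = DiscreteHMM([0.9, 0.1], [[0.8, 0.2], [0.2, 0.8]], [[0.9, 0.1], [0.1, 0.9]])
    hmm_b = DiscreteHMM([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]])
    priors = [math.log(0.5), math.log(0.5)]
    seq = [0, 0, 0, 1, 1, 1]
    scores = class_scores([hmm_a, hmm_b], priors, seq)
    pred = max(range(len(scores)), key=scores.__getitem__)
    print("scores:", scores, "predicted class:", pred)
    print("hinge loss (true label 0):", multiclass_hinge(scores, label=0))
```

In a margin-based learner of this kind, training would adjust the HMM parameters to reduce the summed hinge loss over labeled sequences (plus a regularization term) rather than to maximize likelihood; that optimization step is deliberately omitted from this sketch.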

Author information

Correspondence to Minyoung Kim.

Additional information

Responsible editor: Charles Elkan.

About this article

Cite this article

Kim, M., Pavlovic, V. Sequence classification via large margin hidden Markov models. Data Min Knowl Disc 23, 322–344 (2011). https://doi.org/10.1007/s10618-010-0206-6

  • DOI: https://doi.org/10.1007/s10618-010-0206-6