research-article

Fast likelihood search for hidden Markov models

Authors:

Yasuhiro Fujiwara,

Yasushi Sakurai,

Masaru Kitsuregawa

ACM Transactions on Knowledge Discovery from Data (TKDD), Volume 3, Issue 4

Article No.: 18, Pages 1 - 37

https://doi.org/10.1145/1631162.1631166

Published: 04 December 2009

Abstract

Hidden Markov models (HMMs) are receiving considerable attention in various communities, and many applications that use HMMs have emerged, such as mental task classification, biological analysis, traffic monitoring, and anomaly detection. This article has two goals. The first is exact and efficient identification of the model whose state sequence has the highest likelihood for a given query sequence; more precisely, the algorithm never misses an HMM that actually has a high-probability path for the given sequence. The second is exact and efficient monitoring of streaming data sequences to find the best model. We propose SPIRAL, a fast search method for HMM datasets. SPIRAL is based on three ideas: (1) it clusters the states of models to compute approximate likelihoods, (2) it uses several granularities and approximates likelihood values during search processing, and (3) it focuses on just the promising likelihood computations by pruning low-likelihood state sequences. Experiments verify the effectiveness of SPIRAL and show that it is more than 490 times faster than the naive method.

Index Terms

Fast likelihood search for hidden Markov models
1. Information systems
  1. Information systems applications
    1. Data mining

Published In

ACM Transactions on Knowledge Discovery from Data Volume 3, Issue 4

November 2009

196 pages

ISSN:1556-4681

EISSN:1556-472X

DOI:10.1145/1631162

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 December 2009

Accepted: 01 July 2009

Revised: 01 May 2009

Received: 01 January 2009

Published in TKDD Volume 3, Issue 4

Qualifiers

Research-article
Research
Refereed

Bibliometrics

Total Citations: 5
Total Downloads: 743
Downloads (Last 12 months): 6
Downloads (Last 6 weeks): 0

Reflects downloads up to 14 Feb 2025

Cited By

Monteiro, B., Davis, C., and Fonseca, F. 2016. A survey on the geographic scope of textual documents. Computers & Geosciences 96:C, 23-34. Online publication date: 1-Nov-2016.
https://dl.acm.org/doi/10.1016/j.cageo.2016.07.017

Huang, D., Wang, X., Dou, R., Liu, S., and Fang, J. 2015. N-gram distribution and unification gain problem and its optimal solution. International Journal of Systems Science 46:7, 1327-1336. Online publication date: 1-May-2015.
https://dl.acm.org/doi/10.1080/00207721.2013.822608

Liao, Z., Fu, X., and Wang, Y. 2012. The Research of Improved Apriori Algorithm. Applied Mechanics and Materials 263-266, 2179-2184. Online publication date: Dec-2012.
https://doi.org/10.4028/www.scientific.net/AMM.263-266.2179

Fujiwara, Y. and Sakurai, Y. 2011. Fast Algorithm for Monitoring Data Streams by Using Hidden Markov Models. NTT Technical Review 9:12, 53-60. Online publication date: Dec-2011.
https://doi.org/10.53829/ntr201112ra2

Wang, P., Shi, L., Wang, B., Wu, Y., and Liu, Y. 2010. Survey on HMM based anomaly intrusion detection using system calls. 2010 5th International Conference on Computer Science & Education, 102-105. Online publication date: Aug-2010.
https://doi.org/10.1109/ICCSE.2010.5593839
