research-article

Fast likelihood search for hidden Markov models

Published: 04 December 2009

Abstract

Hidden Markov models (HMMs) are receiving considerable attention in various communities, and many applications that use HMMs have emerged, such as mental task classification, biological analysis, traffic monitoring, and anomaly detection. This article has two goals. The first is the exact and efficient identification of the model whose state sequence has the highest likelihood for a given query sequence (more precisely, the algorithm never misses an HMM that actually has a high-probability path for the given sequence). The second is the exact and efficient monitoring of streaming data sequences to find the best model. We propose SPIRAL, a fast search method for HMM datasets. SPIRAL is based on three ideas: (1) it clusters the states of models to compute approximate likelihoods, (2) it approximates likelihood values at several granularities during search processing, and (3) it focuses only on promising likelihood computations by pruning out low-likelihood state sequences. Experiments verify the effectiveness of SPIRAL and show that it is more than 490 times faster than the naive method.
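The first goal, exact identification of the best model, can be illustrated with a minimal Python sketch. It assumes discrete-emission HMMs stored as log-space tables (the `HMM` tuple and the `make_model`, `viterbi_score`, and `best_model` names are hypothetical) and takes a model's likelihood for a query to be its Viterbi score, the log-probability of its most likely state sequence. Scoring every model in full corresponds to the naive method. The pruning shown here is a deliberately simple stand-in, not SPIRAL's clustered, multi-granularity approximation: because every remaining factor is a probability no greater than 1, the best partial score at step t is an upper bound on the model's final score. A model can therefore be abandoned as soon as that bound falls to or below the best score found so far, without ever missing the true best model.

```python
import math
from typing import List, Optional, Sequence, Tuple

NEG_INF = float("-inf")

# Hypothetical log-space representation of a discrete-emission HMM:
#   log_pi[i]    log initial probability of state i
#   log_a[i][j]  log transition probability from state i to state j
#   log_b[i][k]  log probability that state i emits symbol k
HMM = Tuple[List[float], List[List[float]], List[List[float]]]


def to_log(p: float) -> float:
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def make_model(pi: List[float], a: List[List[float]],
               b: List[List[float]]) -> HMM:
    """Convert probability tables into the log-space HMM tuple."""
    return ([to_log(p) for p in pi],
            [[to_log(p) for p in row] for row in a],
            [[to_log(p) for p in row] for row in b])


def viterbi_score(model: HMM, query: Sequence[int],
                  threshold: float = NEG_INF) -> Optional[float]:
    """Viterbi log-likelihood of `query`, or None if it cannot exceed `threshold`.

    Each step adds log-probabilities <= 0, so max_j delta_t(j) never
    increases with t and is an upper bound on the final score.
    """
    log_pi, log_a, log_b = model
    n = len(log_pi)
    delta = [log_pi[i] + log_b[i][query[0]] for i in range(n)]
    for symbol in query[1:]:
        if max(delta) <= threshold:
            return None  # pruned: this model cannot beat the current best
        delta = [max(delta[i] + log_a[i][j] for i in range(n)) + log_b[j][symbol]
                 for j in range(n)]
    score = max(delta)
    return score if score > threshold else None


def best_model(models: List[HMM], query: Sequence[int]) -> Tuple[int, float]:
    """Exact search: (index, score) of the model with the highest Viterbi score.

    Returns (-1, -inf) if every model assigns the query zero probability.
    """
    best_idx, best_score = -1, NEG_INF
    for idx, model in enumerate(models):
        # The running best score is the pruning threshold for later models.
        score = viterbi_score(model, query, threshold=best_score)
        if score is not None:  # only non-None when score > best_score
            best_idx, best_score = idx, score
    return best_idx, best_score


if __name__ == "__main__":
    a = [[0.9, 0.1], [0.1, 0.9]]
    b = [[0.9, 0.1], [0.1, 0.9]]
    models = [make_model([1.0, 0.0], a, b),  # starts in state 0
              make_model([0.0, 1.0], a, b)]  # starts in state 1
    idx, score = best_model(models, [0, 0, 0])
    print(idx, round(math.exp(score), 5))  # prints: 0 0.59049
```

In this toy example the second model starts in a state that rarely emits symbol 0, so its bound drops below the first model's score after the first symbol and it is pruned without a full Viterbi pass. A tighter and cheaper bound, such as one derived from clustered states, could plug into a loop of the same shape.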

    Information

    Published In

    ACM Transactions on Knowledge Discovery from Data, Volume 3, Issue 4
    November 2009
    196 pages
    ISSN: 1556-4681
    EISSN: 1556-472X
    DOI: 10.1145/1631162
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 04 December 2009
    Accepted: 01 July 2009
    Revised: 01 May 2009
    Received: 01 January 2009
    Published in TKDD Volume 3, Issue 4

    Author Tags

    1. Hidden Markov model
    2. likelihood
    3. upper bound

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Cited By

    • (2016) A survey on the geographic scope of textual documents. Computers & Geosciences 96:C, 23-34. DOI: 10.1016/j.cageo.2016.07.017. Online publication date: 1-Nov-2016.
    • (2015) N-gram distribution and unification gain problem and its optimal solution. International Journal of Systems Science 46:7, 1327-1336. DOI: 10.1080/00207721.2013.822608. Online publication date: 1-May-2015.
    • (2012) The Research of Improved Apriori Algorithm. Applied Mechanics and Materials 263-266, 2179-2184. DOI: 10.4028/www.scientific.net/AMM.263-266.2179. Online publication date: Dec-2012.
    • (2011) Fast Algorithm for Monitoring Data Streams by Using Hidden Markov Models. NTT Technical Review 9:12, 53-60. DOI: 10.53829/ntr201112ra2. Online publication date: Dec-2011.
    • (2010) Survey on HMM based anomaly intrusion detection using system calls. 2010 5th International Conference on Computer Science & Education, 102-105. DOI: 10.1109/ICCSE.2010.5593839. Online publication date: Aug-2010.