Skip to main content
Log in

Using interesting sequences to interactively build Hidden Markov Models

  • Published:
Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Abstract

The paper presents a method of interactive construction of global Hidden Markov Models (HMMs) based on local sequence patterns discovered in data. The method is based on finding interesting sequences whose frequency in the database differs from that predicted by the model. The patterns are then presented to the user who updates the model using their intelligence and their understanding of the modelled domain. It is demonstrated that such an approach leads to more understandable models than automated approaches. Two variants of the problem are considered: mining patterns occurring only at the beginning of sequences and mining patterns occurring at any position; both practically meaningful. For each variant, algorithms have been developed allowing for efficient discovery of all sequences with given minimum interestingness. Applications to modelling webpage visitors behavior and to modelling protein secondary structure are presented, validating the proposed approach.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  • Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: ACM SIGMOD conference on management of data, pp 207–216

  • Asai K, Hayamizu S, Handa K (1993) Prediction of protein secondary structure by the Hidden Markov Model. Bioinformatics 9(2): 141–146

    Article  Google Scholar 

  • Balasubramanian V (1993) Equivalence and reduction of Hidden Markov Models. A.I. Technical Report 1370, Artificial Intelligence Laboratory, MIT

  • Bouchaffra D, Tan J (2006) Protein fold recognition using a structural Hidden Markov Model. In: International conference on pattern recognition, vol 3. Los Alamitos, CA, pp 186–189

  • Dunham MH (2003) Data mining, introductory and advanced topics, Part III. http://engr.smu.edu/~mhd/dmbook/part3.ppt

  • Han J, Cheng H, Xin D, Yan X (2007) Frequent pattern mining: current status and future directions. Data Min Knowl Discov 15(1): 55–86

    Article  MathSciNet  Google Scholar 

  • Hand D (2002) Pattern detection and discovery. In: Hand D, Adams N, Bolton R (eds) Pattern detection and discovery. Springer, Berlin, pp 1–12

    Chapter  Google Scholar 

  • Hunter L (1993) Molecular biology for computer scientists. In: Hunter L (eds) Artificial intelligence and molecular biology, chap 1. AAAI Press, Menlo Park, pp 1–46

    Google Scholar 

  • Huo Q, Lee CH (1997) On-line adaptive learning of the continuous density Hidden Markov Model based on approximate recursive Bayes estimate. IEEE Trans Speech Audio Process 5(2): 161–172

    Article  Google Scholar 

  • Huo Q, Chan C, Lee CH (1995) Bayesian adaptive learning of the parameters of hidden Markov model for speech recognition. IEEE Trans Speech Audio Process 3(5): 334–345

    Article  Google Scholar 

  • Inoue M, Ueda N (2003) Exploitation of unlabeled sequences in Hidden Markov Models. IEEE Trans Pattern Anal Mach Intell 25: 1570–1581

    Article  Google Scholar 

  • Jaroszewicz S (2008) Interactive HMM construction based on interesting sequences. In: Proceedings of local patterns to global models (LeGo’08) workshop at the 12th European conference on principles and practice of knowledge discovery in databases (PKDD’08), Antwerp, Belgium, pp 82–91

  • Jaroszewicz S, Scheffer T (2005) Fast discovery of unexpected patterns in data, relative to a Bayesian network. In: 11th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 2005), Chicago, IL, pp 118–127

  • Jaroszewicz S, Simovici D (2004) Interestingness of frequent itemsets using Bayesian networks as background knowledge. In: 10th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 2004), Seattle, WA, pp 178–186

  • Jaroszewicz S, Scheffer T, Simovici D (2009) Scalable pattern mining with Bayesian networks as background knowledge. Data Min Knowl Discov 18(1): 56–100

    Article  Google Scholar 

  • Ji S, Watson L, Carin L (2009) Semisupervised learning of Hidden Markov Models via a homotopy method. IEEE Trans Pattern Anal Mach Intell 31: 275–287

    Article  Google Scholar 

  • Laxman S, Sastry P (2005) Discovering frequent episodes and learning Hidden Markov Models: a formal connection. IEEE Trans Knowl Data Eng 17(11): 1505–1517

    Article  Google Scholar 

  • Laxman S, Sastry P (2006) A survey of temporal data mining. Sadhana 31(2): 173–198

    Article  MATH  MathSciNet  Google Scholar 

  • Lee CH, Gauvin JL (1996) Bayesian adaptive learning and map estimation of HMM. In: Lee CH, Soong F, Paliwal K (eds) Automatic speech and speaker recognition: advanced topics. Springer, New York, pp 83–108

    Google Scholar 

  • Li D, Laurent A, Poncelet P (2007) Mining unexpected sequential patterns and rules. Technical Report RR-07027, Laboratoire d’Informatique de Robotique et de Microélectronique de Montpellier

  • Low-Kam C, Mas A, Teisseire M (2009) Mining for unexpected sequential patterns given a Markov model. http://www.math.univ-montp2.fr/~mas/lmt_siam09.pdf

  • Meyer C (2001) Matrix analysis and applied linear algebra book and solutions manual. SIAM, Philadelphia

    Google Scholar 

  • Prum B, Rodolphe F, de Turckheim E (1995) Finding words with unexpected frequencies in DNA sequences. J R Stat Soc Ser B 57: 205–220

    MATH  Google Scholar 

  • Qian N, Sejnowski T (1988) Predicting the secondary structure of globular proteins using neural network models. J Mol Biol 202: 865–884

    Article  Google Scholar 

  • Rabiner LR (1989) A tutorial on Hidden Markov Models and selected applications in speech recognition. Proc IEEE 77(2): 257–286

    Article  Google Scholar 

  • Robbins H (1956) An empirical bayes approach to statistics. In: Proceedings of the third Berkeley symposium on mathematical statistics and probability, vol 1: contributions to the theory of statistics, University of California Press, pp 157–163

  • Satsangi A, Zaiane O (2007) Contrasting the contrast sets: an alternative approach. In: Eleventh international database engineering and applications symposium (IDEAS 2007), Banff, Canada

  • Spiliopoulou M (1999) Managing interesting rules in sequence mining. In: Proceedings of the third European conference on principles of data mining and knowledge discovery, pp 554–560

  • Uhlig F (2001) Transform linear algebra. Prentice Hall, Upper Saddle River

    Google Scholar 

  • van Loan C, Golub G (1996) Matrix computations. Johns Hopkins University Press, London

    MATH  Google Scholar 

  • Welch L (2003) Hidden Markov Models and the Baum-Welch algorithm. IEEE Inform Theory Soc Newsl 53(4): 1,10–13

    MathSciNet  Google Scholar 

  • Zaiane O, Yacef K, Kay J (2007) Finding top-n emerging sequences to contrast sequence sets. Tech. Rep. TR07-03, Department of Computing Science, University of Alberta, Edmonton, AB, Canada

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Szymon Jaroszewicz.

Additional information

Responsible editor: Johannes Fürnkranz and Arno Knobbe.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Jaroszewicz, S. Using interesting sequences to interactively build Hidden Markov Models. Data Min Knowl Disc 21, 186–220 (2010). https://doi.org/10.1007/s10618-010-0171-0

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10618-010-0171-0

Keywords

Navigation