Online Stochastic Pattern Matching

  • Conference paper
Implementation and Application of Automata (CIAA 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10977)

Abstract

The pattern matching problem is to find all occurrences of a given pattern in an input text. In particular, we consider the case when the pattern is a stochastic regular language where each pattern string has its own probability. Our problem is to find all matching patterns—(start, end) indices in the text—whose probability is larger than a given threshold probability. A pattern matching procedure is frequently used on streaming data in several applications, and often it is very challenging to find the start index of a matching in streaming data. We design an efficient algorithm for the stochastic pattern matching problem over streaming data based on the transformation of the pattern PFA into a weighted automaton and a constant bound on the number of backtracks required to find a start index while reading the streaming input. We also employ heuristics that enable us to reduce the number of backtracks, which improves the practical runtime of our algorithm. We establish the tight theoretical runtime of the proposed algorithm and experimentally demonstrate its practical performance. Finally, we show a possible application of our algorithm to another stochastic pattern matching problem where we search for the maximum probability substring of a text that is a superstring of a specified string.
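
To make the problem statement concrete, here is a minimal Python sketch of a naive baseline. It is not the algorithm proposed in the paper: it does not use the weighted-automaton transformation, the bounded backtracking, or the heuristics, and simply restarts a forward pass from every start index. The names `find_matches`, `INITIAL`, `TRANSITIONS`, and `FINAL`, the dictionary encoding of the PFA, and the toy automaton itself are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy PFA over {a, b} accepting a b*, with P(a b^k) = 0.5 ** (k + 1).
INITIAL = {0: 1.0}                      # state -> initial probability
TRANSITIONS = {                         # (state, symbol) -> [(target, probability)]
    (0, "a"): [(1, 1.0)],
    (1, "b"): [(1, 0.5)],
}
FINAL = {1: 0.5}                        # state -> final (stopping) probability


def find_matches(text, initial, transitions, final, threshold):
    """Return (start, end, prob) for every substring text[start:end + 1]
    whose probability under the PFA is strictly larger than threshold.
    End indices are inclusive. Worst-case time is quadratic in len(text)."""
    matches = []
    for start in range(len(text)):
        alpha = dict(initial)           # forward weights: state -> probability mass
        for end in range(start, len(text)):
            nxt = defaultdict(float)
            for state, mass in alpha.items():
                for target, p in transitions.get((state, text[end]), ()):
                    nxt[target] += mass * p
            if not nxt:                 # no path survives; longer substrings cannot match
                break
            alpha = nxt
            prob = sum(mass * final.get(state, 0.0) for state, mass in alpha.items())
            if prob > threshold:
                matches.append((start, end, prob))
    return matches


print(find_matches("abba", INITIAL, TRANSITIONS, FINAL, 0.2))
# prints [(0, 0, 0.5), (0, 1, 0.25), (3, 3, 0.5)]
```

In this toy run the substring "abb" at indices (0, 2) has probability 0.125 and is filtered out by the threshold 0.2. Because the sketch rescans the text from each start index, it is unsuitable for streaming input, which is the setting the abstract targets.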

Author information

Corresponding author

Correspondence to Yo-Sub Han.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Cognetta, M., Han, YS. (2018). Online Stochastic Pattern Matching. In: Câmpeanu, C. (eds) Implementation and Application of Automata. CIAA 2018. Lecture Notes in Computer Science, vol 10977. Springer, Cham. https://doi.org/10.1007/978-3-319-94812-6_11

  • DOI: https://doi.org/10.1007/978-3-319-94812-6_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-94811-9

  • Online ISBN: 978-3-319-94812-6

  • eBook Packages: Computer Science (R0)
