Strict approximate pattern matching with general gaps

Applied Intelligence

Abstract

Pattern matching with gap constraints is an essential problem in computer science, with applications such as music information retrieval and sequential pattern mining. One variant, called loose matching, considers only the matching position of the last pattern substring in the sequence. A more challenging variant, called strict pattern matching, considers the matching position of every pattern character in the sequence, and it is one of the essential tasks of sequential pattern mining with gap constraints. Several strict pattern matching algorithms have been designed for pattern mining tasks, because strict pattern matching can be used to compute how often a pattern occurs in a given sequence, from which the frequent patterns can then be derived. In this article, we address a more general problem, strict approximate pattern matching under the Hamming distance, named SAP (Strict Approximate Pattern matching with general gaps and length constraints), where "general gaps" means that the gap constraints can be negative. We show that a SAP instance can be transformed into an exponential number of instances of exact pattern matching with general gaps. Hence, we propose an effective online algorithm, named SETA (SubnETtree for sAp), based on the subnettree structure (a Nettree is an extension of a tree with multi-parents and multi-roots), and show the completeness of the algorithm. The space and time complexities of the algorithm are O(m × Maxlen × W × d) and O(Maxlen × W × m² × n × d), respectively, where m, Maxlen, W, and d denote the length of pattern P, the maximal length constraint, the maximal gap length of pattern P, and the approximate threshold. Extensive experimental results validate the correctness and effectiveness of SETA.
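
To make the problem statement concrete, the following is a minimal brute-force sketch of counting strict approximate occurrences. It is not SETA and does not use a subnettree; it simply enumerates position tuples, which is why its cost grows exponentially with the pattern length. The function name `count_strict_approx` and its parameters are hypothetical, and the conventions are assumptions of this sketch rather than definitions taken from the article: a gap is the number of characters between two consecutive matching positions (so a negative gap places the next position before the previous one), the positions of one occurrence must be pairwise distinct, and the length constraint bounds the span from the leftmost to the rightmost position.

```python
def count_strict_approx(seq, chars, gaps, min_len, max_len, d):
    """Count position tuples matching `chars` in `seq` by exhaustive search.

    Assumptions of this sketch (not taken from the article):
    - gaps[j] = (lo, hi) bounds pos[j+1] - pos[j] - 1; lo may be negative
    - positions within one occurrence are pairwise distinct
    - min_len <= (max position - min position + 1) <= max_len
    - at most d mismatched characters (Hamming distance)
    """
    m = len(chars)
    n = len(seq)
    count = 0

    def extend(positions, mismatches):
        nonlocal count
        j = len(positions)
        if j == m:
            # Full tuple built: check the length (span) constraint.
            span = max(positions) - min(positions) + 1
            if min_len <= span <= max_len:
                count += 1
            return
        lo, hi = gaps[j - 1]
        prev = positions[-1]
        for gap in range(lo, hi + 1):
            pos = prev + gap + 1
            if pos < 0 or pos >= n or pos in positions:
                continue
            miss = mismatches + (seq[pos] != chars[j])
            if miss <= d:  # prune once the Hamming budget is exceeded
                extend(positions + [pos], miss)

    for start in range(n):
        miss = int(seq[start] != chars[0])
        if miss <= d:
            extend([start], miss)
    return count


# Exact matching (d = 0) versus one allowed mismatch (d = 1).
exact = count_strict_approx("abab", "ab", [(0, 1)], 1, 4, 0)
approx = count_strict_approx("abab", "ab", [(0, 1)], 1, 4, 1)
# A negative gap lets 'b' be matched to the left of 'a'.
backward = count_strict_approx("ba", "ab", [(-2, 0)], 1, 2, 0)
```

Under these assumed conventions, raising d from 0 to 1 admits additional position tuples, and the negative lower bound in the last call is what allows the second pattern character to match before the first.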

Acknowledgments

This research is supported by the National Natural Science Foundation of China under grants No. 61229301 and 61370144, the Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT) of the Ministry of Education, China, under grant IRT13059, the Natural Science Foundation of Hebei Province of China under grant No. F2013202138, and the Key Project of the Educational Commission of Hebei Province under grant No. ZH2012038.

Author information

Corresponding author

Correspondence to Youxi Wu.

Cite this article

Wu, Y., Fu, S., Jiang, H. et al. Strict approximate pattern matching with general gaps. Appl Intell 42, 566–580 (2015). https://doi.org/10.1007/s10489-014-0612-3

  • DOI: https://doi.org/10.1007/s10489-014-0612-3
