
NetNPG: Nonoverlapping pattern matching with general gap constraints

Applied Intelligence

Abstract

Pattern matching (PM) with gap constraints (or flexible wildcards) is one of the essential tasks in repetitive sequential pattern mining (or sequence pattern mining), since it computes the support of a pattern in a sequence. Nonoverlapping PM (PM under the nonoverlapping condition) is a kind of PM with gap constraints in which a character at a given position in the sequence may be reused at different positions of the pattern, but may not be reused at the same position of the pattern. Existing research on nonoverlapping PM assumes non-negative gaps, which place stricter constraints on the order in which characters occur in the sequence. It is known that valuable patterns are easy to obtain under the nonoverlapping condition in sequence pattern mining. This paper addresses a nonoverlapping PM problem with general gaps, meaning that a gap can take a negative value. We propose an effective algorithm, which first employs the Nettree structure to convert the problem into a general gap Nettree. To find the nonoverlapping occurrences, the algorithm employs a backtracking strategy to find the leftmost full path in each iteration. This paper also analyzes the time and space complexities of the proposed algorithm. Experimental results verify that the proposed algorithm has better performance and demonstrate that the general gap is more flexible than the non-negative gap.
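
The nonoverlapping condition and the effect of a negative gap can be illustrated with a minimal Python sketch. It is not the NetNPG algorithm: it builds no Nettree and uses a plain depth-first search with backtracking, so it may be slow on long inputs and is not guaranteed to find the maximum number of occurrences. The function names, the 0-based positions, the gap convention in `gap_value`, and the rule that one occurrence never uses the same position twice are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import List, Optional, Sequence, Set, Tuple

Gap = Tuple[int, int]  # (min_gap, max_gap); both bounds may be negative


def gap_value(prev: int, cur: int) -> int:
    """Assumed convention: a forward step counts the characters skipped
    (adjacent forward = 0); a backward step is the signed offset
    (immediately before = -1)."""
    return cur - prev - 1 if cur > prev else cur - prev


def leftmost_occurrence(seq: str, pattern: str, gaps: Sequence[Gap],
                        used: List[Set[int]]) -> Optional[List[int]]:
    """Backtracking search that tries sequence positions in increasing order."""
    path: List[int] = []

    def extend(j: int) -> bool:
        if j == len(pattern):
            return True
        for pos, ch in enumerate(seq):
            # Skip mismatches, positions already consumed at pattern index j,
            # and positions already used inside this occurrence (assumption).
            if ch != pattern[j] or pos in used[j] or pos in path:
                continue
            if j > 0:
                lo, hi = gaps[j - 1]
                if not lo <= gap_value(path[-1], pos) <= hi:
                    continue
            path.append(pos)
            if extend(j + 1):
                return True
            path.pop()  # dead end: backtrack and try the next position
        return False

    return list(path) if extend(0) else None


def nonoverlapping_occurrences(seq: str, pattern: str,
                               gaps: Sequence[Gap]) -> List[List[int]]:
    """Repeatedly take one occurrence and mark its positions per pattern index."""
    used: List[Set[int]] = [set() for _ in pattern]
    found: List[List[int]] = []
    while True:
        occ = leftmost_occurrence(seq, pattern, gaps, used)
        if occ is None:
            return found
        for j, pos in enumerate(occ):
            used[j].add(pos)
        found.append(occ)


print(nonoverlapping_occurrences("aa", "aa", [(-1, 0)]))  # [[0, 1], [1, 0]]
print(nonoverlapping_occurrences("ba", "ab", [(0, 1)]))   # []
```

In the first call, sequence position 1 matches the second pattern character in one occurrence and the first pattern character in the other, which the nonoverlapping condition permits; the second occurrence exists only because the gap may be -1. In the second call, a non-negative gap yields no occurrence because the only `b` precedes the only `a`.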




Acknowledgements

This work was partly supported by the National Natural Science Foundation of China (61976240, 61702157, 917446209), the National Key Research and Development Program of China (2016YFB1000901), and the Graduate Student Innovation Program of Hebei Province (CXZZSS2019023).

Author information

Corresponding author

Correspondence to Youxi Wu.


About this article

Cite this article

Shi, Q., Shan, J., Yan, W. et al. NetNPG: Nonoverlapping pattern matching with general gap constraints. Appl Intell 50, 1832–1845 (2020). https://doi.org/10.1007/s10489-019-01616-z


  • DOI: https://doi.org/10.1007/s10489-019-01616-z
