NetDAP: (δ, γ) −approximate pattern matching with length constraints

Applied Intelligence

Abstract

Pattern matching (PM) with gap constraints has been used to compute the support of a pattern in a sequence, which is an essential task in repetitive sequential pattern mining (also called sequence pattern mining). Compared with exact PM, approximate PM tolerates noise (differences) between the pattern and the matched subsequence, so more valuable patterns can be found. Approximate PM with gap constraints mainly adopts the Hamming distance to measure the degree of approximation. This distance reflects only the number of differing characters between two sequences and ignores how far apart those characters are. This paper therefore addresses (δ, γ)-approximate PM with length constraints, which employs local-global constraints to improve the accuracy of PM: the maximal distance between two corresponding characters must be less than or equal to the local threshold δ, and the sum of all these distances must be less than or equal to the global threshold γ. To tackle the problem effectively, this paper proposes an online algorithm, named NetDAP, which employs a specially designed data structure called the approximate single-leaf Nettree. An approximate single-leaf Nettree can be created by using dynamic programming to determine the root-leaf range, the minimal root, the maximal root, the range of nodes at each level, and the range of parents for each node. To improve performance, two pruning strategies are proposed that prune the nodes and the parent-child relationships that violate the δ and γ distance constraints, respectively. Finally, extensive experiments on real protein data sets and time series verify the performance of the proposed algorithm.
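To make the (δ, γ) constraints concrete, the following Python sketch contrasts them with the Hamming distance and counts approximate occurrences under gap and length constraints. It is not the authors' NetDAP or their approximate single-leaf Nettree; it is a minimal illustration under several assumptions that do not come from the abstract: characters are already mapped to integers (for example, ranked amino-acid or SAX symbols) and compared with |p − t|; a gap [a, b] bounds the number of characters skipped between two consecutive matched positions; the length constraint bounds the span i_m − i_1 + 1 of an occurrence; and all occurrences are counted. The names `within_delta_gamma`, `hamming` and `count_occurrences` are hypothetical. The dynamic program keys partial occurrences by (position, accumulated distance) and discards a candidate as soon as its local distance exceeds δ or its running sum exceeds γ, which loosely mirrors the idea of pruning by the two constraints.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def within_delta_gamma(p: Sequence[int], s: Sequence[int], delta: int, gamma: int) -> bool:
    """True if every |p[j] - s[j]| <= delta (local) and their sum <= gamma (global)."""
    if len(p) != len(s):
        raise ValueError("sequences must have equal length")
    dists = [abs(a - b) for a, b in zip(p, s)]
    return max(dists, default=0) <= delta and sum(dists) <= gamma


def hamming(p: Sequence[int], s: Sequence[int]) -> int:
    """Number of positions where the two sequences differ (magnitude ignored)."""
    return sum(a != b for a, b in zip(p, s))


def count_occurrences(
    text: Sequence[int],
    pattern: Sequence[int],
    gaps: Sequence[Tuple[int, int]],  # assumed: gaps[j] = (min, max) skipped characters
    min_len: int,
    max_len: int,
    delta: int,
    gamma: int,
) -> int:
    """Count position tuples satisfying gap, length and (delta, gamma) constraints.

    Illustrative sketch only (hypothetical helper), not the NetDAP algorithm.
    """
    m, n = len(pattern), len(text)
    if len(gaps) != m - 1:
        raise ValueError("need exactly one gap constraint between consecutive pattern characters")
    total = 0
    for root in range(n):
        d0 = abs(pattern[0] - text[root])
        if d0 > delta or d0 > gamma:  # root violates the local or global threshold
            continue
        # states[(position, accumulated distance)] = number of partial occurrences
        states: Dict[Tuple[int, int], int] = {(root, d0): 1}
        last = min(n - 1, root + max_len - 1)  # length constraint caps the window
        for j in range(1, m):
            lo, hi = gaps[j - 1]
            nxt: Dict[Tuple[int, int], int] = defaultdict(int)
            for (pos, acc), cnt in states.items():
                for q in range(pos + lo + 1, min(pos + hi + 1, last) + 1):
                    d = abs(pattern[j] - text[q])
                    if d > delta or acc + d > gamma:
                        continue  # prune: local (delta) or global (gamma) violation
                    nxt[(q, acc + d)] += cnt
            states = nxt
            if not states:
                break
        for (pos, _), cnt in states.items():
            if min_len <= pos - root + 1 <= max_len:
                total += cnt
    return total


if __name__ == "__main__":
    p = [2, 4, 6]
    print(hamming(p, [3, 5, 7]), within_delta_gamma(p, [3, 5, 7], 1, 3))    # 3 True
    print(hamming(p, [2, 4, 15]), within_delta_gamma(p, [2, 4, 15], 1, 3))  # 1 False
    print(count_occurrences([1, 2, 3, 4, 5, 3, 2], [2, 4], [(0, 1)], 2, 3, 1, 1))  # 3
```

The first two lines show the abstract's motivation: [3, 5, 7] differs from the pattern at every position (Hamming distance 3) yet stays close in value, while [2, 4, 15] differs at only one position but by a large amount. Iterating over every root and tracking the accumulated distance keeps the sketch simple and easy to check on small inputs, but it is far less efficient than a dedicated structure such as the one described above.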



Acknowledgements

This work was partly supported by the National Natural Science Foundation of China (61976240, 61571180, 917446209), the National Key Research and Development Program of China (2016YFB1000901), and the Graduate Student Innovation Program of Hebei Province (CXZZSS2019035).

Author information

Corresponding author

Correspondence to Yan Li.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Wu, Y., Fan, J., Li, Y. et al. NetDAP: (δ, γ) −approximate pattern matching with length constraints. Appl Intell 50, 4094–4116 (2020). https://doi.org/10.1007/s10489-020-01778-1


  • DOI: https://doi.org/10.1007/s10489-020-01778-1
