Abstract
Pattern matching (PM) with gap constraints has been applied to compute the support of a pattern in a sequence, which is an essential task in repetitive sequential pattern mining (or sequence pattern mining). Compared with exact PM, approximate PM tolerates data noise (differences) between the pattern and the matched subsequence, so more valuable patterns can be found. Approximate PM with gap constraints mainly adopts the Hamming distance to measure the degree of approximation; this distance reflects only the number of differing characters between two sequences and ignores how far apart the differing characters are. Hence, this paper addresses (δ, γ) approximate PM with length constraints, which employs local-global constraints to improve the accuracy of PM: the maximal distance between two corresponding characters is less than or equal to the local threshold δ, and the sum of all these distances is less than or equal to the global threshold γ. To tackle the problem, this paper proposes an online algorithm named NetDAP, which employs a specially designed data structure named the approximate single-leaf Nettree. An approximate single-leaf Nettree can be created by adopting dynamic programming to determine the range of rootleaf, the minimal root, the maximal root, the range of nodes for each level, and the range of parents for each node. To improve performance, two pruning strategies are proposed to prune the nodes and the parent-child relationships that do not satisfy the δ and γ distance constraints, respectively. Finally, extensive experimental results on real protein data sets and time series verify the performance of the proposed algorithm.
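The local-global constraint described in the abstract can be illustrated with a minimal sketch. This is not the NetDAP algorithm and does not handle gap or length constraints; it only checks whether one equal-length candidate satisfies the δ and γ thresholds, and contrasts that with the Hamming distance. The function names are hypothetical, and the character distance (absolute difference of code points) is an assumption made for illustration, since the abstract does not define it.

```python
def char_distance(a: str, b: str) -> int:
    """Assumed character distance: absolute difference of code points."""
    return abs(ord(a) - ord(b))


def hamming_distance(pattern: str, candidate: str) -> int:
    """Count the positions at which the two strings differ."""
    if len(pattern) != len(candidate):
        raise ValueError("pattern and candidate must have equal length")
    return sum(1 for p, c in zip(pattern, candidate) if p != c)


def is_delta_gamma_match(pattern: str, candidate: str,
                         delta: int, gamma: int) -> bool:
    """Check the (delta, gamma) constraint for one aligned candidate.

    Local constraint: every per-position distance is <= delta.
    Global constraint: the sum of per-position distances is <= gamma.
    """
    if len(pattern) != len(candidate):
        raise ValueError("pattern and candidate must have equal length")
    total = 0
    for p, c in zip(pattern, candidate):
        d = char_distance(p, c)
        if d > delta:        # local threshold violated
            return False
        total += d
        if total > gamma:    # global threshold violated
            return False
    return True


if __name__ == "__main__":
    pattern = "ace"
    for candidate in ("bdf", "acz"):
        print(candidate,
              hamming_distance(pattern, candidate),
              is_delta_gamma_match(pattern, candidate, delta=1, gamma=3))
```

Under these assumptions, "bdf" differs from "ace" at every position (Hamming distance 3) yet each character is only one step away, so it satisfies δ = 1 and γ = 3. "acz" differs at a single position (Hamming distance 1), but that character is 21 steps away and fails the local threshold. This is the kind of difference the Hamming distance cannot express.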
Acknowledgements
This work was partly supported by the National Natural Science Foundation of China (61976240, 61571180, 917446209), the National Key Research and Development Program of China (2016YFB1000901), and the Graduate Student Innovation Program of Hebei Province (CXZZSS2019035).
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Wu, Y., Fan, J., Li, Y. et al. NetDAP: (δ, γ)-approximate pattern matching with length constraints. Appl Intell 50, 4094–4116 (2020). https://doi.org/10.1007/s10489-020-01778-1
DOI: https://doi.org/10.1007/s10489-020-01778-1