Abstract
Approximate pattern matching is not only more general than exact pattern matching but also tolerates some noise in the data. Most existing methods adopt the Hamming distance to measure similarity, which counts the positions at which two sequences differ but cannot reflect how close two differing characters are. This paper addresses approximate pattern matching with a local distance no larger than δ and a global distance no larger than γ, which is named Delta and gamma Pattern matching with gap constraints under One-off condition (DPO). First, we show that the problem is NP-hard. Therefore, we construct a heuristic algorithm named approximate Nettree for DPO (NetDPO), which transforms the problem into an approximate Nettree, a specially designed data structure, based on the δ distance. Then, NetDPO calculates the number of paths that reach the roots within γ distance. To find as many occurrences as possible, we employ the rightmost parent strategy and the optimal parent strategy to select the better occurrence, that is, the one whose removal has the least influence on the remaining candidates. This process is iterated until no occurrences remain. Finally, we analyze the time and space complexities of NetDPO. Extensive experimental results verify the superiority of the proposed algorithm.
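To make the constraints named in the abstract concrete, the following minimal Python sketch checks whether a candidate occurrence satisfies them. It is an illustration under stated assumptions, not the NetDPO algorithm: the character distance is assumed to be the absolute difference of code points, the local distance is taken as the largest per-character distance, the global distance as the sum of the per-character distances, and a single gap range [min_gap, max_gap] is assumed between all consecutive pattern characters. The function names and the 0-based position convention are hypothetical.

```python
from typing import Iterable, Sequence


def char_distance(a: str, b: str) -> int:
    """Assumed character distance: absolute difference of code points."""
    return abs(ord(a) - ord(b))


def is_dpo_occurrence(
    seq: str,
    pattern: str,
    positions: Sequence[int],
    min_gap: int,
    max_gap: int,
    delta: int,
    gamma: int,
) -> bool:
    """Check one candidate occurrence (0-based positions in seq)."""
    if len(positions) != len(pattern):
        return False
    if any(pos < 0 or pos >= len(seq) for pos in positions):
        return False
    # Gap constraint: characters skipped between consecutive matched positions.
    for prev, cur in zip(positions, positions[1:]):
        if not (min_gap <= cur - prev - 1 <= max_gap):
            return False
    dists = [char_distance(seq[pos], p) for pos, p in zip(positions, pattern)]
    # Local bound (delta) on every character, global bound (gamma) on the sum.
    return all(d <= delta for d in dists) and sum(dists) <= gamma


def is_one_off(occurrences: Iterable[Sequence[int]]) -> bool:
    """One-off condition: no sequence position is used by two occurrences."""
    used = [pos for occ in occurrences for pos in occ]
    return len(used) == len(set(used))
```

For example, with seq = "acbdca" and pattern = "bc", the positions (1, 2) match characters "c" and "b", each at distance 1 from the pattern, so the occurrence is accepted for δ = 1 and γ = 2 but rejected for γ = 1. A Hamming-distance check would only report two mismatches and could not distinguish this near match from an arbitrary one.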
Acknowledgements
This work was partly supported by the National Natural Science Foundation of China (61976240, 52077056, 917446209), the National Key Research and Development Program of China (2016YFB1000901), and the Natural Science Foundation of Hebei Province, China (Nos. F2020202013, E2020202033).
Cite this article
Li, Y., Yu, L., Liu, J. et al. NetDPO: (delta, gamma)-approximate pattern matching with gap constraints under one-off condition. Appl Intell 52, 12155–12174 (2022). https://doi.org/10.1007/s10489-021-03000-2
DOI: https://doi.org/10.1007/s10489-021-03000-2