NetDPO: (delta, gamma)-approximate pattern matching with gap constraints under one-off condition

Applied Intelligence

Abstract

Approximate pattern matching is more general than exact pattern matching and also tolerates some noise in the data. Most existing approximate methods adopt the Hamming distance, which counts the number of positions at which two sequences differ, but it cannot reflect how close two differing characters are. This paper addresses approximate pattern matching with a local distance no larger than δ and a global distance no larger than γ, which is named Delta and gamma Pattern matching with gap constraints under One-off condition (DPO). First, we show that the problem is NP-hard. Therefore, we construct a heuristic algorithm named approximate Nettree for DPO (NetDPO), which transforms the problem into an approximate Nettree, a specially designed data structure, based on the δ distance. Then, NetDPO calculates the number of paths that reach the roots within γ distance. To find the maximal occurrences, we employ the rightmost parent strategy and the optimal parent strategy to select the better occurrence, namely the one that minimizes the influence of removing it. This process is iterated until no occurrences remain. Finally, we analyze the time and space complexities of NetDPO. Extensive experimental results verify the superiority of the proposed algorithm.
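To make the problem statement concrete, the following is a minimal sketch of the DPO constraints, not the NetDPO algorithm itself. It rests on assumptions that the abstract does not state: the distance between two characters is taken as the absolute difference of their code points, a gap constraint [min_gap, max_gap] bounds the number of skipped characters between consecutive matched positions, and the one-off condition means that each sequence position is used by at most one occurrence. The function names are hypothetical, and the search is a naive leftmost-first greedy baseline rather than the Nettree-based strategies described above.

```python
from typing import List, Optional, Sequence, Set


def char_distance(a: str, b: str) -> int:
    # Assumption: character distance is the absolute difference of code points.
    return abs(ord(a) - ord(b))


def is_delta_gamma_match(pattern: str, sequence: str,
                         positions: Sequence[int],
                         delta: int, gamma: int) -> bool:
    """Check the local (delta) and global (gamma) distance bounds."""
    if len(positions) != len(pattern):
        return False
    dists = [char_distance(p, sequence[i]) for p, i in zip(pattern, positions)]
    # Local bound: every character pair within delta.
    # Global bound: the summed distance within gamma.
    return all(d <= delta for d in dists) and sum(dists) <= gamma


def find_occurrence(pattern: str, sequence: str, min_gap: int, max_gap: int,
                    delta: int, gamma: int,
                    used: Set[int]) -> Optional[List[int]]:
    """Depth-first search for the leftmost occurrence avoiding used positions."""
    n = len(sequence)

    def extend(j: int, prev: Optional[int], budget: int) -> Optional[List[int]]:
        if j == len(pattern):
            return []
        if prev is None:
            candidates = range(n)
        else:
            # Gap constraint: min_gap <= (i - prev - 1) <= max_gap.
            candidates = range(prev + 1 + min_gap,
                               min(prev + 1 + max_gap, n - 1) + 1)
        for i in candidates:
            if i in used:
                continue  # one-off condition: position already consumed
            d = char_distance(pattern[j], sequence[i])
            if d <= delta and d <= budget:
                rest = extend(j + 1, i, budget - d)
                if rest is not None:
                    return [i] + rest
        return None

    return extend(0, None, gamma)


def greedy_one_off(pattern: str, sequence: str, min_gap: int, max_gap: int,
                   delta: int, gamma: int) -> List[List[int]]:
    """Naive baseline: repeatedly take an occurrence and remove its positions."""
    if not pattern:
        return []
    used: Set[int] = set()
    occurrences: List[List[int]] = []
    while True:
        occ = find_occurrence(pattern, sequence, min_gap, max_gap,
                              delta, gamma, used)
        if occ is None:
            break
        occurrences.append(occ)
        used.update(occ)
    return occurrences
```

Under these assumptions, `greedy_one_off("ab", "abab", 0, 1, 0, 0)` returns `[[0, 1], [2, 3]]` (0-based positions), and `greedy_one_off("bd", "ace", 0, 1, 1, 2)` returns `[[0, 1]]` because each matched character is within distance 1 and the total distance is 2. Because the problem is NP-hard, a greedy baseline like this one can miss occurrences that a better selection order would keep, which is the motivation for choosing which occurrence to remove carefully.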

Acknowledgements

This work was partly supported by the National Natural Science Foundation of China (61976240, 52077056, 917446209), the National Key Research and Development Program of China (2016YFB1000901), and the Natural Science Foundation of Hebei Province, China (Nos. F2020202013, E2020202033).

Author information

Corresponding author

Correspondence to Youxi Wu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, Y., Yu, L., Liu, J. et al. NetDPO: (delta, gamma)-approximate pattern matching with gap constraints under one-off condition. Appl Intell 52, 12155–12174 (2022). https://doi.org/10.1007/s10489-021-03000-2

  • DOI: https://doi.org/10.1007/s10489-021-03000-2
