NetDAP: (δ, γ)-approximate pattern matching with length constraints

Wu, Youxi; Fan, Jinquan; Li, Yan; Guo, Lei; Wu, Xindong

doi:10.1007/s10489-020-01778-1

Published: 10 July 2020

Applied Intelligence, volume 50, pages 4094–4116 (2020)

Youxi Wu^1,2,3 (ORCID: orcid.org/0000-0001-5314-3468),
Jinquan Fan^1,
Yan Li^4,
Lei Guo^2 &
Xindong Wu^5,6

Abstract

Pattern matching (PM) with gap constraints is used to compute the support of a pattern in a sequence, an essential task in repetitive sequential pattern mining (or sequence pattern mining). Compared with exact PM, approximate PM tolerates data noise (differences) between the pattern and the matched subsequence, so more valuable patterns can be found. Approximate PM with gap constraints mainly adopts the Hamming distance to measure the degree of approximation; this reflects only the number of differing characters between two sequences and ignores how far apart those characters are. Hence, this paper addresses (δ, γ)-approximate PM with length constraints, which employs local-global constraints to improve the accuracy of PM: the maximal distance between two corresponding characters is less than or equal to the local threshold δ, and the sum of all these distances is less than or equal to the global threshold γ. To tackle the problem, this paper proposes an online algorithm, named NetDAP, which employs a specially designed data structure named the approximate single-leaf Nettree. An approximate single-leaf Nettree can be created by adopting dynamic programming to determine the range of rootleaf, the minimal root, the maximal root, the range of nodes for each level, and the range of parents for each node. To improve performance, two pruning strategies are proposed to prune the nodes and the parent-child relationships that do not satisfy the δ and γ distance constraints, respectively. Finally, extensive experimental results on real protein data sets and time series verify the performance of the proposed algorithm.
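
The local-global constraint described in the abstract can be illustrated with a minimal Python sketch. It is not the NetDAP algorithm and does not build a Nettree; it only checks whether one already-aligned candidate subsequence satisfies the δ and γ thresholds. The function name `delta_gamma_match` is hypothetical, and the per-character distance is assumed here to be the absolute difference of character codes, which may differ from the distance the paper uses.

```python
def delta_gamma_match(pattern: str, candidate: str, delta: int, gamma: int) -> bool:
    """Check the (delta, gamma) constraint for two aligned strings.

    Assumption (not from the paper): the distance between two characters
    is the absolute difference of their code points.
    """
    if len(pattern) != len(candidate):
        return False  # only position-by-position comparison is handled here

    total = 0
    for p, c in zip(pattern, candidate):
        dist = abs(ord(p) - ord(c))
        if dist > delta:      # local constraint: every single distance <= delta
            return False
        total += dist
        if total > gamma:     # global constraint: sum of distances <= gamma
            return False
    return True


if __name__ == "__main__":
    # Each character differs by 1, so the sum of distances is 3.
    print(delta_gamma_match("abc", "bcd", delta=1, gamma=3))
    print(delta_gamma_match("abc", "bcd", delta=1, gamma=2))
```

Under these assumptions the first call satisfies both thresholds, while the second fails the global threshold even though every single character stays within δ. This is the distinction from the Hamming distance, which would only count three mismatches in both cases.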

Acknowledgements

This work was partly supported by the National Natural Science Foundation of China (61976240, 61571180, 917446209), the National Key Research and Development Program of China (2016YFB1000901), and the Graduate Student Innovation Program of Hebei Province (CXZZSS2019035).

Author information

Authors and Affiliations

School of Artificial Intelligence, Hebei University of Technology, Tianjin, 300401, China
Youxi Wu & Jinquan Fan
State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, Tianjin, 300401, China
Youxi Wu & Lei Guo
Hebei Key Laboratory of Big Data Computing, Tianjin, 300401, China
Youxi Wu
School of Economics and Management, Hebei University of Technology, Tianjin, 300401, China
Yan Li
Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology), Ministry of Education, Hefei, 230009, China
Xindong Wu
Mininglamp Academy of Sciences, Mininglamp Technology, Beijing, 100084, China
Xindong Wu

Corresponding author

Correspondence to Yan Li.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wu, Y., Fan, J., Li, Y. et al. NetDAP: (δ, γ)-approximate pattern matching with length constraints. Appl Intell 50, 4094–4116 (2020). https://doi.org/10.1007/s10489-020-01778-1

Published: 10 July 2020
Issue Date: November 2020
DOI: https://doi.org/10.1007/s10489-020-01778-1
