Strict pattern matching under non-overlapping condition

Wu, Youxi; Shen, Cong; Jiang, He; Wu, Xindong

doi:10.1007/s11432-015-0935-3


Research Paper
Published: 15 November 2016

Volume 60, article number 012101 (2017)

Science China Information Sciences

Youxi Wu^1,2, Cong Shen^1,3, He Jiang^4 & Xindong Wu^5,6


Abstract

Pattern matching (or string matching) is an essential task in computer science, especially in sequential pattern mining, since pattern matching methods can be used to calculate the support (the number of occurrences) of a pattern and thus to determine whether the pattern is frequent. A state-of-the-art approach to sequential pattern mining with gap constraints (or flexible wildcards) uses the number of non-overlapping occurrences to denote the frequency of a pattern. Non-overlapping means that no two occurrences may use the same character of the sequence at the same position of the pattern. In this paper, we investigate strict pattern matching under the non-overlapping condition. We first show that the problem is in P. We then propose an algorithm, called NETLAP-Best, which uses the Nettree structure. NETLAP-Best transforms the pattern matching problem into a Nettree and then iterates: it finds the rightmost root-leaf path, removes that path, and prunes the nodes in the Nettree that have become useless. We show that NETLAP-Best is a complete algorithm and analyse its time and space complexities. Extensive experimental results demonstrate the correctness and efficiency of NETLAP-Best.
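The non-overlapping condition stated in the abstract can be illustrated with a small check. This is a minimal sketch, assuming that an occurrence is represented as a tuple of sequence positions, one per pattern position; that representation and the function names `non_overlapping` and `pairwise_non_overlapping` are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def non_overlapping(occ_a, occ_b):
    """Return True if the two occurrences never use the same sequence
    position at the same pattern position.

    Each occurrence is a tuple of sequence positions (assumed
    representation): occ[j] is the position matched by pattern character j.
    """
    return all(a != b for a, b in zip(occ_a, occ_b))


def pairwise_non_overlapping(occurrences):
    """Return True if every pair in the collection is non-overlapping."""
    return all(non_overlapping(x, y) for x, y in combinations(occurrences, 2))


# Sequence position 1 appears in both occurrences, but at different
# pattern positions, so the pair still counts as non-overlapping.
print(pairwise_non_overlapping([(0, 1), (1, 2)]))  # True

# Both occurrences use sequence position 2 for the second pattern
# character, so they overlap.
print(non_overlapping((0, 2), (1, 2)))  # False
```

Note that the condition is weaker than requiring disjoint position sets: a character of the sequence may be reused as long as it is matched at a different position of the pattern.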

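Abstract (translated from the Chinese)

Highlights

Pattern matching (string matching) is a crucial task in computer science, particularly in sequential pattern mining, because pattern matching methods can be used to compute the support (number of occurrences) of a pattern in a sequence and thereby decide whether the pattern is frequent. One sequential pattern mining algorithm with gap constraints (variable-length wildcards) uses the number of non-overlapping occurrences of a pattern to represent its frequency, where non-overlapping means that no two occurrences may share the character at the same position of the sequence. We first prove that strict pattern matching under the non-overlapping condition is in P. We then propose NETLAP-Best, an algorithm based on the Nettree structure, which converts the pattern matching problem into a Nettree, iteratively finds the rightmost root-leaf path in the Nettree, and then prunes that path together with the Nettree nodes that have become useless. We further prove the completeness of NETLAP-Best and analyse its time and space complexities. Extensive experimental results verify the correctness and effectiveness of NETLAP-Best.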
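The iterate-and-remove idea described above can be sketched in a simplified form. This is not the authors' NETLAP-Best implementation and it does not build an explicit Nettree: it keeps one set of still-usable sequence positions per pattern position, starts from the rightmost remaining last-level position, and walks back towards the first pattern character, preferring the rightmost candidate at each step. The gap notation (a list of `(lo, hi)` bounds on the number of wildcard characters between consecutive pattern characters) and the function name `nonoverlapping_occurrences` are assumptions made for illustration.

```python
def nonoverlapping_occurrences(seq, chars, gaps):
    """Greedily collect non-overlapping occurrences (illustrative sketch).

    seq:   the sequence to search (0-indexed positions)
    chars: the pattern characters, in order
    gaps:  assumed notation; gaps[j] = (lo, hi) bounds the number of
           characters skipped between chars[j] and chars[j + 1]
    """
    m = len(chars)
    # available[j]: sequence positions still usable at pattern position j
    available = [
        {i for i, c in enumerate(seq) if c == chars[j]} for j in range(m)
    ]

    def extend_up(j, pos, path):
        """Extend a path from level j at `pos` back to level 0, trying the
        rightmost candidate predecessor first."""
        path.append(pos)
        if j == 0:
            return True
        lo, hi = gaps[j - 1]
        # A predecessor q must satisfy lo <= pos - q - 1 <= hi.
        for q in range(pos - lo - 1, pos - hi - 2, -1):
            if q in available[j - 1]:
                if extend_up(j - 1, q, path):
                    return True
                # q cannot reach level 0; positions are only ever removed,
                # so q stays useless and can be pruned for good.
                available[j - 1].discard(q)
        path.pop()
        return False

    occurrences = []
    # Visit last-level positions from right to left.
    for leaf in sorted(available[m - 1], reverse=True):
        path = []
        if extend_up(m - 1, leaf, path):
            occ = tuple(reversed(path))  # order positions first to last
            for j, pos in enumerate(occ):
                available[j].discard(pos)  # remove the path that was used
            occurrences.append(occ)
    return occurrences


# Pattern "a" followed immediately by "a" in the sequence "aaa".
print(nonoverlapping_occurrences("aaa", "aa", [(0, 0)]))  # [(1, 2), (0, 1)]
```

In the usage example, sequence position 1 serves as the first pattern character in one occurrence and as the second in the other, which the non-overlapping condition permits.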



Author information

Authors and Affiliations

School of Computer Science and Engineering, Hebei University of Technology, Tianjin, 300401, China
Youxi Wu & Cong Shen
Hebei Province Key Laboratory of Big Data Calculation, Tianjin, 300401, China
Youxi Wu
School of Computer Science and Technology, Tianjin University, Tianjin, 300072, China
Cong Shen
School of Software, Dalian University of Technology, Dalian, 116621, China
He Jiang
School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, 230009, China
Xindong Wu
School of Computing and Informatics, University of Louisiana at Lafayette, Lafayette, LA, 70503, USA
Xindong Wu


Corresponding author

Correspondence to Youxi Wu.


About this article

Cite this article

Wu, Y., Shen, C., Jiang, H. et al. Strict pattern matching under non-overlapping condition. Sci. China Inf. Sci. 60, 012101 (2017). https://doi.org/10.1007/s11432-015-0935-3


Received: 14 February 2016
Accepted: 11 May 2016
Published: 15 November 2016
DOI: https://doi.org/10.1007/s11432-015-0935-3
