ABSTRACT
With the generation of large amounts of biological data, research on methods that can automatically analyze such data has become a hot topic. Contrast sequential patterns play an important role in identifying the characteristics of different biological sequences. However, previous studies on mining contrast sequential patterns did not consider the effect of the location distribution of genes or amino acids on patterns in the given biological sequences. In this paper, we introduce subsequence location distribution into the conditions of contrast sequential pattern mining, extending previous studies that considered only the support of patterns. We also design a novel algorithm, SLD-tree, which compresses the dataset into a tree to avoid repeated scans and effectively mines contrast sequential patterns based on subsequence location distribution. An empirical study on real-world biological sequences demonstrates the effectiveness of our method. Moreover, we carry out a classification experiment, and the results show that our method achieves higher classification accuracy.
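To make the two ingredients named in the abstract concrete, the following minimal Python sketch computes the support of a pattern in two classes of sequences and a coarse histogram of where the pattern begins in each sequence. It is an illustration only, not the SLD-tree algorithm: the abstract does not define the location distribution or the thresholds, so the function names, the leftmost-match rule, the binning scheme, and the parameters `alpha`, `beta`, and `bins` are all assumptions made for this example.

```python
from typing import List, Optional, Sequence


def leftmost_match(pattern: str, seq: str) -> Optional[List[int]]:
    """Indices of the leftmost embedding of `pattern` as a subsequence of `seq`.

    Returns None when `pattern` does not occur in `seq`.
    """
    positions = []
    start = 0
    for symbol in pattern:
        idx = seq.find(symbol, start)
        if idx == -1:
            return None
        positions.append(idx)
        start = idx + 1
    return positions


def support(pattern: str, seqs: Sequence[str]) -> float:
    """Fraction of sequences that contain `pattern` as a subsequence."""
    if not seqs:
        return 0.0
    hits = sum(1 for s in seqs if leftmost_match(pattern, s) is not None)
    return hits / len(seqs)


def location_histogram(pattern: str, seqs: Sequence[str], bins: int = 4) -> List[int]:
    """Count matches by the relative start position of the leftmost match.

    Assumed stand-in for a "location distribution": each sequence is split
    into `bins` equal-width regions and a match is counted in the region
    where its first symbol falls.
    """
    counts = [0] * bins
    for s in seqs:
        match = leftmost_match(pattern, s)
        if not match:  # no occurrence, or empty pattern
            continue
        # Integer arithmetic avoids floating-point rounding at bin edges.
        counts[match[0] * bins // len(s)] += 1
    return counts


def is_contrast(pattern: str, pos: Sequence[str], neg: Sequence[str],
                alpha: float = 0.6, beta: float = 0.4) -> bool:
    """Support-only contrast test: frequent in `pos`, infrequent in `neg`."""
    return support(pattern, pos) >= alpha and support(pattern, neg) <= beta


if __name__ == "__main__":
    pos = ["ACGTAC", "ACGGTT", "TTACGA"]  # toy sequences, not real data
    neg = ["TTTTAC", "GGGTCA", "ACGTTT"]
    print(support("ACG", pos), support("ACG", neg))
    print(location_histogram("ACG", pos, bins=3))
    print(is_contrast("ACG", pos, neg))
```

In this sketch the support test and the location histogram are computed separately by rescanning every sequence; a method of the kind the abstract describes would combine both criteria while mining and would avoid the repeated scans by working on a compressed tree.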
Index Terms
- Mining Contrast Sequential Patterns based on Subsequence Location Distribution from Biological Sequences