Research article, DOI: 10.1145/3352411.3352443

Mining Contrast Sequential Patterns based on Subsequence Location Distribution from Biological Sequences

Published: 19 July 2019

ABSTRACT

With the rapid growth of biological data, methods that can analyze such data automatically have become an active research topic. Contrast sequential patterns play an important role in identifying the characteristics of different biological sequences. However, previous studies on contrast sequential pattern mining did not consider how the location distribution of genes or amino acids within the given biological sequences affects the patterns. In this paper, we introduce subsequence location distribution into the conditions for contrast sequential pattern mining, extending previous studies that considered only the support of patterns. We also design a novel algorithm, SLD-tree, which compresses the dataset into a tree to avoid repeated scans and effectively mines contrast sequential patterns based on subsequence location distribution. An empirical study on real-world biological sequences demonstrates the effectiveness of our method. Moreover, classification experiments show that our method achieves higher classification accuracy.
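To make the two kinds of conditions concrete, the following is a minimal Python sketch of a support-based contrast test together with a simple location statistic. It is not the SLD-tree algorithm, and the abstract does not give the paper's formal definitions. Everything here is an assumption for illustration: the function names, the thresholds `alpha` and `beta`, the use of gapped subsequence matching, and the choice of "relative start position of the leftmost match" as the location measure.

```python
from statistics import mean
from typing import List, Optional


def first_match_start(pattern: str, sequence: str) -> Optional[int]:
    """Start index of the leftmost greedy subsequence match, or None.

    Assumption: a pattern matches if its symbols occur in order,
    with arbitrary gaps between them.
    """
    if not pattern:
        return None
    start = sequence.find(pattern[0])
    if start == -1:
        return None
    pos = start
    for symbol in pattern[1:]:
        # Greedily take the earliest occurrence after the previous symbol.
        pos = sequence.find(symbol, pos + 1)
        if pos == -1:
            return None
    return start


def support(pattern: str, sequences: List[str]) -> float:
    """Fraction of sequences that contain the pattern."""
    hits = sum(first_match_start(pattern, s) is not None for s in sequences)
    return hits / len(sequences)


def mean_relative_location(pattern: str, sequences: List[str]) -> Optional[float]:
    """Hypothetical location statistic: mean of (match start / sequence length)
    over the sequences that contain the pattern."""
    locations = []
    for s in sequences:
        start = first_match_start(pattern, s)
        if start is not None:
            locations.append(start / len(s))
    return mean(locations) if locations else None


def is_contrast(pattern: str, pos: List[str], neg: List[str],
                alpha: float, beta: float) -> bool:
    """Support-only contrast test: frequent in pos, infrequent in neg."""
    return support(pattern, pos) >= alpha and support(pattern, neg) <= beta


# Toy data, invented for this example only.
positive = ["ACGT", "AACG", "TACG"]
negative = ["GTTT", "CCGA", "TTTT"]
contrast = is_contrast("ACG", positive, negative, alpha=0.6, beta=0.2)
location = mean_relative_location("ACG", positive)
```

A location-aware miner could additionally require that a statistic such as `location` differs between the two classes, which is the kind of extra condition the abstract describes in general terms.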

    Published in

      DSIT 2019: Proceedings of the 2019 2nd International Conference on Data Science and Information Technology
      July 2019
      280 pages
      ISBN: 9781450371414
      DOI: 10.1145/3352411

      Copyright © 2019 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery, New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      DSIT 2019 Paper Acceptance Rate: 43 of 95 submissions, 45%. Overall Acceptance Rate: 114 of 277 submissions, 41%.
