DOI: 10.1145/3314367.3314383
research-article

CNN-SVR for CRISPR-Cpf1 Guide RNA Activity Prediction with Data Augmentation

Published: 07 January 2019

ABSTRACT

CRISPR from Prevotella and Francisella 1 (Cpf1), an RNA-guided DNA endonuclease that belongs to a novel class II CRISPR system, has recently become a popular tool for genome editing. Improving the on-target efficiency and specificity of this system is an important and challenging problem. This paper presents a method for predicting CRISPR-Cpf1 guide RNA activity that combines a convolutional neural network (CNN) with support vector regression (SVR). In the proposed framework, a single-base substitution mutation data augmentation technique is applied to generate guide RNAs with indel frequencies, thus increasing the amount of labeled data. In the hybrid CNN-SVR model, the CNN works as a trainable feature extractor and the SVR serves as the regression operator. Specifically, a merged CNN-based regression model is first pre-trained to predict Cpf1 activity from target sequence composition. The SVR then generates the final predictions, taking chromatin accessibility information into account. Experiments on commonly used datasets show that our algorithm outperforms the available state-of-the-art tools.
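The abstract does not spell out how the augmentation is carried out, so the following is only a minimal sketch of single-base substitution augmentation under stated assumptions. It assumes that each mutant inherits the indel frequency of its parent sequence, and that some positions (for example a PAM region) may be excluded from mutation. The function names `single_base_substitutions` and `augment` and the `protected` parameter are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Iterator, List, Tuple

BASES = "ACGT"


def single_base_substitutions(seq: str, protected: Iterable[int] = ()) -> Iterator[str]:
    """Yield every sequence that differs from `seq` at exactly one position.

    Positions listed in `protected` (a hypothetical parameter, e.g. PAM
    indices) are never mutated.
    """
    skip = set(protected)
    for i, original in enumerate(seq):
        if i in skip:
            continue
        for base in BASES:
            if base != original:
                yield seq[:i] + base + seq[i + 1:]


def augment(
    dataset: List[Tuple[str, float]], protected: Iterable[int] = ()
) -> List[Tuple[str, float]]:
    """Return the original records plus all single-base mutants.

    ASSUMPTION: each mutant is given the indel frequency of its parent.
    The paper may assign labels to mutants differently.
    """
    skip = set(protected)
    augmented: List[Tuple[str, float]] = []
    for seq, indel_frequency in dataset:
        augmented.append((seq, indel_frequency))
        for mutant in single_base_substitutions(seq, skip):
            augmented.append((mutant, indel_frequency))
    return augmented


if __name__ == "__main__":
    # Toy record: a short made-up sequence with a made-up indel frequency.
    toy = [("TTTAGC", 42.0)]
    # Keep the first four positions fixed and mutate only the last two.
    for seq, label in augment(toy, protected=range(4)):
        print(seq, label)
```

Each unprotected position contributes three mutants, so a sequence with n mutable positions yields 3n additional labeled examples. Whether copying the parent label is acceptable depends on how sensitive activity is to the mutated position, which is a modeling decision this sketch does not settle.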


    • Published in

      ACM Other conferences
      ICBBB '19: Proceedings of the 2019 9th International Conference on Bioscience, Biochemistry and Bioinformatics
      January 2019
      115 pages
      ISBN: 9781450366540
      DOI: 10.1145/3314367

      Copyright © 2019 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited