ABSTRACT
CRISPR from Prevotella and Francisella 1 (Cpf1), an RNA-guided DNA endonuclease that belongs to a novel class 2 CRISPR-Cas system, has recently become a popular tool for genome editing. Improving the on-target efficiency and specificity of this system is an important and challenging problem. This paper presents a method for predicting CRISPR-Cpf1 guide RNA activity that combines a convolutional neural network (CNN) with support vector regression (SVR). In the proposed framework, a single-base substitution data augmentation technique is applied to generate additional guide RNAs with associated indel frequencies, increasing the amount of labeled data. In the hybrid CNN-SVR model, the CNN works as a trainable feature extractor and the SVR acts as the regression operator. Specifically, a merged CNN-based regression model is first pre-trained to predict Cpf1 activity from target sequence composition. The SVR then generates the final predictions, taking chromatin accessibility information into account. Experiments on commonly used datasets show that our algorithm outperforms the available state-of-the-art tools.
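The single-base substitution augmentation mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the details, so everything here is an assumption made for illustration: the function names `single_base_substitutions` and `augment` are hypothetical, every position is allowed to mutate unless `positions` is given, and each variant simply inherits the indel frequency of its parent sequence. The paper's actual labeling rule and mutated positions may differ.

```python
from typing import Iterator, List, Optional, Sequence, Tuple

BASES = "ACGT"


def single_base_substitutions(
    seq: str, positions: Optional[Sequence[int]] = None
) -> Iterator[str]:
    """Yield every sequence that differs from `seq` at exactly one position.

    `positions` optionally restricts which indices may be mutated
    (hypothetical parameter; the abstract does not say which are used).
    """
    seq = seq.upper()
    indices = range(len(seq)) if positions is None else positions
    for i in indices:
        for base in BASES:
            if base != seq[i]:
                yield seq[:i] + base + seq[i + 1:]


def augment(
    dataset: List[Tuple[str, float]],
    positions: Optional[Sequence[int]] = None,
) -> List[Tuple[str, float]]:
    """Return the original examples plus all single-base variants.

    ASSUMPTION: each variant is given its parent's indel frequency.
    """
    augmented = list(dataset)
    for seq, indel_freq in dataset:
        for variant in single_base_substitutions(seq, positions):
            augmented.append((variant, indel_freq))
    return augmented


if __name__ == "__main__":
    # Toy target sequence and indel frequency, invented for illustration.
    toy = [("TTTAGCTA", 42.0)]
    for seq, freq in augment(toy, positions=[4])[:4]:
        print(seq, freq)
```

With no position restriction, a sequence of length n yields 3n variants, so the labeled set grows by a factor of 3n + 1 under these assumptions.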
Index Terms
- CNN-SVR for CRISPR-Cpf1 Guide RNA Activity Prediction with Data Augmentation