Abstract
Computer-aided rational vaccine design (RVD) and synthetic pharmacology are rapidly developing fields that leverage existing datasets to develop compounds of interest. Computational proteomics uses algorithms and models to probe proteins and predict their function. A potentially strong target for computational approaches is autoimmune antibodies, which result from broken tolerance: the immune system can no longer distinguish “self” from “non-self” and attacks the body's own structures (mainly proteins and DNA). Information on the structure, function, and pathogenicity of autoantibodies may assist in engineering RVD against autoimmune diseases. Current computational approaches exploit large datasets curated with extensive domain knowledge; most require substantial resources and have been applied only indirectly to problems of interest for DNA, RNA, and monomer protein binding. We present a novel method for discovering potential binding sites. We trained long short-term memory (LSTM) models on FASTA primary sequences to predict protein binding in DNA-binding hydrolytic antibodies (abzymes), and applied convolutional neural network (CNN) models to the same dataset for comparison. Although the CNN model outperformed the LSTM on the primary task of binding prediction, analysis of the internal representations of both models showed that the LSTM models recovered sub-sequences strongly correlated with sites known to be involved in binding. These results demonstrate that analysis of the internal processes of LSTM models may serve as a powerful tool for primary sequence analysis.
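The abstract states that the models take FASTA primary sequences as input but does not describe how they are encoded. As an illustration only, the following sketch shows one common way such input could be prepared: parsing FASTA text and one-hot encoding each residue into a fixed-length matrix. The function names `parse_fasta` and `one_hot`, the alphabetical ordering of the 20-letter alphabet, the zero-padding scheme, and the toy record are all assumptions made for this example, not the authors' pipeline.

```python
from typing import Dict, Iterator, List, Tuple

# Assumption: the 20 standard amino acids, one-letter codes, alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX: Dict[str, int] = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may wrap; join them into one string.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def one_hot(sequence: str, length: int) -> List[List[int]]:
    """Encode a sequence as `length` rows of 20 indicator values.

    Residues outside the 20-letter alphabet and padding positions
    become all-zero rows; sequences longer than `length` are truncated.
    """
    rows = []
    for pos in range(length):
        row = [0] * len(AMINO_ACIDS)
        if pos < len(sequence) and sequence[pos] in AA_INDEX:
            row[AA_INDEX[sequence[pos]]] = 1
        rows.append(row)
    return rows


# Hypothetical toy record, not taken from the study's dataset.
toy_fasta = ">toy\nAC\nDX\n"
records = list(parse_fasta(toy_fasta))
encoded = one_hot(records[0][1], length=6)
```

A matrix of this shape (sequence positions by residue channels) can be consumed by either a recurrent or a convolutional model, which is why a shared encoding is convenient when comparing the two architectures on the same dataset.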
Acknowledgements
The authors thank PhD candidate Paul Morris at the Center for Complex Systems and Brain Sciences for their insightful discussions on natural language processing models and data analysis.
Funding
Research was supported by the Graduate Neuroscientist Training Program and the Center for Complex Systems and Brain Sciences at Florida Atlantic University.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
About this article
Cite this article
St. Clair, R., Teti, M., Pavlovic, M. et al. Predicting residues involved in anti-DNA autoantibodies with limited neural networks. Med Biol Eng Comput 60, 1279–1293 (2022). https://doi.org/10.1007/s11517-022-02539-7
DOI: https://doi.org/10.1007/s11517-022-02539-7