Predicting residues involved in anti-DNA autoantibodies with limited neural networks

  • Original Article
Medical & Biological Engineering & Computing

Abstract

Computer-aided rational vaccine design (RVD) and synthetic pharmacology are rapidly developing fields that leverage existing datasets to develop compounds of interest. Computational proteomics uses algorithms and models to probe proteins and predict their function. A potentially strong target for computational approaches is autoimmune antibodies, which result from broken tolerance in the immune system: it can no longer distinguish “self” from “non-self” and attacks its own structures (mainly proteins and DNA). Information on the structure, function, and pathogenicity of autoantibodies may assist in engineering RVD against autoimmune diseases. Current computational approaches exploit large datasets curated with extensive domain knowledge; most require substantial resources and have been applied indirectly to problems of interest for DNA, RNA, and monomer protein binding. We present a novel method for discovering potential binding sites. We employed long short-term memory (LSTM) models trained on FASTA primary sequences to predict protein binding in DNA-binding hydrolytic antibodies (abzymes). For comparison, we also applied convolutional neural network (CNN) models to the same dataset. Although the CNN model outperformed the LSTM on the primary task of binding prediction, analysis of the internal representations of both models showed that the LSTM models recovered sub-sequences that were strongly correlated with sites known to be involved in binding. These results demonstrate that analysis of the internal processes of LSTM models may serve as a powerful tool for primary sequence analysis.
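To make the idea of inspecting an LSTM's internal representations over a primary sequence more concrete, the following is a minimal sketch in plain Python. It is not the authors' model or pipeline: the weights are random and untrained, the toy FASTA record is made up, and every name here (`parse_fasta`, `one_hot`, `init_lstm`, `lstm_hidden_states`) is a hypothetical helper introduced only for illustration. It shows the general mechanics under those assumptions: a FASTA sequence is one-hot encoded residue by residue, passed through a single LSTM layer, and the hidden state at each position is kept so it could later be compared with known binding sites.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed alphabet)

def parse_fasta(text):
    """Return {header: sequence} from FASTA-formatted text."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records

def one_hot(residue):
    """20-dim one-hot vector; unknown residues map to all zeros."""
    return [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_lstm(n_in, n_hidden, seed=0):
    """Random, untrained weights for gates i, f, o, g (illustration only)."""
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {name: (mat(n_hidden, n_in + n_hidden), [0.0] * n_hidden) for name in "ifog"}

def lstm_hidden_states(sequence, params, n_hidden):
    """Run one LSTM layer over the sequence; return the hidden state per residue."""
    h, c, states = [0.0] * n_hidden, [0.0] * n_hidden, []
    for residue in sequence:
        x = one_hot(residue) + h  # concatenate input with previous hidden state
        def gate(name, act):
            W, b = params[name]
            return [act(sum(w * v for w, v in zip(row, x)) + bj) for row, bj in zip(W, b)]
        i, f, o = gate("i", sigmoid), gate("f", sigmoid), gate("o", sigmoid)
        g = gate("g", math.tanh)
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        states.append(h)
    return states

# Toy record: the sequence below is invented, not taken from the paper's dataset.
fasta = ">toy_heavy_chain (made-up sequence)\nEVQLVESGGG\nLVQPGGSLRL\n"
records = parse_fasta(fasta)
seq = next(iter(records.values()))
states = lstm_hidden_states(seq, init_lstm(20, 4), n_hidden=4)
# One activation vector per residue position.
print(len(seq), len(states), len(states[0]))
```

With a trained model in place of the random weights, the per-residue vectors in `states` are the kind of internal signal that could be correlated with positions known to be involved in binding; how the authors actually performed that analysis is described in the full article, not here.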

Acknowledgements

The authors thank PhD candidate Paul Morris at the Center for Complex Systems and Brain Sciences for their insightful discussions on natural language processing models and data analysis.

Funding

Research was supported by the Graduate Neuroscientist Training Program and the Center for Complex Systems and Brain Sciences at Florida Atlantic University.

Author information

Corresponding author

Correspondence to Rachel St. Clair.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

About this article

Cite this article

St. Clair, R., Teti, M., Pavlovic, M. et al. Predicting residues involved in anti-DNA autoantibodies with limited neural networks. Med Biol Eng Comput 60, 1279–1293 (2022). https://doi.org/10.1007/s11517-022-02539-7

  • DOI: https://doi.org/10.1007/s11517-022-02539-7
