Prediction of Functional Effects of Protein Amino Acid Mutations

  • Conference paper
Bioinformatics and Biomedical Engineering (IWBBIO 2023)

Abstract

Human Single Amino Acid Polymorphisms (SAPs) or Single Amino Acid Variants (SAVs), usually referred to as nonsynonymous Single Nucleotide Variants (nsSNVs), represent the most frequent type of genetic variation in the population. They originate from non-synonymous single nucleotide variations (missense variants), in which a single base pair substitution alters the genetic code so that a different amino acid is produced at a given position. Since mutations are commonly associated with the development of various genetic diseases, it is of utmost importance to understand and predict which variations are deleterious and which are neutral. Computational tools based on machine learning are becoming promising alternatives to tedious and costly mutagenesis experiments. However, the varying quality, incompleteness, and inconsistencies of nsSNV datasets degrade the usefulness of machine learning approaches, so more robust and accurate methods are needed. In this paper, we present the application of a consensus classifier based on holdout sampling, which yields robust and accurate results and outperforms currently available tools. We generated 100 holdouts to sample different classifier architectures and different classification variables during the training stage. The best-performing holdouts were selected to construct a consensus classifier, which was then tested blindly using a k-fold (1 ≤ k ≤ 5) cross-validation approach. We also analyzed which protein attributes are best for predicting the effects of nsSNVs by calculating their discriminatory power. Our results show that the method outperforms other currently available tools and provides robust results, with small standard deviations among folds and high accuracy.
The superiority of our algorithm rests on the use of a tree of holdouts, in which different machine learning algorithms are sampled with different boundary conditions or different predictive attributes.
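
The holdout-and-consensus idea described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic data, the nearest-centroid base learner (a stand-in for the classifiers actually sampled), the 75/25 split, the choice of the top 15 holdouts, the majority vote, and all function names. The blind k-fold evaluation is also reduced here to a single blind test split.

```python
import random

def make_toy_data(n=300, n_features=6, seed=0):
    """Synthetic stand-in for an nsSNV dataset (1 = deleterious, 0 = neutral)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Assumption: only the first three attributes carry signal.
        X.append([rng.gauss(1.5 * label if j < 3 else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y

def fit_centroids(X, y, features):
    """Nearest-centroid model restricted to a subset of attributes."""
    cents = {}
    for c in (0, 1):
        rows = [x for x, lab in zip(X, y) if lab == c]
        cents[c] = [sum(r[j] for r in rows) / len(rows) for j in features]
    return features, cents

def predict(model, x):
    features, cents = model
    dist = {c: sum((x[j] - m) ** 2 for j, m in zip(features, cents[c]))
            for c in (0, 1)}
    return 0 if dist[0] <= dist[1] else 1

def accuracy(model, X, y):
    return sum(predict(model, x) == lab for x, lab in zip(X, y)) / len(y)

def sample_holdouts(X, y, n_holdouts=100, train_frac=0.75, seed=1):
    """Each holdout draws a random split and a random attribute subset."""
    rng = random.Random(seed)
    n, n_feat = len(X), len(X[0])
    results = []
    for _ in range(n_holdouts):
        idx = list(range(n))
        rng.shuffle(idx)
        cut = int(train_frac * n)
        tr, va = idx[:cut], idx[cut:]
        features = sorted(rng.sample(range(n_feat), rng.randint(1, n_feat)))
        model = fit_centroids([X[i] for i in tr], [y[i] for i in tr], features)
        acc = accuracy(model, [X[i] for i in va], [y[i] for i in va])
        results.append((acc, model))
    return results

def build_consensus(results, top=15):
    """Keep the models from the best-scoring holdouts."""
    ranked = sorted(results, key=lambda r: r[0], reverse=True)
    return [model for _, model in ranked[:top]]

def consensus_predict(models, x):
    """Majority vote over the selected models (use an odd number of models)."""
    votes = sum(predict(m, x) for m in models)
    return 1 if 2 * votes > len(models) else 0

if __name__ == "__main__":
    X, y = make_toy_data()
    # The last 100 samples are never seen while sampling holdouts.
    models = build_consensus(sample_holdouts(X[:200], y[:200]))
    preds = [consensus_predict(models, x) for x in X[200:]]
    blind_acc = sum(p == t for p, t in zip(preds, y[200:])) / len(preds)
    print(f"blind-test accuracy of the consensus: {blind_acc:.2f}")
```

Ranking holdouts by validation accuracy and voting over the best ones is one simple way to combine them; a weighted vote, or a different base learner per holdout, would fit the same skeleton.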

Acknowledgment

AK acknowledges financial support from NSF grant DBI 1661391 and NIH grants R01GM127701 and R01HG012117.

Author information

Corresponding author

Correspondence to Andrzej Kloczkowski.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Álvarez-Machancoses, Ó., Faraggi, E., de Andrés-Galiana, E.J., Fernández-Martínez, J.L., Kloczkowski, A. (2023). Prediction of Functional Effects of Protein Amino Acid Mutations. In: Rojas, I., Valenzuela, O., Rojas Ruiz, F., Herrera, L.J., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2023. Lecture Notes in Computer Science, vol 13920. Springer, Cham. https://doi.org/10.1007/978-3-031-34960-7_5

  • DOI: https://doi.org/10.1007/978-3-031-34960-7_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-34959-1

  • Online ISBN: 978-3-031-34960-7

  • eBook Packages: Computer Science, Computer Science (R0)
