Skip to main content

Ensembling Descendant Term Classifiers to Improve Gene - Abnormal Phenotype Predictions

  • Conference paper
  • First Online:
Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB 2017)

Abstract

The Human Phenotype Ontology (HPO) provides a standard categorization of the phenotypic abnormalities encountered in human diseases and of the semantic relationship between them. Quite surprisingly the problem of the automated prediction of the association between genes and abnormal human phenotypes has been widely overlooked, even if this issue represents an important step toward the characterization of gene-disease associations, especially when no or very limited knowledge is available about the genetic etiology of the disease under study. We present a novel ensemble method able to capture the hierarchical relationships between HPO terms, and able to improve existing hierarchical ensemble algorithms by explicitly considering the predictions of the descendant terms of the ontology. In this way the algorithm exploits the information embedded in the most specific ontology terms that closely characterize the phenotypic information associated with each human gene. Genome-wide results obtained by integrating multiple sources of information show the effectiveness of the proposed approach.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Amberger, J., Bocchini, C., Amosh, A.: A new face and new challenges for online mendelian inheritance in man (OMIM). Hum. Mutat. 32, 564–7 (2011)

    Article  Google Scholar 

  2. Ashburner, M., et al.: Creating the gene ontology resource: design and implementation. Genome Res. 11(8), 1425–1433 (2001)

    Article  Google Scholar 

  3. Bolstad, B.M., Irizarry, R.A., Astrand, M., Speed, T.P.: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19, 185–193 (2003)

    Article  Google Scholar 

  4. Chatr-Aryamontri, A., et al.: The BioGRID interaction database: 2013 update. Nucleic Acids Res. 41, 816–823 (2013)

    Article  Google Scholar 

  5. Cormen, T., Leiserson, C., Rivest, R.L., Stein, S.: Introduction to Algorithms. MIT Press, Boston (2009)

    MATH  Google Scholar 

  6. Franceschini, A., et al.: STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res. 41, 808–815 (2013)

    Article  Google Scholar 

  7. Goldstein, B., Polley, E., Briggs, F.: Random forests for genetic association studies. Stat. Appl. Genet. Mol. Biol. 10(1) (2011). https://doi.org/10.2202/1544-6115.1691

  8. Jiang, Y., et al.: An expanded evaluation of protein function prediction methods shows an improvement in accuracy. Genome Biol. 17, 184 (2016)

    Article  Google Scholar 

  9. Kohler, S., Vasilevsky, N., Engelstad, M., et al.: The human phenotype ontology in 2017. Nucleic Acids Res. 45, D865 (2017)

    Article  Google Scholar 

  10. Moreau, Y., Tranchevent, L.: Computational tools for prioritizing candidate genes: boosting disease gene discovery. Nature Rev. Genet. 13, 523–536 (2012)

    Article  Google Scholar 

  11. Notaro, M., Schubach, M., Robinson, P.N., Valentini, G.: Prediction of human phenotype ontology terms by means of hierarchical ensemble methods. BMC Bioinform. 18(1), 449:1–449:18 (2017). http://dblp.uni-trier.de/db/journals/bmcbi/bmcbi18.html#NotaroSRV17

  12. Re, M., Mesiti, M., Valentini, G.: A fast ranking algorithm for predicting gene functions in biomolecular networks. IEEE/ACM Trans. Comput. Biol. Bioinf. 9, 1812–1818 (2012)

    Article  Google Scholar 

  13. Robinson, P.N., Frasca, M., Köhler, S., Notaro, M., Re, M., Valentini, G.: A hierarchical ensemble method for DAG-structured taxonomies. In: Schwenker, F., Roli, F., Kittler, J. (eds.) MCS 2015. LNCS, vol. 9132, pp. 15–26. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-20248-8_2

    Chapter  Google Scholar 

  14. Saito, T., Rehmsmeier, M.: The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLOS ONE 10, 1–21 (2015)

    Google Scholar 

  15. Schubach, M., Re, M., Robinson, P., Valentini, G.: Imbalance-aware machine learning for predicting rare and common disease-associated non-coding variants. Sci. Rep. 7(2959) (2017). https://doi.org/10.1038/s41598-017-03011-5

  16. Smedley, D., et al.: A whole-genome analysis framework for effective identification of pathogenic regulatory variants in Mendelian disease. Am. J. Hum. Genet. 99, 595–606 (2016)

    Article  Google Scholar 

  17. Valentini, G.: True Path Rule hierarchical ensembles for genome-wide gene function prediction. IEEE/ACM Trans. Comput. Biol. Bioinf. 8, 832–847 (2011)

    Article  Google Scholar 

  18. Valentini, G., Armano, G., Frasca, M., Lin, J., Mesiti, M., Re, M.: RANKS: a flexible tool for node label ranking and classification in biological networks. Bioinformatics 32, 2872 (2016)

    Article  Google Scholar 

  19. Valentini, G., Köhler, S., Re, M., Notaro, M., Robinson, P.N.: Prediction of human gene - phenotype associations by exploiting the hierarchical structure of the human phenotype ontology. In: Ortuño, F., Rojas, I. (eds.) IWBBIO 2015. LNCS, vol. 9043, pp. 66–77. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16483-0_7

    Chapter  Google Scholar 

  20. Valentini, G., Paccanaro, A., Caniza, H., Romero, A., Re, M.: An extensive analysis of disease-gene associations using network integration and fast kernel-based gene prioritization methods. Artif. Intell. Med. 61, 63–78 (2014)

    Article  Google Scholar 

  21. Wang, P., et al.: Inference of gene-phenotype associations via protein-protein interaction and orthology. PLoS ONE 8, 1–8 (2013)

    Article  Google Scholar 

  22. Zemojtel, T., et al.: Effective diagnosis of genetic disease by computational phenotype analysis of the disease-associated genome. Sci. Transl. Med. 6, 252ra123 (2014)

    Article  Google Scholar 

Download references

Acknowledgments

We acknowledge partial support from the project “Discovering Patterns in Multi-Dimensional Data” (2016–2017) funded by Università degli Studi di Milano.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Giorgio Valentini .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Notaro, M., Schubach, M., Frasca, M., Mesiti, M., Robinson, P.N., Valentini, G. (2019). Ensembling Descendant Term Classifiers to Improve Gene - Abnormal Phenotype Predictions. In: Bartoletti, M., et al. Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2017. Lecture Notes in Computer Science(), vol 10834. Springer, Cham. https://doi.org/10.1007/978-3-030-14160-8_8

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-14160-8_8

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-14159-2

  • Online ISBN: 978-3-030-14160-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics