Data Integration Strategy for Robust Classification of Biomedical Data

  • Conference paper
Trends and Innovations in Information Systems and Technologies (WorldCIST 2020)

Abstract

This paper presents a protocol for integrating the two most common types of biological data, clinical and molecular, to classify cancer patients more effectively. In this protocol, the most informative features are identified with statistical and information-theory-based selection methods for molecular data and with the Boruta algorithm for clinical data. Predictive models are built with the Random Forest classification algorithm. Data integration consists of combining the most informative clinical features with synthetic features obtained from genetic marker models and using them together as input variables for the classifier.

We applied this classification protocol to METABRIC breast cancer samples. Clinical data, gene expression data, and somatic copy number aberration data were used to predict the clinical endpoint. We tested several methods for combining information from the different datasets. Our research shows that both types of molecular data contain features that are relevant for clinical endpoint prediction. The best model was obtained by using ten clinical features and two synthetic features obtained from biomarker models. In the examined cases, the method used to filter molecular markers had only a small impact on the predictive power of the models, even though the resulting lists of top informative biomarkers diverged.
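
The integration step described in the abstract can be illustrated with a minimal sketch, assuming NumPy and scikit-learn are available. The function names `synthetic_feature` and `integrate`, the toy data, and the choice of out-of-fold predicted probabilities as the synthetic features are illustrative assumptions, not the authors' implementation. Feature selection (Boruta and the molecular filters) is omitted here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def synthetic_feature(X, y, n_splits=5, seed=0):
    """Return one synthetic feature per sample: the out-of-fold probability
    of class 1 from a Random Forest trained on a single molecular block.
    (Hypothetical helper; using out-of-fold predictions is an assumption.)"""
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    proba = cross_val_predict(model, X, y, cv=folds, method="predict_proba")
    return proba[:, 1]


def integrate(clinical, molecular_blocks, y):
    """Append one synthetic feature per molecular block to the clinical features."""
    synthetic = [synthetic_feature(X, y) for X in molecular_blocks]
    return np.column_stack([clinical] + synthetic)


# Toy data standing in for real inputs (all values are made up).
rng = np.random.default_rng(0)
n = 120
y = np.arange(n) % 2                      # balanced binary endpoint
clinical = rng.normal(size=(n, 10))       # ten clinical features
clinical[:, 0] += y                       # one weakly informative clinical feature
signal = np.arange(30) < 3                # first three molecular columns carry signal
expression = rng.normal(size=(n, 30)) + 0.8 * y[:, None] * signal
copy_number = rng.normal(size=(n, 30)) + 0.8 * y[:, None] * signal

# Ten clinical features plus two synthetic features, one per molecular block.
combined = integrate(clinical, [expression, copy_number], y)
final_model = RandomForestClassifier(n_estimators=100, random_state=0)
final_model.fit(combined, y)
```

In this sketch each molecular dataset contributes a single column, so the final classifier sees 12 inputs. An honest performance estimate for the final model would need an outer cross-validation loop that wraps both the synthetic-feature step and the final fit.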

Corresponding author

Correspondence to Aneta Polewko-Klim.

Copyright information

© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Polewko-Klim, A., Rudnicki, W.R. (2020). Data Integration Strategy for Robust Classification of Biomedical Data. In: Rocha, Á., Adeli, H., Reis, L., Costanzo, S., Orovic, I., Moreira, F. (eds) Trends and Innovations in Information Systems and Technologies. WorldCIST 2020. Advances in Intelligent Systems and Computing, vol 1160. Springer, Cham. https://doi.org/10.1007/978-3-030-45691-7_56
