How to Compare Various Clustering Outcomes? Metrices to Investigate Breast Cancer Patient Subpopulations Based on Proteomic Profiles

Tobiasz, Joanna; Polanska, Joanna

doi:10.1007/978-3-031-07802-6_26

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 13347)

Included in the following conference series:

International Work-Conference on Bioinformatics and Biomedical Engineering

Abstract

Breast cancer is a highly heterogeneous disease. State-of-the-art molecular profiling methods can reveal novel breast cancer subgroups. Correct identification of subtypes is crucial for treatment choice, so further investigation of breast cancer subtypes is promising for therapy tailoring. We applied various machine learning approaches to a set of protein-level measurements to detect subpopulations of breast cancer patients. These approaches combined various dimensionality reduction techniques with clustering. Their outcomes depended on the algorithms involved and on their parameters. Hence, we proposed a methodology for comparing the results of clustering algorithms when the proper number of groups is unknown. The metrics used are based on effect size measures and allowed the best machine learning approach to be selected. The values of the proposed pooled d measure ranged from 1.6847 for the worst method to 2.0568 for the best one. The highest value was obtained for the custom DiviK approach. Potentially, the metrics can also serve for the proteomic characterization of differences between subtypes and for the identification of novel biomarkers.
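The abstract does not spell out how the pooled d measure is computed. As a rough illustration only, the sketch below computes Cohen's d (the difference of two group means divided by their pooled standard deviation) for every feature and every pair of clusters, then averages the absolute values into one score per clustering. The function names `cohens_d` and `pooled_abs_d` and the simple averaging scheme are hypothetical assumptions, not the authors' definition of pooled d.

```python
import math
from itertools import combinations
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d for two samples, using the pooled standard deviation.

    Each sample needs at least two values, and the pooled variance
    must be non-zero.
    """
    n1, n2 = len(a), len(b)
    # Pooled variance: each group's sample variance weighted by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def pooled_abs_d(data, labels):
    """Hypothetical summary score: mean |d| over all features and all cluster pairs.

    data:   list of samples, each a list of feature values (e.g. protein levels).
    labels: cluster label assigned to each sample.
    """
    clusters = sorted(set(labels))
    n_features = len(data[0])
    ds = []
    for c1, c2 in combinations(clusters, 2):
        for j in range(n_features):
            # Values of feature j in each of the two clusters being compared.
            a = [row[j] for row, lab in zip(data, labels) if lab == c1]
            b = [row[j] for row, lab in zip(data, labels) if lab == c2]
            ds.append(abs(cohens_d(a, b)))
    return mean(ds)
```

For example, `pooled_abs_d(data, labels_method_a)` and `pooled_abs_d(data, labels_method_b)` could be compared on the same data even when the two methods return different numbers of clusters. Under this assumption, a larger score indicates better-separated clusters, consistent with the abstract's reading that a higher pooled d marks a better method.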


Acknowledgment

This study is supported by European Social Fund grant no. POWR.03.02.00-00-I029 [JT] and Silesian University of Technology grant no. 02/070/BK_22/0033 for Support and Development of Research Potential [JP]. The results published here are based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga.

Author information

Authors and Affiliations

Department of Data Science and Engineering, Silesian University of Technology, Akademicka 16, 44-100, Gliwice, Poland
Joanna Tobiasz & Joanna Polanska
Department of Graphics, Computer Vision and Digital Systems, Silesian University of Technology, Akademicka 16, 44-100, Gliwice, Poland
Joanna Tobiasz

Corresponding authors

Correspondence to Joanna Tobiasz or Joanna Polanska.

Editor information

Editors and Affiliations

Marcelina Siebold Guest Relations Dept., University of Granada, Granada, Spain
Ignacio Rojas
Faculty of Sciences, University of Granada, Granada, Spain
Olga Valenzuela
ETSIIT. CITIC-UGR, University of Granada, Granada, Spain
Fernando Rojas
ETSIIT, University of Granada, Granada, Spain
Luis Javier Herrera
University of Granada, Granada, Spain
Francisco Ortuño

About this paper

Cite this paper

Tobiasz, J., Polanska, J. (2022). How to Compare Various Clustering Outcomes? Metrices to Investigate Breast Cancer Patient Subpopulations Based on Proteomic Profiles. In: Rojas, I., Valenzuela, O., Rojas, F., Herrera, L.J., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2022. Lecture Notes in Computer Science, vol 13347. Springer, Cham. https://doi.org/10.1007/978-3-031-07802-6_26

DOI: https://doi.org/10.1007/978-3-031-07802-6_26
Published: 08 June 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-07801-9
Online ISBN: 978-3-031-07802-6
eBook Packages: Computer Science, Computer Science (R0)