Evaluating Correlation Coefficients for Clustering Gene Expression Profiles of Cancer

Jaskowiak, Pablo A.; Campello, Ricardo J. G. B.; Costa, Ivan G.

doi:10.1007/978-3-642-31927-3_11


Part of the book series: Lecture Notes in Computer Science (LNBI, volume 7409)

Included in the following conference series:

Brazilian Symposium on Bioinformatics


Abstract

Cluster analysis is usually the first step taken to extract information from gene expression data. One of its common applications is the clustering of cancer samples, which is associated with the detection of previously unknown cancer subtypes. Although guidelines have been established for choosing appropriate clustering algorithms, little attention has been given to proximity measures. The Pearson correlation coefficient is the de facto proximity measure in this scenario, yet no comprehensive study has analyzed other correlation coefficients as alternatives to it. We therefore evaluated five correlation coefficients (along with Euclidean distance) for the clustering of cancer samples. Our evaluation was conducted on 35 publicly available datasets and covered both (i) the intrinsic separation ability and (ii) the clustering predictive ability of the correlation coefficients. Our results indicate that correlation coefficients rarely considered in the gene expression literature may provide results competitive with those of more commonly employed ones. Finally, we show that a recently introduced measure is a promising alternative to the commonly employed Pearson coefficient, providing results competitive with, and in some cases superior to, those of Pearson.
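
As a rough illustration of what using a correlation coefficient as a proximity measure means, the sketch below turns a correlation r into a dissimilarity 1 - r and compares it with Euclidean distance on toy profiles. It is a minimal sketch, not the evaluation protocol of the paper: the abstract does not list the five coefficients studied, so Spearman rank correlation is an assumption chosen for illustration alongside Pearson, the 1 - r transformation is one common convention, and the function names and sample values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(v):
    """1-based ranks of v; tied values share their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def corr_dissimilarity(x, y, corr=pearson):
    """Map a correlation in [-1, 1] to a dissimilarity in [0, 2]."""
    return 1.0 - corr(x, y)


# Hypothetical expression profiles (not taken from the paper's datasets).
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [10.0, 20.0, 30.0, 40.0, 50.0]  # same trend as a, ten times the scale
c = [5.0, 4.0, 3.0, 2.0, 1.0]       # opposite trend, same scale as a
d = [1.0, 2.0, 3.0, 4.0, 100.0]     # a with one extreme value

for name, other in (("b", b), ("c", c), ("d", d)):
    print(
        name,
        round(corr_dissimilarity(a, other, pearson), 3),
        round(corr_dissimilarity(a, other, spearman), 3),
        round(euclidean(a, other), 3),
    )
```

On these toy values, the correlation-based dissimilarities treat a and b as identical because they share a trend, whereas Euclidean distance places a closer to the oppositely trending c than to b. The rank-based coefficient is also unaffected by the single extreme value in d, while Pearson is not. Differences of this kind are what make the choice of proximity measure matter for clustering.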


Author information

Authors and Affiliations

Department of Computer Sciences, Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, Brazil
Pablo A. Jaskowiak & Ricardo J. G. B. Campello
Center of Informatics, Federal University of Pernambuco, Recife, Brazil
Ivan G. Costa

Editor information

Editors and Affiliations

Universidade Federal de Pernambuco, Recife, Brazil
Marcilio C. de Souto
Department of Biological Sciences, University of Maryland/Baltimore County, Baltimore, MD, USA
Maricel G. Kann

Cite this paper

Jaskowiak, P.A., Campello, R.J.G.B., Costa, I.G. (2012). Evaluating Correlation Coefficients for Clustering Gene Expression Profiles of Cancer. In: de Souto, M.C., Kann, M.G. (eds) Advances in Bioinformatics and Computational Biology. BSB 2012. Lecture Notes in Computer Science, vol 7409. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31927-3_11

DOI: https://doi.org/10.1007/978-3-642-31927-3_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31926-6
Online ISBN: 978-3-642-31927-3
eBook Packages: Computer Science, Computer Science (R0)
