Evaluating Correlation Coefficients for Clustering Gene Expression Profiles of Cancer

  • Conference paper
Advances in Bioinformatics and Computational Biology (BSB 2012)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 7409)

Abstract

Cluster analysis is usually the first step taken to extract information from gene expression data. One of its common applications is the clustering of cancer samples, an application closely tied to the detection of previously unknown cancer subtypes. Although guidelines have been established for choosing appropriate clustering algorithms, little attention has been paid to the choice of proximity measure. The Pearson correlation coefficient is the de facto proximity measure in this scenario, yet no comprehensive study has analyzed other correlation coefficients as alternatives to it. We therefore evaluated five correlation coefficients (along with Euclidean distance) for the clustering of cancer samples. Our evaluation, conducted on 35 publicly available datasets, covers both (i) the intrinsic separation ability and (ii) the clustering predictive ability of the correlation coefficients. Our results indicate that correlation coefficients rarely considered in the gene expression literature may provide results competitive with those of more commonly employed ones. Finally, we show that a recently introduced measure emerges as a promising alternative to the commonly employed Pearson coefficient, providing results competitive with, and in some cases superior to, those of Pearson.
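As a rough illustration of how correlation coefficients act as proximity measures between sample profiles, here is a minimal, standard-library Python sketch. It covers three well-known coefficients (Pearson, Spearman and Kendall) plus Euclidean distance. It is not the authors' implementation, and the abstract does not name all five coefficients that were evaluated. Several choices here are assumptions made for illustration: Kendall's tau-a rather than another variant, average ranks for ties, the mapping d = 1 - r from correlation to dissimilarity, and every function name.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sx * sy)


def ranks(v):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with v[order[i]].
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs."""
    s = 0
    for i, j in combinations(range(len(x)), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tie
    n = len(x)
    return s / (n * (n - 1) / 2)


def euclidean(x, y):
    """Euclidean distance, the non-correlation baseline."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def correlation_dissimilarity(corr, x, y):
    """Map a correlation in [-1, 1] to a dissimilarity in [0, 2] via 1 - r.

    Tiny negative values caused by floating-point rounding are clamped to 0.
    """
    return max(0.0, 1.0 - corr(x, y))


def dissimilarity_matrix(samples, measure):
    """Pairwise matrix for any measure(x, y), e.g. as input to a clusterer."""
    n = len(samples)
    return [[measure(samples[i], samples[j]) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]
    b = [10.0, 20.0, 30.0, 40.0]  # same shape as a, ten times the magnitude
    c = [1.0, 2.0, 3.0, 100.0]    # same ordering as a, one extreme value
    for name, f in [("pearson", pearson), ("spearman", spearman),
                    ("kendall", kendall_tau_a)]:
        print(name,
              round(correlation_dissimilarity(f, a, b), 3),
              round(correlation_dissimilarity(f, a, c), 3))
    print("euclidean", round(euclidean(a, b), 3), round(euclidean(a, c), 3))
```

In this toy example, `a` and its tenfold rescaling `b` have a correlation dissimilarity of essentially 0 under all three coefficients, yet they are about 49.3 apart in Euclidean distance. This shows why correlation-based measures are often favored when the shape of a profile matters more than its magnitude. Profile `c` keeps the ordering of `a` but has one extreme value. The rank-based Spearman and Kendall coefficients still report perfect correlation, while Pearson drops to roughly 0.785, which illustrates how the coefficients can differ in sensitivity to outliers.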

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jaskowiak, P.A., Campello, R.J.G.B., Costa, I.G. (2012). Evaluating Correlation Coefficients for Clustering Gene Expression Profiles of Cancer. In: de Souto, M.C., Kann, M.G. (eds) Advances in Bioinformatics and Computational Biology. BSB 2012. Lecture Notes in Computer Science, vol 7409. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31927-3_11

  • DOI: https://doi.org/10.1007/978-3-642-31927-3_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31926-6

  • Online ISBN: 978-3-642-31927-3

  • eBook Packages: Computer Science, Computer Science (R0)