Hierarchical Clustering, Languages and Cancer

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3907)

Abstract

In this paper, we introduce a novel objective function for the hierarchical clustering of data from distance matrices, a highly relevant task in bioinformatics. To assess the robustness of the method, we test it in two areas: (a) deriving a phylogeny of languages and (b) classifying cancer subtypes from microarray data. For comparison, we also consider ultrametric trees (generated via a two-phase evolutionary approach that creates a large number of hypothesis trees and then takes a consensus) and the best-known results from the literature.
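As a rough illustration of what building an ultrametric tree from a distance matrix involves, the sketch below uses plain average-linkage agglomeration (UPGMA), a standard textbook baseline. It is not the objective function introduced in the paper, nor the two-phase evolutionary approach mentioned above; the function name `upgma`, the labels and the toy distances are all hypothetical.

```python
from itertools import combinations


def upgma(dist, labels):
    """Average-linkage (UPGMA) clustering of a symmetric distance matrix.

    Illustrative baseline only, not the method proposed in the paper.
    Returns (tree, height): tree is a nested tuple of labels and height
    is the height of the root of the resulting ultrametric tree.
    """
    n = len(labels)
    # Active clusters: id -> (subtree, number of leaves, height)
    clusters = {i: (labels[i], 1, 0.0) for i in range(n)}
    # Distances between active clusters, keyed by the pair of ids
    d = {frozenset((i, j)): float(dist[i][j])
         for i, j in combinations(range(n), 2)}
    next_id = n
    while len(clusters) > 1:
        # Merge the closest pair of clusters
        pair = min(d, key=d.get)
        a, b = sorted(pair)
        height = d.pop(pair) / 2.0
        (ta, na, _), (tb, nb, _) = clusters.pop(a), clusters.pop(b)
        # Distance from the merged cluster to every remaining cluster
        # is the size-weighted average of the two old distances
        for c in clusters:
            dac = d.pop(frozenset((a, c)))
            dbc = d.pop(frozenset((b, c)))
            d[frozenset((next_id, c))] = (na * dac + nb * dbc) / (na + nb)
        clusters[next_id] = ((ta, tb), na + nb, height)
        next_id += 1
    tree, _, height = next(iter(clusters.values()))
    return tree, height


# Toy example with four hypothetical items
labels = ["A", "B", "C", "D"]
dist = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
tree, height = upgma(dist, labels)
print(tree, height)
```

In an ultrametric tree every leaf lies at the same distance from the root, which is why each merge is placed at half the distance between the two clusters being joined.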

We used a dataset of measured 'separation time' between 84 Indo-European languages. The hierarchy we produce agrees very well with existing data about these languages across a wide range of levels, and it helps to clarify and raise new hypotheses about the evolution of these languages.

Our method also generated a classification tree for the different cancers in the NCI60 microarray dataset (comprising gene expression data for 60 cancer cell lines). In this case, the method appears to support the current belief in the heterogeneous nature of ovarian, breast and non-small-cell lung cancer, as opposed to the relative homogeneity of other types of cancer. However, our method reveals a close relationship between the melanoma and CNS cell lines. This is consistent with the fact that metastatic melanoma first appears in the central nervous system (CNS).




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mahata, P., Costa, W., Cotta, C., Moscato, P. (2006). Hierarchical Clustering, Languages and Cancer. In: Rothlauf, F., et al. Applications of Evolutionary Computing. EvoWorkshops 2006. Lecture Notes in Computer Science, vol 3907. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11732242_7


  • DOI: https://doi.org/10.1007/11732242_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33237-4

  • Online ISBN: 978-3-540-33238-1

  • eBook Packages: Computer Science, Computer Science (R0)
