Using Most Similarity Tree Based Clustering to Select the Top Most Discriminating Genes for Cancer Detection

Lu, Xinguo; Lin, Yaping; Yang, Xiaolin; Cai, Lijun; Wang, Haijun; Sanga, Gustaph

doi:10.1007/11785231_98

Xinguo Lu, Yaping Lin, Xiaolin Yang, Lijun Cai, Haijun Wang & Gustaph Sanga

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4029)

Abstract

The development of DNA array technology makes cancer detection from DNA array expression data feasible. However, such research is usually plagued by the “curse of dimensionality”, and discriminative power is seriously weakened by the noise and redundancy that abound in these datasets. This paper proposes a hybrid gene selection method for cancer detection based on clustering of the most similarity tree (CMST). The method yields a number of non-redundant clusters and the most discriminating gene from each cluster. These discriminating genes are then used to train a perceptron, which produces a very efficient classification. In CMST, the Gap statistic is used to determine the optimal similarity measure λ and the number of clusters. A gene selection method based on optimal self-adaptive CMST (OS-CMST) is also presented for cancer detection. Experiments show that CMST-based gene pattern pre-processing not only significantly reduces the dimensionality of the attributes but also improves the classification rate in cancer detection. The OS-CMST selection scheme can also acquire the top most discriminating genes.
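
The pipeline outlined in the abstract (cluster genes, keep one discriminating gene per cluster, train a perceptron) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that the most similarity tree behaves like a maximum spanning tree over Pearson correlations, so that cutting edges weaker than λ gives the connected components of the thresholded similarity graph; it assumes a signal-to-noise score as the discrimination criterion; and it fixes λ by hand rather than choosing it with the Gap statistic. The toy expression matrix and all function names are hypothetical.

```python
import math
from statistics import mean, pstdev

def pearson(a, b):
    """Pearson correlation of two equal-length expression profiles."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va and vb else 0.0

def cluster_genes(genes, lam):
    """Link genes whose similarity is >= lam and return connected components.
    Assumption: equivalent to cutting edges below lam in a maximum spanning tree."""
    parent = list(range(len(genes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            if pearson(genes[i], genes[j]) >= lam:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(genes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

def snr(profile, labels):
    """Signal-to-noise score (assumed criterion): class mean gap over summed std."""
    c0 = [v for v, y in zip(profile, labels) if y == 0]
    c1 = [v for v, y in zip(profile, labels) if y == 1]
    return abs(mean(c1) - mean(c0)) / (pstdev(c0) + pstdev(c1) + 1e-12)

def select_genes(genes, labels, lam):
    """Return the clusters and the highest-scoring gene index from each cluster."""
    clusters = cluster_genes(genes, lam)
    picks = [max(c, key=lambda g: snr(genes[g], labels)) for c in clusters]
    return clusters, picks

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

def train_perceptron(X, y, epochs=1000, lr=1.0):
    """Classic perceptron rule; stops early once an epoch has no mistakes."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            delta = yi - predict(w, b, xi)
            if delta:
                errors += 1
                w = [wj + lr * delta * xj for wj, xj in zip(w, xi)]
                b += lr * delta
        if errors == 0:
            break
    return w, b

# Hypothetical toy data: 4 genes (rows) x 8 samples, first 4 samples are class 0.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
genes = [
    [1, 2, 1, 2, 8, 9, 8, 9],  # discriminating
    [1, 3, 1, 3, 7, 9, 7, 9],  # redundant with gene 0, noisier
    [5, 1, 5, 1, 5, 1, 5, 1],  # unrelated to the class label
    [6, 2, 6, 2, 6, 2, 6, 2],  # redundant with gene 2
]
clusters, selected = select_genes(genes, labels, lam=0.9)
X = [[genes[g][s] for g in selected] for s in range(len(labels))]
w, b = train_perceptron(X, labels)
accuracy = sum(predict(w, b, xi) == yi for xi, yi in zip(X, labels)) / len(labels)
print(clusters, selected, accuracy)
```

On this toy matrix the four genes collapse into two clusters, so the perceptron is trained on two attributes instead of four. The reported accuracy is measured on the training samples only and says nothing about the results in the paper.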

Author information

Authors and Affiliations

College of Computer and Communication, Hunan University, Changsha, 410082, China
Xinguo Lu, Yaping Lin, Xiaolin Yang, Lijun Cai, Haijun Wang & Gustaph Sanga
College of Software, Hunan University, Changsha, 410082, China
Yaping Lin

Editor information

Editors and Affiliations

Department of Artificial Intelligence, Academy of Humanities and Economics, Poland
Leszek Rutkowski
Institute of Automatics, AGH University of Science and Technology, Al. Mickiewicza 30, PL-30-059, Kraków, Poland
Ryszard Tadeusiewicz
Department of Electrical Engineering and Computer Sciences, Berkeley Initiative in Soft Computing (BISC), University of California, 94720-1776, Berkeley, CA, USA
Lotfi A. Zadeh
Department of Electrical Engineering, University of Louisville, 40292, Louisville, KY, U.S.A
Jacek M. Żurada

Cite this paper

Lu, X., Lin, Y., Yang, X., Cai, L., Wang, H., Sanga, G. (2006). Using Most Similarity Tree Based Clustering to Select the Top Most Discriminating Genes for Cancer Detection. In: Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Żurada, J.M. (eds) Artificial Intelligence and Soft Computing – ICAISC 2006. ICAISC 2006. Lecture Notes in Computer Science, vol 4029. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11785231_98

DOI: https://doi.org/10.1007/11785231_98
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35748-3
Online ISBN: 978-3-540-35750-6
eBook Packages: Computer Science, Computer Science (R0)
