Entropy Based Clustering to Determine Discriminatory Genes for Microarray Dataset

Bala, Rajni; Agrawal, R. K.

doi:10.1007/978-3-642-14834-7_38

Part of the book series: Communications in Computer and Information Science (CCIS, volume 94)

Included in the following conference series:

International Conference on Contemporary Computing

Abstract

Microarray datasets suffer from the curse of dimensionality: each sample is described by a very large number of genes, while only a few samples are available. Efficient classification of samples therefore requires selecting a smaller set of relevant and non-redundant genes. In this paper, we propose a two-stage algorithm, GSUCE, for finding a set of discriminatory genes responsible for classification in high-dimensional microarray datasets. In the first stage, correlated genes are grouped into clusters and the best gene is selected from each cluster to create a pool of independent genes, which reduces redundancy. Similarity between genes is measured using the maximal information compression index. In the second stage, a wrapper-based forward feature selection method is used to obtain a set of informative genes for a given classifier. The proposed algorithm is tested on five well-known, publicly available datasets. Comparison with other state-of-the-art methods shows that the proposed algorithm achieves better classification accuracy with fewer features.
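
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' GSUCE implementation: the abstract does not say how clusters are formed, how the "best" gene in a cluster is chosen, or which classifier is wrapped. The sketch therefore assumes a greedy threshold grouping, a simple two-class mean-difference score (`relevance`), and a nearest-centroid classifier scored by leave-one-out accuracy. All function names and the `threshold` parameter are hypothetical. The similarity measure is the maximal information compression index as it is usually defined, namely the smallest eigenvalue of the 2x2 covariance matrix of two genes, which is zero when one gene is a linear function of the other.

```python
import numpy as np

def mici(x, y):
    """Maximal information compression index of two genes:
    smallest eigenvalue of their 2x2 covariance matrix."""
    return float(np.linalg.eigvalsh(np.cov(x, y))[0])

def cluster_genes(X, threshold):
    """Assumed greedy grouping: a gene joins the first cluster whose
    seed gene lies within `threshold` of it; otherwise it seeds a new one."""
    clusters = []
    for g in range(X.shape[1]):
        for members in clusters:
            if mici(X[:, members[0]], X[:, g]) <= threshold:
                members.append(g)
                break
        else:
            clusters.append([g])
    return clusters

def relevance(X, y):
    """Hypothetical two-class score: |difference of class means| / summed std."""
    a, b = X[y == 0], X[y == 1]
    return np.abs(a.mean(0) - b.mean(0)) / (a.std(0) + b.std(0) + 1e-12)

def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier
    (a stand-in for the 'given classifier' of the wrapper stage)."""
    hits = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        Xtr, ytr = X[mask], y[mask]
        labels = np.unique(ytr)
        centroids = np.array([Xtr[ytr == c].mean(0) for c in labels])
        pred = labels[np.argmin(((centroids - X[i]) ** 2).sum(1))]
        hits += int(pred == y[i])
    return hits / len(y)

def forward_select(X, y, pool):
    """Wrapper-based forward selection: repeatedly add the gene that most
    improves accuracy, and stop when no candidate improves it."""
    selected, best, remaining = [], 0.0, list(pool)
    while remaining:
        scores = [loo_accuracy(X[:, selected + [g]], y) for g in remaining]
        j = int(np.argmax(scores))
        if scores[j] <= best:
            break
        best = scores[j]
        selected.append(remaining.pop(j))
    return selected, best

def two_stage(X, y, threshold):
    """Stage 1: one representative gene per cluster. Stage 2: forward selection."""
    rel = relevance(X, y)
    pool = [max(c, key=lambda g: rel[g]) for c in cluster_genes(X, threshold)]
    return forward_select(X, y, pool)

# Toy data: gene 1 is a linear copy of gene 0 (redundant), gene 2 is uninformative.
g0 = np.array([0.0, 0.1, 0.2, 0.1, 0.0, 4.0, 4.1, 4.2, 4.1, 4.0])
X = np.column_stack([g0, 2 * g0 + 1, np.tile([1.0, -1.0], 5)])
y = np.array([0] * 5 + [1] * 5)
selected, accuracy = two_stage(X, y, threshold=1e-6)
print(selected, accuracy)
```

In this toy example the redundant pair collapses into a single cluster, so only one of the two correlated genes can reach the wrapper stage. With real microarray data the threshold, the representative-selection rule, and the classifier would all need to be chosen and validated separately.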

Author information

Authors and Affiliations

Deen Dayal Upadhyaya College, University of Delhi, Delhi, India
Rajni Bala
School of Computer and System Science, Jawaharlal Nehru University, New Delhi, India
R. K. Agrawal

Editor information

Editors and Affiliations

Dept. of Computer Sciences, University of Florida, 32611, Gainesville, FL, USA
Sanjay Ranka
University of Florida, Gainesville, FL, USA
Arunava Banerjee
Department of Computer Science and Engineering, Indian Institute of Technology, 110016, New Delhi, India
Kanad Kishore Biswas
Computer Science, College of Engineering and Science, Louisiana Tech University, LA 71272, Ruston, USA
Sumeet Dua
University of Florida, Gainesville, FL, USA
Prabhat Mishra
Department of Computer Science & Engineering, Indian Institute of Technology, 208016, Kanpur, India
Rajat Moona
National Tsing Hua University, Hsin-Chu, Taiwan, R.O.C.
Sheung-Hung Poon
Department of Computer Science, The University of Hong Kong, Pokfulam Road, Hong Kong
Cho-Li Wang

Cite this paper

Bala, R., Agrawal, R.K. (2010). Entropy Based Clustering to Determine Discriminatory Genes for Microarray Dataset. In: Ranka, S., et al. Contemporary Computing. IC3 2010. Communications in Computer and Information Science, vol 94. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14834-7_38

DOI: https://doi.org/10.1007/978-3-642-14834-7_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14833-0
Online ISBN: 978-3-642-14834-7
eBook Packages: Computer Science, Computer Science (R0)
