Entropy Based Clustering to Determine Discriminatory Genes for Microarray Dataset

  • Conference paper
Contemporary Computing (IC3 2010)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 94)

Abstract

Microarray datasets suffer from the curse of dimensionality: each sample is described by a very large number of genes, while only a few samples are available. Efficient classification of samples therefore requires selecting a smaller set of relevant, non-redundant genes. In this paper, we propose a two-stage algorithm, GSUCE, for finding a set of discriminatory genes responsible for classification in high-dimensional microarray datasets. In the first stage, correlated genes are grouped into clusters and the best gene is selected from each cluster to create a pool of independent genes, which reduces redundancy. Similarity between genes is measured with the maximal information compression index. In the second stage, a wrapper-based forward feature selection method is used to obtain a set of informative genes for a given classifier. The proposed algorithm is tested on five well-known, publicly available datasets. Comparison with other state-of-the-art methods shows that the proposed algorithm achieves better classification accuracy with fewer features.
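
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the GSUCE implementation: the abstract does not say how clusters are formed, how the "best" gene in a cluster is chosen, or which classifier is wrapped. The sketch assumes the usual definition of the maximal information compression index (the smaller eigenvalue of the covariance matrix of two features), a greedy threshold-based grouping, a caller-supplied `relevance` score, and a leave-one-out nearest-centroid classifier. The names `mici`, `pick_representatives`, `loo_accuracy`, `forward_select`, the threshold value, and the toy data are all hypothetical.

```python
import math
from statistics import fmean, pvariance

def mici(x, y):
    """Smaller eigenvalue of the 2x2 covariance matrix of x and y.
    Near zero means the two genes are almost linearly dependent."""
    mx, my = fmean(x), fmean(y)
    vx, vy = pvariance(x), pvariance(y)
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    root = math.sqrt((vx - vy) ** 2 + 4.0 * cov * cov)
    return max(0.0, 0.5 * (vx + vy - root))

def pick_representatives(genes, relevance, threshold):
    """Stage 1 (assumed greedy scheme): visit genes from most to least
    relevant and keep a gene only if it is not redundant with any kept gene."""
    kept = []
    for g in sorted(range(len(genes)), key=lambda i: -relevance[i]):
        if all(mici(genes[g], genes[k]) >= threshold for k in kept):
            kept.append(g)
    return kept

def loo_accuracy(genes, labels, subset):
    """Leave-one-out accuracy of a nearest-centroid classifier (assumed)."""
    n, correct = len(labels), 0
    for held in range(n):
        best_label, best_dist = None, math.inf
        for c in sorted(set(labels)):
            members = [s for s in range(n) if s != held and labels[s] == c]
            if not members:
                continue
            dist = sum(
                (genes[g][held] - fmean([genes[g][s] for s in members])) ** 2
                for g in subset
            )
            if dist < best_dist:
                best_label, best_dist = c, dist
        correct += best_label == labels[held]
    return correct / n

def forward_select(genes, labels, pool):
    """Stage 2: add the gene that most improves accuracy; stop when none does."""
    selected, best, remaining = [], 0.0, list(pool)
    while remaining:
        scored = [(loo_accuracy(genes, labels, selected + [g]), g) for g in remaining]
        score, g = max(scored, key=lambda t: t[0])
        if score <= best:
            break
        selected.append(g)
        remaining.remove(g)
        best = score
    return selected, best

# Toy data: three genes measured on six samples (made up for illustration).
genes = [
    [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],  # separates the two classes
    [2.0, 2.4, 1.8, 6.0, 6.2, 5.8],  # exactly 2x the first gene: redundant
    [0.5, 2.0, 1.0, 1.5, 0.6, 1.9],  # unrelated to the class labels
]
labels = [0, 0, 0, 1, 1, 1]
relevance = [0.9, 0.8, 0.1]  # hypothetical per-gene relevance scores

kept = pick_representatives(genes, relevance, threshold=0.05)
selected, acc = forward_select(genes, labels, kept)
print(kept, selected, acc)
```

On this toy input, stage 1 drops the second gene because it is a scaled copy of the first, and stage 2 stops as soon as adding another gene no longer improves the leave-one-out accuracy. Scoring subsets by classifier accuracy is what makes the second stage a wrapper method; that also makes it expensive, which is why shrinking the candidate pool first is useful.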

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bala, R., Agrawal, R.K. (2010). Entropy Based Clustering to Determine Discriminatory Genes for Microarray Dataset. In: Ranka, S., et al. Contemporary Computing. IC3 2010. Communications in Computer and Information Science, vol 94. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14834-7_38

  • DOI: https://doi.org/10.1007/978-3-642-14834-7_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14833-0

  • Online ISBN: 978-3-642-14834-7

  • eBook Packages: Computer Science, Computer Science (R0)
