K-Means Clustering with Infinite Feature Selection for Classification Tasks in Gene Expression Data

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 616)

Abstract

In bioinformatics and clinical research, microarray technology has been widely used to distinguish normal from tumour samples in cancer datasets. However, the high dimensionality of gene expression data degrades classification accuracy. Feature selection is therefore needed to retain informative genes and remove non-informative ones. Some feature selection methods, however, ignore the interactions between genes. To account for these interactions, similar genes are clustered together and dissimilar genes are placed in separate groups. To achieve higher classification accuracy, this research proposes k-means clustering combined with infinite feature selection for identifying informative genes in the selected subset. The approach was applied to colorectal cancer and small round blue cell tumour datasets, and it obtained higher classification accuracy than previous work.
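The abstract does not spell out how the two steps are combined, so the following is only a minimal sketch under stated assumptions: genes are first grouped with plain k-means, and the genes inside each cluster are then ranked by a simplified infinite feature selection (Inf-FS) score. The function names (`kmeans`, `inf_fs_scores`, `select_genes`), the parameters `alpha`, `k` and `top_per_cluster`, and the synthetic data are all hypothetical. Pearson correlation is swapped in for the rank correlation of the original Inf-FS formulation to keep the example NumPy-only.

```python
import numpy as np


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of `points`."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct rows (fancy indexing makes a copy).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = points[labels == c]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return labels


def inf_fs_scores(X, alpha=0.5):
    """Simplified Inf-FS score for each column (gene) of X (samples x genes)."""
    X = np.asarray(X, dtype=float)
    n_feat = X.shape[1]
    std = X.std(axis=0)
    std = std / (std.max() + 1e-12)          # normalised spread per gene
    sigma = np.maximum.outer(std, std)       # pairwise max of spreads
    # Assumption: Pearson correlation instead of a rank correlation.
    corr = np.nan_to_num(np.atleast_2d(np.corrcoef(X, rowvar=False)))
    redundancy = np.clip(np.abs(corr), 0.0, 1.0)
    A = alpha * sigma + (1 - alpha) * (1 - redundancy)
    # Scale so the series sum_{l>=1} (rA)^l converges: r * rho(A) = 0.9.
    rho = np.max(np.abs(np.linalg.eigvalsh(A)))
    r = 0.9 / (rho + 1e-12)
    identity = np.eye(n_feat)
    S = np.linalg.inv(identity - r * A) - identity
    return S.sum(axis=1)


def select_genes(X, k=3, top_per_cluster=2, seed=0):
    """Cluster genes with k-means, then keep the top-scoring genes per cluster."""
    X = np.asarray(X, dtype=float)
    labels = kmeans(X.T, k, seed=seed)       # each gene is a point in sample space
    selected = []
    for c in range(k):
        idx = np.flatnonzero(labels == c)
        if idx.size == 0:
            continue
        scores = inf_fs_scores(X[:, idx])
        best = idx[np.argsort(scores)[::-1][:top_per_cluster]]
        selected.extend(best.tolist())
    return sorted(selected)


# Synthetic stand-in for an expression matrix: 20 samples, 30 genes.
demo_rng = np.random.default_rng(0)
demo_X = demo_rng.normal(size=(20, 30))
print(select_genes(demo_X, k=3, top_per_cluster=2))
```

The selected gene indices would then be passed to a classifier of choice. Selecting a fixed number of genes per cluster is one possible design; the paper itself may combine the two steps differently.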

Acknowledgements

We would like to thank Universiti Teknologi Malaysia for funding this research through GUP Research Grants (grant numbers: Q.J130000.2528.12H12 and Q.J130000.2528.11H05). This research is also funded by the Malaysian Ministry of Higher Education under a fundamental research grant (grant number: 1559).

Corresponding author

Correspondence to Mohd Saberi Mohamad.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Remli, M.A. et al. (2017). K-Means Clustering with Infinite Feature Selection for Classification Tasks in Gene Expression Data. In: Fdez-Riverola, F., Mohamad, M., Rocha, M., De Paz, J., Pinto, T. (eds) 11th International Conference on Practical Applications of Computational Biology & Bioinformatics. PACBB 2017. Advances in Intelligent Systems and Computing, vol 616. Springer, Cham. https://doi.org/10.1007/978-3-319-60816-7_7

  • DOI: https://doi.org/10.1007/978-3-319-60816-7_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-60815-0

  • Online ISBN: 978-3-319-60816-7

  • eBook Packages: Engineering, Engineering (R0)
