Abstract
In bioinformatics and clinical research, microarray technology has been widely used to distinguish normal from tumour samples in cancer datasets. However, the high dimensionality of gene expression data can degrade classification accuracy. Feature selection is therefore needed to retain informative genes and remove non-informative ones. Some feature selection methods, however, ignore the interactions between genes. To account for these interactions, similar genes are clustered together and dissimilar genes are placed in separate clusters. To achieve higher classification accuracy, this research proposes combining k-means clustering with infinite feature selection to identify informative genes in the selected subset. The method was applied to colorectal cancer and small round blue cell tumour datasets, where it obtained higher classification accuracy than previous work.
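The abstract does not spell out how the two steps are combined, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes genes are clustered with a plain Lloyd's k-means, that every gene is scored with an infinite-feature-selection style ranking (a graph built from gene standard deviation and Spearman correlation, summed over paths of all lengths), and that the top-scoring genes are then taken from each cluster. The function names, the mixing weight `alpha`, the 0.9 scaling factor and the `per_cluster` parameter are all illustrative choices.

```python
import numpy as np


def kmeans_labels(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of `points`."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = points[labels == c]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return labels


def inf_fs_scores(X, alpha=0.5):
    """Score each gene (column of X) in the spirit of infinite feature selection.

    Assumed pairwise weight: alpha * max(std_i, std_j)
                             + (1 - alpha) * (1 - |Spearman(i, j)|).
    """
    X = np.asarray(X, dtype=float)
    n_genes = X.shape[1]
    std = X.std(axis=0)
    if std.max() > 0:
        std = std / std.max()  # scale to [0, 1]
    sigma = np.maximum(std[:, None], std[None, :])
    # Spearman correlation = Pearson correlation of ranks (ties not averaged).
    ranks = X.argsort(axis=0).argsort(axis=0).astype(float)
    corr = np.nan_to_num(np.atleast_2d(np.corrcoef(ranks, rowvar=False)))
    A = alpha * sigma + (1 - alpha) * (1 - np.abs(corr))
    rho = np.max(np.abs(np.linalg.eigvalsh(A)))  # A is symmetric
    if rho == 0:
        return np.zeros(n_genes)
    r = 0.9 / rho  # keeps the geometric series of r*A convergent
    identity = np.eye(n_genes)
    S = np.linalg.inv(identity - r * A) - identity  # sum over all path lengths
    return S.sum(axis=1)


def select_genes(X, k=3, per_cluster=2, seed=0):
    """Cluster genes, then keep the best-scoring genes from each cluster."""
    X = np.asarray(X, dtype=float)
    labels = kmeans_labels(X.T, k, seed=seed)  # genes are the points
    scores = inf_fs_scores(X)
    selected = []
    for c in range(k):
        members = np.flatnonzero(labels == c)
        best = members[np.argsort(scores[members])[::-1][:per_cluster]]
        selected.extend(best.tolist())
    return sorted(selected)


# Synthetic stand-in for an expression matrix: 20 samples x 30 genes.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(20, 30))
print(select_genes(X_demo, k=3, per_cluster=2))
```

The selected column indices would then be passed to a classifier of choice. Picking genes per cluster rather than globally is one way to keep the retained subset from being dominated by a single group of mutually similar genes.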
Acknowledgements
We would like to thank Universiti Teknologi Malaysia for funding this research through GUP Research Grants (grant numbers: Q.J130000.2528.12H12 and Q.J130000.2528.11H05). This research is also funded by the Malaysian Ministry of Higher Education under a fundamental research grant (grant number: 1559).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Remli, M.A. et al. (2017). K-Means Clustering with Infinite Feature Selection for Classification Tasks in Gene Expression Data. In: Fdez-Riverola, F., Mohamad, M., Rocha, M., De Paz, J., Pinto, T. (eds) 11th International Conference on Practical Applications of Computational Biology & Bioinformatics. PACBB 2017. Advances in Intelligent Systems and Computing, vol 616. Springer, Cham. https://doi.org/10.1007/978-3-319-60816-7_7
DOI: https://doi.org/10.1007/978-3-319-60816-7_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60815-0
Online ISBN: 978-3-319-60816-7
eBook Packages: Engineering, Engineering (R0)