K-Means Clustering with Infinite Feature Selection for Classification Tasks in Gene Expression Data

Remli, Muhammad Akmal; Mohd Daud, Kauthar; Nies, Hui Wen; Mohamad, Mohd Saberi; Deris, Safaai; Omatu, Sigeru; Kasim, Shahreen; Sulong, Ghazali

doi:10.1007/978-3-319-60816-7_7

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 616)

Included in the following conference series:

International Conference on Practical Applications of Computational Biology & Bioinformatics

Abstract

In bioinformatics and clinical research, microarray technology has been widely used to distinguish normal from tumour samples in cancer datasets. However, the high dimensionality of gene expression data degrades classification accuracy. Feature selection is therefore needed to retain informative genes and remove non-informative ones. Some feature selection methods, however, ignore the interactions between genes. To account for these interactions, similar genes are clustered together and dissimilar genes are placed in separate clusters. To achieve higher classification accuracy, this research proposes combining k-means clustering with infinite feature selection to identify informative genes in the selected subset. The method was applied to colorectal cancer and small round blue cell tumour datasets, and it obtained higher classification accuracy than previous work.
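
The abstract does not spell out how the two steps are combined, so the following is only a minimal sketch of one plausible reading: genes are grouped with k-means, and the genes inside each cluster are then ranked with an infinite-feature-selection style score. Everything here is an assumption made for illustration, not the authors' implementation: the function names (`kmeans`, `inf_fs_scores`, `select_genes`), the parameters (`alpha`, `n_clusters`, `per_cluster`), the scaling of standard deviations by their maximum, the tie-ignoring rank correlation, and the synthetic data. The scoring follows the commonly described Inf-FS form, where a feature graph with adjacency matrix A is summed over paths of all lengths as (I - rA)^{-1} - I.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of `points`."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = None
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = points[labels == j]
            if len(members):  # leave an empty cluster's center where it is
                centers[j] = members.mean(axis=0)
    return labels


def inf_fs_scores(X, alpha=0.5):
    """Inf-FS style score per column of X (samples x features); higher is better."""
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    std = X.std(axis=0)
    if std.max() > 0:
        std = std / std.max()  # assumption: scale spreads to [0, 1]
    sigma = np.maximum.outer(std, std)
    # Spearman-like correlation: Pearson on ranks (ties are not averaged here).
    ranks = X.argsort(axis=0).argsort(axis=0).astype(float)
    rho = np.nan_to_num(np.corrcoef(ranks, rowvar=False))
    A = alpha * sigma + (1 - alpha) * (1 - np.abs(rho))
    radius = np.max(np.abs(np.linalg.eigvalsh(A)))  # A is symmetric
    if radius == 0:
        return np.zeros(n)
    r = 0.9 / radius  # keeps the geometric series of path weights convergent
    S = np.linalg.inv(np.eye(n) - r * A) - np.eye(n)
    return S.sum(axis=1)


def select_genes(X, n_clusters=3, per_cluster=2, seed=0):
    """Cluster genes (columns of X), then keep the top-scored genes per cluster."""
    X = np.asarray(X, dtype=float)
    labels = kmeans(X.T, n_clusters, seed=seed)  # each gene is one point
    selected = []
    for c in range(n_clusters):
        idx = np.flatnonzero(labels == c)
        if idx.size == 0:
            continue
        scores = inf_fs_scores(X[:, idx])
        best = idx[np.argsort(scores)[::-1][:per_cluster]]
        selected.extend(best.tolist())
    return sorted(selected)


if __name__ == "__main__":
    # Synthetic stand-in for an expression matrix: 20 samples, 12 genes.
    demo = np.random.default_rng(1).normal(size=(20, 12))
    print(select_genes(demo, n_clusters=3, per_cluster=2))
```

The selected column indices would then feed a classifier of choice; the number of clusters and the number of genes kept per cluster are free parameters in this sketch and would need tuning on real data.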

Acknowledgements

We would like to thank Universiti Teknologi Malaysia for funding this research through GUP Research Grants (grant numbers: Q.J130000.2528.12H12 and Q.J130000.2528.11H05). This research is also funded by the Malaysian Ministry of Higher Education under a fundamental research grant (grant number: 1559).

Author information

Authors and Affiliations

Artificial Intelligence and Bioinformatics Research Group, Faculty of Computing, Universiti Teknologi Malaysia, 81310, Skudai, Johor, Malaysia
Muhammad Akmal Remli, Kauthar Mohd Daud, Hui Wen Nies & Mohd Saberi Mohamad
Faculty of Creative Technology & Heritage, Universiti Malaysia Kelantan, Locked Bag 01, Bachok, 16300, Kota Bharu, Kelantan, Malaysia
Safaai Deris
Department of Electronics, Information and Communication Engineering, Osaka Institute of Technology, Osaka, 535-8585, Japan
Sigeru Omatu
Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, 86400, Batu Pahat, Malaysia
Shahreen Kasim
School of Informatics and Applied Mathematics, Universiti Malaysia Terengganu, 21030, Kuala Nerus, Terengganu, Malaysia
Ghazali Sulong

Corresponding author

Correspondence to Mohd Saberi Mohamad.

Editor information

Editors and Affiliations

Escuela Superior de Ingeniería Informática, Universidad de Vigo, Ourense, Spain
Florentino Fdez-Riverola
Faculty of Computing, Universiti Teknologi Malaysia, Johor, Johor, Malaysia
Mohd Saberi Mohamad
Department de Informática, Universidade do Minho, Braga, Portugal
Miguel Rocha
Departamento de Informática y Automática, Universidad de Salamanca, Salamanca, Spain
Juan F. De Paz
Departamento de Informática y Automática, Universidad de Salamanca, Salamanca, Spain
Tiago Pinto

About this paper

Cite this paper

Remli, M.A. et al. (2017). K-Means Clustering with Infinite Feature Selection for Classification Tasks in Gene Expression Data. In: Fdez-Riverola, F., Mohamad, M., Rocha, M., De Paz, J., Pinto, T. (eds) 11th International Conference on Practical Applications of Computational Biology & Bioinformatics. PACBB 2017. Advances in Intelligent Systems and Computing, vol 616. Springer, Cham. https://doi.org/10.1007/978-3-319-60816-7_7

DOI: https://doi.org/10.1007/978-3-319-60816-7_7
Published: 21 June 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60815-0
Online ISBN: 978-3-319-60816-7
eBook Packages: Engineering, Engineering (R0)
