Abstract
Gene expression profiling uses microarray techniques to discover gene expression patterns. These patterns help to build a picture of how a cell performs its function and to determine whether any mutations are present. However, microarrays generate a huge amount of data, which makes the analysis computationally expensive and time-consuming. Feature selection is one solution: it reduces the dimensionality of microarray datasets by keeping important genes and eliminating redundant and irrelevant features. In this study, a fusion-based feature selection framework is proposed that applies multiple feature selection methods and combines their results using ensemble methods. The framework consists of three layers. In the first layer, three feature selection methods work independently to rank the genes and assign a score to each gene. In the second layer, a threshold is used to filter the genes according to their calculated scores. In the last layer, the final decision about which genes are important is made using one of two voting strategies, majority or consensus. The proposed framework showed an improvement in classification accuracy and dimensionality reduction when compared with previous methods.
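The three layers described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the three feature selection methods, so the per-method scores are supplied directly as made-up toy values, and the function names (`threshold_filter`, `vote`, `fuse`) are hypothetical. The sketch also assumes that scores from all methods are on a comparable scale so one threshold can be shared, that "majority" means selected by more than half of the methods, and that "consensus" means selected by all of them.

```python
from typing import Dict, List, Set


def threshold_filter(scores: Dict[str, float], threshold: float) -> Set[str]:
    """Layer 2 (assumed form): keep genes whose score is at least `threshold`."""
    return {gene for gene, score in scores.items() if score >= threshold}


def vote(selected_sets: List[Set[str]], strategy: str = "majority") -> Set[str]:
    """Layer 3 (assumed form): combine the per-method selections by voting."""
    n_methods = len(selected_sets)
    # Count how many methods selected each gene.
    counts: Dict[str, int] = {}
    for selected in selected_sets:
        for gene in selected:
            counts[gene] = counts.get(gene, 0) + 1
    if strategy == "majority":
        needed = n_methods // 2 + 1   # more than half of the methods
    elif strategy == "consensus":
        needed = n_methods            # every method must agree
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return {gene for gene, count in counts.items() if count >= needed}


def fuse(method_scores: List[Dict[str, float]],
         threshold: float,
         strategy: str) -> Set[str]:
    """Run layers 2 and 3 on scores already produced by layer 1."""
    selected_sets = [threshold_filter(s, threshold) for s in method_scores]
    return vote(selected_sets, strategy)


# Layer 1 stand-in: toy scores from three hypothetical ranking methods.
method_scores = [
    {"g1": 0.9, "g2": 0.2, "g3": 0.7, "g4": 0.1},
    {"g1": 0.8, "g2": 0.6, "g3": 0.4, "g4": 0.0},
    {"g1": 0.7, "g2": 0.1, "g3": 0.9, "g4": 0.3},
]

majority_genes = fuse(method_scores, threshold=0.5, strategy="majority")
consensus_genes = fuse(method_scores, threshold=0.5, strategy="consensus")
print(sorted(majority_genes), sorted(consensus_genes))
```

In this toy data, consensus voting keeps fewer genes than majority voting, which reflects the trade-off between stronger dimensionality reduction and the risk of discarding genes that only some methods rank highly.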
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Almutiri, T., Saeed, F., Alassaf, M., Hezzam, E.A. (2021). A Fusion-Based Feature Selection Framework for Microarray Data Classification. In: Saeed, F., Mohammed, F., Al-Nahari, A. (eds) Innovative Systems for Intelligent Health Informatics. IRICT 2020. Lecture Notes on Data Engineering and Communications Technologies, vol 72. Springer, Cham. https://doi.org/10.1007/978-3-030-70713-2_52
DOI: https://doi.org/10.1007/978-3-030-70713-2_52
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-70712-5
Online ISBN: 978-3-030-70713-2
eBook Packages: Intelligent Technologies and Robotics (R0)