A Fusion-Based Feature Selection Framework for Microarray Data Classification

  • Conference paper
Innovative Systems for Intelligent Health Informatics (IRICT 2020)

Abstract

Gene expression profiling uses microarray techniques to discover patterns of gene expression. These patterns help to show how a cell performs its functions and whether any mutations are present. However, microarrays generate very large amounts of data, which makes analysis computationally expensive and time-consuming. Feature selection addresses this problem by reducing the dimensionality of microarray datasets: it retains important genes and eliminates redundant and irrelevant features. This study proposes a fusion-based feature selection framework that applies multiple feature selection methods and combines their results using ensemble methods. The framework consists of three layers. In the first layer, three feature selection methods work independently to rank genes and assign a score to each gene. In the second layer, a threshold is used to filter the genes according to their calculated scores. In the last layer, the final decision about which genes are important is made using one of two voting strategies, either majority or consensus. Compared with previous methods, the proposed framework improved classification accuracy and dimensionality reduction.
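
The second and third layers described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the gene scores, the single shared threshold of 0.5, and the reading of "majority" as more than half of the methods and "consensus" as all methods are assumptions made here for illustration only. The three score dictionaries stand in for the output of the three first-layer ranking methods, which the abstract does not name.

```python
from typing import Dict, List, Set


def filter_by_threshold(scores: Dict[str, float], threshold: float) -> Set[str]:
    """Layer 2 (assumed form): keep genes whose score is at least the threshold."""
    return {gene for gene, score in scores.items() if score >= threshold}


def fuse(selections: List[Set[str]], strategy: str = "majority") -> Set[str]:
    """Layer 3 (assumed form): combine the per-method selections by voting.

    "majority"  keeps a gene selected by more than half of the methods.
    "consensus" keeps a gene only if every method selected it.
    """
    if strategy not in ("majority", "consensus"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    n_methods = len(selections)
    votes_needed = n_methods if strategy == "consensus" else n_methods // 2 + 1
    candidates = set().union(*selections)
    return {
        gene
        for gene in candidates
        if sum(gene in selected for selected in selections) >= votes_needed
    }


# Hypothetical scores from three independent ranking methods (layer 1).
method_a = {"g1": 0.9, "g2": 0.4, "g3": 0.7, "g4": 0.1}
method_b = {"g1": 0.8, "g2": 0.6, "g3": 0.2, "g4": 0.3}
method_c = {"g1": 0.7, "g2": 0.1, "g3": 0.6, "g4": 0.2}

# Layer 2: apply the same (assumed) threshold to each method's scores.
selections = [filter_by_threshold(s, 0.5) for s in (method_a, method_b, method_c)]

# Layer 3: fuse the three selections with each voting strategy.
print(sorted(fuse(selections, "majority")))   # ['g1', 'g3']
print(sorted(fuse(selections, "consensus")))  # ['g1']
```

In this toy example, consensus voting yields a smaller gene set than majority voting, which reflects the trade-off between stricter dimensionality reduction and the risk of discarding a gene that only some methods rank highly.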

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Almutiri, T., Saeed, F., Alassaf, M., Hezzam, E.A. (2021). A Fusion-Based Feature Selection Framework for Microarray Data Classification. In: Saeed, F., Mohammed, F., Al-Nahari, A. (eds) Innovative Systems for Intelligent Health Informatics. IRICT 2020. Lecture Notes on Data Engineering and Communications Technologies, vol 72. Springer, Cham. https://doi.org/10.1007/978-3-030-70713-2_52
