Abstract
Disease risk prediction is an important task in biomedicine and bioinformatics. To address high-dimensional feature spaces and high feature redundancy, and to improve the intelligibility of data mining results, a new wrapper method for feature selection is proposed, based on random forest variable importance measures and a support vector machine (SVM). The method combines sequential backward search with sequential forward search. Feature selection starts with the entire set of features in the dataset. At every iteration, two feature subsets are obtained. The first removes the least important features and, at the same time, the most important feature; it is used to train the random forest and to compute feature importance for the next round of selection. The second removes only the least important features while retaining the most important feature; it is used as the candidate feature subset to train the SVM classifier. Finally, the feature subset with the highest SVM classification accuracy is taken as the optimal feature subset. Experimental results on 11 UCI datasets, a real clinical dataset, and a gene expression dataset show that the proposed algorithm generates smaller feature subsets while improving classification accuracy.
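The iteration described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `hybrid_feature_search`, the callables `importance_fn` and `accuracy_fn`, and the parameter `drop_per_round` are hypothetical. In practice `importance_fn` would presumably wrap a random forest's variable importance and `accuracy_fn` an SVM's classification accuracy; here both are toy stand-ins. The sketch also assumes that the most important feature set aside in each round stays in every later SVM candidate subset, which the abstract does not state explicitly.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def hybrid_feature_search(
    features: Sequence[str],
    importance_fn: Callable[[List[str]], Dict[str, float]],
    accuracy_fn: Callable[[List[str]], float],
    drop_per_round: int = 1,
) -> Tuple[List[str], float]:
    """Return the candidate subset with the highest accuracy, and that accuracy."""
    if drop_per_round < 1:
        raise ValueError("drop_per_round must be at least 1")

    working = list(features)   # features still ranked by the importance model
    kept: List[str] = []       # top features set aside in earlier rounds (assumption)
    best_subset = list(features)
    best_acc = accuracy_fn(best_subset)  # baseline: the entire feature set

    while len(working) > 1:
        scores = importance_fn(working)
        ranked = sorted(working, key=lambda f: scores[f])  # least important first

        # Backward step: drop the least important features, keep at least one.
        n_drop = min(drop_per_round, len(ranked) - 1)
        survivors = ranked[n_drop:]
        top = survivors[-1]  # most important feature in this round

        # Candidate subset for the classifier: the most important feature stays in.
        candidate = kept + survivors
        acc = accuracy_fn(candidate)
        if acc > best_acc:
            best_subset, best_acc = candidate, acc

        # Subset for the next importance ranking: the top feature is taken out too.
        kept.append(top)
        working = survivors[:-1]

    return best_subset, best_acc


# Toy stand-ins (hypothetical): fixed importances, and an accuracy that rewards
# the informative features "a" and "b" and penalises subset size.
weights = {"a": 5.0, "b": 4.0, "c": 1.0, "d": 0.5, "e": 0.1}
informative = {"a", "b"}


def toy_importance(feats: List[str]) -> Dict[str, float]:
    return {f: weights[f] for f in feats}


def toy_accuracy(feats: List[str]) -> float:
    return 0.5 + 0.2 * len(informative & set(feats)) - 0.01 * len(feats)


best, acc = hybrid_feature_search(list(weights), toy_importance, toy_accuracy)
print(sorted(best), round(acc, 2))
```

Passing the two models in as callables keeps the search loop independent of any particular library, so the same loop could be driven by any importance ranking and any classifier.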
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Yang, J., Yao, D., Zhan, X., Zhan, X. (2014). Predicting Disease Risks Using Feature Selection Based on Random Forest and Support Vector Machine. In: Basu, M., Pan, Y., Wang, J. (eds) Bioinformatics Research and Applications. ISBRA 2014. Lecture Notes in Computer Science, vol 8492. Springer, Cham. https://doi.org/10.1007/978-3-319-08171-7_1
DOI: https://doi.org/10.1007/978-3-319-08171-7_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08170-0
Online ISBN: 978-3-319-08171-7
eBook Packages: Computer Science, Computer Science (R0)