Abstract
Risk prediction models for colorectal cancer play an important role in identifying people at higher risk of developing the disease, as well as the risk factors associated with it. Feature selection techniques help to improve prediction model performance and to gain insight into the data itself. Assessing the stability of feature selection/ranking algorithms becomes an important issue when the aim is to analyze the most relevant features. This work assesses several feature ranking algorithms in terms of performance and robustness for a set of risk prediction models. Experimental results demonstrate that stability and model performance should be studied jointly: RF turned out to be the most stable algorithm but was outperformed by others in terms of model performance, whereas SVM-wrapper and the Pearson correlation coefficient were moderately stable while achieving good model performance.
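To make the notion of ranking stability concrete, the following is a minimal sketch rather than the method used in the paper. It assumes a filter ranker based on the absolute Pearson correlation between each feature and the target, bootstrap resampling of the training set, and the mean pairwise Spearman correlation between the resulting rankings as the stability score. The function names (`pearson_ranking`, `spearman`, `ranking_stability`), the number of resamples, and the choice of Spearman correlation are illustrative assumptions; the abstract does not state which stability measure the authors use.

```python
import numpy as np


def pearson_ranking(X, y):
    """Rank features by |Pearson r| with the target (rank 0 = most relevant)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Guard against constant columns (zero variance) to avoid division by zero.
    r = np.abs(Xc.T @ yc) / np.where(denom == 0, 1.0, denom)
    order = np.argsort(-r, kind="stable")   # feature indices, best first
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(order))    # position of each feature
    return ranks


def spearman(rank_a, rank_b):
    """Spearman correlation between two tie-free rankings of the same items."""
    n = len(rank_a)
    d = rank_a.astype(float) - rank_b
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))


def ranking_stability(X, y, n_resamples=20, seed=0):
    """Mean pairwise Spearman correlation of rankings over bootstrap resamples.

    Assumption: this is one common way to quantify stability, chosen here
    for illustration only.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    rankings = []
    for _ in range(n_resamples):
        idx = rng.choice(n, size=n, replace=True)  # bootstrap sample
        rankings.append(pearson_ranking(X[idx], y[idx]))
    scores = [
        spearman(rankings[i], rankings[j])
        for i in range(len(rankings))
        for j in range(i + 1, len(rankings))
    ]
    return float(np.mean(scores))


if __name__ == "__main__":
    # Synthetic data: two informative features and three pure-noise features.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(200, 5))
    y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(size=200)
    print("stability:", ranking_stability(X, y))
```

A score near 1 means the ranker orders the features almost identically across resamples. Reporting such a score alongside a performance metric for the downstream model is one way to study the two jointly, as the abstract recommends.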
Acknowledgements
This study was made possible by data provided by MCC-Spain. The authors thank all those who took part in this study by providing questionnaire data and genotyping samples.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Cueto-López, N., Alaiz-Rodríguez, R., García-Ordás, M.T., González-Donquiles, C., Martín, V. (2018). Assessing Feature Selection Techniques for a Colorectal Cancer Prediction Model. In: Pérez García, H., Alfonso-Cendón, J., Sánchez González, L., Quintián, H., Corchado, E. (eds) International Joint Conference SOCO’17-CISIS’17-ICEUTE’17 León, Spain, September 6–8, 2017, Proceeding. SOCO ICEUTE CISIS 2017 2017 2017. Advances in Intelligent Systems and Computing, vol 649. Springer, Cham. https://doi.org/10.1007/978-3-319-67180-2_46
DOI: https://doi.org/10.1007/978-3-319-67180-2_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67179-6
Online ISBN: 978-3-319-67180-2
eBook Packages: Engineering, Engineering (R0)