Assessing Feature Selection Techniques for a Colorectal Cancer Prediction Model

  • Conference paper
International Joint Conference SOCO’17-CISIS’17-ICEUTE’17 León, Spain, September 6–8, 2017, Proceeding (SOCO 2017, ICEUTE 2017, CISIS 2017)

Abstract

Risk prediction models for colorectal cancer play an important role in identifying people at higher risk of developing the disease, as well as the risk factors associated with it. Feature selection techniques help to improve prediction model performance and to gain insight into the data itself. When the aim is to analyze the most relevant features, assessing the stability of feature selection/ranking algorithms becomes an important issue. This work assesses several feature ranking algorithms in terms of performance and robustness for a set of risk prediction models. Experimental results show that stability and model performance should be studied jointly: Random Forest (RF) turned out to be the most stable algorithm but was outperformed by others in terms of model performance, whereas the SVM-wrapper and the Pearson correlation coefficient were moderately stable while achieving good model performance.
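The abstract names the Pearson correlation coefficient as one of the ranking criteria but does not say how stability was measured. As a minimal sketch, assuming a bootstrap resampling scheme and the average pairwise Jaccard similarity of the top-k feature subsets (a common stability measure, not necessarily the one used by the authors), a Pearson-based ranking and its stability could be computed as follows. The function names, parameters (`k`, `n_boot`, `seed`) and the synthetic data are hypothetical and only illustrate the idea.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant column or label: treat as uninformative
    return cov / sqrt(vx * vy)


def rank_features(X, y):
    """Return feature indices ordered by decreasing |Pearson r| with the label."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def jaccard(a, b):
    """Jaccard similarity of two feature subsets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def ranking_stability(X, y, k, n_boot=20, seed=0):
    """Assumed stability measure: mean pairwise Jaccard similarity of the
    top-k subsets obtained on n_boot bootstrap samples (1.0 = fully stable)."""
    rng = random.Random(seed)
    n = len(X)
    top_k_sets = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        top_k_sets.append(rank_features(Xb, yb)[:k])
    pairs = list(combinations(top_k_sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical synthetic data: feature 0 and feature 2 depend on the binary
# label, feature 1 is pure noise.
rng = random.Random(42)
y = [rng.randint(0, 1) for _ in range(200)]
X = [[yi + rng.gauss(0, 0.3), rng.gauss(0, 1), -yi + rng.gauss(0, 0.8)] for yi in y]

print(rank_features(X, y))
print(ranking_stability(X, y, k=2))
```

On this toy data the two label-dependent features should rank ahead of the noise feature, and the top-2 subset should be nearly identical across bootstrap samples. Jaccard similarity on top-k subsets ignores the order within the subset; a rank-correlation measure such as Spearman's coefficient would compare full orderings instead, which may matter when the goal is to interpret individual risk factors.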

Acknowledgements

This study was made possible by data provided by MCC-Spain. The authors thank all those who took part in this study by providing questionnaire data and genotyping samples.

Author information


Correspondence to Rocío Alaiz-Rodríguez.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Cueto-López, N., Alaiz-Rodríguez, R., García-Ordás, M.T., González-Donquiles, C., Martín, V. (2018). Assessing Feature Selection Techniques for a Colorectal Cancer Prediction Model. In: Pérez García, H., Alfonso-Cendón, J., Sánchez González, L., Quintián, H., Corchado, E. (eds) International Joint Conference SOCO’17-CISIS’17-ICEUTE’17 León, Spain, September 6–8, 2017, Proceeding. SOCO 2017, ICEUTE 2017, CISIS 2017. Advances in Intelligent Systems and Computing, vol 649. Springer, Cham. https://doi.org/10.1007/978-3-319-67180-2_46

  • DOI: https://doi.org/10.1007/978-3-319-67180-2_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67179-6

  • Online ISBN: 978-3-319-67180-2

  • eBook Packages: Engineering, Engineering (R0)