Prediction Model of Breast Cancer Based on mRMR Feature Selection

  • Conference paper
Neural Information Processing (ICONIP 2020)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1332)

Abstract

Real-world data are often imbalanced, with large differences in the number of samples per class. The problem is especially prominent in medical data, where class sizes follow disease prevalence. This paper proposes P-mRMR, an algorithm based on mRMR that improves feature selection for imbalanced data. Because the data set contains attributes with many missing values, P-mRMR also integrates the missing values into feature selection itself, which reduces the complexity of data pre-processing. In the experiments, the algorithms are compared using the AUC, the confusion matrix, and the probability of missing values. The results show that classifiers perform better with the features selected by the improved algorithm.
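For orientation, the baseline that P-mRMR builds on can be sketched as follows. This is a minimal illustration of the standard greedy mRMR criterion (maximum relevance to the class label, minimum redundancy with already selected features) for discrete features, not the P-mRMR algorithm itself, whose handling of class imbalance and missing values is not specified in the abstract. The function names `mutual_information` and `mrmr_select` and the toy data are hypothetical and chosen only for this example.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum over observed (x, y) pairs of p(x, y) * log(p(x, y) / (p(x) p(y))).
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(columns, target, k):
    """Greedy mRMR using the difference form: relevance minus mean redundancy.

    columns: dict mapping feature name -> list of discrete values
    target:  list of class labels, same length as each column
    k:       number of features to select
    """
    relevance = {
        name: mutual_information(col, target) for name, col in columns.items()
    }
    selected = []
    remaining = sorted(columns)  # sorted so that ties break deterministically

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(columns[name], columns[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): "a_copy" duplicates "a", "b" is weaker but less redundant.
toy_target = [0, 0, 0, 0, 1, 1, 1, 1]
toy_columns = {
    "a": [0, 0, 0, 1, 1, 1, 1, 1],
    "a_copy": [0, 0, 0, 1, 1, 1, 1, 1],
    "b": [0, 0, 1, 0, 1, 0, 1, 1],
}

print(mrmr_select(toy_columns, toy_target, 2))
```

On this toy data the duplicated feature is penalized for redundancy, so the second pick goes to the weaker but complementary feature rather than to the copy. Continuous features would need discretization or a different mutual information estimator before this sketch applies.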

Supported by National Key R&D Program of China (No. 2018YFC0810601), National Key R&D Program of China (No. 2016YFC0901303), National Natural Science Foundation of China (No. 61977005).

Corresponding author

Correspondence to Zhiguo Shi.

Copyright information

© 2020 Springer Nature Switzerland AG

Cite this paper

Di, J., Shi, Z. (2020). Prediction Model of Breast Cancer Based on mRMR Feature Selection. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Communications in Computer and Information Science, vol 1332. Springer, Cham. https://doi.org/10.1007/978-3-030-63820-7_4

  • DOI: https://doi.org/10.1007/978-3-030-63820-7_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63819-1

  • Online ISBN: 978-3-030-63820-7

  • eBook Packages: Computer Science, Computer Science (R0)
