Prediction Model of Breast Cancer Based on mRMR Feature Selection

Di, Junwen; Shi, Zhiguo

doi:10.1007/978-3-030-63820-7_4

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1332)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Real-world data are often imbalanced, with large differences in the number of samples per category. The problem is especially prominent in medical data, where the class distribution follows the prevalence rate of the disease. This paper proposes P-mRMR, an algorithm based on mRMR that improves the feature selection process for imbalanced data. It also handles attributes with many missing values by integrating the missing values into feature selection itself, which reduces the complexity of data pre-processing. In the experiments, the algorithms are compared using AUC, the confusion matrix, and the probability of missing values. The results show that the features selected by the improved algorithm give better results in the classifiers.
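For orientation, the baseline mRMR criterion that P-mRMR builds on can be sketched as follows. This is a minimal illustration of standard greedy mRMR (relevance minus mean redundancy, both measured by mutual information), not the authors' P-mRMR. The function names, the toy feature names, and the choice to count a missing value (None) as its own category are assumptions made for the example.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences.

    Assumption for this sketch: a missing value (None) is simply counted as
    one more category. This is NOT the P-mRMR treatment of missing values.
    """
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(columns, target, k):
    """Greedy mRMR, difference form: maximise relevance minus mean redundancy."""
    relevance = {name: mutual_information(col, target)
                 for name, col in columns.items()}
    selected, remaining = [], list(columns)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_information(columns[name], columns[s])
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: two identical columns and one weaker, distinct column.
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "size_band":     [0, 0, 0, 1, 1, 1, 1, 1],
    "size_band_dup": [0, 0, 0, 1, 1, 1, 1, 1],
    "age_band":      [0, 0, 1, 0, 0, 1, 1, 1],
}
print(mrmr_select(features, target, 2))  # ['size_band', 'age_band']
```

On this toy data, ranking by relevance alone would pick the duplicate column second; the redundancy term is what steers the second pick to the less relevant but non-redundant feature.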

Supported by the National Key R&D Program of China (Nos. 2018YFC0810601 and 2016YFC0901303) and the National Natural Science Foundation of China (No. 61977005).

Author information

Authors and Affiliations

School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, 100083, China
Junwen Di & Zhiguo Shi

Authors

Junwen Di
Zhiguo Shi

Corresponding author

Correspondence to Zhiguo Shi.

Editor information

Editors and Affiliations

Department of AI, Ping An Life, Shenzhen, China
Haiqin Yang
Faculty of Information Technology, King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand
Kitsuchart Pasupa
City University of Hong Kong, Kowloon, Hong Kong
Andrew Chi-Sing Leung
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, Hong Kong
James T. Kwok
School of Information Technology, King Mongkut's University of Technology Thonburi, Bangkok, Thailand
Jonathan H. Chan
The Chinese University of Hong Kong, New Territories, Hong Kong
Irwin King

About this paper

Cite this paper

Di, J., Shi, Z. (2020). Prediction Model of Breast Cancer Based on mRMR Feature Selection. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Communications in Computer and Information Science, vol 1332. Springer, Cham. https://doi.org/10.1007/978-3-030-63820-7_4

DOI: https://doi.org/10.1007/978-3-030-63820-7_4
Published: 17 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63819-1
Online ISBN: 978-3-030-63820-7
eBook Packages: Computer Science, Computer Science (R0)
