Abstract
Breast cancer is the most common cancer among women worldwide. Using machine learning to predict the survivability of breast-cancer patients has attracted considerable research interest. However, the task still faces many challenges, one of which is missing-value imputation. The main objective of this paper is therefore to develop a novel imputation algorithm inspired by recommendation systems. More precisely, features with missing values are regarded as items to be evaluated for recommendation.
Consequently, a matrix factorisation algorithm, Alternating Least Square (ALS), is employed to replace missing values, and four different prediction strategies based on the ALS result are then discussed. The proposed ALS-based imputation algorithm is evaluated on a large patient dataset from the Surveillance, Epidemiology, and End Results (SEER) program. Experimental results demonstrate a significant improvement in survivability prediction compared to existing methods.
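As a rough illustration of the idea, the following minimal sketch imputes missing entries with a generic regularised ALS factorisation. It is not the authors' implementation: the function name `als_impute`, the hyperparameters `rank`, `reg` and `n_iters`, and the assumption that patients are rows and numerically encoded features are columns are all hypothetical choices made for this example. The four prediction strategies mentioned in the abstract are not shown.

```python
import numpy as np

def als_impute(X, rank=2, reg=0.1, n_iters=50, seed=0):
    """Fill NaN entries of X using a low-rank ALS reconstruction (illustrative only)."""
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)                      # True where a value is present
    n_rows, n_cols = X.shape
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(n_rows, rank))   # row (patient) factors
    V = rng.normal(scale=0.1, size=(n_cols, rank))   # column (feature) factors
    ridge = reg * np.eye(rank)                   # L2 regularisation term

    for _ in range(n_iters):
        # Fix V and solve a small ridge regression for each row factor,
        # using only that row's observed entries.
        for i in range(n_rows):
            cols = observed[i]
            if cols.any():
                Vi = V[cols]
                U[i] = np.linalg.solve(Vi.T @ Vi + ridge, Vi.T @ X[i, cols])
        # Fix U and solve for each column factor in the same way.
        for j in range(n_cols):
            rows = observed[:, j]
            if rows.any():
                Uj = U[rows]
                V[j] = np.linalg.solve(Uj.T @ Uj + ridge, Uj.T @ X[rows, j])

    # Keep observed values; replace only the missing ones with the reconstruction.
    filled = X.copy()
    filled[~observed] = (U @ V.T)[~observed]
    return filled

# Toy usage: a small numeric matrix with two missing entries.
toy = np.array([[1.0, 2.0, np.nan],
                [2.0, 4.0, 6.0],
                [3.0, np.nan, 9.0]])
completed = als_impute(toy, rank=1)
```

Observed values are left untouched and only the missing cells are filled, so the completed matrix can be passed to any downstream classifier. Categorical features would need an encoding step first, which this sketch does not cover.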
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Hu, Q., Yang, J., Win, K.T., Huang, X. (2019). An Alternating Least Square Based Algorithm for Predicting Patient Survivability. In: Islam, R., et al. Data Mining. AusDM 2018. Communications in Computer and Information Science, vol 996. Springer, Singapore. https://doi.org/10.1007/978-981-13-6661-1_24
DOI: https://doi.org/10.1007/978-981-13-6661-1_24
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-6660-4
Online ISBN: 978-981-13-6661-1
eBook Packages: Computer Science; Computer Science (R0)