Abstract
Starting from medical big data, this article applies data mining to analyze the pathogenic factors of lung cancer, using several years of lung cancer electronic medical records from the oncology department of a Grade III, Class A hospital. For electronic medical record data of this volume, the traditional serial Apriori algorithm has clear drawbacks: it scans the database frequently, runs slowly, and consumes a large amount of memory. An improved Apriori algorithm based on the MapReduce distributed computing model of the Hadoop platform is therefore proposed. Experiments on a test cluster with lung cancer data show that the improved Apriori algorithm achieves higher execution efficiency and good scalability on lung cancer big data, and that it can effectively mine the associations between lung cancer and pathogenic factors, which offers useful guidance for assisting the clinical diagnosis and risk prediction of lung cancer.
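The general idea of running Apriori in a MapReduce style can be illustrated with a minimal single-process sketch. This is not the authors' improved algorithm, whose details the abstract does not give; it only shows the common pattern in which each data partition counts candidate itemsets locally (map) and the local counts are then summed and filtered by a minimum support count (reduce). All function names, the item labels "A" to "C", and the toy partitions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def map_partition(transactions, candidates):
    """Map step: count, within one partition, how often each candidate occurs."""
    counts = Counter()
    for transaction in transactions:
        items = set(transaction)
        for candidate in candidates:
            if items.issuperset(candidate):
                counts[candidate] += 1
    return counts


def reduce_counts(partial_counts, min_count):
    """Reduce step: sum per-partition counts and keep itemsets meeting min_count."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return {itemset: n for itemset, n in total.items() if n >= min_count}


def generate_candidates(prev_frequent, k):
    """Apriori pruning: keep k-itemsets whose every (k-1)-subset is frequent."""
    prev = set(prev_frequent)
    items = sorted({item for itemset in prev for item in itemset})
    return {
        candidate
        for candidate in combinations(items, k)
        if all(sub in prev for sub in combinations(candidate, k - 1))
    }


def mine(partitions, min_count):
    """Level-wise search: one map/reduce round per itemset size."""
    all_items = sorted({i for part in partitions for t in part for i in t})
    candidates = {(item,) for item in all_items}
    frequent = {}
    k = 1
    while candidates:
        partials = [map_partition(part, candidates) for part in partitions]
        level = reduce_counts(partials, min_count)
        frequent.update(level)
        k += 1
        candidates = generate_candidates(level, k)
    return frequent


# Hypothetical toy data: two partitions of transactions (not from the paper).
partitions = [
    [["A", "B", "C"], ["A", "B"]],
    [["A", "C"], ["B", "C"], ["A", "B", "C"]],
]
result = mine(partitions, min_count=3)
```

On a real Hadoop cluster the list comprehension over partitions would be replaced by mapper tasks running in parallel on separate data splits, which is where the reduction in repeated full-database scans on a single machine would come from.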
About this article
Cite this article
Guo, H., Liu, H., Chen, J. et al. Data Mining and Risk Prediction Based on Apriori Improved Algorithm for Lung Cancer. J Sign Process Syst 93, 795–809 (2021). https://doi.org/10.1007/s11265-021-01663-1
DOI: https://doi.org/10.1007/s11265-021-01663-1