
Data Mining and Risk Prediction Based on Apriori Improved Algorithm for Lung Cancer

Journal of Signal Processing Systems

Abstract

Starting from medical big data, this article applies data mining to analyze the pathogenic factors of lung cancer, using several years of lung cancer electronic medical records from the oncology department of an authoritative Grade III, Class A hospital. When processing the large volume of data in these records, the traditional serial Apriori algorithm has three drawbacks: it scans the database frequently, runs slowly, and consumes a large amount of memory. An improved Apriori algorithm based on the MapReduce distributed computing model of the Hadoop platform is therefore proposed. Experiments on an experimental cluster and on the lung cancer data show that the improved algorithm achieves higher execution efficiency and good scalability on lung cancer big data, and that it can effectively mine the relationships between lung cancer and pathogenic factors. These results offer useful guidance for assisting the clinical diagnosis and risk prediction of lung cancer.
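The preview does not show the details of the improved algorithm, so the following is only a minimal sketch of the general idea the abstract describes: Apriori frequent-itemset mining in which support counting is split into a map step (count candidates within one data partition) and a reduce step (sum the partial counts). It is plain Python that imitates the two phases in a single process; it does not use Hadoop and is not the authors' implementation. The function names, the sample records, and the `min_support` value are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def map_count(partition, candidates):
    """Map step: count candidate itemsets within a single data partition."""
    counts = Counter()
    for record in partition:
        for cand in candidates:
            if cand <= record:  # candidate is a subset of the record
                counts[cand] += 1
    return counts


def reduce_counts(partial_counts):
    """Reduce step: sum the per-partition counts into global counts."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def apriori(partitions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n_records = sum(len(p) for p in partitions)
    items = {item for p in partitions for record in p for item in record}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        totals = reduce_counts(map_count(p, candidates) for p in partitions)
        level = {c: n / n_records for c, n in totals.items()
                 if n / n_records >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemset candidates.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))}
    return frequent


def rule_confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent from stored supports."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical toy records split into two partitions (not data from the study).
demo_partitions = [
    [{"smoking", "cough", "lung_cancer"}, {"smoking", "lung_cancer"}],
    [{"smoking", "cough"}, {"cough"}],
]
demo_frequent = apriori(demo_partitions, min_support=0.5)
demo_confidence = rule_confidence(demo_frequent, {"smoking"}, {"lung_cancer"})
```

In this sketch each pass over the data counts one level of candidates, so the number of passes still grows with the size of the largest frequent itemset; the partitioning only spreads the counting work, which is the part a MapReduce job would distribute across nodes.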



Author information

Corresponding author

Correspondence to Hong Guo.


About this article


Cite this article

Guo, H., Liu, H., Chen, J. et al. Data Mining and Risk Prediction Based on Apriori Improved Algorithm for Lung Cancer. J Sign Process Syst 93, 795–809 (2021). https://doi.org/10.1007/s11265-021-01663-1


  • DOI: https://doi.org/10.1007/s11265-021-01663-1
