DOI: 10.1145/3440943.3444727
Research article

Efficient healthcare service based on Stacking Ensemble

Published: 27 September 2021

ABSTRACT

Research that uses medical big data to identify patients with a high probability of disease has recently received considerable attention. With advances in artificial intelligence, such research is essential because diseases can be predicted from numerical records alone and therefore prevented before they occur. Consequently, machine learning and deep learning studies that use medical big data for disease prediction are progressing actively. Because diseases are rare in medical data, oversampling or undersampling is often applied, which can distort the information in the data. In addition, since most machine-learning studies rely on a single predictive model, the predictions may reflect that model's inherent bias. Better predictions can therefore be obtained by generalizing the data the model trains on or by compensating for the model's bias. In this paper, we obtain predictions from several individual classifiers on diabetes, heart disease, and breast cancer data and use those predictions as training data for a single meta-model that produces the final prediction. In other words, we construct a stacking ensemble model that predicts the presence or absence of a disease and analyse its performance experimentally. Because this model trains multiple classifiers on the same data, it may overfit, so we compare the model with and without cross-validation when training the individual classifiers. In the experiments, the model trained with cross-validation performed on average 1.4% better than the individual single models, whereas the meta-model trained without cross-validation performed worse than the individual single models. In other words, when constructing a stacking ensemble model, high performance was obtained only when the individual classifiers were cross-validated. Making a single final prediction from the predictions of high-performing individual models yields more stable and reliable results. The cross-validation-based stacking ensemble model proposed in this paper predicts the presence or absence of a disease and can be used for healthcare service development and disease prevention.
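The following is a minimal sketch of the cross-validated stacking scheme described in the abstract, assuming scikit-learn and its bundled breast-cancer dataset as a stand-in for the Kaggle data; the base classifiers and meta-model here are illustrative choices, not necessarily the authors' configuration.

```python
# Minimal stacking-ensemble sketch (illustrative, not the paper's exact setup).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

base_models = [
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("svm", SVC(probability=True, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
]

# --- Stacking WITH cross-validation ---------------------------------------
# Each base model contributes out-of-fold predictions to the meta-training
# set, so the meta-model never sees predictions made on data the base model
# was fitted on; this is what limits overfitting.
meta_train = np.column_stack([
    cross_val_predict(model, X_tr, y_tr, cv=5, method="predict_proba")[:, 1]
    for _, model in base_models
])

# Refit every base model on the full training set for test-time predictions.
meta_test = np.column_stack([
    model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    for _, model in base_models
])

meta_model = LogisticRegression().fit(meta_train, y_tr)
print("stacking with CV :", accuracy_score(y_te, meta_model.predict(meta_test)))

# --- Stacking WITHOUT cross-validation ------------------------------------
# The meta-model is trained on in-sample base predictions, which are
# over-optimistic; this is the overfitting risk the abstract warns about.
naive_train = np.column_stack([
    model.predict_proba(X_tr)[:, 1]  # base models already fitted above
    for _, model in base_models
])
naive_meta = LogisticRegression().fit(naive_train, y_tr)
print("stacking, no CV  :", accuracy_score(y_te, naive_meta.predict(meta_test)))
```

The only difference between the two variants is whether the meta-model is trained on out-of-fold or in-sample base predictions, which mirrors the with/without cross-validation comparison reported in the abstract.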


Published in

ACM ICEA '20: Proceedings of the 2020 ACM International Conference on Intelligent Computing and its Emerging Applications
December 2020
219 pages
ISBN: 9781450383042
DOI: 10.1145/3440943

Copyright © 2020 ACM


Publisher

Association for Computing Machinery
New York, NY, United States
