ABSTRACT
Research that uses medical big data to identify patients at high risk of disease has recently received considerable attention. Advances in artificial intelligence make it possible to predict diseases from numerical data alone and to prevent them before they occur, so machine learning and deep learning research on disease prediction with medical big data is actively progressing. Because disease cases are rare in medical data, studies tend to rely on oversampling or undersampling, which can distort the information in the data. In addition, most machine learning-based research is built on a specific predictive model, so there is a risk that the predictions themselves reflect that model's biases. Generalizing the data a model is trained on, or correcting the model's bias, can therefore yield better predictions. In this paper, we feed diabetes, heart disease, and breast cancer data to several individual classifiers, obtain their predicted values, and use those values as training data for a single meta-model that produces the final predictions. That is, we construct a stacking ensemble model to predict the presence or absence of a disease and analyse its performance through experiments. Because this model trains multiple classifiers on the same data, it may overfit. We therefore compare the model trained with and without cross-validation of the individual classifiers. In the experiments, the model trained with cross-validation performed on average 1.4% better than the individual single models, whereas the meta-model trained without cross-validation performed worse than the individual single models. In other words, the stacking ensemble model achieved high performance only when the individual classifiers were cross-validated.
Making one final prediction from the predicted values of high-performing individual models yields more stable and reliable predictions. The cross-validation-based stacking ensemble model proposed in this paper predicts the presence or absence of a disease and can be used for medical service development and disease prevention.
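The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The base classifiers, the logistic-regression meta-model, the 5-fold setting, the train/test split, and the helper names (`make_base_models`, `meta_features`, `stacking_accuracy`) are all assumptions chosen for illustration. The copy of the Wisconsin breast cancer data bundled with scikit-learn stands in for the paper's datasets. With cross-validation, the meta-model is trained on out-of-fold predictions. Without it, the meta-model is trained on predictions the base models made for their own training rows.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_base_models():
    """Hypothetical base classifiers; the paper's exact models and settings are not stated here."""
    return [
        RandomForestClassifier(n_estimators=100, random_state=0),
        make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
        make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    ]


def meta_features(models, X_train, y_train, X_test, use_cv):
    """Return (train, test) meta-features: one positive-class probability per base model."""
    train_cols, test_cols = [], []
    for model in models:
        if use_cv:
            # Out-of-fold predictions: each training row is scored by a clone
            # of the model that was fitted without that row.
            train_col = cross_val_predict(
                model, X_train, y_train, cv=5, method="predict_proba"
            )[:, 1]
            model.fit(X_train, y_train)  # refit on all training rows for test-time use
        else:
            # Naive stacking: the model scores the same rows it was fitted on.
            model.fit(X_train, y_train)
            train_col = model.predict_proba(X_train)[:, 1]
        train_cols.append(train_col)
        test_cols.append(model.predict_proba(X_test)[:, 1])
    return np.column_stack(train_cols), np.column_stack(test_cols)


def stacking_accuracy(use_cv, seed=0):
    """Train base models and a meta-model, then report held-out accuracy."""
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    Z_train, Z_test = meta_features(make_base_models(), X_train, y_train, X_test, use_cv)
    meta_model = LogisticRegression(max_iter=1000).fit(Z_train, y_train)
    return accuracy_score(y_test, meta_model.predict(Z_test))


if __name__ == "__main__":
    print("with cross-validation   :", stacking_accuracy(use_cv=True))
    print("without cross-validation:", stacking_accuracy(use_cv=False))
```

The size and even the direction of the gap between the two variants depend on the data, the base models, and the split, so this sketch should not be expected to reproduce the 1.4% figure reported above. scikit-learn's `StackingClassifier` also builds its meta-features from cross-validated predictions through its `cv` parameter and may be a more convenient starting point in practice.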
Index Terms
- Efficient healthcare service based on Stacking Ensemble