An Imbalanced Learning based MDR-TB Early Warning System

  • Systems-Level Quality Improvement
Journal of Medical Systems

Abstract

Multidrug-resistant tuberculosis (MDR-TB) is a man-made disease, mainly caused by improper treatment programs and poor patient supervision, most of which could be prevented. Using the daily treatment and inspection records of tuberculosis (TB) cases, this study establishes a warning system that applies machine learning methods to evaluate, at an early stage, the risk of a TB patient converting to MDR-TB. Because the historical data contain very different numbers of TB cases and MDR-TB cases, different imbalanced sampling strategies and classification methods were compared. The results show that the CART-USBagg classification model gives the relatively best predictions, using records from the first 90 days, that is, the first half of a standardized treatment process.
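The abstract names the CART-USBagg model without describing it. The sketch below is one plausible reading, not the authors' implementation: it assumes USBagg means under-sampling-based bagging, in which each ensemble member is a small CART-style tree (Gini splits) trained on all minority cases plus an equally sized random draw of majority cases, and the ensemble predicts by majority vote. All function names, parameters, and the synthetic toy data are hypothetical and are not taken from the study.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, depth=0, max_depth=3):
    """Grow a small CART-style tree; a leaf is a label, a node is a tuple."""
    majority_label = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority_label
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: every observed value except the largest,
        # so both sides of the split are always non-empty.
        for t in sorted(set(row[j] for row in X))[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:  # all rows identical, no split possible
        return majority_label
    _, j, t, left, right = best
    return (j, t,
            build_tree([X[i] for i in left], [y[i] for i in left],
                       depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right],
                       depth + 1, max_depth))

def predict_tree(node, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if row[j] <= t else right
    return node

def fit_usbagg(X, y, n_estimators=15, seed=0):
    """Train one tree per balanced, under-sampled subset (assumed scheme).

    Assumes label 1 is the minority class and label 0 the majority class,
    and that the majority class has at least as many cases as the minority.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    trees = []
    for _ in range(n_estimators):
        # Keep every minority case; draw equally many majority cases
        # without replacement.
        idx = minority + rng.sample(majority, len(minority))
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx]))
    return trees

def predict_usbagg(trees, row):
    """Majority vote over the ensemble (labels are 0 or 1)."""
    votes = sum(predict_tree(tree, row) for tree in trees)
    return 1 if 2 * votes > len(trees) else 0

# Synthetic toy data (NOT from the study): 200 majority cases and
# 10 minority cases, separated on the first feature.
data_rng = random.Random(42)
X = [[data_rng.uniform(0.0, 0.4), data_rng.random()] for _ in range(200)]
X += [[data_rng.uniform(0.6, 1.0), data_rng.random()] for _ in range(10)]
y = [0] * 200 + [1] * 10

trees = fit_usbagg(X, y)
print(predict_usbagg(trees, [0.9, 0.5]))  # high first feature
print(predict_usbagg(trees, [0.0, 0.5]))  # low first feature
```

Under-sampling inside each bagging round, rather than once up front, lets every minority case appear in every training subset while different majority cases are seen across the ensemble, so less majority-class information is discarded overall.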

Author information

Corresponding author

Correspondence to Sheng Li.

Additional information

This article is part of the Topical Collection on Systems-Level Quality Improvement

This study was supported by the Chinese Scholarship Council, No. 201407085024.

About this article

Cite this article

Li, S., Tang, B. & He, H. An Imbalanced Learning based MDR-TB Early Warning System. J Med Syst 40, 164 (2016). https://doi.org/10.1007/s10916-016-0517-2

  • DOI: https://doi.org/10.1007/s10916-016-0517-2
