Abstract
Determining whether discharge summaries contain the correct disease codes is important for hospital management, because submitting medical receipts with incorrect disease codes can result in lost insurance reimbursement. Medical information managers in large hospitals must evaluate more than 1000 summaries per month, so automated checking of discharge summaries would reduce their workload and allow them to focus on complicated cases. This paper proposes a method for constructing classifiers of discharge summaries. First, morphological analysis generates a term matrix from text data extracted from the hospital information system. Next, important keywords are selected by correspondence analysis, training examples are generated, and machine learning methods are applied to the training examples. Several machine learning methods were compared using discharge summaries stored in the information system of Shimane University Hospital. Random forest was the best classifier when compared with deep learning, SVM, and decision tree methods, with a classification accuracy greater than 90%.
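The first step of the pipeline described in the abstract, building a term matrix from summary text, can be illustrated with a minimal sketch. This is not the authors' implementation: whitespace tokenization is assumed here as a stand-in for morphological analysis, the function name `build_term_matrix` is hypothetical, and the toy summaries are invented for illustration only.

```python
from collections import Counter


def build_term_matrix(documents):
    """Return (vocabulary, matrix), where matrix[i][j] counts term j in document i."""
    # Assumption: whitespace tokenization stands in for morphological analysis.
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    column = {term: j for j, term in enumerate(vocabulary)}

    matrix = []
    for tokens in tokenized:
        row = [0] * len(vocabulary)
        for term, count in Counter(tokens).items():
            row[column[term]] = count
        matrix.append(row)
    return vocabulary, matrix


# Hypothetical toy summaries, not data from the paper.
toy_summaries = [
    "cataract surgery lens implant",
    "pneumonia antibiotics fever",
    "cataract lens follow-up lens",
]
vocabulary, matrix = build_term_matrix(toy_summaries)
```

Each row of `matrix` is one summary and each column one term, so later steps such as keyword selection and classifier training can operate on a fixed-width numeric table.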
Notes
Outpatient clinics utilize action-based payment systems, even in large hospitals.
The method can also generate \(p\)-dimensional coordinates with \(p \ge 3\). However, higher-dimensional coordinates did not provide better performance in the experiments shown below.
The darch package has been removed from the R package repository. It is still available on GitHub: https://github.com/maddin79/darch.
Two-fold cross-validation was selected because it gave the lowest estimates of performance measures such as accuracy, and it also minimized the estimated bias.
DPC codes form a three-level hierarchical system, with each DPC code defined as a tree. The first level denotes the type of disease, the second level denotes the primary treatment selected for the patient, and the third level denotes any additional therapy. Thus, in the tables, the characteristics of the codes reflect the similarities among them.
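The two-fold cross-validation mentioned in the notes above can be sketched as follows. This is an illustrative outline only: the majority-class predictor is a placeholder for whichever classifier is being evaluated (it is not the random forest used in the paper), and the function names and `seed` parameter are assumptions introduced here.

```python
import random
from collections import Counter


def two_fold_indices(n_samples, seed=0):
    """Shuffle sample indices and split them into two disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    half = n_samples // 2
    return indices[:half], indices[half:]


def majority_class(labels):
    """Placeholder classifier: always predict the most frequent training label."""
    return Counter(labels).most_common(1)[0][0]


def two_fold_accuracy(labels, seed=0):
    """Train on one fold, test on the other, swap roles, and average the accuracy."""
    # Assumes at least two samples so that neither fold is empty.
    fold_a, fold_b = two_fold_indices(len(labels), seed)
    accuracies = []
    for train, test in ((fold_a, fold_b), (fold_b, fold_a)):
        predicted = majority_class([labels[i] for i in train])
        correct = sum(1 for i in test if labels[i] == predicted)
        accuracies.append(correct / len(test))
    return sum(accuracies) / 2
```

Because each fold serves once as the training set and once as the test set, every sample is used for testing exactly once, and each model is trained on only half of the data.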
References
Discharge summary, http://medical-dictionary.thefreedictionary.com/discharge+summary. Accessed Feb 14, 2021
De'ath, G. (1999). Principal curves: A new technique for indirect and direct gradient analysis. Ecology, 80(7), 2237–2253.
Hastie, T., & Stuetzle, W. (1989). Principal curves. Journal of the American Statistical Association, 84(406), 502–516.
IgakuTsushinsha (ed.) (2020). Quick Reference of DPC points (in Japanese). IgakuTsushinsha, Tokyo.
Ishida, M. (2016). RMeCab. http://rmecab.jp/wiki/index.php?RMeCabFunctions
Jones, K. S. (1972). A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28(1), 11–21.
Karatzoglou, A., Smola, A., Hornik, K., & Zeileis, A. (2004). kernlab - an S4 package for kernel methods in R. Journal of Statistical Software, 11(9), 1–20. http://www.jstatsoft.org/v11/i09/
Kim, J. H. (2009). Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap. Computational Statistics and Data Analysis, 53(11), 3735–3745. https://doi.org/10.1016/j.csda.2009.04.009.
Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18–22. http://CRAN.R-project.org/doc/Rnews/
Luhn, H. P. (1957). A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1(4), 309–317.
Mares, M. A., Wang, S., & Guo, Y. (2016). Combining multiple feature selection methods and deep learning for high-dimensional data. Transactions on Machine Learning and Data Mining, 9, 27–45.
Nezhad, M. Z., Zhu, D., Li, X., Yang, K., & Levy, P. (2017). SAFS: A deep feature selection approach for precision medicine. arXiv:1704.05960
Podani, J., & Miklós, I. (2002). Resemblance coefficients and the horseshoe effect in principal coordinates analysis. Ecology, 83(12), 3331–3343.
Therneau, T. M., & Atkinson, E. J. (2015). An Introduction to Recursive Partitioning Using the RPART Routines. https://cran.r-project.org/web/packages/rpart/vignettes/longintro.pdf
Venables, W. N., & Ripley, B. D. (2002). Modern Applied Statistics with S (4th ed.). Springer, New York. http://www.stats.ox.ac.uk/pub/MASS4. ISBN 0-387-95457-0
Acknowledgements
This research was supported by a Grant-in-Aid for Scientific Research (B) 18H03289 from the Japan Society for the Promotion of Science (JSPS).
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there are no conflicts of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Tsumoto, S., Kimura, T. & Hirano, S. Determination of Disease from Discharge Summaries. Rev Socionetwork Strat 15, 49–66 (2021). https://doi.org/10.1007/s12626-021-00076-7