Abstract
Understanding condition severity, as extracted from Electronic Health Records (EHRs), is important for many public health purposes. Methods requiring physicians to annotate condition severity are time-consuming and costly. Previously, a passive learning algorithm called CAESAR was developed to capture severity in EHRs. This approach required physicians to label conditions manually, an exhausting process. We developed a framework that uses two Active Learning (AL) methods (Exploitation and Combination_XA) to decrease manual labeling efforts by selecting only the most informative conditions for training. We call our approach CAESAR-Active Learning Enhancement (CAESAR-ALE). Compared with passive methods, CAESAR-ALE’s first AL method, Exploitation, reduced labeling efforts by 64% and achieved an equivalent true positive rate, while CAESAR-ALE’s second AL method, Combination_XA, reduced labeling efforts by 48% and achieved equivalent accuracy. In addition, both of these AL methods outperformed the traditional AL method (SVM-Margin). These results demonstrate the potential of AL methods for decreasing the labeling efforts of medical experts, while achieving greater accuracy and lower costs.
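As a rough illustration of the selection step behind the SVM-Margin baseline mentioned above, the sketch below picks the unlabeled items that lie closest to a linear separating hyperplane, which is the usual reading of margin-based active learning. It is not the authors' implementation: the function names, the weights `w` and bias `b`, and the toy pool are all hypothetical, and the classifier is assumed to be already trained and linear. The Exploitation and Combination_XA methods are not shown, since the abstract does not specify them.

```python
import math


def decision_value(w, b, x):
    """Signed score of x under a linear classifier f(x) = w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def select_by_margin(w, b, unlabeled, k):
    """Return indices of the k unlabeled points closest to the hyperplane.

    Points near the hyperplane are the ones the classifier is least sure
    about, so they are treated as the most informative to label next.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    distances = [
        (abs(decision_value(w, b, x)) / norm, i)
        for i, x in enumerate(unlabeled)
    ]
    distances.sort()  # smallest distance first; ties broken by index
    return [i for _, i in distances[:k]]


# Hypothetical trained weights and a toy pool of unlabeled feature vectors.
w, b = [1.0, -1.0], 0.0
pool = [[3.0, 0.0], [0.5, 0.25], [0.0, 2.0], [1.0, 1.5]]

# Ask the (hypothetical) expert to label only the two least certain items.
chosen = select_by_margin(w, b, pool, k=2)
print(chosen)
```

In a full loop, the chosen items would be labeled by an expert, added to the training set, and the classifier retrained before the next selection round.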
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Nissim, N. et al. (2015). An Active Learning Framework for Efficient Condition Severity Classification. In: Holmes, J., Bellazzi, R., Sacchi, L., Peek, N. (eds) Artificial Intelligence in Medicine. AIME 2015. Lecture Notes in Computer Science, vol 9105. Springer, Cham. https://doi.org/10.1007/978-3-319-19551-3_3
DOI: https://doi.org/10.1007/978-3-319-19551-3_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19550-6
Online ISBN: 978-3-319-19551-3
eBook Packages: Computer Science (R0)