Abstract
Vector space models (VSMs) are widely used as information retrieval methods and have been adapted to many applications. In this paper, we propose a novel use of VSMs for classification and retrieval of longitudinal electronic medical record data. These data contain sequences of clinical events that result from treatment decisions, but the treatment plan itself is not recorded with the events. The goals of our VSM methods are (1) to identify which plan a specific patient treatment sequence best matches and (2) to find patients whose treatment histories most closely follow a specific plan. We first build a traditional VSM that uses standard terms corresponding to the events found in clinical plans and treatment histories. We also consider temporal terms that represent binary relationships of precedence between, or co-occurrence of, these events. We create four alternative VSMs that use different combinations of standard and temporal terms as dimensions, and we evaluate their performance using manually annotated data on chemotherapy plans and treatment histories for breast cancer patients. In classifying treatment histories, the best approach used temporal terms and identified the correct clinical plan with 87% accuracy. For information retrieval, the traditional VSM performed best. Our results indicate that VSMs perform well for classification and retrieval of longitudinal electronic medical records, but that performance depends on how the model is constructed.
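The distinction between standard and temporal terms can be illustrated with a minimal Python sketch. It is not the authors' implementation: the event labels (A, C, T), the plan names, the day offsets, the term spellings ("A<T" for precedence, "A=C" for co-occurrence on the same day), the raw-count weighting, and the use of cosine similarity are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# A sequence is a list of (day, event) pairs. All labels are hypothetical.

def standard_terms(seq):
    """Standard terms: one dimension per event type, weighted by raw count."""
    return Counter(event for _, event in seq)

def temporal_terms(seq):
    """Temporal terms: one dimension per pairwise relation between two
    different events, either precedence ('a<b') or same-day co-occurrence
    ('a=b'). Raw counts are used as weights (an assumption)."""
    terms = Counter()
    for (t1, e1), (t2, e2) in combinations(sorted(seq), 2):
        if e1 == e2:
            continue
        if t1 == t2:
            a, b = sorted((e1, e2))
            terms[f"{a}={b}"] += 1
        else:
            # sorted() guarantees t1 < t2 here, so e1 precedes e2
            terms[f"{e1}<{e2}"] += 1
    return terms

def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def best_plan(history, plans, encode):
    """Classification: name of the plan most similar to one history."""
    h = encode(history)
    return max(plans, key=lambda name: cosine(h, encode(plans[name])))

def rank_patients(plan, histories, encode):
    """Retrieval: patient ids ordered by decreasing similarity to a plan."""
    p = encode(plan)
    return sorted(histories, key=lambda pid: cosine(p, encode(histories[pid])), reverse=True)

# Two hypothetical plans with the same events in a different order.
plans = {
    "plan_1": [(0, "A"), (0, "C"), (21, "T")],
    "plan_2": [(0, "T"), (21, "A"), (21, "C")],
}
history = [(0, "A"), (0, "C"), (23, "T")]

label = best_plan(history, plans, temporal_terms)
```

In this toy setup both plans contain the same events, so their standard-term vectors are identical and cannot separate them, while the temporal-term vectors differ in every precedence dimension. That is only meant to show why term choice can matter; it says nothing about which encoding performs better on real data.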
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Syed, H., Das, A.K. (2016). Vector Space Models for Encoding and Retrieving Longitudinal Medical Record Data. In: Wang, F., Luo, G., Weng, C., Khan, A., Mitra, P., Yu, C. (eds) Biomedical Data Management and Graph Online Querying. Big-O(Q) DMAH 2015. Lecture Notes in Computer Science, vol 9579. Springer, Cham. https://doi.org/10.1007/978-3-319-41576-5_1
DOI: https://doi.org/10.1007/978-3-319-41576-5_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41575-8
Online ISBN: 978-3-319-41576-5
eBook Packages: Computer Science, Computer Science (R0)