Abstract
The large volume of clinical text in Electronic Medical Records (EMRs) has opened the way for text processing and information extraction in healthcare and medical research. Extracting temporal information from clinical text is much more difficult than from newswire text because of the implicit expression of temporal information, the domain-specific nature of the text, and poor writing quality, among other factors. Despite these constraints, existing work has established rule-based, machine learning, and hybrid methods that extract temporal information with the help of annotated corpora. However, obtaining annotated corpora is costly, time-consuming, and requires much manual effort, so the corpora are small, which inevitably affects processing quality. Motivated by this, we propose a novel two-stage semi-supervised framework that exploits the abundant unannotated clinical text to automatically detect temporal events and subsequently improve the stability and accuracy of temporal event extraction. We trained and evaluated our semi-supervised model with the selected features of the testing dataset, obtaining an F-measure of 89.76% for event extraction.
Acknowledgments
This work is partially funded by Vietnam National University at Ho Chi Minh City under grant number B2015-42-02, and by the Japan Advanced Institute of Science and Technology under the Data Science Project. We also thank the Mayo Clinic and the Informatics for Integrating Biology and the Bedside (I2B2) organizers for providing access to the annotated I2B2 temporal relations corpus.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Moharasan, G., Ho, TB. (2017). Extraction of Temporal Events from Clinical Text Using Semi-supervised Conditional Random Fields. In: Tan, Y., Takagi, H., Shi, Y. (eds) Data Mining and Big Data. DMBD 2017. Lecture Notes in Computer Science, vol 10387. Springer, Cham. https://doi.org/10.1007/978-3-319-61845-6_41
DOI: https://doi.org/10.1007/978-3-319-61845-6_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-61844-9
Online ISBN: 978-3-319-61845-6
eBook Packages: Computer Science, Computer Science (R0)