Abstract
A patient’s occupation is an important variable for disease surveillance and modeling, but this information is often available only in free-text clinical narratives. We have developed a large occupation dictionary that is used in both knowledge-driven (dictionary and rules) and data-driven (machine-learning) methods for identifying occupation mentions. We evaluated the approaches on both public and non-public clinical datasets. A machine-learning method using linear-chain conditional random fields trained on a minimalistic set of features achieved up to 88 % token-level \(F_1\)-measure, with the occupation feature derived from the knowledge-driven method showing a notable positive impact across the datasets (up to an additional 32 % \(F_1\)-measure).
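To make the idea of a dictionary-derived feature and a token-level \(F_1\)-measure concrete, the following is a minimal sketch and not the authors' implementation. The three dictionary entries, the `OCC`/`O` labels, and the function names `dictionary_feature` and `token_f1` are all assumptions made for illustration. The conditional random field itself is not shown; in a setup like the one described, the flags produced here would be one feature column among those passed to the sequence labeller.

```python
from typing import List, Set, Tuple

# Hypothetical mini-dictionary; the dictionary described in the abstract is far larger.
OCCUPATIONS: Set[Tuple[str, ...]] = {
    ("nurse",),
    ("truck", "driver"),
    ("school", "teacher"),
}
MAX_LEN = max(len(entry) for entry in OCCUPATIONS)


def dictionary_feature(tokens: List[str]) -> List[str]:
    """Flag each token "OCC" if it lies inside a dictionary match, else "O"."""
    lowered = [t.lower() for t in tokens]
    flags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest possible entry starting at position i first.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in OCCUPATIONS:
                flags[i:i + n] = ["OCC"] * n
                i += n
                break
        else:
            i += 1  # no entry starts here; move to the next token
    return flags


def token_f1(gold: List[str], predicted: List[str], positive: str = "OCC") -> float:
    """Token-level F1 for one label: harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


tokens = "He worked as a truck driver before retiring".split()
feature = dictionary_feature(tokens)
```

Longest-match-first lookup is one simple design choice here; it keeps a multi-word entry such as "truck driver" from being split when a shorter entry also matches.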
Notes
CRF implementation used: https://taku910.github.io/crfpp/.
Acknowledgement
We would like to thank Bob Seymour from ONS for his help with finding relevant resources.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Dehghan, A., Liptrot, T., Tibble, D., Barker-Hewitt, M., Nenadic, G. (2016). Identification of Occupation Mentions in Clinical Narratives. In: Métais, E., Meziane, F., Saraee, M., Sugumaran, V., Vadera, S. (eds) Natural Language Processing and Information Systems. NLDB 2016. Lecture Notes in Computer Science, vol 9612. Springer, Cham. https://doi.org/10.1007/978-3-319-41754-7_35
DOI: https://doi.org/10.1007/978-3-319-41754-7_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41753-0
Online ISBN: 978-3-319-41754-7
eBook Packages: Computer Science, Computer Science (R0)