Identification of Occupation Mentions in Clinical Narratives

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2016)

Abstract

A patient’s occupation is an important variable for disease surveillance and modeling, but this information is often available only in free-text clinical narratives. We have developed a large occupation dictionary that is used in both knowledge-driven (dictionary and rules) and data-driven (machine-learning) methods for identifying occupation mentions. We evaluated the approaches on both public and non-public clinical datasets. A machine-learning method using linear-chain conditional random fields trained on a minimal set of features achieved up to 88% \( \text{F}_{1} \)-measure (token-level). The occupation feature derived from the knowledge-driven method had a notable positive impact across the datasets (up to an additional 32% \( \text{F}_{1} \)-measure).
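The two ideas in the abstract, a dictionary-derived occupation feature and token-level \( \text{F}_{1} \) scoring, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the three-entry dictionary, the `OCC`/`O` flag names, and the greedy longest-match lookup are hypothetical and are not taken from the paper, whose dictionary, rules, and CRF feature set are not described on this page.

```python
from typing import List, Set, Tuple

# Hypothetical toy dictionary (assumption); the paper's dictionary is far larger.
OCCUPATIONS: Set[Tuple[str, ...]] = {
    ("nurse",),
    ("truck", "driver"),
    ("school", "teacher"),
}
MAX_LEN = max(len(entry) for entry in OCCUPATIONS)


def dictionary_feature(tokens: List[str]) -> List[str]:
    """Flag each token as "OCC" if a longest-match dictionary hit covers it, else "O"."""
    lowered = [t.lower() for t in tokens]
    flags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in OCCUPATIONS:
                matched = n
                break
        if matched:
            for j in range(i, i + matched):
                flags[j] = "OCC"
            i += matched
        else:
            i += 1
    return flags


def token_f1(gold: List[str], predicted: List[str], positive: str = "OCC") -> float:
    """Token-level F1 for the positive label; returns 0.0 when there are no true positives."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


tokens = "He worked as a truck driver before retiring".split()
feature = dictionary_feature(tokens)
for token, flag in zip(tokens, feature):
    print(token, flag)
```

In a data-driven setup of the kind the abstract describes, a flag like this would be supplied to the sequence labeller as one feature column per token alongside the other features, rather than used directly as the final prediction.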

Notes

  1. https://www.i2b2.org.

  2. https://opennlp.apache.org/.

  3. CRF implementation used: https://taku910.github.io/crfpp/.

Acknowledgement

We would like to thank Bob Seymour from ONS for his help with finding relevant resources.

Author information

Corresponding author

Correspondence to Azad Dehghan.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Dehghan, A., Liptrot, T., Tibble, D., Barker-Hewitt, M., Nenadic, G. (2016). Identification of Occupation Mentions in Clinical Narratives. In: Métais, E., Meziane, F., Saraee, M., Sugumaran, V., Vadera, S. (eds) Natural Language Processing and Information Systems. NLDB 2016. Lecture Notes in Computer Science, vol 9612. Springer, Cham. https://doi.org/10.1007/978-3-319-41754-7_35

  • DOI: https://doi.org/10.1007/978-3-319-41754-7_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41753-0

  • Online ISBN: 978-3-319-41754-7

  • eBook Packages: Computer Science, Computer Science (R0)
