Interactive and Iterative Annotation for Biomedical Entity Recognition

  • Conference paper
Brain Informatics and Health (BIH 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9250)

Abstract

In this paper, we demonstrate the impact of interactive machine learning on the development of a biomedical entity recognition dataset using a human-in-the-loop approach: during annotation, a machine learning model is built on previous annotations and used to propose labels for subsequent annotation. To demonstrate that such interactive and iterative annotation speeds up the development of high-quality annotated datasets, we conduct two experiments. In the first experiment, we simulate iterative annotation and show that only a handful of medical abstracts need to be annotated to produce suggestions that increase annotation speed. In the second experiment, clinical doctors conduct a case study, annotating medical terms in documents relevant to their research. The experiments validate our method qualitatively and quantitatively, and give rise to a more personalized, responsive information extraction technology.
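
The loop described in the abstract (train on previous annotations, propose labels for the next document, then fold the confirmed annotations back in) can be sketched as a small simulation. This is only an illustration under stated assumptions: the most-frequent-label lookup is a stand-in for the real sequence model, and the function names, label set, and toy documents are all hypothetical rather than taken from the paper.

```python
from collections import Counter, defaultdict


def train(annotated_docs):
    """Build a token -> most frequent label lookup (stand-in for a real tagger)."""
    counts = defaultdict(Counter)
    for doc in annotated_docs:
        for token, label in doc:
            counts[token.lower()][label] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}


def suggest(model, tokens):
    """Propose a label for each token; unseen tokens get 'O' (no entity)."""
    return [model.get(tok.lower(), "O") for tok in tokens]


def simulate(gold_docs):
    """Replay gold annotations one document at a time.

    For each document, return the fraction of entity tokens whose gold label
    was already proposed by the model trained on the earlier documents.
    """
    annotated, recall_per_doc = [], []
    for doc in gold_docs:
        model = train(annotated)
        proposed = suggest(model, [tok for tok, _ in doc])
        entity_pairs = [(p, g) for p, (_, g) in zip(proposed, doc) if g != "O"]
        hits = sum(p == g for p, g in entity_pairs)
        recall_per_doc.append(hits / len(entity_pairs) if entity_pairs else 0.0)
        # The simulated annotator confirms or corrects, yielding gold labels.
        annotated.append(doc)
    return recall_per_doc


# Hypothetical toy data: (token, label) pairs with made-up entity types.
docs = [
    [("Insulin", "PROTEIN"), ("resistance", "O"), ("in", "O"), ("obesity", "DISEASE")],
    [("Obesity", "DISEASE"), ("raises", "O"), ("insulin", "PROTEIN"), ("levels", "O")],
    [("Leukemia", "DISEASE"), ("and", "O"), ("obesity", "DISEASE")],
]
scores = simulate(docs)
print(scores)
```

The per-document score is a rough proxy for how much work the suggestions save the annotator: the first document gets no useful suggestions, and later documents benefit only for entity tokens seen before.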

Corresponding author

Correspondence to Chris Biemann.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Yimam, S.M., Biemann, C., Majnaric, L., Šabanović, Š., Holzinger, A. (2015). Interactive and Iterative Annotation for Biomedical Entity Recognition. In: Guo, Y., Friston, K., Aldo, F., Hill, S., Peng, H. (eds) Brain Informatics and Health. BIH 2015. Lecture Notes in Computer Science, vol 9250. Springer, Cham. https://doi.org/10.1007/978-3-319-23344-4_34

  • DOI: https://doi.org/10.1007/978-3-319-23344-4_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23343-7

  • Online ISBN: 978-3-319-23344-4

  • eBook Packages: Computer Science (R0)
