Abstract
In this paper, we demonstrate the impact of interactive machine learning on the development of a biomedical entity recognition dataset using a human-in-the-loop approach: during annotation, a machine learning model is built on previous annotations and used to propose labels for subsequent annotation. To demonstrate that such interactive and iterative annotation speeds up the development of high-quality annotated datasets, we conduct two experiments. In the first experiment, we simulate iterative annotation and show that only a handful of medical abstracts need to be annotated to produce suggestions that increase annotation speed. In the second experiment, clinical doctors conduct a case study in which they annotate medical terms in documents relevant to their research. The experiments validate our method qualitatively and quantitatively, and give rise to a more personalized, responsive information extraction technology.
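The annotation loop described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' system: it swaps in a trivial most-frequent-label lookup for the learned sequence model, and the function names (train, suggest, simulate), the label set, and the toy sentences are all hypothetical. It only shows the shape of the loop: retrain on everything annotated so far, propose labels for the next sentence, then add the corrected sentence to the training pool.

```python
from collections import Counter, defaultdict


def train(annotated):
    """Build a token -> most frequent label lookup (a stand-in for a real model)."""
    counts = defaultdict(Counter)
    for sentence in annotated:
        for token, label in sentence:
            counts[token.lower()][label] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}


def suggest(model, tokens):
    """Propose a label per token; unseen tokens default to 'O' (no entity)."""
    return [model.get(tok.lower(), "O") for tok in tokens]


def simulate(gold_sentences):
    """Annotate one sentence at a time, retraining before each one.

    Returns, per sentence, the fraction of tokens whose suggested label
    already matched the gold label, i.e. needed no correction.
    """
    annotated, accuracy = [], []
    for sentence in gold_sentences:
        model = train(annotated)
        proposed = suggest(model, [tok for tok, _ in sentence])
        correct = sum(p == gold for p, (_, gold) in zip(proposed, sentence))
        accuracy.append(correct / len(sentence))
        # In a simulation the gold labels play the role of the annotator's corrections.
        annotated.append(sentence)
    return accuracy


# Hypothetical toy data, not taken from the paper.
toy_gold = [
    [("Insulin", "PROTEIN"), ("resistance", "O"), ("in", "O"), ("obesity", "DISEASE")],
    [("Obesity", "DISEASE"), ("raises", "O"), ("insulin", "PROTEIN"), ("levels", "O")],
    [("Insulin", "PROTEIN"), ("levels", "O"), ("in", "O"), ("obesity", "DISEASE")],
]

if __name__ == "__main__":
    print(simulate(toy_gold))
```

On this toy input, the share of correct suggestions rises after the first annotated sentence. Because 'O' tokens count as correct, the first value is not zero even with an empty model; a more careful evaluation would score entity tokens only.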
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Yimam, S.M., Biemann, C., Majnaric, L., Šabanović, Š., Holzinger, A. (2015). Interactive and Iterative Annotation for Biomedical Entity Recognition. In: Guo, Y., Friston, K., Aldo, F., Hill, S., Peng, H. (eds) Brain Informatics and Health. BIH 2015. Lecture Notes in Computer Science, vol 9250. Springer, Cham. https://doi.org/10.1007/978-3-319-23344-4_34
DOI: https://doi.org/10.1007/978-3-319-23344-4_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23343-7
Online ISBN: 978-3-319-23344-4
eBook Packages: Computer Science (R0)