AHIAP: An Agile Medical Named Entity Recognition and Relation Extraction Framework Based on Active Learning

Sheng, Ming; Dong, Jing; Zhang, Yong; Bu, Yuelin; Li, Anqi; Lin, Weihang; Li, Xin; Xing, Chunxiao

doi:10.1007/978-3-030-61951-0_7

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12435)

Included in the following conference series:

International Conference on Health Information Science

Abstract

Knowledge graphs play a significant role in many domains by providing a wide range of assistance. In the medical domain, clinical guidelines, academic papers, Electronic Medical Records (EMRs) and data crawled from the Internet contain essential information. These data are usually unstructured, yet they are vital to knowledge graph construction. Constructing a knowledge graph from unstructured data requires a large number of medical experts to annotate it based on their prior experience and knowledge. The quality of a knowledge graph depends heavily on the performance of medical named entity recognition and relation extraction, both of which rely on annotated data. However, manually labelling such a large volume of data is labor-intensive. Moreover, new data are generated rapidly, so annotation and extraction must be fast enough to keep pace with data accumulation. Therefore, we propose AHIAP, a named entity recognition and relation extraction framework, to address these problems. AHIAP uses active learning to reduce the labor cost of the annotation process while maintaining annotation quality. It consists of two modules: an active learning module that reduces labor cost and a measurement module that controls quality. With active learning, AHIAP needs only 200 samples to reach an accuracy of 70%, whereas the standard learning strategy needs 4000 records to reach the same accuracy.
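The abstract does not state which query strategy AHIAP's active learning module uses. As a rough illustration only, the sketch below shows one common pool-based strategy for sequence labelling: score each unlabeled sentence by its length-normalized least confidence and send the most uncertain sentences to annotators first. The input format (one label-probability vector per token) and the function names `normalized_uncertainty` and `select_batch` are hypothetical, and taking the most probable label per token is a simplification of scoring the best label sequence.

```python
import math
from typing import List, Sequence

# Hypothetical input format: for one sentence, one probability vector
# over entity labels per token (e.g. softmax outputs of a tagger).
SentenceProbs = Sequence[Sequence[float]]


def normalized_uncertainty(sentence: SentenceProbs) -> float:
    """Length-normalized least-confidence score for one sentence.

    Takes the most probable label of each token (a greedy decode, not
    Viterbi) and returns the negative mean log-probability, so a higher
    score means the model is less confident about this sentence.
    """
    if not sentence:
        return 0.0
    log_conf = sum(math.log(max(token)) for token in sentence)
    return -log_conf / len(sentence)


def select_batch(pool: List[SentenceProbs], k: int) -> List[int]:
    """Return indices of the k most uncertain unlabeled sentences."""
    scores = [normalized_uncertainty(s) for s in pool]
    ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy pool of three unlabeled sentences, three labels per token.
    pool = [
        [[0.98, 0.01, 0.01], [0.95, 0.03, 0.02]],  # confident
        [[0.40, 0.35, 0.25], [0.50, 0.30, 0.20]],  # uncertain
        [[0.70, 0.20, 0.10]],                      # in between
    ]
    print(select_batch(pool, 2))  # [1, 2]
```

Here the second sentence, whose token distributions are closest to uniform, is queried first. In a full loop, the selected sentences would be labelled, added to the training set, and the model retrained before the next round. Under the figures in the abstract, reaching 70% accuracy with 200 instead of 4000 labelled records is a 20-fold reduction in annotation volume.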

Acknowledgement

This work was supported by NSFC (91646202) and the National Key R&D Program of China (2018YFB1404401, 2018YFB1402701).

Author information

Authors and Affiliations

BNRist, DCST, RIIT, Tsinghua University, Beijing, 100084, China
Ming Sheng, Yong Zhang & Chunxiao Xing
University of Queensland, Brisbane, QLD, 4072, Australia
Jing Dong
Beijing University of Posts and Telecommunications, Beijing, 100876, China
Yuelin Bu
Beihang University, Beijing, 100191, China
Anqi Li
Beijing Foreign Studies University, Beijing, 100089, China
Weihang Lin
Beijing Tsinghua Changgung Hospital, School of Clinical Medicine, Tsinghua University, Beijing, China
Xin Li

Corresponding author

Correspondence to Jing Dong.

Editor information

Editors and Affiliations

Vrije University of Amsterdam, Amsterdam, The Netherlands
Zhisheng Huang
Victoria University, Footscray, VIC, Australia
Siuly Siuly
Victoria University, Footscray, VIC, Australia
Hua Wang
Swinburne University of Technology, Hawthorn, VIC, Australia
Rui Zhou
Victoria University, Footscray, VIC, Australia
Yanchun Zhang

About this paper

Cite this paper

Sheng, M. et al. (2020). AHIAP: An Agile Medical Named Entity Recognition and Relation Extraction Framework Based on Active Learning. In: Huang, Z., Siuly, S., Wang, H., Zhou, R., Zhang, Y. (eds) Health Information Science. HIS 2020. Lecture Notes in Computer Science, vol 12435. Springer, Cham. https://doi.org/10.1007/978-3-030-61951-0_7

DOI: https://doi.org/10.1007/978-3-030-61951-0_7
Published: 17 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61950-3
Online ISBN: 978-3-030-61951-0
eBook Packages: Computer Science; Computer Science (R0)
