
Bert-Based Chinese Medical Keyphrase Extraction Model Enhanced with External Features

  • Conference paper
Towards Open and Trustworthy Digital Societies (ICADL 2021)

Abstract

Keyphrase extraction is a key natural language processing task that is widely used in information retrieval and text mining applications. In this paper, we construct nine Bert-based Chinese medical keyphrase extraction models enhanced with external features and present a thorough empirical evaluation of how feature types and feature fusion methods affect performance. The results show that encoding the part-of-speech (POS) feature and the lexicon feature, generated from descriptive keyphrase metadata, into the word embedding space improves on the baseline Bert-SoftMax model by 4.82%, indicating that incorporating features into a Chinese medical keyphrase extraction model is beneficial. Furthermore, the comparative experiments show that model performance is sensitive to both feature types and feature fusion methods, so both factors should be considered when building feature-enhanced models. Our study also offers a feasible way to exploit metadata, helping digital library stakeholders take full advantage of large quantities of metadata resources to advance scholarly knowledge discovery.
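The abstract does not spell out the architectures, so the following is only a sketch of one plausible reading, using toy pure-Python vectors instead of a real BERT. It assumes that the lexicon feature is a per-character B/I/O tag obtained by forward maximum matching of metadata keyphrases against the text, that "embedded fusion" adds a feature embedding to each character's input embedding before the encoder (much as BERT already adds position and segment embeddings), and that "concatenated fusion" appends feature embeddings to the encoder output before the classification layer. The matching strategy, the helper names (`lexicon_tags`, `embedding_table`, `embedded_fusion`, `concatenated_fusion`, `embedded_and_concatenated`), the example sentence and lexicon, and the choice to reuse the same features at both fusion points are all hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Set

Vector = List[float]


def lexicon_tags(text: str, lexicon: Set[str]) -> List[str]:
    """Tag each character B/I/O by forward maximum matching against a keyphrase
    lexicon (an assumed matching strategy, not necessarily the paper's)."""
    tags = ["O"] * len(text)
    max_len = max((len(w) for w in lexicon), default=0)
    i = 0
    while i < len(text):
        # Try the longest candidate first so a longer keyphrase beats its prefix.
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in lexicon:
                tags[i] = "B"
                for j in range(i + 1, i + n):
                    tags[j] = "I"
                i += n
                break
        else:
            i += 1  # no lexicon entry starts at position i
    return tags


def embedding_table(labels: Sequence[str], dim: int, seed: int) -> Dict[str, Vector]:
    """Random stand-in for a learned embedding table (hypothetical helper)."""
    rng = random.Random(seed)
    return {lab: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for lab in labels}


def embedded_fusion(char_vecs: List[Vector], feats: List[List[Vector]]) -> List[Vector]:
    """Element-wise sum of each character embedding and its feature embeddings
    (all the same size), so the encoder input keeps its hidden size."""
    return [[sum(vals) for vals in zip(c, *fs)] for c, fs in zip(char_vecs, feats)]


def concatenated_fusion(enc_vecs: List[Vector], feats: List[List[Vector]]) -> List[Vector]:
    """Append each position's feature embeddings to its encoder output vector."""
    return [list(h) + [x for f in fs for x in f] for h, fs in zip(enc_vecs, feats)]


def embedded_and_concatenated(char_vecs: List[Vector], feats: List[List[Vector]],
                              encoder: Callable[[List[Vector]], List[Vector]]) -> List[Vector]:
    """Embedded fusion before the encoder, then concatenated fusion after it."""
    return concatenated_fusion(encoder(embedded_fusion(char_vecs, feats)), feats)


def identity(xs: List[Vector]) -> List[Vector]:
    """Placeholder for the BERT encoder in this toy example."""
    return xs


if __name__ == "__main__":
    text = "糖尿病患者的胰岛素治疗"  # hypothetical sentence
    lexicon = {"糖尿病", "胰岛素", "胰岛素治疗"}  # hypothetical metadata keyphrases
    tags = lexicon_tags(text, lexicon)
    print(tags)

    hidden = 8  # toy size; BERT-base uses 768
    char_table = embedding_table(sorted(set(text)), hidden, seed=0)
    lex_table = embedding_table(["B", "I", "O"], hidden, seed=1)
    char_vecs = [char_table[ch] for ch in text]
    # One feature per position here; a POS embedding would be a second list entry.
    feats = [[lex_table[t]] for t in tags]

    print(len(embedded_fusion(char_vecs, feats)[0]))  # 8
    print(len(concatenated_fusion(identity(char_vecs), feats)[0]))  # 16
    print(len(embedded_and_concatenated(char_vecs, feats, identity)[0]))  # 16
```

The design difference is visible in the vector widths: embedded fusion leaves the encoder's input size unchanged, so a pretrained encoder can be reused as-is, while concatenated fusion widens the vectors fed to the output layer, which would then need a matching input size.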

The work is supported by the project “Artificial Intelligence (AI) Engine Construction Based on Scientific Literature Knowledge” (Grant No. E0290906) and the project “Key Technology Optimization Integration and System Development of Next Generation Open Knowledge Service Platform” (Grant No. 2021XM45).


Notes

  1. https://github.com/hankcs/HanLP.

  2. “Embedded and concatenated fusion” combines “embedded fusion” and “concatenated fusion”; its model architecture likewise combines the feature-related components of the “feature embedded model” and the “feature concatenated model”.

  3. https://github.com/huggingface/transformers.

  4. https://www.clips.uantwerpen.be/conll2000/chunking/conlleval.txt.
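The `conlleval` script referenced in the notes scores chunks rather than individual tags: a predicted keyphrase counts as correct only if its boundaries and type exactly match a gold keyphrase. Below is a minimal Python sketch of that chunk-level scoring, not the script itself. It assumes BIO tags (a hypothetical `B-KP`/`I-KP`/`O` label set), treats an `I-` tag that does not continue the current chunk as the start of a new one (as `conlleval` does), and returns fractions rather than `conlleval`'s percentages. The function names are hypothetical.

```python
from typing import List, Set, Tuple

Chunk = Tuple[int, int, str]  # (start, end_exclusive, chunk_type)


def split_tag(tag: str) -> Tuple[str, str]:
    """Split a BIO tag like 'B-KP' into ('B', 'KP'); 'O' becomes ('O', '')."""
    if tag == "O":
        return "O", ""
    prefix, _, ctype = tag.partition("-")
    return prefix, ctype


def get_chunks(tags: List[str]) -> Set[Chunk]:
    """Collect (start, end, type) spans from one BIO-tagged sequence."""
    chunks: Set[Chunk] = set()
    start, ctype = None, ""
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing chunk
        prefix, t = split_tag(tag)
        # Close the open chunk unless this tag continues it (same-type I-).
        if start is not None and (prefix != "I" or t != ctype):
            chunks.add((start, i, ctype))
            start = None
        # Open a chunk on B-, or on an I- that does not continue one.
        if prefix == "B" or (prefix == "I" and start is None):
            start, ctype = i, t
    return chunks


def chunk_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged chunk precision, recall, and F1 over all sequences."""
    correct = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gc, pc = get_chunks(g), get_chunks(p)
        correct += len(gc & pc)  # exact boundary and type match
        n_gold += len(gc)
        n_pred += len(pc)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["B-KP", "I-KP", "O", "B-KP"]]
    pred = [["B-KP", "I-KP", "O", "O"]]  # e.g. per-character argmax of a softmax layer
    print(chunk_prf(gold, pred))
```

In this example the prediction finds one of the two gold keyphrases exactly, giving precision 1.0, recall 0.5, and F1 of about 0.667; partial overlaps earn no credit under this exact-match scheme.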


Author information

Correspondence to Zhixiong Zhang.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Ding, L., Zhang, Z., Zhao, Y. (2021). Bert-Based Chinese Medical Keyphrase Extraction Model Enhanced with External Features. In: Ke, HR., Lee, C.S., Sugiyama, K. (eds) Towards Open and Trustworthy Digital Societies. ICADL 2021. Lecture Notes in Computer Science, vol 13133. Springer, Cham. https://doi.org/10.1007/978-3-030-91669-5_14


  • DOI: https://doi.org/10.1007/978-3-030-91669-5_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91668-8

  • Online ISBN: 978-3-030-91669-5

  • eBook Packages: Computer Science, Computer Science (R0)
