Bert-Based Chinese Medical Keyphrase Extraction Model Enhanced with External Features

Ding, Liangping; Zhang, Zhixiong; Zhao, Yang

doi:10.1007/978-3-030-91669-5_14

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13133)

Included in the following conference series:

International Conference on Asian Digital Libraries

Abstract

Keyphrase extraction is a core natural language processing task that is widely used in information retrieval and text mining applications. In this paper, we construct nine Bert-based Chinese medical keyphrase extraction models enhanced with external features and present a thorough empirical evaluation of the impact of feature types and feature fusion methods. The results show that encoding the part-of-speech (POS) feature and the lexicon feature generated from descriptive keyphrase metadata into the word embedding space improves on the baseline Bert-SoftMax model by 4.82%, which indicates that incorporating features into a Chinese medical keyphrase extraction model is beneficial. Furthermore, the comparative evaluation experiments show that model performance is sensitive to both feature types and feature fusion methods, so it is advisable to consider both factors in feature-enhanced tasks. Our study also provides a feasible approach to exploiting metadata, which can help stakeholders of digital libraries take full advantage of large quantities of metadata resources to advance scholarly knowledge discovery.
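
As a rough illustration of how a lexicon feature might be derived from keyphrase metadata, the sketch below tags each character of a text against a keyphrase lexicon. The abstract does not state the tagging scheme or the matching strategy, so the B/I/O labels, the greedy longest-match rule, the function name lexicon_tags and the sample lexicon entries are all assumptions made for illustration.

```python
from typing import Iterable, List


def lexicon_tags(text: str, lexicon: Iterable[str]) -> List[str]:
    """Tag each character B/I/O by greedy longest match against a lexicon.

    Assumption: character-level BIO tags; the paper's scheme may differ.
    """
    entries = {entry for entry in lexicon if entry}
    max_len = max((len(entry) for entry in entries), default=0)
    tags = ["O"] * len(text)
    i = 0
    while i < len(text):
        match_len = 0
        # Try the longest candidate first so longer phrases win.
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in entries:
                match_len = n
                break
        if match_len:
            tags[i] = "B"
            for j in range(i + 1, i + match_len):
                tags[j] = "I"
            i += match_len
        else:
            i += 1
    return tags


# Hypothetical lexicon entries, standing in for keyphrase metadata.
sample_lexicon = {"糖尿病", "糖尿病肾病", "胰岛素"}
print(lexicon_tags("糖尿病肾病患者使用胰岛素", sample_lexicon))
```

The resulting tag sequence has one label per character, so it can be mapped to feature embeddings position by position alongside the model's character-level input.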

The work is supported by the project “Artificial Intelligence (AI) Engine Construction Based on Scientific Literature Knowledge” (Grant No. E0290906) and the project “Key Technology Optimization Integration and System Development of Next Generation Open Knowledge Service Platform” (Grant No. 2021XM45).

Notes

1. https://github.com/hankcs/HanLP
2. “Embedded and concatenated fusion” combines “embedded fusion” and “concatenated fusion”; its model architecture likewise combines the feature-related components of the “feature embedded model” and the “feature concatenated model”.
3. https://github.com/huggingface/transformers
4. https://www.clips.uantwerpen.be/conll2000/chunking/conlleval.txt
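
The three fusion variants named in note 2 can be sketched on toy vectors. This excerpt does not say where in the model each fusion happens, so the sketch assumes that embedded fusion sums feature embeddings into the token embeddings before the encoder, and that concatenated fusion appends feature embeddings to the encoder output. The function names and the toy_encoder stand-in (used here in place of Bert) are hypothetical.

```python
from typing import List

Vector = List[float]
Sequence = List[Vector]


def embedded_fusion(token_emb: Sequence, feature_embs: List[Sequence]) -> Sequence:
    """Sum feature embeddings into token embeddings (all of equal dimension)."""
    fused = []
    for pos, tok in enumerate(token_emb):
        vec = list(tok)
        for feat in feature_embs:
            vec = [a + b for a, b in zip(vec, feat[pos])]
        fused.append(vec)
    return fused


def concatenated_fusion(encoded: Sequence, feature_embs: List[Sequence]) -> Sequence:
    """Append every feature embedding to each encoder output vector."""
    return [
        list(enc) + [x for feat in feature_embs for x in feat[pos]]
        for pos, enc in enumerate(encoded)
    ]


def toy_encoder(seq: Sequence) -> Sequence:
    """Stand-in for the Bert encoder: doubles every value."""
    return [[2 * x for x in vec] for vec in seq]


# Two positions, embedding size 2; all values are made up.
tokens = [[1.0, 2.0], [3.0, 4.0]]
pos_feat = [[1.0, 0.0], [0.0, 1.0]]   # POS feature embeddings
lex_feat = [[0.5, 0.5], [0.0, 0.0]]   # lexicon feature embeddings
features = [pos_feat, lex_feat]

embedded = toy_encoder(embedded_fusion(tokens, features))
concatenated = concatenated_fusion(toy_encoder(tokens), features)
# "Embedded and concatenated": fuse before the encoder, then append after it.
both = concatenated_fusion(toy_encoder(embedded_fusion(tokens, features)), features)
print(embedded, concatenated, both, sep="\n")
```

Under these assumptions, embedded fusion keeps the hidden size unchanged, while concatenated fusion widens the vector passed to the classification layer by the total size of the feature embeddings.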

Author information

Authors and Affiliations

National Science Library, Chinese Academy of Sciences, Beijing, 100190, China
Liangping Ding, Zhixiong Zhang & Yang Zhao
Department of Library Information and Archives Management, University of Chinese Academy of Sciences, Beijing, 100049, China
Liangping Ding, Zhixiong Zhang & Yang Zhao

Corresponding author

Correspondence to Zhixiong Zhang.

Editor information

Editors and Affiliations

National Taiwan Normal University, Taipei, Taiwan
Hao-Ren Ke
Nanyang Technological University, Singapore, Singapore
Chei Sian Lee
Kyoto University, Kyoto, Japan
Kazunari Sugiyama

About this paper

Cite this paper

Ding, L., Zhang, Z., Zhao, Y. (2021). Bert-Based Chinese Medical Keyphrase Extraction Model Enhanced with External Features. In: Ke, HR., Lee, C.S., Sugiyama, K. (eds) Towards Open and Trustworthy Digital Societies. ICADL 2021. Lecture Notes in Computer Science, vol 13133. Springer, Cham. https://doi.org/10.1007/978-3-030-91669-5_14

DOI: https://doi.org/10.1007/978-3-030-91669-5_14
Published: 30 November 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91668-8
Online ISBN: 978-3-030-91669-5
eBook Packages: Computer Science, Computer Science (R0)