Abstract
Keyphrase extraction is a core natural language processing task that is widely used in information retrieval and text mining applications. In this paper, we construct nine Bert-based Chinese medical keyphrase extraction models enhanced with external features and present a thorough empirical evaluation of how feature types and feature fusion methods affect performance. The results show that encoding the part-of-speech (POS) feature and a lexicon feature generated from descriptive keyphrase metadata into the word embedding space improves on the baseline Bert-SoftMax model by 4.82%, which indicates that incorporating external features benefits Chinese medical keyphrase extraction. Furthermore, the comparative evaluation experiments show that model performance is sensitive to both the feature types and the feature fusion methods, so both factors should be considered in feature-enhanced tasks. Our study also provides a feasible approach to exploiting metadata, aiming to help stakeholders of digital libraries take full advantage of their large metadata resources to advance scholarly knowledge discovery.
The work is supported by the project “Artificial Intelligence (AI) Engine Construction Based on Scientific Literature Knowledge” (Grant No. E0290906) and the project “Key Technology Optimization Integration and System Development of Next Generation Open Knowledge Service Platform” (Grant No. 2021XM45).
Notes
- 1.
- 2. “Embedded and concatenated fusion” combines “embedded fusion” and “concatenated fusion”; its model architecture likewise combines the feature-related components of the “feature embedded model” and the “feature concatenated model”.
- 3.
- 4.
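The fusion methods named in note 2 can be illustrated with a minimal sketch. It rests on assumptions that this page does not confirm: that “embedded fusion” adds feature embeddings to the character embeddings before the encoder, and that “concatenated fusion” appends feature embeddings to the encoder output before classification. All names (`make_table`, `encoder`, `TOK`, `POS`, `LEX`), the toy dimension, the tag sets, and the identity stand-in for the Bert encoder are hypothetical and are not taken from the paper.

```python
import random

random.seed(0)
DIM = 4  # toy hidden size (assumption; real Bert models use e.g. 768)


def make_table(keys, dim):
    """Random lookup table standing in for a learned embedding matrix."""
    return {k: [random.uniform(-1.0, 1.0) for _ in range(dim)] for k in keys}


def add(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(values) for values in zip(*vectors)]


def encoder(vectors):
    """Identity stand-in for the Bert encoder (hypothetical placeholder)."""
    return [list(v) for v in vectors]


def embedded_fusion(chars, pos_tags, lex_tags):
    # Assumed reading: features are summed into the input embedding space,
    # so the encoder sees feature-aware character representations.
    inputs = [add(TOK[c], POS[p], LEX[l])
              for c, p, l in zip(chars, pos_tags, lex_tags)]
    return encoder(inputs)


def concatenated_fusion(chars, pos_tags, lex_tags):
    # Assumed reading: the encoder sees plain character embeddings, and the
    # feature embeddings are appended to its output before the classifier.
    hidden = encoder([TOK[c] for c in chars])
    return [h + POS[p] + LEX[l]
            for h, p, l in zip(hidden, pos_tags, lex_tags)]


def embedded_and_concatenated_fusion(chars, pos_tags, lex_tags):
    # Combination of both: features enter before and after the encoder.
    hidden = embedded_fusion(chars, pos_tags, lex_tags)
    return [h + POS[p] + LEX[l]
            for h, p, l in zip(hidden, pos_tags, lex_tags)]


# Toy character-level input with hypothetical POS and lexicon (B/I/O) tags.
chars = ["糖", "尿", "病"]
pos_tags = ["n", "n", "n"]
lex_tags = ["B", "I", "I"]

TOK = make_table(chars, DIM)
POS = make_table(["n", "v"], DIM)
LEX = make_table(["B", "I", "O"], DIM)

emb = embedded_fusion(chars, pos_tags, lex_tags)
cat = concatenated_fusion(chars, pos_tags, lex_tags)
both = embedded_and_concatenated_fusion(chars, pos_tags, lex_tags)
print(len(emb[0]), len(cat[0]), len(both[0]))
```

Under these assumptions, embedded fusion leaves the classifier input width unchanged, while the two concatenating variants widen it by the size of the appended feature embeddings, which is one reason the choice of fusion method could plausibly affect performance.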
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Ding, L., Zhang, Z., Zhao, Y. (2021). Bert-Based Chinese Medical Keyphrase Extraction Model Enhanced with External Features. In: Ke, HR., Lee, C.S., Sugiyama, K. (eds) Towards Open and Trustworthy Digital Societies. ICADL 2021. Lecture Notes in Computer Science, vol 13133. Springer, Cham. https://doi.org/10.1007/978-3-030-91669-5_14
DOI: https://doi.org/10.1007/978-3-030-91669-5_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91668-8
Online ISBN: 978-3-030-91669-5
eBook Packages: Computer Science (R0)