Abstract
Automatic extraction of chemical-protein interactions (CPIs) from the biomedical literature plays an important role in many biomedical applications, such as drug discovery, knowledge discovery, and biomedical knowledge graph construction. However, CPIs expressed in long and complicated sentences are difficult to extract. Most existing methods focus mainly on sequence information and neglect syntactic information, which is also conducive to CPI extraction. In this paper, a pre-trained language model based approach enriched with dependency syntactic information is proposed to improve the performance of CPI extraction. First, the approach extracts generalized dependency syntactic information based on the characteristics of the CPI data. Then, BERT is adopted to generate contextual representations of both the sequence information and the syntactic information, and mean-pooling is used to aggregate each contextual representation into a fixed-length vector. Finally, the sequence and syntactic representations are fused and mapped to the category feature space through a fully-connected layer. Evaluation on the original ChemProt corpus demonstrates that our method achieves better performance than other pre-trained model based methods.
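The pipeline the abstract describes (BERT encoding of sequence and syntactic inputs, mean-pooling, fusion, and a fully-connected classification layer) can be sketched as follows. This is a minimal illustration, not the authors' implementation: random vectors stand in for BERT token embeddings, the fusion is assumed to be concatenation, the classifier weights are random, and the six-class output assumes ChemProt's five evaluated relation groups plus a "no relation" class.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for BERT outputs: contextual token embeddings for the
# sentence (sequence information) and for the linearized dependency
# syntactic input. Shapes: (num_tokens, hidden_size).
hidden = 8
seq_tokens = rng.normal(size=(12, hidden))
syn_tokens = rng.normal(size=(5, hidden))

# Mean-pooling aggregates each token sequence into one fixed-length vector.
seq_vec = seq_tokens.mean(axis=0)
syn_vec = syn_tokens.mean(axis=0)

# Fuse the two representations (concatenation assumed here), then map
# to the category feature space through a fully-connected layer.
num_classes = 6  # assumption: 5 ChemProt relation groups + "no relation"
fused = np.concatenate([seq_vec, syn_vec])        # shape: (2 * hidden,)
W = rng.normal(size=(num_classes, fused.shape[0]))
b = np.zeros(num_classes)
logits = W @ fused + b

# Softmax turns the logits into a probability distribution over classes.
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs.shape)  # (6,)
```

In a real system, `seq_tokens` and `syn_tokens` would come from a biomedical BERT encoder, and `W` and `b` would be trained jointly with it.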
Acknowledgement
The research in this paper was supported by the National Natural Science Foundation of China (61976239) and the Zhongshan Innovation Foundation of High-end Scientific Research Institutions (2019AG031).
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Fan, J., Liu, X., Dong, S., Hu, J. (2020). Enriching Pre-trained Language Model with Dependency Syntactic Information for Chemical-Protein Interaction Extraction. In: Dou, Z., Miao, Q., Lu, W., Mao, J., Jia, G. (eds) Information Retrieval. CCIR 2020. Lecture Notes in Computer Science(), vol 12285. Springer, Cham. https://doi.org/10.1007/978-3-030-56725-5_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-56724-8
Online ISBN: 978-3-030-56725-5
eBook Packages: Computer Science, Computer Science (R0)