Abstract
Automatic extraction of chemical-protein interactions (CPIs) from the biomedical literature plays an important role in many biomedical applications, such as drug discovery, knowledge discovery, and biomedical knowledge graph construction. However, CPIs expressed in long and complicated sentences are difficult to extract. Most existing methods focus mainly on sequence information and neglect syntactic information, which is also conducive to CPI extraction. In this paper, a pre-trained language model based approach enriched with dependency syntactic information is proposed to improve the performance of CPI extraction. First, the approach extracts generalized dependency syntactic information based on the characteristics of the CPI data. Then, BERT is adopted to generate contextual representations of both the sequence information and the syntactic information, and mean-pooling is used to aggregate each contextual representation into a fixed-length vector. Finally, the sequence and syntactic representations are fused and mapped to the category feature space through a fully-connected layer. Evaluation on the original ChemProt corpus demonstrates that our method achieves better performance than other pre-trained model based methods.
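The pipeline the abstract describes (BERT encoding of sequence and syntactic inputs, mean-pooling, fusion, and a fully-connected classification layer) can be sketched as follows. This is a minimal illustration, not the authors' implementation: random vectors stand in for BERT token embeddings, the fusion is assumed to be concatenation, the classifier weights are random, and the six-class output assumes ChemProt's five evaluated relation groups plus a "no relation" class.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for BERT outputs: contextual token embeddings for the
# sentence (sequence information) and for the linearized dependency
# syntactic input. Shapes: (num_tokens, hidden_size).
hidden = 8
seq_tokens = rng.normal(size=(12, hidden))
syn_tokens = rng.normal(size=(5, hidden))

# Mean-pooling aggregates each token sequence into one fixed-length vector.
seq_vec = seq_tokens.mean(axis=0)
syn_vec = syn_tokens.mean(axis=0)

# Fuse the two representations (concatenation assumed here), then map
# to the category feature space through a fully-connected layer.
num_classes = 6  # assumption: 5 ChemProt relation groups + "no relation"
fused = np.concatenate([seq_vec, syn_vec])        # shape: (2 * hidden,)
W = rng.normal(size=(num_classes, fused.shape[0]))
b = np.zeros(num_classes)
logits = W @ fused + b

# Softmax turns the logits into a probability distribution over classes.
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs.shape)  # (6,)
```

In a real system, `seq_tokens` and `syn_tokens` would come from a biomedical BERT encoder, and `W` and `b` would be trained jointly with it.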
Acknowledgement
The research in this paper was supported by the National Natural Science Foundation of China (61976239) and the Zhongshan Innovation Foundation of High-end Scientific Research Institutions (2019AG031).
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Fan, J., Liu, X., Dong, S., Hu, J. (2020). Enriching Pre-trained Language Model with Dependency Syntactic Information for Chemical-Protein Interaction Extraction. In: Dou, Z., Miao, Q., Lu, W., Mao, J., Jia, G. (eds) Information Retrieval. CCIR 2020. Lecture Notes in Computer Science(), vol 12285. Springer, Cham. https://doi.org/10.1007/978-3-030-56725-5_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-56724-8
Online ISBN: 978-3-030-56725-5
eBook Packages: Computer Science, Computer Science (R0)