Enriching Pre-trained Language Model with Dependency Syntactic Information for Chemical-Protein Interaction Extraction

Conference paper

Information Retrieval (CCIR 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12285)

Abstract

Automatic extraction of chemical-protein interactions (CPIs) from the biomedical literature plays an important role in many biomedical applications, such as drug discovery, knowledge discovery, and biomedical knowledge graph construction. However, CPIs expressed in long and complicated sentences are difficult to extract. Most existing methods focus on sequence information and neglect syntactic information, which is also conducive to CPI extraction. In this paper, a pre-trained language model based approach enriched with dependency syntactic information is proposed to improve CPI extraction. First, the approach extracts generalized dependency syntactic information based on the characteristics of CPI data. Then, BERT is adopted to generate contextual representations of both the sequence information and the syntactic information, and mean-pooling is used to aggregate each contextual representation. Finally, the sequence and syntactic representations are fused and mapped to the category feature space through a fully-connected layer. Evaluation on the original ChemProt corpus demonstrates that our method achieves better performance than other pre-trained model based methods.
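To make the pipeline in the abstract concrete, here is a minimal PyTorch sketch, not the authors' implementation: the BERT variant (bert-base-uncased here; a biomedical variant such as BioBERT would be a more natural fit), the way the generalized dependency information is linearized into a second token sequence, and the six-way ChemProt label set (five evaluated relation types plus a negative class) are all assumptions. Only the overall flow follows the abstract: BERT encodes both inputs, mean-pooling aggregates each contextual representation, and the fused vector is mapped to the category space by a fully-connected layer.

```python
# Hypothetical sketch of the described architecture (not the authors' code).
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer


class CPIExtractor(nn.Module):
    def __init__(self, model_name="bert-base-uncased", num_classes=6):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        # Fuse the two mean-pooled vectors and map them to the category space.
        self.classifier = nn.Linear(2 * hidden, num_classes)

    @staticmethod
    def mean_pool(hidden_states, attention_mask):
        # Average token embeddings over non-padding positions only.
        mask = attention_mask.unsqueeze(-1).float()
        return (hidden_states * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

    def forward(self, seq_ids, seq_mask, dep_ids, dep_mask):
        # Contextual representations for the sentence and for the
        # linearized dependency syntactic information, from the same BERT.
        seq_hidden = self.bert(input_ids=seq_ids, attention_mask=seq_mask).last_hidden_state
        dep_hidden = self.bert(input_ids=dep_ids, attention_mask=dep_mask).last_hidden_state
        fused = torch.cat([self.mean_pool(seq_hidden, seq_mask),
                           self.mean_pool(dep_hidden, dep_mask)], dim=-1)
        return self.classifier(fused)


# Usage: the dependency input below is a hypothetical linearization of the
# dependency path between the chemical and protein mentions.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
sent = tokenizer("Aspirin potently inhibits COX-2 activity.", return_tensors="pt")
dep = tokenizer("Aspirin nsubj inhibits dobj activity nmod COX-2", return_tensors="pt")
model = CPIExtractor()
logits = model(sent["input_ids"], sent["attention_mask"],
               dep["input_ids"], dep["attention_mask"])
print(logits.shape)  # torch.Size([1, 6])
```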

Acknowledgement

This research was supported by the National Natural Science Foundation of China (61976239) and the Zhongshan Innovation Foundation of High-end Scientific Research Institutions (2019AG031).

Author information

Corresponding author: Shoubin Dong.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Fan, J., Liu, X., Dong, S., Hu, J. (2020). Enriching Pre-trained Language Model with Dependency Syntactic Information for Chemical-Protein Interaction Extraction. In: Dou, Z., Miao, Q., Lu, W., Mao, J., Jia, G. (eds.) Information Retrieval. CCIR 2020. Lecture Notes in Computer Science, vol. 12285. Springer, Cham. https://doi.org/10.1007/978-3-030-56725-5_5

  • DOI: https://doi.org/10.1007/978-3-030-56725-5_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-56724-8

  • Online ISBN: 978-3-030-56725-5

  • eBook Packages: Computer Science, Computer Science (R0)
