
An empirical study of incorporating syntactic constraints into BERT-based location metonymy resolution

Published online by Cambridge University Press:  01 August 2022

Hao Wang
Affiliation:
School of Computer Engineering and Science, Shanghai University, Shanghai, China
Siyuan Du
Affiliation:
School of Computer Engineering and Science, Shanghai University, Shanghai, China
Xiangyu Zheng
Affiliation:
School of Computer Engineering and Science, Shanghai University, Shanghai, China
Lingyi Meng*
Affiliation:
School of Foreign Languages, East China Normal University, Shanghai, China
*Corresponding author. E-mail: lymeng@fl.ecnu.edu.cn

Abstract

Metonymy resolution (MR) is a challenging task in natural language processing. MR aims to identify the metonymic usage of a word that employs an entity name to refer to another target entity. Recent BERT-based methods yield state-of-the-art performance, but they neither make full use of entity information nor explicitly consider syntactic structure. In contrast, we argue in this paper that resolving metonymy is a collaborative process that relies on both lexical semantics and syntactic structure (syntax). We propose a novel approach that enhances BERT-based MR models with hard and soft syntactic constraints, using different types of convolutional neural networks to model dependency parse trees. Experimental results on benchmark datasets (e.g., ReLocaR, SemEval 2007 and WiMCor) confirm that incorporating syntactic information into fine-tuned pre-trained language models benefits MR.
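
The abstract describes the architecture only at a high level. As a purely illustrative sketch (not the authors' implementation), the following PyTorch snippet shows one way a soft syntactic constraint can be layered on top of BERT: a single graph-convolution step propagates the contextual token representations along dependency arcs before the target entity's representation is pooled and classified as literal or metonymic. The class name, model checkpoint, placeholder adjacency matrix and target mask below are assumptions chosen for brevity.

# Minimal sketch, assuming BERT embeddings plus one GCN layer over a
# row-normalized dependency adjacency matrix (with self-loops); an
# illustration of the general idea, not the paper's released model.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class SyntaxGCNMetonymyClassifier(nn.Module):  # hypothetical name
    def __init__(self, encoder_name="bert-base-uncased", num_labels=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.gcn = nn.Linear(hidden, hidden)         # one GCN layer: A_hat @ H @ W
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask, adj, target_mask):
        # adj: (batch, seq, seq) dependency adjacency; target_mask: (batch, seq)
        h = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        h = torch.relu(self.gcn(torch.bmm(adj, h)))   # propagate along dependency arcs
        pooled = (h * target_mask.unsqueeze(-1)).sum(1) / target_mask.sum(1, keepdim=True)
        return self.classifier(pooled)                # logits: literal vs. metonymic

if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    enc = tok("Vancouver welcomes you.", return_tensors="pt")
    n = enc["input_ids"].size(1)
    adj = torch.eye(n).unsqueeze(0)                   # placeholder parse: self-loops only
    target = torch.zeros(1, n)
    target[0, 1] = 1.0                                # assume token 1 covers "Vancouver"
    model = SyntaxGCNMetonymyClassifier()
    print(model(enc["input_ids"], enc["attention_mask"], adj, target).shape)  # torch.Size([1, 2])

A hard-constraint variant of the same idea would prune or mask interactions that fall outside the dependency structure, rather than weighting them softly through the adjacency matrix.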

Type
Article
Copyright
© The Author(s), 2022. Published by Cambridge University Press


References

Bahdanau, D., Cho, K. and Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations.
Baroni, M., Dinu, G. and Kruszewski, G. (2014). Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, pp. 238–247.
Brun, C., Ehrmann, M. and Jacquet, G. (2007). XRCE-M: A hybrid system for named entity metonymy resolution. In Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007). Association for Computational Linguistics, pp. 488–491.
Chan, Y.S. and Roth, D. (2011). Exploiting syntactico-semantic structures for relation extraction. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, Oregon, USA. Association for Computational Linguistics, pp. 551–560.
Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K. and Kuksa, P.P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research 12, 2493–2537.
Devlin, J., Chang, M.-W., Lee, K. and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, pp. 4171–4186.
Farkas, R., Simon, E., Szarvas, G. and Varga, D. (2007). GYDER: Maxent metonymy resolution. In Proceedings of the 4th International Workshop on Semantic Evaluations, pp. 161–164.
Fass, D. (1988). Metonymy and metaphor: What’s the difference? In Coling Budapest 1988 Volume 1: International Conference on Computational Linguistics. International Committee on Computational Linguistics, pp. 177–181.
Fu, T.-J., Li, P.-H. and Ma, W.-Y. (2019). GraphRel: Modeling text as relational graphs for joint entity and relation extraction. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, pp. 1409–1418.
Fundel, K., Küffner, R. and Zimmer, R. (2007). RelEx—relation extraction using dependency parse trees. Bioinformatics 23(3), 365–371.
Gao, G., Choi, E., Choi, Y. and Zettlemoyer, L. (2018). Neural metaphor detection in context. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 607–613.
Glavaš, G. and Vulić, I. (2021). Is supervised syntactic parsing beneficial for language understanding tasks? An empirical investigation. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Online. Association for Computational Linguistics, pp. 3090–3104.
Gritta, M., Pilehvar, M.T. and Collier, N. (2020). A pragmatic guide to geoparsing evaluation: Toponyms, named entity recognition and pragmatics. Language Resources & Evaluation 54, 683–712.
Gritta, M., Pilehvar, M.T., Limsopatham, N. and Collier, N. (2017). Vancouver welcomes you! Minimalist location metonymy resolution. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, pp. 1248–1259.
Guo, Z., Zhang, Y. and Lu, W. (2019). Attention guided graph convolutional networks for relation extraction. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, pp. 241–251.
Hobbs, J.R. and Martin, P. (1987). Local pragmatics. Technical report, SRI International Artificial Intelligence Center, Menlo Park, CA, pp. 520–523.
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation 9(8), 1735–1780.
Huang, G., Liu, Z., Van Der Maaten, L. and Weinberger, K.Q. (2017). Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708.
Janda, L.A. (2011). Metonymy in word-formation. Cognitive Linguistics 22(2), 359–392.
Jawahar, G., Sagot, B. and Seddah, D. (2019). What does BERT learn about the structure of language? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, pp. 3651–3657.
Joshi, M. and Penstein-Rosé, C. (2009). Generalizing dependency features for opinion mining. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers. Association for Computational Linguistics, pp. 313–316.
Kambhatla, N. (2004). Combining lexical, syntactic, and semantic features with maximum entropy models for extracting relations. In Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions, vol. 22. Association for Computational Linguistics.
Kamei, S.-i. and Wakao, T. (1992). Metonymy: Reassessment, survey of acceptability, and its treatment in a machine translation system. In Proceedings of the 30th Annual Meeting on Association for Computational Linguistics (ACL’92). Association for Computational Linguistics, pp. 309–311.
Kipf, T.N. and Welling, M. (2017). Semi-supervised classification with graph convolutional networks. In 5th International Conference on Learning Representations.
Kövecses, Z. and Radden, G. (1998). Metonymy: Developing a cognitive linguistic view. Cognitive Linguistics 9(1), 37–78.
Lakoff, G. (1987). Image metaphors. Metaphor and Symbol 2(3), 219–222.
Lakoff, G. (1991). Metaphor and war: The metaphor system used to justify war in the Gulf. Peace Research 23(2/3), 25–32.
Lakoff, G. (1993). The contemporary theory of metaphor. In Ortony, A. (ed), Metaphor and Thought. Cambridge, UK: Cambridge University Press, pp. 202–251.
Lakoff, G. and Johnson, M. (1980). Conceptual metaphor in everyday language. The Journal of Philosophy 77(8), 453–486.
Li, D., Wei, F., Tan, C., Tang, D. and Ke, X. (2014). Adaptive recursive neural network for target-dependent Twitter sentiment classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 49–54.
Li, H., Vasardani, M., Tomko, M. and Baldwin, T. (2020). Target word masking for location metonymy resolution. In Proceedings of the 28th International Conference on Computational Linguistics. International Committee on Computational Linguistics, pp. 3696–3707.
Lin, B.Y., Chen, X., Chen, J. and Ren, X. (2019a). KagNet: Knowledge-aware graph networks for commonsense reasoning. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, pp. 2822–2832.
Lin, C., Miller, T., Dligach, D., Bethard, S. and Savova, G. (2019b). A BERT-based universal model for both within- and cross-sentence clinical temporal relation extraction. In Proceedings of the 2nd Clinical Natural Language Processing Workshop, pp. 65–71.
Liu, Y., Wei, F., Li, S., Ji, H., Zhou, M. and Wang, H. (2015). A dependency-based neural network for relation classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Association for Computational Linguistics, pp. 285–290.
Luong, T., Pham, H. and Manning, C.D. (2015). Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal. Association for Computational Linguistics, pp. 1412–1421.
Markert, K. and Nissim, M. (2002). Metonymy resolution as a classification task. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002). Association for Computational Linguistics, pp. 204–213.
Markert, K. and Nissim, M. (2007). SemEval-2007 task 08: Metonymy resolution at SemEval-2007. In Proceedings of the 4th International Workshop on Semantic Evaluations. Association for Computational Linguistics, pp. 36–41.
Markert, K. and Nissim, M. (2009). Data and models for metonymy resolution. Language Resources & Evaluation 43, 123–138.
Mathews, K.A. and Strube, M. (2020). A large harvested corpus of location metonymy. In Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France. European Language Resources Association, pp. 5678–5687.
Mesnil, G., He, X., Deng, L. and Bengio, Y. (2013). Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding. In INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association, Lyon, France, August 25–29, 2013. International Speech Communication Association, pp. 3771–3775.
Mihaylov, T. and Frank, A. (2018). Knowledgeable reader: Enhancing cloze-style reading comprehension with external commonsense knowledge. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, pp. 821–832.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S. and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pp. 3111–3119.
Miwa, M. and Bansal, M. (2016). End-to-end relation extraction using LSTMs on sequences and tree structures. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, pp. 1105–1116.
Monteiro, B.R., Davis, C.A. and Fonseca, F. (2016). A survey on the geographic scope of textual documents. Computers & Geosciences 96, 23–34.
Nastase, V., Judea, A., Markert, K. and Strube, M. (2012). Local and global context for supervised and unsupervised metonymy resolution. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 183–193.
Nastase, V. and Strube, M. (2009). Combining collocations, lexical and encyclopedic knowledge for metonymy resolution. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 910–918.
Nastase, V. and Strube, M. (2013). Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence 194, 62–85.
Nissim, M. and Markert, K. (2003). Syntactic features and word similarity for supervised metonymy resolution. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, pp. 56–63.
Peng, N., Poon, H., Quirk, C., Toutanova, K. and Yih, W.-t. (2017). Cross-sentence n-ary relation extraction with graph LSTMs. Transactions of the Association for Computational Linguistics 5, 101–115.
Pennington, J., Socher, R. and Manning, C.D. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 1532–1543.
Peters, M., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K. and Zettlemoyer, L. (2018). Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics, pp. 2227–2237.
Peters, M.E., Ammar, W., Bhagavatula, C. and Power, R. (2017). Semi-supervised sequence tagging with bidirectional language models. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Volume 1: Long Papers. Association for Computational Linguistics, pp. 1756–1765.
Piñango, M.M., Zhang, M., Foster-Hanson, E., Negishi, M., Lacadie, C. and Constable, R.T. (2017). Metonymy as referential dependency: Psycholinguistic and neurolinguistic arguments for a unified linguistic treatment. Cognitive Science 41(S2), 351–378.
Pustejovsky, J. (1991). The generative lexicon. Computational Linguistics 17(4), 409–441.
Qu, C., Yang, L., Qiu, M., Croft, W.B., Zhang, Y. and Iyyer, M. (2019). BERT with history answer embedding for conversational question answering. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1133–1136.
Rajpurkar, P., Zhang, J., Lopyrev, K. and Liang, P. (2016). SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 2383–2392.
Shibata, T., Kawahara, D. and Kurohashi, S. (2016). Neural network-based model for Japanese predicate argument structure analysis. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, pp. 1235–1244.
Si, C., Chen, W., Wang, W., Wang, L. and Tan, T. (2019). An attention enhanced graph convolutional LSTM network for skeleton-based action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1227–1236.
Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C.D., Ng, A.Y. and Potts, C. (2013). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 1631–1642.
Sun, C., Huang, L. and Qiu, X. (2019). Utilizing BERT for aspect-based sentiment analysis via constructing auxiliary sentence. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, pp. 380–385.
Sundermeyer, M., Schlüter, R. and Ney, H. (2012). LSTM neural networks for language modeling. In Thirteenth Annual Conference of the International Speech Communication Association. International Speech Communication Association, pp. 194–198.
Tang, G., Müller, M., Gonzales, A.R. and Sennrich, R. (2018). Why self-attention? A targeted evaluation of neural machine translation architectures. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 4263–4272.
Tian, Y., Chen, G., Song, Y. and Wan, X. (2021). Dependency-driven relation extraction with attentive graph convolutional networks. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online. Association for Computational Linguistics, pp. 4458–4471.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł. and Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems, pp. 5998–6008.
Wu, S. and He, Y. (2019). Enriching pre-trained language model with entity information for relation classification. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 2361–2364.
Xu, Y., Mou, L., Li, G., Chen, Y., Peng, H. and Jin, Z. (2015). Classifying relations via long short term memory networks along shortest dependency paths. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 1785–1794.
Yang, A., Wang, Q., Liu, J., Liu, K., Lyu, Y., Wu, H., She, Q. and Li, S. (2019). Enhancing pre-trained language representations with rich knowledge for machine reading comprehension. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, pp. 2346–2357.
Zarcone, A., Utt, J. and Padó, S. (2012). Modeling covert event retrieval in logical metonymy: Probabilistic and distributional accounts. In Proceedings of the 3rd Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2012), Montréal, Canada. Association for Computational Linguistics, pp. 70–79.
Zhang, M., Zhang, J. and Su, J. (2006). Exploring syntactic features for relation extraction using a convolution tree kernel. In Proceedings of the Human Language Technology Conference of the NAACL, Main Conference. Association for Computational Linguistics, pp. 288–295.
Zhang, Y., Qi, P. and Manning, C.D. (2018). Graph convolution over pruned dependency trees improves relation extraction. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 2205–2215.
Zhang, Z., Han, X., Liu, Z., Jiang, X., Sun, M. and Liu, Q. (2019). ERNIE: Enhanced language representation with informative entities. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, pp. 1441–1451.