
An empirical study of incorporating syntactic constraints into BERT-based location metonymy resolution

Published online by Cambridge University Press:  01 August 2022

Hao Wang
Affiliation:
School of Computer Engineering and Science, Shanghai University, Shanghai, China
Siyuan Du
Affiliation:
School of Computer Engineering and Science, Shanghai University, Shanghai, China
Xiangyu Zheng
Affiliation:
School of Computer Engineering and Science, Shanghai University, Shanghai, China
Lingyi Meng*
Affiliation:
School of Foreign Languages, East China Normal University, Shanghai, China
*Corresponding author. E-mail: lymeng@fl.ecnu.edu.cn

Abstract

Metonymy resolution (MR) is a challenging task in natural language processing. MR aims to identify the metonymic usage of a word that employs an entity name to refer to another target entity. Recent BERT-based methods yield state-of-the-art performance, but they neither make full use of entity information nor explicitly consider syntactic structure. In contrast, we argue in this paper that resolving metonymy is a collaborative process that relies on both lexical semantics and syntactic structure (syntax). We propose a novel approach that enhances BERT-based MR models with hard and soft syntactic constraints, using different types of convolutional neural networks to model dependency parse trees. Experimental results on benchmark datasets (e.g., ReLocaR, SemEval 2007 and WiMCor) confirm that incorporating syntactic information into fine-tuned pre-trained language models benefits MR.
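
The abstract describes the architecture only at a high level. As a purely illustrative sketch (not the authors' implementation), the following PyTorch snippet shows one way a soft syntactic constraint can be layered on top of BERT: a single graph-convolution step propagates the contextual token representations along dependency arcs before the target entity's representation is pooled and classified as literal or metonymic. The class name, model checkpoint, placeholder adjacency matrix and target mask below are assumptions chosen for brevity.

# Minimal sketch, assuming BERT embeddings plus one GCN layer over a
# row-normalized dependency adjacency matrix (with self-loops); an
# illustration of the general idea, not the paper's released model.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class SyntaxGCNMetonymyClassifier(nn.Module):  # hypothetical name
    def __init__(self, encoder_name="bert-base-uncased", num_labels=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.gcn = nn.Linear(hidden, hidden)         # one GCN layer: A_hat @ H @ W
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask, adj, target_mask):
        # adj: (batch, seq, seq) dependency adjacency; target_mask: (batch, seq)
        h = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        h = torch.relu(self.gcn(torch.bmm(adj, h)))   # propagate along dependency arcs
        pooled = (h * target_mask.unsqueeze(-1)).sum(1) / target_mask.sum(1, keepdim=True)
        return self.classifier(pooled)                # logits: literal vs. metonymic

if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    enc = tok("Vancouver welcomes you.", return_tensors="pt")
    n = enc["input_ids"].size(1)
    adj = torch.eye(n).unsqueeze(0)                   # placeholder parse: self-loops only
    target = torch.zeros(1, n)
    target[0, 1] = 1.0                                # assume token 1 covers "Vancouver"
    model = SyntaxGCNMetonymyClassifier()
    print(model(enc["input_ids"], enc["attention_mask"], adj, target).shape)  # torch.Size([1, 2])

A hard-constraint variant of the same idea would prune or mask interactions that fall outside the dependency structure, rather than weighting them softly through the adjacency matrix.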

Type
Article
Copyright
© The Author(s), 2022. Published by Cambridge University Press


References

Bahdanau, D., Cho, K. and Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations.
Baroni, M., Dinu, G. and Kruszewski, G. (2014). Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, pp. 238–247.
Brun, C., Ehrmann, M. and Jacquet, G. (2007). XRCE-M: A hybrid system for named entity metonymy resolution. In Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007). Association for Computational Linguistics, pp. 488–491.
Chan, Y.S. and Roth, D. (2011). Exploiting syntactico-semantic structures for relation extraction. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, Oregon, USA. Association for Computational Linguistics, pp. 551–560.
Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K. and Kuksa, P.P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research 12, 2493–2537.
Devlin, J., Chang, M.-W., Lee, K. and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, pp. 4171–4186.
Farkas, R., Simon, E., Szarvas, G. and Varga, D. (2007). GYDER: Maxent metonymy resolution. In Proceedings of the 4th International Workshop on Semantic Evaluations, pp. 161–164.
Fass, D. (1988). Metonymy and metaphor: What’s the difference? In Coling Budapest 1988 Volume 1: International Conference on Computational Linguistics. International Committee on Computational Linguistics, pp. 177–181.
Fu, T.-J., Li, P.-H. and Ma, W.-Y. (2019). GraphRel: Modeling text as relational graphs for joint entity and relation extraction. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, pp. 1409–1418.
Fundel, K., Küffner, R. and Zimmer, R. (2007). RelEx—relation extraction using dependency parse trees. Bioinformatics 23(3), 365–371.
Gao, G., Choi, E., Choi, Y. and Zettlemoyer, L. (2018). Neural metaphor detection in context. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 607–613.
Glavaš, G. and Vulić, I. (2021). Is supervised syntactic parsing beneficial for language understanding tasks? An empirical investigation. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Online. Association for Computational Linguistics, pp. 3090–3104.
Gritta, M., Pilehvar, M.T. and Collier, N. (2020). A pragmatic guide to geoparsing evaluation: Toponyms, named entity recognition and pragmatics. Language Resources & Evaluation 54, 683–712.
Gritta, M., Pilehvar, M.T., Limsopatham, N. and Collier, N. (2017). Vancouver welcomes you! Minimalist location metonymy resolution. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, pp. 1248–1259.
Guo, Z., Zhang, Y. and Lu, W. (2019). Attention guided graph convolutional networks for relation extraction. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, pp. 241–251.
Hobbs, J.R. and Martin, P. (1987). Local pragmatics. Technical report, SRI International Artificial Intelligence Center, Menlo Park, CA, pp. 520–523.
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation 9(8), 1735–1780.
Huang, G., Liu, Z., Van Der Maaten, L. and Weinberger, K.Q. (2017). Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708.
Janda, L.A. (2011). Metonymy in word-formation. Cognitive Linguistics 22(2), 359–392.
Jawahar, G., Sagot, B. and Seddah, D. (2019). What does BERT learn about the structure of language? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, pp. 3651–3657.
Joshi, M. and Penstein-Rosé, C. (2009). Generalizing dependency features for opinion mining. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers. Association for Computational Linguistics, pp. 313–316.
Kambhatla, N. (2004). Combining lexical, syntactic, and semantic features with maximum entropy models for extracting relations. In Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions, vol. 22. Association for Computational Linguistics.
Kamei, S.-i. and Wakao, T. (1992). Metonymy: Reassessment, survey of acceptability, and its treatment in a machine translation system. In Proceedings of the 30th Annual Meeting on Association for Computational Linguistics (ACL’92). Association for Computational Linguistics, pp. 309–311.
Kipf, T.N. and Welling, M. (2017). Semi-supervised classification with graph convolutional networks. In 5th International Conference on Learning Representations.
Kövecses, Z. and Radden, G. (1998). Metonymy: Developing a cognitive linguistic view. Cognitive Linguistics 9(1), 37–78.
Lakoff, G. (1987). Image metaphors. Metaphor and Symbol 2(3), 219–222.
Lakoff, G. (1991). Metaphor and war: The metaphor system used to justify war in the Gulf. Peace Research 23(2/3), 25–32.
Lakoff, G. (1993). The contemporary theory of metaphor. In Ortony, A. (ed), Metaphor and Thought. Cambridge, UK: Cambridge University Press, pp. 202–251.
Lakoff, G. and Johnson, M. (1980). Conceptual metaphor in everyday language. The Journal of Philosophy 77(8), 453–486.
Li, D., Wei, F., Tan, C., Tang, D. and Ke, X. (2014). Adaptive recursive neural network for target-dependent Twitter sentiment classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 49–54.
Li, H., Vasardani, M., Tomko, M. and Baldwin, T. (2020). Target word masking for location metonymy resolution. In Proceedings of the 28th International Conference on Computational Linguistics. International Committee on Computational Linguistics, pp. 3696–3707.
Lin, B.Y., Chen, X., Chen, J. and Ren, X. (2019a). KagNet: Knowledge-aware graph networks for commonsense reasoning. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, pp. 2822–2832.
Lin, C., Miller, T., Dligach, D., Bethard, S. and Savova, G. (2019b). A BERT-based universal model for both within- and cross-sentence clinical temporal relation extraction. In Proceedings of the 2nd Clinical Natural Language Processing Workshop, pp. 65–71.
Liu, Y., Wei, F., Li, S., Ji, H., Zhou, M. and Wang, H. (2015). A dependency-based neural network for relation classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Association for Computational Linguistics, pp. 285–290.
Luong, T., Pham, H. and Manning, C.D. (2015). Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal. Association for Computational Linguistics, pp. 1412–1421.
Markert, K. and Nissim, M. (2002). Metonymy resolution as a classification task. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002). Association for Computational Linguistics, pp. 204–213.
Markert, K. and Nissim, M. (2007). SemEval-2007 task 08: Metonymy resolution at SemEval-2007. In Proceedings of the 4th International Workshop on Semantic Evaluations. Association for Computational Linguistics, pp. 36–41.
Markert, K. and Nissim, M. (2009). Data and models for metonymy resolution. Language Resources & Evaluation 43, 123–138.
Mathews, K.A. and Strube, M. (2020). A large harvested corpus of location metonymy. In Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France. European Language Resources Association, pp. 5678–5687.
Mesnil, G., He, X., Deng, L. and Bengio, Y. (2013). Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding. In INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association, Lyon, France, August 25–29, 2013. International Speech Communication Association, pp. 3771–3775.
Mihaylov, T. and Frank, A. (2018). Knowledgeable reader: Enhancing cloze-style reading comprehension with external commonsense knowledge. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, pp. 821–832.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S. and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pp. 3111–3119.
Miwa, M. and Bansal, M. (2016). End-to-end relation extraction using LSTMs on sequences and tree structures. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, pp. 1105–1116.
Monteiro, B.R., Davis, C.A. and Fonseca, F. (2016). A survey on the geographic scope of textual documents. Computers & Geosciences 96, 23–34.
Nastase, V., Judea, A., Markert, K. and Strube, M. (2012). Local and global context for supervised and unsupervised metonymy resolution. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 183–193.
Nastase, V. and Strube, M. (2009). Combining collocations, lexical and encyclopedic knowledge for metonymy resolution. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 910–918.
Nastase, V. and Strube, M. (2013). Transforming Wikipedia into a large scale multilingual concept network. Artificial Intelligence 194, 62–85.
Nissim, M. and Markert, K. (2003). Syntactic features and word similarity for supervised metonymy resolution. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, pp. 56–63.
Peng, N., Poon, H., Quirk, C., Toutanova, K. and Yih, W.-t. (2017). Cross-sentence n-ary relation extraction with graph LSTMs. Transactions of the Association for Computational Linguistics 5, 101–115.
Pennington, J., Socher, R. and Manning, C.D. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 1532–1543.
Peters, M., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K. and Zettlemoyer, L. (2018). Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics, pp. 2227–2237.
Peters, M.E., Ammar, W., Bhagavatula, C. and Power, R. (2017). Semi-supervised sequence tagging with bidirectional language models. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Volume 1: Long Papers. Association for Computational Linguistics, pp. 1756–1765.
Piñango, M.M., Zhang, M., Foster-Hanson, E., Negishi, M., Lacadie, C. and Constable, R.T. (2017). Metonymy as referential dependency: Psycholinguistic and neurolinguistic arguments for a unified linguistic treatment. Cognitive Science 41(S2), 351–378.
Pustejovsky, J. (1991). The generative lexicon. Computational Linguistics 17(4), 409–441.
Qu, C., Yang, L., Qiu, M., Croft, W.B., Zhang, Y. and Iyyer, M. (2019). BERT with history answer embedding for conversational question answering. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1133–1136.
Rajpurkar, P., Zhang, J., Lopyrev, K. and Liang, P. (2016). SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 2383–2392.
Shibata, T., Kawahara, D. and Kurohashi, S. (2016). Neural network-based model for Japanese predicate argument structure analysis. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, pp. 1235–1244.
Si, C., Chen, W., Wang, W., Wang, L. and Tan, T. (2019). An attention enhanced graph convolutional LSTM network for skeleton-based action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1227–1236.
Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C.D., Ng, A.Y. and Potts, C. (2013). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 1631–1642.
Sun, C., Huang, L. and Qiu, X. (2019). Utilizing BERT for aspect-based sentiment analysis via constructing auxiliary sentence. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, pp. 380–385.
Sundermeyer, M., Schlüter, R. and Ney, H. (2012). LSTM neural networks for language modeling. In Thirteenth Annual Conference of the International Speech Communication Association. International Speech Communication Association, pp. 194–198.
Tang, G., Müller, M., Gonzales, A.R. and Sennrich, R. (2018). Why self-attention? A targeted evaluation of neural machine translation architectures. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 4263–4272.
Tian, Y., Chen, G., Song, Y. and Wan, X. (2021). Dependency-driven relation extraction with attentive graph convolutional networks. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online. Association for Computational Linguistics, pp. 4458–4471.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł. and Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems, pp. 5998–6008.
Wu, S. and He, Y. (2019). Enriching pre-trained language model with entity information for relation classification. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 2361–2364.
Xu, Y., Mou, L., Li, G., Chen, Y., Peng, H. and Jin, Z. (2015). Classifying relations via long short term memory networks along shortest dependency paths. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 1785–1794.
Yang, A., Wang, Q., Liu, J., Liu, K., Lyu, Y., Wu, H., She, Q. and Li, S. (2019). Enhancing pre-trained language representations with rich knowledge for machine reading comprehension. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, pp. 2346–2357.
Zarcone, A., Utt, J. and Padó, S. (2012). Modeling covert event retrieval in logical metonymy: Probabilistic and distributional accounts. In Proceedings of the 3rd Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2012), Montréal, Canada. Association for Computational Linguistics, pp. 70–79.
Zhang, M., Zhang, J. and Su, J. (2006). Exploring syntactic features for relation extraction using a convolution tree kernel. In Proceedings of the Human Language Technology Conference of the NAACL, Main Conference. Association for Computational Linguistics, pp. 288–295.
Zhang, Y., Qi, P. and Manning, C.D. (2018). Graph convolution over pruned dependency trees improves relation extraction. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 2205–2215.
Zhang, Z., Han, X., Liu, Z., Jiang, X., Sun, M. and Liu, Q. (2019). ERNIE: Enhanced language representation with informative entities. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, pp. 1441–1451.