Abstract
This paper describes a system that identifies entailment and quantifies semantic similarity in pairs of Portuguese sentences. The system builds a supervised model from a corpus and uses the same features for both tasks. Our experiments cover two types of features, contextualized embeddings and lexical features, which we evaluate separately and in combination. The model results from a voting strategy over an ensemble of distinct regressors for similarity measurement, or of calibrated classifiers for entailment detection. Applying this system to other languages depends mainly on the availability of corpora, since all features are either multilingual or language-independent. We obtain competitive results on a recent Portuguese corpus, where our best result comes from combining embeddings with lexical features.
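The design above uses one shared feature vector per sentence pair and two voting ensembles on top of it. What follows is a minimal sketch of that idea with scikit-learn, under loudly stated assumptions: the lexical features (token Jaccard overlap and length difference), the pair-combination scheme (|e1 - e2| and e1 * e2), the choice of base estimators, the `fake_embed` stand-in for contextualized embeddings, and the synthetic toy corpus are all hypothetical illustrations, not the paper's actual configuration or data.

```python
import random
import zlib

import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestRegressor, VotingClassifier, VotingRegressor
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR, LinearSVC

# Hypothetical vocabulary for a synthetic toy corpus (not real task data).
VOCAB = "o a gato cão menino menina come bebe corre dorme casa rua água pão".split()


def fake_embed(sentence, dim=8):
    """Hypothetical stand-in for a contextualized sentence embedding
    (e.g. pooled multilingual BERT output): a deterministic random vector."""
    rng = np.random.default_rng(zlib.crc32(sentence.encode("utf-8")))
    return rng.standard_normal(dim)


def lexical_features(s1, s2):
    """Hypothetical language-independent lexical features: token Jaccard
    overlap and absolute difference in token counts."""
    t1, t2 = set(s1.lower().split()), set(s2.lower().split())
    union = t1 | t2
    jaccard = len(t1 & t2) / len(union) if union else 0.0
    return [jaccard, float(abs(len(s1.split()) - len(s2.split())))]


def pair_features(s1, s2, embed=fake_embed):
    """One feature vector per pair, shared by both tasks: embedding-based
    features concatenated with lexical features."""
    e1, e2 = embed(s1), embed(s2)
    return np.concatenate([np.abs(e1 - e2), e1 * e2, lexical_features(s1, s2)])


def toy_corpus(n=90, seed=0):
    """Synthetic pairs: the second sentence swaps 0, 2 or 4 of the five words."""
    rng = random.Random(seed)
    pairs, scores, labels = [], [], []
    for i in range(n):
        k = i % 3  # hypothetical labels: k=0 near-paraphrase, k=2 mostly unrelated
        s1 = rng.sample(VOCAB, 5)
        s2 = list(s1)
        unused = [w for w in VOCAB if w not in s1]
        for j, w in enumerate(rng.sample(unused, 2 * k)):
            s2[j] = w
        pairs.append((" ".join(s1), " ".join(s2)))
        scores.append(5.0 - 2.0 * k)  # similarity score on a 1-5 scale
        labels.append(k)
    return pairs, scores, labels


# Similarity: VotingRegressor averages the predictions of distinct regressors.
similarity_model = make_pipeline(
    StandardScaler(),
    VotingRegressor(estimators=[
        ("ridge", Ridge(alpha=1.0)),
        ("svr", SVR()),
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ]),
)

# Entailment: soft voting averages the class probabilities of calibrated classifiers.
entailment_model = make_pipeline(
    StandardScaler(),
    VotingClassifier(
        estimators=[
            ("svc", CalibratedClassifierCV(LinearSVC(max_iter=10000), cv=3)),
            ("lr", CalibratedClassifierCV(LogisticRegression(max_iter=1000), cv=3)),
        ],
        voting="soft",
    ),
)

pairs, scores, labels = toy_corpus()
X = np.array([pair_features(a, b) for a, b in pairs])
similarity_model.fit(X, scores)
entailment_model.fit(X, labels)
```

Calibration matters for the entailment ensemble because soft voting averages predicted probabilities, and `CalibratedClassifierCV` gives a margin-based model such as `LinearSVC` a `predict_proba` to contribute. On real data, `X` would be built from the annotated corpus pairs and both models evaluated on a held-out split; the sketch only shows that a single feature matrix can serve both ensembles.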
Acknowledgements
This work was supported by national funds through Fundação para a Ciência e Tecnologia (FCT) with reference UID/CEC/50021/2019 and by FCT’s INCoDe 2030 initiative, in the scope of the demonstration project AIA, “Apoio Inteligente a empreendedores (chatbots)”, which also supports the scholarship of Pedro Fialho.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Fialho, P., Coheur, L., Quaresma, P. (2020). Back to the Feature, in Entailment Detection and Similarity Measurement for Portuguese. In: Quaresma, P., Vieira, R., Aluísio, S., Moniz, H., Batista, F., Gonçalves, T. (eds) Computational Processing of the Portuguese Language. PROPOR 2020. Lecture Notes in Computer Science, vol 12037. Springer, Cham. https://doi.org/10.1007/978-3-030-41505-1_16
DOI: https://doi.org/10.1007/978-3-030-41505-1_16
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41504-4
Online ISBN: 978-3-030-41505-1
eBook Packages: Computer Science, Computer Science (R0)