Back to the Feature, in Entailment Detection and Similarity Measurement for Portuguese

Fialho, Pedro; Coheur, Luísa; Quaresma, Paulo

doi:10.1007/978-3-030-41505-1_16

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12037)

Included in the following conference series:

International Conference on Computational Processing of the Portuguese Language

Abstract

This paper describes a system that identifies entailment and quantifies semantic similarity between pairs of Portuguese sentences. The system relies on a corpus to build a supervised model and uses the same features for both tasks. Our experiments cover two types of features, contextualized embeddings and lexical features, which we evaluate separately and in combination. The model is obtained by a voting strategy over an ensemble of distinct regressors for similarity measurement, or of calibrated classifiers for entailment detection. Applying the system to other languages depends mainly on the availability of corpora, since all features are either multilingual or language independent. We obtain competitive results on a recent Portuguese corpus, where our best result comes from combining embeddings with lexical features.
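
To make the pipeline in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes two illustrative lexical features (token Jaccard overlap and a length ratio), a precomputed sentence-pair embedding passed in as a plain list, and plain averaging as the voting strategy. The abstract does not say which features, ensemble members, or voting rule were actually used, so every function name here is hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

Features = List[float]


def lexical_features(s1: str, s2: str) -> Features:
    """Two illustrative lexical features (assumed, not taken from the paper)."""
    t1, t2 = set(s1.lower().split()), set(s2.lower().split())
    union = t1 | t2
    # Share of distinct tokens that appear in both sentences.
    jaccard = len(t1 & t2) / len(union) if union else 0.0
    longest = max(len(t1), len(t2))
    # Ratio of the smaller vocabulary size to the larger one.
    length_ratio = min(len(t1), len(t2)) / longest if longest else 0.0
    return [jaccard, length_ratio]


def combine(embedding: Sequence[float], lexical: Sequence[float]) -> Features:
    """Join embedding and lexical features into one vector, used for both tasks."""
    return list(embedding) + list(lexical)


def vote_similarity(
    regressors: Sequence[Callable[[Features], float]], x: Features
) -> float:
    """Voting over regressors, sketched here as the mean of their predictions."""
    return mean(r(x) for r in regressors)


def vote_entailment(
    classifiers: Sequence[Callable[[Features], Dict[str, float]]], x: Features
) -> str:
    """Soft voting: average the members' class probabilities, pick the largest."""
    outputs = [c(x) for c in classifiers]
    averaged = {label: mean(o[label] for o in outputs) for label in outputs[0]}
    return max(averaged, key=averaged.get)


# Toy usage with stand-in ensemble members (hypothetical, for illustration only).
x = combine([0.1, 0.2], lexical_features("o gato dorme", "o gato come"))
score = vote_similarity([lambda f: 2.0, lambda f: 4.0], x)
label = vote_entailment(
    [
        lambda f: {"Entailment": 0.6, "None": 0.4},
        lambda f: {"Entailment": 0.7, "None": 0.3},
    ],
    x,
)
```

Soft voting only makes sense when the members output comparable probabilities, which is one plausible reason for calibrating the classifiers before combining them.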

Notes

1.
https://github.com/google-research/bert.

Acknowledgements

This work was supported by national funds through Fundação para a Ciência e Tecnologia (FCT) with reference UID/CEC/50021/2019 and by FCT’s INCoDe 2030 initiative, in the scope of the demonstration project AIA, “Apoio Inteligente a empreendedores (chatbots)”, which also supports the scholarship of Pedro Fialho.

Author information

Authors and Affiliations

INESC-ID Lisboa, Lisbon, Portugal
Pedro Fialho, Luísa Coheur & Paulo Quaresma
Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal
Luísa Coheur
Universidade de Évora, Évora, Portugal
Pedro Fialho & Paulo Quaresma

Authors

Pedro Fialho
Luísa Coheur
Paulo Quaresma

Corresponding author

Correspondence to Pedro Fialho.

Editor information

Editors and Affiliations

University of Évora, Évora, Portugal
Paulo Quaresma
University of Évora, Évora, Portugal
Renata Vieira
University of São Paulo, São Carlos, Brazil
Sandra Aluísio
University of Lisbon, Lisbon, Portugal
Helena Moniz
INESC-ID/ISCTE-IUL, Lisbon, Portugal
Fernando Batista
University of Évora, Évora, Portugal
Teresa Gonçalves

About this paper

Cite this paper

Fialho, P., Coheur, L., Quaresma, P. (2020). Back to the Feature, in Entailment Detection and Similarity Measurement for Portuguese. In: Quaresma, P., Vieira, R., Aluísio, S., Moniz, H., Batista, F., Gonçalves, T. (eds) Computational Processing of the Portuguese Language. PROPOR 2020. Lecture Notes in Computer Science, vol 12037. Springer, Cham. https://doi.org/10.1007/978-3-030-41505-1_16

DOI: https://doi.org/10.1007/978-3-030-41505-1_16
Published: 24 February 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41504-4
Online ISBN: 978-3-030-41505-1
eBook Packages: Computer Science (R0)
