Back to the Feature, in Entailment Detection and Similarity Measurement for Portuguese

  • Conference paper
Computational Processing of the Portuguese Language (PROPOR 2020)

Abstract

This paper describes a system that identifies entailment and quantifies semantic similarity between pairs of Portuguese sentences. The system relies on a corpus to build a supervised model and employs the same features regardless of the task. Our experiments cover two types of features, contextualized embeddings and lexical features, which we evaluate separately and in combination. The model is derived from a voting strategy over an ensemble of distinct regressors, for similarity measurement, or calibrated classifiers, for entailment detection. Applying such a system to other languages depends mainly on the availability of corpora, since all features are either multilingual or language independent. We obtain competitive results on a recent Portuguese corpus, where our best result comes from joining embeddings with lexical features.
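
The feature combination and voting strategy described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how features are joined or how votes are combined, so this assumes concatenation for the former and plain averaging for the latter. The function names, the Jaccard overlap feature, the toy embedding values, the stand-in models and the label names are all hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def jaccard(a: str, b: str) -> float:
    """Token-overlap ratio: one example of a language-independent lexical feature."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def build_features(a: str, b: str, pair_embedding: Sequence[float]) -> Vector:
    """Join lexical features with a sentence-pair embedding (assumed: concatenation)."""
    return [jaccard(a, b)] + list(pair_embedding)


def vote_regression(regressors: Sequence[Callable[[Vector], float]], x: Vector) -> float:
    """Similarity measurement: average the scores predicted by all regressors."""
    return mean(r(x) for r in regressors)


def vote_classification(
    classifiers: Sequence[Callable[[Vector], Dict[str, float]]], x: Vector
) -> str:
    """Entailment detection: average calibrated class probabilities, return the top label."""
    probs = [c(x) for c in classifiers]
    avg = {label: mean(p[label] for p in probs) for label in probs[0]}
    return max(avg, key=avg.get)


# Toy stand-ins for trained models (hypothetical; real models would be fit on a corpus).
x = build_features("o gato dorme", "o gato dorme no sofá", [0.2, -0.1])
regressors = [lambda v: 5 * v[0], lambda v: 4 * v[0] + 0.5]
classifiers = [
    lambda v: {"Entailment": 0.7, "None": 0.3},
    lambda v: {"Entailment": 0.4, "None": 0.6},
]
score = vote_regression(regressors, x)
label = vote_classification(classifiers, x)
```

Averaging probabilities rather than counting hard votes is only meaningful when each classifier's probabilities are comparable, which is one reason calibration matters in an ensemble of this kind.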

Notes

  1. https://github.com/google-research/bert

Acknowledgements

This work was supported by national funds through Fundação para a Ciência e Tecnologia (FCT) with reference UID/CEC/50021/2019 and by FCT’s INCoDe 2030 initiative, in the scope of the demonstration project AIA, “Apoio Inteligente a empreendedores (chatbots)”, which also supports the scholarship of Pedro Fialho.

Corresponding author

Correspondence to Pedro Fialho.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Fialho, P., Coheur, L., Quaresma, P. (2020). Back to the Feature, in Entailment Detection and Similarity Measurement for Portuguese. In: Quaresma, P., Vieira, R., Aluísio, S., Moniz, H., Batista, F., Gonçalves, T. (eds) Computational Processing of the Portuguese Language. PROPOR 2020. Lecture Notes in Computer Science, vol 12037. Springer, Cham. https://doi.org/10.1007/978-3-030-41505-1_16

  • DOI: https://doi.org/10.1007/978-3-030-41505-1_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-41504-4

  • Online ISBN: 978-3-030-41505-1

  • eBook Packages: Computer Science, Computer Science (R0)
