Semi-supervised Textual Entailment on Indonesian Wikipedia Data

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2018)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13396)

Abstract

Recognizing Textual Entailment (RTE) is a research area in Natural Language Processing that aims to identify whether an entailment relation holds between two texts. Textual entailment has been studied in a variety of languages, but rarely for Indonesian. The purpose of the work presented in this paper is to conduct RTE experiments on an Indonesian-language dataset. Since manual data creation is costly and time-consuming, we choose a semi-supervised machine learning approach. We apply a co-training algorithm to enlarge a small amount of annotated data, called seeds. With this method, human effort is needed only to annotate the seeds. All data resources come from Wikipedia, and the entailment pairs are extracted from its revision history. Evaluated on 1,857 sentence pairs labelled with entailment information, our approach achieves 76% accuracy.
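
The co-training idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the nearest-centroid classifier, the two numeric feature "views" per sentence pair, the margin-based confidence score, and all names (`NearestCentroid`, `co_train`, `per_round`) are assumptions chosen only to keep the example self-contained. It shows the general loop in which two classifiers, each trained on a different view of the seeds, take turns labelling the unlabelled examples they are most confident about.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Example = Tuple[Vector, Vector]  # (features for view 0, features for view 1)


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


class NearestCentroid:
    """Toy classifier: predicts the label whose centroid is closest."""

    def fit(self, X: List[Vector], y: List[int]) -> "NearestCentroid":
        self.centroids: Dict[int, List[float]] = {}
        for label in set(y):
            rows = [x for x, lab in zip(X, y) if lab == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_with_margin(self, x: Vector) -> Tuple[int, float]:
        """Return (label, confidence); confidence is the gap between the
        closest and second-closest centroid distances."""
        ranked = sorted((distance(x, c), lab) for lab, c in self.centroids.items())
        margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else 0.0
        return ranked[0][1], margin


def co_train(
    seeds: List[Example],
    seed_labels: List[int],
    unlabeled: List[Example],
    rounds: int = 5,
    per_round: int = 1,
) -> Tuple[List[Example], List[int]]:
    """Grow the labelled set: in each round, one classifier per view labels
    the `per_round` pool examples it is most confident about."""
    labeled, labels, pool = list(seeds), list(seed_labels), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        chosen: Dict[int, int] = {}  # pool index -> predicted label
        for view in (0, 1):
            clf = NearestCentroid().fit([ex[view] for ex in labeled], labels)
            scored = [(clf.predict_with_margin(ex[view]), i) for i, ex in enumerate(pool)]
            scored.sort(key=lambda item: item[0][1], reverse=True)
            for (label, _margin), i in scored[:per_round]:
                chosen.setdefault(i, label)  # first view to pick an example wins
        # Pop from the highest index down so earlier indices stay valid.
        for i in sorted(chosen, reverse=True):
            labeled.append(pool.pop(i))
            labels.append(chosen[i])
    return labeled, labels


# Hypothetical toy data: label 1 = entailment, label 0 = no entailment.
seeds = [([0.9], [0.8]), ([0.1], [0.2])]
seed_labels = [1, 0]
unlabeled = [([0.85], [0.9]), ([0.2], [0.1]), ([0.7], [0.75]), ([0.15], [0.3])]
grown, grown_labels = co_train(seeds, seed_labels, unlabeled)
```

In this sketch only the two seed examples carry human labels; the remaining four are labelled by the classifiers themselves, which mirrors the abstract's point that annotation effort is limited to the seeds.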

Notes

  1. https://dumps.wikimedia.org/

  2. https://github.com/attardi/wikiextractor

Acknowledgement

We thank Mirna Adriani for her input and comments. This research was partially supported by PITTA UI Grant Contract No. 410/UN2.R3.1/HKP.05.00/2017. The first author was also partially funded by Bukalapak.

Author information

Corresponding author

Correspondence to Rahmad Mahendra.

Copyright information

© 2023 Springer Nature Switzerland AG

About this paper

Cite this paper

Setya, K.N., Mahendra, R. (2023). Semi-supervised Textual Entailment on Indonesian Wikipedia Data. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2018. Lecture Notes in Computer Science, vol 13396. Springer, Cham. https://doi.org/10.1007/978-3-031-23793-5_34

  • DOI: https://doi.org/10.1007/978-3-031-23793-5_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23792-8

  • Online ISBN: 978-3-031-23793-5

  • eBook Packages: Computer Science, Computer Science (R0)
