
Paraphrase Identification on the Basis of Supervised Machine Learning Techniques

  • Conference paper
Advances in Natural Language Processing (FinTAL 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4139)

Abstract

This paper presents a machine learning approach to paraphrase identification that uses lexical and semantic similarity information. In the experimental studies, we examine the limitations of the designed attributes and the behavior of three machine learning classifiers. To increase the final performance of the system, we study the influence of combining lexical and semantic information, as well as techniques for classifier combination.
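The abstract does not list the exact attributes or the combination scheme. The sketch below only illustrates the general idea under stated assumptions: hypothetical lexical similarity attributes (clipped unigram and bigram overlap between the two sentences, computed in both directions) and a simple majority vote over the binary decisions of three classifiers. The function names `ngram_overlap`, `lexical_features` and `majority_vote` are illustrative, not taken from the paper.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of contiguous n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_overlap(s1, s2, n=1):
    """Hypothetical attribute: share of s1's n-grams that also occur in s2.

    Counts are clipped, so a repeated n-gram in s1 is only matched as often
    as it appears in s2. Tokenization is naive lowercase whitespace splitting.
    """
    a = Counter(ngrams(s1.lower().split(), n))
    b = Counter(ngrams(s2.lower().split(), n))
    if not a:
        return 0.0
    matched = sum(min(count, b[g]) for g, count in a.items())
    return matched / sum(a.values())


def lexical_features(s1, s2):
    """Assumed feature vector: unigram and bigram overlap in both directions."""
    return [
        ngram_overlap(s1, s2, 1),
        ngram_overlap(s2, s1, 1),
        ngram_overlap(s1, s2, 2),
        ngram_overlap(s2, s1, 2),
    ]


def majority_vote(predictions):
    """Combine labels from several classifiers by taking the most frequent one.

    With three classifiers and two labels (paraphrase / not paraphrase),
    a tie cannot occur.
    """
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    pair = ("the cat sat on the mat", "the cat lay on the mat")
    print(lexical_features(*pair))
    print(majority_vote([1, 0, 1]))
```

Each sentence pair becomes one feature vector that any supervised classifier can consume; semantic similarity attributes (for example, WordNet-based word similarity) would be appended to the same vector in a fuller system.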

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kozareva, Z., Montoyo, A. (2006). Paraphrase Identification on the Basis of Supervised Machine Learning Techniques. In: Salakoski, T., Ginter, F., Pyysalo, S., Pahikkala, T. (eds) Advances in Natural Language Processing. FinTAL 2006. Lecture Notes in Computer Science, vol 4139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11816508_52

  • DOI: https://doi.org/10.1007/11816508_52

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-37334-6

  • Online ISBN: 978-3-540-37336-0

  • eBook Packages: Computer Science, Computer Science (R0)
