Paraphrase Identification on the Basis of Supervised Machine Learning Techniques

Kozareva, Zornitsa; Montoyo, Andrés

doi:10.1007/11816508_52

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4139)

Included in the following conference series:

International Conference on Natural Language Processing (in Finland)

Abstract

This paper presents a machine learning approach to paraphrase identification that uses lexical and semantic similarity information. In the experimental studies, we examine the limitations of the designed attributes and the behavior of three machine learning classifiers. To improve the final performance of the system, we investigate the effect of combining lexical and semantic information, as well as techniques for classifier combination.
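
As a rough illustration of the kind of pipeline the abstract describes, the sketch below computes a few lexical similarity attributes for a sentence pair and combines the decisions of several classifiers by majority vote. It is not the authors' feature set or combination scheme: the n-gram overlap attributes, the function names (`lexical_features`, `majority_vote`), and the voting rule are all assumptions made for illustration, and semantic similarity attributes are omitted entirely.

```python
from collections import Counter

PUNCTUATION = ".,;:!?\"'()"


def tokenize(sentence):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (t.strip(PUNCTUATION).lower() for t in sentence.split())
    return [t for t in tokens if t]


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def overlap(tokens_a, tokens_b, n=1):
    """Fraction of the n-grams of A that also occur in B (counts clipped)."""
    grams_a = Counter(ngrams(tokens_a, n))
    grams_b = Counter(ngrams(tokens_b, n))
    total = sum(grams_a.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, grams_b[gram]) for gram, count in grams_a.items())
    return matched / total


def lexical_features(sentence_1, sentence_2):
    """Hypothetical lexical attributes for one sentence pair (assumed, not from the paper)."""
    a, b = tokenize(sentence_1), tokenize(sentence_2)
    return {
        "unigram_1to2": overlap(a, b, 1),
        "unigram_2to1": overlap(b, a, 1),
        "bigram_1to2": overlap(a, b, 2),
        "bigram_2to1": overlap(b, a, 2),
    }


def majority_vote(predictions):
    """Combine per-classifier labels by simple majority (one possible combination rule)."""
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    feats = lexical_features("The cat sat on the mat.", "The cat sat on a mat.")
    print(feats)
    # Labels from three hypothetical classifiers for the same pair.
    print(majority_vote([True, False, True]))
```

In a supervised setting, attribute vectors like these would be fed to each classifier, and a combination rule such as the vote above would merge their outputs; with an odd number of binary classifiers the vote never ties.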

Author information

Authors and Affiliations

Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante,
Zornitsa Kozareva & Andrés Montoyo

Editor information

Editors and Affiliations

Turku Centre for Computer Science (TUCS), Department of Information Technology, University of Turku, Joukahaisenkatu 3-5 B, FIN-20520, Turku, Finland
Tapio Salakoski
Turku Centre for Computer Science (TUCS) and Department of IT, University of Turku, Lemminkäisenkatu 14 A, 20520, Turku, Finland
Filip Ginter & Sampo Pyysalo
Department of Information Technology, University of Turku, Lemminkäisenkatu 14–18 A, FIN-20520, Turku, Finland
Tapio Pahikkala

About this paper

Cite this paper

Kozareva, Z., Montoyo, A. (2006). Paraphrase Identification on the Basis of Supervised Machine Learning Techniques. In: Salakoski, T., Ginter, F., Pyysalo, S., Pahikkala, T. (eds) Advances in Natural Language Processing. FinTAL 2006. Lecture Notes in Computer Science, vol 4139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11816508_52

DOI: https://doi.org/10.1007/11816508_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37334-6
Online ISBN: 978-3-540-37336-0
eBook Packages: Computer Science, Computer Science (R0)
