Automated Paraphrase Quality Assessment Using Recurrent Neural Networks and Language Models

Nicula, Bogdan; Dascalu, Mihai; Newton, Natalie; Orcutt, Ellen; McNamara, Danielle S.

doi:10.1007/978-3-030-80421-3_36

Conference paper
First Online: 09 July 2021

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 12677)

Abstract

The ability to automatically assess the quality of paraphrases can help foster literacy skills and provide timely feedback to learners. Our aim is twofold: (a) to automatically evaluate paraphrases across four dimensions (lexical similarity, syntactic similarity, semantic similarity, and paraphrase quality), and (b) to assess how well models trained for this task generalize. The task is modeled as a classification problem, and three methods are explored: (a) manual feature extraction combined with an Extra Trees model, (b) GloVe embeddings combined with a Siamese neural network, and (c) a pretrained BERT model fine-tuned on our task. Starting from a dataset of 1998 paraphrases from the User Language Paraphrase Corpus (ULPC), we explore how the three models trained on ULPC generalize when applied to a separate, small paraphrase corpus based on children's inputs. The best out-of-the-box generalization performance is obtained by the Extra Trees model, with average F1-scores of at least 75% for the three similarity dimensions. We also show that the Siamese neural network and BERT models can improve by at least 5% across all dimensions after fine-tuning.
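As a rough illustration of what "manual feature extraction" for the lexical similarity dimension might look like, the sketch below computes two simple hand-crafted features for a source sentence and its paraphrase: Jaccard word overlap and a word-level edit similarity based on Levenshtein distance. The abstract does not list the features actually used, so the function names (`tokenize`, `levenshtein`, `lexical_features`) and the choice of features are assumptions made for illustration only.

```python
# Illustrative sketch only: these two features are assumptions, not the
# feature set used in the paper.

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,;:!?\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def lexical_features(source, paraphrase):
    """Return hypothetical lexical-similarity features, each in [0, 1]."""
    src, par = tokenize(source), tokenize(paraphrase)
    src_set, par_set = set(src), set(par)
    union = src_set | par_set
    # Share of distinct words that the two texts have in common.
    jaccard = len(src_set & par_set) / len(union) if union else 1.0
    # Word-level edit distance, normalized by the longer text.
    longest = max(len(src), len(par))
    edit_similarity = 1.0 - levenshtein(src, par) / longest if longest else 1.0
    return {"jaccard": jaccard, "edit_similarity": edit_similarity}


if __name__ == "__main__":
    print(lexical_features("The cat sat on the mat.",
                           "A cat was sitting on the mat."))
```

Feature dictionaries of this kind could then be turned into numeric vectors and passed to a tree ensemble such as scikit-learn's `ExtraTreesClassifier`, with one classifier per dimension; that pairing is likewise a sketch under the assumptions above rather than a description of the authors' pipeline.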

Acknowledgments

The work was funded by a grant of the Romanian National Authority for Scientific Research and Innovation, CNCS – UEFISCDI, project number TE 70 PN-III-P1-1.1-TE-2019-2209, ATES – “Automated Text Evaluation and Simplification”. This research was also supported in part by the Institute of Education Sciences (R305A190063 and R305A190050) and the Office of Naval Research (N00014-17-1-2300 and N00014-19-1-2424). The opinions expressed are those of the authors and do not represent views of the IES or ONR.

Author information

Authors and Affiliations

University Politehnica of Bucharest, 313 Splaiul Independentei, 060042, Bucharest, Romania
Bogdan Nicula & Mihai Dascalu
Academy of Romanian Scientists, Str. Ilfov, Nr. 3, 050044, Bucharest, Romania
Mihai Dascalu
Department of Psychology, Arizona State University, P.O. Box 871104, Tempe, AZ, 85287, USA
Natalie Newton & Danielle S. McNamara
Department of Educational Psychology, University of Minnesota, 56 East River Road, Minneapolis, MN, 55455, USA
Ellen Orcutt

Corresponding author

Correspondence to Mihai Dascalu.

Editor information

Editors and Affiliations

Department of Computer Science, Durham University, Durham, UK
Alexandra I. Cristea
University of West Attica, Aigaleo, Greece
Christos Troussas

About this paper

Cite this paper

Nicula, B., Dascalu, M., Newton, N., Orcutt, E., McNamara, D.S. (2021). Automated Paraphrase Quality Assessment Using Recurrent Neural Networks and Language Models. In: Cristea, A.I., Troussas, C. (eds) Intelligent Tutoring Systems. ITS 2021. Lecture Notes in Computer Science, vol 12677. Springer, Cham. https://doi.org/10.1007/978-3-030-80421-3_36

DOI: https://doi.org/10.1007/978-3-030-80421-3_36
Published: 09 July 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-80420-6
Online ISBN: 978-3-030-80421-3
eBook Packages: Computer Science, Computer Science (R0)
