Automated Paraphrase Quality Assessment Using Recurrent Neural Networks and Language Models

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 12677)

Abstract

The ability to automatically assess the quality of paraphrases can be very useful for supporting literacy skills and providing timely feedback to learners. Our aim is twofold: a) to automatically evaluate the quality of paraphrases across four dimensions: lexical similarity, syntactic similarity, semantic similarity, and paraphrase quality; and b) to assess how well models trained for this task generalize. The task is modeled as a classification problem, and three different methods are explored: a) manual feature extraction combined with an Extra Trees model, b) GloVe embeddings fed into a Siamese neural network, and c) a pretrained BERT model fine-tuned on our task. Starting from a dataset of 1,998 paraphrases from the User Language Paraphrase Corpus (ULPC), we explore how the three models trained on ULPC generalize when applied to a separate, small paraphrase corpus based on children's inputs. The best out-of-the-box generalization performance is obtained by the Extra Trees model, with average F1 scores of at least 75% on the three similarity dimensions. We also show that the Siamese neural network and BERT models improve by at least 5% across all dimensions after fine-tuning.
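To make the classification setup concrete, the following is a minimal sketch of the feature-based approach. It assumes a generic hand-crafted feature set (word overlap, length ratio, character-level similarity) and toy binary labels; it illustrates the general recipe of pairing source/paraphrase features with an Extra Trees classifier, not the authors' actual features or data.

    # Minimal sketch (not the authors' pipeline): hand-crafted features for a
    # (source, paraphrase) pair fed to an Extra Trees classifier. The three
    # features and the toy labels below are illustrative assumptions only.
    from difflib import SequenceMatcher

    import numpy as np
    from sklearn.ensemble import ExtraTreesClassifier


    def pair_features(source: str, paraphrase: str) -> list:
        """Simple lexical features comparing a source sentence and its paraphrase."""
        src, par = source.lower().split(), paraphrase.lower().split()
        overlap = len(set(src) & set(par)) / max(len(set(src) | set(par)), 1)
        length_ratio = len(par) / max(len(src), 1)
        char_sim = SequenceMatcher(None, source.lower(), paraphrase.lower()).ratio()
        return [overlap, length_ratio, char_sim]


    # Toy (source, paraphrase, label) triples standing in for ULPC-style data.
    pairs = [
        ("The cat sat on the mat.", "A cat was sitting on the mat.", 1),
        ("The cat sat on the mat.", "The cat sat on the mat.", 0),  # verbatim copy
        ("Plants need sunlight to grow.", "Sunlight is required for plants to grow.", 1),
        ("Plants need sunlight to grow.", "I like dogs.", 0),       # unrelated
    ]

    X = np.array([pair_features(s, p) for s, p, _ in pairs])
    y = np.array([label for _, _, label in pairs])

    clf = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X, y)
    print(clf.predict([pair_features("Water boils at 100 degrees.",
                                     "At 100 degrees, water starts to boil.")]))

The BERT-based method can similarly be read as standard sentence-pair classification. The sketch below (model name, label count, and example pair are illustrative, and fine-tuning is omitted for brevity) shows how a source/paraphrase pair would be encoded and scored with Hugging Face Transformers.

    # Sketch of a BERT sentence-pair classifier; hyperparameters are illustrative.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased",
                                                               num_labels=2)

    # The source text and the learner's paraphrase are encoded as a single
    # [CLS] source [SEP] paraphrase [SEP] input.
    enc = tokenizer("The cat sat on the mat.",
                    "A cat was sitting on the mat.",
                    truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    print(logits.softmax(dim=-1))  # class probabilities from the (untrained) head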

Acknowledgments

The work was funded by a grant from the Romanian National Authority for Scientific Research and Innovation, CNCS – UEFISCDI, project number TE 70 PN-III-P1-1.1-TE-2019-2209, ATES – “Automated Text Evaluation and Simplification”. This research was also supported in part by the Institute of Education Sciences (R305A190063 and R305A190050) and the Office of Naval Research (N00014-17-1-2300 and N00014-19-1-2424). The opinions expressed are those of the authors and do not represent the views of the IES or ONR.

Author information

Correspondence to Mihai Dascalu.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Nicula, B., Dascalu, M., Newton, N., Orcutt, E., McNamara, D.S. (2021). Automated Paraphrase Quality Assessment Using Recurrent Neural Networks and Language Models. In: Cristea, A.I., Troussas, C. (eds) Intelligent Tutoring Systems. ITS 2021. Lecture Notes in Computer Science, vol. 12677. Springer, Cham. https://doi.org/10.1007/978-3-030-80421-3_36

  • DOI: https://doi.org/10.1007/978-3-030-80421-3_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-80420-6

  • Online ISBN: 978-3-030-80421-3

  • eBook Packages: Computer Science (R0)
