Matching, Re-Ranking and Scoring: Learning Textual Similarity by Incorporating Dependency Graph Alignment and Coverage Features

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10761)

Abstract

In this work, we introduce a supervised model for learning textual similarity that identifies and scores the similarity between a set of candidate texts and a given query text. By combining dependency graph similarity and coverage features with lexical similarity measures using neural networks, we show that the documents most relevant to a given text can be ranked and scored more accurately than when the lexical similarity measures are used in isolation. Additionally, we introduce an approximate dependency subgraph alignment approach that allows node gaps and mismatches, that is, cases where a word in one dependency graph cannot be mapped to any word in the other graph. We apply our model to two applications: re-ranking to improve document retrieval precision on a new dataset, and automatic short answer scoring on a standard dataset. Experimental results indicate that our approach is easily adaptable to different tasks and languages, and works well for long texts as well as short ones.
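
To make the notion of a node gap concrete, the following is a minimal illustrative sketch, not the alignment algorithm of the paper. It assumes that a dependency graph is given as a list of (head, relation, dependent) triples, that words are aligned only by exact string match, and that coverage is the fraction of query edges preserved in the candidate. The names `Edge`, `align_nodes` and `edge_coverage` are hypothetical and chosen for this example only.

```python
from typing import Dict, List, Optional, Set, Tuple

# Assumed representation: (head word, relation label, dependent word).
Edge = Tuple[str, str, str]


def nodes(edges: List[Edge]) -> Set[str]:
    """Collect every word that appears as a head or a dependent."""
    return {word for head, _, dep in edges for word in (head, dep)}


def align_nodes(query: List[Edge], candidate: List[Edge]) -> Dict[str, Optional[str]]:
    """Map each query word to an identical candidate word, or to None (a gap)."""
    candidate_nodes = nodes(candidate)
    return {word: (word if word in candidate_nodes else None) for word in nodes(query)}


def edge_coverage(query: List[Edge], candidate: List[Edge]) -> float:
    """Fraction of query edges whose aligned endpoints are also linked in the candidate."""
    if not query:
        return 0.0
    mapping = align_nodes(query, candidate)
    # Relation labels are ignored here; only head-dependent links are compared.
    linked = {(head, dep) for head, _, dep in candidate}
    hits = sum(
        1
        for head, _, dep in query
        if mapping[head] is not None
        and mapping[dep] is not None
        and (mapping[head], mapping[dep]) in linked
    )
    return hits / len(query)


query_graph = [("barks", "nsubj", "dog"), ("dog", "det", "the")]
candidate_graph = [
    ("barks", "nsubj", "dog"),
    ("barks", "advmod", "loudly"),
    ("dog", "det", "a"),
]
print(align_nodes(query_graph, candidate_graph))
print(edge_coverage(query_graph, candidate_graph))
```

In this toy example the word "the" has no counterpart in the candidate graph and is therefore left unaligned, so only one of the two query edges is covered. A score of this kind could serve as one input feature alongside lexical similarity measures; how the features are actually defined and combined is specified in the paper itself.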

Notes

  1. https://lucene.apache.org/

  2. http://www.spiegel.de/

  3. http://duc.nist.gov/

  4. www.jobimtext.org/jobimviz-web-demo/api-and-demo-documentation/

  5. Learning rate = 0.5, momentum = 0.2.

  6. http://web.eecs.umich.edu/~mihalcea/downloads.html#saga

  7. http://nlp.stanford.edu/software/lex-parser.shtml

  8. Learning rate = 0.3, momentum = 0.2.

Corresponding author

Correspondence to Sarah Kohail.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Kohail, S., Biemann, C. (2018). Matching, Re-Ranking and Scoring: Learning Textual Similarity by Incorporating Dependency Graph Alignment and Coverage Features. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol 10761. Springer, Cham. https://doi.org/10.1007/978-3-319-77113-7_30

  • DOI: https://doi.org/10.1007/978-3-319-77113-7_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-77112-0

  • Online ISBN: 978-3-319-77113-7

  • eBook Packages: Computer Science, Computer Science (R0)
