Matching, Re-Ranking and Scoring: Learning Textual Similarity by Incorporating Dependency Graph Alignment and Coverage Features

Kohail, Sarah; Biemann, Chris

doi:10.1007/978-3-319-77113-7_30

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10761)

Included in the following conference series:

International Conference on Computational Linguistics and Intelligent Text Processing

Abstract

In this work, we introduce a supervised model for learning textual similarity that identifies and scores the similarity between a given query text and each text in a set of candidates. By combining dependency graph similarity and coverage features with lexical similarity measures using neural networks, we show that the documents most relevant to a given text can be ranked and scored more accurately than when the lexical similarity measures are used in isolation. Additionally, we introduce an approximate dependency subgraph alignment approach that allows node gaps and mismatches, i.e., cases where a word in one dependency graph cannot be mapped to any word in the other graph. We apply our model to two applications: re-ranking to improve document retrieval precision on a new dataset, and automatic short answer scoring on a standard dataset. Experimental results indicate that our approach is easily adaptable to different tasks and languages, and that it works well for long texts as well as short ones.
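
The feature combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy data, and the exact-match edge overlap (a much simpler stand-in for the approximate subgraph alignment with gaps and mismatches) are all assumptions made for illustration. It only shows how a lexical score, a dependency-graph score, and a coverage score might be collected into one feature vector per query/candidate pair, which a supervised model such as a neural network could then learn to weight.

```python
import math
from collections import Counter


def lexical_cosine(query_tokens, candidate_tokens):
    """Lexical similarity: cosine between bag-of-words count vectors."""
    q, c = Counter(query_tokens), Counter(candidate_tokens)
    dot = sum(count * c[token] for token, count in q.items())
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in c.values())
    )
    return dot / norm if norm else 0.0


def edge_overlap(query_edges, candidate_edges):
    """Share of distinct query dependency edges (head, dependent) found in the candidate.

    Assumption: edges must match exactly, so this is only a crude stand-in
    for an alignment that tolerates node gaps and mismatches.
    """
    q = set(query_edges)
    return len(q & set(candidate_edges)) / len(q) if q else 0.0


def coverage(query_tokens, candidate_tokens):
    """Share of distinct query words that also occur in the candidate."""
    q = set(query_tokens)
    return len(q & set(candidate_tokens)) / len(q) if q else 0.0


def pair_features(query, candidate):
    """Feature vector [lexical, graph, coverage] for one query/candidate pair."""
    return [
        lexical_cosine(query["tokens"], candidate["tokens"]),
        edge_overlap(query["edges"], candidate["edges"]),
        coverage(query["tokens"], candidate["tokens"]),
    ]


# Hypothetical toy data; in practice the edges would come from a dependency parser.
query = {
    "tokens": ["dog", "chases", "cat"],
    "edges": [("chases", "dog"), ("chases", "cat")],
}
candidate = {
    "tokens": ["dog", "chases", "mouse"],
    "edges": [("chases", "dog"), ("chases", "mouse")],
}
print(pair_features(query, candidate))
```

Keeping the three scores as separate features, rather than averaging them by hand, leaves the weighting to the learned model, which is the point of the supervised setup the abstract describes.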

Notes

1. https://lucene.apache.org/
2. http://www.spiegel.de/
3. http://duc.nist.gov/
4. www.jobimtext.org/jobimviz-web-demo/api-and-demo-documentation/
5. Learning rate = 0.5, momentum = 0.2.
6. http://web.eecs.umich.edu/~mihalcea/downloads.html#saga
7. http://nlp.stanford.edu/software/lex-parser.shtml
8. Learning rate = 0.3, momentum = 0.2.

Author information

Authors and Affiliations

Language Technology Group, Computer Science Department, Universität Hamburg, Hamburg, Germany
Sarah Kohail & Chris Biemann

Corresponding author

Correspondence to Sarah Kohail.

Editor information

Editors and Affiliations

CIC, Instituto Politécnico Nacional, Mexico City, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Kohail, S., Biemann, C. (2018). Matching, Re-Ranking and Scoring: Learning Textual Similarity by Incorporating Dependency Graph Alignment and Coverage Features. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol 10761. Springer, Cham. https://doi.org/10.1007/978-3-319-77113-7_30

DOI: https://doi.org/10.1007/978-3-319-77113-7_30
Published: 10 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77112-0
Online ISBN: 978-3-319-77113-7
eBook Packages: Computer Science, Computer Science (R0)
