Abstract
Ensembling well-performing models has been shown to outperform individual models in the semantic textual similarity task; however, employing existing models remains a challenge. In this paper, we tackle this issue by providing a service-oriented system to index a text similarity model using RESTful services. We also propose a baseline approach, based on an effective penalty-award weighting schema and word-level edit distance, in which pairs of sentences are divided into two main categories based on the number of substitutions, insertions, and deletions required to convert the first sentence into the second one. We argue that, when the word-level edit distance is very small, it is wiser to measure dissimilarity than similarity. Using knowledge bases along with common natural language processing tools, the proposed method tries to enhance the accuracy of measuring similarity between two sentences. We compared the proposed method with existing approaches and found that it produces promising results. Our source code is freely available on GitLab.
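To make the word-level edit distance concrete, the following is a minimal sketch rather than the authors' implementation. It counts the word substitutions, insertions, and deletions needed to turn one token list into another, then splits sentence pairs into two groups. The function names, the whitespace tokenization, the category labels, and the threshold of 2 are all illustrative assumptions; the abstract does not state how the boundary between the two categories is chosen, and the penalty-award weighting schema is not reproduced here.

```python
from typing import List


def word_edit_distance(first: List[str], second: List[str]) -> int:
    """Minimum number of word substitutions, insertions, and deletions
    needed to convert `first` into `second` (Levenshtein over tokens)."""
    # previous[j] holds the distance between the processed prefix of
    # `first` and the first j words of `second`.
    previous = list(range(len(second) + 1))
    for i, source_word in enumerate(first, start=1):
        current = [i]  # deleting i words yields the empty prefix
        for j, target_word in enumerate(second, start=1):
            substitution = previous[j - 1] + (source_word != target_word)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def categorize_pair(sentence_a: str, sentence_b: str, threshold: int = 2) -> str:
    """Assign a sentence pair to one of two categories by edit distance.

    The threshold and the labels are hypothetical, chosen only for
    illustration; lowercasing and whitespace splitting stand in for a
    real tokenizer.
    """
    distance = word_edit_distance(sentence_a.lower().split(),
                                  sentence_b.lower().split())
    return "near-identical" if distance <= threshold else "divergent"
```

Under this sketch, a pair such as "a man is playing a guitar" and "a man is playing a flute" has a distance of 1 and falls into the small-distance category, where the abstract suggests that measuring dissimilarity (here, the single differing word) is the more informative signal.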
Notes
- 1. International Workshop on Semantic Evaluation.
- 3. This means that only word similarity Types 1 to 5 contribute positively to the semantic similarity of two sentences.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Fakouri-Kapourchali, R., Yaghoub-Zadeh-Fard, MA., Khalili, M. (2018). Semantic Textual Similarity as a Service. In: Beheshti, A., Hashmi, M., Dong, H., Zhang, W. (eds) Service Research and Innovation. ASSRI 2015, ASSRI 2017. Lecture Notes in Business Information Processing, vol 234. Springer, Cham. https://doi.org/10.1007/978-3-319-76587-7_14
DOI: https://doi.org/10.1007/978-3-319-76587-7_14
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-76586-0
Online ISBN: 978-3-319-76587-7
eBook Packages: Computer Science, Computer Science (R0)