Abstract
The rapid growth of the World Wide Web has made Information Retrieval technology, such as querying for information needs, more easily accessible. One such platform is online question answering (QA). Members of online communities can post questions and get direct responses to their specific information needs on various platforms. This creates large, unorganized repositories of valuable knowledge resources. Effective QA retrieval is required to make these repositories accessible so that users' information requests can be fulfilled quickly. The repositories may already contain questions and answers similar to a user's newly asked question. This paper explores similarity-based models for ranking search result candidates in a QA system. We used Damerau-Levenshtein distance and a cosine similarity model to obtain ranking scores between the question posted by a registered user and similar candidate questions in the repository. Experimental results indicate that our proposed ensemble models are very encouraging and give significantly better similarity values, improving search ranking results.
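The combination of an edit-distance score and a cosine score described in the abstract can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how the two scores are combined, so the weighted average, the `weight` parameter, the whitespace tokenization, the length normalization of the edit distance, and all function names are assumptions. The edit distance shown is the restricted (optimal string alignment) variant of Damerau-Levenshtein, which may differ from the variant used in the paper.

```python
import math
from collections import Counter


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def edit_similarity(a: str, b: str) -> float:
    """Map the edit distance to [0, 1]; this normalization is an assumption."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - osa_distance(a, b) / longest


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity over raw term-frequency vectors (assumed tokenization)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def ensemble_score(query: str, candidate: str, weight: float = 0.5) -> float:
    """Hypothetical ensemble: weighted average of the two similarity scores."""
    return weight * edit_similarity(query.lower(), candidate.lower()) + (
        1.0 - weight
    ) * cosine_similarity(query, candidate)


def rank(query: str, candidates: list[str], weight: float = 0.5) -> list[str]:
    """Return candidates sorted from most to least similar to the query."""
    return sorted(
        candidates, key=lambda c: ensemble_score(query, c, weight), reverse=True
    )
```

In this sketch the edit-distance term tolerates misspellings and swapped letters in a short question, while the cosine term rewards shared words regardless of their order; the balance between the two is controlled by the assumed `weight` parameter.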
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Che Alhadi, A., Deraman, A., Abdul Jalil, M., Wan Yussof, W.N.J., Mohamed, A.A. (2017). An Ensemble Similarity Model for Short Text Retrieval. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2017. ICCSA 2017. Lecture Notes in Computer Science, vol 10404. Springer, Cham. https://doi.org/10.1007/978-3-319-62392-4_2
DOI: https://doi.org/10.1007/978-3-319-62392-4_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62391-7
Online ISBN: 978-3-319-62392-4
eBook Packages: Computer Science (R0)