An Ensemble Similarity Model for Short Text Retrieval

Che Alhadi, Arifah; Deraman, Aziz; Abdul Jalil, Masita@Masila; Wan Yussof, Wan Nural Jawahir; Mohamed, Akashah Amin

doi:10.1007/978-3-319-62392-4_2


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10404)

Included in the following conference series:

International Conference on Computational Science and Its Applications


Abstract

The rapid growth of the World Wide Web has extended information retrieval technologies, making it easier for users to query for their information needs. One such platform is online question answering (QA). Online communities can post questions and get direct responses to their specific information needs through various platforms. This creates large, unorganized repositories of valuable knowledge. Effective QA retrieval is required to make these repositories accessible so that users' information requests are fulfilled quickly. The repositories may already contain questions and answers similar to a user's newly asked question. This paper explores similarity-based models that a QA system can use to rank candidate search results. We used the Damerau-Levenshtein distance and a cosine similarity model to obtain ranking scores between a question posted by a registered user and similar candidate questions in the repository. Empirical results indicate that our proposed ensemble models are very encouraging and give significantly better similarity values, improving search ranking results.
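The abstract names two measures, the Damerau-Levenshtein distance and cosine similarity, combined into an ensemble score for ranking candidate questions. The Python sketch below shows one plausible way such an ensemble could work; it is not the authors' implementation. Several details are assumptions made for illustration: the edit distance is the common "optimal string alignment" variant of Damerau-Levenshtein, it is normalized to a similarity in [0, 1] by dividing by the longer string's length, cosine similarity is computed over simple lowercase bag-of-words term counts, and the two scores are merged by a linear combination with a hypothetical `weight` parameter (0.5 by default). The helper names `dl_similarity`, `ensemble_score`, and `rank_candidates` are likewise hypothetical.

```python
import math
import re
from collections import Counter


def damerau_levenshtein(a: str, b: str) -> int:
    """Edit distance allowing insertions, deletions, substitutions and
    adjacent transpositions (optimal string alignment variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Adjacent transposition, e.g. "ca" -> "ac" costs 1.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def dl_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - damerau_levenshtein(a, b) / longest


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens (a simple assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between bag-of-words term-count vectors."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(count * vb[term] for term, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def ensemble_score(query: str, candidate: str, weight: float = 0.5) -> float:
    """Hypothetical ensemble: weighted average of the two similarities."""
    char_sim = dl_similarity(query.lower(), candidate.lower())
    word_sim = cosine_similarity(query, candidate)
    return weight * char_sim + (1.0 - weight) * word_sim


def rank_candidates(query: str, candidates: list[str], weight: float = 0.5):
    """Return (score, candidate) pairs sorted from most to least similar."""
    scored = [(ensemble_score(query, c, weight), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    query = "How do I reset my pasword?"
    repository = [
        "What is the capital of Malaysia?",
        "How can I reset my password?",
    ]
    for score, question in rank_candidates(query, repository):
        print(f"{score:.3f}  {question}")
```

Combining a character-level measure with a word-level one is the intuition behind this kind of ensemble: the edit distance tolerates spelling errors such as "pasword", while cosine similarity rewards shared vocabulary regardless of word order. How the paper actually weights or combines the two scores is not stated in the abstract, so `weight` would need to be tuned on real QA data.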



Author information

Authors and Affiliations

School of Informatics and Applied Mathematics, Universiti Malaysia Terengganu, 21030, Kuala Nerus, Terengganu, Malaysia
Arifah Che Alhadi, Aziz Deraman, Masita@Masila Abdul Jalil, Wan Nural Jawahir Wan Yussof & Akashah Amin Mohamed


Corresponding author

Correspondence to Arifah Che Alhadi.

Editor information

Editors and Affiliations

University of Perugia, Perugia, Italy
Osvaldo Gervasi
University of Basilicata, Potenza, Italy
Beniamino Murgante
Covenant University, Ota, Nigeria
Sanjay Misra
University of Trieste, Trieste, Italy
Giuseppe Borruso
Polytechnic University of Bari, Bari, Italy
Carmelo M. Torre
University of Minho, Braga, Portugal
Ana Maria A.C. Rocha
Monash University, Clayton, Victoria, Australia
David Taniar
Kyushu Sangyo University, Fukuoka, Japan
Bernady O. Apduhan
Saint Petersburg State University, Saint Petersburg, Russia
Elena Stankova
University of Trieste, Trieste, Italy
Alfredo Cuzzocrea


About this paper

Cite this paper

Che Alhadi, A., Deraman, A., Abdul Jalil, M., Wan Yussof, W.N.J., Mohamed, A.A. (2017). An Ensemble Similarity Model for Short Text Retrieval. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2017. ICCSA 2017. Lecture Notes in Computer Science, vol 10404. Springer, Cham. https://doi.org/10.1007/978-3-319-62392-4_2


DOI: https://doi.org/10.1007/978-3-319-62392-4_2
Published: 06 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62391-7
Online ISBN: 978-3-319-62392-4
eBook Packages: Computer Science, Computer Science (R0)
