Abstract
Finding similar questions in Community Question Answering (CQA) services plays an increasingly important role in current web and IR applications. The task aims to retrieve historical questions that are similar or relevant to new questions posed by users. However, traditional "bag-of-words" models often fail to measure the similarity between question sentences, because they usually ignore sequential and syntactic information. In this paper, we propose a novel composite kernel to improve the accuracy of question matching. Our study illustrates that the composite kernel can efficiently capture both lexical semantics and syntactic information in a question sentence by combining a word sequence kernel, a POS tag sequence kernel, and a syntactic tree kernel. Experimental results on real-world datasets show that our proposed method significantly outperforms state-of-the-art models.
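The abstract does not give the exact formulation, so the following is only a minimal sketch of how a composite kernel over the three views of a question might be assembled. It assumes a weighted sum of normalized component kernels. The component kernels are simplified stand-ins: a contiguous n-gram count replaces a full sequence kernel, and a Collins-Duffy style subtree count serves as the tree kernel. All function names, the weights, the decay value, and the hand-written POS tags and parse trees are hypothetical and chosen for illustration.

```python
from collections import Counter
from math import sqrt

def ngram_kernel(a, b, n=2):
    """Count shared contiguous n-grams (orders 1..n) of two token sequences."""
    total = 0
    for k in range(1, n + 1):
        ca = Counter(tuple(a[i:i + k]) for i in range(len(a) - k + 1))
        cb = Counter(tuple(b[i:i + k]) for i in range(len(b) - k + 1))
        total += sum(ca[g] * cb[g] for g in ca if g in cb)
    return float(total)

def nodes(tree):
    """Yield every internal node of a tree written as (label, child, ...)."""
    if isinstance(tree, tuple):
        yield tree
        for child in tree[1:]:
            yield from nodes(child)

def production(node):
    """Label of a node followed by the labels (or words) of its children."""
    return (node[0],) + tuple(c[0] if isinstance(c, tuple) else c for c in node[1:])

def common_subtrees(n1, n2, decay):
    """Decay-weighted count of common subtrees rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = decay
    for c1, c2 in zip(n1[1:], n2[1:]):
        if isinstance(c1, tuple) and isinstance(c2, tuple):
            score *= 1.0 + common_subtrees(c1, c2, decay)
    return score

def tree_kernel(t1, t2, decay=0.5):
    """Sum the common-subtree scores over all pairs of internal nodes."""
    return sum(common_subtrees(a, b, decay) for a in nodes(t1) for b in nodes(t2))

def normalized(kernel, x, y):
    """Scale a kernel so that every non-empty input has self-similarity 1."""
    denom = sqrt(kernel(x, x) * kernel(y, y))
    return kernel(x, y) / denom if denom else 0.0

def composite_kernel(q1, q2, weights=(0.4, 0.2, 0.4)):
    """Weighted sum of word, POS tag, and tree kernels (weights are assumed)."""
    w_word, w_pos, w_tree = weights
    return (w_word * normalized(ngram_kernel, q1["words"], q2["words"])
            + w_pos * normalized(ngram_kernel, q1["pos"], q2["pos"])
            + w_tree * normalized(tree_kernel, q1["tree"], q2["tree"]))

def make_question(noun):
    """Build a toy question 'how to install <noun>' with hand-written analyses."""
    return {
        "words": ["how", "to", "install", noun],
        "pos": ["WRB", "TO", "VB", "NN"],
        "tree": ("SBARQ",
                 ("WHADVP", ("WRB", "how")),
                 ("VP", ("TO", "to"),
                  ("VP", ("VB", "install"), ("NP", ("NN", noun))))),
    }

q1 = make_question("python")
q2 = make_question("java")
print(composite_kernel(q1, q2))
```

Because a non-negative weighted sum of valid kernels is itself a valid kernel, the weights can be tuned on held-out data without breaking the kernel property. In practice, the POS tags and parse trees would come from a tagger and a parser rather than being written by hand.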
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, J., Li, Z., Hu, X., Hu, B. (2010). A Novel Composite Kernel for Finding Similar Questions in CQA Services. In: Chen, L., Tang, C., Yang, J., Gao, Y. (eds) Web-Age Information Management. WAIM 2010. Lecture Notes in Computer Science, vol 6184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14246-8_59
DOI: https://doi.org/10.1007/978-3-642-14246-8_59
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14245-1
Online ISBN: 978-3-642-14246-8
eBook Packages: Computer Science, Computer Science (R0)