Abstract
In this paper, we address the optimization problem for a Chinese FAQ-Finder system built on a huge collection of Question-Answer (QA) pairs. Unlike most published research, which has focused on the word mismatch problem between questions, we focus on a more fundamental problem: the ranking function, which has usually been borrowed directly, and somewhat arbitrarily, from traditional document retrieval. We propose a unified ranking function with four embedded parameters and investigate the characteristics of three different fields of a QA pair and the effects of two different Chinese word segmentation settings. Experiments on 1,000 query questions and 3.8 million QA pairs show that the unified ranking function achieves a 6.67% improvement over the TFIDF baseline. We also propose a supervised learning approach that optimizes the ranking function using 264 features, including part-of-speech and bigram co-occurrence features. Experiments show that this approach yields a further 7.06% improvement.
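The abstract does not specify the exact TFIDF baseline it compares against. As a rough illustration only, the sketch below ranks candidate questions by TF-IDF cosine similarity to a query. It assumes that both the query and the questions have already been segmented into words. The smoothed IDF formula, the function names (`tfidf_vectors`, `cosine`, `rank`), and the toy data are all hypothetical. This is not the authors' implementation, and it does not reproduce the paper's unified four-parameter ranking function.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build TF-IDF vectors for pre-segmented token lists (hypothetical baseline)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    # Assumed smoothed IDF: log((N + 1) / (df + 1)) + 1 keeps every weight positive.
    idf = {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vecs, idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank(query_tokens, questions):
    """Return question indices sorted by descending TF-IDF cosine score."""
    vecs, idf = tfidf_vectors(questions)
    # Query terms that never occur in the collection have no IDF and are skipped.
    q = {t: tf * idf[t] for t, tf in Counter(query_tokens).items() if t in idf}
    scores = [cosine(q, v) for v in vecs]
    return sorted(range(len(questions)), key=lambda i: scores[i], reverse=True)

# Toy, hypothetical FAQ questions, already word-segmented.
questions = [["如何", "修改", "密码"], ["怎么", "注册", "账号"], ["忘记", "密码", "怎么办"]]
print(rank(["修改", "密码"], questions))  # prints [0, 2, 1]
```

A real FAQ-Finder would also score the answer and other fields of each QA pair, not only the question. A tuned or learned ranking function, like the ones the abstract describes, would replace the plain cosine score.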
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Hu, G., Liu, D., Liu, Q., Wang, Rh. (2007). Supervised Learning Approach to Optimize Ranking Function for Chinese FAQ-Finder. In: Zhou, ZH., Li, H., Yang, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2007. Lecture Notes in Computer Science, vol 4426. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71701-0_55
DOI: https://doi.org/10.1007/978-3-540-71701-0_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71700-3
Online ISBN: 978-3-540-71701-0
eBook Packages: Computer Science, Computer Science (R0)