Abstract:
Primary question detection in online forums is a subtask of extracting question-answer pairs. In this paper, based on a survey of the forms that questions take in Chinese online forums, a combination of textual and N-gram features obtained through feature selection is adopted to help detect primary questions. Treating primary question detection as a binary classification problem, the C4.5 decision tree classifier and a support vector machine (SVM) are each used to distinguish questions from non-questions. Experimental results across multiple datasets demonstrate that the mixture of textual and N-gram features performs better than either feature type alone under both C4.5 and SVM. Computing the weight of each feature in the two classifiers shows that the top six features are the same for both, differing only slightly in order, which indicates that the combination of textual and N-gram features is universal and effective for detecting primary questions.
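The abstract does not list the concrete features, so the following is only a minimal Python sketch under stated assumptions. The interrogative-word list, the `short_post` length threshold, character bigrams as the N-gram unit, the toy posts, and every function name are hypothetical. It shows one way textual cues and N-gram indicators can be merged into a single binary feature vector, and how C4.5's gain-ratio split criterion can score each feature; it is not the authors' feature set or their feature-weighting procedure.

```python
from __future__ import annotations

import math
from collections import Counter

# Hypothetical cue list; the abstract does not name the paper's textual features.
INTERROGATIVE_WORDS = ("吗", "呢", "什么", "怎么", "如何", "为什么", "哪")


def textual_features(post: str) -> dict[str, int]:
    """Binary surface cues that often mark a question (assumed, not from the paper)."""
    return {
        "has_qmark": int("?" in post or "？" in post),  # ASCII or full-width question mark
        "has_interrogative": int(any(w in post for w in INTERROGATIVE_WORDS)),
        "short_post": int(len(post) <= 20),  # hypothetical length threshold
    }


def char_ngrams(post: str, n: int = 2) -> set[str]:
    """Character n-grams; Chinese text has no word spaces, so characters are the unit."""
    chars = [c for c in post if not c.isspace()]
    return {"".join(chars[i:i + n]) for i in range(len(chars) - n + 1)}


def combined_features(post: str, vocabulary: list[str]) -> dict[str, int]:
    """Concatenate the textual cues with binary n-gram indicators over a fixed vocabulary."""
    feats = textual_features(post)
    grams = char_ngrams(post)
    for g in vocabulary:
        feats[f"ng_{g}"] = int(g in grams)
    return feats


def entropy(labels: list[int]) -> float:
    """Shannon entropy (bits) of a non-empty label list."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(values: list[int], labels: list[int]) -> float:
    """C4.5 split criterion: information gain divided by split information."""
    total = len(labels)
    conditional = 0.0
    split_info = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        p = len(subset) / total
        conditional += p * entropy(subset)
        split_info -= p * math.log2(p)
    if split_info == 0.0:  # constant feature: it cannot split the data
        return 0.0
    return (entropy(labels) - conditional) / split_info


def rank_features(
    posts: list[str], labels: list[int], vocabulary: list[str]
) -> list[tuple[str, float]]:
    """Score every combined feature by gain ratio, highest first."""
    rows = [combined_features(p, vocabulary) for p in posts]
    scores = {name: gain_ratio([r[name] for r in rows], labels) for name in rows[0]}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data invented for illustration: 1 = question, 0 = non-question.
posts = ["请问这个软件怎么安装？", "谢谢楼主分享", "有人知道为什么会报错吗", "今天天气不错"]
labels = [1, 0, 1, 0]
vocabulary = sorted(set().union(*(char_ngrams(p) for p in posts)))

for name, score in rank_features(posts, labels, vocabulary)[:3]:
    print(f"{name}: {score:.3f}")
```

On this toy data, `has_interrogative` separates the two classes perfectly and so ranks first with a gain ratio of 1.0, while `short_post` is constant across all posts and scores 0. In a real setup the N-gram vocabulary would be pruned by feature selection before training, as the abstract describes, rather than taken whole from the training posts.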
Date of Conference: 10-12 August 2010
Date Added to IEEE Xplore: 09 September 2010