ABSTRACT
We present a hierarchical kernelized classification model that automatically classifies general questions into their corresponding topic categories in community Question Answering (cQA) services. Automatic classification could save much of the effort of manual classification and make it easier to browse and retrieve questions from cQA archives. To address the challenge posed by the short text of questions, we explore various cQA features and combine them optimally by introducing a multiple kernel learning strategy into the hierarchical classification framework. We propose a hybrid regularization approach that combines an orthogonal constraint with L1 sparseness, both to increase discriminative power on similar topics and to sparsify the model parameters. Experimental results on a real-world dataset from Yahoo! Answers demonstrate the effectiveness of the proposed model compared with state-of-the-art methods and strong baselines.
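The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the paper's actual formulation: it assumes that multiple kernel learning reduces to a non-negative weighted sum of per-feature kernel matrices, and that the hybrid regularizer adds an L1 term to an orthogonality term of the form |w_a · w_b| over pairs of related topic nodes. The names `combine_kernels`, `hybrid_penalty`, `related_pairs`, `l1_strength`, and `orth_strength` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Matrix = List[List[float]]


def combine_kernels(kernels: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Weighted sum of same-sized kernel (Gram) matrices, one per feature type.

    Assumption: each cQA feature type contributes one kernel matrix and the
    combination weights are non-negative, so the sum is still a valid kernel.
    """
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    n = len(kernels[0])
    combined = [[0.0] * n for _ in range(n)]
    for kernel, weight in zip(kernels, weights):
        for i in range(n):
            for j in range(n):
                combined[i][j] += weight * kernel[i][j]
    return combined


def dot(u: Vector, v: Vector) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def hybrid_penalty(
    node_weights: Dict[str, Vector],
    related_pairs: Sequence[Tuple[str, str]],
    l1_strength: float,
    orth_strength: float,
) -> float:
    """L1 sparsity term plus an orthogonality term over related topic nodes.

    Assumption: `related_pairs` lists the node pairs (for example a parent and
    its child, or two similar topics) whose weight vectors should be pushed
    toward orthogonality; the penalty for a pair is |w_a . w_b|.
    """
    l1_term = sum(abs(x) for w in node_weights.values() for x in w)
    orth_term = sum(
        abs(dot(node_weights[a], node_weights[b])) for a, b in related_pairs
    )
    return l1_strength * l1_term + orth_strength * orth_term
```

In this sketch the orthogonality term is zero when the weight vectors of two related topics share no direction, which is one way to read "discriminative power on similar topics"; the L1 term drives individual weights to zero.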
Index Terms
- Community question topic categorization via hierarchical kernelized classification