
Community question topic categorization via hierarchical kernelized classification

Authors:
Wen Chan, Fudan University, Shanghai, China
Weidong Yang, Fudan University, Shanghai, China
Jinhui Tang, Nanjing University of Science and Technology, Nanjing, China
Jintao Du, Fudan University, Shanghai, China
Xiangdong Zhou, Fudan University, Shanghai, China
Wei Wang, Fudan University, Shanghai, China

CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management, October 2013, Pages 959–968
https://doi.org/10.1145/2505515.2505676

Published: 27 October 2013

ABSTRACT

We present a hierarchical kernelized classification model for automatically classifying general questions into their corresponding topic categories in community Question Answering (cQA) services. This could save much of the effort of manual classification and make it easier to browse and retrieve questions from cQA archives. To address the challenge posed by the short text of questions, we explore and optimally combine various cQA features by introducing a multiple kernel learning strategy into the hierarchical classification framework. We propose a hybrid regularization approach that combines an orthogonality constraint with L₁ sparsity in our framework, both to improve discriminative power on similar topics and to sparsify the model parameters. Experimental results on a real-world dataset from Yahoo! Answers demonstrate the effectiveness of the proposed model compared with state-of-the-art methods and strong baselines.
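
The two ingredients named in the abstract, a weighted combination of kernels over several feature views and a hybrid orthogonal-plus-L₁ regularizer over a category hierarchy, can be illustrated with a minimal Python sketch. This is not the paper's formulation, which the abstract does not give. It assumes fixed rather than learned kernel weights, explicit weight vectors rather than a kernelized dual form, and a penalty of the form |wᵢ·wₚ| between each category and its parent. The function names, the choice of linear and RBF kernels, and the `lam_orth` and `lam_l1` parameters are all hypothetical.

```python
import math

def linear_kernel(u, v):
    # Inner product of two feature vectors.
    return sum(a * b for a, b in zip(u, v))

def rbf_kernel(u, v, gamma=0.5):
    # Gaussian (RBF) kernel; gamma is an assumed bandwidth parameter.
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def combined_kernel(q1, q2, kernels, weights):
    # q1 and q2 are tuples of feature views (one vector per view, hypothetical).
    # Returns a convex combination of the per-view kernel values.
    # In multiple kernel learning the weights are learned; here they are fixed.
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one kernel weight must be positive")
    return sum((w / total) * k(x, y)
               for w, k, x, y in zip(weights, kernels, q1, q2))

def hybrid_penalty(W, parent, lam_orth=1.0, lam_l1=0.1):
    # W[i] is the weight vector of category i; parent[i] is the index of
    # its parent category, or -1 for the root.
    # Orthogonal term: pushes a child's weights away from its parent's.
    orth = sum(abs(linear_kernel(W[i], W[p]))
               for i, p in enumerate(parent) if p >= 0)
    # L1 term: encourages sparse weight vectors.
    l1 = sum(abs(x) for w in W for x in w)
    return lam_orth * orth + lam_l1 * l1
```

For example, two questions that share no features in the first view but are identical in the second get a combined similarity of 0.5 under equal weights, and a three-node hierarchy with weights `[[1, 0], [0, 2], [1, 1]]` and parents `[-1, 0, 0]` has a penalty of 1.5 with the default coefficients.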


Index Terms

1. Information systems
  1. Information retrieval


Published in
CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management
October 2013
2612 pages
ISBN: 9781450322638
DOI: 10.1145/2505515
General Chairs: Qi He (LinkedIn, USA), Arun Iyengar (IBM T.J. Watson Research Center, USA)
Program Chairs: Wolfgang Nejdl (L3S Research Center, Germany), Jian Pei (Simon Fraser University, Canada), Rajeev Rastogi (Amazon, India)
Copyright © 2013 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
kernel learning
question topic categorization
sparse orthogonal regularization
Qualifiers
- research-article
Acceptance Rates
CIKM '13 Paper Acceptance Rate: 143 of 848 submissions, 17%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%