Q²C@UST: our winning solution to query classification in KDDCUP 2005

Authors:

Jeffrey Junfeng Pan,

Qiang Yang

ACM SIGKDD Explorations Newsletter, Volume 7, Issue 2

Pages 100 - 110

https://doi.org/10.1145/1117454.1117467

Published: 01 December 2005

Abstract

In this paper, we describe Q²C@UST (http://webprojectl.cs.ust.hk/q2c/), our ensemble-search based approach to the query classification task of KDDCUP 2005. The task poses two key difficulties: the meaning of the queries and the semantics of the predefined categories are hard to determine, and no training data are provided. We tackle these difficulties with a two-phase framework, in which phase I corresponds to the training phase of machine learning research and phase II to the testing phase. In phase I, two kinds of base classifiers are developed: one synonym-based and the other statistics-based. Phase II consists of two stages. In the first stage, each query is enriched by collecting its related Web pages, together with their category information, through search engines. In the second stage, the enriched queries are classified by the base classifiers trained in phase I. Based on the classification results of the base classifiers, two ensemble classifiers built on two different strategies are proposed. The experimental results on the validation dataset help confirm our conjectures about the performance of the Q²C@UST system. In addition, the evaluation results given by the KDDCUP 2005 organizer confirm the effectiveness of our proposed approaches. The best F1 value of our two solutions is 9.6% higher than the best of all other participants' solutions, and the average F1 value of our two submitted solutions is 94.4% higher than the average F1 value of all other submitted solutions.
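The abstract does not spell out the two ensemble strategies, so the following is only a minimal sketch of the general idea of combining base classifiers, not the authors' method. It assumes each base classifier returns a score per category for an enriched query, and it contrasts two hypothetical combination rules: equal weights, and weights chosen per classifier (for example, from validation performance). The function names, the category labels, the scores, and the cutoff of five categories per query are all illustrative assumptions.

```python
from collections import defaultdict


def combine_scores(base_scores, weights=None):
    """Weighted sum of per-category scores from several base classifiers.

    base_scores: list of dicts mapping category -> score, one per classifier.
    weights: one weight per classifier; defaults to equal weights.
    """
    if weights is None:
        weights = [1.0 / len(base_scores)] * len(base_scores)
    combined = defaultdict(float)
    for scores, weight in zip(base_scores, weights):
        for category, score in scores.items():
            combined[category] += weight * score
    return dict(combined)


def top_categories(combined, k=5):
    """Return the k highest-scoring categories (ties broken by name)."""
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return [category for category, _ in ranked[:k]]


# Hypothetical scores for a single enriched query (not from the paper).
synonym_scores = {"Computers/Hardware": 0.6, "Shopping/Buying Guides": 0.3}
statistical_scores = {"Computers/Hardware": 0.5, "Computers/Software": 0.4}

# Strategy A (assumed): every base classifier gets the same weight.
equal = combine_scores([synonym_scores, statistical_scores])

# Strategy B (assumed): weights reflect how much each classifier is trusted.
weighted = combine_scores(
    [synonym_scores, statistical_scores], weights=[0.25, 0.75]
)

print(top_categories(equal, k=2))
print(top_categories(weighted, k=2))
```

In a sketch like this, the weights in the second strategy would typically be tuned on a held-out validation set, which mirrors the abstract's use of a validation dataset to assess the system.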

Index Terms

Q²C@UST: our winning solution to query classification in KDDCUP 2005
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning approaches
      1. Classification and regression trees
2. Information systems
  1. Information retrieval
  2. Information systems applications
    1. Data mining
      1. Clustering

Published In

ACM SIGKDD Explorations Newsletter Volume 7, Issue 2

December 2005

152 pages

ISSN: 1931-0145

EISSN: 1931-0153

DOI: 10.1145/1117454

Copyright © 2005 Authors.

Publisher

Association for Computing Machinery

New York, NY, United States
