Q2C@UST: our winning solution to query classification in KDDCUP 2005

Published: 01 December 2005

Abstract

In this paper, we describe Q2C@UST (http://webprojectl.cs.ust.hk/q2c/), our ensemble-search based approach to the query classification task of KDDCUP 2005. The problem poses two key difficulties: first, the meaning of the queries and the semantics of the predefined categories are hard to determine; second, there are no training data for this classification problem. We apply a two-phase framework to tackle these difficulties. Phase I corresponds to the training phase of machine learning research, and phase II corresponds to the testing phase. In phase I, two kinds of base classifiers are developed: one synonym-based and the other statistics-based. Phase II consists of two stages. In the first stage, the queries are enriched: for each query, its related Web pages, together with their category information, are collected through search engines. In the second stage, the enriched queries are classified by the base classifiers trained in phase I. Based on the classification results of the base classifiers, two ensemble classifiers using two different strategies are proposed. Experimental results on the validation dataset help confirm our conjectures about the performance of the Q2C@UST system. In addition, the evaluation results given by the KDDCUP 2005 organizer confirm the effectiveness of our proposed approaches. The best F1 value of our two solutions is 9.6% higher than the best of all other participants' solutions, and the average F1 value of our two submitted solutions is 94.4% higher than the average F1 value of all other submitted solutions.
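
To make the ensemble idea and the F1 measure more concrete, the following is a minimal sketch, not the paper's actual method. The abstract does not say how the two ensemble strategies combine the base classifiers, so the weighted-sum rule, the function names `combine_scores` and `f1_score`, the `top_k` cutoff, the weights, and the category labels are all illustrative assumptions. The F1 function uses the standard definition (harmonic mean of precision and recall), applied here to a single query for simplicity.

```python
from collections import defaultdict


def combine_scores(base_outputs, weights, top_k=5):
    """Hypothetical ensemble: weighted sum of per-category scores.

    base_outputs: dict mapping classifier name -> {category: score}
    weights:      dict mapping classifier name -> weight (assumed, not from the paper)
    Returns the top_k categories by combined score.
    """
    totals = defaultdict(float)
    for name, scores in base_outputs.items():
        weight = weights.get(name, 0.0)
        for category, score in scores.items():
            totals[category] += weight * score
    # Highest combined score first; ties broken alphabetically for determinism.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [category for category, _ in ranked[:top_k]]


def f1_score(predicted, relevant):
    """Standard F1 for one query: harmonic mean of precision and recall."""
    predicted, relevant = set(predicted), set(relevant)
    hits = len(predicted & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Made-up scores for one enriched query (illustration only).
base_outputs = {
    "synonym": {"Computers/Hardware": 0.6, "Shopping/Stores": 0.2},
    "statistical": {"Computers/Hardware": 0.5, "Computers/Software": 0.4},
}
weights = {"synonym": 0.5, "statistical": 0.5}

labels = combine_scores(base_outputs, weights, top_k=2)
score = f1_score(labels, ["Computers/Hardware"])
```

With these made-up inputs, `labels` holds the two highest-scoring categories. Because only one of the two predictions is relevant, precision is 0.5 and recall is 1.0, giving an F1 of about 0.67.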

Published In

ACM SIGKDD Explorations Newsletter  Volume 7, Issue 2
December 2005
152 pages
ISSN:1931-0145
EISSN:1931-0153
DOI:10.1145/1117454

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. KDDCUP 2005
  2. ensemble learning
  3. query classification
  4. synonym-based classifier

Qualifiers

  • Article
