research-article

Improving context-aware query classification via adaptive self-training

Authors:
Minmin Chen, Washington University in Saint Louis, Saint Louis, MO, USA
Jian-Tao Sun, Microsoft Research Asia, Beijing, China
Xiaochuan Ni, Microsoft Research Asia, Beijing, China
Yixin Chen, Washington University in Saint Louis, Saint Louis, MO, USA

CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management, October 2011, pages 115–124. https://doi.org/10.1145/2063576.2063598

Published: 24 October 2011

ABSTRACT

Topical classification of user queries is critical for general-purpose web search systems. It is also a challenging task, because of the sparsity of query terms and the scarcity of labeled queries. However, most query classification systems have not fully exploited two other sources of information: the search contexts embedded in query sessions and the unlabeled queries freely available on the web. In this work, we leverage this information to improve query classification accuracy.

We first incorporate search contexts into our framework using a Conditional Random Field (CRF) model. We favor discriminative training of CRFs over traditional maximum likelihood training because of its robustness to noise. We then adapt self-training to our model to exploit the information in unlabeled queries. By investigating different confidence measures and model selection strategies, we effectively avoid the error-reinforcing nature of self-training. In extensive experiments on real search logs, our approach improves classification accuracy by around 20% on average over other state-of-the-art baselines.
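The abstract combines two ideas: a CRF that labels the queries of a session jointly, and self-training that adds confidently labeled unlabeled sessions while guarding against error reinforcement. The sketch below is a hypothetical illustration of those ideas, not the authors' implementation. It assumes a linear-chain CRF whose per-query emission scores (`emit`) and category-transition scores (`trans`) are already computed, uses the posterior probability of the Viterbi labeling as one possible confidence measure, and uses a held-out validation check as one possible model selection rule. All names (`viterbi`, `log_partition`, `sequence_confidence`, `self_train`, `fit`, `predict`, `validate`, `threshold`, `max_rounds`) are our own assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Assumed inputs (hypothetical): emit[t][y] scores category y for the t-th query
# of a session; trans[p][y] scores moving from category p to y between queries.
Matrix = List[List[float]]


def _logsumexp(xs: Sequence[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def viterbi(emit: Matrix, trans: Matrix) -> Tuple[List[int], float]:
    """Highest-scoring labeling of one session and its unnormalized score."""
    k = len(trans)
    best = list(emit[0])
    back: List[List[int]] = []
    for t in range(1, len(emit)):
        # For each current label y, pick the best previous label.
        ptr = [max(range(k), key=lambda p: best[p] + trans[p][y]) for y in range(k)]
        best = [best[ptr[y]] + trans[ptr[y]][y] + emit[t][y] for y in range(k)]
        back.append(ptr)
    last = max(range(k), key=lambda y: best[y])
    path = [last]
    for ptr in reversed(back):  # follow back-pointers to recover the sequence
        path.append(ptr[path[-1]])
    return path[::-1], best[last]


def log_partition(emit: Matrix, trans: Matrix) -> float:
    """Forward algorithm: log of the summed exp(score) over all labelings."""
    k = len(trans)
    alpha = list(emit[0])
    for t in range(1, len(emit)):
        alpha = [_logsumexp([alpha[p] + trans[p][y] for p in range(k)]) + emit[t][y]
                 for y in range(k)]
    return _logsumexp(alpha)


def sequence_confidence(emit: Matrix, trans: Matrix) -> float:
    """Assumed confidence measure: P(best labeling | session) under the CRF."""
    _, score = viterbi(emit, trans)
    return math.exp(score - log_partition(emit, trans))


def self_train(labeled: list, unlabeled: list,
               fit: Callable[[list], object],
               predict: Callable[[object, object], Tuple[List[int], float]],
               validate: Callable[[object], float],
               threshold: float = 0.9, max_rounds: int = 5):
    """Add confidently labeled sessions round by round; keep a retrained model
    only if held-out accuracy does not drop (one possible selection rule)."""
    train_set = list(labeled)  # items are (session, labels) pairs
    pool = list(unlabeled)
    model = fit(train_set)
    best_acc = validate(model)
    for _ in range(max_rounds):
        confident, rest = [], []
        for session in pool:
            labels, conf = predict(model, session)
            if conf >= threshold:
                confident.append((session, labels))
            else:
                rest.append(session)
        if not confident:
            break
        candidate = fit(train_set + confident)
        acc = validate(candidate)
        if acc < best_acc:
            break  # reject a round that hurts held-out accuracy
        train_set += confident
        pool = rest
        model, best_acc = candidate, acc
    return model, train_set


if __name__ == "__main__":
    # Toy session: 3 queries, 2 hypothetical categories; trans rewards staying put.
    emit = [[2.0, 0.5], [0.2, 0.4], [1.5, 0.1]]
    trans = [[1.0, -1.0], [-1.0, 1.0]]
    labels, _ = viterbi(emit, trans)
    print(labels, round(sequence_confidence(emit, trans), 3))
```

Keeping `self_train` generic over `fit`, `predict`, and `validate` lets the same selection logic wrap any sequence model. The `threshold` and `max_rounds` defaults are arbitrary placeholders, not values reported in the paper.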


Index Terms

1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning approaches
      1. Classification and regression trees
2. Information systems
  1. Information retrieval
  2. Information storage systems

Recommendations

Context-aware query classification
SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval

Understanding users' search intent expressed through their search queries is crucial to Web search and online advertisement. Web query classification (QC) has been widely studied for this purpose. Most previous QC algorithms classify individual queries ...
Automatic web query classification using labeled and unlabeled training data
SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval

Accurate topical categorization of user queries allows for increased effectiveness, efficiency, and revenue potential in general-purpose web search systems. Such categorization becomes critical if the system is to return results not just from a general ...
Learning with click graph for query intent classification

Topical query classification, as one step toward understanding users' search intent, is gaining increasing attention in information retrieval. Previous works on this subject primarily focused on enrichment of query features, for example, by augmenting ...

Published in
CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management
October 2011
2712 pages
ISBN:9781450307178
DOI:10.1145/2063576
Editors:
Bettina Berendt, Arjen de Vries, Wenfei Fan, Craig Macdonald (University of Glasgow, UK), Iadh Ounis (University of Glasgow, UK), Ian Ruthven (University of Strathclyde, UK)
Copyright © 2011 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
query classification
unlabeled queries
user search context
