Research article. DOI: 10.1145/2740908.2742850

queryCategorizr: A Large-Scale Semi-Supervised System for Categorization of Web Search Queries

Published: 18 May 2015

Abstract

Understanding the interests expressed through a user's search query is a task of critical importance for many internet applications. To help identify user interests, web engines commonly classify queries into one or more pre-defined interest categories. However, the majority of queries are short, noisy texts, which makes accurate classification challenging. In this demonstration, we present queryCategorizr, a novel semi-supervised learning system that embeds queries into a low-dimensional vector space using a neural language model applied to search log sessions, and classifies them into general interest categories while relying on only a small set of labeled queries. Empirical results on large-scale data show that queryCategorizr outperforms the current state-of-the-art approaches. In addition, we describe a Graphical User Interface (GUI) that allows users to query the system and explore classification results interactively.
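
The pipeline outlined in the abstract (treat search sessions as context, derive a vector per query, then label unlabeled queries from a small seed set) can be illustrated with a minimal sketch. This is not the authors' implementation: the sessions, categories, and function names below are hypothetical, plain co-occurrence counts stand in for the neural language model, and a nearest-centroid rule with cosine similarity stands in for whatever classifier the system actually uses.

```python
import math
from collections import defaultdict

# Hypothetical toy search sessions: each session is the sequence of queries
# issued by one user, treated like a "sentence" whose "words" are whole queries.
SESSIONS = [
    ["cheap flights", "hotel deals", "car rental"],
    ["hotel deals", "cheap flights", "travel insurance"],
    ["nba scores", "nfl schedule", "soccer results"],
    ["soccer results", "nba scores", "nfl schedule"],
    ["travel insurance", "car rental", "cheap flights"],
]

# A small labeled seed set (hypothetical categories, one query per category).
LABELED = {"cheap flights": "travel", "nba scores": "sports"}


def embed_queries(sessions, window=2):
    """Stand-in for a neural language model (assumption): represent each
    query by counts of the queries seen within `window` positions of it."""
    vocab = sorted({q for s in sessions for q in s})
    index = {q: i for i, q in enumerate(vocab)}
    vectors = {q: [0.0] * len(vocab) for q in vocab}
    for session in sessions:
        for i, query in enumerate(session):
            lo, hi = max(0, i - window), min(len(session), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[query][index[session[j]]] += 1.0
    return vectors


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify(vectors, labeled):
    """Give each unlabeled query the category of its most similar centroid."""
    grouped = defaultdict(list)
    for query, category in labeled.items():
        grouped[category].append(vectors[query])
    centroids = {
        c: [sum(col) / len(vs) for col in zip(*vs)] for c, vs in grouped.items()
    }
    return {
        q: max(centroids, key=lambda c: cosine(vec, centroids[c]))
        for q, vec in vectors.items()
        if q not in labeled
    }


predictions = classify(embed_queries(SESSIONS), LABELED)
for query, category in sorted(predictions.items()):
    print(f"{query}: {category}")
```

On this toy data, queries that share session neighbors with a labeled seed query inherit its category, which is the intuition behind learning query representations from session context rather than from the query text alone.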

Cited By

  • (2017) Classification of Student Web Queries. 2017 IEEE 7th Annual Computing and Communication Workshop and Conference (CCWC), pp. 1-7. DOI: 10.1109/CCWC.2017.7868375. Online publication date: Jan 2017.

    Published In

    WWW '15 Companion: Proceedings of the 24th International Conference on World Wide Web
    May 2015
    1602 pages
    ISBN: 9781450334730
    DOI: 10.1145/2740908

    Sponsors

    • IW3C2: International World Wide Web Conference Committee

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. query categorization
    2. query embeddings
    3. word2vec

    Qualifiers

    • Research-article

    Conference

    WWW '15
    Sponsor:
    • IW3C2

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Article Metrics

    • Downloads (last 12 months): 5
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 05 Mar 2025
