DOI: 10.1145/2872518.2890588
abstract

StarrySky: A Practical System to Track Millions of High-Precision Query Intents

Published: 11 April 2016

Abstract

Query intent mining is a critical problem in many real-world search applications, and the field has advanced dramatically in the past few years. In this paper, we present StarrySky, a practical system for identifying and inferring millions of query intents in daily sponsored search with high precision and acceptable coverage. The system is already deployed in the Sogou sponsored search engine (http://www.sogou.com), where it has delivered substantial gains. The general architecture of StarrySky consists of three stages. First, we detect millions of fine-grained query clusters, each representing a distinct query intent, from two years of click logs. Second, we refine the quality of these clusters through a series of operations and call the refined clusters concepts. Third, and most important, we build a flexible real-time inference algorithm that assigns queries to the detected concepts with high precision. Beyond describing the system, we run several experiments to evaluate its performance and flexibility. Our inference algorithm achieves up to 96% precision and 68% coverage on daily search requests. We believe StarrySky is a practical and valuable system for tracking query intents.
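
The abstract reports precision and coverage but does not define them. The sketch below shows one common reading, assumed here rather than taken from the paper: the inference step may decline to assign a concept, coverage is the share of queries that receive a concept, and precision is the share of those assignments that are correct. The function name, the use of None for "no concept assigned", and the toy labels are all hypothetical.

```python
from typing import Optional, Sequence, Tuple


def precision_and_coverage(
    predicted: Sequence[Optional[str]],
    gold: Sequence[str],
) -> Tuple[float, float]:
    """Return (precision, coverage) for a classifier that may abstain.

    A prediction of None means no concept was assigned to that query
    (an assumption of this sketch, not a detail from the paper).
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("at least one query is required")

    # Keep only the queries that actually received a concept.
    assigned = [(p, g) for p, g in zip(predicted, gold) if p is not None]

    # Coverage: fraction of all queries that were assigned a concept.
    coverage = len(assigned) / len(gold)

    # Precision: fraction of assigned queries whose concept is correct.
    correct = sum(1 for p, g in assigned if p == g)
    precision = correct / len(assigned) if assigned else 0.0
    return precision, coverage


# Hypothetical toy data: five queries, two of them left unassigned.
predicted = ["flights", None, "hotels", "flights", None]
gold = ["flights", "hotels", "hotels", "visa", "visa"]
precision, coverage = precision_and_coverage(predicted, gold)
print(f"precision={precision:.2f} coverage={coverage:.2f}")
```

Under this reading, a system can raise precision by abstaining on uncertain queries at the cost of coverage, which is consistent with the abstract's pairing of high precision with "acceptable" coverage.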

    Published In

    WWW '16 Companion: Proceedings of the 25th International Conference Companion on World Wide Web
    April 2016
    1094 pages
    ISBN: 9781450341448

    Sponsors

    • IW3C2: International World Wide Web Conference Committee

    Publisher

    International World Wide Web Conferences Steering Committee

    Republic and Canton of Geneva, Switzerland

    Author Tags

    1. community detection
    2. large-scale multi-class query classification
    3. query intent
    4. search log mining

    Qualifiers

    • Abstract

    Conference

    WWW '16: 25th International World Wide Web Conference
    Sponsor: IW3C2
    April 11 - 15, 2016
    Montréal, Québec, Canada

    Acceptance Rates

    WWW '16 Companion Paper Acceptance Rate 115 of 727 submissions, 16%;
    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Article Metrics

    • Downloads (last 12 months): 3
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 05 Mar 2025

    Cited By

    • (2024) Understanding user intent modeling for conversational recommender systems: a systematic literature review. User Modeling and User-Adapted Interaction, 34:5 (1643-1706). DOI: 10.1007/s11257-024-09398-x. Online publication date: 1-Nov-2024
    • (2018) Using Node Identifiers and Community Prior for Graph-Based Classification. Data Science and Engineering, 3:1 (68-83). DOI: 10.1007/s41019-018-0062-8. Online publication date: 16-Mar-2018
    • (2017) Tracking millions of query intents with StarrySky and its applications. Web Intelligence, 15:3 (233-250). DOI: 10.3233/WEB-170362. Online publication date: 11-Aug-2017
    • (2016) Query Splitting for Context-Driven Federated Recommendations. 2016 27th International Workshop on Database and Expert Systems Applications (DEXA) (193-197). DOI: 10.1109/DEXA.2016.049. Online publication date: Sep-2016
    • (2016) Enhanced Query Classification with Millions of Fine-Grained Topics. Web-Age Information Management (120-131). DOI: 10.1007/978-3-319-39958-4_10. Online publication date: 2-Jun-2016
