
Learning Query Ambiguity Models by Using Search Logs

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Identifying ambiguous queries is crucial to research on personalized Web search and search result diversity. Intuitively, query logs contain valuable information on how many intentions users have when issuing a query. However, previous work showed that user clicks alone are misleading in judging whether a query is ambiguous. In this paper, we address the problem of learning a query ambiguity model by using search logs. First, we propose enriching a query by mining the documents clicked by users and the relevant follow-up queries in a session. Second, we use a text classifier to map the documents and the queries into predefined categories. Third, we propose extracting features from the processed data. Finally, we apply a state-of-the-art algorithm, Support Vector Machine (SVM), to learn a query ambiguity classifier. Experimental results verify that using click-based features alone or session-based features alone performs worse than the previous work based on top retrieved documents. When we combine the two sets of features, our proposed approach achieves the best effectiveness, specifically 86% in terms of accuracy. This significantly improves on the click-based method by 5.6% and on the session-based method by 4.6%.
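The third step (extracting features from the categorized clicks and session queries) can be illustrated with a minimal sketch. The abstract does not list the actual features, so the ones below (category entropy and a distinct-category count) are assumptions chosen for illustration only, and the function names and toy category labels are hypothetical. The resulting vectors would then be passed to an SVM trainer, which is not shown here.

```python
import math
from collections import Counter


def category_entropy(categories):
    """Shannon entropy (in bits) of the empirical category distribution.

    Assumption: a query whose clicked documents spread over many
    categories is more likely to be ambiguous than one whose clicks
    concentrate in a single category.
    """
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def ambiguity_features(click_categories, session_categories):
    """Hypothetical feature vector for one query.

    click_categories:   categories assigned to the clicked documents
    session_categories: categories assigned to follow-up queries in sessions
    Returns [click entropy, session entropy, number of distinct categories].
    """
    distinct = set(click_categories) | set(session_categories)
    return [
        category_entropy(click_categories),
        category_entropy(session_categories),
        float(len(distinct)),
    ]


# Toy, made-up category labels for a single query.
features = ambiguity_features(
    ["Computers", "Computers", "Animals", "Animals"],
    ["Computers", "Animals", "Cars"],
)
```

Combining the click-derived and session-derived values in one vector mirrors the abstract's finding that the two feature sets work best together; the specific features used in the paper may differ.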

Author information

Corresponding author

Correspondence to Ruihua Song.

About this article

Cite this article

Song, R., Dou, Z., Hon, HW. et al. Learning Query Ambiguity Models by Using Search Logs. J. Comput. Sci. Technol. 25, 728–738 (2010). https://doi.org/10.1007/s11390-010-9360-y

  • DOI: https://doi.org/10.1007/s11390-010-9360-y
