
Intent mining in search query logs for automatic search script generation

  • Regular Paper
Knowledge and Information Systems

Abstract

Capturing users’ information needs is essential to lowering the barriers to information access. This paper mines sequences of actions, called search scripts, from search query logs, which record the search experiences of a large number of users. Search scripts can be applied to guide users toward satisfying their information needs, improve the search effectiveness of retrieval systems, recommend advertisements at suitable places, and so on. Information quality, query ambiguity, topic diversity, and document relevancy are four major challenges in search script mining. In this paper, we determine the relevance of URLs for a query, adopt the Open Directory Project (ODP) categories to disambiguate queries and URLs, explore various features and clustering algorithms for intent clustering, identify critical actions from each intent cluster to form a search script, generate a natural language description for each action, and summarize a topic for each search script. Experiments show that the complete link hierarchical clustering algorithm with the features of query terms, relevant URLs, and disambiguated ODP categories performs best. Applying the intent clusters created by the best model to intent boundary identification achieves an \(F\) score of 0.6666. The intent clusters are then applied to generate search scripts.
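To make the clustering step concrete, the following is a minimal sketch of complete link hierarchical clustering over records that combine query terms, relevant URLs, and ODP categories. It is an illustration only, not the paper's implementation: the Jaccard distance, the stopping threshold, the `q:`/`u:`/`c:` feature prefixes, the function names, and the toy records are all assumptions introduced here.

```python
from itertools import combinations


def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """Return 1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def complete_link_clusters(items, threshold):
    """Agglomerative clustering with complete linkage.

    items: list of frozensets of features, one per logged search record.
    threshold: merging stops once the closest pair of clusters is farther
        apart than this value (a hypothetical stopping rule).
    Returns a sorted list of clusters, each a sorted list of item indices.
    """
    clusters = [[i] for i in range(len(items))]

    def linkage(c1, c2):
        # Complete linkage: cluster distance is the largest pairwise distance.
        return max(jaccard_distance(items[i], items[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest complete-link distance.
        (x, y), dist = min(
            (
                ((x, y), linkage(clusters[x], clusters[y]))
                for x, y in combinations(range(len(clusters)), 2)
            ),
            key=lambda pair: pair[1],
        )
        if dist > threshold:
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Toy records (hypothetical): query terms (q:), clicked URL (u:), ODP category (c:).
records = [
    frozenset({"q:cheap", "q:flights", "u:example-travel.test", "c:Recreation/Travel"}),
    frozenset({"q:flights", "q:paris", "u:example-travel.test", "c:Recreation/Travel"}),
    frozenset({"q:python", "q:tutorial", "u:example-code.test", "c:Computers/Programming"}),
    frozenset({"q:python", "q:list", "u:example-code.test", "c:Computers/Programming"}),
]

intent_clusters = complete_link_clusters(records, threshold=0.5)
print(intent_clusters)
```

With these toy records, the two travel records share a URL and a category, as do the two programming records, so a threshold of 0.5 groups them into two intent clusters. Complete linkage only merges clusters whose most dissimilar members are still close, which tends to keep clusters compact.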

Notes

  1. http://www.dmoz.org/about.html.

Acknowledgments

The research in this paper was partially supported by the National Science Council under contract 99-2221-E-002-167-MY3. We are also grateful to Microsoft Research Asia for providing the MSN Search Query Log excerpt.

Author information

Corresponding author

Correspondence to Hsin-Hsi Chen.

About this article

Cite this article

Wang, CJ., Chen, HH. Intent mining in search query logs for automatic search script generation. Knowl Inf Syst 39, 513–542 (2014). https://doi.org/10.1007/s10115-013-0620-3

  • DOI: https://doi.org/10.1007/s10115-013-0620-3
