Abstract
Capturing users’ information needs is essential to lowering the barriers to information access. This paper mines sequences of actions, called search scripts, from search query logs, which record the search experiences of a large number of users. Search scripts can be applied to guide users in satisfying their information needs, to improve the effectiveness of retrieval systems, to place advertisements at suitable points, and so on. Information quality, query ambiguity, topic diversity, and document relevance are the four major challenges in search script mining. In this paper, we determine the relevance of URLs to a query, adopt Open Directory Project (ODP) categories to disambiguate queries and URLs, explore various features and clustering algorithms for intent clustering, identify critical actions in each intent cluster to form a search script, generate a natural language description for each action, and summarize a topic for each search script. Experiments show that the complete-link hierarchical clustering algorithm with query terms, relevant URLs, and disambiguated ODP categories as features performs best. Applying the intent clusters created by the best model to intent boundary identification achieves an \(F\) score of 0.6666. The intent clusters are then used to generate search scripts.
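As a rough illustration of the intent clustering step, the sketch below runs complete-link agglomerative clustering over feature sets that mix query terms, relevant URLs, and ODP categories. It is a minimal sketch under stated assumptions, not the authors' implementation: the Jaccard distance, the stopping threshold, the `term:`/`url:`/`odp:` feature prefixes, and the toy queries are all hypothetical choices made for this example.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two feature sets (assumed distance)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def complete_link_clusters(features, threshold):
    """Merge clusters bottom-up while the complete-link distance <= threshold.

    features: dict mapping a query id to its set of features.
    threshold: hypothetical cut-off; merging stops once the closest pair
               of clusters is farther apart than this value.
    """
    clusters = [[key] for key in features]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Complete link: cluster distance is the largest pairwise distance.
            d = max(
                jaccard_distance(features[x], features[y])
                for x in clusters[i]
                for y in clusters[j]
            )
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy data: each query carries its terms, a clicked URL, and an ODP category.
queries = {
    "q1": {"term:cheap", "term:flights", "url:expedia.com", "odp:Recreation/Travel"},
    "q2": {"term:flights", "term:paris", "url:expedia.com", "odp:Recreation/Travel"},
    "q3": {"term:python", "term:tutorial", "url:python.org", "odp:Computers/Programming"},
    "q4": {"term:python", "term:list", "url:python.org", "odp:Computers/Programming"},
}

intent_clusters = complete_link_clusters(queries, threshold=0.5)
print(intent_clusters)
```

With these toy inputs the two travel queries and the two programming queries share a URL and an ODP category, so each pair ends up in its own intent cluster. The quadratic search for the closest pair keeps the sketch short and is only suitable for small inputs.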






Acknowledgments
This research was partially supported by the National Science Council under contract 99-2221-E-002-167-MY3. We are also grateful to Microsoft Research Asia for providing the MSN Search Query Log excerpt.
Cite this article
Wang, CJ., Chen, HH. Intent mining in search query logs for automatic search script generation. Knowl Inf Syst 39, 513–542 (2014). https://doi.org/10.1007/s10115-013-0620-3
DOI: https://doi.org/10.1007/s10115-013-0620-3