Continuous Topically Related Queries Grouping and Its Application on Interest Identification

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2013)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 7825))

Abstract

When a user issues a query to a search engine, the query reflects a particular interest of that user. The interest may span a short session of a few minutes or a long period of months or years. In the latter case, the user searches on the interest only from time to time, so the related queries are scattered sporadically across the search log. Identifying these topically related queries is valuable because it helps the search engine better understand the user’s interests and hence deliver better results. In this paper, we propose a method that aggregates topically related queries into interests regardless of where the queries appear in the search log. It first identifies sets of continuous topically related queries, called CTQs, and then clusters similar CTQs together to form interests. To identify CTQs accurately, we propose the Pattern-Concept-Time-Based (PCTB) method, which uses query reformulation patterns, the concepts behind the queries, and the user’s continuous searching behavior to compute the similarity between two queries. To evaluate the effectiveness of our approach, we use the AOL search log as our test dataset and develop a search middleware on top of Google to extract concepts related to the queries. Experimental results show that our method achieves high precision and recall in identifying CTQs, which in turn improves the performance of interest identification.
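The two-stage pipeline described in the abstract (split the log into runs of consecutive related queries, then cluster those runs into interests) can be illustrated with a minimal sketch. This is not the PCTB method: the abstract does not give its formula, so the similarity below is a stand-in that combines token overlap (Jaccard) with a linear time decay. All names (`Query`, `query_similarity`, `split_into_ctqs`, `cluster_ctqs`), the 30-minute window, and the 0.2 thresholds are assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Query:
    text: str
    timestamp: float  # seconds since an arbitrary start (assumed unit)


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tokens(text: str) -> set:
    return set(text.lower().split())


def query_similarity(q1: Query, q2: Query, max_gap: float = 1800.0) -> float:
    """Stand-in similarity (NOT PCTB): token overlap damped by time gap."""
    overlap = jaccard(tokens(q1.text), tokens(q2.text))
    gap = abs(q2.timestamp - q1.timestamp)
    time_factor = max(0.0, 1.0 - gap / max_gap)
    return overlap * time_factor


def split_into_ctqs(log: list, threshold: float = 0.2) -> list:
    """Stage 1: group consecutive queries into runs (CTQ-like groups).

    A new run starts whenever a query is not similar enough to the
    query immediately before it.
    """
    ctqs = []
    for q in log:
        if ctqs and query_similarity(ctqs[-1][-1], q) >= threshold:
            ctqs[-1].append(q)
        else:
            ctqs.append([q])
    return ctqs


def cluster_ctqs(ctqs: list, threshold: float = 0.2) -> list:
    """Stage 2: greedy single-pass clustering of runs into interests.

    Each run joins the first interest whose vocabulary overlaps enough,
    regardless of how far apart the runs are in the log.
    """
    interests = []
    for ctq in ctqs:
        terms = set().union(*(tokens(q.text) for q in ctq))
        for interest in interests:
            if jaccard(terms, interest["terms"]) >= threshold:
                interest["ctqs"].append(ctq)
                interest["terms"] |= terms
                break
        else:
            interests.append({"terms": set(terms), "ctqs": [ctq]})
    return interests
```

In this sketch the time factor applies only in the first stage, so a run issued much later in the log can still join an earlier interest in the second stage, which mirrors the abstract's goal of grouping queries regardless of their position in the log.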

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhao, P., Leung, K.WT., Lee, D.L. (2013). Continuous Topically Related Queries Grouping and Its Application on Interest Identification. In: Meng, W., Feng, L., Bressan, S., Winiwarter, W., Song, W. (eds) Database Systems for Advanced Applications. DASFAA 2013. Lecture Notes in Computer Science, vol 7825. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37487-6_19

  • DOI: https://doi.org/10.1007/978-3-642-37487-6_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37486-9

  • Online ISBN: 978-3-642-37487-6

  • eBook Packages: Computer Science, Computer Science (R0)
