An efficient approach to suggesting topically related web queries using hidden topic model

World Wide Web

Abstract

Keyword-based Web search is a widely used approach for locating information on the Web. However, Web users often have difficulty organizing and formulating appropriate queries because they lack sufficient domain knowledge, which greatly degrades search performance. An effective way to meet a search engine user's information needs is to suggest Web queries that are topically related to the initial query. Accurately computing query-to-query similarity scores is key to improving the quality of these suggestions. Because queries are short, traditional pseudo-relevance and implicit-relevance approaches expand the query representation before computing similarity: they explicitly use a search engine as a complementary source and directly extract additional features (such as terms or URLs) from the top-ranked or clicked search results. In this paper, we propose a novel approach that uses hidden topics as expansion features. It has two steps. In the offline model-learning step, a hidden topic model is trained, and each candidate query is re-expressed by its posterior distribution over the hidden topic space instead of by its lexical form. In the online query-suggestion step, we infer the topic distribution of the input query in the same way, calculate the similarity between the input query and each candidate query from their topic distributions, and produce a suggestion list of candidate queries ranked by these similarity scores. Experimental results on two real data sets show that hidden-topic-based suggestion is much more efficient than traditional term- or URL-based approaches and is effective at finding topically related queries for suggestion.
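The two steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. The topic-word matrix `PHI` is hypothetical, hand-written toy data standing in for a trained hidden topic model. A query's topic distribution is approximated by EM "folding-in" with the topic-word probabilities held fixed, which is a simplification of full posterior inference in a model such as LDA. Similarity is taken as one minus the Jensen-Shannon divergence, an assumed choice because the abstract does not name the measure. All function and variable names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical learned topic model: PHI[k][term] = P(term | topic k).
# Assumption: hand-written toy data in place of an offline-trained model.
PHI: List[Dict[str, float]] = [
    {"flight": 0.4, "hotel": 0.3, "cheap": 0.2, "bank": 0.05, "loan": 0.05},  # "travel"
    {"bank": 0.4, "loan": 0.3, "rate": 0.2, "cheap": 0.05, "flight": 0.05},   # "finance"
]


def infer_topic_distribution(
    terms: Sequence[str],
    phi: List[Dict[str, float]],
    iterations: int = 50,
    smoothing: float = 0.1,
    floor: float = 1e-9,
) -> List[float]:
    """Fold a query into the fixed topic space with EM (a simplification of LDA inference)."""
    k = len(phi)
    theta = [1.0 / k] * k  # start from a uniform topic mixture
    for _ in range(iterations):
        counts = [smoothing] * k  # pseudo-counts keep every topic strictly positive
        for term in terms:
            # E-step: share of this term attributed to each topic
            weights = [theta[t] * phi[t].get(term, floor) for t in range(k)]
            total = sum(weights)
            for t in range(k):
                counts[t] += weights[t] / total
        # M-step: renormalise expected counts into P(topic | query)
        norm = sum(counts)
        theta = [c / norm for c in counts]
    return theta


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in bits; entries with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits, bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return max(0.0, 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m))


def suggest(
    input_theta: Sequence[float],
    candidates: Dict[str, List[float]],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Rank candidates by similarity = 1 - JS divergence (an assumed measure)."""
    scored = [(q, 1.0 - js_divergence(input_theta, theta)) for q, theta in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Offline step: re-express each candidate query by its topic distribution.
CANDIDATE_QUERIES = ["cheap hotel", "cheap flight", "loan rate", "bank loan"]
CANDIDATES = {q: infer_topic_distribution(q.split(), PHI) for q in CANDIDATE_QUERIES}

if __name__ == "__main__":
    # Online step: infer the input query's topics, then rank the stored candidates.
    input_theta = infer_topic_distribution("flight hotel".split(), PHI)
    for query, score in suggest(input_theta, CANDIDATES, top_n=2):
        print(f"{query}: {score:.4f}")
```

With this toy data, the travel queries "cheap hotel" and "cheap flight" rank above the finance queries for the input "flight hotel". Because each query is reduced to a K-dimensional vector (K = 2 here), every comparison touches only K numbers, regardless of how many terms or URLs an expanded lexical representation would carry.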

Author information

Corresponding author

Correspondence to Lin Li.

About this article

Cite this article

Li, L., Xu, G., Yang, Z. et al. An efficient approach to suggesting topically related web queries using hidden topic model. World Wide Web 16, 273–297 (2013). https://doi.org/10.1007/s11280-011-0151-3

  • DOI: https://doi.org/10.1007/s11280-011-0151-3
