DOI: 10.1145/1367497.1367504
Research article

Query-sets: using implicit feedback and query patterns to organize web documents

Published: 21 April 2008

Abstract

In this paper we present a new document representation model based on implicit user feedback obtained from search engine queries. The main objective of this model is to achieve better results in unsupervised tasks, such as clustering and labeling, by incorporating usage data obtained from search engine queries. This type of model allows us to discover the motivations of users when they visit a given document. The terms used in queries can provide a better choice of features, from the user's point of view, for summarizing the Web pages that were clicked from these queries. In this work we extend and formalize as the "query model" an existing but little-known idea, the "query view", for document representation. Furthermore, we create a novel model based on "frequent query patterns", called the "query-set model". Our evaluation shows that both query-based models outperform the vector-space model when used for clustering and labeling the documents of a website. In our experiments, the query-set model reduces the number of features needed to represent a set of documents by more than 90% and improves the quality of the results by over 90%. We believe this is because our model chooses better features and provides labels that more accurately match the user's expectations.
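To make the two representations concrete, the following is a minimal Python sketch of the general idea, not the authors' implementation. The abstract does not specify term weighting or the pattern-mining algorithm, so the click log `CLICK_LOG`, the raw-count weights, the brute-force enumeration of term sets, and the `min_support` and `max_size` parameters are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical click log: (query text, clicked document URL) pairs.
CLICK_LOG = [
    ("computer science courses", "/courses.html"),
    ("science courses", "/courses.html"),
    ("computer science faculty", "/people.html"),
    ("faculty directory", "/people.html"),
    ("computer science courses", "/courses.html"),
]


def query_model(click_log):
    """Represent each document by the terms of the queries that led to clicks on it.

    Weights are raw term counts (an assumption; the abstract gives no weighting).
    """
    vectors = defaultdict(Counter)
    for query, doc in click_log:
        vectors[doc].update(query.split())
    return vectors


def query_set_model(click_log, min_support=2, max_size=2):
    """Represent each document by term sets that occur in at least
    `min_support` clicked queries.

    Brute-force enumeration stands in for a real frequent-pattern miner.
    """
    support = Counter()            # term set -> number of queries containing it
    per_doc = defaultdict(Counter)  # document -> term set -> count
    for query, doc in click_log:
        terms = sorted(set(query.split()))
        for size in range(1, max_size + 1):
            for term_set in combinations(terms, size):
                support[term_set] += 1
                per_doc[doc][term_set] += 1
    frequent = {ts for ts, n in support.items() if n >= min_support}
    return {
        doc: {ts: n for ts, n in counts.items() if ts in frequent}
        for doc, counts in per_doc.items()
    }


if __name__ == "__main__":
    for doc, vec in query_model(CLICK_LOG).items():
        print(doc, dict(vec))
    for doc, vec in query_set_model(CLICK_LOG).items():
        print(doc, vec)
```

In this toy log, rare terms such as "directory" fall below the support threshold and are dropped, which illustrates how a pattern-based representation can use far fewer features than the full page vocabulary.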


Published In

WWW '08: Proceedings of the 17th international conference on World Wide Web
April 2008
1326 pages
ISBN:9781605580852
DOI:10.1145/1367497

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. feature selection
  2. labeling
  3. search engine queries
  4. usage mining
  5. web page organization

Qualifiers

  • Research-article

Conference

WWW '08

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Article Metrics

  • Downloads (last 12 months): 7
  • Downloads (last 6 weeks): 2
Reflects downloads up to 17 Feb 2025

Cited By

  • (2022) "Alexa, Do You Want to Build a Snowman?" Characterizing Playful Requests to Conversational Agents. Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems, pages 1-7. DOI: 10.1145/3491101.3519870. Online publication date: 27-Apr-2022
  • (2020) Learning to Cluster Documents into Workspaces Using Large Scale Activity Logs. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 2416-2424. DOI: 10.1145/3394486.3403291. Online publication date: 23-Aug-2020
  • (2019) Using Collection Shards to Study Retrieval Performance Effect Sizes. ACM Transactions on Information Systems, 37:3, pages 1-40. DOI: 10.1145/3310364. Online publication date: 19-Mar-2019
  • (2018) Developing a deeper understanding of positive customer feedback. Journal of Services Marketing, 32:2, pages 142-160. DOI: 10.1108/JSM-07-2016-0263. Online publication date: 9-Apr-2018
  • (2018) Social Search. Social Information Access, pages 213-276. DOI: 10.1007/978-3-319-90092-6_7. Online publication date: 3-May-2018
  • (2016) Query-Biased Partitioning for Selective Search. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pages 1119-1128. DOI: 10.1145/2983323.2983706. Online publication date: 24-Oct-2016
  • (2016) Learning Query and Document Relevance from a Web-scale Click Graph. Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, pages 185-194. DOI: 10.1145/2911451.2911531. Online publication date: 7-Jul-2016
  • (2016) Recommendation engine feedback session strategy for mapping user search goals (FFS: Recommendation system). 2016 International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT), pages 4572-4580. DOI: 10.1109/ICEEOT.2016.7755581. Online publication date: Mar-2016
  • (2015) Prediction of User Interests for Providing Relevant Information Using Relevance Feedback and Re-ranking. International Journal of Intelligent Information Technologies, 11:4, pages 55-71. DOI: 10.4018/IJIIT.2015100104. Online publication date: 1-Oct-2015
  • (2015) Adaptive information retrieval system based on fuzzy profiling. 2015 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pages 1-8. DOI: 10.1109/FUZZ-IEEE.2015.7338012. Online publication date: Aug-2015
