Query-sets: using implicit feedback and query patterns to organize web documents

Authors:

Barbara Poblete,

Ricardo Baeza-Yates

WWW '08: Proceedings of the 17th international conference on World Wide Web

Pages 41 - 50

https://doi.org/10.1145/1367497.1367504

Published: 21 April 2008

Abstract

In this paper we present a new document representation model based on implicit user feedback obtained from search engine queries. The main objective of this model is to achieve better results in unsupervised tasks, such as clustering and labeling, by incorporating usage data obtained from search engine queries. This type of model allows us to discover the motivations of users when they visit a certain document. The terms used in queries can provide a better choice of features, from the user's point of view, for summarizing the Web pages that were clicked from these queries. In this work we extend and formalize, as the "query model", an existing but not widely known idea of a "query view" for document representation. Furthermore, we create a novel model based on "frequent query patterns", called the "query-set model". Our evaluation shows that both query-based models outperform the vector-space model when used for clustering and labeling documents in a website. In our experiments, the query-set model reduces the number of features needed to represent a set of documents by more than 90% and improves the quality of the results by over 90%. We believe that this can be explained by the fact that our model chooses better features and provides more accurate labels according to the user's expectations.
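The two query-based representations described in the abstract can be illustrated with a small sketch. This is not the paper's formulation, only a rough reading of the idea: the click log, the document identifiers, the function names `query_term_vectors` and `frequent_term_sets`, and the parameters `min_support` and `max_size` are all assumptions made for illustration, and the brute-force enumeration of term sets stands in for whatever frequent-pattern mining the authors actually use.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical click log: (query, clicked document) pairs. Not data from the paper.
click_log = [
    ("cheap flights santiago", "doc_a"),
    ("flights santiago", "doc_a"),
    ("santiago weather", "doc_b"),
    ("cheap flights", "doc_a"),
    ("weather forecast santiago", "doc_b"),
]


def query_term_vectors(log):
    """Query-model-style features: count the terms of the queries that
    led to a click on each document."""
    vectors = defaultdict(Counter)
    for query, doc in log:
        vectors[doc].update(query.split())
    return dict(vectors)


def frequent_term_sets(log, min_support=2, max_size=2):
    """Query-set-style features: keep only the term sets that co-occur in
    at least `min_support` of the queries clicking through to a document."""
    counts = defaultdict(Counter)
    for query, doc in log:
        terms = sorted(set(query.split()))
        for size in range(1, max_size + 1):
            for term_set in combinations(terms, size):
                counts[doc][term_set] += 1
    return {
        doc: {s: n for s, n in c.items() if n >= min_support}
        for doc, c in counts.items()
    }


if __name__ == "__main__":
    print(query_term_vectors(click_log)["doc_a"])
    print(frequent_term_sets(click_log)["doc_b"])
```

In this toy log, a term such as "forecast" that appears in only one clicked query is dropped from the set-based features, which hints at how a support threshold could shrink the feature space; the actual reduction reported in the abstract depends on the authors' data and method.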

Index Terms

Query-sets: using implicit feedback and query patterns to organize web documents
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Clustering and classification
  2. Information systems applications
    1. Data mining
      1. Clustering

Published In

WWW '08: Proceedings of the 17th international conference on World Wide Web

April 2008

1326 pages

ISBN: 9781605580852

DOI: 10.1145/1367497

General Chairs: Jinpeng Huai (Beihang University, China), Robin Chen (AT&T Labs, USA), Hsiao-Wuen Hon (Microsoft Research Asia, China), Yunhao Liu (HK University of Science and Technology, Hong Kong)

Program Chairs: Wei-Ying Ma (Microsoft Research Asia, China), Andrew Tomkins (Yahoo! Research, USA), Xiaodong Zhang (The Ohio State University, USA)

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

ACM: Association for Computing Machinery

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

WWW '08

Sponsor:

ACM

WWW '08: The 17th International World Wide Web Conference

April 21 - 25, 2008

Beijing, China

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
