ABSTRACT
Web usage mining is the application of data mining techniques to the data generated by the interactions of users with web servers. This kind of data, stored in server logs, represents a valuable source of information, which can be exploited to optimize the document-retrieval task, or to better understand, and thus satisfy, user needs.
Our research focuses on two important issues: improving search-engine performance through static caching of search results, and helping users to find interesting web pages by recommending news articles and blog posts.
Concerning the static caching of search results, we present the query covering approach. The general idea is to populate the cache with those documents that contribute to the result pages of a large number of queries, as opposed to caching the top documents of most frequent queries.
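The contrast between caching by coverage and caching by query frequency can be illustrated with a small greedy selection. This is only a sketch under simplifying assumptions, not the method of the thesis: a query counts as covered once any one of its result documents is cached, and the names `greedy_query_cover`, `query_results`, and `query_freq` are hypothetical.

```python
from collections import defaultdict


def greedy_query_cover(query_results, query_freq, cache_size):
    """Greedily pick up to cache_size documents; each step takes the document
    that covers the largest not-yet-covered query frequency.

    Assumption for this sketch: a query is covered as soon as one of its
    result documents is in the cache.
    """
    # Invert the mapping: document -> set of queries whose results include it.
    doc_to_queries = defaultdict(set)
    for query, docs in query_results.items():
        for doc in docs:
            doc_to_queries[doc].add(query)

    covered = set()
    cache = []
    for _ in range(cache_size):
        best_doc, best_gain = None, 0
        for doc in sorted(doc_to_queries):  # sorted for deterministic tie-breaking
            if doc in cache:
                continue
            uncovered = doc_to_queries[doc] - covered
            gain = sum(query_freq.get(q, 0) for q in uncovered)
            if gain > best_gain:
                best_doc, best_gain = doc, gain
        if best_doc is None:  # nothing left that adds coverage
            break
        cache.append(best_doc)
        covered |= doc_to_queries[best_doc]
    return cache, covered


# Toy query log (made-up data for illustration only).
query_results = {
    "python tutorial": ["d1", "d2"],
    "learn python": ["d1", "d3"],
    "python docs": ["d1", "d4"],
    "weather rome": ["d5"],
}
query_freq = {
    "python tutorial": 3,
    "learn python": 2,
    "python docs": 2,
    "weather rome": 5,
}

cache, covered = greedy_query_cover(query_results, query_freq, cache_size=2)
print(cache)
```

In this toy log the single most frequent query is "weather rome", yet the first document selected is `d1`, because it appears in the results of three queries whose combined frequency is higher.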
For the recommendation of web pages, we present a graph-based approach, which leverages the user-browsing logs to identify early adopters. These users discover interesting content before others, and by monitoring their activity, we can find web pages to recommend.
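One way to picture the idea of early adopters is a directed graph built from a browsing log, where an edge from user u to user v counts the pages that u visited before v. The sketch below is a simplified illustration under that assumption and is not the construction used in the thesis; the function `early_adopter_scores` and the `(user, url, timestamp)` log format are hypothetical.

```python
from collections import defaultdict


def early_adopter_scores(visits):
    """visits: iterable of (user, url, timestamp) tuples.

    Returns (scores, edge_weight), where edge_weight[(u, v)] is the number of
    pages u first visited strictly before v did, and scores[u] is the sum of
    the weights on u's outgoing edges.
    """
    # Earliest visit time of each user on each page.
    first_visit = defaultdict(dict)  # url -> {user: earliest timestamp}
    for user, url, ts in visits:
        prev = first_visit[url].get(user)
        if prev is None or ts < prev:
            first_visit[url][user] = ts

    # Add one unit of weight to (u, v) for every page u reached before v.
    edge_weight = defaultdict(int)
    for by_user in first_visit.values():
        for u, tu in by_user.items():
            for v, tv in by_user.items():
                if tu < tv:
                    edge_weight[(u, v)] += 1

    scores = defaultdict(int)
    for (u, _v), weight in edge_weight.items():
        scores[u] += weight
    return dict(scores), dict(edge_weight)


# Toy browsing log (made-up data for illustration only).
visits = [
    ("alice", "p1", 1),
    ("bob", "p1", 5),
    ("carol", "p1", 9),
    ("alice", "p2", 2),
    ("carol", "p2", 4),
    ("bob", "p3", 3),
]

scores, edges = early_adopter_scores(visits)
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

Users with high scores under this simplified measure would be the ones to monitor, and pages they have visited recently would become candidates for recommendation.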
Index Terms
- Web usage mining for enhancing search-result delivery and helping users to find interesting web content