Methods for web revisitation prediction: survey and experimentation

User Modeling and User-Adapted Interaction

Abstract

More than 45% of the pages that we visit on the Web are pages that we have visited before. Browsers support revisits with various tools, including bookmarks, history views and URL auto-completion. However, these tools only support revisits to a small number of frequently and recently visited pages. Several browser plugins and extensions have been proposed to better support the long tail of less frequently visited pages, using recommendation and prediction techniques. In this article, we present a systematic overview of revisitation prediction techniques, dividing them into two main types and several subtypes. We also explain how the individual prediction techniques can be combined into comprehensive revisitation workflows that achieve higher accuracy. We investigate the performance of the most important workflows and provide a statistical analysis of the factors that affect their predictive accuracy. Further, we provide an upper bound for the accuracy of revisitation prediction using an ‘oracle’ that discards non-revisited pages.

Notes

  1. See http://webhistoryproject.blogspot.com.

  2. http://www.mozilla.com/en-US/firefox.

  3. https://www.google.com/intl/en/chrome/browser.

  4. http://www.delicious.com.

  5. http://infoaxe.com.

  6. http://www.webmynd.com.

  7. As explained in Sect. 4.1.4, this is ensured through normalization.

  8. As explained in Sect. 4.1.4, this is ensured through normalization.

  9. As explained in Sect. 4.1.4, this is ensured through normalization.

  10. For more details, see https://developer.mozilla.org/en/The_Places_frecency_algorithm.

  11. A temporal unit is measured in milliseconds and expresses any time interval, ranging from seconds, minutes and hours to days, weeks and months.

  12. SUPRA stands for “SUrfing PRediction FrAmework”. The code is publicly available at http://sourceforge.net/projects/supraproject.

  13. http://sourceforge.net/projects/supraproject.

  14. http://jung.sourceforge.net.

Author information

Corresponding author

Correspondence to Eelco Herder.

Appendix: Notations and Acronyms

In the following, we summarize the symbols used in this work:

  • 5HR \(\rightarrow \) the 500-requests drift method

  • AM \(\rightarrow \) an association matrix (order-neutral propagation method)

  • AR \(\rightarrow \) association rules

  • CTM \(\rightarrow \) the continuous connectivity transition matrix (propagation method)

  • DM \(\rightarrow \) the day-model drift method

  • DTM \(\rightarrow \) the decreasing continuous connectivity transition matrix (propagation method)

  • ED \(\rightarrow \) the exponential decay ranking method

  • FR \(\rightarrow \) the Frecency ranking method

  • HDM \(\rightarrow \) the hybrid day model (ranking method)

  • HHM \(\rightarrow \) the hybrid hour model (ranking method)

  • HQM \(\rightarrow \) the hybrid day quarter model (ranking method)

  • \({\mathbf {I}}_{\mathbf{p}_{\mathbf{i}}}\) \(\rightarrow \) the request indices of a page \(p_i\)

  • ITM \(\rightarrow \) the increasing continuous connectivity transition matrix (propagation method)

  • LD \(\rightarrow \) the logarithmic decay ranking method

  • LRU \(\rightarrow \) the last recently used ranking method

  • M \(\rightarrow \) the propagation matrix

  • MFU \(\rightarrow \) the most frequently used ranking method

  • MM \(\rightarrow \) the month-model drift method

  • \({\mathbf {P}}\) \(\rightarrow \) a set of Web pages

  • P \(\rightarrow \) the category of one-step workflows that consist solely of a propagation method

  • P+D \(\rightarrow \) the category of two-step workflows that combine a propagation method with a drift method

  • PD \(\rightarrow \) the polynomial decay ranking method

  • \({\mathbf {R}}\) \(\rightarrow \) a set of page requests corresponding to the navigational activity of a user

  • R \(\rightarrow \) the category of one-step workflows that consist solely of a ranking method

  • R+P \(\rightarrow \) the category of two-step workflows that combine a ranking method with a propagation method

  • R+P+D \(\rightarrow \) the category of three-step workflows that combine a ranking method with a propagation and a drift method

  • \({\mathbf {S}}\) \(\rightarrow \) a session

  • STM \(\rightarrow \) the simple connectivity transition matrix (propagation method)

  • \({\mathbf {T}}_{\mathbf{p}_\mathbf{i}}\) \(\rightarrow \) the request timestamps of a page \(p_i\)

  • TM \(\rightarrow \) a transition matrix (order-preserving propagation method)

  • TR \(\rightarrow \) the 1000-requests drift method

  • WM \(\rightarrow \) the week-model drift method
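To make the notation above more concrete, the following minimal sketch shows how a ranking method (R), a propagation method (P) and a two-step R+P workflow could fit together. It is an illustration under stated assumptions, not the article's implementation: the function names, the toy request list and, in particular, the multiplicative rule used to combine ranking scores with transition counts are all hypothetical, and the recency score is only one plausible reading of the LRU ranking method.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def mfu_scores(requests: List[str]) -> Dict[str, float]:
    """MFU-style ranking: score each page by how often it was requested."""
    return {page: float(count) for page, count in Counter(requests).items()}


def recency_scores(requests: List[str]) -> Dict[str, float]:
    """Recency-style ranking: score each page by the index of its latest request."""
    return {page: float(i) for i, page in enumerate(requests, start=1)}


def transition_matrix(requests: List[str]) -> Dict[str, Dict[str, int]]:
    """Order-preserving propagation: count how often page q directly follows page p."""
    counts = defaultdict(lambda: defaultdict(int))
    for p, q in zip(requests, requests[1:]):
        counts[p][q] += 1
    return {p: dict(row) for p, row in counts.items()}


def top_k(scores: Dict[str, float], k: int = 3) -> List[str]:
    """Return the k best-scored pages, breaking ties alphabetically."""
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [page for page, _ in ordered[:k]]


def rank_then_propagate(requests: List[str], k: int = 3) -> List[str]:
    """Hypothetical R+P workflow: boost ranking scores by transitions from the current page.

    The combination rule (score * (1 + transition count)) is an assumption
    made for this sketch only.
    """
    if not requests:
        return []
    current = requests[-1]
    row = transition_matrix(requests).get(current, {})
    combined = {
        page: score * (1 + row.get(page, 0))
        for page, score in mfu_scores(requests).items()
    }
    return top_k(combined, k)


# Toy request sequence R (hypothetical data).
R = ["a", "b", "a", "c", "a", "b"]
print(top_k(mfu_scores(R)))
print(top_k(recency_scores(R)))
print(rank_then_propagate(R))
```

A drift method such as TR or 5HR could presumably be layered on top by restricting the input to a recent window of requests, but the article's exact definition is not part of this excerpt, so it is left out of the sketch.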

About this article

Cite this article

Papadakis, G., Kawase, R., Herder, E. et al. Methods for web revisitation prediction: survey and experimentation. User Model User-Adap Inter 25, 331–369 (2015). https://doi.org/10.1007/s11257-015-9161-7

  • DOI: https://doi.org/10.1007/s11257-015-9161-7
