A resource allocation policy for delay minimization in fetching capacitated feeds

World Wide Web

Abstract

As social media services such as Twitter and Facebook gain popularity, the amount of information they publish is growing explosively. Most of these services use feeds to distribute the large volume of content they produce. Many users subscribe to feeds through feed aggregation services to obtain up-to-date information, and real-time search engines increasingly rely on feeds to find new web content promptly after it is produced. Such services therefore need to fetch feeds effectively, minimizing fetching delay while maximizing the number of fetched entries. Fetching delay is the time lag between the publication of an entry and its retrieval, and it arises primarily because fetching resources are finite. In this paper, we consider the polling-based approach to fetching feeds, which visits feeds according to a specific schedule. Whereas existing polling-based approaches allocate fetching resources to feeds in order to either reduce the fetching delay or increase the number of fetched entries, we propose a resource allocation policy that can optimize both objectives. Extensive experiments have been carried out to evaluate the proposed model in comparison with existing alternative methods.
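
To make the two objectives concrete, the following minimal Python sketch simulates a single feed under a fixed polling schedule and reports the fetching delay of each retrieved entry together with the number of entries that were never retrieved. It is not the policy proposed in the paper. The function name `simulate_polling`, the sample timestamps, and the assumption that a capacitated feed exposes only its `capacity` most recent entries are illustrative choices made here.

```python
from bisect import bisect_left


def simulate_polling(pub_times, poll_times, capacity):
    """Simulate polling one feed (illustrative sketch, not the paper's policy).

    pub_times  -- sorted publication timestamps of the feed's entries
    poll_times -- sorted timestamps at which the feed is polled
    capacity   -- ASSUMPTION: the feed exposes only its `capacity` newest entries

    Returns (delays, missed): the fetching delay of every retrieved entry
    and the number of entries that were never retrieved.
    """
    delays = []
    missed = 0
    for i, published in enumerate(pub_times):
        # Index of the first poll at or after the publication time.
        k = bisect_left(poll_times, published)
        if k == len(poll_times):
            missed += 1  # no poll happens after this entry appears
            continue
        poll = poll_times[k]
        # Entries published after this one but no later than the poll.
        newer = sum(1 for t in pub_times[i + 1:] if t <= poll)
        if newer >= capacity:
            missed += 1  # pushed out of the feed before the poll
        else:
            delays.append(poll - published)  # fetching delay
    return delays, missed


# Hypothetical data: four entries, two polls, a feed that keeps two entries.
delays, missed = simulate_polling([1, 2, 3, 7], [5, 10], capacity=2)
print(delays, missed)  # delays == [3, 2, 3], missed == 1
```

In this toy run, polling more often would shorten the delays and also save the entry that was pushed out of the feed, which is why the allocation of a limited polling budget across feeds affects both objectives at once.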

Corresponding author

Correspondence to Jonghun Park.

About this article

Cite this article

Jee, C., Lim, J., Shin, Y. et al. A resource allocation policy for delay minimization in fetching capacitated feeds. World Wide Web 16, 91–109 (2013). https://doi.org/10.1007/s11280-012-0158-4

  • DOI: https://doi.org/10.1007/s11280-012-0158-4
