Skip to main content
Log in

Prefetching in Content Distribution Networks via Web Communities Identification and Outsourcing

  • Published:
World Wide Web Aims and scope Submit manuscript

Abstract

Content distribution networks (CDNs) improve scalability and reliability, by replicating content to the “edge” of the Internet. Apart from the pure networking issues of the CDNs relevant to the establishment of the infrastructure, some very crucial data management issues must be resolved to exploit the full potential of CDNs to reduce the “last mile” latencies. A very important issue is the selection of the content to be prefetched to the CDN servers. All the approaches developed so far, assume the existence of adequate content popularity statistics to drive the prefetch decisions. Such information though, is not always available, or it is extremely volatile, turning such methods problematic. To address this issue, we develop self-adaptive techniques to select the outsourced content in a CDN infrastructure, which requires no apriori knowledge of request statistics. We identify clusters of “correlated” Web pages in a site, called Web site communities, and make these communities the basic outsourcing unit. Through a detailed simulation environment, using both real and synthetic data, we show that the proposed techniques are very robust and effective in reducing the user-perceived latency, performing very close to an unfeasible, off-line policy, which has full knowledge of the content popularity.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Similar content being viewed by others

References

  1. Annapureddy, S., Freedman, M.J., Mazieres D.: Shark: scaling file servers via cooperative caching. In: Proceedings of the USENIX/ACM Symposium on Networked Systems Design and Implementation (NSDI), Boston, MA, 2–4 May 2005

  2. Bansal, N., Blum, A., Chawla, S.: Correlation clustering. Mach. Learn. 56(1–3), 89–113 (2004)

    Article  MATH  Google Scholar 

  3. Bent, L., Rabinovich, M., Voelker, G.M., Xiao, Z.: Characterization of a large Web site population with implications for content delivery. World Wide Web J. 9(4), 505–536 (2006)

    Article  Google Scholar 

  4. Chakrabarti, D., Zhan, Y., Faloutsos, C.: R-MAT: A recursive model for graph mining. In: Proceedings of the SIAM Data Mining Conference (SDM) (2004)

  5. Charikar, M., Chekuri, C., Feder, T., Motwani, R.: Incremental clustering and dynamic information retrieval. SIAM J. Comput. 33(6), 1417–1440 (2004)

    MATH  MathSciNet  Google Scholar 

  6. Chen, Y., Qiu, L., Chen, W., Nguyen, L., Katz, R.H.: Efficient and adaptive Web replication using content clustering. IEEE J. Sel. Areas Commun. 21(6), 979–994 (2003)

    Article  Google Scholar 

  7. Eiron, N., McCurley, K.S.: Untangling compound documents on the Web. In: Proceedings of the ACM Conference on Hypertext and hypermedia (HT), Nottingham, 26–30 August 2003

  8. Fan, L., Cao, P., Almeida, J.M., Broder, A.Z.: Summary cache: a scalable wide-area Web cache sharing protocol. IEEE/ACM Trans. Netw. 8(3), 281–293 (2000)

    Article  Google Scholar 

  9. Flake, G.W., Lawrence, S., Giles, C.L., Coetzee, F.M.: Self-organization and identification of web communities. IEEE Comput. 35(3), 66–71 (2002)

    Google Scholar 

  10. Flake, G.W., Tarjan, R.E., Tsioutsiouliklis, K.: Graph clustering and minimum cut trees. Internet Math. 1(4), 385–408 (2003/2004)

    MathSciNet  Google Scholar 

  11. Freeman, L.C.: A set of measures of centrality based on betweenness. Sociometry 40(1), 35–41 (1977)

    Article  Google Scholar 

  12. Fujita, N., Ishikawa, Y., Iwata, A., Izmailov, R.: Coarse-grain replica management strategies for dynamic replication of Web contents. Comput. Networks 45(1), 19–34 (2004)

    Article  MATH  Google Scholar 

  13. Girvan, M., Newman, M.E.J.: Community structure in social and biological networks. Proc. Natl. Acad. Sci. U. S. A. 99(12), 7821–7826 (2002)

    Article  MATH  MathSciNet  Google Scholar 

  14. Greco, G., Greco, S., Zumpano, E.: Web communities: models and algorithms. World Wide Web J. 7(1), 58–82 (2004)

    Article  Google Scholar 

  15. Guimera, R., Sales-Pardo, M., Nunes Amaral, L.A.: Modularity from fluctuations in random graphs and complex networks. Phys. Rev. E 70, 025101 (2004)

    Article  Google Scholar 

  16. Haveliwala, T.H.: Topic-sensitive PageRank: a context-sensitive ranking algorithm for Web search. IEEE Trans. Knowl. Data Eng. 15(4), 784–796 (2003)

    Article  Google Scholar 

  17. He, Y., Siganos, G., Faloutsos, M., Krishnamurthy, S.: Asystematic framework for unearthing the missing links: measurements and impact. In: Proceedings of the USENIX/ACM Symposium on Networked Systems Design and Implementation (NSDI), pp. 187–200. Cambridge, USA, 11–13 April 2007

  18. Ino, H., Kudo, M., Nakamura, A.: Partitioning of Web graphs by community topology. In: Proceedings of the ACM World Wide Web Conference (WWW), Chiba, 10–14 May 2005

  19. Jung, Y., Krishnamurthy, B., Rabinovich, M.: Flash crowds and denial of service attacks: characterization and implications for CDNs and Web sites. In: Proceedings of the ACM World Wide Web Conference (WWW), Honolulu, 7–11 May 2002

  20. Kangasharju, J., Roberts, J.W., Ross, K.W.: Object replication strategies in content distribution networks. Comput. Commun. 25(4), 376–383 (2002)

    Article  Google Scholar 

  21. Katsaros, D., Manolopoulos, Y.: Caching in Web memory hierarchies. In: Proceedings of the ACM Symposium on Applied Computing (SAC), Nicosia, 14–17 March 2004

  22. Katsaros, D., Manolopoulos, Y.: Prediction in wireless networks by Markov chains. IEEE Wireless Communications magazine, 2007 (in press)

  23. Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. J. ACM 46(5), 604–632 (1999)

    Article  MATH  MathSciNet  Google Scholar 

  24. Krishnamurthy, B., Wills, C., Zhang, Y.: On the use and performance of content distribution networks. In: Proceedings of the ACM SIGCOMM Internet Measurement Workshop (IMW), San Francisco, 1–2 November 2001

  25. Kroeger, T., Long, D.E., Mogul, J.: Exploring the bounds of Web latency reduction from caching and prefetching. In: Proceedings of the USENIX Symposium on Internet Technologies and Services (USITS), Monterey, 8–11 December 1997

  26. Kulkarni, P., Shenoy, P.J., Gong, W.: Scalable techniques for memory-efficient CDN simulations. In: Proceedings of the ACM World Wide Web Conference (WWW), Budapest, 20–24 May 2003

  27. Kumar, R., Raghavan, P., Rajagopalan, S., Tomkins, A.: Trawling the Web for emerging cyber-communities. Comput. Networks 31(11–16), 1481–1493 (1999)

    Article  Google Scholar 

  28. Laoutaris, N., Zissimopoulos V., Stavrakakis, I.: On the optimization of storage capacity allocation for content distribution. Comput. Networks 47(3), 409–428 (2005)

    Article  Google Scholar 

  29. Leung, K.Y., Eric Wong, W.M., Yeung, K.H.: Designing efficient and robust caching algorithms for streaming-on-demand services on the Internet. World Wide Web J. 7(3), 297–314 (2004)

    Article  Google Scholar 

  30. Li, W.-S., Candan, K.S., Vu, Q., Agrawal, D.: Query relaxation by structure and semantics for retrieval of logical Web documents. IEEE Trans. Knowl. Data Eng. 14(4), 768–791 (2002)

    Article  Google Scholar 

  31. Masada, T., Takasu, A., Adachi, J.: Web page grouping based on parameterized connectivity. In: Lee, Y.-J., Li, J., Whang, K.-Y., Lee, D. (eds.) Proceedings of the International Conference on Database Systems for Advanced Applications (DASFAA), pp. 374–380. Springer, Heidelberg (2004)

    Google Scholar 

  32. Nanopoulos, A., Katsaros, D., Manolopoulos, Y.: A data mining algorithm for generalized Web prefetching. IEEE Trans. Knowl. Data Eng. 15(5), 1155–1169 (2003)

    Article  Google Scholar 

  33. Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69, 026113 (2004)

    Article  Google Scholar 

  34. Ninan, A.G., Kulkarni, P., Shenoy, P.J., Ramamritham, K., Tewari, R.: Scalable consistency maintenance in content distribution networks using cooperative leases. IEEE Trans. Knowl. Data Eng. 15(4), 813–828 (2003)

    Article  Google Scholar 

  35. Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: Bringing order to the Web. Technical report, Stanford University (1999)

  36. Pallis, G., Stamos, K., Vakali, A., Katsaros, D., Sidiropoulos, A., Manolopoulos, Y.: Replication based on objects load under a content distribution network. In: Proceedings of the IEEE International Workshop on Challenges in Web Information Retrieval and Integration (WIRI), Atlanta, 3–7 April 2006

  37. Pallis, G., Vakali, A.: Insight and perspectives for content delivery networks. Commun. ACM 49(1), 101–106 (2006)

    Article  Google Scholar 

  38. Pallis, G., Vakali, A., Stamos, K., Sidiropoulos, A., Katsaros, D., Manolopoulos, Y. A latency-based object placement approach in content distribution networks. In: Proceedings of the IEEE Latin American Web Congress (LA-Web), Buenos Aires, 31 October–2 November, 2005

  39. Pan, J., Hou, Y.T., Li, B.: An overview of DNS-based server selections in content distribution networks. Comput. Networks 43(6), 695–711 (2003)

    Article  MATH  Google Scholar 

  40. Qiu, L., Padmnanabhan, V.N., Voelker, G.M.: On the placement of Web server replicas. In: Proceedings of the IEEE Conference on Computer Communications (INFOCOM), vol. 3, pp. 1587–1596. IEEE, Piscataway (2001)

    Google Scholar 

  41. Radicchi, F., Castellano, C., Cecconi, F., Loreto, V., Parisi, D.: Defining and identifying communities in networks. Proc. Natl. Acad. Sci. U. S. A. 101(9), 2658–2663 (2004)

    Article  Google Scholar 

  42. Sivasubramanian, S., Pierre, G., van Steen, M., Alonso, G.: Analysis of caching and replication strategies for web applications. IEEE Internet Computing 11(1), 60–66 (2007)

    Article  Google Scholar 

  43. Spertus, E., Sahami, M., Buyukkokten, O.: Evaluating similarity measures: a large-scale study in the Orkut social network. In: Proceeding of the ACM International Conference on Knowledge Discovery in Data Mining (SIGKDD), pp. 678–684 (2005)

  44. Stoica, I., Morris, R., Karger, D., Kaashoek, M.F., Balakrishnan, H.: Chord: a scalable peer-to-peer lookup service for Internet applications. In: Proceedings of ACM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM), San Diego, 27–31 August 2001

  45. Tajima, K., Hatano, K., Matsukura, T., Sano, R., Tanaka, K.: Discovery and retrieval of logical information units in Web. In: Proceedings of the ACM Workshop on Organizing Web Space, in conjunction with ACM Digital Libraries (1999)

  46. Vakali, A.: Proxy cache replacement algorithms: a history-based approach. World Wide Web J. 4(4), 277–298 (2001)

    Article  MATH  Google Scholar 

  47. Vakali, A., Pallis, G.: Content delivery networks: status and trends. IEEE Internet Computing 7(6), 68–74 (2003)

    Article  Google Scholar 

  48. Venkataramani, A., Yalagandula, P., Kokku, R., Sharif, S., Dahlin, M.: The potential costs and benefits of long-term prefetching for content distribution. Comput. Commun. 25(4), 367–375 (2002)

    Article  Google Scholar 

  49. Wu, B., Kshemkalyani, A.D.: Objective-greedy algorithms for long-term Web prefetching. In: Proceedings of the IEEE Conference on Network Computing and Applications (NCA), pp. 61–68. IEEE, Piscataway (2004)

    Google Scholar 

  50. Wu, B., Kshemkalyani, A.D.: Objective-optimal algorithms for long-term Web prefetching. IEEE Trans. Comput. 55(1), 2–17 (2006)

    Article  Google Scholar 

  51. Zegura, E., Calvert, K., Bhattacharjee, S.: How to model an internetwork. In: Proceedings of the IEEE Conference on Computer Communications (INFOCOM), pp. 594–602. IEEE, Piscataway (1996)

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to George Pallis.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Sidiropoulos, A., Pallis, G., Katsaros, D. et al. Prefetching in Content Distribution Networks via Web Communities Identification and Outsourcing. World Wide Web 11, 39–70 (2008). https://doi.org/10.1007/s11280-007-0027-8

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11280-007-0027-8

Keywords

Navigation