A distributed full-text top-k document dissemination system in distributed hash tables

World Wide Web

Abstract

Recent years have witnessed explosive growth of ‘live’ web content on the World Wide Web, such as Weblogs, RSS feeds, and real-time news. RSS feeds and readers are widely used because they let end users subscribe to favorite content by entering RSS URLs. However, the RSS feed/reader architecture suffers from (i) high bandwidth consumption and (ii) limited filtering semantics. In this paper, we propose a stateful full-text dissemination scheme over structured P2P networks to address both issues. On the semantic side, end users can subscribe to favorite content by entering keywords; on the bandwidth side, cooperative content polling, filtering, and dissemination over DHT-based P2P overlay networks reduce network bandwidth consumption. Our contributions include novel techniques to (i) reduce the unit-publishing cost by pruning irrelevant documents along the forwarding path toward their destinations, and (ii) reduce the publication volume by selecting a very small number of meaningful terms. Based on real data sets, our experimental results show that the proposed scheme can significantly reduce the publishing cost with low maintenance overhead and high document quality.
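The abstract names two cost-reduction ideas: publishing each document under only a few meaningful terms, and pruning documents that cannot enter any subscriber's top-k results. The sketch below illustrates both under stated assumptions; it is not the authors' algorithm, and the abstract does not specify their exact criteria. It assumes TF-IDF weighting as the term-selection score, a Chord-style successor rule on a SHA-1 identifier ring for mapping terms to peers, and a per-subscription min-heap of the k best scores as the pruning test. The names `select_terms`, `ring_position`, `successor`, and `TopKFilter` are hypothetical.

```python
import hashlib
import heapq
import math
from bisect import bisect_left


def select_terms(doc_tf, doc_freq, n_docs, m):
    """Keep only the m highest-weighted terms of a document.

    Assumption: TF-IDF stands in for "meaningfulness"; the paper's own
    term-selection criterion may differ.
    """
    weights = {
        t: tf * math.log(n_docs / max(1, doc_freq.get(t, 1)))
        for t, tf in doc_tf.items()
    }
    return dict(heapq.nlargest(m, weights.items(), key=lambda kv: kv[1]))


def ring_position(key, bits=32):
    """Hash a string onto a 2**bits identifier ring with SHA-1 (Chord-style)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


def successor(term, node_positions):
    """Return the ring position of the peer responsible for a term.

    Chord rule: the first node identifier >= the term's hash, wrapping around.
    """
    ring = sorted(node_positions)
    i = bisect_left(ring, ring_position(term))
    return ring[i % len(ring)]


class TopKFilter:
    """Hypothetical threshold test for one keyword subscription.

    Holds the k best scores delivered so far in a min-heap. A document whose
    score cannot beat the current k-th best is pruned instead of forwarded.
    """

    def __init__(self, query_terms, k):
        self.query_terms = set(query_terms)
        self.k = k
        self.heap = []  # min-heap; heap[0] is the current k-th best score

    def score(self, term_weights):
        # Sum of the published term weights that match the query keywords.
        return sum(w for t, w in term_weights.items() if t in self.query_terms)

    def offer(self, term_weights):
        """Return True to forward the document, False to prune it."""
        s = self.score(term_weights)
        if s <= 0:
            return False  # shares no keyword with the subscription
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, s)
            return True
        if s > self.heap[0]:
            heapq.heapreplace(self.heap, s)  # evict the old k-th best score
            return True
        return False


if __name__ == "__main__":
    doc_freq = {"p2p": 5, "dht": 3, "the": 99, "rss": 20}
    doc = {"p2p": 3, "dht": 2, "the": 10, "rss": 1}
    published = select_terms(doc, doc_freq, n_docs=100, m=2)
    peers = [ring_position(f"peer-{i}") for i in range(8)]
    for term in published:
        print(term, "->", successor(term, peers))
    f = TopKFilter(["p2p", "dht"], k=2)
    print(f.offer(published))
```

In this toy run the stopword "the" is dropped because its IDF is near zero, so the document is published under only two terms. Each pruning test costs O(log k); a real deployment would also need to propagate thresholds to peers along the forwarding path, which this sketch does not model.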

Author information

Correspondence to Weixiong Rao.

About this article

Cite this article

Rao, W., Chen, L. A distributed full-text top-k document dissemination system in distributed hash tables. World Wide Web 14, 545–572 (2011). https://doi.org/10.1007/s11280-010-0106-0

  • DOI: https://doi.org/10.1007/s11280-010-0106-0
