Abstract
Recent years have witnessed the explosive growth of ‘live’ content on the World Wide Web, such as weblogs, RSS feeds, and real-time news. The widespread use of RSS feeds and readers lets end users subscribe to their favorite content by entering RSS URLs. However, the RSS feed/reader architecture suffers from (i) high bandwidth consumption and (ii) limited filtering semantics. In this paper, we propose a stateful full-text dissemination scheme over structured P2P networks to address both issues. Specifically, on the semantic side, end users can subscribe to their favorite content by entering keywords; on the bandwidth side, cooperative content polling, filtering, and dissemination over DHT-based P2P overlay networks reduce network bandwidth consumption. Our contributions include novel techniques to (i) reduce the unit publishing cost by pruning irrelevant documents along the forwarding path towards their destinations, and (ii) reduce the number of publications by selecting a very small number of meaningful terms. Experimental results on real data sets show that the proposed scheme significantly reduces the publishing cost while incurring low maintenance overhead and maintaining high document quality.
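To make the second contribution concrete, the following is a minimal sketch of publishing a document under only a few selected terms. It is an illustration under stated assumptions, not the scheme evaluated in the paper: the TF-IDF weighting, the hash-modulo node mapping, and the names `NUM_NODES`, `home_node`, `select_terms`, and `publish` are all hypothetical.

```python
import hashlib
import math
from collections import Counter

NUM_NODES = 8  # hypothetical overlay size, chosen only for illustration


def home_node(term: str) -> int:
    """Map a term to a node id by hashing (a stand-in for a real DHT lookup)."""
    digest = hashlib.sha1(term.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_NODES


def select_terms(doc_terms, doc_freq, num_docs, k):
    """Return the k distinct terms with the highest TF-IDF weight.

    TF-IDF is an assumed notion of "meaningful"; the paper may use another.
    Ties are broken alphabetically so the result is deterministic.
    """
    tf = Counter(doc_terms)

    def weight(term):
        return tf[term] * math.log(num_docs / (1 + doc_freq.get(term, 0)))

    return sorted(tf, key=lambda t: (-weight(t), t))[:k]


def publish(doc_terms, doc_freq, num_docs, k):
    """Return the set of home nodes contacted when only k terms are published."""
    selected = select_terms(doc_terms, doc_freq, num_docs, k)
    return {home_node(term) for term in selected}
```

With a small `k`, a document reaches at most `k` home nodes instead of one per distinct term, which is the intuition behind reducing the number of publications; subscribers whose keywords fall outside the selected terms would not be reached in this simplified sketch.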
Cite this article
Rao, W., Chen, L. A distributed full-text top-k document dissemination system in distributed hash tables. World Wide Web 14, 545–572 (2011). https://doi.org/10.1007/s11280-010-0106-0
DOI: https://doi.org/10.1007/s11280-010-0106-0