Abstract
The World Wide Web (WWW) is experiencing explosive growth in live content. Delivering fresh content that matches users’ interests in a timely manner is very useful but challenging. In this paper we design a distributed top-k document dissemination scheme: end users specify query terms and a number k as subscription conditions, and thereby subscribe to the k documents most relevant to those terms. When documents and subscription conditions are distributed, the difficulty lies in forwarding the personalized top-k documents to the users who need them at low forwarding cost. To this end, we propose document forwarding and filtering algorithms that run on a set of dedicated servers. Experiments based on real query logs and document datasets show the efficiency of the proposed algorithms.
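To make the subscription model concrete, the following is a minimal single-node sketch of top-k filtering, not the distributed algorithms proposed in the paper. The class name `Subscription`, the methods `offer` and `top_k`, and the term-count relevance score are all assumptions introduced for illustration; the abstract does not specify the scoring function or the forwarding protocol.

```python
import heapq
from collections import Counter


class Subscription:
    """Hypothetical subscription: query terms plus a result size k."""

    def __init__(self, terms, k):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.terms = set(terms)
        self.k = k
        self._heap = []  # min-heap of (score, doc_id); root is the k-th best

    def score(self, text):
        # Assumed relevance function: total occurrences of the query terms.
        counts = Counter(text.lower().split())
        return sum(counts[t] for t in self.terms)

    def threshold(self):
        # Score a new document must exceed; 0 until k results are held,
        # so documents sharing no query term are always rejected.
        return self._heap[0][0] if len(self._heap) == self.k else 0

    def offer(self, doc_id, text):
        """Return True if the document enters the current top-k."""
        s = self.score(text)
        if s <= self.threshold():
            return False  # filtered out: forwarding it would be wasted
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (s, doc_id))
        else:
            heapq.heapreplace(self._heap, (s, doc_id))  # evict the k-th best
        return True

    def top_k(self):
        # Document ids ordered from most to least relevant.
        return [doc_id for _, doc_id in sorted(self._heap, reverse=True)]


sub = Subscription(["web", "news"], k=2)
results = [
    sub.offer("d1", "web news web"),
    sub.offer("d2", "sports update"),
    sub.offer("d3", "news today"),
    sub.offer("d4", "web web news news"),
    sub.offer("d5", "web"),
]
```

In this sketch the per-subscription threshold is what allows filtering: a server that knows the current k-th best score can drop a document without forwarding it. How such thresholds are shared across servers is the part the paper addresses and is not modeled here.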
Additional information
Communicated by Kaushik Chakrabarti.
About this article
Cite this article
Rao, W., Chen, L. Distributed top-k full-text content dissemination. Distrib Parallel Databases 30, 273–301 (2012). https://doi.org/10.1007/s10619-012-7096-0
DOI: https://doi.org/10.1007/s10619-012-7096-0