Distributed top-k full-text content dissemination

Distributed and Parallel Databases

Abstract

The World Wide Web (WWW) is experiencing explosive growth in live content. Delivering fresh content that matches users' interests in a timely manner is useful but challenging. In this paper we design a distributed top-k document dissemination scheme: end users submit query terms and a number k as subscription conditions and thereby subscribe to the k most relevant documents. When both documents and subscription conditions are distributed, the difficulty is how to forward each user's personalized top-k documents at low forwarding cost. To this end, we propose document forwarding and filtering algorithms that run on a set of dedicated servers. Experiments based on real query logs and document datasets show the efficiency of the proposed algorithms.
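The subscription model in the abstract can be illustrated with a minimal single-node sketch. It is not the paper's forwarding or filtering algorithm: the class name TopKSubscription, its methods, and the term-frequency relevance score are all assumptions made for illustration. A subscription holds query terms and k, and a newly published document is forwarded only if its score beats the current k-th best result.

```python
import heapq
from collections import Counter


class TopKSubscription:
    """Hypothetical subscription: query terms plus a result size k."""

    def __init__(self, terms, k):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.terms = [t.lower() for t in terms]
        self.k = k
        # Min-heap of (score, doc_id); once full, the root is the k-th best.
        self._heap = []

    def score(self, text):
        # Assumed relevance measure: summed frequency of the query terms.
        counts = Counter(text.lower().split())
        return sum(counts[t] for t in self.terms)

    def threshold(self):
        # Score a new document must exceed; 0 until k results are held,
        # so documents matching no query term are always filtered out.
        return self._heap[0][0] if len(self._heap) == self.k else 0

    def offer(self, doc_id, text):
        """Return True if the document enters the top-k and should be forwarded."""
        s = self.score(text)
        if s <= self.threshold():
            return False
        if len(self._heap) == self.k:
            heapq.heapreplace(self._heap, (s, doc_id))  # evict the k-th best
        else:
            heapq.heappush(self._heap, (s, doc_id))
        return True

    def results(self):
        # Current top-k document ids, most relevant first.
        return [doc_id for _, doc_id in sorted(self._heap, reverse=True)]


sub = TopKSubscription(["top-k", "stream"], k=2)
for doc_id, text in [
    ("d1", "stream stream data"),
    ("d2", "top-k stream"),
    ("d3", "unrelated text"),
    ("d4", "top-k top-k stream stream"),
]:
    print(doc_id, sub.offer(doc_id, text))
print(sub.results())
```

The threshold check is what keeps forwarding cost low in this sketch: a document that cannot enter a subscriber's top-k is dropped before it is sent. How such thresholds are maintained across distributed servers is the subject of the paper and is not modeled here.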



Author information

Corresponding author

Correspondence to Weixiong Rao.

Additional information

Communicated by Kaushik Chakrabarti.

About this article

Cite this article

Rao, W., Chen, L. Distributed top-k full-text content dissemination. Distrib Parallel Databases 30, 273–301 (2012). https://doi.org/10.1007/s10619-012-7096-0

  • DOI: https://doi.org/10.1007/s10619-012-7096-0
