Content-Based Publish/Subscribe System for Web Syndication

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Content syndication has become a popular way to deliver frequently updated information on the Web in a timely manner. Today, web syndication technologies such as RSS and Atom are used in a wide variety of applications, ranging from large-scale news broadcasting to medium-scale information sharing in scientific and professional communities. However, they exhibit serious limitations in dealing with information overload in Web 2.0. There is a vital need for efficient real-time filtering methods across feeds that allow users to effectively follow the information that interests them personally. In this paper, we investigate three indexing techniques for users’ subscriptions, based on inverted lists or on an ordered trie, for exact and partial matching. We present analytical models for memory requirements and matching time, and we conduct a thorough experimental evaluation to show the impact of critical parameters of realistic web syndication workloads.
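To make the idea of an inverted-list subscription index concrete, the following is a minimal sketch of a generic count-based matcher. It is not taken from the paper: the class name `CountingIndex`, the method names, and the matching rule (a subscription is a set of keywords and matches an item only when all of its keywords occur in that item) are assumptions chosen for illustration, and the paper's own structures and matching semantics may differ.

```python
from collections import defaultdict


class CountingIndex:
    """Illustrative inverted-list index over keyword subscriptions.

    Hypothetical sketch: not the data structure evaluated in the paper.
    """

    def __init__(self):
        # term -> ids of the subscriptions that contain this term
        self.postings = defaultdict(set)
        # subscription id -> number of distinct terms it contains
        self.sizes = {}

    def subscribe(self, sub_id, terms):
        terms = set(terms)
        if not terms:
            raise ValueError("a subscription needs at least one term")
        self.sizes[sub_id] = len(terms)
        for term in terms:
            self.postings[term].add(sub_id)

    def match(self, item_terms):
        """Return ids of subscriptions whose terms all occur in the item."""
        hits = defaultdict(int)
        # Walk only the posting lists of terms present in the item.
        for term in set(item_terms):
            for sub_id in self.postings.get(term, ()):
                hits[sub_id] += 1
        # A subscription matches when every one of its terms was seen.
        return {s for s, n in hits.items() if n == self.sizes[s]}


index = CountingIndex()
index.subscribe("s1", ["rss", "filtering"])
index.subscribe("s2", ["atom"])
index.subscribe("s3", ["rss", "trie", "index"])
matched = index.match("efficient filtering of rss and atom feeds".split())
# matched contains "s1" and "s2"; "s3" is missing two of its three terms
```

In this sketch the cost of matching one item depends on the lengths of the posting lists for the terms it contains, which is why the distribution of term frequencies in a workload matters for this kind of index.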

Author information

Corresponding author

Correspondence to Zeinab Hmedeh.

Additional information

A preliminary version of the paper was published in the Proceedings of EDBT 2012.

About this article

Cite this article

Hmedeh, Z., Kourdounakis, H., Christophides, V. et al. Content-Based Publish/Subscribe System for Web Syndication. J. Comput. Sci. Technol. 31, 359–380 (2016). https://doi.org/10.1007/s11390-016-1632-8

  • DOI: https://doi.org/10.1007/s11390-016-1632-8
