Multi-Query Optimization on RSS Feeds

  • Original Article
Journal on Data Semantics

Abstract

RSS feeds are rich in text content, semantically heterogeneous, and made up of dynamic XML elements streamed using asynchronous and pull strategies. Hence, for efficient retrieval of RSS feeds, semantic-aware querying operators have been proposed in the literature (Getahun and Chbeir in Inf Sci 237(237):313–342, 2013). However, it is commonly accepted that using semantic information improves the relevance of query results at the cost of degrading the efficiency and performance of the system. To benefit from query execution on semantic information while preserving the efficiency of the system, we propose a multi-query optimization approach for semantic RSS feed queries. Our approach processes queries by examining the semantic relationships among the queries and their corresponding windows. It generates a multi-query chain for the queries using their window relations, for faster execution at runtime. In addition, we propose an operator called quickDrop for semantic load shedding, which gracefully reduces the load of irrelevant data. To validate the proposed approach, we developed a prototype and conducted a set of experiments. The results show that our approach significantly improves the performance of the system.
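The idea of chaining queries by their window relations can be illustrated with a minimal sketch. It is not the authors' algorithm: the `Window`, `Query`, `build_chains` and `run_chain` names are hypothetical, windows are assumed to be plain numeric time intervals, the only window relation considered is containment, and a simple keyword test stands in for the paper's semantic predicates. Under those assumptions, queries whose windows are nested share one pass over the feed items, with each narrower window filtering the candidates already selected by the wider one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    start: int  # assumed: seconds since midnight GMT
    end: int

    def contains(self, other: "Window") -> bool:
        # True when `other` lies entirely inside this window.
        return self.start <= other.start and other.end <= self.end


@dataclass(frozen=True)
class Query:
    name: str
    keyword: str  # stand-in for a semantic predicate (assumption)
    window: Window


def build_chains(queries):
    """Group queries into chains in which each window contains the next."""
    chains = []
    widest_first = sorted(
        queries, key=lambda q: q.window.end - q.window.start, reverse=True
    )
    for q in widest_first:
        for chain in chains:
            if chain[-1].window.contains(q.window):
                chain.append(q)
                break
        else:
            chains.append([q])
    return chains


def run_chain(chain, items):
    """Evaluate a chain over (timestamp, title) items, narrowing as we go."""
    results = {}
    candidates = items
    for q in chain:
        # Reuse the previous (wider) window's candidates instead of rescanning.
        candidates = [
            it for it in candidates if q.window.start <= it[0] <= q.window.end
        ]
        results[q.name] = [it for it in candidates if q.keyword in it[1].lower()]
    return results


queries = [
    Query("q1", "election", Window(0, 100)),
    Query("q2", "vote", Window(20, 60)),
    Query("q3", "sport", Window(200, 300)),
]
items = [
    (10, "Election results"),
    (30, "Vote count begins"),
    (70, "Late vote tally"),
    (250, "Sport roundup"),
]
chains = build_chains(queries)
results = run_chain(chains[0], items)
```

Here `q2` joins the chain started by `q1` because its window is nested inside `q1`'s, while `q3` starts a chain of its own. A real system would also need to handle overlapping but non-nested windows and semantic relations between predicates, which this sketch leaves out.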

Notes

  1. A window boundary is defined in this example using time in GMT to represent the user need for a specific time of a day.

  2. A semantic query [34] is inherently fuzzy, and the user typically expects only a subset of the full results.

  3. An atomic query is a simple query consisting of a source and a predicate with an attribute, an operator and a value.

  4. The process of building the value knowledge base is outside the scope of this paper.

  5. WordNet is an online lexical reference system (taxonomy), where nouns, verbs, adjectives and adverbs are organized into synonym sets, each representing a lexical concept [Miller 1990], [WordNet 2005].

  6. \(\ge \) and \(\le \) are considered as single operators put together using the Boolean operator OR.

  7. Relations will be detailed in the next subsection.

  8. RSS 0.92 is upward compatible with RSS 0.91; see the UserLand specification, http://backend.userland.com/rss09x.

  9. RSS 1.0, also called RDF Site Summary, is a lightweight, multipurpose, extensible metadata description and syndication format that conforms to the W3C’s RDF specification and is extensible via XML namespaces and/or RDF-based modularization. http://web.resource.org/rss/1.0/spec.

  10. Precision level, e, is 1-confidence interval, with 95% confidence level and 50% degree of variability [30].

  11. http://www.politico.com/news/cnn.

  12. http://www.bbc.co.uk/news.

  13. http://www.newsweek.com/.

  14. http://www.politico.com/news/the-washington-post.

  15. https://ca.reuters.com/news/topNews.

  16. https://www.theguardian.com/theguardian/mainsection/topstories.

  17. http://time.com/

References

  1. Getahun F, Chbeir R (2013) RSS query algebra: towards a better news management. Inf Sci 237(237):313–342

  2. RSS Advisory Board. RSS 2.0 specification. http://www.rssboard.org/

  3. Fabret F, Jacobsen HA, Llirbat F, Pereira J, Ross KA, Shasha D (2001) Filtering algorithms and implementation for very fast publish/subscribe. In: SIGMOD, pp 115–126

  4. Hammad MA, Franklin MJ, Aref WG, Elmagarmid AK (2003) Scheduling for shared window joins over data streams. In: VLDB, pp 297–308

  5. Madden SR, Shah MA, Hellerstein JM, Raman V (2002) Continuously adaptive continuous queries over streams. In: SIGMOD, pp 49–60

  6. Zhang R, Koudas N, Ooi BC, Srivastava D (2005) Multiple aggregations over data streams. In: SIGMOD, pp 299–310

  7. Chi Y, Wang H, Yu PS, Muntz RR (2005) Loadstar: a load shedding scheme for classifying data streams. In: SIAM conference on data mining, pp 1302–1305

  8. Garofalakis M, Gibbons P (2001) Approximate query processing: taming the megabytes. In: VLDB, Rome

  9. Hellerstein J, Haas P, Wang H (1997) Online aggregation. In: SIGMOD, Tucson, pp 171–182

  10. Sellis TK (1988) Multiple-query optimization. ACM Trans Database Syst 13(1):23–52

  11. Jarke M (1985) Common subexpression isolation in multiple query optimization. Springer, Berlin, pp 191–205

  12. Chakravarthy US, Minker J (1986) Multiple query processing in deductive databases using query graphs. In: Proceedings of the 12th international conference on very large data bases, San Francisco, CA, pp 384–391

  13. Munagala K, Srivastava U, Widom J (2007) Optimization of continuous queries with shared expensive filters. In: Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 215–224. https://doi.org/10.1145/1265530.1265561

  14. Arvind A, Jennifer W (2004) Resource sharing in continuous sliding-window aggregates. Technical Report

  15. Song W, Elke R, Samrat G, Sudeept B (2006) StateSlice: new paradigm of multi-query optimization of window based stream queries. In: VLDB, pp 619–630

  16. Mingsheng H, Alan D, Johannes G (2007) Massively multi-query join processing in publish/subscribe systems. In: SIGMOD, pp 761–772

  17. Krishnamurthy S, Wu C, Franklin M (2006) On-the-fly sharing for streamed aggregation. In: SIGMOD, pp 623–634

  18. Li J, David M, Kristin T, Vassilis P, Peter A (2005) No pane, no gain: efficient evaluation of sliding window aggregates over data streams. In: SIGMOD, pp 39–44

  19. Shenoda G, Mohamed A, Panos K, Alexandros L (2011) Optimized processing of multiple aggregate continuous queries. In: CIKM, pp 1515–1524

  20. Moustafa A, Michael J, Walid G, Ahmed K (2003) Scheduling for shared window joins over data streams. In: VLDB, pp 297–308

  21. Nesime T, Uger C, Stan Z (2003) Load shedding on data streams. In: VLDB, pp 674–683

  22. Reiss F, Hellerstein J (2005) Data triage: an adaptive architecture for load shedding in telegraphcq. In: IEEE ICDE, Tokyo, pp 155–156

  23. Brian B, Mayur D, Rajeev M (2004) Load shedding for aggregation queries over data streams. In: ICDE, pp 155–156

  24. Robie J, Chamberlin D, Dyck M, Snelson J (2009) World wide web consortium (W3C). http://www.w3.org/TR/xquery-11/

  25. Getahun F, Tekli J, Atnafu S, Chbeir R (2007) Towards efficient horizontal multimedia database fragmentation using semantic-based predicates implication. In: SBBD 2007, pp 68–82

  26. Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. CoRR, arXiv: 1301.3781

  27. Brill E (1992) A simple rule based part of speech tagger. In: Applied natural language processing (ACL), pp 152–155

  28. Getahun F, Tekli J, Viviani M, Chbeir R, Yetongnon K (2009) Towards semantic-based RSS merging. In: International symposium on intelligent interactive multimedia systems and services, pp 53–64

  29. Getahun F, Tekli J, Chbeir R, Viviani M, Yétongnon K (2009) Relating RSS news/items. In: 9th international conference on web engineering ICWE 2009, San Sebastian, Spain, pp 442–45

  30. Yamane T (1967) Statistics an introductory analysis, 2nd edn. Harper and Row, New York

  31. WordNet 2.1. (2005) A lexical database of the english language. http://wordnet.princeton.edu/online/

  32. 1 Billion Word Language Model Benchmark (2017) statmt. http://www.statmt.org/lm-benchmark/1-billion-word-language-modeling-benchmark-r13output.tar.gz

  33. Gulli A (2004) AG’s corpus of news articles. http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html

  34. Lim L, Wang H, Wang M (2013) Semantic queries by example. In: Proceedings of the 16th international conference on extending database technology, no. 978-1-4503-1597-5, pp 347–358. https://doi.org/10.1145/2452376.2452417

Author information

Corresponding author

Correspondence to Fekade Getahun.

About this article

Cite this article

Getahun, F., Chbeir, R. Multi-Query Optimization on RSS Feeds. J Data Semant 7, 47–64 (2018). https://doi.org/10.1007/s13740-018-0085-3

  • DOI: https://doi.org/10.1007/s13740-018-0085-3