Abstract
RSS feeds are rich in text content, semantically heterogeneous, and contain dynamic XML elements that are streamed using asynchronous and pull-based strategies. Hence, to retrieve RSS feeds efficiently, semantic-aware query operators have been proposed in the literature (Getahun and Chbeir in Inf Sci 237(237):313–342, 2013). However, it is commonly accepted that using semantic information improves the relevance of query results at the cost of degrading the efficiency and performance of the system. To benefit from query execution on semantic information while keeping the system efficient, we propose a multi-query optimization approach for semantic RSS feed queries. Our approach processes queries by examining the semantic relationships among them and among their corresponding windows. Using these window relations, it generates a multi-query chain that speeds up query execution at runtime. In addition, we propose an operator called quickDrop for semantic load shedding, which gracefully reduces the load of irrelevant data. To validate the proposed approach, we developed a prototype and conducted a set of experiments. The results show that our approach significantly improves the performance of the system.
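The abstract builds its multi-query chain from relations between query windows, but it does not define those relations here (a note states that they are detailed in a later subsection). As an illustration only, the Python sketch below assumes each window is a half-open time interval and classifies how two windows relate using a generic set of interval relations (equal, contains, contained in, overlaps, disjoint). This is not the paper's relation set. The names `Window`, `WindowRelation`, `relate`, and `can_reuse` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class WindowRelation(Enum):
    EQUAL = "equal"
    CONTAINS = "contains"            # first window fully covers the second
    CONTAINED_IN = "contained_in"    # first window lies inside the second
    OVERLAPS = "overlaps"            # partial intersection
    DISJOINT = "disjoint"            # no common instant


@dataclass(frozen=True)
class Window:
    """Hypothetical query window: half-open interval [start, end)."""
    start: float  # e.g. seconds since epoch (assumed unit)
    end: float

    def __post_init__(self):
        if self.end <= self.start:
            raise ValueError("window end must be after its start")


def relate(a: Window, b: Window) -> WindowRelation:
    """Classify the relation of window `a` with respect to window `b`."""
    if a == b:
        return WindowRelation.EQUAL
    # Half-open intervals that only touch at a boundary share no instant.
    if a.end <= b.start or b.end <= a.start:
        return WindowRelation.DISJOINT
    if a.start <= b.start and b.end <= a.end:
        return WindowRelation.CONTAINS
    if b.start <= a.start and a.end <= b.end:
        return WindowRelation.CONTAINED_IN
    return WindowRelation.OVERLAPS


def can_reuse(a: Window, b: Window) -> bool:
    """Assumed chaining rule: a query over `a` could be evaluated on the
    items already buffered for a query over `b` if `a` lies within `b`."""
    return relate(a, b) in (WindowRelation.EQUAL, WindowRelation.CONTAINED_IN)
```

Under this assumption, ordering queries from the widest to the narrowest window yields a candidate chain in which each narrower query filters its predecessor's buffered items rather than the whole stream.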
Notes
In this example, window boundaries are expressed as times in GMT to capture the user's interest in a specific time of day.
A semantic query [34] is inherently fuzzy, and the user typically expects only a subset of the full results.
An atomic query is a simple query consisting of a source and a predicate made of an attribute, an operator, and a value.
However, how the value knowledge base is built is beyond the scope of this paper.
WordNet is an online lexical reference system (taxonomy), where nouns, verbs, adjectives and adverbs are organized into synonym sets, each representing a lexical concept [Miller 1990], [WordNet 2005].
The operators \(\ge \) and \(\le \) are each treated as two single operators combined with the Boolean operator OR; for example, \(a \ge b\) is evaluated as \((a > b) \vee (a = b)\).
Relations will be detailed in the next subsection.
RSS 0.92 is upward-compatible with RSS 0.91; see the UserLand specification at http://backend.userland.com/rss09x.
RSS 1.0, also called RDF Site Summary, is a lightweight, multipurpose, extensible metadata description and syndication format that conforms to the W3C's RDF specification and is extensible via XML namespaces and/or RDF-based modularization; see http://web.resource.org/rss/1.0/spec.
The precision level e equals 1 minus the confidence level; with a 95% confidence level, e = 0.05. A 50% degree of variability is assumed [30].
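One note says that window boundaries are given as GMT times of day. A minimal Python sketch of such a test follows. It assumes a daily window [start, end) in GMT/UTC and checks whether a feed item's publication timestamp falls inside it, including windows that wrap past midnight. The function name `in_daily_window` is hypothetical and does not come from the paper.

```python
from datetime import datetime, time, timezone


def in_daily_window(published: datetime, start: time, end: time) -> bool:
    """Return True if `published` falls in the daily window [start, end), in GMT.

    Timezone-aware timestamps are converted to UTC (GMT) first; naive
    timestamps are assumed to be in UTC already. If start > end, the window
    wraps around midnight (e.g. 22:00 to 02:00). If start == end, the window
    is treated as empty.
    """
    if published.tzinfo is not None:
        published = published.astimezone(timezone.utc)
    t = published.time()  # naive time of day in GMT
    if start <= end:
        return start <= t < end
    # Wrapping window: late-evening part or early-morning part.
    return t >= start or t < end
```

For example, an item stamped 13:30 at +02:00 corresponds to 11:30 GMT, so it falls inside an 08:00 to 12:00 GMT window.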
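The notes define an atomic query as a source plus a predicate made of an attribute, an operator, and a value. The following Python sketch shows one way to represent that structure. It performs only plain syntactic comparison: the paper's operators are semantic-aware, and that similarity logic is not reproduced here. The class names, the `OPS` table, and the example attribute values are all hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Any, Mapping

# Hypothetical mapping from operator symbols to plain comparison functions.
OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


@dataclass(frozen=True)
class Predicate:
    attribute: str  # e.g. "category" or "pubDate" (illustrative)
    op: str         # one of the keys of OPS
    value: Any

    def __post_init__(self):
        if self.op not in OPS:
            raise ValueError(f"unsupported operator: {self.op!r}")


@dataclass(frozen=True)
class AtomicQuery:
    source: str          # feed identifier, e.g. a channel URL (assumed)
    predicate: Predicate

    def matches(self, item: Mapping[str, Any]) -> bool:
        """Syntactic check of a single feed item against the predicate."""
        attr = self.predicate.attribute
        if attr not in item:
            return False
        return OPS[self.predicate.op](item[attr], self.predicate.value)
```

Usage: `AtomicQuery("http://example.com/rss", Predicate("category", "=", "Sport")).matches({"category": "Sport"})` returns `True`. A semantic variant would replace the `OPS` lookup with a similarity-aware comparison.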
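One note treats \(\ge \) and \(\le \) as two single operators joined by OR. The Python sketch below makes that decomposition explicit, so that each compound comparison is evaluated as the disjunction of its parts. The table name `DECOMPOSED` and the function `evaluate` are hypothetical. One plausible benefit of the decomposition, which is an assumption and not stated in the notes, is that reasoning about relations between predicates only needs to handle the single operators.

```python
import operator

# Hypothetical decomposition table: every operator maps to the single
# operators it is built from; compound ones are combined with Boolean OR.
DECOMPOSED = {
    ">": (operator.gt,),
    "<": (operator.lt,),
    "=": (operator.eq,),
    ">=": (operator.gt, operator.eq),  # a >= b  <=>  (a > b) OR (a = b)
    "<=": (operator.lt, operator.eq),  # a <= b  <=>  (a < b) OR (a = b)
}


def evaluate(op: str, a, b) -> bool:
    """Evaluate `a op b` as the OR of the single operators composing `op`."""
    return any(f(a, b) for f in DECOMPOSED[op])
```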
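The precision level e and the 50% variability assumption are the inputs to Yamane's simplified sample-size formula [30], \(n = N / (1 + N e^2)\), where N is the population size. The Python sketch below computes n and rounds it up to a whole sample. The function name is hypothetical, and the population sizes in the tests are illustrative values, not the paper's experimental settings.

```python
import math


def yamane_sample_size(population: int, e: float = 0.05) -> int:
    """Yamane's simplified sample size n = N / (1 + N * e^2), rounded up.

    `e` is the precision level (0.05 for a 95% confidence level); the
    formula already assumes maximum (50%) variability.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 < e < 1:
        raise ValueError("e must lie strictly between 0 and 1")
    return math.ceil(population / (1 + population * e ** 2))
```

For N = 1000 and e = 0.05, n = 1000 / 3.5 ≈ 285.7, so 286 items would be sampled.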
References
Getahun F, Chbeir R (2013) RSS query algebra: towards a better news management. Inf Sci 237(237):313–342
RSS Advisory Board. RSS 2.0 specification. http://www.rssboard.org/
Fabret F, Jacobsen HA, Llirbat F, Pereira J, Ross KA, Shasha D (2001) Filtering algorithms and implementation for very fast publish/subscribe. In: SIGMOD, pp 115–126
Hammad MA, Franklin MJ, Aref WG, Elmagarmid AK (2003) Scheduling for shared window joins over data streams. In: VLDB, pp 297–308
Madden SR, Shah MA, Hellerstein JM, Raman V (2002) Continuously adaptive continuous queries over streams. In: SIGMOD, pp 49–60
Zhang R, Koudas N, Ooi BC, Srivastava D (2005) Multiple aggregations over data streams. In: SIGMOD, pp 299–310
Chi Y, Wang H, Yu PS, Muntz RR (2005) Loadstar: a load shedding scheme for classifying data streams. In: SIAM conference on data mining, pp 1302–1305
Garofalakis M, Gibbons P (2001) Approximate query processing: taming the megabytes. In: VLDB, Rome
Hellerstein J, Haas P, Wang H (1997) Online aggregation. In: SIGMOD, Tucson, pp 171–182
Sellis TK (1988) Multiple-query optimization. ACM Trans Database Syst 13(1):23–52
Jarke M (1985) Common subexpression isolation in multiple query optimization. Springer, Berlin, pp 191–205
Chakravarthy US, Minker J (1986) Multiple query processing in deductive databases using query graphs. In: Proceedings of the 12th international conference on very large data bases, San Francisco, CA, pp 384–391
Munagala K, Srivastava U, Widom J (2007) Optimization of continuous queries with shared expensive filters. In: Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 215–224. https://doi.org/10.1145/1265530.1265561
Arasu A, Widom J (2004) Resource sharing in continuous sliding-window aggregates. Technical Report
Wang S, Rundensteiner E, Ganguly S, Bhatnagar S (2006) StateSlice: new paradigm of multi-query optimization of window-based stream queries. In: VLDB, pp 619–630
Hong M, Demers A, Gehrke J (2007) Massively multi-query join processing in publish/subscribe systems. In: SIGMOD, pp 761–772
Krishnamurthy S, Wu C, Franklin M (2006) On-the-fly sharing for streamed aggregation. In: SIGMOD, pp 623–634
Li J, Maier D, Tufte K, Papadimos V, Tucker PA (2005) No pane, no gain: efficient evaluation of sliding-window aggregates over data streams. In: SIGMOD, pp 39–44
Guirguis S, Sharaf MA, Chrysanthis PK, Labrinidis A (2011) Optimized processing of multiple aggregate continuous queries. In: CIKM, pp 1515–1524
Hammad MA, Franklin MJ, Aref WG, Elmagarmid AK (2003) Scheduling for shared window joins over data streams. In: VLDB, pp 297–308
Tatbul N, Çetintemel U, Zdonik S (2003) Load shedding on data streams. In: VLDB, pp 674–683
Reiss F, Hellerstein J (2005) Data triage: an adaptive architecture for load shedding in TelegraphCQ. In: IEEE ICDE, Tokyo, pp 155–156
Babcock B, Datar M, Motwani R (2004) Load shedding for aggregation queries over data streams. In: ICDE, pp 155–156
Robie J, Chamberlin D, Dyck M, Snelson J (2009) XQuery 1.1: an XML query language. World Wide Web Consortium (W3C). http://www.w3.org/TR/xquery-11/
Getahun F, Tekli J, Atnafu S, Chbeir R (2007) Towards efficient horizontal multimedia database fragmentation using semantic-based predicates implication. In: SBBD 2007, pp 68–82
Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. CoRR, arXiv:1301.3781
Brill E (1992) A simple rule-based part of speech tagger. In: Applied natural language processing (ACL), pp 152–155
Getahun F, Tekli J, Viviani M, Chbeir R, Yétongnon K (2009) Towards semantic-based RSS merging. In: International symposium on intelligent interactive multimedia systems and services, pp 53–64
Getahun F, Tekli J, Chbeir R, Viviani M, Yétongnon K (2009) Relating RSS news/items. In: 9th international conference on web engineering ICWE 2009, San Sebastian, Spain, pp 442–45
Yamane T (1967) Statistics: an introductory analysis, 2nd edn. Harper and Row, New York
WordNet 2.1 (2005) A lexical database of the English language. http://wordnet.princeton.edu/online/
1 Billion Word Language Model Benchmark (2017) statmt. http://www.statmt.org/lm-benchmark/1-billion-word-language-modeling-benchmark-r13output.tar.gz
Gulli A (2004) AG’s corpus of news articles. http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html
Lim L, Wang H, Wang M (2013) Semantic queries by example. In: Proceedings of the 16th international conference on extending database technology, ISBN 978-1-4503-1597-5, pp 347–358. https://doi.org/10.1145/2452376.2452417
Cite this article
Getahun, F., Chbeir, R. Multi-Query Optimization on RSS Feeds. J Data Semant 7, 47–64 (2018). https://doi.org/10.1007/s13740-018-0085-3