ABSTRACT
This paper proposes a new approach for the incremental evaluation of RDF graph streams over sliding windows. Our system, called "SPECTRA", combines a novel form of RDF graph summarisation, a new incremental evaluation method, and adaptive indexing techniques. We materialise the summarised graph from each event using vertically partitioned views, which enable fast hash joins for all types of queries. Our incremental and adaptive indexing is a byproduct of query processing and thus provides considerable advantages over offline and online indexing. Furthermore, unlike existing approaches, we evaluate triples incrementally within a window. This considerably reduces response time and avoids the unnecessary cost that recomputation models incur on every triple insertion and eviction within a defined window. We show that the resulting system copes with complex queries and datasets: our experimental results on both synthetic and real-world datasets show up to an order-of-magnitude performance improvement over state-of-the-art systems.
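The abstract does not spell out SPECTRA's algorithms, so the following is only a generic sketch of the two ideas it combines: vertically partitioned storage with hash joins, and incremental (delta) evaluation on insertion and eviction instead of recomputation. It assumes a time-based window, bag semantics, triples arriving in timestamp order, and a single two-pattern query `?x p1 ?y . ?y p2 ?z` over two distinct predicates. The class and method names (`PredicateTable`, `SlidingWindowJoin`, `insert`, `advance_to`) are hypothetical, and the graph summarisation and adaptive indexing steps are not modelled.

```python
from collections import Counter, defaultdict, deque


class PredicateTable:
    """One vertical partition: the (subject, object) pairs of a single predicate,
    hash-indexed on both columns so a join can probe either side."""

    def __init__(self):
        self.by_subj = defaultdict(Counter)  # subject -> multiset of objects
        self.by_obj = defaultdict(Counter)   # object -> multiset of subjects

    def add(self, s, o):
        self.by_subj[s][o] += 1
        self.by_obj[o][s] += 1

    def remove(self, s, o):
        # Drop one copy of (s, o) from both hash indexes, pruning empty buckets.
        for index, key, val in ((self.by_subj, s, o), (self.by_obj, o, s)):
            bucket = index[key]
            bucket[val] -= 1
            if bucket[val] == 0:
                del bucket[val]
            if not bucket:
                del index[key]


class SlidingWindowJoin:
    """Hypothetical sketch: maintains ?x p1 ?y . ?y p2 ?z over a time-based window
    of the given width, with bag semantics. Assumes p1 != p2 and that triples
    arrive with non-decreasing timestamps."""

    def __init__(self, p1, p2, width):
        if p1 == p2:
            raise ValueError("this sketch assumes two distinct predicates")
        self.p1, self.p2, self.width = p1, p2, width
        self.tables = defaultdict(PredicateTable)  # predicate -> vertical partition
        self.window = deque()                      # (ts, s, p, o) in arrival order
        self.results = Counter()                   # (x, y, z) -> multiplicity

    def _delta(self, s, p, o, sign):
        # Hash-probe the *other* pattern's table on the join key ?y and adjust
        # only the result rows that involve this one triple.
        if p == self.p1:    # triple binds ?x = s, ?y = o
            for z, c in self.tables[self.p2].by_subj.get(o, {}).items():
                self._bump((s, o, z), sign * c)
        elif p == self.p2:  # triple binds ?y = s, ?z = o
            for x, c in self.tables[self.p1].by_obj.get(s, {}).items():
                self._bump((x, s, o), sign * c)

    def _bump(self, row, n):
        self.results[row] += n
        if self.results[row] <= 0:
            del self.results[row]

    def advance_to(self, now):
        # Evict every triple whose timestamp is no longer in (now - width, now].
        while self.window and self.window[0][0] <= now - self.width:
            _, s, p, o = self.window.popleft()
            self.tables[p].remove(s, o)
            self._delta(s, p, o, -1)

    def insert(self, ts, s, p, o):
        self.advance_to(ts)
        self.window.append((ts, s, p, o))
        self._delta(s, p, o, +1)
        self.tables[p].add(s, o)


q = SlidingWindowJoin("follows", "likes", width=10)
q.insert(1, "alice", "follows", "bob")
q.insert(2, "bob", "likes", "post1")
q.insert(3, "bob", "likes", "post2")
print(sorted(q.results))  # [('alice', 'bob', 'post1'), ('alice', 'bob', 'post2')]
q.insert(11, "carol", "follows", "bob")  # evicts the ts=1 triple first
print(sorted(q.results))  # [('carol', 'bob', 'post1'), ('carol', 'bob', 'post2')]
```

Each insertion or eviction costs one hash probe plus work proportional to the number of matching partner rows, not a re-join of the whole window, which illustrates the kind of saving over recomputation models that the abstract describes. A full engine would generalise this delta rule to arbitrary basic graph patterns.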
- SPECTRA: Continuous Query Processing for RDF Graph Streams Over Sliding Windows