ABSTRACT
We propose a logical framework, based on Datalog, to study the foundations of querying JSON data. The main feature of our approach, which we call J-Logic, is its emphasis on paths. Paths are sequences of keys and are used to access the tree structure of nested JSON objects. J-Logic also features "packing" as a means to generate a new key from a path or subpath. J-Logic with recursion is computationally complete, but many queries, such as deep equality, can be expressed without recursion. We give a necessary condition for queries to be expressible without recursion. Most of our results focus on the deterministic nature of JSON objects as partial functions from keys to values. Predicates defined by J-Logic programs, however, need not describe proper objects, since they may associate several values with the same key. Nevertheless, we show that every object-to-object transformation in J-Logic can be defined using only objects in intermediate results. Moreover, we show that it is decidable whether a positive, nonrecursive J-Logic program always returns an object when given objects as inputs. Regarding packing, we show that it is unnecessary if the output does not require new keys. Finally, we show that query containment is decidable for positive, nonrecursive J-Logic programs.
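The central notions above (objects as partial functions from keys to values, paths as key sequences, packing a path into a new key, and derived facts that may fail to form an object) can be made concrete with a small Python sketch. It is not J-Logic's syntax or semantics. It assumes JSON objects are Python dicts with atomic leaves (arrays are left out), models a packed key with a hypothetical `Packed` class, and uses recursive Python functions purely as a convenience; this does not mirror how J-Logic evaluates paths. The names `path_facts`, `deep_equal`, `pack`, `flatten_with_packing`, `build_object` and `EMPTY` are all illustrative.

```python
from dataclasses import dataclass
from typing import Any, Optional, Set, Tuple, Union


@dataclass(frozen=True)
class Packed:
    """Hypothetical stand-in for a key generated by packing a path."""
    path: tuple


Key = Union[str, Packed]
Path = Tuple[Key, ...]
Fact = Tuple[Path, Any]

EMPTY = object()  # marker atom recording an empty nested object


def path_facts(obj: dict, prefix: Path = ()) -> Set[Fact]:
    """Flatten a nested object (no arrays) into (path, atomic value) facts."""
    facts: Set[Fact] = set()
    if not obj:
        facts.add((prefix, EMPTY))
    for key, value in obj.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            facts |= path_facts(value, path)
        else:
            facts.add((path, value))
    return facts


def deep_equal(a: dict, b: dict) -> bool:
    """Deep equality as equality of path facts. Atoms are tagged with their
    Python type so that, e.g., True and 1 are not confused."""
    def typed(o: dict) -> Set[Tuple[Path, type, Any]]:
        return {(p, type(v), v) for p, v in path_facts(o)}
    return typed(a) == typed(b)


def pack(path: Path) -> Packed:
    """Turn a whole path into a single new key (hypothetical encoding)."""
    return Packed(tuple(path))


def flatten_with_packing(obj: dict) -> dict:
    """A transformation whose output needs new keys: one top-level key per
    leaf path. Empty nested objects are dropped for brevity."""
    return {pack(p): v for p, v in path_facts(obj) if v is not EMPTY}


def build_object(facts: Set[Fact]) -> Optional[dict]:
    """Read a set of facts back as one object, or return None when the facts
    do not describe a partial function from keys to values."""
    root: dict = {}
    empty_paths = []
    # Shorter paths first, so a leaf is placed before any path extending it.
    for path, value in sorted(facts, key=lambda f: len(f[0])):
        if value is EMPTY:
            empty_paths.append(path)
            if not path:
                continue  # the root object itself is marked empty
        elif not path:
            return None  # an atom cannot stand in for the root object
        node = root
        for key in path[:-1]:
            child = node.setdefault(key, {})
            if not isinstance(child, dict):
                return None  # the path runs through an atomic value
            node = child
        if path[-1] in node:
            return None  # two facts claim the same path
        node[path[-1]] = {} if value is EMPTY else value
    # An EMPTY marker is violated if a longer path added keys beneath it.
    for path in empty_paths:
        node = root
        for key in path:
            node = node[key]
        if node:
            return None
    return root


doc1 = {"name": {"first": "Ada", "last": "Lovelace"}, "born": 1815}
doc2 = {"born": 1815, "name": {"last": "Lovelace", "first": "Ada"}}
print(deep_equal(doc1, doc2))                    # True
print(deep_equal({"a": {}}, {}))                 # False
print(build_object({(("a",), 1), (("a",), 2)}))  # None
```

Comparing path-fact sets works for deep equality in this sketch because an array-free object is uniquely determined by its leaf paths plus markers for empty sub-objects. `build_object` returning `None` plays the role of a derived predicate that does not describe a proper object, and `flatten_with_packing` is an example of an output that needs new keys, which is where packing comes in.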
Index Terms
- J-Logic: Logical Foundations for JSON Querying