Abstract
Recursive queries have traditionally been studied in the framework of datalog, a language that restricts recursion to monotone queries over sets and is therefore guaranteed to converge in time polynomial in the size of the input. But modern big data systems require recursive computations beyond the Boolean space. In this article, we study the convergence of datalog when it is interpreted over an arbitrary semiring. We consider an ordered semiring, define the semantics of a datalog program as a least fixpoint in this semiring, and study the number of steps required to reach that fixpoint, if it is reached at all. We identify algebraic properties of the semiring that correspond to certain convergence properties of datalog programs. Finally, we describe a class of ordered semirings on which one can use the semi-naïve evaluation algorithm on any datalog program.
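As an illustration of the least-fixpoint semantics described above, the following minimal sketch evaluates a single-source shortest-path rule over the tropical semiring (min, +) by naive iteration and counts the steps until the value stops changing. The rule, the function name `naive_fixpoint`, the graph encoding, and the `max_steps` guard are assumptions made for this example and are not taken from the article. In this semiring the order is the reverse of the usual numeric order, so the all-infinity assignment is the least element from which iteration starts.

```python
import math

# Tropical semiring (min, +): "addition" is min with identity +inf,
# "multiplication" is + with identity 0.
ZERO = math.inf
ONE = 0.0


def naive_fixpoint(nodes, edges, source, max_steps=1000):
    """Naively iterate D(x) = [x == source] (+) (+)_z D(z) (x) E(z, x).

    `edges` maps (z, x) pairs to weights. Returns the fixpoint together with
    the number of applications of the rule needed to reach it. The
    `max_steps` guard is a hypothetical safety net for inputs on which the
    iteration does not converge (for example, negative cycles).
    """
    dist = {x: ZERO for x in nodes}  # least element: everything is +inf
    for step in range(1, max_steps + 1):
        new = {}
        for x in nodes:
            value = ONE if x == source else ZERO
            for (z, y), w in edges.items():
                if y == x:
                    # semiring "plus" is min, semiring "times" is +
                    value = min(value, dist[z] + w)
            new[x] = value
        if new == dist:
            # this application changed nothing, so step - 1 were needed
            return dist, step - 1
        dist = new
    raise RuntimeError("no fixpoint reached within max_steps")


if __name__ == "__main__":
    nodes = ["a", "b", "c"]
    edges = {("a", "b"): 1.0, ("b", "c"): 2.0, ("a", "c"): 5.0}
    print(naive_fixpoint(nodes, edges, "a"))
```

On this small graph with nonnegative weights the iteration stabilizes after a few steps. Changing the semiring operations or the weights can change the step count, or prevent convergence altogether, which is why the sketch bounds the number of iterations.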