Extending the power of datalog recursion

  • Regular Paper
The VLDB Journal

Abstract

Supporting aggregates in recursive logic rules represents a very important problem for Datalog. To solve this problem, we propose a simple extension, called Datalog\(^{FS}\,\)(Datalog extended with frequency support goals), that supports queries and reasoning about the number of distinct variable assignments satisfying given goals, or conjunctions of goals, in rules. This monotonic extension greatly enhances the power of Datalog, while preserving (i) its declarative semantics and  (ii) its amenability to efficient implementation via differential fixpoint and other optimization techniques presented in the paper. Thus, Datalog\(^{FS}\,\)enables the efficient formulation of queries that could not be expressed efficiently or could not be expressed at all in Datalog with stratified negation and aggregates. In fact, using a generalized notion of multiplicity called frequency, we show that diffusion models and page rank computations can be easily expressed and efficiently implemented using Datalog\(^{FS}\,\).

Notes

  1. A good example of this problem due to Ross and Sagiv [8] is given at the end of Sect. 2 (see Example 3).

  2. Naturally, by “least fixpoint” of a program, we mean “least fixpoint of its immediate consequence operator” [9].

  3. This example illustrates that these problems are caused by the non-monotonic nature of count. In fact, the following program, where count is replaced by our running-FS operator, has as its unique minimal model \(\{ \mathtt{p(a), q(a), p(b), q(b)} \}\):

    $$\begin{aligned}&\begin{array}{ll} \mathtt{\quad p(b). }&\mathtt{q(b). } \\ \mathtt{\quad p(a) } \leftarrow&\mathtt{1\!:\![q(X)]. } \\ \mathtt{\quad q(a) } \leftarrow&\mathtt{1\!:\![p(X)]. } \\ \end{array} \end{aligned}$$
  4. Observe that while N+1 can be viewed as a call to an arithmetic function, we can stay in the framework of pure logic programs and view it as the postfix functor +1 applied to N, as in Datalog\(_{1S}\) [16]; this also supports comparison between positive integers without assuming a totally ordered universe.

  5. This discussion suggests that the formal semantics of Datalog\(^{FS}\,\)can also be defined without using lists, that is, in terms of Herbrand bases and interpretations that only use the constants and predicates in the programs and no function symbols. This interesting topic is left for future research.

  6. If this information were stored in SQL tables, then the head of the rule would be updated using the following query:

    $$\begin{aligned} \mathtt{select~cassb.Part,~sum(cassb.Qty*cbasic.K) }\\ \mathtt{\quad \quad from\,cassb,\,cbasic\,where\,cassb.Sub=cbasic.Sub }\\ \mathtt{\quad \quad group\,by\,cassb.Part } \end{aligned}$$

  7. Determining local and global variables is already part of the binding passing analysis performed by current Datalog compilers [24].

  8. A boolean function B(T) is monotonic w.r.t. the integer values T whenever \(B(T)\) evaluating to true implies that \(B(T^{\prime })\) is true for every \(T^{\prime }> T\).

  9. For instance, this function can be used to express the fact that an agent who sees, say, all of his 89 partners switching to B experiences a much stronger push than another Twitter user who sees his only partner moving to B (although percentage-wise the two situations are the same).

  10. A somewhat more efficient computation could be achieved via ordered lists. This approach, however, is undesirable inasmuch as it requires a totally ordered universe and compromises genericity [18].

References

  1. Hellerstein, J.M.: Datalog redux: experience and conjecture. In: PODS, pp. 1–2 (2010)

  2. de Moor, O., Gottlob, G., Furche, T., Sellers, A.J.: Datalog Reloaded: First International Workshop, Datalog 2010, Oxford, UK, 16–19 March 2010. Springer (2011)

  3. Huang, S.S., Green, T.J., Loo, B.T.: Datalog and emerging applications: an interactive tutorial. In: SIGMOD Conference, pp. 1213–1216 (2011)

  4. Loo, B.T., Condie, T., Garofalakis, M.N., Gay, D.E., Hellerstein, J.M., Maniatis, P., Ramakrishnan, R., Roscoe, T., Stoica, I.: Declarative networking. Commun. ACM 52(11), 87–95 (2009)

  5. Gottlob, G., Orsi, G., Pieris, A.: Ontological queries: rewriting and optimization. In: ICDE, pp. 2–13 (2011)

  6. Afrati, F.N., Borkar, V.R., Carey, M.J., Polyzotis, N., Ullman, J.D.: Map-reduce extensions and recursive queries. In: EDBT, pp. 1–8 (2011)

  7. Zaniolo, C.: Logical foundations of continuous query languages for data streams. In: Datalog 2012, pp. 177–189 (2012)

  8. Ross, K.A., Sagiv, Y.: Monotonic aggregation in deductive databases. J. Comput. Syst. Sci. 54(1), 79–97 (1997)

  9. Lloyd, J.W.: Foundations of Logic Programming, 2nd edn. Springer, Berlin (1987)

  10. Van Gelder, A.: Foundations of aggregation in deductive databases. In: DOOD, pp. 13–34 (1993)

  11. Kanellakis, P.C.: Elements of Relational Database Theory. Technical report, Providence, RI (1989)

  12. Ullman, J.D.: Principles of Database and Knowledge-Base Systems. Computer Science Press, Inc., New York (1988)

  13. Gelfond, M., Lifschitz, V.: The Stable Model Semantics for Logic Programming. MIT Press, London, pp. 1070–1080 (1988)

  14. Van Gelder, A., Ross, K.A., Schlipf, J.S.: The well-founded semantics for general logic programs. J. ACM 38(3), 619–649 (1991)

  15. Zaniolo, C., Ceri, S., Faloutsos, C., Snodgrass, R.T., Subrahmanian, V.S., Zicari, R.: Advanced Database Systems. Morgan Kaufmann (1997)

  16. Chomicki, J., Imielinski, T.: Temporal deductive databases and infinite objects. In: PODS, pp. 61–73 (1988)

  17. Hirsch, J.E.: An index to quantify an individual’s scientific research output that takes into account the effect of multiple coauthorship. Scientometrics 85(3), 741–754 (2010)

  18. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley, Reading (1995)

  19. Mumick, I.S., Pirahesh, H., Ramakrishnan, R.: The magic of duplicates and aggregates. In: VLDB, pp. 264–277 (1990)

  20. Kolaitis, P.G.: The expressive power of stratified logic programs. Inf. Comput. 90, 50–66 (1991)

  21. Dahlhaus, E.: Skolem Normal Forms Concerning the Least Fixpoint. Springer, London (1987)

  22. Mumick, I.S., Shmueli, O.: How expressive is stratified aggregation? Ann. Math. Artif. Intell. 15, 407–435 (1995)

  23. Mazuran, M., Serra, E., Zaniolo, C.: Graph Languages in \(\text{ Datalog}^{FS}\): From Abstract Semantics to Efficient Implementation. Technical report, UCLA (2011)

  24. Arni, F., Ong, K.L., Tsur, S., Wang, H., Zaniolo, C.: The deductive database system ldl++. TPLP 3(1), 61–94 (2003)

  25. Zaniolo, C., Ceri, S., Faloutsos, C., Snodgrass, R.T., Subrahmanian, V.S., Zicari, R.: Advanced Database Systems. Morgan Kaufmann, Los Altos (1997)

  26. Angles, R., Gutiérrez, C.: Survey of graph database models. ACM Comput. Surv. 40(1) (2008)

  27. Cruz, I.F., Mendelzon, A.O., Wood, P.T.: A graphical query language supporting recursion. In: SIGMOD Conference, pp. 323–330 (1987)

  28. Consens, M.P., Mendelzon, A.O.: Graphlog: a visual formalism for real life recursion. In: PODS, pp. 404–416 (1990)

  29. Paredaens, J., Peelman, P., Tanca, L.: G-log: a graph-based query language. IEEE Trans. Knowl. Data Eng. 7(3), 436–453 (1995)

  30. Jackson, M.O., Yariv, L.: Diffusion on Social Networks. Economie Publique (2005)

  31. Shakarian, P., Subrahmanian, V.S., Sapino, M.L.: Using generalized annotated programs to solve social network optimization problems. In: ICLP (Technical Communications), pp. 182–191 (2010)

  32. Ramakrishnan, R., Gehrke, J.: Database Management Systems. WCB/McGraw-Hill, New York (1998)

  33. Ullman, J.D., Widom, J.: A First Course in Database Systems. Prentice-Hall (1997)

  34. Zaniolo, C., Arni, N., Ong, K.: Negation and aggregates in recursive rules: the ldl++ approach. In: DOOD, pp. 204–221 (1993)

  35. Lausen, G., Ludäscher, B., May, W.: On active deductive databases: the statelog approach. In: Transactions and Change in Logic Databases, pp. 69–106 (1998)

  36. Ganguly, S., Greco, S., Zaniolo, C.: Minimum and maximum predicates in logic programming. In: PODS, pp. 154–163 (1991)

  37. Giannotti, F., Pedreschi, D., Saccà, D., Zaniolo, C.: Non-determinism in deductive databases. In: DOOD, pp. 129–146 (1991)

  38. Greco, S., Zaniolo, C.: Greedy algorithms in datalog. TPLP 1(4), 381–407 (2001)

Acknowledgments

We would like to thank Sergio Greco, Oded Shmueli and the reviewers for their comments and suggested improvements. Carlo Zaniolo’s work was supported in part by NSF Grant No. IIS 1118107. Edoardo Serra’s work was supported by the following projects: FRAME Grant No. PON01-02477, OpenKnowTeck Grant No. DM2130 and TETRIS Grant No. PON01-00451. Mirjana Mazuran’s work was partially funded by the Italian project Sensori (Industria 2015—Bando Nuove Tecnologie per il Made in Italy)—Grant n. 00029MI01/2011.

Author information

Corresponding author

Correspondence to Carlo Zaniolo.

Appendices

Appendix A: Abstract semantics by example

The abstract semantics of a Datalog\(^{FS}\,\)program P is defined by its translation into a set of Horn clauses called the expansion of P. For example, let us consider the query in Example 11 that counts how many copies of each basic part are contained in assemblies:

$$\begin{aligned} \begin{array}{ll} \mathtt{cassb(Part, Sub)\!:\!\,\, Qty \, } \leftarrow&\mathtt{ assbl(Part, Sub, Qty).~~~~~~~~ } \\ \mathtt{cbasic(Pno)\!:\!\,\, 1 } \leftarrow&\mathtt{\quad basic(Pno,\_). } \\ \mathtt{cbasic(Part)\!:\!\,\, K } \leftarrow&\mathtt{\quad K\!:\![cassb(Part, Sub),cbasic(Sub)]. } \\ \end{array} \end{aligned}$$

We rewrite all multi-occurring predicates using lessthan, obtaining

$$\begin{aligned}\begin{array}{ll} \mathtt{\overline{cassb}(Part, Sub,J) } \leftarrow&\mathtt{assbl(Part, Sub, Qty), } \\ \mathtt{ }&\mathtt{lessthan(J,Qty). } \\ \mathtt{\overline{cbasic}(Pno,J) } \leftarrow&\mathtt{lessthan(J,1), basic(Pno, \_). } \\ \mathtt{\overline{cbasic}(Part,J) } \leftarrow&\mathtt{lessthan(J,K), } \\ \mathtt{ }&\mathtt{\quad K\!:\![\overline{cassb}(Part, Sub, J1), \overline{cbasic}(Sub, J2)]. } \\ \end{array} \end{aligned}$$

The third rule evaluates the join of \(\mathtt{cassb(Part, Sub, \_) }\) and \(\mathtt{cbasic(Sub, \_) }\) on \(\mathtt{Sub }\), and from the multiplicities of the two operands derives the counts associated with each Part. In terms of the formal abstract semantics, however, no count aggregate is used: we instead use the \(\mathtt{conj }\) predicate specific to each b-expression. For the case at hand we obtain (using lists to assemble the values of the three local variables into an object):

$$\begin{aligned} \begin{array}{ll} \mathtt{conj(1, Part, [[Sub, J1, J2]]) } \leftarrow&\mathtt{\overline{cassb}(Part, Sub, J1), } \\ \mathtt{ }&\mathtt{ \overline{cbasic}(Sub, J2). } \\ \mathtt{conj(N1, Part, [[Sub, J1, J2] | T]) } \leftarrow&\mathtt{\overline{cassb}(Part, Sub, J1), } \\ \mathtt{ }&\mathtt{ \overline{cbasic}(Sub, J2), } \\ \mathtt{ }&\mathtt{conj(N, Part, T), } \\ \mathtt{ }&\mathtt{ notin( [Sub, J1, J2], T), } \\ \mathtt{ }&\mathtt{ N1=N+1. } \\ \mathtt{notin(Z, [~]). }&\mathtt{ } \\ \mathtt{notin(Z, [V|T]) } \leftarrow&\mathtt{ \quad Z \ne V , notin(Z, T). } \\ \end{array} \end{aligned}$$

Therefore, the semantics-defining rewriting of our FS-literals expands our Datalog\(^{FS}\,\)program into standard Horn clauses, for which model-theoretic, proof-theoretic and fixpoint semantics exist and are equivalent. Thus the FS terms can be used freely in recursive rules. However, this rewriting does not provide a good starting point for implementation since it is extremely inefficient. In the example above, we have triples [Sub, J1, J2], and we build all lists of such triples without repetitions; as we do that, we also count the N elements in each of the N! lists (see Note 10).
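
To make the cost of the expansion concrete, the following minimal Python sketch mimics what the \(\mathtt{conj }\) rules derive for a single Part: every repetition-free list over its [Sub, J1, J2] triples, together with the list length. The toy triples and the helper name `repetition_free_lists` are illustrative assumptions and do not come from the paper.

```python
from itertools import permutations
from math import factorial


def repetition_free_lists(triples):
    """Yield (length, list) for every non-empty list of distinct triples.

    This mirrors the facts conj(N, Part, List) derivable for one Part.
    """
    items = sorted(set(triples))
    for n in range(1, len(items) + 1):
        for perm in permutations(items, n):
            yield n, list(perm)


# Hypothetical toy data: three (Sub, J1, J2) triples for a single Part.
triples = [("wheel", 1, 1), ("wheel", 2, 1), ("frame", 1, 1)]
derived = list(repetition_free_lists(triples))

max_n = max(n for n, _ in derived)
longest = [lst for n, lst in derived if n == max_n]
print(max_n, len(longest), len(derived))
```

In this sketch the largest derived N equals the number of distinct triples, which is the count we want, but it is reached through N! different maximal lists and many shorter ones.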

Efficient implementations that are fully consistent with the formal semantics are, however, possible starting directly from the FS constructs in the rules, and we suggest that this is the approach that should be taken in actual implementations of Datalog\(^{FS}\,\). In fact, working directly with the FS constructs in the rules is much preferable in terms of performance, and it is also more attractive in terms of usability. Indeed, concepts such as multiplicity are simple enough that users would rather program with running-FS and FS-assert terms than with their abstract definitions based on lists and recursive predicates. Although Prolog-like implementation approaches are also possible, in this paper we focused on Datalog implementation technology.
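
As a rough illustration of evaluating the FS constructs directly, the Python sketch below computes the multiplicity of each cbasic fact for the three-rule program of this appendix by a naive fixpoint over integer multiplicities. It is only a sketch under stated assumptions: the function name `count_basic_parts` and the toy `assbl` and `basic` data are hypothetical, the assembly graph is assumed acyclic, and the loop is a plain naive iteration rather than the differential fixpoint discussed in the paper.

```python
def count_basic_parts(assbl, basic):
    """Return the multiplicity of cbasic(Part) for every derivable Part.

    assbl: iterable of (part, sub, qty) facts.
    basic: iterable of basic part numbers, each with multiplicity 1.
    """
    mult = {p: 1 for p in basic}  # cbasic(Pno):1 <- basic(Pno, _).
    changed = True
    while changed:
        changed = False
        totals = {}
        # K:[cassb(Part, Sub), cbasic(Sub)] counts Qty * mult(Sub) per Sub.
        for part, sub, qty in assbl:
            totals[part] = totals.get(part, 0) + qty * mult.get(sub, 0)
        for part, k in totals.items():
            if k > mult.get(part, 0):  # multiplicities only grow
                mult[part] = k
                changed = True
    return mult


# Hypothetical toy data.
assbl = [("bike", "wheel", 2), ("bike", "frame", 1),
         ("wheel", "rim", 1), ("wheel", "spoke", 30)]
basic = ["frame", "rim", "spoke"]
result = count_basic_parts(assbl, basic)
print(result["wheel"], result["bike"])
```

Each pass works with one integer per fact instead of enumerating lists, which is the practical advantage of staying at the level of multiplicities.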

Appendix B: The expressive power of running-FS

In Sect. 5, we claimed that the increased expressive power of stratified Datalog\(^{FS}\,\)is due only to the running-FS construct. In this section we prove this assertion by showing that final-FS goals and FS-assert constructs can be rewritten using only running-FS goals and stratified negation.

In Sects. 3.4 and 3.3 we have shown how to rewrite final-FS goals and FS-assert constructs, respectively, as logic programs without lists. However, those rewritings use integer comparison (the operator \(>\) for final-FS goals) and the sum operator (for FS-assert constructs). We now show how to rewrite final-FS goals and FS-assert constructs using only running-FS goals, without integer comparison and without built-in integer operations such as sum.

As shown in Sect. 3.4, a final-FS goal \(\mathtt{K_j=![exprj(X_j, Y_j)] }\) is rewritten as the conjunction:

$$\begin{aligned} \mathtt{K_j:[exprj(X_j, Y_j)] },\lnot \mathtt{morethan(Y_j,K_j) } \end{aligned}$$

where the predicate morethan is defined as follows:

$$\begin{aligned} \begin{array}{ll} \mathtt{morethan(Y_j,K) } \leftarrow&\mathtt{K1:[exprj(\_,Y_j)],K1>K. } \\ \end{array} \end{aligned}$$

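This rule uses the integer comparison \(>\), which might suggest that the comparison adds expressive power to our language. This is not the case: the following rewriting uses only the inequality operator \(\ne \). Intuitively, there are more than K assignments for a given \(\mathtt{Y_j }\) exactly when, after setting one assignment \(\mathtt{Z_j }\) aside, at least K different assignments remain: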

$$\begin{aligned} \begin{array}{ll} \mathtt{morethan(Y_j,K) } \leftarrow&\mathtt{exprj(Z_j,Y_j), } \\ \mathtt{ }&\mathtt{K:[exprj(X_j,Y_j),X_j\ne Z_j]. } \\ \end{array} \end{aligned}$$
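
The equivalence of the two definitions of morethan can be checked on small data. The Python sketch below is a hedged illustration: the relation `exprj` is modelled as a set of (X, Y) pairs, K is assumed to range over positive integers, and the function names are hypothetical.

```python
from collections import defaultdict


def groups(exprj):
    """Map each Y value to the set of X values with exprj(X, Y)."""
    by_y = defaultdict(set)
    for x, y in exprj:
        by_y[y].add(x)
    return by_y


def morethan_with_comparison(exprj, y, k):
    # K1:[exprj(_, y)] holds for every K1 from 1 up to the group size;
    # the rule then asks for some such K1 with K1 > k.
    n = len(groups(exprj)[y])
    return any(k1 > k for k1 in range(1, n + 1))


def morethan_with_inequality(exprj, y, k):
    # exprj(z, y), k:[exprj(x, y), x != z]: at least k assignments
    # remain after setting the witness z aside.
    xs = groups(exprj)[y]
    return any(len({x for x in xs if x != z}) >= k for z in xs)


# Hypothetical toy relation: group g1 has three members, g2 has one.
exprj = {("a", "g1"), ("b", "g1"), ("c", "g1"), ("d", "g2")}
agree = all(
    morethan_with_comparison(exprj, y, k) == morethan_with_inequality(exprj, y, k)
    for y in ("g1", "g2", "g3")
    for k in range(1, 6)
)
print(agree)
```

On this data both versions hold for group g1 exactly when K is 1 or 2, so the final-FS value for g1 is 3.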

Similarly, for FS-assert constructs, the rewriting shown in Sect. 3.3 uses integer comparison and the sum operator as follows:

$$\begin{aligned} \begin{array}{ll} \mathtt{ lessthan(1, K) } \leftarrow&\mathtt{\! K \ge 1. } \\ \mathtt{ lessthan(J1, K) } \leftarrow&\mathtt{\! lessthan(J, K), K\! > \! J, J1=J+1. } \\ \end{array} \end{aligned}$$

These rules can be rewritten without integer comparison and sum as follows:

$$\begin{aligned} \begin{array}{ll} \mathtt{ a(1,0). }&\mathtt{ } \\ \mathtt{ a(1,1). }&\mathtt{ } \\ \mathtt{ a(K,1) } \leftarrow&\mathtt{\!K:[a(K1,X)] . } \\ \mathtt{ b(1,K,0) } \leftarrow&\mathtt{a(K,1). } \\ \mathtt{ b(1,K,1) } \leftarrow&\mathtt{a(K,1). } \\ \mathtt{ b(K,K1,1) } \leftarrow&\mathtt{a(K1,\_),\!K:[b(K2,K1,X),K1\ne K2]. } \\ \mathtt{ lessthan(K,K1) } \leftarrow&\mathtt{b(K,K1,1). } \\ \end{array} \end{aligned}$$

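Here predicate a generates all integers from 1 up to the maximum precision (playing the role of the first rule of the lessthan predicate in the previous two rules). The 4th and 5th rules then generate, for each integer K, the two facts \(\mathtt{b(1,K,0) }\) and \(\mathtt{b(1,K,1) }\). Given a number K1, the 6th rule generates as many facts \(\mathtt{b(K,K1,1) }\) as K1: the condition \(\mathtt{K1\ne K2 }\) excludes \(\mathtt{b(K1,K1,1) }\) from the count, so the derivation stops at K1. Finally, the last rule generates one lessthan fact for each of these b facts, that is, \(\mathtt{lessthan(K,K1) }\) for every K from 1 to K1.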
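
The behaviour of the a and b rules can be traced with a small bottom-up simulation. The Python sketch below is an illustration under assumptions: `MAX` is a hypothetical stand-in for the maximum precision, the function names are invented for this example, and a running-FS goal K:[...] is taken to hold for every K from 1 up to the number of distinct assignments of its local variables.

```python
MAX = 5  # assumed bound standing in for the "maximum precision"


def generate_a(limit):
    """Least fixpoint of: a(1,0). a(1,1). a(K,1) <- K:[a(K1,X)]."""
    a = {(1, 0), (1, 1)}
    while True:
        n = len(a)  # number of distinct (K1, X) pairs
        new = {(k, 1) for k in range(1, min(n, limit) + 1)}
        if new <= a:
            return a
        a |= new


def generate_b(a):
    """Least fixpoint of the three b rules, given the a facts."""
    b = set()
    for k, flag in a:
        if flag == 1:  # b(1,K,0) <- a(K,1).  b(1,K,1) <- a(K,1).
            b |= {(1, k, 0), (1, k, 1)}
    changed = True
    while changed:
        changed = False
        for k1 in {k for k, _ in a}:
            # K:[b(K2, K1, X), K1 != K2] counts distinct (K2, X) pairs.
            n = len({(k2, x) for k2, kk, x in b if kk == k1 and k2 != k1})
            new = {(k, k1, 1) for k in range(1, n + 1)}
            if not new <= b:
                b |= new
                changed = True
    return b


a = generate_a(MAX)
b = generate_b(a)
lessthan = {(k, k1) for k, k1, x in b if x == 1}  # lessthan(K,K1) <- b(K,K1,1).
print(sorted(lessthan))
```

Under these assumptions the simulation derives lessthan(J, K) exactly for 1 ≤ J ≤ K ≤ MAX, matching the facts produced by the original rules with comparison and sum.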

Appendix C: Stratified aggregates

The expressive power of \(D^a\) on unordered domains (genericity) is also of significant theoretical interest. This issue has been studied by Mumick and Shmueli in [22], where the basic arithmetic functions \((+,-,*, \ldots )\) were also allowed as built-ins. Even with these extensions, \(D^a\) cannot express the generalized part explosion query of Example 15. In the rest of this section, we will express Datalog aggregates in Datalog\(^{FS}\,\). Then, from Example 15 we can conclude that, for unordered domains:

Theorem 7

\(D^a \subsetneqq \) Datalog\(^{FS}\,\)

Let us now discuss how to express \(D^a\) in Datalog\(^{FS}\,\). Mumick and Shmueli describe Datalog with stratified aggregates as an extension of Datalog that permits “Groupby” predicates of the form

$$\begin{aligned} \mathtt{GROUPBY(r(\overline{t}),[Y_1,Y_2,\ldots ,Y_m],Z_1=A_1(E_1(\overline{t})),\ldots ,Z_n=A_n(E_n(\overline{t}))) } \end{aligned}$$

to appear as subgoals in the bodies of Datalog rules. The “Groupby” predicate takes as arguments: a predicate r with its attribute list \(\mathtt{\overline{t} }\), a grouping list \(\mathtt{[Y_1,Y_2,\ldots ,Y_m] }\) contained in \(\mathtt{\overline{t} }\), and a set of aggregation terms \(\mathtt{Z_1=A_1(E_1),\ldots ,Z_n=A_n(E_n) }\). For each aggregation term \(\mathtt{Z_i=A_i(E_i(\overline{t})) }\), \(\mathtt{Z_i }\) is a new variable, \(\mathtt{E_i(\overline{t}) }\) is an arithmetic expression that uses the variables \(\mathtt{\overline{t} }\), and \(\mathtt{A_i }\) is an aggregate operator, for example sum, count, max, min or avg. Stratification means that if a derived relation \(r1\) is defined by applying aggregation on a derived relation \(r2\), then \(r2\)’s definition does not depend, syntactically, on relation \(r1\).

Example 23

Stratified aggregate.

$$\begin{aligned} \begin{array}{ll} \mathtt{p(Y,Z_1,Z_2) } \leftarrow&\mathtt{GROUPBY(r(\_,Y,F),[Y],Z_1=avg(F),Z_2=sum(F)). } \\ \end{array} \end{aligned}$$

This program returns, for each Y value in r, the average and the sum of the values assumed by F in r. \(\square \)

We can express the predicate “Groupby” by using Datalog\(^{FS}\,\). Without loss of generality we can consider only one aggregation term. In fact for each rule of the form

$$\begin{aligned} \begin{array}{ll} \mathtt{p(Y_1,\ldots ,Y_m,Z_1,\ldots ,Z_n) } \leftarrow&\mathtt{GROUPBY(r(\overline{t}), [Y_1,\ldots ,Y_m], } \\ \mathtt{ }&\mathtt{Z_1=A_1(E_1(\overline{t})), \ldots ,Z_n=A_n(E_n(\overline{t}))). } \\ \end{array} \end{aligned}$$

we can rewrite the program in the following way:

$$\begin{aligned} \begin{array}{ll} \mathtt{r_0:p(Y_1,\ldots ,Y_m,Z_1,\ldots ,Z_n) } \leftarrow&\mathtt{p_1(Y_1,\ldots ,Y_m,Z_1),\ldots } \\ \mathtt{ }&\mathtt{\ldots ,p_n(Y_1,\ldots ,Y_m,Z_n). } \\ \mathtt{r_1:p_1(Y_1,\ldots ,Y_m,Z_1) } \leftarrow&\mathtt{\!\!\! GROUPBY(r(\overline{t}),[Y_1,\ldots ,Y_m],Z_1=A_1(E_1(\overline{t}))). } \\ \mathtt{\vdots }&\mathtt{ } \\ \mathtt{r_n:p_n(Y_1,\ldots ,Y_m,Z_n) } \leftarrow&\mathtt{\!\!\! GROUPBY(r(\overline{t}), [Y_1,\ldots ,Y_m],Z_n=A_n(E_n(\overline{t}))). } \\ \end{array} \end{aligned}$$

where \(r_1,\ldots ,r_n\) are rules whose “GROUPBY” predicate has only one aggregation term.

Count aggregate We start by showing how the “GROUPBY” predicate with the count aggregate can be expressed in Datalog\(^{FS}\,\). Consider the following rule

$$\begin{aligned} \begin{array}{ll} \mathtt{p(\overline{y},Z) } \leftarrow&\mathtt{GROUPBY(r(\_,\overline{y}, \overline{x}),[\overline{y}],Z=count(\overline{x})). } \\ \end{array} \end{aligned}$$

where \(\overline{\mathtt{y }}\) is the list of grouping variables and \(\overline{\mathtt{x }}\) is the list of count variables. This rule can be rewritten as follows:

$$\begin{aligned} \begin{array}{ll} \mathtt{p( \overline{y},Z) } \leftarrow&\mathtt{r(\_, \overline{y},\_ ), Z=![r^{\prime }( \overline{y},\overline{x})]. } \\ \mathtt{r^{\prime }( \overline{y},\overline{x}) } \leftarrow&\mathtt{r(\_ , \overline{y},\overline{x} ). } \\ \end{array} \end{aligned}$$

where the rule with predicate \(r^{\prime }\) represents a projection over relation r.
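
The effect of this rewriting can be mimicked in a few lines of Python. In this sketch, which assumes a hypothetical relation `r` of (key, y, x) tuples and an invented function name, the projection \(r^{\prime }\) is a set of (y, x) pairs and the final-FS goal is the number of distinct x values per y.

```python
from collections import Counter


def count_aggregate(r):
    """Count distinct x values per group y, following the r' rewriting."""
    r_prime = {(y, x) for _, y, x in r}  # r'(y, x) <- r(_, y, x).
    return dict(Counter(y for y, _ in r_prime))  # Z = ![r'(y, x)]


# Hypothetical toy relation: the first column is an extra attribute.
r = {(1, "a", 10), (2, "a", 10), (3, "a", 20), (4, "b", 5)}
counts = count_aggregate(r)
print(sorted(counts.items()))
```

Because \(r^{\prime }\) is a projection, the two facts of group a that share x = 10 are counted once.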

Sum aggregate The next step is to show how to compute the sum aggregate on decimal numbers. Suppose that we have the scale-up factor \(\mathtt{d=10^n }\) that represents the decimal precision (see Sect. 7), where n is large enough to represent the decimal numbers used. The rule with the sum aggregate has the following form:

$$\begin{aligned} \begin{array}{ll} \mathtt{p(\overline{y},Z) } \leftarrow&\mathtt{GROUPBY(r(\_,\overline{y}, \overline{x}),[\overline{y}],Z=sum(E(\overline{y},\overline{x}))). } \\ \end{array} \end{aligned}$$

where E is an expression composed of built-in functions \((+,-,*, \ldots )\) that use some of the variables in \(\overline{\mathtt{y }}\) and \(\overline{\mathtt{x }}\). First, for each fact of relation r we compute the expression \(\mathtt{E }(\overline{\mathtt{y }},\overline{\mathtt{x }})\) and convert its value into a multiplicity, keeping track of its sign:

$$\begin{aligned} \begin{array}{ll} \mathtt{r^{\prime \prime }(\overline{y},\overline{x},+):K } \leftarrow&\mathtt{r(\_,\overline{y},\overline{x}),Z=E(\overline{y},\overline{x}), Z>0,K=Z*d. } \\ \mathtt{r^{\prime \prime }(\overline{y},\overline{x},-):K } \leftarrow&\mathtt{r(\_,\overline{y},\overline{x}),Z=E(\overline{y}, \overline{x}),Z<0,K=-Z*d. } \\ \end{array} \end{aligned}$$

Finally, we obtain the value of the sum aggregate by subtracting the sum \(\mathtt{Z^- }\) of the (absolute values of the) negative numbers from the sum \(\mathtt{Z^+ }\) of the positive numbers, as follows:

$$\begin{aligned} \begin{array}{ll} \mathtt{p(\overline{y},Z) } \leftarrow&\mathtt{r(\_, \overline{y},\_ ),Z^+=![r^{\prime \prime }(\overline{y},\overline{x},+)], } \\ \mathtt{ }&\mathtt{Z^-=![r^{\prime \prime }(\overline{y},\overline{x},-)],Z=(Z^+-Z^-)/d. } \\ \end{array} \end{aligned}$$

Note that the multiplicities were scaled up by the factor d, so the final division by d restores the result to its original scale.
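
The sum rewriting can be sketched in Python as follows. This is an illustration under assumptions: the scale factor is fixed at \(10^2\), the relation `r` and the expression `expr` are hypothetical, values are assumed to have at most two decimal digits, and facts that agree on (y, x) collapse into one, as they do in the \(r^{\prime \prime }\) rules.

```python
from decimal import Decimal

D_SCALE = 10 ** 2  # assumed scale-up factor d = 10^n with n = 2


def sum_aggregate(r, expr):
    """Sum expr(y, x) per group y via signed integer multiplicities."""
    pos, neg = {}, {}
    for y, x in {(y, x) for _, y, x in r}:
        z = expr(y, x)
        k = int(abs(z) * D_SCALE)  # multiplicity K = |Z| * d
        if z > 0:
            pos[(y, x)] = k  # r''(y, x, +):K
        elif z < 0:
            neg[(y, x)] = k  # r''(y, x, -):K
    result = {}
    for y in {y for _, y, _ in r}:
        z_plus = sum(k for (yy, _), k in pos.items() if yy == y)
        z_minus = sum(k for (yy, _), k in neg.items() if yy == y)
        result[y] = Decimal(z_plus - z_minus) / D_SCALE
    return result


# Hypothetical toy relation with decimal values in the x column.
r = {(1, "a", Decimal("1.25")), (2, "a", Decimal("-0.50")),
     (3, "b", Decimal("2.00"))}
sums = sum_aggregate(r, lambda y, x: x)
print(sorted(sums.items()))
```

Only non-negative integers are ever counted; the sign is carried by the third argument of \(r^{\prime \prime }\) and reintroduced in the final subtraction.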

Average aggregate The average aggregate can be obtained by combining the sum and count aggregate. Thus given the average aggregate rule:

$$\begin{aligned} \begin{array}{ll} \mathtt{p(\overline{y},Z) } \leftarrow&\mathtt{GROUPBY(r(\_,\overline{y}, \overline{x}),[\overline{y}],Z=avg(E(\overline{y},\overline{x}))). } \\ \end{array} \end{aligned}$$

it is possible to rewrite it in Datalog\(^{FS}\,\)by using the following rule:

$$\begin{aligned} \begin{array}{ll} \mathtt{p(\overline{y},Z) } \leftarrow&\mathtt{r(\_, \overline{y},\_ ),Z^+=![r^{\prime \prime }(\overline{y},\overline{x},+)],Z^-=![r^{\prime \prime }(\overline{y},\overline{x},-)], } \\ \mathtt{ }&\mathtt{K=![r(\_, \overline{y},\overline{x})],Z=(Z^+-Z^-)/(K*d). } \\ \end{array} \end{aligned}$$

where the rules that define the predicate \(r^{\prime \prime }\) are the ones used for the sum aggregate.
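
A compact Python sketch of the average rewriting is given below. It assumes a hypothetical relation `r` whose facts have distinct (y, x) pairs, so that the count K and the two signed sums range over the same facts, and it uses an assumed scale factor of 10; the function name is invented for this example.

```python
from fractions import Fraction

D_SCALE = 10  # assumed scale-up factor d


def avg_aggregate(r, expr):
    """Average expr(y, x) per group y as (Z+ - Z-) / (K * d)."""
    result = {}
    for y in {y for _, y, _ in r}:
        xs = [x for _, yy, x in r if yy == y]
        z_plus = sum(int(expr(y, x) * D_SCALE) for x in xs if expr(y, x) > 0)
        z_minus = sum(int(-expr(y, x) * D_SCALE) for x in xs if expr(y, x) < 0)
        k = len(xs)  # K = ![r(_, y, x)]
        result[y] = Fraction(z_plus - z_minus, k * D_SCALE)
    return result


# Hypothetical toy relation with integer values in the x column.
r = {(1, "a", 3), (2, "a", -1), (3, "a", 4), (4, "b", 5)}
averages = avg_aggregate(r, lambda y, x: x)
print(sorted(averages.items()))
```

For group a the sketch computes (70 - 10) / (3 * 10) = 2, the average of 3, -1 and 4.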

Min and max aggregates For the min and max aggregates, negation-stratified Datalog is sufficient. For the min aggregate rule:

$$\begin{aligned} \begin{array}{ll} \mathtt{p(\overline{y},Z) } \leftarrow&\mathtt{GROUPBY(r(\_,\overline{y}, \overline{x}),[\overline{y}],Z=min(E(\overline{y},\overline{x}))). } \\ \end{array} \end{aligned}$$

we have the following negation-stratified Datalog program:

$$\begin{aligned} \begin{array}{ll} \mathtt{p(\overline{y},Z) } \leftarrow&\mathtt{r(\_, \overline{y}, \overline{x}), Z=E(\overline{y},\overline{x}), \lnot p^{\prime }(\overline{y},\overline{x}). } \\ \mathtt{p^{\prime }(\overline{y},\overline{x}) } \leftarrow&\mathtt{r(\_, \overline{y}, \overline{x}),r(\_, \overline{y}, \overline{x}^{\prime }),E(\overline{y},\overline{x}^{\prime }) <E(\overline{y},\overline{x}). } \\ \end{array} \end{aligned}$$

We can obtain the max aggregate by replacing the atom \(\mathtt{E(\overline{y},\overline{x}^{\prime })<E(\overline{y},\overline{x}) }\) in the second rule with \(\mathtt{E(\overline{y},\overline{x}^{\prime })>E(\overline{y},\overline{x}) }\).
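
The two-stratum structure of this program is easy to mirror in Python. In the sketch below, which uses a hypothetical relation `r` and an invented function name, the set `dominated` plays the role of \(p^{\prime }\) and is fully computed before the negation is applied; passing `operator.lt` yields min and `operator.gt` yields max.

```python
import operator


def extreme_aggregate(r, expr, beats):
    """Return {(y, Z)} where Z is the extreme value of expr in group y.

    beats(a, b) is True when value a rules out value b
    (a < b for min, a > b for max).
    """
    facts = {(y, x) for _, y, x in r}
    # Lower stratum, p'(y, x): some x' in the same group beats x.
    dominated = {
        (y, x)
        for y, x in facts
        for y2, x2 in facts
        if y2 == y and beats(expr(y, x2), expr(y, x))
    }
    # Upper stratum, p(y, Z): keep the facts that are not in p'.
    return {(y, expr(y, x)) for y, x in facts if (y, x) not in dominated}


# Hypothetical toy relation.
r = {(1, "a", 3), (2, "a", 1), (3, "b", 7)}
minima = extreme_aggregate(r, lambda y, x: x, operator.lt)
maxima = extreme_aggregate(r, lambda y, x: x, operator.gt)
print(sorted(minima), sorted(maxima))
```

No counting is involved here, which reflects the observation that min and max need only stratified negation.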

About this article

Cite this article

Mazuran, M., Serra, E. & Zaniolo, C. Extending the power of datalog recursion. The VLDB Journal 22, 471–493 (2013). https://doi.org/10.1007/s00778-012-0299-1

  • DOI: https://doi.org/10.1007/s00778-012-0299-1
