
On the equivalence and rewriting of aggregate queries

Acta Informatica

Abstract.

We introduce a first-order language with real polynomial arithmetic and aggregation operators (count, iterated sum and multiply), which is well suited for the definition of aggregate queries involving complex statistical functions. It offers a good trade-off between expressive power and complexity, with a tractable data complexity. Interestingly, some fundamental properties of first-order logic with real arithmetic are preserved in the presence of aggregates. In particular, there is an effective quantifier elimination for formulae with aggregation. We then consider the problem of querying data that has already been aggregated in aggregate views, and focus on queries with an aggregation over a conjunctive query (namely single-block aggregate group-by queries without a having clause). Our main conceptual contribution is the introduction of a new equivalence relation among conjunctive queries, the isomorphism modulo a product. We prove that the equivalence of aggregate queries, such as averages, reduces to it. Deciding whether two queries are isomorphic modulo a product is shown to be NP-complete. We then analyze the equivalence problem in the case of aggregate conjunctive queries with comparisons. We introduce the concept of cross isomorphic linear expansions, which generalizes isomorphism modulo a product, and we show that equivalence reduces to it and that it can be decided in PSPACE. Finally, we show that the problem of complete rewriting of count queries using count views is NP-complete, and we introduce new rewriting techniques based on the isomorphism modulo a product to recover the values of counts by complex arithmetical computation from the views.
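As an informal illustration of why a product can matter for the equivalence of averages, the following sketch compares an average computed over a conjunctive query with the same average computed after joining in an unrelated, non-empty relation. It is only an intuition, not the paper's formal definition of isomorphism modulo a product. The relations `sales` and `region`, the queries `q1` and `q2`, and the assumption that the aggregate ranges over all satisfying assignments of the query body are hypothetical choices made for this example.

```python
from itertools import product
from statistics import mean

# Hypothetical toy data: sales(item, price) and an unrelated relation region(name).
sales = [("pen", 2.0), ("ink", 4.0), ("pad", 6.0)]
region = [("north",), ("south",)]  # must be non-empty for the averages to agree


def q1():
    """Prices from q1(p) :- sales(i, p), one value per satisfying assignment."""
    return [price for (_item, price) in sales]


def q2():
    """Prices from q2(p) :- sales(i, p), region(r): q1's body times a product."""
    return [price for ((_item, price), _r) in product(sales, region)]


# Every price is repeated len(region) times in q2, so the average is unchanged,
# while the count is multiplied by len(region).
avg1, avg2 = mean(q1()), mean(q2())
count1, count2 = len(q1()), len(q2())
```

Under these assumptions the two average queries agree on this instance although the underlying conjunctive queries are not isomorphic, whereas the corresponding count queries differ by the factor `len(region)`.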



Author information

Correspondence to Stéphane Grumbach.

Additional information

Received: 10 July 2003, Revised: 25 April 2004, Published online: 8 June 2004

About this article

Cite this article

Grumbach, S., Rafanelli, M. & Tininini, L. On the equivalence and rewriting of aggregate queries. Acta Informatica 40, 529–584 (2004). https://doi.org/10.1007/s00236-004-0101-y

  • DOI: https://doi.org/10.1007/s00236-004-0101-y
