Abstract.
We introduce a first-order language with real polynomial arithmetic and aggregation operators (count, iterated sum and iterated product), which is well suited to the definition of aggregate queries involving complex statistical functions. It offers a good trade-off between expressive power and complexity, with tractable data complexity. Interestingly, some fundamental properties of first-order logic with real arithmetic are preserved in the presence of aggregates. In particular, there is an effective quantifier elimination for formulae with aggregation. We then consider the problem of querying data that has already been aggregated in aggregate views, and focus on queries with an aggregation over a conjunctive query (namely single-block aggregate group-by queries without a HAVING clause). Our main conceptual contribution is the introduction of a new equivalence relation among conjunctive queries, the isomorphism modulo a product. We prove that the equivalence of aggregate queries, such as averages, reduces to it. Deciding whether two queries are isomorphic modulo a product is shown to be NP-complete. We then analyze the equivalence problem in the case of aggregate conjunctive queries with comparisons. We introduce the concept of cross isomorphic linear expansions, which generalizes isomorphism modulo a product, and we show that equivalence reduces to it and that it can be decided in PSPACE. Finally, we show that the problem of complete rewriting of count queries using count views is NP-complete, and we introduce new rewriting techniques, based on the isomorphism modulo a product, that recover the values of counts from the views by complex arithmetical computation.
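The intuition behind comparing queries "modulo a product" can be illustrated with a small sketch. It is not the paper's formal definition or algorithm: the relations R and S, the queries q1 and q2, and the bag (duplicate-preserving) evaluation are all assumptions chosen for illustration. The sketch shows that joining a query with an unrelated nonempty relation multiplies every answer's multiplicity by the same factor, which leaves the average unchanged and scales the count by a known amount, so a count can be recovered arithmetically from other counts.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy relations (not taken from the paper).
R = [1, 2, 4]        # unary relation holding numeric values
S = ["a", "b"]       # unrelated, nonempty unary relation

def q1(r):
    """q1(x) :- R(x), evaluated as a bag."""
    return [x for x in r]

def q2(r, s):
    """q2(x) :- R(x), S(y), evaluated as a bag (one copy of x per y in S)."""
    return [x for x, _ in product(r, s)]

def avg(bag):
    """Exact average of a nonempty bag of numbers."""
    return Fraction(sum(bag), len(bag))

bag1, bag2 = q1(R), q2(R, S)

# The product with S repeats every answer len(S) times: averages agree.
same_average = avg(bag1) == avg(bag2)

# Counts differ by the factor len(S), so count(q1) can be recovered
# from the two "views" count(q2) and count(S) by division.
recovered_count = len(bag2) // len(S)

print(same_average, len(bag1), len(bag2), recovered_count)
```

In this toy setting the two queries are not isomorphic as conjunctive queries, yet their averages coincide whenever S is nonempty, which is the kind of situation the equivalence relation in the abstract is meant to capture.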
Additional information
Received: 10 July 2003, Revised: 25 April 2004, Published online: 8 June 2004
Cite this article
Grumbach, S., Rafanelli, M. & Tininini, L. On the equivalence and rewriting of aggregate queries. Acta Informatica 40, 529–584 (2004). https://doi.org/10.1007/s00236-004-0101-y
DOI: https://doi.org/10.1007/s00236-004-0101-y