Abstract
Modern data analysis has given rise to numerous grouping constructs and programming paradigms that go well beyond the traditional group by. Applications such as data warehousing, web log analysis, stream monitoring and social network analysis have necessitated the use of data cubes, grouping variables, windows and MapReduce. In this paper we review the associated set (ASSET) concept and discuss its applicability in both continuous and traditional data settings. Given a set of values B, an associated set over B is a collection of annotated data multisets, one for each b ∈ B. The goal is to efficiently compute aggregates over these multisets. An ASSET query consists of repeated definitions of associated sets and aggregates over them, possibly correlated, resembling a spreadsheet document. We review systems implementing ASSET queries in both continuous and persistent contexts and argue for the analytical abilities and optimization opportunities of associated sets.
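The definition above can be illustrated with a minimal Python sketch. It is not the authors' system or syntax: the helper names `associated_set` and `aggregate`, the membership predicate `theta`, and the sample sales rows are all hypothetical, chosen only to show one multiset per base value and a second, correlated associated set that depends on an aggregate of the first.

```python
from statistics import mean

def associated_set(base, data, theta):
    """Build one multiset (a list, so duplicates survive) per value b in base.

    theta(b, row) is a hypothetical predicate deciding whether row
    belongs to the multiset associated with b.
    """
    return {b: [row for row in data if theta(b, row)] for b in base}

def aggregate(asset, fn, default=None):
    """Apply fn to each multiset; use default where the multiset is empty."""
    return {b: (fn(rows) if rows else default) for b, rows in asset.items()}

# Illustrative data: (customer, amount) rows; B is the set of customers.
sales = [("alice", 10), ("alice", 30), ("bob", 5), ("bob", 5), ("carol", 7)]
B = ["alice", "bob", "carol"]

# First associated set: all sales of each customer, then their average amount.
X = associated_set(B, sales, lambda b, r: r[0] == b)
avg_x = aggregate(X, lambda rows: mean(r[1] for r in rows))

# Second, correlated associated set: sales of b that exceed b's own average.
Y = associated_set(B, sales, lambda b, r: r[0] == b and r[1] > avg_x[b])
cnt_y = aggregate(Y, len, default=0)

for b in B:
    print(b, avg_x[b], cnt_y[b])
```

Each aggregate here behaves like a new column keyed by B, which is the sense in which such a query resembles a spreadsheet. The sketch rescans `data` once per base value, so it makes no attempt at the efficient evaluation the abstract refers to.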
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chatziantoniou, D., Sotiropoulos, Y. (2010). ASSET Queries: A Set-Oriented and Column-Wise Approach to Modern OLAP. In: Castellanos, M., Dayal, U., Miller, R.J. (eds) Enabling Real-Time Business Intelligence. BIRTE 2009. Lecture Notes in Business Information Processing, vol 41. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14559-9_5
DOI: https://doi.org/10.1007/978-3-642-14559-9_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14558-2
Online ISBN: 978-3-642-14559-9
eBook Packages: Computer Science (R0)