research-article

ARCube: supporting ranking aggregate queries in partially materialized data cubes

Authors:

Jiawei Han

SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data

Pages 79 - 92

https://doi.org/10.1145/1376616.1376627

Published: 09 June 2008

Abstract

Supporting ranking queries in database systems has recently been a popular research topic. However, little work has addressed ranking queries in data warehouses, where ranking is on multidimensional aggregates rather than on measures of base facts. To address this problem, we propose a query execution model that answers different types of ranking aggregate queries based on a unified, partial cube structure, ARCube. The query execution model follows a candidate generation and verification framework, in which the most promising candidate cells are generated using a set of high-level guiding cells. We also identify a bounding principle for effective pruning: once a guiding cell is pruned, all of its child candidate cells can be pruned as well. We further address the problem of efficient online candidate aggregation and verification by developing a chunk-based execution model that verifies a batch of candidates within a bounded memory buffer. Our extensive performance study shows that the new framework not only improves performance by an order of magnitude over the state-of-the-art method, but is also much more flexible in the types of ranking aggregate queries it supports.
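The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea of guiding cells and bound-based pruning, under assumptions that are not taken from the paper: the ranking function is SUM over non-negative measures, the only materialized cuboid is the one grouped by the first dimension, and the function name `top_k_cells` and the row layout `(a, b, measure)` are hypothetical. With non-negative measures, the SUM of a coarse cell is an upper bound on the SUM of each of its finer child cells, so a coarse cell whose bound cannot beat the current k-th best result rules out all of its children at once.

```python
from collections import defaultdict
import heapq


def top_k_cells(rows, k):
    """Return the k (a, b) cells with the largest SUM(measure), best first.

    rows: iterable of (a, b, measure) tuples, measure assumed >= 0.
    Also returns how many guiding cells were pruned without being expanded.
    """
    # Assumed "materialized" coarse cuboid: SUM grouped by dimension A only.
    # Each of its cells acts as a guiding cell for the (A, B) candidates.
    guide = defaultdict(float)
    # Base rows partitioned by guiding cell, so children are aggregated on demand.
    by_guide = defaultdict(list)
    for a, b, m in rows:
        guide[a] += m
        by_guide[a].append((b, m))

    heap = []   # min-heap of (sum, cell); heap[0] is the current k-th best
    pruned = 0
    # Visit guiding cells from the most to the least promising bound.
    for a, bound in sorted(guide.items(), key=lambda kv: -kv[1]):
        if len(heap) == k and bound <= heap[0][0]:
            # Bounding step: no child of this guiding cell can exceed its bound,
            # so none of them can improve the current top-k.
            pruned += 1
            continue
        # Candidate generation and verification for this guiding cell.
        child = defaultdict(float)
        for b, m in by_guide[a]:
            child[b] += m
        for b, s in child.items():
            item = (s, (a, b))
            if len(heap) < k:
                heapq.heappush(heap, item)
            elif item > heap[0]:
                heapq.heapreplace(heap, item)
    return sorted(heap, reverse=True), pruned


rows = [
    ("x", "p", 10), ("x", "q", 1),
    ("y", "p", 4), ("y", "q", 3),
    ("z", "p", 2), ("z", "q", 2),
]
result, pruned = top_k_cells(rows, k=2)
```

In this toy input the guiding cell `z` has a bound of 4, which cannot beat the current second-best candidate, so its two child cells are never aggregated. Cells that merely tie with the k-th best are pruned too, which is acceptable when ties may be broken arbitrarily.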

Index Terms

ARCube: supporting ranking aggregate queries in partially materialized data cubes
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Published In

SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data

June 2008

1396 pages

ISBN:9781605581026

DOI:10.1145/1376616

General Chairs: Laks V. S. Lakshmanan (University of British Columbia, Canada), Raymond T. Ng (University of British Columbia, Canada), Dennis Shasha (New York University, USA)

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGMOD/PODS '08: International Conference on Management of Data

June 9 - 12, 2008

Vancouver, Canada

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
