DOI: 10.1145/1989323.1989415
research-article

Latent OLAP: data cubes over latent variables

Published: 12 June 2011

Abstract

We introduce a novel class of data cube, called the latent-variable cube. For many data analysis tasks, data in a database can be represented as points in a multi-dimensional space. Ordinary data cubes compute aggregate functions over these "observed" data points for each cell (i.e., region) in the space, where the cells have different granularities defined by hierarchies. While useful, data cubes do not provide sufficient capability for analyzing "latent variables" that are often of interest but not directly observed in data. For example, when analyzing users' interactions with online advertisements, the observed data indicate whether a user clicked an ad or not. However, the real interest is often in knowing the click probabilities of ads for different user populations. In this example, click probabilities are latent variables that are not observed but have to be estimated from data. We argue that latent variables are a useful construct for a number of OLAP application scenarios. To facilitate such analyses, we propose cubes that compute aggregate functions over latent variables. Specifically, we discuss the pitfalls of common practice in scenarios where latent variables should be considered but are not; we rigorously define the latent-variable cube based on Bayesian hierarchical models and provide efficient algorithms. Through extensive experiments on both simulated and real data, we show that our method is accurate and runs orders of magnitude faster than the baseline.
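
To make the click-probability example concrete, the following is a minimal sketch of how a latent rate might be estimated per cell rather than read off directly as clicks divided by views. It is not the algorithm from the paper: it assumes a simple Beta prior centered on the parent (roll-up) rate, and the cell names, counts, and the `prior_strength` value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One cube cell with its observed click and view counts (illustrative)."""
    name: str
    clicks: int
    views: int


def raw_rate(cell: Cell) -> float:
    """Observed click rate; unreliable when the cell has few views."""
    return cell.clicks / cell.views if cell.views else 0.0


def posterior_mean(cell: Cell, parent_rate: float, prior_strength: float) -> float:
    """Posterior mean of the click probability under a Beta(a, b) prior.

    The prior is centered on the parent cell's rate, with a + b equal to
    prior_strength (an assumed tuning value, not taken from the paper).
    """
    a = parent_rate * prior_strength
    b = (1.0 - parent_rate) * prior_strength
    return (cell.clicks + a) / (cell.views + a + b)


# Hypothetical cells at the finest granularity.
cells = [
    Cell("young/sports", clicks=1, views=2),
    Cell("young/news", clicks=30, views=1000),
    Cell("senior/sports", clicks=0, views=5),
]

# The parent (fully rolled-up) cell pools all observations.
parent_rate = sum(c.clicks for c in cells) / sum(c.views for c in cells)

estimates = {c.name: posterior_mean(c, parent_rate, prior_strength=50.0) for c in cells}

for c in cells:
    print(f"{c.name}: raw={raw_rate(c):.4f} estimated={estimates[c.name]:.4f}")
```

In this sketch, cells with very few views are pulled strongly toward the parent rate, while a cell with many views stays close to its observed rate. That is the kind of pitfall the abstract alludes to when raw aggregates are treated as if they were the latent quantity.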

Cited By

  • (2015) Intelligent analysis of data cube via statistical methods. 2015 Tenth International Conference on Digital Information Management (ICDIM), pp. 20-27. DOI: 10.1109/ICDIM.2015.7381880. Online publication date: Oct-2015
  • (2015) Modular Neural Networks for Extending OLAP to Prediction. Transactions on Large-Scale Data- and Knowledge-Centered Systems XXI, pp. 73-93. DOI: 10.1007/978-3-662-47804-2_4. Online publication date: 17-Jul-2015
  • (2013) Discovering diverse association rules from multidimensional schema. Expert Systems with Applications: An International Journal, 40:15, pp. 5975-5996. DOI: 10.1016/j.eswa.2013.05.031. Online publication date: 1-Nov-2013

Published In

SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
June 2011
1364 pages
ISBN:9781450306614
DOI:10.1145/1989323

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. algebraic aggregation
  2. Bayesian hierarchical models
  3. posterior mode
  4. variance estimation

Conference

SIGMOD/PODS '11

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
