DOI:10.1145/1183512.1183520
Article

Pre-aggregation with probability distributions

Published: 10 November 2006

Abstract

Motivated by the increasing need to analyze complex, uncertain multidimensional data, this paper proposes probabilistic OLAP queries that are computed using probability distributions rather than atomic values. The paper describes how to create probability distributions from base data and how those distributions can subsequently be used in pre-aggregation. Since the probability distributions can become large, we show how to achieve good time and space efficiency by approximating the distributions. We present the results of several experiments that demonstrate the effectiveness of our methods. The work is motivated by a real-world case study, based on our collaboration with a leading Danish vendor of location-based services. This paper is the first to consider the approximate processing of probabilistic OLAP queries over probability distributions.
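The abstract does not spell out the paper's algorithms, so the following is only a minimal sketch of the general idea, under stated assumptions: each cell holds a discrete probability distribution, cells are independent, the aggregate is a SUM, and approximation is done by merging values into fixed-width buckets. The names `convolve`, `coarsen`, `expected`, and the sample cells are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def convolve(p, q):
    """Distribution of X + Y for independent discrete X ~ p and Y ~ q.

    Distributions are dicts mapping a value to its probability.
    Independence is an assumption of this sketch.
    """
    out = defaultdict(float)
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] += px * qy
    return dict(out)


def coarsen(dist, width):
    """Approximate a distribution by merging values into buckets.

    Each bucket is represented by its lower bound; total probability
    mass is preserved, but the exact values inside a bucket are lost.
    """
    out = defaultdict(float)
    for value, prob in dist.items():
        out[(value // width) * width] += prob
    return dict(out)


def expected(dist):
    """Expected value of a discrete distribution."""
    return sum(value * prob for value, prob in dist.items())


# Hypothetical base data: uncertain counts for three cells.
cell_a = {0: 0.2, 1: 0.5, 2: 0.3}
cell_b = {0: 0.6, 1: 0.4}
cell_c = {1: 0.5, 3: 0.5}

# Pre-aggregate two cells once, then reuse the stored distribution
# when answering a query at a coarser level.
pre_ab = convolve(cell_a, cell_b)
total = convolve(pre_ab, cell_c)

# Trade accuracy for space by storing a coarser distribution.
approx = coarsen(total, 2)
```

In this sketch the pre-aggregated distribution `pre_ab` is reused instead of being recomputed from base data, and `coarsen` shows one simple way a stored distribution could be shrunk at the cost of precision.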

Published In

DOLAP '06: Proceedings of the 9th ACM international workshop on Data warehousing and OLAP
November 2006
110 pages
ISBN:1595935304
DOI:10.1145/1183512
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. OLAP
  2. aggregation queries
  3. approximate query processing
  4. data warehousing
  5. incomplete data
  6. location-based services
  7. pre-aggregation
  8. probability distributions

Qualifiers

  • Article

Conference

CIKM06
CIKM06: Conference on Information and Knowledge Management
November 10, 2006
Arlington, Virginia, USA

Acceptance Rates

Overall Acceptance Rate 29 of 79 submissions, 37%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 4
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 01 Mar 2025

Citations

Cited By

  • (2020) OLAP over Probabilistic Data Cubes II: Parallel Materialization and Extended Aggregates. IEEE Transactions on Knowledge and Data Engineering 32:10 (1966-1981). DOI:10.1109/TKDE.2019.2913420. Online publication date: 1-Oct-2020
  • (2019) A Decomposition Framework for Computing and Querying Multidimensional OLAP Data Cubes over Probabilistic Relational Data. Fundamenta Informaticae 132:2 (239-266). DOI:10.5555/2637688.2637693. Online publication date: 4-Jan-2019
  • (2016) OLAP over probabilistic data cubes I: Aggregating, materializing, and querying. 2016 IEEE 32nd International Conference on Data Engineering (ICDE) (799-810). DOI:10.1109/ICDE.2016.7498291. Online publication date: May-2016
  • (2014) OLAP over Uncertain and Imprecise Data Streams. Encyclopedia of Business Analytics and Optimization (1670-1679). DOI:10.4018/978-1-4666-5202-6.ch149. Online publication date: 2014
  • (2013) Approximate OLAP Query Processing over Uncertain and Imprecise Multidimensional Data Streams. Database and Expert Systems Applications (156-173). DOI:10.1007/978-3-642-40173-2_15. Online publication date: 2013
  • (2013) A Theoretically-Sound Approach for OLAPing Uncertain and Imprecise Multidimensional Data Streams. Advances in Probabilistic Databases for Uncertain Information Management (109-129). DOI:10.1007/978-3-642-37509-5_5. Online publication date: 2013
  • (2010) Efficiently computing and querying multidimensional OLAP data cubes over probabilistic relational data. Proceedings of the 14th east European conference on Advances in databases and information systems (132-148). DOI:10.5555/1885872.1885886. Online publication date: 20-Sep-2010
  • (2010) OLAP Over Uncertain and Imprecise Data. Proceedings of the 2010 Workshops on Database and Expert Systems Applications (331-336). DOI:10.1109/DEXA.2010.71. Online publication date: 30-Aug-2010
  • (2010) Revisiting the cube lifecycle in the presence of hierarchies. The VLDB Journal — The International Journal on Very Large Data Bases 19:2 (257-282). DOI:10.1007/s00778-009-0160-3. Online publication date: 1-Apr-2010
  • (2010) Efficiently Computing and Querying Multidimensional OLAP Data Cubes over Probabilistic Relational Data. Advances in Databases and Information Systems (132-148). DOI:10.1007/978-3-642-15576-5_12. Online publication date: 2010
