ABSTRACT
Accurate summary data is of paramount concern in data warehouse systems; however, there have been few attempts to completely characterize the ability to summarize measures. The sum operator is the typical aggregate operator for summarizing the large amounts of data in these systems. We look to uncover and characterize potentially inaccurate summaries that result from aggregating measures with the sum operator. We discuss the effect of classification hierarchies and of non-, semi-, and fully additive measures on summary data, and develop a taxonomy of the additive nature of measures. Additionally, averaging and rounding rules can add complexity to seemingly simple aggregations. To deal with these problems, we describe the importance of storing metadata that can be used to restrict potentially inaccurate aggregate queries. These summary constraints could be integrated into data warehouses, just as integrity constraints are integrated into OLTP systems. We conclude by suggesting methods for identifying and dealing with non- and semi-additive attributes.
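The idea of a summary constraint can be illustrated with a minimal Python sketch. It is not the paper's implementation: the measure names, the sample rows, the `ADDITIVE_DIMENSIONS` metadata table, and the `checked_sum` helper are all hypothetical, chosen only to show how stored metadata might block a sum over a dimension along which a measure is not additive.

```python
from collections import defaultdict

# Hypothetical dimensions of a small fact table.
DIMENSIONS = ("store", "month")

# Hypothetical summary-constraint metadata: for each measure, the
# dimensions along which SUM yields a meaningful total.
ADDITIVE_DIMENSIONS = {
    "sales_amount": {"store", "month"},  # fully additive
    "inventory_level": {"store"},        # semi-additive: not across time
    "unit_price": set(),                 # non-additive
}

# Illustrative data only.
ROWS = [
    {"store": "A", "month": "Jan", "sales_amount": 100, "inventory_level": 40, "unit_price": 5},
    {"store": "A", "month": "Feb", "sales_amount": 150, "inventory_level": 30, "unit_price": 5},
    {"store": "B", "month": "Jan", "sales_amount": 200, "inventory_level": 10, "unit_price": 6},
    {"store": "B", "month": "Feb", "sales_amount": 50, "inventory_level": 20, "unit_price": 6},
]


def checked_sum(rows, measure, group_by):
    """Sum `measure` per `group_by` key, rejecting the query if it would
    collapse a dimension that the metadata marks as non-additive."""
    collapsed = set(DIMENSIONS) - set(group_by)
    blocked = collapsed - ADDITIVE_DIMENSIONS[measure]
    if blocked:
        raise ValueError(
            f"SUM({measure}) is not additive across: {sorted(blocked)}"
        )
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in group_by)
        totals[key] += row[measure]
    return dict(totals)


# Sales can be summed across months; inventory can be summed across stores.
sales_by_store = checked_sum(ROWS, "sales_amount", ["store"])
inventory_by_month = checked_sum(ROWS, "inventory_level", ["month"])

# Summing inventory across months would double count stock, so it is rejected.
try:
    checked_sum(ROWS, "inventory_level", ["store"])
    inventory_over_time_rejected = False
except ValueError:
    inventory_over_time_rejected = True
```

In this sketch the constraint is checked before any aggregation runs, in the same way an integrity constraint is checked before a write is accepted. A real warehouse would more likely keep such metadata in its catalog and enforce it in the query layer.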
Index Terms
- An analysis of additivity in OLAP systems