Abstract
On-Line Analytical Processing (OLAP) enables analysts to gain insight into data through fast, interactive access to a variety of views on information organized in a dimensional model. The demand for data integration is growing rapidly as more and more information sources appear in modern enterprises. In the data warehousing approach, selected information is extracted in advance and stored in a repository, which yields good query performance. However, in many situations a logical (rather than physical) integration of data is preferable. Previous web-based data integration efforts have focused almost exclusively on the logical level of data models, creating a need for techniques that address the conceptual level. Moreover, previous integration techniques for web-based data have not addressed the special needs of OLAP tools, such as handling dimensions with hierarchies. Extensible Markup Language (XML) is fast becoming the new standard for data representation and exchange on the World Wide Web. The rapid emergence of XML data on the web, e.g., in business-to-business (B2B) e-commerce, makes it necessary for OLAP and other data analysis tools to handle XML data as well as traditional data formats.
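To make "views on information organized in a dimensional model" concrete, the following is a minimal sketch, not taken from the paper, of a tiny cube. Facts carry a measure (sales), and each dimension has a hierarchy (product → category, day → month). A roll-up view aggregates the measure at the coarser level. All product names, dates, and values are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, day, sales). Names and values are
# invented for illustration; they are not from the paper's case study.
FACTS = [
    ("resistor-10k", "2001-01-03", 120.0),
    ("resistor-1k", "2001-01-03", 80.0),
    ("capacitor-1uF", "2001-02-11", 50.0),
]

# Hypothetical product hierarchy: product -> category.
PRODUCT_TO_CATEGORY = {
    "resistor-10k": "resistor",
    "resistor-1k": "resistor",
    "capacitor-1uF": "capacitor",
}


def day_to_month(day: str) -> str:
    """Map a YYYY-MM-DD day to its YYYY-MM month (the next level up)."""
    return day[:7]


def roll_up(facts):
    """Aggregate sales to the (category, month) level using SUM."""
    cube = defaultdict(float)
    for product, day, sales in facts:
        key = (PRODUCT_TO_CATEGORY[product], day_to_month(day))
        cube[key] += sales
    return dict(cube)
```

Here `roll_up(FACTS)` returns `{('resistor', '2001-01'): 200.0, ('capacitor', '2001-02'): 50.0}`. Each distinct combination of hierarchy levels yields a different view of the same facts.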
Based on a real-world case study, this paper presents an approach to specifying OLAP databases based on web data. Unlike previous work, this approach takes into account special OLAP issues such as dimension hierarchies and correct aggregation of data. Moreover, the approach works at the conceptual level, using the Unified Modeling Language (UML) as a basis for so-called UML snowflake diagrams that precisely capture the multidimensional structure of the data. The paper also presents an integration architecture that allows logical integration of XML and relational data sources for use by OLAP tools.
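The idea of logically integrating an XML source and a relational source for OLAP can be illustrated with a small sketch. This is not the paper's architecture. It assumes a hypothetical XML document that supplies a component → family hierarchy and a hypothetical SQL table `sales` that supplies facts. Both are read at query time rather than copied into a warehouse. Two simple checks guard against incorrect aggregation: each component must belong to at most one family (a strict hierarchy), and every fact must reference a component that appears in the dimension.

```python
import sqlite3
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical XML source, e.g. component data from a B2B partner.
# Element and attribute names are assumptions made for this sketch.
XML_SOURCE = """
<components>
  <component id="c1" family="resistor"/>
  <component id="c2" family="resistor"/>
  <component id="c3" family="capacitor"/>
</components>
"""


def load_dimension_from_xml(text):
    """Read the component -> family mapping from XML at query time.

    Raises ValueError if a component maps to more than one family,
    since a non-strict hierarchy would double-count facts on roll-up.
    """
    mapping = {}
    for elem in ET.fromstring(text).iter("component"):
        cid, family = elem.get("id"), elem.get("family")
        if cid in mapping and mapping[cid] != family:
            raise ValueError(f"non-strict hierarchy at component {cid}")
        mapping[cid] = family
    return mapping


def load_facts_from_sql(conn):
    """Read (component_id, quantity) facts from the relational source."""
    return conn.execute("SELECT component_id, quantity FROM sales").fetchall()


def quantity_by_family(conn, xml_text):
    """Combine both sources virtually and SUM quantities by family."""
    family_of = load_dimension_from_xml(xml_text)
    totals = defaultdict(int)
    for component_id, quantity in load_facts_from_sql(conn):
        if component_id not in family_of:
            raise ValueError(f"fact references unknown component {component_id}")
        totals[family_of[component_id]] += quantity
    return dict(totals)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (component_id TEXT, quantity INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("c1", 10), ("c2", 5), ("c3", 7), ("c1", 3)],
    )
    print(quantity_by_family(conn, XML_SOURCE))
```

Run as a script, this prints `{'resistor': 18, 'capacitor': 7}`. Because the XML is parsed on each query, changes at the source are visible immediately. This is the trade-off against the precomputed performance of the data warehousing approach.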
Cite this article
Jensen, M.R., Møller, T.H. & Pedersen, T.B. Specifying OLAP Cubes on XML Data. Journal of Intelligent Information Systems 17, 255–280 (2001). https://doi.org/10.1023/A:1012814015209