Abstract
Since XML has become a standard for data exchange over the Internet, especially in B2B and B2C communication, there is an increasing need to integrate XML data into data warehousing systems. In this paper we propose a methodology for data warehouse design in which the data sources are XML Schemas and the XML documents that conform to them. Particular emphasis is given to conceptual and logical multidimensional design. A prototype tool has been developed to verify and support the methodology. Because of the semi-structured nature of XML data, not all the information needed for design can be safely derived from the XML Schema. In these situations, the tool generates XQuery statements to examine the XML documents. The functionality of the tool is illustrated on a real-life XML Schema that describes purchase orders.
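As a rough illustration of why the documents themselves may need to be examined, the sketch below checks whether a relationship between two elements behaves as to-one in the actual data. It is a Python stand-in for the kind of check an XQuery statement could perform, not the tool's own method. The sample document, the element names (`purchaseOrder`, `customer`, `city`) and the helper `is_to_one` are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical purchase-order data; the element names are illustrative only.
SAMPLE = """
<orders>
  <purchaseOrder id="1"><customer>C1</customer><city>Zagreb</city></purchaseOrder>
  <purchaseOrder id="2"><customer>C1</customer><city>Zagreb</city></purchaseOrder>
  <purchaseOrder id="3"><customer>C2</customer><city>Bologna</city></purchaseOrder>
  <purchaseOrder id="4"><customer>C3</customer><city>Zagreb</city></purchaseOrder>
</orders>
"""


def is_to_one(root, record_tag, source_tag, target_tag):
    """Return True if each source value maps to at most one target value."""
    seen = defaultdict(set)
    for record in root.iter(record_tag):
        source = record.findtext(source_tag)
        target = record.findtext(target_tag)
        # Skip records where either element is missing.
        if source is not None and target is not None:
            seen[source].add(target)
    return all(len(targets) <= 1 for targets in seen.values())


root = ET.fromstring(SAMPLE)

# Does each customer appear with a single city in this data?
customer_to_city = is_to_one(root, "purchaseOrder", "customer", "city")

# Does each city appear with a single customer in this data?
city_to_customer = is_to_one(root, "purchaseOrder", "city", "customer")
```

In this sample, every customer is associated with one city, while Zagreb is associated with two customers. A check of this kind only reports what holds in the examined documents, so a designer would still have to confirm that the result reflects the application domain.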
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vrdoljak, B., Banek, M., Skočir, Z. (2006). Integrating XML Sources into a Data Warehouse. In: Lee, J., Shim, J., Lee, Sg., Bussler, C., Shim, S. (eds) Data Engineering Issues in E-Commerce and Services. DEECS 2006. Lecture Notes in Computer Science, vol 4055. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11780397_11
DOI: https://doi.org/10.1007/11780397_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35440-6
Online ISBN: 978-3-540-35441-3
eBook Packages: Computer Science (R0)