Abstract
Large volumes of Web data are now organized or exported in tree-structured form. Common examples include product catalogs of e-market stores, taxonomies of thematic categories, and XML data encodings. Even within a single knowledge domain, name mismatches, structural differences, and structural inconsistencies make it difficult to integrate many data sources and query them in a uniform way. In this paper, we present a method for semantically integrating tree-structured data. We introduce dimensions, which are sets of semantically related nodes in tree structures. Based on dimensions, we propose dimension graphs. Dimension graphs can be extracted automatically from trees and abstract their structural information. They are semantically rich constructs that guide users in formulating queries, assist query evaluation, and support the integration of tree-structured data. We design a query language for tree-structured data. A query in this language may specify the structure of the underlying trees fully, partially, or not at all, so queries are not restricted by the structure of the trees. We provide necessary and sufficient conditions for checking query satisfiability, and we present a technique for evaluating satisfiable queries. Finally, we report several experiments that compare our method for integrating tree-structured data with one that does not exploit dimension graphs. Our results demonstrate the superiority of the approach based on dimension graphs.
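As a rough illustration of the idea of dimensions and dimension graphs, the following minimal sketch groups tree nodes into dimensions and derives a graph over those dimensions from the parent-child edges of a tree. It is not the paper's algorithm: the catalog data, the names `tree_edges`, `dimension_of` and `extract_dimension_graph`, and the rule "connect two dimensions whenever a parent-child edge links their nodes" are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical product-catalog tree, given as (parent, child) edges.
tree_edges = [
    ("catalog", "laptops"),
    ("catalog", "phones"),
    ("laptops", "Acme"),
    ("phones", "Acme"),
    ("phones", "Globex"),
]

# Hypothetical assignment of each node label to a dimension, i.e. a set
# of semantically related nodes (assumed to be supplied by a designer).
dimension_of = {
    "catalog": "Root",
    "laptops": "Category",
    "phones": "Category",
    "Acme": "Brand",
    "Globex": "Brand",
}


def extract_dimension_graph(edges, dimension_of):
    """Abstract a tree into a graph over dimensions.

    Assumed rule: dimension D1 points to dimension D2 whenever some
    parent-child edge of the tree goes from a node in D1 to a node in D2.
    """
    graph = defaultdict(set)
    for parent, child in edges:
        graph[dimension_of[parent]].add(dimension_of[child])
    # Sort the targets so the result is deterministic and easy to compare.
    return {dim: sorted(targets) for dim, targets in graph.items()}


dimension_graph = extract_dimension_graph(tree_edges, dimension_of)
print(dimension_graph)
```

Under these assumptions, five tree edges collapse into two dimension-level edges (Root to Category, Category to Brand), which hints at how such a graph could summarize structure and guide query formulation without exposing every node of every source tree.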
Work supported in part by PYTHAGORAS EPEAEK II programme, EU and Greek Ministry of Education, co-funded by the European Social Fund (75%) and National Resources (25%).
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dalamagas, T., Theodoratos, D., Koufopoulos, A., Liu, IT. (2005). Semantic Integration of Tree-Structured Data Using Dimension Graphs. In: Spaccapietra, S. (eds) Journal on Data Semantics IV. Lecture Notes in Computer Science, vol 3730. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11603412_8
DOI: https://doi.org/10.1007/11603412_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31001-3
Online ISBN: 978-3-540-31447-9
eBook Packages: Computer Science, Computer Science (R0)