Semantic Integration of Tree-Structured Data Using Dimension Graphs

  • Conference paper
Journal on Data Semantics IV

Part of the book series: Lecture Notes in Computer Science (JODS, volume 3730)

Abstract

Huge volumes of Web data are now organized or exported in tree-structured form. Popular examples include product catalogs of e-market stores, taxonomies of thematic categories, and XML data encodings. Even within a single knowledge domain, name mismatches, structural differences, and structural inconsistencies make it difficult to integrate many data sources and query them in a uniform way. In this paper, we present a method for semantically integrating tree-structured data. We introduce dimensions, which are sets of semantically related nodes in tree structures. Based on dimensions, we propose dimension graphs. Dimension graphs can be automatically extracted from trees and abstract their structural information. They are semantically rich constructs that guide users in posing queries, assist query evaluation, and support the integration of tree-structured data. We design a query language for tree-structured data. The language allows full, partial, or no specification of the structure of the underlying trees, so queries in our language are not restricted by that structure. We provide necessary and sufficient conditions for checking query satisfiability, and we present a technique for evaluating satisfiable queries. Finally, we report several experiments that compare our method for integrating tree-structured data with one that does not exploit dimension graphs. Our results demonstrate the superiority of our approach.
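The idea of abstracting tree structure into a dimension graph can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: each node is already assigned to exactly one dimension, and the dimension graph has an edge from one dimension to another whenever some tree has a parent-child edge between nodes of those dimensions. The function name `extract_dimension_graph`, the node identifiers, and the two small catalogs are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def extract_dimension_graph(tree_edges, dimension_of):
    """Abstract node-level parent-child edges into dimension-level edges.

    Assumption (not taken from the abstract): an edge D1 -> D2 exists
    whenever some parent node in D1 has a child node in D2.
    """
    graph = defaultdict(set)
    for parent, child in tree_edges:
        graph[dimension_of[parent]].add(dimension_of[child])
    # Sort the targets so the result is deterministic.
    return {dim: sorted(targets) for dim, targets in graph.items()}


# Two hypothetical catalogs that nest the same information differently:
# the first goes category -> brand, the second brand -> category.
tree_edges = [
    ("root1", "laptops1"),
    ("laptops1", "acme1"),
    ("root2", "acme2"),
    ("acme2", "laptops2"),
]

# Hypothetical assignment of nodes to dimensions.
dimension_of = {
    "root1": "Root",
    "root2": "Root",
    "laptops1": "Category",
    "laptops2": "Category",
    "acme1": "Brand",
    "acme2": "Brand",
}

dimension_graph = extract_dimension_graph(tree_edges, dimension_of)
print(dimension_graph)
```

In this sketch both nesting orders appear as edges between the same two dimensions, which gives a rough sense of how a single graph can summarize structurally different trees from several sources.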

Work supported in part by PYTHAGORAS EPEAEK II programme, EU and Greek Ministry of Education, co-funded by the European Social Fund (75%) and National Resources (25%).




© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dalamagas, T., Theodoratos, D., Koufopoulos, A., Liu, I.-T. (2005). Semantic Integration of Tree-Structured Data Using Dimension Graphs. In: Spaccapietra, S. (ed.) Journal on Data Semantics IV. Lecture Notes in Computer Science, vol 3730. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11603412_8

  • DOI: https://doi.org/10.1007/11603412_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-31001-3

  • Online ISBN: 978-3-540-31447-9

  • eBook Packages: Computer Science, Computer Science (R0)
