Identification of Syntactically Similar DTD Elements for Schema Matching

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2118)

Abstract

An XML Document Type Definition (DTD) defines the structure of XML documents. XML applications such as data translation, schema integration, and wrapper generation require DTD schema matching as a core procedure. While schema matching usually relies on a human arbiter, we aim at an automated system that gives the arbiter a starting point for designing the matching that best meets the requirements of the given application. We present an approach that identifies syntactically similar DTD elements that are potential matching components. We first describe the DTD element graph, a data model for DTD elements. We then define the distance between two DTD element graphs and introduce the concepts of syntactically equivalent and syntactically similar graphs. Next, we describe an algorithm that detects both syntactically equivalent and syntactically similar DTD elements. We have implemented the matching detection algorithm together with several heuristics that improve its performance. Our experimental results show reasonable precision of the algorithm in recognizing correct matches.
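The abstract does not reproduce the paper's own definitions of the DTD element graph, its distance, or the detection algorithm, so the Python sketch below only illustrates the general idea under our own assumptions. We assume a simplified DTD given as a dictionary (sequence content models with `?`, `*`, `+` quantifiers; no choice groups, mixed content, or attributes), and we label each graph node only by its kind and quantifier so that element names are ignored. Distance is measured in the spirit of Bunke and Shearer's maximal-common-subgraph metric, d = 1 - |common| / max(|G1|, |G2|), with the common subgraph simplified to a common top-down, order-preserving subtree. A distance of 0 is treated as "syntactically equivalent" and a distance at or below a threshold as "syntactically similar". The function names, the tree representation, and the 0.3 threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Union

# Hypothetical simplified DTD: each element maps to "#PCDATA" or to an ordered
# sequence of (child_name, quantifier) pairs, quantifier in {"", "?", "*", "+"}.
# Choice groups, mixed content and attributes are ignored in this sketch.
Dtd = dict[str, Union[str, list[tuple[str, str]]]]


@dataclass
class Node:
    # (kind, quantifier); element names are dropped so only syntax is compared.
    label: tuple[str, str]
    children: list["Node"] = field(default_factory=list)


def element_graph(dtd: Dtd, name: str, quantifier: str = "",
                  path: frozenset = frozenset()) -> Node:
    """Unfold the content model of `name` into a rooted, ordered tree."""
    content = dtd.get(name, "#PCDATA")  # undeclared elements are treated as text
    if content == "#PCDATA":
        return Node(("text", quantifier))
    if name in path:
        # Recursive reference: stop unfolding to keep the tree finite.
        return Node(("ref", quantifier))
    kids = [element_graph(dtd, child, q, path | {name}) for child, q in content]
    return Node(("elem", quantifier), kids)


def size(node: Node) -> int:
    """Number of nodes in the tree rooted at `node`."""
    return 1 + sum(size(c) for c in node.children)


def common(u: Node, v: Node) -> int:
    """Size of the largest common top-down, order-preserving subtree of u and v."""
    if u.label != v.label:
        return 0
    a, b = u.children, v.children
    # LCS-style dynamic programme that aligns the two ordered child lists.
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            table[i][j] = max(table[i - 1][j], table[i][j - 1],
                              table[i - 1][j - 1] + common(a[i - 1], b[j - 1]))
    return 1 + table[len(a)][len(b)]


def distance(g1: Node, g2: Node) -> float:
    """Bunke-Shearer style distance: 1 - |common| / max(|G1|, |G2|)."""
    return 1 - common(g1, g2) / max(size(g1), size(g2))


def find_matches(dtd: Dtd, threshold: float = 0.3) -> list[tuple[str, str, float, str]]:
    """Compare every pair of elements with element content; report candidates."""
    composite = sorted(n for n, c in dtd.items() if c != "#PCDATA")
    graphs = {n: element_graph(dtd, n) for n in composite}
    matches = []
    for a, b in combinations(composite, 2):
        d = distance(graphs[a], graphs[b])
        if d == 0:
            matches.append((a, b, d, "equivalent"))
        elif d <= threshold:
            matches.append((a, b, d, "similar"))
    return matches


EXAMPLE_DTD: Dtd = {
    "book": [("title", ""), ("author", "+"), ("price", "?")],
    "publication": [("name", ""), ("writer", "+"), ("cost", "?")],
    "article": [("title", ""), ("author", "+")],
    "title": "#PCDATA", "author": "#PCDATA", "price": "#PCDATA",
    "name": "#PCDATA", "writer": "#PCDATA", "cost": "#PCDATA",
}

if __name__ == "__main__":
    for match in find_matches(EXAMPLE_DTD):
        print(match)
```

For `EXAMPLE_DTD`, `book` and `publication` have the same structure under different names and are reported as equivalent, while `article` (which lacks the optional third child) is reported as similar to both at distance 0.25. Restricting the common subgraph to a top-down ordered subtree keeps the computation polynomial, whereas finding a maximum common subgraph of general graphs is NP-hard.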

References

  1. H. Bunke and K. Shearer. A graph distance metric based on the maximal common subgraph. Pattern Recognition Letters, 19, 1998.

  2. S. Castano and V. De Antonellis. A schema analysis and reconciliation tool environment for heterogeneous databases. In IDEAS, 1999.

  3. S. S. Chawathe and H. Garcia-Molina. Meaningful Change Detection in Structured Data. In SIGMOD, 1997.

  4. H. Garcia-Molina, Y. Papakonstantinou, D. Quass, A. Rajaraman, Y. Sagiv, J. Ullman, and J. Widom. The TSIMMIS project: Integration of heterogeneous information sources. Journal of Intelligent Information Systems, 1997.

  5. S. S. Chawathe, A. Rajaraman, H. Garcia-Molina, and J. Widom. Change detection in hierarchically structured information. In SIGMOD, 1996.

  6. A. Doan, P. Domingos, and A. Levy. Learning source descriptions for data integration. In WebDB International Workshop on the Web and Databases, 2000.

  7. Alin Deutsch, Mary F. Fernandez, and Dan Suciu. Storing semistructured data with STORED. In SIGMOD, 1999.

  8. DocBook.org. The DocBook DTD. http://www.docbook.org/intro.html, August 2000.

  9. Z. G. Ives, D. Florescu, M. A. Friedman, A. Y. Levy, and D. S. Weld. An adaptive query execution system for data integration. In SIGMOD, 1999.

  10. R. J. Miller, Y. E. Ioannidis, and R. Ramakrishnan. The use of information capacity in schema integration and translation. In VLDB, 1993.

  11. T. Milo and S. Zohar. Schema-based data translation. In WebDB International Workshop on the Web and Databases, pages 33–41, 1998.

  12. T. Milo and S. Zohar. Using schema matching to simplify heterogeneous data translation. In SIGMOD, 1998.

  13. XML.org. XML.org Registry Open for Business. http://www.xml.org/registry, 1998.

  14. A. Sahuguet. Everything you ever wanted to know about DTDs, but were afraid to ask. In WebDB, 2000.

  15. J. Shanmugasundaram, G. He, K. Tufte, C. Zhang, D. DeWitt, and J. Naughton. Relational databases for querying XML Documents: Limitations and Opportunities. In VLDB, pages 302–314, Edinburgh, Scotland, UK, September 1999.

  16. H. Su, S. Padmanabhan, and M. L. Lo. Identification of Syntactically Similar DTD Elements for Schema Matching. Technical report, January 2001.

  17. W3C. XML™. http://www.w3.org/XML, 1998.

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Su, H., Padmanabhan, S., Lo, ML. (2001). Identification of Syntactically Similar DTD Elements for Schema Matching. In: Wang, X.S., Yu, G., Lu, H. (eds) Advances in Web-Age Information Management. WAIM 2001. Lecture Notes in Computer Science, vol 2118. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47714-4_14

  • DOI: https://doi.org/10.1007/3-540-47714-4_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42298-3

  • Online ISBN: 978-3-540-47714-3

  • eBook Packages: Springer Book Archive
