Abstract
XML Document Type Definition (DTD) enforces the structure of XML documents. XML applications such as data translation, schema integration, and wrapper generation require DTD schema matching as a core procedure. While schema matching usually relies on a human arbiter, we aim at an automated system that gives the arbiter a starting point for designing the matching that best meets the requirements of the given application. We present an approach that identifies syntactically similar DTD elements as potential matching components. We first describe the DTD element graph, a data model for DTD elements. We then define the distance between two DTD element graphs and introduce the concepts of syntactically equivalent and syntactically similar graphs. Next, we describe the algorithm that detects both syntactically equivalent and syntactically similar DTD elements. We have implemented the matching detection algorithm together with several heuristics that improve its performance. Our experimental results show reasonable precision of the algorithm in recognizing correct matches.
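The abstract does not give the definitions of the element graph or of the distance, so the following Python is a hypothetical sketch, not the authors' algorithm. It assumes that each DTD is a dict mapping an element name to its child particles as (name, operator) pairs, that an element graph is the trie of label paths reachable from its root (the root label is ignored so that differently named elements can match, and recursive elements are not re-expanded along a path), and that the distance is a Bunke and Shearer style measure 1 - |common| / max(|G1|, |G2|), where the common part is the shared set of root paths. A distance of 0 is read as syntactically equivalent, and a distance at or below a hypothetical threshold of 0.3 as syntactically similar.

```python
def root_paths(dtd, root, max_depth=5):
    """Collect the label paths from `root` to every node of its element graph.

    The root is the empty path, so elements with different names but the same
    structure yield identical path sets. Each label is the child name plus its
    occurrence operator ("", "?", "*" or "+").
    """
    paths = {()}

    def walk(name, prefix, depth, ancestors):
        if depth == max_depth:  # hypothetical cap on nesting depth
            return
        for child, op in dtd.get(name, []):
            path = prefix + (child + op,)
            paths.add(path)
            if child not in ancestors:  # do not re-expand an element already on this path
                walk(child, path, depth + 1, ancestors | {child})

    walk(root, (), 0, {root})
    return paths


def graph_distance(dtd_a, root_a, dtd_b, root_b):
    """Bunke-Shearer-style distance: 1 - |common| / max(|G1|, |G2|)."""
    pa = root_paths(dtd_a, root_a)
    pb = root_paths(dtd_b, root_b)
    common = len(pa & pb)  # node count of the largest common rooted path trie
    return 1 - common / max(len(pa), len(pb))


def match_candidates(dtd_a, dtd_b, threshold=0.3):
    """Return (element_a, element_b, kind, distance) for non-leaf pairs within threshold."""
    results = []
    for ra, children_a in dtd_a.items():
        for rb, children_b in dtd_b.items():
            if not children_a or not children_b:  # skip leaf (#PCDATA-only) elements
                continue
            d = graph_distance(dtd_a, ra, dtd_b, rb)
            if d <= threshold:
                kind = "equivalent" if d == 0 else "similar"
                results.append((ra, rb, kind, round(d, 3)))
    return results


# Two toy DTDs (hypothetical); leaf elements map to an empty child list.
dtd_a = {
    "book": [("title", ""), ("author", "+"), ("year", "?")],
    "author": [("first", ""), ("last", "")],
    "title": [], "first": [], "last": [], "year": [],
}
dtd_b = {
    "publication": [("title", ""), ("author", "+"), ("publisher", "")],
    "author": [("first", ""), ("last", "")],
    "title": [], "first": [], "last": [], "publisher": [],
}

print(match_candidates(dtd_a, dtd_b))
# [('book', 'publication', 'similar', 0.167), ('author', 'author', 'equivalent', 0.0)]
```

Here `book` and `publication` share five of their six root paths (distance 1/6), while the two `author` elements have identical structure. Comparing whole path sets rather than only direct children lets nested differences affect the score, at the cost of treating each element graph as a tree.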
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Su, H., Padmanabhan, S., Lo, ML. (2001). Identification of Syntactically Similar DTD Elements for Schema Matching. In: Wang, X.S., Yu, G., Lu, H. (eds) Advances in Web-Age Information Management. WAIM 2001. Lecture Notes in Computer Science, vol 2118. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47714-4_14
DOI: https://doi.org/10.1007/3-540-47714-4_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42298-3
Online ISBN: 978-3-540-47714-3
eBook Packages: Springer Book Archive