ABSTRACT
We describe XIS, an XML document integration system. The system is based on an algorithm that computes the top-down edit distance between an XML document and a schema. The complexity of the algorithm is O(t × s × log s), where t is the size of the document and s is the size of the schema. The system includes a GUI that lets the user visualize the edit operations performed on the XML document. Synthesized and real data sets will be used to show the efficiency and efficacy of the system.
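The abstract does not spell out the document-to-schema algorithm, so the following is only a simplified illustration of what "top-down" edit distance means, not the XIS method. It computes a Selkow-style top-down distance between two XML documents rather than between a document and a schema. Unit costs for relabeling, inserting, and deleting a node are assumed, and the names `subtree_size` and `top_down_distance` are hypothetical.

```python
import xml.etree.ElementTree as ET


def subtree_size(node):
    """Number of element nodes in the subtree rooted at `node`."""
    return 1 + sum(subtree_size(child) for child in node)


def top_down_distance(a, b):
    """Top-down edit distance between two ordered labeled trees.

    Assumed unit costs: relabeling a node costs 1, and inserting or
    deleting a whole subtree costs its number of nodes. Roots are always
    matched to each other, so edits propagate from the top down.
    """
    relabel = 0 if a.tag == b.tag else 1
    kids_a, kids_b = list(a), list(b)
    m, n = len(kids_a), len(kids_b)

    # d[i][j]: cost of turning the first i children of `a`
    # into the first j children of `b`.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + subtree_size(kids_a[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + subtree_size(kids_b[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + subtree_size(kids_a[i - 1]),   # delete subtree
                d[i][j - 1] + subtree_size(kids_b[j - 1]),   # insert subtree
                d[i - 1][j - 1]
                + top_down_distance(kids_a[i - 1], kids_b[j - 1]),  # match
            )
    return relabel + d[m][n]


if __name__ == "__main__":
    doc = ET.fromstring("<a><b/><c><d/></c></a>")
    other = ET.fromstring("<a><b/><c/></a>")
    print(top_down_distance(doc, other))
```

Comparing against a schema instead of a second document would replace the fixed child sequence of `b` with the set of child sequences the schema allows, which is where the schema size s enters the complexity stated above.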