Abstract
XML information items collected from heterogeneous sources often carry similar semantics but are structured in different ways. These structural variations make effective search across multiple data sources hard to achieve. Our approach aims at a flexible search and processing technique capable of extracting relevant information from a possibly huge set of XML documents. ApproXML is a software tool supporting approximate pattern-based querying: it locates and extracts XML information while dealing flexibly with differences in structure and tag vocabulary.
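As a minimal illustration of the problem, consider two made-up records (`DOC_A` and `DOC_B` are hypothetical, not taken from the paper) that state the same fact with different nesting and tag names. A rigid path query written for one layout silently misses the other. The sketch uses Python's standard `xml.etree.ElementTree`, whose `findall` accepts a limited XPath subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical records: same city/population fact, different structure and tag vocabulary.
DOC_A = ("<country><city><name>Milan</name>"
         "<population>1000</population></city></country>")
DOC_B = ("<nation><province><town><name>Milan</name>"
         "<inhabitants>1000</inhabitants></town></province></nation>")


def exact_query(xml_text):
    """Return the population values reached by a fixed path written for DOC_A's layout."""
    root = ET.fromstring(xml_text)
    return [e.text for e in root.findall("./city/population")]


print(exact_query(DOC_A))  # ['1000']
print(exact_query(DOC_B))  # [] : same information, but neither the path nor the tags match
```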
Our method relies on representing XML documents as graphs, through a variant of the DOM model. The relevant information is selected as follows [Dam00a]: first, the user provides an XML pattern, i.e. a partially specified subtree. Then, the XML documents of the target dataset are scanned, and XML fragments are located and sorted according to their similarity to the pattern.
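The pattern-then-scan procedure above can be sketched as follows. This is a simplified stand-in, not ApproXML's actual algorithm: it uses plain `xml.etree.ElementTree` trees instead of the DOM-variant graphs described above, and the similarity measure (the hypothetical `SYNONYMS` table, the hypothetical `DECAY` factor per extra nesting level, and averaging over pattern children) is an assumption chosen for illustration. Every element of every document is tried as the root of a candidate fragment, scored against the pattern, and the non-zero scores are sorted in descending order.

```python
import xml.etree.ElementTree as ET

# Hypothetical synonym table standing in for tolerance to tag-vocabulary differences.
SYNONYMS = {"town": "city", "inhabitants": "population", "nation": "country"}
DECAY = 0.5  # hypothetical penalty applied per extra level of nesting


def tag_sim(a, b):
    """1.0 for identical tags, 0.8 for tags mapped to the same synonym, 0.0 otherwise."""
    if a == b:
        return 1.0
    if SYNONYMS.get(a, a) == SYNONYMS.get(b, b):
        return 0.8
    return 0.0


def descendants_with_depth(node, depth=1):
    """Yield (element, depth) for every proper descendant of node."""
    for child in node:
        yield child, depth
        yield from descendants_with_depth(child, depth + 1)


def match(pattern, node):
    """Score in [0, 1] for how well the pattern subtree matches a fragment rooted at node.

    Each pattern child may match any descendant, not only a direct child;
    deeper matches are discounted by DECAY per extra level.
    """
    score = tag_sim(pattern.tag, node.tag)
    if score == 0.0 or len(pattern) == 0:
        return score
    child_scores = []
    for p_child in pattern:
        best = 0.0
        for d, depth in descendants_with_depth(node):
            best = max(best, match(p_child, d) * DECAY ** (depth - 1))
        child_scores.append(best)
    return score * sum(child_scores) / len(child_scores)


def rank_fragments(pattern, documents):
    """Scan every element of every document and sort candidate fragments by score."""
    results = []
    for doc_id, text in documents.items():
        for node in ET.fromstring(text).iter():
            s = match(pattern, node)
            if s > 0.0:
                results.append((s, doc_id, node.tag))
    results.sort(key=lambda r: r[0], reverse=True)
    return results


# Hypothetical pattern (partially specified subtree) and made-up sample documents.
PATTERN = ET.fromstring("<city><name/><population/></city>")
DOCS = {
    "A": "<country><city><name>Milan</name><population>1000</population></city></country>",
    "B": "<nation><province><town><name>Milan</name>"
         "<inhabitants>1000</inhabitants></town></province></nation>",
    "C": "<region><city><info><name>Como</name>"
         "<population>500</population></info></city></region>",
}

for score, doc_id, tag in rank_fragments(PATTERN, DOCS):
    print(f"{score:.2f} {doc_id} <{tag}>")
# 1.00 A <city>
# 0.72 B <town>
# 0.50 C <city>
```

With these assumed settings, the exact match in document A ranks first, the renamed fragment in B (`town`, `inhabitants`) second, and the more deeply nested fragment in C last. The weights are arbitrary here and would need tuning in practice.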
References
E. Damiani, L. Tanca. “Blind Queries to XML Data”. Proceedings of DEXA 2000, London, UK, September 4–8, 2000. Lecture Notes in Computer Science, Vol. 1873, Springer, 2000, pp. 345–356.
E. Damiani, L. Tanca, F. Arcelli Fontana. “Fuzzy XML Queries via Context-based Choice of Aggregations”. Kybernetika, n. 16, vol. 4, 2000.
E. Damiani, B. Oliboni, L. Tanca. “Fuzzy Techniques for XML Data Smushing”. Proceedings of 7th Fuzzy Days, Dortmund, Germany, October 1–3, 2001.
W. May. “Information extraction and integration with Florid: The Mondial case study”. Technical Report 131, Universität Freiburg, Institut für Informatik, 1999. Available from http://www.informatik.uni-freiburg.de/~may/Mondial/
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Damiani, E. et al. (2002). The ApproXML Tool Demonstration. In: Jensen, C.S., et al. Advances in Database Technology — EDBT 2002. EDBT 2002. Lecture Notes in Computer Science, vol 2287. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45876-X_52
DOI: https://doi.org/10.1007/3-540-45876-X_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43324-8
Online ISBN: 978-3-540-45876-0
eBook Packages: Springer Book Archive