Abstract
XML documents and related technologies represent a widely accepted standard for managing semi-structured data. However, a surprisingly high number of XML documents are affected by well-formedness errors, structural invalidity, or data inconsistencies. This paper proposes a correction framework for structural repairs of elements with respect to single type tree grammars. By inspecting the state space of a finite automaton recognising regular expressions, we are always able to find all minimal repairs with respect to a defined cost function. These repairs are compactly represented by shortest paths in recursively nested multigraphs, which can be translated into particular sequences of edit operations altering XML trees. We propose an efficient algorithm and provide a prototype implementation.
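The core idea of finding minimal repairs as shortest paths over automaton states can be illustrated on a much simpler problem than the one the paper addresses. The following is a minimal sketch, not the authors' algorithm: it assumes a flat sequence of child element names instead of a whole tree, unit costs for insert, delete, and rename, and it returns only the minimal cost rather than all minimal repairs. The function name `min_repair_cost` and the `transitions` encoding of the automaton are hypothetical, introduced only for this illustration.

```python
import heapq


def min_repair_cost(word, start, accepting, transitions):
    """Return the minimal number of insert/delete/rename operations that
    turn `word` into a sequence accepted by the automaton.

    transitions: dict mapping state -> list of (symbol, next_state).
    Assumption: every edit operation has cost 1 (illustrative only).
    """
    # A node (i, q) means: the first i symbols of `word` are consumed
    # and the automaton is in state q. Dijkstra finds the cheapest path
    # from (0, start) to any (len(word), accepting state).
    dist = {(0, start): 0}
    queue = [(0, 0, start)]
    while queue:
        cost, i, q = heapq.heappop(queue)
        if cost > dist.get((i, q), float("inf")):
            continue  # stale queue entry
        if i == len(word) and q in accepting:
            return cost
        edges = []
        if i < len(word):
            # Delete word[i]: consume input, stay in the same state.
            edges.append((cost + 1, i + 1, q))
        for symbol, nxt in transitions.get(q, []):
            # Insert `symbol`: move the automaton, consume no input.
            edges.append((cost + 1, i, nxt))
            if i < len(word):
                # Read word[i] unchanged (cost 0) or rename it (cost 1).
                step = 0 if symbol == word[i] else 1
                edges.append((cost + step, i + 1, nxt))
        for new_cost, j, r in edges:
            if new_cost < dist.get((j, r), float("inf")):
                dist[(j, r)] = new_cost
                heapq.heappush(queue, (new_cost, j, r))
    return None  # no accepting state is reachable


# Hypothetical content model "a b* c" as a small automaton.
CONTENT_MODEL = {0: [("a", 1)], 1: [("b", 1), ("c", 2)]}
```

With this content model, `min_repair_cost(["x", "a", "c"], 0, {2}, CONTENT_MODEL)` explores deleting, renaming, or inserting symbols and reports the cheapest combination. Extending the sketch toward the paper's setting would require recursing into child subtrees and recording the paths themselves, which is what the nested multigraphs in the abstract are described as representing.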
This work was partially supported by the Czech Science Foundation (GAČR), grant numbers 201/09/P364 and P202/10/0573.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Svoboda, M., Mlýnková, I. (2011). Correction of Invalid XML Documents with Respect to Single Type Tree Grammars. In: Fong, S. (eds) Networked Digital Technologies. NDT 2011. Communications in Computer and Information Science, vol 136. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22185-9_16
DOI: https://doi.org/10.1007/978-3-642-22185-9_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22184-2
Online ISBN: 978-3-642-22185-9
eBook Packages: Computer Science, Computer Science (R0)