Abstract
In this paper, we expand upon previous efforts to infer schema information from existing XML documents. We focus on the inference of integrity constraints, more specifically ID/IDREF/IDREFS attributes in a DTD. Building on the work of Barbosa and Mendelzon (2003), we introduce a heuristic approach to the problem of finding an optimal ID set. The approach is evaluated and tuned in a wide range of experiments.
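To make the ID-set problem concrete, the sketch below shows the two XML 1.0 validity rules any inferred ID set must respect: an ID value may not appear more than once in a document, and an element type may have at most one ID attribute. It is a minimal illustration under stated assumptions, not the heuristic introduced in this paper. The input format (a flat list of `(element, attribute, value)` triples), the function names `candidate_mappings` and `greedy_id_set`, and the weight used for ordering (number of distinct values) are all hypothetical. Choosing an optimal ID set is a hard combinatorial problem (the reference list points to maximum independent set algorithms and to metaheuristics such as ant colony optimization and tabu search), so the single greedy pass below serves only as a baseline.

```python
from collections import defaultdict


def candidate_mappings(occurrences):
    """Return {(element, attribute): set_of_values} for every attribute whose
    values never repeat, i.e. attributes that could plausibly be of type ID.
    Assumes occurrences is an iterable of (element, attribute, value) triples."""
    values = defaultdict(list)
    for element, attribute, value in occurrences:
        values[(element, attribute)].append(value)
    return {
        key: set(vals)
        for key, vals in values.items()
        if len(vals) == len(set(vals))  # reject mappings with duplicate values
    }


def greedy_id_set(candidates):
    """Greedily choose ID attributes so that (a) no value is shared by two
    chosen attributes and (b) each element type gets at most one ID attribute."""
    chosen, used_values, used_elements = [], set(), set()
    # Hypothetical weight: more distinct values first; ties broken by name.
    ordered = sorted(candidates.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for (element, attribute), vals in ordered:
        if element in used_elements or not used_values.isdisjoint(vals):
            continue  # would violate one ID per element type or ID uniqueness
        chosen.append((element, attribute))
        used_values |= vals
        used_elements.add(element)
    return chosen
```

For example, if `book/@isbn` takes the values `b1, b2`, `book/@lang` repeats `en`, `author/@id` takes `a1, a2, a3`, and `author/@ref` takes `b1`, the greedy pass keeps `author/@id` and `book/@isbn`; `book/@lang` is rejected as non-unique, and `author/@ref` is rejected because `author` already has an ID attribute.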
The paper was supported by the GAČR grant no. P202/10/0573.
References
Ahonen, H.: Generating Grammars for Structured Documents Using Grammatical Inference Methods. Ph.D. thesis, Department of Computer Science, University of Helsinki. Series of Publications A, Report A-1996-4 (1996)
Barbosa, D., Mendelzon, A.: Finding ID Attributes in XML Documents. In: Bellahsène, Z., Chaudhri, A.B., Rahm, E., Rys, M., Unland, R. (eds.) XSym 2003. LNCS, vol. 2824, pp. 180–194. Springer, Heidelberg (2003)
Bex, G.J., Neven, F., Vansummeren, S.: SchemaScope: a System for Inferring and Cleaning XML Schemas. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, SIGMOD 2008, pp. 1259–1262. ACM, New York (2008)
Bray, T., Paoli, J., Maler, E., Yergeau, F., Sperberg-McQueen, C.M.: Extensible Markup Language (XML) 1.0, 5th edn. W3C recommendation, W3C (2008)
Buneman, P., Davidson, S., Fan, W., Hara, C., Tan, W.C.: Keys for XML. In: Proceedings of the 10th International Conference on World Wide Web, WWW 2001, pp. 201–210. ACM, New York (2001)
Dantzig, G.: Linear Programming and Extensions. Landmarks in Physics and Mathematics. Princeton University Press (1998)
Dorigo, M., Stützle, T.: Ant Colony Optimization. Bradford Books. MIT Press (2004)
Fajt, S.: Mining XML Integrity Constraints. Master’s thesis, Charles University in Prague (2010), http://www.ksi.mff.cuni.cz/projects/infer/keyminer/Fajt.pdf
Fomin, F.V., Grandoni, F., Kratsch, D.: A Measure & Conquer Approach for the Analysis of Exact Algorithms. J. ACM 56, 25:1–25:32 (2009)
Glover, F., Laguna, M.: Tabu Search. Kluwer Academic Publishers, Norwell (1997)
Goldberg, D.: Genetic Algorithms in Search, Optimization, and Machine Learning. In: Artificial Intelligence. Addison-Wesley Pub. Co. (1989)
Klempa, M., Mikula, M., Smetana, R., Švirec, M., Vitásek, M.: jInfer Architecture (2011), http://jinfer.sourceforge.net/modules/architecture.pdf
Klempa, M., Mikula, M., Smetana, R., Švirec, M., Vitásek, M.: jInfer: Java Framework for XML Schema Inference (2011), http://jinfer.sourceforge.net
Paschos, V.T.: A survey of Approximately Optimal Solutions to Some Covering and Packing Problems. ACM Comput. Surv. 29, 171–209 (1997)
Robson, J.: Algorithms for Maximum Independent Sets. Journal of Algorithms 7(3), 425–440 (1986)
Vitásek, M.: Inference of XML Integrity Constraints. Master’s thesis, Charles University in Prague (2012), http://www.ksi.mff.cuni.cz/~mlynkova/dp/Vitasek.pdf
Vošta, O., Mlýnková, I., Pokorný, J.: Even an Ant Can Create an XSD. In: Haritsa, J.R., Kotagiri, R., Pudi, V. (eds.) DASFAA 2008. LNCS, vol. 4947, pp. 35–50. Springer, Heidelberg (2008)
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vitásek, M., Mlýnková, I. (2013). Inference of XML Integrity Constraints. In: Morzy, T., Härder, T., Wrembel, R. (eds) Advances in Databases and Information Systems. Advances in Intelligent Systems and Computing, vol 186. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32741-4_26
DOI: https://doi.org/10.1007/978-3-642-32741-4_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32740-7
Online ISBN: 978-3-642-32741-4
eBook Packages: Engineering, Engineering (R0)