Abstract
We study the problem of repairing XML functional dependency violations by making the smallest value modifications in terms of repair cost. Our cost model assigns a weight to each leaf node in the XML document, and the cost of a repair is measured by the total weight of the modified nodes. We show that it is beyond reach in practice to find optimum repairs: this problem is already NP-complete for a setting with a fixed DTD, a fixed set of functional dependencies, and equal weights for all the nodes in the XML document. To this end we provide an efficient two-step heuristic method to repair XML functional dependency violations. First, the initial violations are captured and fixed by leveraging the conflict hypergraph. Second, the remaining conflicts are resolved by modifying the violating nodes and their related nodes called determinants, in a way that guarantees no new violations. The experimental results demonstrate that our algorithm scales well and is effective in improving data quality.
This work is supported by NSFC under Grant No. 60603043.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Arenas, M., Bertossi, L.E., Chomicki, J.: Consistent query answers in inconsistent databases. In: PODS, pp. 68–79 (1999)
Abiteboul, S., Hull, R., Vianu, V.: Foundations of databases. Addison-Wesley, Reading (1995)
Arenas, M., Libkin, L.: A normal form for XML documents. TODS 29(1), 195–232 (2004)
Buneman, P., Davidson, S., Fan, W., Hara, C., Tan, W.: Keys for XML. In: WWW, pp. 201–210 (2001)
Bohannon, P., Fan, W., Flaster, M., Rastogi, R.: A cost based model and effective heuristic for repairing constraints by value modification. In: SIGMOD, pp. 143–154 (2005)
Beskales, G., Ilyas, I., Golab, L.: Sampling the repairs of functional dependency violations under dard constraints. In: VLDB (2010)
Chomicki, J.: Consistent query answering: Five easy pieces. In: Schwentick, T., Suciu, D. (eds.) ICDT 2007. LNCS, vol. 4353, pp. 1–17. Springer, Heidelberg (2006)
Cong, G., Fan, W., Geerts, F., Jia, X., Ma, S.: Improving data quality: Consistency and accuracy. In: VLDB, pp. 315–326 (2007)
Fan, W.: Dependencies revisited for improving data quality. In: PODS, pp. 159–170 (2008)
Fan, W., Bohannon, P.: Information preserving XML schema embedding. TODS 33(1) (2008)
Flesca, S., Furfaro, F., Greco, S., Zumpano, E.: Repairs and consistent answers for XML data with functional dependencies. In: Bellahsène, Z., Chaudhri, A.B., Rahm, E., Rys, M., Unland, R. (eds.) XSym 2003. LNCS, vol. 2824, pp. 238–253. Springer, Heidelberg (2003)
Flesca, S., Furfaro, F., Greco, S., Zumpano, E.: Querying and repairing inconsistent XML data. In: Ngu, A.H.H., Kitsuregawa, M., Neuhold, E.J., Chung, J.-Y., Sheng, Q.Z. (eds.) WISE 2005. LNCS, vol. 3806, pp. 175–188. Springer, Heidelberg (2005)
Kolahi, S., Lakshmanan, L.: On approximating optimum repairs for functional dependency violations. In: ICDT, pp. 53–62 (2009)
Lopatenko, A., Bravo, L.: Efficient approximation algorithms for repairing inconsistent databases. In: ICDE, pp. 216–225 (2007)
Ng, W.: Repairing inconsistent merged XML data. In: MaÅ™Ãk, V., Å tÄ›pánková, O., Retschitzegger, W. (eds.) DEXA 2003. LNCS, vol. 2736, pp. 244–255. Springer, Heidelberg (2003)
Schmidt, A., Waas, F., Kersten, M., Carey, M., Manolescu, I., Busse, R.: XMark: A benchmark for XML data management. In: VLDB, pp. 974–985 (2002)
Vazirani, V.V.: Approximation algorithms. Springer, Heidelberg (2001)
Vincent, M., Liu, J., Liu, C.: Strong functional dependencies and their application to normal forms in XML. TODS 29(3), 445–462 (2004)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tan, Z., Zhang, L. (2011). Improving XML Data Quality with Functional Dependencies. In: Yu, J.X., Kim, M.H., Unland, R. (eds) Database Systems for Advanced Applications. DASFAA 2011. Lecture Notes in Computer Science, vol 6587. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20149-3_33
Download citation
DOI: https://doi.org/10.1007/978-3-642-20149-3_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20148-6
Online ISBN: 978-3-642-20149-3
eBook Packages: Computer ScienceComputer Science (R0)