
Improving XML Data Quality with Functional Dependencies

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6587)


Abstract

We study the problem of repairing violations of XML functional dependencies with the smallest value modifications, as measured by repair cost. Our cost model assigns a weight to each leaf node in the XML document, and the cost of a repair is the total weight of the modified nodes. We show that finding optimum repairs is out of reach in practice: the problem is already NP-complete for a fixed DTD, a fixed set of functional dependencies, and equal weights on all nodes of the XML document. We therefore propose an efficient two-step heuristic for repairing XML functional dependency violations. First, the initial violations are captured and fixed using a conflict hypergraph. Second, the remaining conflicts are resolved by modifying the violating nodes together with their related nodes, called determinants, in a way that guarantees that no new violations are introduced. Experimental results show that our algorithm scales well and is effective in improving data quality.
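The abstract describes the cost model and the first repair step only in outline. As a rough illustration, the Python sketch below makes several assumptions that are not from the paper: the XML document is flattened into hypothetical "tree tuples" (one dict per context node, mapping a path label to a leaf), there is a single functional dependency `lhs -> rhs`, and the nodes to modify are chosen with a textbook greedy weighted hitting-set heuristic over the conflict hypergraph. The names `Leaf`, `TreeTuple`, `conflict_hyperedges`, `greedy_cover`, and `repair_cost` are hypothetical, and the greedy rule is not necessarily the authors' algorithm. The sketch only selects which leaf nodes to modify. It does not assign new values or check for new violations; that guarantee belongs to the paper's second step and is not modeled here.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Leaf:
    """A leaf node of the XML document (hypothetical flattened representation)."""
    node_id: str
    value: str
    weight: float = 1.0  # per-node repair weight, as in the cost model


# Hypothetical "tree tuple": maps a path label (e.g. "isbn") to the leaf under one context node.
TreeTuple = dict[str, Leaf]


def repair_cost(modified: set[str], leaves: dict[str, Leaf]) -> float:
    """Cost of a repair = total weight of the modified leaf nodes."""
    return sum(leaves[nid].weight for nid in modified)


def conflict_hyperedges(tuples: list[TreeTuple], lhs: tuple[str, ...], rhs: str) -> list[frozenset[str]]:
    """One hyperedge per pair of tuples that agree on lhs but disagree on rhs,
    containing the ids of all lhs and rhs leaves of both tuples."""
    edges = []
    for t1, t2 in combinations(tuples, 2):
        agrees_on_lhs = all(t1[p].value == t2[p].value for p in lhs)
        if agrees_on_lhs and t1[rhs].value != t2[rhs].value:
            edges.append(frozenset(t[p].node_id for t in (t1, t2) for p in (*lhs, rhs)))
    return edges


def greedy_cover(edges: list[frozenset[str]], leaves: dict[str, Leaf]) -> set[str]:
    """Greedy weighted hitting set: repeatedly pick the node with the lowest
    weight per still-uncovered hyperedge it belongs to (ties broken by id)."""
    uncovered = set(edges)
    chosen: set[str] = set()
    while uncovered:
        degree: dict[str, int] = defaultdict(int)
        for edge in uncovered:
            for nid in edge:
                degree[nid] += 1
        best = min(degree, key=lambda nid: (leaves[nid].weight / degree[nid], nid))
        chosen.add(best)
        uncovered = {e for e in uncovered if best not in e}
    return chosen


# Toy data (invented for illustration): FD isbn -> title; ISBN leaves weighted 2.0 as more trustworthy.
leaves = {
    "i1": Leaf("i1", "1", 2.0), "n1": Leaf("n1", "XML"),
    "i2": Leaf("i2", "1", 2.0), "n2": Leaf("n2", "XML DB"),
    "i3": Leaf("i3", "1", 2.0), "n3": Leaf("n3", "XML"),
    "i4": Leaf("i4", "2", 2.0), "n4": Leaf("n4", "SQL"),
}
books = [{"isbn": leaves[f"i{k}"], "title": leaves[f"n{k}"]} for k in range(1, 5)]
edges = conflict_hyperedges(books, lhs=("isbn",), rhs="title")
fix = greedy_cover(edges, leaves)
print(len(edges), sorted(fix), repair_cost(fix, leaves))  # 2 ['n2'] 1.0
```

In this toy input the second book's title leaf `n2` appears in both conflicts, so modifying it alone (cost 1.0) is cheaper than touching any of the heavier ISBN leaves. Choosing a replacement value for `n2` that introduces no new violation is deliberately left out of the sketch.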

This work is supported by NSFC under Grant No. 60603043.




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tan, Z., Zhang, L. (2011). Improving XML Data Quality with Functional Dependencies. In: Yu, J.X., Kim, M.H., Unland, R. (eds) Database Systems for Advanced Applications. DASFAA 2011. Lecture Notes in Computer Science, vol 6587. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20149-3_33


  • DOI: https://doi.org/10.1007/978-3-642-20149-3_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20148-6

  • Online ISBN: 978-3-642-20149-3

  • eBook Packages: Computer Science, Computer Science (R0)
