Repairing Data Violations with Order Dependencies

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10828)

Abstract

Lexicographical order dependencies (ODs) describe the relationship between two lexicographical ordering specifications over lists of attributes, and have proved useful in optimizing queries over ordered attributes. To take full advantage of ODs, a data instance must satisfy the given OD specifications. In practice, data are often found to violate given ODs, as recent studies on the discovery of ODs have demonstrated. This calls for data repairing techniques that restore the consistency of the data with respect to ODs. New challenges arise because ODs convey order semantics beyond functional dependencies and are specified on lists of attributes. In this paper, we make a first effort to develop techniques for repairing data that violate ODs. (1) We formalize the data repairing problem for ODs, and prove that it is NP-hard in the size of the data. (2) Despite this intractability, we develop effective heuristic algorithms for the problem. (3) We experimentally evaluate the effectiveness and efficiency of our algorithms, using both real-life and synthetic data.
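
To make the notion of an OD violation concrete, the following is a minimal sketch and not the paper's repair algorithm. The abstract does not state the formal definition, so the sketch assumes the definition commonly used in the order dependency literature: a list of attributes X orders a list Y if, for every pair of tuples s and t, s preceding or equalling t lexicographically on X implies the same on Y. The function names, the attribute names, and the sample rows are hypothetical, and only ascending order is considered.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, int]  # hypothetical tuple representation: attribute name -> value


def lex_leq(s: Row, t: Row, attrs: Sequence[str]) -> bool:
    """True if s precedes or equals t in ascending lexicographic order on attrs."""
    # Python compares tuples lexicographically, matching the ordering on a list of attributes.
    return tuple(s[a] for a in attrs) <= tuple(t[a] for a in attrs)


def od_violations(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> List[Tuple[int, int]]:
    """Return ordered index pairs (i, j) where rows[i] <= rows[j] on lhs but not on rhs.

    Assumed definition: lhs orders rhs iff no such pair exists.
    This is a naive quadratic check over all ordered pairs.
    """
    return [
        (i, j)
        for i, j in permutations(range(len(rows)), 2)
        if lex_leq(rows[i], rows[j], lhs) and not lex_leq(rows[i], rows[j], rhs)
    ]


# Hypothetical instance: income is expected to order bracket.
rows = [
    {"income": 10, "bracket": 1},
    {"income": 20, "bracket": 2},
    {"income": 30, "bracket": 1},  # higher income but lower bracket than the second row
]
violations = od_violations(rows, ["income"], ["bracket"])
```

Here `violations` contains the single pair `(1, 2)`: the second row precedes the third on `income` but not on `bracket`. A repair in the sense of the abstract would modify the data so that this list becomes empty; how to choose such modifications is the problem the paper addresses and is not attempted here.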

Acknowledgements

This work is supported by NSFC 61572135, NSFC 61370157, National High Technology Research and Development Program (863 Program) of China (2015AA050203), State Grid Research Project No. 52094016000A, Shanghai Science and Technology Project (No. 16DZ1100200, 16DZ1110102), Aircraft Risk Management Database Project, and National Nonprofit Ocean Research Project (No. 201405031-04).

Corresponding author

Correspondence to Zijing Tan.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Qiu, Y., Tan, Z., Yang, K., Yang, W., Zhou, X., Guo, N. (2018). Repairing Data Violations with Order Dependencies. In: Pei, J., Manolopoulos, Y., Sadiq, S., Li, J. (eds) Database Systems for Advanced Applications. DASFAA 2018. Lecture Notes in Computer Science, vol 10828. Springer, Cham. https://doi.org/10.1007/978-3-319-91458-9_17

  • DOI: https://doi.org/10.1007/978-3-319-91458-9_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91457-2

  • Online ISBN: 978-3-319-91458-9

  • eBook Packages: Computer Science, Computer Science (R0)
