Determining Repairing Sequence of Inconsistencies in Content-Related Data

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10569)

Abstract

Data consistency is a central issue in data quality management. Content-related conditional functional dependencies (CCFDs) are a practical technique for enforcing data consistency: they catch inconsistencies by grouping content-related data together. The repairing sequence, in particular, plays a key role in consistency repairing. Some repairing sequences may produce unexpected results (e.g., incorrect repairs or repairs with extra repairing cost). Hence, reasonable repairing sequences are advocated and readily supported by commercial systems for better performance. To meet this need, this paper presents a method for determining the repairing sequence of inconsistencies in content-related data. (1) We present a repairing sequence graph over CCFDs to select the inconsistencies that should be repaired preferentially. (2) We analyze the repairing mutex and discuss the interaction between the repairing sequence and the repairing mutex. (3) We prove that the problem of determining a repairing sequence with minimum repairing cost is NP-complete, so our method finds an appropriate repairing sequence heuristically. An empirical evaluation on three datasets shows that our solution is effective.
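The abstract does not spell out how the repairing sequence graph is built, so the following is only an illustrative sketch and not the authors' algorithm. It assumes a simplified rule model in which each dependency reads a set of attributes (`lhs`) and may modify one attribute (`rhs`) when repaired, and it assumes a per-rule estimated `cost`. The names `Rule`, `repair_order`, and the sample rules are hypothetical. The sketch orders rules so that a rule whose repair can rewrite an attribute read by another rule is handled first, and it uses a greedy lowest-cost choice among the rules that are ready, which is one common heuristic for an ordering problem that is hard to solve optimally.

```python
import heapq
from typing import Dict, FrozenSet, List, NamedTuple


class Rule(NamedTuple):
    name: str
    lhs: FrozenSet[str]  # attributes the rule conditions on (assumed model)
    rhs: str             # attribute a repair of this rule may modify
    cost: float          # hypothetical estimated repairing cost


def repair_order(rules: List[Rule]) -> List[str]:
    """Greedy ordering: dependencies first, cheapest ready rule next."""
    # Edge a -> b means: repairing a may rewrite an attribute that b reads,
    # so a should be repaired before b.
    succ: Dict[str, List[str]] = {r.name: [] for r in rules}
    indeg: Dict[str, int] = {r.name: 0 for r in rules}
    for a in rules:
        for b in rules:
            if a.name != b.name and a.rhs in b.lhs:
                succ[a.name].append(b.name)
                indeg[b.name] += 1

    cost = {r.name: r.cost for r in rules}
    ready = [(cost[n], n) for n in indeg if indeg[n] == 0]
    heapq.heapify(ready)

    order: List[str] = []
    done = set()
    while len(order) < len(rules):
        if not ready:
            # Every remaining rule sits on a cycle: break it by forcing
            # the cheapest remaining rule (ties broken by name).
            forced = min((n for n in indeg if n not in done),
                         key=lambda n: (cost[n], n))
            heapq.heappush(ready, (cost[forced], forced))
        _, current = heapq.heappop(ready)
        done.add(current)
        order.append(current)
        for nxt in succ[current]:
            if nxt in done:
                continue
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                heapq.heappush(ready, (cost[nxt], nxt))
    return order


# Hypothetical rules: r1 may rewrite "city", which r2 reads.
example_rules = [
    Rule("r1", frozenset({"zip"}), "city", 2.0),
    Rule("r2", frozenset({"city"}), "state", 1.0),
    Rule("r3", frozenset({"country"}), "currency", 3.0),
]
example_order = repair_order(example_rules)
```

For the sample rules, `r2` is cheaper than `r1` but is still scheduled after it, because repairing `r1` could change the `city` values that `r2` depends on. The sketch ignores the repairing mutex discussed in point (2); handling it would need extra constraints that the abstract does not specify.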

Acknowledgement

Our research was supported by the National Natural Science Foundation of China under Grant Nos. 61672142 and 61472070, and the Fundamental Research Funds for the Central Universities of China under Grant Nos. N150408001-3 and N150404013.

Corresponding author

Correspondence to Derong Shen.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Du, Y., Shen, D., Nie, T., Kou, Y., Yu, G. (2017). Determining Repairing Sequence of Inconsistencies in Content-Related Data. In: Bouguettaya, A., et al. Web Information Systems Engineering – WISE 2017. Lecture Notes in Computer Science, vol 10569. Springer, Cham. https://doi.org/10.1007/978-3-319-68783-4_36

  • DOI: https://doi.org/10.1007/978-3-319-68783-4_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68782-7

  • Online ISBN: 978-3-319-68783-4

  • eBook Packages: Computer Science, Computer Science (R0)
