Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 407)

Abstract

Data quality is considered a crucial challenge in emerging big data scenarios. Data mining techniques can be reused efficiently in the data cleaning process. Recent studies have shown that databases often suffer from inconsistent data, which ought to be resolved in the cleaning process. In this paper, we introduce an automated approach for dependably generating rules from the databases themselves, in order to detect data inconsistency problems in large databases. The proposed approach employs confidence and lift measures with integrity constraints to guarantee that the generated rules are minimal, non-redundant and precise. The proposed approach is validated against several datasets from the healthcare domain. We experimentally demonstrate that our approach achieves a significant enhancement over existing approaches.
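As an illustration of the confidence and lift measures named in the abstract, the following is a minimal sketch using the standard association-rule definitions. It is not the authors' implementation: the toy records, the attribute names (`zip`, `city`) and all function names are hypothetical, chosen only to show how a rule can be scored and how records that violate it can be flagged.

```python
from typing import Dict, List

Record = Dict[str, str]


def matches(record: Record, pattern: Record) -> bool:
    """True if the record has every attribute value listed in the pattern."""
    return all(record.get(attr) == value for attr, value in pattern.items())


def support(records: List[Record], pattern: Record) -> float:
    """Fraction of records that match the pattern."""
    if not records:
        return 0.0
    return sum(matches(r, pattern) for r in records) / len(records)


def confidence(records: List[Record], lhs: Record, rhs: Record) -> float:
    """support(lhs and rhs) / support(lhs) for the rule lhs -> rhs."""
    lhs_support = support(records, lhs)
    if lhs_support == 0:
        return 0.0
    return support(records, {**lhs, **rhs}) / lhs_support


def lift(records: List[Record], lhs: Record, rhs: Record) -> float:
    """confidence(lhs -> rhs) / support(rhs); values above 1 suggest positive correlation."""
    rhs_support = support(records, rhs)
    if rhs_support == 0:
        return 0.0
    return confidence(records, lhs, rhs) / rhs_support


def violations(records: List[Record], lhs: Record, rhs: Record) -> List[int]:
    """Indices of records that match lhs but contradict rhs."""
    return [i for i, r in enumerate(records) if matches(r, lhs) and not matches(r, rhs)]


# Hypothetical toy data: the third record is inconsistent with the others.
records = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "Boston"},
    {"zip": "02108", "city": "Boston"},
]
lhs = {"zip": "10001"}
rhs = {"city": "New York"}

rule_confidence = confidence(records, lhs, rhs)
rule_lift = lift(records, lhs, rhs)
flagged = violations(records, lhs, rhs)
```

In this sketch a rule with high confidence and lift above 1 would be kept as a candidate cleaning rule, and the records returned by `violations` would be the suspected inconsistencies. How the paper combines these measures with integrity constraints is not shown in this preview, so the sketch makes no claim about that step.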


Author information

Corresponding author

Correspondence to Asmaa S. Abdo.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Abdo, A.S., Salem, R.K., Abdul-Kader, H.M. (2016). Automatic Rules Generation Approach for Data Cleaning in Medical Applications. In: Gaber, T., Hassanien, A., El-Bendary, N., Dey, N. (eds) The 1st International Conference on Advanced Intelligent System and Informatics (AISI2015), November 28-30, 2015, Beni Suef, Egypt. Advances in Intelligent Systems and Computing, vol 407. Springer, Cham. https://doi.org/10.1007/978-3-319-26690-9_1

  • DOI: https://doi.org/10.1007/978-3-319-26690-9_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26688-6

  • Online ISBN: 978-3-319-26690-9

  • eBook Packages: Computer Science, Computer Science (R0)
