Method for the Assessment of Semantic Accuracy Using Rules Identified by Conditional Functional Dependencies

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1057)

Abstract

Data is a central resource of organizations, which makes data quality essential for their intellectual growth. Quality is a multifaceted concept and, in general, refers to fitness for use. The foundation of quality evaluation is therefore a set of quality rules derived from business criteria. However, it may be impossible to specify these rules manually. Conditional Functional Dependencies (CFDs) make it possible to identify context-dependent quality rules automatically. This paper presents a method for assessing data quality that uses CFDs to extract quality rules and identify inconsistencies. The method evaluates the quality of the database in the semantic accuracy dimension. It combines the knowledge discovery process with data quality assessment and lists the activities that lead to the quantification of semantic accuracy. An instance of the method was demonstrated by applying it to air quality monitoring data. The evaluation showed that the CFD rules were able to reflect some atmospheric phenomena, yielding interesting context-dependent rules. The patterns found in the transactions, which may be unknown to users, can be used as input for the evaluation and monitoring of data quality.
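As a rough illustration of the idea in the abstract, the sketch below checks rows against constant CFD-style rules and reports the share of rows that violate none of them. It is not the authors' implementation: the column names, the sample rows, the single rule, and the ratio used as a semantic accuracy score are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]
# A constant CFD-style rule: (condition pattern, expected consequent values).
Rule = Tuple[Dict[str, str], Dict[str, str]]


def violates(row: Row, condition: Dict[str, str], consequent: Dict[str, str]) -> bool:
    """Return True when the row matches the condition but not the consequent."""
    matches = all(row.get(attr) == value for attr, value in condition.items())
    return matches and any(row.get(attr) != value for attr, value in consequent.items())


def semantic_accuracy(rows: List[Row], rules: List[Rule]) -> float:
    """Share of rows that violate no rule (assumed metric, for illustration only)."""
    if not rows:
        return 1.0
    inconsistent = sum(
        1 for row in rows if any(violates(row, cond, cons) for cond, cons in rules)
    )
    return 1 - inconsistent / len(rows)


# Hypothetical air quality readings; attributes and values are invented.
rows: List[Row] = [
    {"station": "A", "wind": "calm", "pm10_level": "high"},
    {"station": "A", "wind": "calm", "pm10_level": "low"},   # breaks the rule below
    {"station": "A", "wind": "strong", "pm10_level": "low"},
    {"station": "B", "wind": "calm", "pm10_level": "low"},   # rule does not apply
]

# Hypothetical rule: at station A with calm wind, pm10_level is expected to be "high".
rules: List[Rule] = [({"station": "A", "wind": "calm"}, {"pm10_level": "high"})]

print(semantic_accuracy(rows, rules))
```

The rule only constrains rows that match its condition, which is what makes it context dependent: the reading from station B is left alone even though its values differ from the consequent.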

Corresponding authors

Correspondence to Vanusa S. Santana or Fábio S. Lopes.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Santana, V.S., Lopes, F.S. (2019). Method for the Assessment of Semantic Accuracy Using Rules Identified by Conditional Functional Dependencies. In: Garoufallou, E., Fallucchi, F., William De Luca, E. (eds) Metadata and Semantic Research. MTSR 2019. Communications in Computer and Information Science, vol 1057. Springer, Cham. https://doi.org/10.1007/978-3-030-36599-8_25

  • DOI: https://doi.org/10.1007/978-3-030-36599-8_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-36598-1

  • Online ISBN: 978-3-030-36599-8

  • eBook Packages: Computer Science, Computer Science (R0)
