Automated Detection and Monitoring of Advanced Data Quality Rules

  • Conference paper
Database and Expert Systems Applications (DEXA 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11706)

Abstract

Business decisions today rely heavily on data in data warehouse (DWH) systems, so data quality (DQ) in DWHs is a highly relevant topic. Consequently, sophisticated yet easy-to-use solutions for monitoring and ensuring high data quality are needed. This paper is based on the IQM4HD project, in which a prototype of an automated data quality monitoring system has been designed and implemented. Specifically, we focus on expressing advanced data quality rules, such as checking whether data conforms to a certain time series or whether data deviates significantly in any of the dimensions within a data cube. We show how such rules can be expressed in our domain-specific language (DSL) RADAR, which was introduced in [10]. Since manual specification of such rules tends to be complex, it is particularly important to support the DQ manager in detecting and creating potential rules by profiling historic data. We therefore also explain the data profiling component of our prototype and illustrate how advanced rules can be semi-automatically detected and suggested to the DQ manager.
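
To make the idea of a time series conformity rule more concrete, the following is a minimal Python sketch. It is not RADAR syntax and not the method implemented in the IQM4HD prototype; it assumes, purely for illustration, a seasonal naive forecast (the value one period earlier) and a tolerance of k standard deviations of the historic forecast residuals. The function name `conforms_to_series` and the parameters `period` and `k` are hypothetical.

```python
from statistics import stdev


def conforms_to_series(history, new_value, period=7, k=3.0):
    """Check a new observation against a seasonal naive forecast.

    Hypothetical helper: the forecast for the next point is the value
    observed one period earlier, and the tolerance is k times the sample
    standard deviation of the historic forecast residuals.
    """
    if len(history) < period + 2:
        raise ValueError("need at least period + 2 historic values")

    # Residuals of the seasonal naive forecast over the historic data.
    residuals = [history[i] - history[i - period]
                 for i in range(period, len(history))]
    sigma = stdev(residuals)

    # Forecast for the next point: the value one full period ago.
    forecast = history[-period]
    return abs(new_value - forecast) <= k * sigma


# Four weeks of made-up daily counts with a weekly pattern.
base = [100, 120, 130, 125, 140, 200, 180]
history = [v + (week % 2) for week in range(4) for v in base]

print(conforms_to_series(history, 100))  # close to the value a week earlier
print(conforms_to_series(history, 150))  # far from the value a week earlier
```

In a monitoring setting, a check of this kind would be evaluated for each newly loaded value, and `period` and `k` are the sort of parameters that profiling of historic data could suggest to the DQ manager.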

The project IQM4HD has been funded by the German Federal Ministry of Education and Research under grant no. 01IS15053A. We would also like to thank our partners SHS Viveon/mVise for implementing the prototype and CTS Eventim for providing important requirements and reviewing practical applicability of the prototype and concepts.

Notes

  1. http://iqm4hd.wp.hs-hannover.de/english.html.

References

  1. Abedjan, Z., Golab, L., Naumann, F.: Profiling relational data: a survey. VLDB J. 24(4), 557–581 (2015). https://doi.org/10.1007/s00778-015-0389-y

  2. Aggarwal, C.C.: Outlier Analysis, 1st edn. Springer, New York (2013). https://doi.org/10.1007/978-1-4614-6396-2

  3. Bohannon, P., Fan, W., Geerts, F., Jia, X., Kementsietsidis, A.: Conditional functional dependencies for data cleaning. In: 2007 IEEE 23rd International Conference on Data Engineering, pp. 746–755. IEEE (2007). http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4221723

  4. Caruccio, L., Deufemia, V., Polese, G.: Relaxed functional dependencies - a survey of approaches. IEEE Trans. Knowl. Data Eng. 28(1), 147–165 (2016)

  5. Chiang, F., Miller, R.J.: Discovering data quality rules. Proc. VLDB Endow. 1(1), 1166–1177 (2008). http://dl.acm.org/citation.cfm?id=1453980

  6. Gupta, M., Gao, J., Aggarwal, C., Han, J.: Outlier detection for temporal data: a survey. IEEE Trans. Knowl. Data Eng. 26(9), 2250–2267 (2014). https://doi.org/10.1109/TKDE.2013.184

  7. Heine, F., Kleiner, C., Koschel, A., Westermayer, J.: The data checking engine: complex rules for data quality monitoring (2014). http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.682.7950&rep=rep1&type=pdf

  8. Hyndman, R.J., Athanasopoulos, G.: Forecasting: Principles and Practice. OTexts, Melbourne (2018)

  9. Li, X., Han, J.: Mining approximate top-k subspace anomalies in multi-dimensional time-series data. In: Proceedings of the 33rd International Conference on Very Large Data Bases, pp. 447–458. VLDB Endowment (2007)

  10. Oelsner, T., Heine, F., Kleiner, C.: IQM4HD concepts. Technical report, University of Applied Sciences and Arts Hannover, Germany (2018). http://iqm4hd.wp.hs-hannover.de/ConceptsIQM4HD.pdf

  11. Olson, J.E.: Data Quality: The Accuracy Dimension. Morgan Kaufmann, San Francisco (2003)

  12. Schelter, S., Lange, D., Schmidt, P., Celikel, M., Biessmann, F., Grafberger, A.: Automating large-scale data quality verification. Proc. VLDB Endow. 11(12), 1781–1794 (2018). https://doi.org/10.14778/3229863.3229867. http://www.vldb.org/pvldb/vol11/p1781-schelter.pdf

Author information

Correspondence to Carsten Kleiner.

Copyright information

© 2019 Springer Nature Switzerland AG

Cite this paper

Heine, F., Kleiner, C., Oelsner, T. (2019). Automated Detection and Monitoring of Advanced Data Quality Rules. In: Hartmann, S., Küng, J., Chakravarthy, S., Anderst-Kotsis, G., Tjoa, A., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2019. Lecture Notes in Computer Science, vol 11706. Springer, Cham. https://doi.org/10.1007/978-3-030-27615-7_18

  • DOI: https://doi.org/10.1007/978-3-030-27615-7_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27614-0

  • Online ISBN: 978-3-030-27615-7

  • eBook Packages: Computer Science, Computer Science (R0)
