Conditional Differential Dependencies (CDDs)

  • Conference paper
Advances in Databases and Information Systems (ADBIS 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9282)

Abstract

Differential dependencies (DDs) are a recently proposed class of data dependencies that capture relationships among data values. Like classical functional dependencies (FDs), DDs are defined to hold over entire relation instances. This paper proposes an extension of DD theory, called conditional DDs (CDDs), that hold over subsets of relations, analogous to the relaxation of FDs to conditional FDs (CFDs) [4] and conditional FDs with predicates (CFDPs) [6]. We present the formal definitions, the consistency and implication analysis, and a set of axioms for inferring CDDs. We also study the discovery problem for CDDs and present an algorithm for mining a minimal cover set \(\varSigma _c\) of constant CDDs from a given relation instance. In addition, we propose an interestingness measure for ranking discovered CDDs and reducing the size \(|\varSigma _c|\) of \(\varSigma _c\). We demonstrate the efficiency, effectiveness and scalability of the discovery algorithm through experiments on both real and synthetic datasets.
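To make the difference between a dependency that must hold over a whole relation and one that holds only over a subset concrete, the following is a minimal sketch and not the paper's algorithm. It assumes numeric attributes, absolute difference as the distance metric, distance constraints given as closed intervals, and a condition given as constant equality on attributes. The function names, attribute names (`group`, `x`, `y`) and the toy rows are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def within(distance, interval):
    """True if a distance falls inside a closed interval (low, high)."""
    low, high = interval
    return low <= distance <= high


def dd_holds(rows, lhs, rhs):
    """Check a DD-style constraint over all tuple pairs in rows.

    lhs and rhs map attribute name -> (min_distance, max_distance).
    The constraint holds if every pair whose distances satisfy lhs
    also has distances satisfying rhs. Distance is assumed to be the
    absolute difference of numeric values.
    """
    for r1, r2 in combinations(rows, 2):
        lhs_ok = all(within(abs(r1[a] - r2[a]), w) for a, w in lhs.items())
        if lhs_ok:
            rhs_ok = all(within(abs(r1[a] - r2[a]), w) for a, w in rhs.items())
            if not rhs_ok:
                return False
    return True


def cdd_holds(rows, condition, lhs, rhs):
    """Check the same constraint only on tuples matching constant conditions.

    condition maps attribute name -> required constant value (an assumed,
    simplified form of a condition).
    """
    subset = [r for r in rows if all(r[a] == v for a, v in condition.items())]
    return dd_holds(subset, lhs, rhs)


# Hypothetical toy relation, not data from the paper.
rows = [
    {"group": "a", "x": 1.0, "y": 10.0},
    {"group": "a", "x": 1.2, "y": 10.5},
    {"group": "b", "x": 1.1, "y": 20.0},
    {"group": "b", "x": 1.3, "y": 20.4},
]
lhs = {"x": (0.0, 0.5)}  # tuples close on x ...
rhs = {"y": (0.0, 1.0)}  # ... should also be close on y

holds_everywhere = dd_holds(rows, lhs, rhs)
holds_in_group_a = cdd_holds(rows, {"group": "a"}, lhs, rhs)
holds_in_group_b = cdd_holds(rows, {"group": "b"}, lhs, rhs)
```

In this toy relation the constraint fails over the whole instance, because tuples from different groups are close on `x` but far apart on `y`, yet it holds within each group separately. That is the kind of regularity a conditional dependency is meant to express.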

Notes

  1. PW is petal width; SL is sepal length; CL is Iris class.

  2. We use \(A\langle \mathtt {a}\rangle \) for a constant CS of A, and A[w] for a DF of A.

  3. \(w_a=[x,y]\), where x and y are the minimum and maximum distances of values in \(\varPsi _A\), respectively.

  4. Similar to the dependent quality measure in [14].

References

  1. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: 20th International Conference on VLDB, pp. 487–499 (1994)

  2. Armstrong, W.W.: Dependency structures of data base relationships. In: World Computer Congress - IFIP, pp. 580–583 (1974)

  3. Bache, K., Lichman, M.: UCI Machine Learning repository (2013). http://archive.ics.uci.edu/ml

  4. Bohannon, P., Fan, W., Geerts, F., Jia, X., Kementsietsidis, A.: Conditional functional dependencies for data cleaning. In: 23rd ICDE, pp. 746–755 (2007)

  5. Bravo, L., Fan, W., Ma, S.: Extending dependencies with conditions. In: 33rd International Conference on VLDB, pp. 243–254 (2007)

  6. Chen, W., Fan, W., Ma, S.: Analyses and validation of conditional dependencies with built-in predicates. In: Bhowmick, S.S., Küng, J., Wagner, R. (eds.) DEXA 2009. LNCS, vol. 5690, pp. 576–591. Springer, Heidelberg (2009)

  7. Fan, W., Geerts, F.: Foundations of Data Quality Management. Synth. Lect. Data Manage. 4(5), 1–127 (2012). doi:10.2200/S00439ED1V01Y201207DTM030

  8. Kwashie, S., Liu, J., Li, J., Ye, F.: Mining differential dependencies: a subspace clustering approach. In: Wang, H., Sharaf, M.A. (eds.) ADC 2014. LNCS, vol. 8506, pp. 50–61. Springer, Heidelberg (2014)

  9. Li, J., Liu, G., Wong, L.: Mining statistically important equivalence classes and delta-discriminative emerging patterns. In: 13th ACM SIGKDD, pp. 430–439 (2007)

  10. Liu, G., Li, J., Sim, K., Wong, L.: Distance based subspace clustering with flexible dimension partitioning. In: 23rd ICDE, pp. 1250–1254 (2007)

  11. Liu, J., Kwashie, S., Li, J., Ye, F., Vincent, M.W.: Discovery of approximate differential dependencies. CoRR, abs/1309.3733 (2013)

  12. Simpson, E.H.: The interpretation of interaction in contingency tables. J. Roy. Stat. Soc. Ser. B (Stat. Meth.) 13(2), 238–241 (1951)

  13. Song, S., Chen, L.: Differential dependencies: reasoning and discovery. ACM Trans. Database Syst. 36(3), 16:1–16:41 (2011)

  14. Song, S., Chen, L., Cheng, H.: Parameter-free determination of distance thresholds for metric distance constraints. In: 28th ICDE, pp. 846–857 (2012)

  15. Uno, T., Kiyomi, M., Arimura, H.: LCM ver.3: collaboration of array, bitmap and prefix tree for frequent itemset mining. In: 1st International Workshop on Open Source Data Mining, pp. 77–86 (2005)

Acknowledgement

This work is partially supported by NSFC 61472166.

Corresponding author

Correspondence to Selasi Kwashie.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Kwashie, S., Liu, J., Li, J., Ye, F. (2015). Conditional Differential Dependencies (CDDs). In: Tadeusz, M., Valduriez, P., Bellatreche, L. (eds) Advances in Databases and Information Systems. ADBIS 2015. Lecture Notes in Computer Science, vol 9282. Springer, Cham. https://doi.org/10.1007/978-3-319-23135-8_1

  • DOI: https://doi.org/10.1007/978-3-319-23135-8_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23134-1

  • Online ISBN: 978-3-319-23135-8

  • eBook Packages: Computer Science, Computer Science (R0)
