Using association rules to mine for strong approximate dependencies

Data Mining and Knowledge Discovery

Abstract

In this paper we address the problem of mining approximate dependencies (ADs) in relational databases. We define an AD in terms of association rules, using suitable definitions of the concepts of item and transaction. This definition allows us to measure both the accuracy and the support of an AD. We interpret the new measures in terms of the complexity of the theory (set of rules) that describes the dependence, and we use this interpretation to compare them with existing measures. We also introduce a methodology for adapting existing association rule mining algorithms to the task of discovering ADs. The adapted algorithms obtain the set of ADs that hold in a relation with accuracy and support greater than user-defined thresholds. Our experiments show that the approach performs reasonably well on large databases of real-world data.
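The abstract does not spell out how items and transactions are defined, so the following is only a minimal sketch of one possible reading, not the authors' algorithm. It assumes that each transaction is an ordered pair of tuples from the relation and that a pair "contains" the item for an attribute set when both tuples agree on those attributes. Under that assumption, support and confidence of the rule X → Y over pairs give a support and a first accuracy value for the AD; the certainty factor computed here is one common accuracy measure for association rules and is likewise an assumption. The names `agree` and `ad_measures` and the toy relation are hypothetical.

```python
from itertools import product


def agree(t1, t2, attrs):
    """True if both tuples take the same value on every attribute in attrs."""
    return all(t1[a] == t2[a] for a in attrs)


def ad_measures(rows, lhs, rhs):
    """Support, confidence and certainty factor of the AD lhs -> rhs.

    Assumption (not stated in the abstract): transactions are all ordered
    pairs of tuples, and a pair supports an attribute set when the two
    tuples agree on it.
    """
    if not rows:
        raise ValueError("relation must contain at least one tuple")

    pairs = list(product(rows, repeat=2))  # includes (t, t) pairs
    n = len(pairs)

    n_lhs = sum(agree(s, t, lhs) for s, t in pairs)
    n_rhs = sum(agree(s, t, rhs) for s, t in pairs)
    n_both = sum(agree(s, t, lhs) and agree(s, t, rhs) for s, t in pairs)

    support = n_both / n
    confidence = n_both / n_lhs  # n_lhs >= len(rows) > 0 thanks to (t, t) pairs
    supp_rhs = n_rhs / n

    # Certainty factor: how much knowing the antecedent changes belief
    # in the consequent, relative to the consequent's own support.
    if confidence > supp_rhs:
        cf = (confidence - supp_rhs) / (1 - supp_rhs)
    elif confidence < supp_rhs:
        cf = (confidence - supp_rhs) / supp_rhs
    else:
        cf = 0.0
    return support, confidence, cf


# Made-up toy relation, for illustration only.
rows = [
    {"city": "Granada", "zip": "18001"},
    {"city": "Granada", "zip": "18001"},
    {"city": "Granada", "zip": "18002"},
    {"city": "Madrid", "zip": "28001"},
]

print(ad_measures(rows, ["zip"], ["city"]))  # → (0.375, 1.0, 1.0)
print(ad_measures(rows, ["city"], ["zip"]))
```

In this toy relation `zip` determines `city` exactly, so the first AD reaches confidence 1.0, while `city` only approximately determines `zip` (confidence 0.6). Filtering candidate ADs by user-defined thresholds on these values would then mirror the thresholding step described in the abstract. Enumerating all pairs is quadratic in the number of tuples and is meant only to make the definitions concrete.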

Author information

Corresponding author

Correspondence to Daniel Sánchez.

Additional information

Responsible editor: M. J. Zaki.

About this article

Cite this article

Sánchez, D., Serrano, J.M., Blanco, I. et al. Using association rules to mine for strong approximate dependencies. Data Min Knowl Disc 16, 313–348 (2008). https://doi.org/10.1007/s10618-008-0092-3

  • DOI: https://doi.org/10.1007/s10618-008-0092-3
