Abstract
In this paper we deal with the problem of mining for approximate dependencies (ADs) in relational databases. We introduce a definition of AD based on the concept of association rule, by means of suitable definitions of the concepts of item and transaction. This definition allows us to measure both the accuracy and the support of an AD. We provide an interpretation of the new measures based on the complexity of the theory (set of rules) that describes the dependence, and we employ this interpretation to compare the new measures with existing ones. We also introduce a methodology for adapting existing association rule mining algorithms to the task of discovering ADs. The adapted algorithms obtain the set of ADs that hold in a relation with accuracy and support greater than user-defined thresholds. Our experiments show that the approach performs reasonably well over large databases with real-world data.
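The abstract does not spell out how items and transactions are defined. As a minimal sketch under one assumed reading (each transaction is an unordered pair of distinct tuples, and an attribute counts as an item of that transaction when the two tuples agree on it), the support and confidence of a candidate dependency could be computed as shown here. The function names, the sample relation, and the use of plain rule confidence as a stand-in for the accuracy measure are illustrative assumptions, not the authors' definitions.

```python
from itertools import combinations


def agreement_transactions(rows, attributes):
    """Build one transaction per unordered pair of distinct rows.

    Assumption: a transaction holds the attributes on which both rows agree.
    """
    return [
        frozenset(a for a in attributes if r[a] == s[a])
        for r, s in combinations(rows, 2)
    ]


def support_and_confidence(transactions, lhs, rhs):
    """Support and confidence of the rule lhs -> rhs over the transactions."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    n = len(transactions)
    n_lhs = sum(1 for t in transactions if lhs <= t)
    n_both = sum(1 for t in transactions if (lhs | rhs) <= t)
    support = n_both / n if n else 0.0
    confidence = n_both / n_lhs if n_lhs else 0.0
    return support, confidence


# Hypothetical relation: "dept" almost determines "floor" (last row is an exception).
rows = [
    {"dept": "A", "floor": 1},
    {"dept": "A", "floor": 1},
    {"dept": "B", "floor": 2},
    {"dept": "A", "floor": 2},
]

transactions = agreement_transactions(rows, ["dept", "floor"])
support, confidence = support_and_confidence(transactions, ["dept"], ["floor"])
```

Under this reading, a candidate dependency would be reported only if both values exceed the user-defined thresholds mentioned in the abstract, so an ordinary frequent-itemset miner run over the pair transactions could in principle enumerate the candidates.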
Additional information
Responsible editor: M. J. Zaki.
Cite this article
Sánchez, D., Serrano, J.M., Blanco, I. et al. Using association rules to mine for strong approximate dependencies. Data Min Knowl Disc 16, 313–348 (2008). https://doi.org/10.1007/s10618-008-0092-3
DOI: https://doi.org/10.1007/s10618-008-0092-3