Abstract
In this paper, we propose FD_Mine, an efficient rule discovery algorithm for mining functional dependencies (FDs) from data. By exploiting Armstrong’s Axioms for functional dependencies, we identify equivalences among attributes, which can be used to reduce both the size of the dataset and the number of FDs to be checked. We first describe four pruning rules that reduce the size of the search space; in particular, the search skips FDs that are logically implied by already discovered FDs. We then present the FD_Mine algorithm, which incorporates the four pruning rules into the mining process. We prove the correctness of FD_Mine, that is, we show that the pruning does not lead to the loss of useful information. Finally, we report the results of a series of experiments, which show that the proposed algorithm is effective on 15 UCI datasets and synthetic data.
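To make the notions of a functional dependency and of attribute equivalence concrete, the following minimal sketch checks whether an FD holds in a small relation and whether two attribute sets determine each other. It is an illustration only, not the FD_Mine algorithm or its pruning rules; the function names `holds` and `equivalent` and the sample rows are hypothetical and do not come from the paper.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts (one per tuple); lhs and rhs are lists of
    attribute names. The FD holds when no two rows agree on lhs but
    differ on rhs.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # The first row with this lhs value fixes the expected rhs value.
        if seen.setdefault(key, val) != val:
            return False
    return True


def equivalent(rows, x, y):
    """Return True if x -> y and y -> x both hold (x and y are equivalent)."""
    return holds(rows, x, y) and holds(rows, y, x)


# Hypothetical sample relation over attributes A, B, C.
rows = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 2, "B": "y", "C": 10},
    {"A": 1, "B": "x", "C": 20},
    {"A": 3, "B": "z", "C": 20},
]

print(holds(rows, ["A"], ["B"]))       # A -> B
print(holds(rows, ["A"], ["C"]))       # A -> C
print(equivalent(rows, ["A"], ["B"]))  # A <-> B
```

In this sample, A and B determine each other, so any dependency with B on the left-hand side could be derived from the corresponding dependency with A. This is the kind of redundancy that the abstract says equivalences among attributes help avoid checking.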
Additional information
Responsible editor: M. J. Zaki.
Cite this article
Yao, H., Hamilton, H.J. Mining functional dependencies from data. Data Min Knowl Disc 16, 197–219 (2008). https://doi.org/10.1007/s10618-007-0083-9
DOI: https://doi.org/10.1007/s10618-007-0083-9