Abstract
Discovering functional dependencies (FDs) has received a great deal of attention in the database research community. However, because the problem is exponential in the number of attributes, existing approaches can be applied only to small, centralized datasets. Discovering FDs from big data is challenging, especially when the data is distributed. We present DFDD, a new algorithm that discovers all functional dependencies in parallel in vertically distributed big data. It traverses the attribute lattice breadth-first and applies efficient pruning. We verify experimentally that our approach can process distributed big datasets and that it scales with the number of cluster nodes and the size of the datasets.
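To make the idea of a breadth-first traversal of the attribute lattice with pruning concrete, the following is a minimal sketch on a small in-memory relation. It is not the DFDD algorithm: it is centralized, checks each candidate naively, and ignores FDs with an empty left-hand side. The names `holds` and `discover_fds` and the sample data are assumptions made for illustration only.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        # The first row with this LHS value fixes the RHS value;
        # any later row that disagrees violates the dependency.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def discover_fds(rows, attributes):
    """Level-wise (breadth-first) search for minimal FDs X -> A.

    Hypothetical helper for illustration; LHS sets are visited by
    increasing size, so every FD reported is minimal.
    """
    found = []  # list of (lhs_tuple, rhs) pairs
    for size in range(1, len(attributes)):
        for lhs in combinations(attributes, size):
            for rhs in attributes:
                if rhs in lhs:
                    continue  # skip trivial dependencies
                # Pruning: if a subset of lhs already determines rhs,
                # this candidate cannot be minimal, so skip the check.
                if any(r == rhs and set(l) <= set(lhs) for l, r in found):
                    continue
                if holds(rows, lhs, rhs):
                    found.append((lhs, rhs))
    return found


# Toy relation (made-up data): no single attribute determines another,
# but every pair of attributes determines the third.
rows = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 2, "C": 2},
    {"A": 2, "B": 1, "C": 2},
    {"A": 2, "B": 2, "C": 1},
]
fds = discover_fds(rows, ["A", "B", "C"])
```

The pruning step is what keeps a level-wise search tractable in practice: once a dependency is found at one level, none of its supersets needs to be checked for the same right-hand side.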
This work was supported in part by the National Basic Research Program 973 of China (No. 2012CB316203), the Natural Science Foundation of China (Nos. 61033007, 61272121, 61332006, 61472321), the National High Technology Research and Development Program 863 of China (No. 2012AA011004), and the Basic Research Fund of Northwestern Polytechnical University (Nos. 3102014JSJ0005, 3102014JSJ0013).
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Li, W., Li, Z., Chen, Q., Jiang, T., Liu, H. (2015). Discovering Functional Dependencies in Vertically Distributed Big Data. In: Wang, J., et al. Web Information Systems Engineering – WISE 2015. WISE 2015. Lecture Notes in Computer Science, vol 9419. Springer, Cham. https://doi.org/10.1007/978-3-319-26187-4_15
DOI: https://doi.org/10.1007/978-3-319-26187-4_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26186-7
Online ISBN: 978-3-319-26187-4
eBook Packages: Computer Science, Computer Science (R0)