Abstract
Approximating a collection of patterns is a new and active area of research in data mining. The main motivation lies in two observations: the number of mined patterns is often too large to be useful to end users, and the user-defined input parameters of many data mining algorithms (e.g. the frequency threshold) are, most of the time, set almost arbitrarily.
In this setting, we apply the results given in the seminal paper [11] for frequent sets to the problem of approximating a set of approximate inclusion dependencies (INDs) with k inclusion dependencies. Using the fact that inclusion dependencies are “representable as sets”, we point out how the approximation schemes defined in [11] for frequent patterns also apply in our context. A heuristic solution is also proposed for this particular problem. Even if the quality of this approximation with respect to the best solution cannot be precisely characterized, an interaction property between INDs and functional dependencies (FDs) may be used to justify this heuristic.
Finally, some interesting perspectives of this work are pointed out, based on the results obtained so far.
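To make the idea of approximating a collection of patterns with k of them more concrete, here is a minimal sketch in Python. It is not the algorithm of this paper or of [11]; it is the generic greedy maximum-coverage strategy, which is known to achieve a (1 − 1/e) approximation of the best achievable coverage. The encoding is an assumption made for illustration: an IND R[A1..An] ⊆ S[B1..Bn] is modelled as the set of its attribute pairs (Ai, Bi), and an IND is taken to "cover" every IND of the collection whose pair set is a subset of its own. The function name `greedy_k_cover` and the type alias `Ind` are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Optional, Tuple

# Hypothetical encoding (an assumption, not taken from the paper):
# an IND R[A1..An] ⊆ S[B1..Bn] is the set of pairs {(A1, B1), ..., (An, Bn)}.
Ind = FrozenSet[Tuple[str, str]]


def greedy_k_cover(inds: Iterable[Ind], k: int) -> List[Ind]:
    """Greedily pick at most k INDs whose subsets cover as many INDs as possible."""
    collection = set(inds)
    uncovered = set(collection)
    chosen: List[Ind] = []
    # Fixed candidate order (largest first, then lexicographic) so that
    # ties are broken deterministically.
    candidates = sorted(collection, key=lambda s: (-len(s), sorted(s)))
    for _ in range(k):
        best: Optional[Ind] = None
        best_gain = 0
        for cand in candidates:
            # Gain = number of still-uncovered INDs that are subsets of cand.
            gain = sum(1 for x in uncovered if x <= cand)
            if gain > best_gain:
                best, best_gain = cand, gain
        if best is None:  # everything is already covered
            break
        chosen.append(best)
        uncovered = {x for x in uncovered if not x <= best}
    return chosen
```

Candidates are drawn from the collection itself, so the selected INDs never cover anything outside it. Allowing candidates outside the collection, which may introduce false positives, is a different trade-off that this sketch does not attempt.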
References
M. Casanova, R. Fagin, and C. Papadimitriou. Inclusion dependencies and their interaction with functional dependencies. Journal of Computer and System Sciences, 24(1):29–59, 1984.
M. A. Casanova, L. Tucherman, and A. L. Furtado. Enforcing inclusion dependencies and referential integrity. In F. Bancilhon and D. J. DeWitt, editors, International Conference on Very Large Data Bases (VLDB’88), Los Angeles, California, USA, pages 38–49. Morgan Kaufmann, 1988.
Q. Cheng, J. Gryz, F. Koo, T. Y. Cliff Leung, L. Liu, X. Qian, and B. Schiefer. Implementation of two semantic query optimization techniques in DB2 universal database. In M. P. Atkinson, M. E. Orlowska, P. Valduriez, S. B. Zdonik, and M. L. Brodie, editors, International Conference on Very Large Data Bases (VLDB’99), Edinburgh, Scotland, UK, pages 687–698. Morgan Kaufmann, 1999.
F. De Marchi, S. Lopes, and J.-M. Petit. Efficient algorithms for mining inclusion dependencies. In C. S. Jensen, K. G. Jeffery, J. Pokorný, S. Saltenis, E. Bertino, K. Böhm, and M. Jarke, editors, International Conference on Extending Database Technology (EDBT’02), Prague, Czech Republic, volume 2287 of Lecture Notes in Computer Science, pages 464–476. Springer, 2002.
F. De Marchi, S. Lopes, J.-M. Petit, and F. Toumani. Analysis of existing databases at the logical level: the DBA companion project. ACM SIGMOD Record, 32(1):47–52, 2003.
F. De Marchi and J.-M. Petit. Zigzag: a new algorithm for discovering large inclusion dependencies in relational databases. In International Conference on Data Mining (ICDM’03), Melbourne, Florida, USA, pages 27–34. IEEE Computer Society, 2003.
F. Flouvat, F. De Marchi, and J.-M. Petit. ABS: Adaptive borders search of frequent itemsets. In FIMI’04, 2004.
J. Han, J. Wang, Y. Lu, and P. Tzvetkov. Mining top-k frequent closed patterns without minimum support. In International Conference on Data Mining (ICDM 2002), Maebashi City, Japan, pages 211–218. IEEE Computer Society, 2002.
D. Hochbaum. Approximation algorithms for NP-hard problems. PWS Publishing Company, 1997.
M. Kantola, H. Mannila, K.-J. Räihä, and H. Siirtola. Discovering functional and inclusion dependencies in relational databases. International Journal of Intelligent Systems, 7:591–607, 1992.
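F. Afrati, A. Gionis, and H. Mannila. Approximating a collection of frequent sets. In W. Kim, R. Kohavi, J. Gehrke, and W. DuMouchel, editors, International Conference on Knowledge Discovery and Data Mining (KDD’04), Seattle, Washington, USA, pages 12–19. ACM, 2004.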
A. Koeller and E. A. Rundensteiner. Discovery of high-dimensional inclusion dependencies (poster). In Poster session of International Conference on Data Engineering (ICDE’03). IEEE Computer Society, 2003.
A. Koeller and E. A. Rundensteiner. Heuristic strategies for inclusion dependency discovery. In R. Meersman and Z. Tari, editors, CoopIS, DOA, and ODBASE, OTM Confederated International Conferences, Napa, Cyprus, Part II, volume 3291 of Lecture Notes in Computer Science, pages 891–908. Springer, 2004.
M. Levene and G. Loizou. A Guided Tour of Relational Databases and Beyond. Springer, 1999.
M. Levene and M. W. Vincent. Justification for inclusion dependency normal form. IEEE Transactions on Knowledge and Data Engineering, 12(2):281–291, 2000.
S. Lopes, F. De Marchi, and J.-M. Petit. DBA Companion: A tool for logical database tuning. In Demo session of International Conference on Data Engineering (ICDE’04), http://www.isima.fr/~demarchi/dbacomp/, 2004. IEEE Computer Society.
S. Lopes, J.-M. Petit, and F. Toumani. Discovering interesting inclusion dependencies: Application to logical database tuning. Information Systems, 17(1):1–19, 2002.
H. Mannila and K.-J. Räihä. Inclusion dependencies in database design. In International Conference on Data Engineering (ICDE’86), Los Angeles, California, USA, pages 713–718. IEEE Computer Society, 1986.
H. Mannila and K.-J. Räihä. The Design of Relational Databases. Addison-Wesley, second edition, 1994.
H. Mannila and H. Toivonen. Levelwise Search and Borders of Theories in Knowledge Discovery. Data Mining and Knowledge Discovery, 1(1):241–258, 1997.
R. J. Miller, M. A. Hernández, L. M. Haas, L. Yan, C. T. H. Ho, R. Fagin, and L. Popa. The Clio project: Managing heterogeneity. ACM SIGMOD Record, 30(1):78–83, 2001.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
De Marchi, F., Petit, JM. (2005). Approximating a Set of Approximate Inclusion Dependencies. In: Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds) Intelligent Information Processing and Web Mining. Advances in Soft Computing, vol 31. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-32392-9_76
DOI: https://doi.org/10.1007/3-540-32392-9_76
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25056-2
Online ISBN: 978-3-540-32392-1
eBook Packages: Engineering, Engineering (R0)