Abstract
Data mining can often be viewed as the task of computing a representation of a theory of a model or a database. In this paper we present a randomized algorithm that computes the representation of a theory in terms of the most specific sentences of that theory. In addition to randomization, the algorithm uses a generalization of the concept of hypergraph transversal. We apply the general algorithm to discovering maximal frequent sets in 0/1 data and to computing minimal keys in relations. We present some empirical results on the performance of these methods on real data. We also give complexity-theoretic evidence of the hardness of these problems.
Work supported by the Alexander von Humboldt-Stiftung and the Academy of Finland.
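To make the ideas in the abstract concrete, the following is a minimal Python sketch of how randomization and hypergraph transversals can be combined to find maximal frequent sets in 0/1 data. It is an illustration under stated assumptions, not the algorithm as specified in the paper: the function names, the brute-force transversal enumeration, and the toy data are all hypothetical, and the paper's generalized transversal notion is replaced here by ordinary minimal transversals of the complements of the maximal sets found so far.

```python
import random
from itertools import combinations


def support(itemset, rows):
    """Number of rows (each a set of items) that contain every item of itemset."""
    return sum(1 for row in rows if itemset <= row)


def random_maximal_frequent_set(rows, items, min_support, rng, start=frozenset()):
    """Extend a frequent set `start` with items in random order while it stays frequent.

    Frequency is monotone (subsets of frequent sets are frequent), so a single
    pass over the shuffled items yields a maximal frequent set.
    """
    current = set(start)
    order = [i for i in items if i not in current]
    rng.shuffle(order)
    for item in order:
        candidate = current | {item}
        if support(candidate, rows) >= min_support:
            current = candidate
    return frozenset(current)


def minimal_transversals(edges, vertices):
    """Brute force (illustration only): minimal vertex sets that hit every edge."""
    found = []
    for size in range(len(vertices) + 1):
        for combo in combinations(sorted(vertices), size):
            cand = frozenset(combo)
            if any(t <= cand for t in found):
                continue  # a smaller transversal is contained in cand
            if all(cand & e for e in edges):
                found.append(cand)
    return found


def all_maximal_frequent_sets(rows, items, min_support, seed=0):
    """Collect all maximal frequent sets (hypothetical driver loop)."""
    rng = random.Random(seed)
    items = frozenset(items)
    if support(frozenset(), rows) < min_support:
        return set()
    maximal = {random_maximal_frequent_set(rows, items, min_support, rng)}
    while True:
        # A set lies under no known maximal set iff it hits every complement.
        complements = [items - m for m in maximal]
        new = None
        for t in minimal_transversals(complements, items):
            if support(t, rows) >= min_support:
                new = random_maximal_frequent_set(rows, items, min_support, rng, start=t)
                break
        if new is None:
            return maximal  # every frequent set is already covered
        maximal.add(new)


# Toy 0/1 data written as sets of the items present in each row (made up).
rows = [frozenset("abc"), frozenset("ab"), frozenset("abd"),
        frozenset("cd"), frozenset("cd")]
result = all_maximal_frequent_sets(rows, "abcd", min_support=2)
print(sorted(sorted(m) for m in result))
```

The loop stops once no minimal transversal of the complements is frequent, because any frequent set not yet covered would have to contain such a transversal. The brute-force transversal step is exponential in the number of items and is only meant to show the relationship between the two concepts.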
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gunopulos, D., Mannila, H., Saluja, S. (1996). Discovering all most specific sentences by randomized algorithms extended abstract. In: Afrati, F., Kolaitis, P. (eds) Database Theory — ICDT '97. ICDT 1997. Lecture Notes in Computer Science, vol 1186. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62222-5_47
DOI: https://doi.org/10.1007/3-540-62222-5_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-62222-2
Online ISBN: 978-3-540-49682-3
eBook Packages: Springer Book Archive