Abstract
Work on first-order clustering has primarily focused on conceptual clustering, i.e., forming clusters with symbolic generalizations in the given representation language. By contrast, for propositional representations, experience has shown that simple algorithms based exclusively on distance measures can often outperform their concept-based counterparts. In this paper, we therefore build on recent advances in first-order distance metrics and present RDBC, a bottom-up agglomerative clustering algorithm for first-order representations that relies on distance information only and features a novel parameter-free pruning measure for selecting the final clustering from the cluster tree. Empirically, the algorithm produces good clusterings on the mutagenesis domain that, when used for subsequent prediction tasks, improve on previous clustering results and approach the accuracies of dedicated predictive learners.
Partially supported by ESPRIT IV Long Term Research Project ILP II (No. 20237).
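The general idea of bottom-up agglomerative clustering from distance information alone can be illustrated with a minimal sketch. This is not the RDBC implementation: the abstract does not specify the linkage criterion or the pruning measure, so average linkage is assumed here, a hypothetical target cluster count `k` stands in for the paper's parameter-free pruning, and the function names and the toy distance matrix are invented for illustration. The distances could come from any measure, including a first-order one.

```python
from itertools import combinations


def average_link(dist, a, b):
    """Mean pairwise distance between the members of clusters a and b."""
    return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))


def agglomerate(dist, k=1):
    """Repeatedly merge the two closest clusters until only k remain.

    dist is a symmetric matrix of pairwise distances between examples.
    Returns the remaining clusters and the merge history, which records
    the cluster tree from the leaves upward.
    Assumption: average linkage and the cut-off k are illustrative
    choices, not taken from the paper.
    """
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > k:
        # Pick the pair of clusters with the smallest average distance.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_link(dist, *pair))
        merges.append((a, b, average_link(dist, a, b)))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return clusters, merges


# Toy distance matrix with made-up values: examples 0 and 1 are close,
# as are examples 2 and 3.
D = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.85, 0.9],
    [0.9, 0.85, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
clusters, merges = agglomerate(D, k=2)
print(sorted(clusters))
```

Because only the distance matrix is consulted, the same loop applies unchanged whether the examples are attribute vectors or first-order descriptions. Selecting the final clustering from the merge history is where a pruning measure such as the one announced in the abstract would replace the fixed `k`.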
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kirsten, M., Wrobel, S. (1998). Relational distance-based clustering. In: Page, D. (ed.) Inductive Logic Programming. ILP 1998. Lecture Notes in Computer Science, vol 1446. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0027330
DOI: https://doi.org/10.1007/BFb0027330
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64738-6
Online ISBN: 978-3-540-69059-7
eBook Packages: Springer Book Archive