Conceptual Clustering of Multi-Relational Data

Fonseca, Nuno A.; Santos Costa, Vítor; Camacho, Rui

doi:10.1007/978-3-642-31951-8_16

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7207)

Included in the following conference series:

International Conference on Inductive Logic Programming

Abstract

“Traditional” clustering, in a broad sense, aims to organize objects into groups (clusters) whose members are “similar” to one another and “dissimilar” to objects in other groups. In conceptual clustering, by contrast, cluster formation is driven by the underlying structure of the data together with the description language available to the learner, which yields intelligible descriptions of the clusters and makes them easier to interpret.

We present a novel conceptual clustering system for multi-relational data, based on the popular k-medoids algorithm. Although clustering is generally not straightforward to evaluate, experiments on several applications show promising results: clusters generated without class information agree very well with the true class labels of their members. Moreover, it was possible to obtain intelligible and meaningful descriptions of the clusters.
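
As an illustration of the general idea only, the following is a minimal sketch of plain k-medoids over a precomputed distance matrix, followed by a simple cluster description. It is not the system presented in the chapter. Everything in it is an assumption made for the example: objects are represented as sets of hypothetical boolean features, dissimilarity is the Tanimoto (Jaccard) distance, medoids are initialized farthest-first, and a cluster is "described" by the features all of its members share.

```python
from typing import List, Sequence, Set, Tuple


def tanimoto_distance(a: Set[str], b: Set[str]) -> float:
    """Tanimoto (Jaccard) distance between two feature sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def assign(dist: Sequence[Sequence[float]], medoids: List[int]) -> List[int]:
    """Assign every object to the cluster of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist: Sequence[Sequence[float]], k: int,
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Plain alternating k-medoids; returns (medoid indices, labels)."""
    n = len(dist)
    # Farthest-first initialization (an illustrative choice, assumes k <= n).
    medoids = [0]
    while len(medoids) < k:
        candidates = (i for i in range(n) if i not in medoids)
        medoids.append(max(candidates,
                           key=lambda i: min(dist[i][m] for m in medoids)))
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster emptied
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance
            # to the other members of its cluster.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


def describe(objects: Sequence[Set[str]], labels: List[int],
             cluster: int) -> Set[str]:
    """Features shared by all members of a cluster (toy 'description')."""
    members = [objects[i] for i in range(len(objects)) if labels[i] == cluster]
    return set.intersection(*members) if members else set()


# Hypothetical toy data: each object is the set of features that hold for it.
objects = [
    {"has_ring", "has_nitro"},
    {"has_ring", "has_nitro", "has_cl"},
    {"has_ring"},
    {"is_acid", "has_oh"},
    {"is_acid", "has_oh", "has_cl"},
    {"is_acid"},
]
dist = [[tanimoto_distance(a, b) for b in objects] for a in objects]
medoids, labels = k_medoids(dist, k=2)
descriptions = [describe(objects, labels, c) for c in range(2)]
```

On this toy data the sketch groups the first three objects together and the last three together, and each cluster is described by the single feature its members have in common. A real multi-relational setting would need a distance defined over relational descriptions rather than flat feature sets.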

Author information

Authors and Affiliations

CRACS-INESC Porto LA, Universidade do Porto, Rua do Campo Alegre 1021/1055, 4169-007, Porto, Portugal
Nuno A. Fonseca & Vítor Santos Costa
EMBL Outstation, The European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Nuno A. Fonseca
DCC-FCUP, Universidade do Porto, Rua do Campo Alegre 1021/1055, 4169-007, Porto, Portugal
Vítor Santos Costa
LIAAD-INESC Porto LA & DEI-FEUP, Universidade do Porto, Rua Dr. Roberto Frias s/n, 4200-465, Porto, Portugal
Rui Camacho

Editor information

Editors and Affiliations

Department of Computer Science, Imperial College London, 180 Queen’s Gate, SW7 2AZ, London, UK
Stephen H. Muggleton & Alireza Tamaddoni-Nezhad
Dipartimento di Informatica, Università degli Studi di Bari “Aldo Moro”, Via E. Orabona, 4, 70125, Bari, Italy
Francesca A. Lisi

Cite this paper

Fonseca, N.A., Santos Costa, V., Camacho, R. (2012). Conceptual Clustering of Multi-Relational Data. In: Muggleton, S.H., Tamaddoni-Nezhad, A., Lisi, F.A. (eds) Inductive Logic Programming. ILP 2011. Lecture Notes in Computer Science, vol 7207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31951-8_16

DOI: https://doi.org/10.1007/978-3-642-31951-8_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31950-1
Online ISBN: 978-3-642-31951-8
eBook Packages: Computer Science (R0)