Conceptual Clustering of Multi-Relational Data

  • Conference paper
Inductive Logic Programming (ILP 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7207)

Abstract

“Traditional” clustering, in a broad sense, aims to organize objects into groups (clusters) whose members are “similar” to one another and “dissimilar” to objects in other groups. In conceptual clustering, by contrast, cluster formation is driven by the underlying structure of the data together with the description language available to the learner. The resulting clusters therefore come with intelligible descriptions that facilitate their interpretation.

We present a novel conceptual clustering system for multi-relational data, based on the popular k-medoids algorithm. Although clustering is generally not straightforward to evaluate, experiments on several applications give promising results: clusters generated without class information agree very well with the true class labels of their members. Moreover, it was possible to obtain intelligible and meaningful descriptions of the clusters.
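
For readers unfamiliar with k-medoids, the following is a minimal textbook sketch of the alternating variant of the algorithm on a precomputed dissimilarity matrix. It is not the system described in the paper: this page does not give the relational distance the authors use, so the matrix `dist`, the function name `k_medoids`, and the caller-supplied starting medoids `init` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def k_medoids(dist: Sequence[Sequence[float]], init: Sequence[int],
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Generic alternating k-medoids on a precomputed distance matrix.

    dist[i][j] is the dissimilarity between objects i and j (assumed given).
    init holds the distinct object indices used as starting medoids.
    Returns (medoids, labels), where labels[i] is a position in medoids.
    """
    n = len(dist)
    medoids = list(init)
    k = len(medoids)
    if k == 0 or len(set(medoids)) != k:
        raise ValueError("init must contain distinct object indices")

    def assign(current: List[int]) -> List[int]:
        # Each object joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][current[c]])
                for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Empty cluster (possible with zero distances): keep old medoid.
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance
            # to all other members of the same cluster.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


# Toy usage: six points on a line, distance = absolute difference.
points = [0, 1, 2, 10, 11, 12]
toy_dist = [[abs(a - b) for b in points] for a in points]
toy_medoids, toy_labels = k_medoids(toy_dist, init=[0, 1])
```

Because a medoid is always one of the objects themselves, the procedure needs only pairwise dissimilarities, which is why it can be applied to structured objects for which no meaningful average exists.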

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fonseca, N.A., Santos Costa, V., Camacho, R. (2012). Conceptual Clustering of Multi-Relational Data. In: Muggleton, S.H., Tamaddoni-Nezhad, A., Lisi, F.A. (eds) Inductive Logic Programming. ILP 2011. Lecture Notes in Computer Science, vol 7207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31951-8_16

  • DOI: https://doi.org/10.1007/978-3-642-31951-8_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31950-1

  • Online ISBN: 978-3-642-31951-8

  • eBook Packages: Computer Science, Computer Science (R0)
