Abstract
Concept discovery is a multi-relational data mining task for inducing definitions of a specific relation in terms of other relations in the data set. Such learning tasks usually have to deal with large search spaces and hence have efficiency and scalability issues. In this paper, we present a hybrid approach that combines association rule mining methods and graph-based approaches to cope with these issues. The proposed method inputs the data in relational format, converts it into a graph representation, and traverses the graph to find the concept descriptors. Graph traversal and pruning are guided based on association rule mining techniques. The proposed method distinguishes from the state-of-the art methods as it can work on n-ary relations, it uses path finding queries to extract concepts and can handle numeric values. Experimental results show that the method is superior to the state-of-the art methods in terms of accuracy and the coverage of the induced concept descriptors and the running time.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
- 2.
Although the running time of the proposed method and that of CRIS are not directly comparable, as the experiments are conducted on different computers, we provide them to provide intuition. In addition, these results are obtained from a recent study hence the configurations of computers are expected to be similar.
References
Džeroski, S., De Raedt, L.: Multi-relational data mining: the current frontiers. SIGKDD Explor. Newsl. 5(1), 100–101 (2003)
Doncescu, A., Waissman, J., Richard, G., Roux, G.: Characterization of bio-chemical signals by inductive logic programming. Knowl.-Based Syst. 15(1–2), 129–137 (2002)
Turcotte, M., Muggleton, S., Sternberg, M.J.E.: Generating protein three-dimensional fold signatures using inductive logic programming. Comput. and Chem. 26(1), 57–64 (2002)
Dzeroski, S., Jacobs, N., Molina, M., Moure, C., Muggleton, S., Laer, W.V.: Detecting traffic problems with ILP. In: Page, D.L. (ed.) ILP 1998. LNCS, vol. 1446, pp. 281–290. Springer, Heidelberg (1998)
Muggleton, S., De Raedt, L.: Inductive logic programming: theory and methods. J. Logic Program. 19, 629–679 (1994)
Duboc, A.L., Paes, A., Zaverucha, G.: Using the bottom clause and mode declarations in FOL theory revision from examples. Mach. Learn. 76(1), 73–107 (2009)
Cook, D.J., Holder, L.B.: Substructure discovery using minimum description length and background knowledge. J. Artif. Intell. Res. (JAIR) 1, 231–255 (1994)
Ball, T., Larus, J.R.: Efficient path profiling. In: Melvin, S.W., Beaty, S. (eds.) Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 29, Paris, France, 2–4 December 1996, pp. 46–57. ACM/IEEE Computer Society (1996)
Lavrac, N., Dzeroski, S.: Inductive Logic Programming: Techniques and Applications, vol. 10001. Routledge, New York (1993)
Gao, Z., Zhang, Z., Huang, Z.: Learning relations by path finding and simultaneous covering. In: 2009 WRI World Congress on Computer Science and Information Engineering, vol. 5, pp. 539–543. IEEE (2009)
Ong, I.M., de Castro Dutra, I., Page, D.L., Santos Costa, V.: Mode directed path finding. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 673–681. Springer, Heidelberg (2005)
Mutlu, A., Karagoz, P.: A hybrid graph-based method for concept rule discovery. In: Bellatreche, L., Mohania, M.K. (eds.) DaWaK 2013. LNCS, vol. 8057, pp. 327–338. Springer, Heidelberg (2013)
Holder, L.B., Cook, D.J., Djoko, S., et al.: Substucture discovery in the subdue system. In: KDD workshop, pp.169–180 (1994)
Gonzalez, J., Holder, L., Cook, D.J.: Application of graph-based concept learning to the predictive toxicology domain. In: Proceedings of the Predictive Toxicology Challenge Workshop (2001)
Karunaratne, T., Böstrom, H.: Differ: a propositionalization approach for learning from structured data. Mutagenesis 80(88.86), 76–92 (2006)
Kavurucu, Y., Senkul, P., Toroslu, I.H.: Concept discovery on relational databases: new techniques for search space pruning and rule quality improvement. Knowl.-Based Syst. 23(8), 743–756 (2010)
Richards, B.L., Mooney, R.J.: Learning relations by pathfinding. In: AAAI, pp. 50–55 (1992)
Robinson, I., Webber, J., Eifrem, E.: Graph Databases. O’Reilly Media Inc., Sebastopol (2013)
Schling, B.: The Boost C++ Libraries. XML Press (2011)
Goutte, C., Gaussier, É.: A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. In: Losada, D.E., Fernández-Luna, J.M. (eds.) ECIR 2005. LNCS, vol. 3408, pp. 345–359. Springer, Heidelberg (2005)
Larson, J., Michalski, R.S.: Inductive inference of vl decision rules. ACM SIGART Bull. 63, 38–44 (1977)
Srinivasan, A., Muggleton, S.H., Sternberg, M.J., King, R.D.: Theories for mutagenicity: a study in first-order and feature-based induction. Artif. Intell. 85(1), 277–299 (1996)
Srinivasan, A., King, R.D., Muggleton, S.H., Sternberg, M.J.: The predictive toxicology evaluation challenge. In: IJCAI Citeseer, vol. 1, pp. 4–9 (1997)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Abay, N.C., Mutlu, A., Karagoz, P. (2015). A Graph-Based Concept Discovery Method for n-Ary Relations. In: Madria, S., Hara, T. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2015. Lecture Notes in Computer Science(), vol 9263. Springer, Cham. https://doi.org/10.1007/978-3-319-22729-0_30
Download citation
DOI: https://doi.org/10.1007/978-3-319-22729-0_30
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22728-3
Online ISBN: 978-3-319-22729-0
eBook Packages: Computer ScienceComputer Science (R0)