Abstract
This paper describes a propositionalization technique called wordification. Wordification is inspired by text mining and can be seen as a transformation of a relational database into a corpus of documents. Wordification aims at producing simple, easy to understand features, acting as words in the transformed Bag-Of-Words representation. As in other propositionalization methods, after the wordification step any propositional data mining algorithm can be applied. The most notable advantage of the presented technique is greater scalability: the propositionalization step is done in time linear to the number of attributes times the number of examples. The paper presents the wordification methodology, implemented in a cloud-based web data mining platform Clowd-Flows, and describes the experiments in two real-life datasets together with a critical comparison to the RSD propositionalization approach.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Debnath, A.K., Lopez de Compadre, R.L., Debnath, G., Shusterman, A.J., Hansch, C.: Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity. Journal of Medicinal Chemistry 34(2), 786–797 (1991)
Demšar, J., Zupan, B., Leban, G., Curk, T.: Orange: From Experimental Machine Learning to Interactive Data Mining. Springer (2004)
Fayyad, U., Irani, K.: Multi-interval discretization of continuous-valued attributes for classification learning. In: Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, pp. 1022–1027 (1993)
Jones, K.S.: A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation 28, 11–21 (1972)
Kramer, S., Pfahringer, B., Helma, C.: Stochastic propositionalization of non-determinate background knowledge. In: Page, D.L. (ed.) ILP 1998. LNCS, vol. 1446, pp. 80–94. Springer, Heidelberg (1998)
Kranjc, J., Podpečan, V., Lavrač, N.: Clowdflows: a cloud based scientific workflow platform. In: Flach, P.A., De Bie, T., Cristianini, N. (eds.) ECML PKDD 2012, Part II. LNCS, vol. 7524, pp. 816–819. Springer, Heidelberg (2012)
Kuželka, O., Železný, F.: Block-wise construction of tree-like relational features with monotone reducibility and redundancy. Machine Learning 83(2), 163–192 (2011)
Lavrač, N., Kavšek, B., Flach, P., Todorovski, L.: Subgroup discovery with cn2-sd. The Journal of Machine Learning Research 5, 153–188 (2004)
Lavrač, N., Džeroski, S., Grobelnik, M.: Learning nonrecursive definitions of relations with LINUS. In: Kodratoff, Y. (ed.) EWSL 1991. LNCS, vol. 482, pp. 265–281. Springer, Heidelberg (1991)
Michie, D., Muggleton, S., Page, D., Srinivasan, A.: To the international computing community: A new east-west challenge. Technical report, Oxford University Computing laboratory, Oxford, UK (1994)
Mierswa, I., Wurst, M., Klinkenberg, R., Scholz, M., Euler, T.: YALE: Rapid prototyping for complex data mining tasks. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data mining (KDD 2006), Philadelphia, PA, USA, August 20-23, pp. 935–940. ACM Press, NY (2006)
Perovšek, M., Vavpetič, A., Lavrač, N.: A wordification approach to relational data mining: Early results. In: Late Breaking Papers of the 22nd International Conference on Inductive Logic Programming, pp. 56–61 (2012)
Sluban, B., Gamberger, D., Lavrač, N.: Ensemble-based noise detection: noise ranking and visual performance evaluation. In: Data Mining and Knowledge Discovery 2013, pp. 1–39 (2013)
Srinivasan, A.: Aleph manual (March 2007), http://www.cs.ox.ac.uk/activities/machinelearning/Aleph/
Vavpetič, A., Lavrač, N.: Semantic subgroup discovery systems and workflows in the sdm-toolkit. The Computer Journal 56(3), 304–320 (2013)
Witten, I.H., Frank, E., Hall, M.A.: Data Mining: Practical Machine Learning Tools and Techniques: Practical Machine Learning Tools and Techniques. Morgan Kaufmann (2011)
Železný, F., Lavrač, N.: Propositionalization-based relational subgroup discovery with RSD. Machine Learning 62, 33–63 (2006)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Perovšek, M., Vavpetič, A., Cestnik, B., Lavrač, N. (2013). A Wordification Approach to Relational Data Mining. In: Fürnkranz, J., Hüllermeier, E., Higuchi, T. (eds) Discovery Science. DS 2013. Lecture Notes in Computer Science(), vol 8140. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40897-7_10
Download citation
DOI: https://doi.org/10.1007/978-3-642-40897-7_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40896-0
Online ISBN: 978-3-642-40897-7
eBook Packages: Computer ScienceComputer Science (R0)