Abstract
Redescription mining is a field of knowledge discovery that aims to find different descriptions of subsets of elements in the data by using two or more disjoint sets of descriptive attributes. Its ability to find connections between different sets of descriptive attributes and to provide a more comprehensive set of rules makes it very useful in practice. In this work, we introduce a redescription mining algorithm, based on multi-target Predictive Clustering Trees, that generates and iteratively improves a redescription set of user-defined size. The approach uses information about element membership in different generated rules to search for new redescriptions and is able to produce highly accurate, statistically significant redescriptions described by Boolean, nominal or numeric attributes. As opposed to current tree-based approaches that use multi-class or binary classification, we explore the benefits of using multi-target classification and regression to create redescriptions. The process of iterative redescription set improvement is illustrated on a dataset describing 199 world countries and their trading patterns. The performance of the algorithm is compared against state-of-the-art redescription mining algorithms.
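The notions of accuracy, significance and a fixed-size redescription set can be made concrete with a small sketch. It assumes, as is common in the redescription mining literature, that the accuracy of a redescription is the Jaccard index of the supports of its two queries, and that its significance is a binomial tail probability computed as if the two supports were independent random sets. The helper names (`jaccard`, `p_value`, `improve_set`), the `max_p` threshold and the accuracy-only ranking are hypothetical simplifications for illustration; this is not the CLUS-RM procedure itself, and no Predictive Clustering Trees are built here.

```python
from math import comb

def jaccard(supp1: set, supp2: set) -> float:
    """Accuracy of a redescription: Jaccard index of the two query supports."""
    union = supp1 | supp2
    if not union:
        return 0.0
    return len(supp1 & supp2) / len(union)

def p_value(supp1: set, supp2: set, n_elements: int) -> float:
    """Probability of observing at least the actual overlap if the supports were
    independent random sets with the same relative sizes (binomial tail)."""
    p = (len(supp1) / n_elements) * (len(supp2) / n_elements)
    observed = len(supp1 & supp2)
    return sum(
        comb(n_elements, k) * p**k * (1 - p) ** (n_elements - k)
        for k in range(observed, n_elements + 1)
    )

def improve_set(current, candidates, size, n_elements, max_p=0.01):
    """Hypothetical update step for a redescription set of user-defined size.

    Each redescription is a (name, supp1, supp2) tuple. Candidates that are not
    significant at `max_p` are discarded; the remaining pool is ranked by
    Jaccard index and only the `size` most accurate redescriptions are kept.
    """
    significant = [r for r in candidates if p_value(r[1], r[2], n_elements) <= max_p]
    pool = list(current) + significant
    pool.sort(key=lambda r: jaccard(r[1], r[2]), reverse=True)
    return pool[:size]
```

For example, on 10 elements the supports {0, 1, 2, 3} and {1, 2, 3, 4} give a Jaccard index of 3/5, but a p-value of roughly 0.21, so such a redescription would be rejected at `max_p=0.01`. Ranking on a single criterion keeps the sketch short; a real implementation would likely weigh several criteria together.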
Acknowledgement
The authors would like to acknowledge the European Commission’s support through the MAESTRA project (Gr. no. 612944), the MULTIPLEX project (Gr. no. 317532) and the InnoMol project (Gr. no. 316289), as well as the support of the Croatian Science Foundation (Pr. no. 9623: Machine Learning Algorithms for Insightful Analysis of Complex Data Structures).
A Appendix
We present several shorter redescriptions mined by the CLUS-RM and ReReMi algorithms. The full names of the attributes used in the redescription queries can be seen in Fig. 5.
In Table 3, we show two very accurate redescriptions mined with the ReReMi algorithm and compare them with two redescriptions mined with CLUS-RM.
In Table 4, we present two redescriptions containing conjunction and disjunction operators obtained with the ReReMi algorithm, and two redescriptions containing conjunctions and negations obtained with the CLUS-RM algorithm. These examples demonstrate the main difference between the two methodologies: ReReMi often uses the disjunction operator when constructing redescriptions, whereas CLUS-RM mostly uses the conjunction operator.
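The difference between the two query styles can be illustrated with a toy sketch. It evaluates a conjunction-with-negation query over one attribute set and a disjunction query over a second, disjoint attribute set, and measures how well the two supports agree. The attribute names (`gdp`, `landlocked`, `exports`, `imports`), thresholds and rows are hypothetical values invented for illustration; they are not taken from the country trading dataset or from Tables 3 and 4.

```python
from typing import Callable, Dict, List, Set

Row = Dict[str, float]
Query = Callable[[Row], bool]

# Hypothetical toy rows; view 1 = {gdp, landlocked}, view 2 = {exports, imports}.
ROWS: List[Row] = [
    {"gdp": 12, "landlocked": 0, "exports": 6, "imports": 2},
    {"gdp": 15, "landlocked": 0, "exports": 1, "imports": 9},
    {"gdp": 3, "landlocked": 1, "exports": 7, "imports": 1},
    {"gdp": 20, "landlocked": 1, "exports": 2, "imports": 3},
]

def conj_neg_query(row: Row) -> bool:
    """Conjunction with a negated literal (the style CLUS-RM mostly produces)."""
    return row["gdp"] > 10 and not row["landlocked"]

def disj_query(row: Row) -> bool:
    """Disjunction of two conditions (a style ReReMi often uses)."""
    return row["exports"] > 5 or row["imports"] > 8

def support(rows: List[Row], query: Query) -> Set[int]:
    """Indices of the rows (elements) that satisfy the query."""
    return {i for i, row in enumerate(rows) if query(row)}

def jaccard(supp1: Set[int], supp2: Set[int]) -> float:
    """Agreement of the two supports, used here as the redescription's accuracy."""
    union = supp1 | supp2
    return len(supp1 & supp2) / len(union) if union else 0.0

left = support(ROWS, conj_neg_query)   # {0, 1}
right = support(ROWS, disj_query)      # {0, 1, 2}
print(left, right, jaccard(left, right))
```

Here the disjunction covers one extra element (row 2), so the pair agrees on two of three elements; adding or dropping a disjunct is one way a query can widen or narrow its support to better match the other side.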
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Mihelčić, M., Džeroski, S., Lavrač, N., Šmuc, T. (2016). Redescription Mining with Multi-target Predictive Clustering Trees. In: Ceci, M., Loglisci, C., Manco, G., Masciari, E., Ras, Z. (eds) New Frontiers in Mining Complex Patterns. NFMCP 2015. Lecture Notes in Computer Science, vol 9607. Springer, Cham. https://doi.org/10.1007/978-3-319-39315-5_9
DOI: https://doi.org/10.1007/978-3-319-39315-5_9
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-39314-8
Online ISBN: 978-3-319-39315-5
eBook Packages: Computer Science, Computer Science (R0)