Redescription Mining with Multi-target Predictive Clustering Trees

Mihelčić, Matej; Džeroski, Sašo; Lavrač, Nada; Šmuc, Tomislav

doi:10.1007/978-3-319-39315-5_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9607)

Included in the following conference series:

International Workshop on New Frontiers in Mining Complex Patterns

Abstract

Redescription mining is a field of knowledge discovery that aims to find different descriptions of subsets of elements in the data by using two or more disjoint sets of descriptive attributes. The ability to find connections between different sets of descriptive attributes and provide a more comprehensive set of rules makes it very useful in practice. In this work, we introduce a redescription mining algorithm for generating and iteratively improving a redescription set of user-defined size, based on multi-target Predictive Clustering Trees. This approach uses information about element membership in different generated rules to search for new redescriptions and is able to produce highly accurate, statistically significant redescriptions described by Boolean, nominal or numeric attributes. As opposed to current tree-based approaches that use multi-class or binary classification, we explore the benefits of using multi-target classification and regression to create redescriptions. The process of iterative redescription set improvement is illustrated on a dataset describing 199 world countries and their trading patterns. The performance of the algorithm is compared against state-of-the-art redescription mining algorithms.
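
To make the notion of a redescription concrete, the following is a minimal sketch, not the algorithm introduced in this work. It assumes that the accuracy of a redescription is measured by the Jaccard index of the two queries' support sets, which is the usual choice in the redescription mining literature but is not stated in the abstract. The element names, attribute names, values and thresholds are all invented for illustration.

```python
# Two disjoint attribute sets (views) describing the same four elements.
# All names and values below are hypothetical.
view1 = {
    "A": {"x1": 5.0, "x2": 1.0},
    "B": {"x1": 7.5, "x2": 0.5},
    "C": {"x1": 2.0, "x2": 3.0},
    "D": {"x1": 6.1, "x2": 2.5},
}
view2 = {
    "A": {"y1": "high"},
    "B": {"y1": "high"},
    "C": {"y1": "low"},
    "D": {"y1": "high"},
}


def support(view, query):
    """Return the set of elements whose attributes satisfy the query."""
    return {element for element, attrs in view.items() if query(attrs)}


def jaccard(s1, s2):
    """Jaccard index of two sets: |intersection| / |union|."""
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 0.0


# A redescription is a pair of queries, one per view, that describe
# (nearly) the same subset of elements.
def query1(attrs):
    return attrs["x1"] > 4.0 and attrs["x2"] < 2.0   # numeric attributes


def query2(attrs):
    return attrs["y1"] == "high"                     # nominal attribute


supp1 = support(view1, query1)
supp2 = support(view2, query2)
accuracy = jaccard(supp1, supp2)
print(sorted(supp1), sorted(supp2), round(accuracy, 3))
```

Here the first query covers elements A and B and the second covers A, B and D, so the two descriptions overlap on two of three covered elements. A value close to 1 would indicate that both queries describe almost the same subset.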

Acknowledgement

The authors would like to acknowledge the European Commission’s support through the MAESTRA project (Gr. no. 612944), the MULTIPLEX project (Gr. no. 317532), the InnoMol project (Gr. no. 316289), and the support of the Croatian Science Foundation (Pr. no. 9623: Machine Learning Algorithms for Insightful Analysis of Complex Data Structures).

Author information

Authors and Affiliations

Ruđer Bošković Institute, Bijenička cesta 54, 10000, Zagreb, Croatia
Matej Mihelčić & Tomislav Šmuc
Jožef Stefan Institute, Jamova cesta 39, 1000, Ljubljana, Slovenia
Sašo Džeroski & Nada Lavrač
Jožef Stefan International Postgraduate School, Jamova cesta 39, 1000, Ljubljana, Slovenia
Matej Mihelčić, Sašo Džeroski & Nada Lavrač

Corresponding author

Correspondence to Matej Mihelčić.

Editor information

Editors and Affiliations

Università degli Studi di Bari Aldo Moro, Bari, Italy
Michelangelo Ceci
Università degli Studi di Bari Aldo Moro, Bari, Italy
Corrado Loglisci
ICAR-CNR, Rende, Italy
Giuseppe Manco
ICAR-CNR, Rende, Italy
Elio Masciari
University of North Carolina, Charlotte, North Carolina, USA
Zbigniew W. Ras

A Appendix

We present several shorter redescriptions mined by the CLUS-RM and ReReMi algorithms. The full names of the attributes used in the redescription queries can be seen in Fig. 5.

Table 3. Redescription examples produced by the CLUS-RM and ReReMi algorithms using only the conjunction operator

Table 4. Redescription examples produced by the CLUS-RM and ReReMi algorithms using the conjunction, disjunction and negation operators

In Table 3, we show two very accurate redescriptions mined with the ReReMi algorithm and compare them to two redescriptions mined with the CLUS-RM algorithm.

In Table 4, we present two redescriptions containing the conjunction and disjunction operators obtained with the ReReMi algorithm, and two redescriptions containing conjunctions and negations obtained with the CLUS-RM algorithm. These examples demonstrate the main difference between the methodologies: the ReReMi algorithm often uses the disjunction operator in redescription construction, whereas the CLUS-RM algorithm mostly uses the conjunction operator.
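
The effect of the three query operators on a query's support set can be sketched as follows. This is an illustrative toy, not the query language of either CLUS-RM or ReReMi. The element names, the attributes `p` and `q`, and the thresholds are hypothetical.

```python
# Hypothetical elements described by two numeric attributes.
rows = {
    "e1": {"p": 1, "q": 10},
    "e2": {"p": 3, "q": 2},
    "e3": {"p": 6, "q": 12},
    "e4": {"p": 8, "q": 1},
}


def conj(*literals):
    """Query satisfied when every literal holds (conjunction)."""
    return lambda attrs: all(lit(attrs) for lit in literals)


def disj(*literals):
    """Query satisfied when at least one literal holds (disjunction)."""
    return lambda attrs: any(lit(attrs) for lit in literals)


def neg(literal):
    """Query satisfied when the literal does not hold (negation)."""
    return lambda attrs: not literal(attrs)


def support(query):
    """Elements of `rows` that satisfy the query."""
    return {name for name, attrs in rows.items() if query(attrs)}


def p_high(attrs):
    return attrs["p"] > 2


def q_high(attrs):
    return attrs["q"] > 5


only_conjunction = support(conj(p_high, q_high))
with_disjunction = support(disj(p_high, q_high))
with_negation = support(conj(p_high, neg(q_high)))

print(sorted(only_conjunction))
print(sorted(with_disjunction))
print(sorted(with_negation))
```

In this toy data the conjunction narrows the support to the single element satisfying both literals, while the disjunction widens it to every element satisfying either one. Queries built mostly from conjunctions therefore tend to describe smaller, more specific subsets.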

Cite this paper

Mihelčić, M., Džeroski, S., Lavrač, N., Šmuc, T. (2016). Redescription Mining with Multi-target Predictive Clustering Trees. In: Ceci, M., Loglisci, C., Manco, G., Masciari, E., Ras, Z. (eds) New Frontiers in Mining Complex Patterns. NFMCP 2015. Lecture Notes in Computer Science, vol 9607. Springer, Cham. https://doi.org/10.1007/978-3-319-39315-5_9

DOI: https://doi.org/10.1007/978-3-319-39315-5_9
Published: 18 May 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-39314-8
Online ISBN: 978-3-319-39315-5
eBook Packages: Computer Science, Computer Science (R0)
