Redescription Mining with Multi-target Predictive Clustering Trees

  • Conference paper
New Frontiers in Mining Complex Patterns (NFMCP 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9607)

Abstract

Redescription mining is a field of knowledge discovery that aims to find different descriptions of subsets of elements in the data by using two or more disjoint sets of descriptive attributes. Its ability to find connections between different sets of descriptive attributes and to provide a more comprehensive set of rules makes it very useful in practice. In this work, we introduce a redescription mining algorithm, based on multi-target Predictive Clustering Trees, for generating and iteratively improving a redescription set of user-defined size. This approach uses information about element membership in different generated rules to search for new redescriptions and is able to produce highly accurate, statistically significant redescriptions described by Boolean, nominal or numeric attributes. As opposed to current tree-based approaches that use multi-class or binary classification, we explore the benefits of using multi-target classification and regression to create redescriptions. The process of iterative redescription set improvement is illustrated on a dataset describing 199 world countries and their trading patterns. The performance of the algorithm is compared against state-of-the-art redescription mining algorithms.
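The abstract states that the approach uses element membership in previously generated rules to search for new redescriptions with multi-target Predictive Clustering Trees. One plausible reading, sketched below under stated assumptions, is to turn rule membership into a 0/1 target matrix (one column per rule built on one view) and then look for a test on the other view's attributes that reduces the summed per-target variance, the kind of heuristic used by multi-target regression trees. Everything here (`membership_targets`, `summed_variance`, `best_split`, the toy attributes `v1` and `w`) is hypothetical and reduced to a single split; it is not the CLUS-RM implementation.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical element representation: attribute name -> numeric value.
Row = Dict[str, float]
Rule = Callable[[Row], bool]


def membership_targets(rows: Sequence[Row], rules: Sequence[Rule]) -> List[List[int]]:
    """Build the element x rule matrix: entry [i][j] is 1 if element i satisfies rule j."""
    return [[1 if rule(row) else 0 for rule in rules] for row in rows]


def summed_variance(targets: Sequence[Sequence[int]]) -> float:
    """Sum, over all target columns, of the population variance of that column."""
    n = len(targets)
    if n == 0:
        return 0.0
    total = 0.0
    for j in range(len(targets[0])):
        column = [t[j] for t in targets]
        mean = sum(column) / n
        total += sum((v - mean) ** 2 for v in column) / n
    return total


def best_split(
    rows: Sequence[Row], targets: Sequence[Sequence[int]], attributes: Sequence[str]
) -> Optional[Tuple[str, float, float]]:
    """Return (attribute, threshold, score) for the test `attribute <= threshold`
    that minimises the size-weighted summed variance of the two children."""
    n = len(rows)
    best: Optional[Tuple[str, float, float]] = None
    for attr in attributes:
        for thr in sorted({row[attr] for row in rows}):
            left = [targets[i] for i in range(n) if rows[i][attr] <= thr]
            right = [targets[i] for i in range(n) if rows[i][attr] > thr]
            if not left or not right:
                continue  # a split must produce two non-empty children
            score = (len(left) * summed_variance(left)
                     + len(right) * summed_variance(right)) / n
            if best is None or score < best[2]:
                best = (attr, thr, score)
    return best


# Toy usage: rules are built on first-view attribute "v1"; the split is searched on second-view attribute "w".
rows = [{"v1": 1, "w": 10}, {"v1": 2, "w": 20}, {"v1": 8, "w": 80}, {"v1": 9, "w": 90}]
rules = [lambda r: r["v1"] > 5, lambda r: r["v1"] >= 8]
targets = membership_targets(rows, rules)
print(targets)                            # [[0, 0], [0, 0], [1, 1], [1, 1]]
print(best_split(rows, targets, ["w"]))   # ('w', 20, 0.0)
```

In this toy case the test `w <= 20` separates the elements that satisfy neither rule from those that satisfy both, so `v1 > 5` on the first view and `w > 20` on the second would form a candidate redescription. A real multi-target tree would recurse on both children and typically normalise the targets before summing variances.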

Acknowledgement

The authors would like to acknowledge the European Commission’s support through the MAESTRA project (Gr. no. 612944), the MULTIPLEX project (Gr. no. 317532), the InnoMol project (Gr. no. 316289), and support of the Croatian Science Foundation (Pr. no. 9623: Machine Learning Algorithms for Insightful Analysis of Complex Data Structures).

Author information

Corresponding author

Correspondence to Matej Mihelčić.

A Appendix

We present several shorter redescriptions mined by the CLUS-RM and ReReMi algorithms. The full names of the attributes used in the redescription queries are given in Fig. 5.

Table 3. Redescription examples produced by the CLUS-RM and ReReMi algorithms using only the conjunction operator
Table 4. Redescription examples produced by the CLUS-RM and ReReMi algorithms using conjunction, disjunction and negation operators

In Table 3, we show two very accurate redescriptions mined with the ReReMi algorithm and compare them to two redescriptions mined with CLUS-RM.
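In the redescription mining literature, the accuracy of a redescription is commonly measured by the Jaccard index of the supports of its two queries, and its significance by a binomial-tail p-value that treats the two supports as independent. The definitions below are assumptions based on that common usage; this excerpt does not state the exact formulas the paper uses. Supports are represented as sets of element indices, and `jaccard` and `binomial_pvalue` are hypothetical names.

```python
from math import comb
from typing import Set


def jaccard(supp1: Set[int], supp2: Set[int]) -> float:
    """Redescription accuracy: |supp1 & supp2| / |supp1 | supp2| (0.0 for two empty supports)."""
    union = supp1 | supp2
    return len(supp1 & supp2) / len(union) if union else 0.0


def binomial_pvalue(supp1: Set[int], supp2: Set[int], n_elements: int) -> float:
    """Probability of an overlap at least as large as the observed one if each element
    independently falls into both supports with probability p1 * p2."""
    p = (len(supp1) / n_elements) * (len(supp2) / n_elements)
    k = len(supp1 & supp2)
    return sum(
        comb(n_elements, i) * p**i * (1 - p) ** (n_elements - i)
        for i in range(k, n_elements + 1)
    )


q1_support = {0, 1}
q2_support = {0, 1}
print(jaccard(q1_support, q2_support))             # 1.0
print(binomial_pvalue(q1_support, q2_support, 4))  # 0.26171875
```

The toy example shows why both measures are needed: two identical supports give perfect accuracy, yet on only four elements such an overlap is not unlikely by chance, so the p-value stays high.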

In Table 4, we present two redescriptions containing conjunction and disjunction operators obtained with the ReReMi algorithm, and two redescriptions containing conjunctions and negations obtained with the CLUS-RM algorithm. These examples demonstrate the main difference between the two methodologies: the ReReMi algorithm often uses the disjunction operator to construct redescriptions, whereas CLUS-RM mostly uses the conjunction operator.
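To make the difference between the two query styles concrete, the sketch below evaluates a conjunction-with-negation query (the style the text attributes to CLUS-RM) on one view and a disjunctive query (the style attributed to ReReMi) on the other, and compares their supports. The attribute names `a`, `b` (first view) and `c`, `d` (second view), the thresholds, and the helpers `atom`, `conj`, `disj`, `neg` and `support` are all hypothetical; real queries would use the indicators listed in Fig. 5.

```python
from typing import Callable, Dict, List, Set

Element = Dict[str, float]
Query = Callable[[Element], bool]


def atom(attr: str, threshold: float) -> Query:
    """Hypothetical atomic condition: attr >= threshold."""
    return lambda e: e[attr] >= threshold


def conj(*queries: Query) -> Query:
    """Conjunction: true when every sub-query holds."""
    return lambda e: all(q(e) for q in queries)


def disj(*queries: Query) -> Query:
    """Disjunction: true when at least one sub-query holds."""
    return lambda e: any(q(e) for q in queries)


def neg(query: Query) -> Query:
    """Negation of a sub-query."""
    return lambda e: not query(e)


def support(data: List[Element], query: Query) -> Set[int]:
    """Indices of the elements that satisfy the query."""
    return {i for i, e in enumerate(data) if query(e)}


# Toy data: "a" and "b" belong to the first view, "c" and "d" to the second.
data = [
    {"a": 5, "b": 0, "c": 1, "d": 0},
    {"a": 6, "b": 1, "c": 0, "d": 0},
    {"a": 1, "b": 0, "c": 0, "d": 1},
    {"a": 7, "b": 0, "c": 0, "d": 1},
]
q1 = conj(atom("a", 5), neg(atom("b", 1)))  # a >= 5 AND NOT (b >= 1)
q2 = disj(atom("c", 1), atom("d", 1))       # c >= 1 OR d >= 1
s1, s2 = support(data, q1), support(data, q2)
print(sorted(s1), sorted(s2))               # [0, 3] [0, 2, 3]
print(len(s1 & s2) / len(s1 | s2))          # Jaccard overlap: 0.6666666666666666
```

A disjunction widens a query's support while a conjunction (possibly with negated terms) narrows it, which is why the two styles can reach a similar overlap through differently shaped queries.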

Fig. 5. Indicator full names

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Mihelčić, M., Džeroski, S., Lavrač, N., Šmuc, T. (2016). Redescription Mining with Multi-target Predictive Clustering Trees. In: Ceci, M., Loglisci, C., Manco, G., Masciari, E., Ras, Z. (eds) New Frontiers in Mining Complex Patterns. NFMCP 2015. Lecture Notes in Computer Science, vol 9607. Springer, Cham. https://doi.org/10.1007/978-3-319-39315-5_9

  • DOI: https://doi.org/10.1007/978-3-319-39315-5_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-39314-8

  • Online ISBN: 978-3-319-39315-5

  • eBook Packages: Computer Science, Computer Science (R0)
