
The importance of the label hierarchy in hierarchical multi-label classification

Journal of Intelligent Information Systems

Abstract

We address the task of hierarchical multi-label classification (HMC). HMC is a structured output prediction task in which the classes are organized into a hierarchy and an instance may belong to multiple classes. In many problems, such as gene function prediction or prediction of ecological community structure, the classes inherently follow these constraints. Many researchers have recognized the potential of HMC, and several HMC methods have been proposed and shown to achieve good predictive performance. However, there is no clear understanding of when it is favorable to consider such relationships (hierarchical and multi-label) among classes, and when they present an unnecessary burden for classification methods. To this end, we perform a detailed comparative study on 8 datasets that have HMC properties. We investigate two important influences in HMC: the multiple labels per example and the information about the hierarchy. More specifically, we consider four machine learning tasks: multi-label classification, hierarchical multi-label classification, single-label classification and hierarchical single-label classification. To construct the predictive models, we use predictive clustering trees (a generalized form of decision trees), which can tackle each of these modelling tasks. Moreover, we investigate whether the influence of the hierarchy and the multiple labels carries over to ensemble models. For each task, we construct a single tree and two ensembles (random forest and bagging). The results reveal that the hierarchy and the multiple labels do help to obtain a better single tree model, but this benefit is not preserved for the ensemble models.
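To make the role of the hierarchy in a predictive clustering tree more concrete, here is a minimal Python sketch of how an HMC tree might score a candidate split. It is not taken from the paper: it assumes the variance-reduction heuristic over binary class vectors and class weights that decay geometrically with depth (w0 = 0.75 for top-level classes, w0² one level down, and so on), a weighting commonly used for HMC with predictive clustering trees. The toy hierarchy, the class names and all function names are hypothetical. Setting `use_hierarchy=False` gives uniform weights, a rough stand-in for a learner that ignores the hierarchy.

```python
# Hypothetical toy class hierarchy: class -> parent (None marks a top-level class).
PARENT = {"1": None, "1/1": "1", "1/2": "1", "2": None, "2/1": "2"}
CLASSES = sorted(PARENT)


def close_upward(labels):
    """Add every ancestor of each label (the hierarchy constraint)."""
    closed = set()
    for c in labels:
        while c is not None:
            closed.add(c)
            c = PARENT[c]
    return closed


def depth(c):
    """Zero-based depth of class c (top-level classes have depth 0)."""
    d = 0
    while PARENT[c] is not None:
        c = PARENT[c]
        d += 1
    return d


def class_weights(w0=0.75, use_hierarchy=True):
    """Assumed weighting: w0 ** (depth + 1) with the hierarchy, 1.0 without it."""
    return {c: (w0 ** (depth(c) + 1) if use_hierarchy else 1.0) for c in CLASSES}


def to_vector(labels):
    """Binary target vector over CLASSES after closing the label set upward."""
    closed = close_upward(labels)
    return [1.0 if c in closed else 0.0 for c in CLASSES]


def variance(label_sets, weights):
    """Mean weighted squared Euclidean distance to the mean target vector."""
    vectors = [to_vector(s) for s in label_sets]
    n = len(vectors)
    mean = [sum(v[i] for v in vectors) / n for i in range(len(CLASSES))]
    return sum(
        weights[c] * (v[i] - mean[i]) ** 2
        for v in vectors
        for i, c in enumerate(CLASSES)
    ) / n


def variance_reduction(parent, left, right, weights):
    """Heuristic score of a binary split of `parent` (higher is better)."""
    n = len(parent)
    return (variance(parent, weights)
            - len(left) / n * variance(left, weights)
            - len(right) / n * variance(right, weights))


if __name__ == "__main__":
    node = [{"1/1"}, {"1/2"}, {"2/1"}, {"2"}]
    left, right = node[:2], node[2:]
    for use_h in (True, False):
        w = class_weights(use_hierarchy=use_h)
        print("hierarchy" if use_h else "flat", variance_reduction(node, left, right, w))
```

Because deeper classes receive smaller weights, disagreements near the top of the hierarchy dominate the score, which is one way the hierarchy can steer a single tree toward splits that separate broad class groups first.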


Figs. 1–8


Notes

  1. http://bailando.sims.berkeley.edu/enron_email.html

  2. Note that the hierarchical single-label classification (HSC) models are similar to the single-label classification models, except that the predictive models are organized into a hierarchical architecture. This makes the HSC models even more difficult to interpret.
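A hierarchical single-label architecture can be sketched as one binary model per class, each trained only on examples that belong to the class's parent and applied top-down at prediction time. This minimal Python sketch assumes that common top-down scheme (the note does not specify how the HSC models are trained); the toy hierarchy, the `models` mapping, the helper names and the 0.5 threshold are all hypothetical, and any binary learner that returns a probability could be plugged in.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical toy hierarchy: class -> parent (None marks a top-level class).
PARENT: Dict[str, Optional[str]] = {"1": None, "1/1": "1", "1/2": "1", "2": None}


def children(c: Optional[str]) -> List[str]:
    """Direct subclasses of c (top-level classes when c is None)."""
    return sorted(k for k, p in PARENT.items() if p == c)


def local_training_set(examples, c):
    """Training data for the binary model of class c: only examples whose
    labels include c's parent (all examples for top-level classes)."""
    parent = PARENT[c]
    return [(x, c in labels) for x, labels in examples
            if parent is None or parent in labels]


def predict_top_down(x, models: Dict[str, Callable], threshold=0.5) -> Set[str]:
    """Descend the hierarchy, visiting a class only if its parent was predicted."""
    predicted: Set[str] = set()
    frontier = children(None)
    while frontier:
        c = frontier.pop()
        if models[c](x) >= threshold:
            predicted.add(c)
            frontier.extend(children(c))
    return predicted


if __name__ == "__main__":
    # Hypothetical per-class "models": each returns a probability-like score.
    models = {
        "1": lambda x: x[0],
        "1/1": lambda x: x[1],
        "1/2": lambda x: 1 - x[1],
        "2": lambda x: x[2],
    }
    print(predict_top_down((0.9, 0.8, 0.1), models))
```

Each class is handled by its own model, so a prediction is spread across several trees chained along the hierarchy, which illustrates why such an architecture is harder to interpret than a single HMC tree.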


Acknowledgments

We would like to acknowledge the support of the European Commission through the project MAESTRA - Learning from Massive, Incompletely annotated, and Structured Data (Grant number ICT-2013-612944).

Author information


Correspondence to Jurica Levatić.


About this article


Cite this article

Levatić, J., Kocev, D. & Džeroski, S. The importance of the label hierarchy in hierarchical multi-label classification. J Intell Inf Syst 45, 247–271 (2015). https://doi.org/10.1007/s10844-014-0347-y


  • DOI: https://doi.org/10.1007/s10844-014-0347-y
