Pattern Recognition

Volume 70, October 2017, Pages 89-103

Hierarchical Multi-label Classification using Fully Associative Ensemble Learning

https://doi.org/10.1016/j.patcog.2017.05.007

Highlights

  • Developing a local hierarchical ensemble framework for Hierarchical Multi-label Classification (HMC), in which all the structural relationships in the class hierarchy are used to obtain global prediction.

  • Introducing empirical loss minimization into HMC, so that the learned model can capture the most useful information from historical data.

  • Proposing sparse, kernel, and binary constraint HMC models.

Abstract

Traditional flat classification methods (e.g., binary or multi-class classification) neglect the structural information between different classes. In contrast, Hierarchical Multi-label Classification (HMC) considers the structural information embedded in the class hierarchy and uses it to improve classification performance. In this paper, we propose a local hierarchical ensemble framework for HMC, Fully Associative Ensemble Learning (FAEL). We model the relationship between each class node’s global prediction and the local predictions of all the class nodes as a multi-variable regression problem with Frobenius norm or l1 norm regularization. The model can be extended with the kernel trick, which captures more complex correlations between global and local predictions. In addition, we introduce a binary constraint model to restrict the learning of the optimal weight matrix. The proposed models have been applied to image annotation and gene function prediction datasets with tree-structured class hierarchies, and to a large-scale visual recognition dataset with a Directed Acyclic Graph (DAG) structured class hierarchy. The experimental results indicate that our models achieve better performance than the baseline methods.

Introduction

Hierarchical Multi-label Classification (HMC) is a variant of classification in which each sample has more than one label and all of these labels are organized hierarchically in a tree or Directed Acyclic Graph (DAG). In practice, HMC can be applied to many domains [1], [2], [3]. In web page classification, a website with the label “football” could also carry the higher-level label “sport”. In image annotation, an image tagged as “outdoor” might have additional lower-level concept labels, such as “beach” or “garden”. In gene function prediction, a gene can be simultaneously labeled as “metabolism” and “catalytic or binding activities” by the biological process hierarchy and the molecular function hierarchy, respectively.

The rich hierarchical information in tree- and DAG-structured class hierarchies helps improve classification performance [4]. Based on how this information is used, previous HMC approaches can be divided into global (big-bang) and local approaches [5]. Global approaches learn a single model for the whole class hierarchy, which keeps the model size small, but they ignore local modularity, an essential advantage of HMC. Local approaches first build multiple local classifiers on the class hierarchy and then aggregate hierarchical information across the local prediction results of all the local classifiers to obtain the global prediction results for all the nodes. We refer to the “local prediction result” and “global prediction result” as the “local prediction” and “global prediction”, respectively. Previous local approaches suffer from three drawbacks. First, most of them focus only on the parent-child relationship; other relationships in the hierarchy (e.g., ancestor-descendant, siblings) are ignored. Second, their models are sensitive to local predictions: the global prediction of each node is decided only by the local predictions of a few closely related nodes, so errors in local predictions are more likely to propagate to the global predictions. Third, most local methods assume that the local structural constraint between two nodes will be reflected in their local predictions. However, this assumption can be violated by different choices of features, local classification models, and positive-negative sample selection rules [6], [7]. In such situations, previous methods fail to integrate valid structural information into the local predictions.
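
To make the notion of “structural relationships” concrete, the following short Python sketch (using a hypothetical toy hierarchy, not one of the datasets used in the paper) derives the parent-child, ancestor-descendant, and sibling relations that a fully associative approach can exploit, whereas typical local methods use only the parent-child pairs:

```python
# Toy tree-structured class hierarchy given as child -> parent (roots have parent None).
parents = {
    "sport": None, "football": "sport", "tennis": "sport",
    "outdoor": None, "beach": "outdoor", "garden": "outdoor",
}

def ancestors(node):
    """All ancestors of a node, from its parent up to the root."""
    result = []
    p = parents[node]
    while p is not None:
        result.append(p)
        p = parents[p]
    return result

# Parent-child pairs (the only relation most previous local approaches use).
parent_child = {(c, p) for c, p in parents.items() if p is not None}

# Ancestor-descendant pairs (includes parent-child and longer paths).
ancestor_descendant = {(n, a) for n in parents for a in ancestors(n)}

# Sibling pairs: distinct nodes sharing the same parent.
siblings = {(a, b) for a in parents for b in parents
            if a != b and parents[a] is not None and parents[a] == parents[b]}

print(parent_child, ancestor_descendant, siblings, sep="\n")
```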

In this paper, we propose a novel local HMC framework, Fully Associative Ensemble Learning (FAEL). We call it “fully associative ensemble” because in our model the global prediction of each node considers the relationships between the current node and all the other nodes. Specifically, a multi-variable regression model is built to minimize the empirical loss between the global predictions of all the training samples and their corresponding true label observations.
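
As a concrete illustration of this regression step (a sketch consistent with the Frobenius-norm formulation mentioned in the abstract, not necessarily the exact objective of Section 3), the basic model can be written as the regularized least-squares problem

\min_{W} \; \lVert P W - Y \rVert_F^2 + \lambda \lVert W \rVert_F^2,

where P is the n × l matrix of local predictions for the n training samples, Y is their n × l binary label matrix, W is the l × l weight matrix that maps local predictions to global predictions, and λ is the regularization parameter; substituting the l1 norm of W for the Frobenius-norm term yields the sparse variant.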

Our contributions are threefold: (i) we develop a novel local hierarchical ensemble framework in which all the structural relationships in the class hierarchy are used to obtain the global prediction; (ii) we introduce empirical loss minimization into HMC, so that the learned model can capture the most useful information from historical data; and (iii) we propose sparse, kernel, and binary constraint HMC models.

Parts of this work have been published in [8]. In this paper, we extend that work by providing: (i) the sparse basic model with l1 norm regularization; (ii) a new application to a DAG-structured class hierarchy on a visual recognition dataset based on deep learning features; (iii) a sensitivity analysis of all the parameters; (iv) the performance of two additional kernel functions (Laplace kernel and Polynomial kernel) in the kernel model; and (v) a statistical analysis of all the experimental results.

The rest of this paper is organized as follows: in Section 2 we discuss related work. Section 3 describes the proposed FAEL models. The experimental design, results and analysis are presented in Section 4. Section 5 concludes the paper.

Section snippets

Related work

In this section, we review recent work on HMC and flat multi-label classification, especially work related to ours, and explain how our framework differs from previous approaches.

In HMC, both global and local approaches have been developed. Most global approaches extend classic single-label machine learning algorithms. Wang et al. [9] used association rules for hierarchical document categorization. Hierarchical relationships between different classes

Fully associative ensemble learning

Let S = {s_1, s_2, …, s_n} represent a hierarchical multi-label training set, which comprises n samples. Its hierarchical label set is denoted by C = {c_1, c_2, …, c_l}. There are l labels in total, and each label corresponds to one unique node in the hierarchy H. The training label matrix is defined as a binary matrix Y = {y_ij} of size n × l: if the ith sample has the jth label, y_ij = 1; otherwise y_ij = 0. As a local approach, local classifiers F = {f_1, f_2, …, f_l} are built, one on each node. The local predictions of S are
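
As a minimal sketch of this setup (assuming scikit-learn logistic regressions as the local classifiers, an illustrative choice rather than the paper’s exact implementation), the following Python snippet builds the per-node classifiers, stacks their local predictions into a matrix P, and fits the weight matrix W by ridge regression in the spirit of the Frobenius-norm basic model described in the abstract:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_fael_basic(X, Y, lam=1.0):
    """Sketch of the basic FAEL model.

    X   : (n, d) feature matrix of the training samples.
    Y   : (n, l) binary label matrix; Y[i, j] = 1 iff sample i has label j.
    lam : Frobenius-norm regularization weight (hypothetical name).

    Assumes every label column has both positive and negative samples.
    Returns the local classifiers and the (l, l) weight matrix W that maps
    local predictions to global predictions.
    """
    n, l = Y.shape
    # One local classifier per class node.
    classifiers = [LogisticRegression(max_iter=1000).fit(X, Y[:, j])
                   for j in range(l)]

    # Stack the local predictions of the training samples into P (n x l).
    P = np.column_stack([clf.predict_proba(X)[:, 1] for clf in classifiers])

    # Closed-form ridge solution: W = (P^T P + lam I)^{-1} P^T Y,
    # i.e., argmin_W ||P W - Y||_F^2 + lam ||W||_F^2.
    W = np.linalg.solve(P.T @ P + lam * np.eye(l), P.T @ Y)
    return classifiers, W

def predict_fael_basic(classifiers, W, X_test):
    """Global predictions for new samples: local predictions times W."""
    P_test = np.column_stack([clf.predict_proba(X_test)[:, 1]
                              for clf in classifiers])
    return P_test @ W
```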

Experiments

This section presents the datasets and experimental methodology used to evaluate the proposed framework and compare it to other baseline methods. The sensitivity analysis of all the parameters and statistical analysis are also discussed.

Conclusion

This paper introduces a novel HMC framework. We build a multi-variable regression model between the global and local predictions of all the nodes. The basic model is extended to the sparse model, the kernel model and the binary constraint model. Our work also raises several potential issues that we plan to address in the future. As the number of classes increases, the proposed fully associative model may suffer from both computation and performance limitations. A large-scale, fully associative

Acknowledgments

This material is based upon work supported by the U.S. Department of Homeland Security under Grant Award Number 2015-ST-061-BSH001. This grant is awarded to the Borders, Trade, and Immigration (BTI) Institute: A DHS Center of Excellence led by the University of Houston, and includes support for the project “Image and Video Person Identification in an Operational Environment: Phase I” awarded to the University of Houston. The views and conclusions contained in this document are those of the

References (58)

  • T. Fagni et al.

    On the selection of negative examples for hierarchical text categorization

    Proceedings of Language and Technology Conference, Poznań, Poland

    (2007)
  • L. Zhang et al.

    Fully associative ensemble learning for hierarchical multi-label classification

    Proceedings of British Machine Vision Conference, Nottingham, UK

    (2014)
  • K. Wang et al.

    Hierarchical classification of real life documents

    Proceedings of SIAM International Conference on Data Mining, Chicago, IL, USA

    (2001)
  • C. Vens et al.

    Decision trees for hierarchical multi-label classification

    Mach. Learn.

    (2008)
  • W. Bi et al.

    Multi-label classification on tree- and DAG-structured hierarchies

    Proceedings of International Conference on Machine Learning, Bellevue, WA

    (2011)
  • R. Cerri et al.

    A genetic algorithm for hierarchical multi-label classification

    Proceedings of Annual ACM Symposium on Applied Computing, Trento, Italy

    (2012)
  • R.C. Barros et al.

    Probabilistic clustering for hierarchical multi-label classification of protein functions

    Proceedings of Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Prague, Czech Republic

    (2013)
  • S. Dumais et al.

    Hierarchical classification of web content

    Proceedings of ACM/SIGIR International Conference on Research and Development in Information Retrieval, Athens, Greece

    (2000)
  • Z. Barutcuoglu et al.

    Hierarchical shape classification using Bayesian aggregation

    Proceedings of IEEE International Conference on Shape Modeling and Applications, Matsushima, Japan

    (2006)
  • N. Cesa-Bianchi et al.

    Incremental algorithms for hierarchical classification

    J. Mach. Learn. Res.

    (2006)
  • N. Alaydie et al.

    Exploiting label dependency for hierarchical multi-label classification

    Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining, Kuala Lumpur, Malaysia

    (2012)
  • Z. Ren et al.

    Hierarchical multi-label classification of social text streams

    Proceedings of International ACM SIGIR Conference on Research and Development in Information Retrieval, Queensland, Australia

    (2014)
  • P. Vateekul et al.

    Hierarchical multi-label classification with SVMs: a case study in gene function prediction

    Intell. Data Anal.

    (2014)
  • G. Valentini

    True path rule hierarchical ensembles for genome-wide gene function prediction

    IEEE/ACM Trans. Comput. Biol. Bioinf.

    (2011)
  • G. Valentini et al.

    Prediction of human gene-phenotype associations by exploiting the hierarchical structure of the human phenotype ontology

    Bioinformatics and Biomedical Engineering

    (2015)
  • X. Jiang et al.

    Integration of relational and hierarchical network information for protein function prediction

    BMC Bioinform.

    (2008)
  • P.N. Bennett et al.

    Refined experts: improving classification in large taxonomies

    Proceedings of ACM/SIGIR International Conference on Research and Development in Information Retrieval, Boston, MA, USA

    (2009)
  • Y. Guan et al.

    Predicting gene function in a hierarchical context with an ensemble of classifiers

    Genome Biol.

    (2008)
  • S. Ji et al.

    A shared-subspace learning framework for multi-label classification

    ACM Trans. Knowl. Discovery Data (TKDD)

    (2010)

Lingfeng Zhang received the B.S. degree in Mathematics and the M.S. degree in Computer Science from Chongqing University, Chongqing, China. He is currently a Ph.D. student in the Department of Computer Science, University of Houston, Houston, TX, USA. He joined the Computational Biomedicine Laboratory in 2012. His current research interests include machine learning, deep learning, image processing, computer vision, and big data analysis.

Shishir K. Shah received the B.S. degree in mechanical engineering and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of Texas, Austin, TX, USA. He is currently a Professor with the Department of Computer Science, University of Houston, Houston, TX, USA. He joined the department in 2005. He has co-edited one book and authored numerous papers on object recognition, sensor fusion, statistical pattern analysis, biometrics, and video analytics. He directs research at the Quantitative Imaging Laboratory. His current research interests include fundamentals of computer vision, pattern recognition, and statistical methods in image and data analysis with applications in multimodality sensing, video analytics, object recognition, biometrics, and microscope image analysis.

Ioannis A. Kakadiaris serves as the Director of the Borders, Trade, and Immigration Institute, a Department of Homeland Security Center of Excellence led by the University of Houston (UH). As director of the BTI Institute, Ioannis oversees multiple projects, undertaken with seventeen partners across nine states, which provide homeland security enterprise education and workforce development and which study complex, multi-disciplinary issues related to flows of people, goods, and data across borders. A Hugh Roy and Lillie Cranz Cullen Distinguished University Professor of Computer Science, Ioannis is also an international expert in facial recognition and data/video analytics. He earned his B.S. in physics at the University of Athens in Greece, his M.S. in computer science from Northeastern University, and his Ph.D. in computer science at the University of Pennsylvania. In addition to twice winning the UH Computer Science Research Excellence Award, Ioannis has been recognized for his work with several distinguished honors, including the NSF Early Career Development Award, the Schlumberger Technical Foundation Award, the UH Enron Teaching Excellence Award, and the James Muller Vulnerable Plaque Young Investigator Prize.
