Pattern Recognition

Volume 70, October 2017, Pages 89-103

Hierarchical Multi-label Classification using Fully Associative Ensemble Learning

https://doi.org/10.1016/j.patcog.2017.05.007

Highlights

  • Developing a local hierarchical ensemble framework for Hierarchical Multi-label Classification (HMC), in which all the structural relationships in the class hierarchy are used to obtain global prediction.

  • Introducing empirical loss minimization into HMC, so that the learned model can capture the most useful information from historical data.

  • Proposing sparse, kernel, and binary constraint HMC models.

Abstract

Traditional flat classification methods (e.g., binary or multi-class classification) neglect the structural information between different classes. In contrast, Hierarchical Multi-label Classification (HMC) considers the structural information embedded in the class hierarchy and uses it to improve classification performance. In this paper, we propose a local hierarchical ensemble framework for HMC, Fully Associative Ensemble Learning (FAEL). We model the relationship between each class node’s global prediction and the local predictions of all the class nodes as a multi-variable regression problem with Frobenius norm or l1 norm regularization. The model can be extended with the kernel trick, which captures more complex correlations between global and local predictions. In addition, we introduce a binary constraint model to restrict the learning of the optimal weight matrix. The proposed models have been applied to image annotation and gene function prediction datasets with tree-structured class hierarchies, and to a large-scale visual recognition dataset with a Directed Acyclic Graph (DAG) structured class hierarchy. The experimental results indicate that our models achieve better performance than the baseline methods.

Introduction

Hierarchical Multi-label Classification (HMC) is a variant of classification in which each sample has more than one label and all of these labels are organized hierarchically in a tree or Directed Acyclic Graph (DAG). In practice, HMC can be applied to many domains [1], [2], [3]. In web page classification, a website with the label “football” could also carry the higher-level label “sport”. In image annotation, an image tagged as “outdoor” might have additional lower-level concept labels, such as “beach” or “garden”. In gene function prediction, a gene can be simultaneously labeled as “metabolism” and “catalytic or binding activities” by the biological process hierarchy and the molecular function hierarchy, respectively.

The rich hierarchical information in tree- and DAG-structured class hierarchies helps improve classification performance [4]. Based on how this information is used, previous HMC approaches can be divided into global (big-bang) and local approaches [5]. Global approaches learn a single model for the whole class hierarchy, which keeps the model size small, but they ignore local modularity, an essential advantage of HMC. Local approaches first build multiple local classifiers on the class hierarchy and then aggregate hierarchical information across the local prediction results of all the local classifiers to obtain the global prediction results for all the nodes. We refer to the “local prediction result” and “global prediction result” as the “local prediction” and “global prediction”, respectively. Previous local approaches suffer from three drawbacks. First, most of them focus only on the parent-child relationship; other relationships in the hierarchy (e.g., ancestor-descendant, siblings) are ignored. Second, their models are sensitive to local predictions: the global prediction of each node is decided only by the local predictions of a few closely related nodes, so errors in local predictions are more likely to propagate to the global predictions. Third, most local methods assume that the local structural constraint between two nodes will be reflected in their local predictions. However, this assumption can be violated by different choices of features, local classification models, and positive-negative sample selection rules [6], [7]. In such situations, previous methods fail to integrate valid structural information into the local predictions.
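
To make the notion of “structural relationships” concrete, the following short Python sketch (using a hypothetical toy hierarchy, not one of the datasets used in the paper) derives the parent-child, ancestor-descendant, and sibling relations that a fully associative approach can exploit, whereas typical local methods use only the parent-child pairs:

```python
# Toy tree-structured class hierarchy given as child -> parent (roots have parent None).
parents = {
    "sport": None, "football": "sport", "tennis": "sport",
    "outdoor": None, "beach": "outdoor", "garden": "outdoor",
}

def ancestors(node):
    """All ancestors of a node, from its parent up to the root."""
    result = []
    p = parents[node]
    while p is not None:
        result.append(p)
        p = parents[p]
    return result

# Parent-child pairs (the only relation most previous local approaches use).
parent_child = {(c, p) for c, p in parents.items() if p is not None}

# Ancestor-descendant pairs (includes parent-child and longer paths).
ancestor_descendant = {(n, a) for n in parents for a in ancestors(n)}

# Sibling pairs: distinct nodes sharing the same parent.
siblings = {(a, b) for a in parents for b in parents
            if a != b and parents[a] is not None and parents[a] == parents[b]}

print(parent_child, ancestor_descendant, siblings, sep="\n")
```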

In this paper, we propose a novel local HMC framework, Fully Associative Ensemble Learning (FAEL). We call it “fully associative ensemble” because in our model the global prediction of each node considers the relationships between the current node and all the other nodes. Specifically, a multi-variable regression model is built to minimize the empirical loss between the global predictions of all the training samples and their corresponding true label observations.
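
As a concrete illustration of this regression step (a sketch consistent with the Frobenius-norm formulation mentioned in the abstract, not necessarily the exact objective of Section 3), the basic model can be written as the regularized least-squares problem

\min_{W} \; \lVert P W - Y \rVert_F^2 + \lambda \lVert W \rVert_F^2,

where P is the n × l matrix of local predictions for the n training samples, Y is their n × l binary label matrix, W is the l × l weight matrix that maps local predictions to global predictions, and λ is the regularization parameter; substituting the l1 norm of W for the Frobenius-norm term yields the sparse variant.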

Our contributions are threefold: (i) we develop a novel local hierarchical ensemble framework in which all the structural relationships in the class hierarchy are used to obtain the global prediction; (ii) we introduce empirical loss minimization into HMC, so that the learned model can capture the most useful information from historical data; and (iii) we propose sparse, kernel, and binary constraint HMC models.

Parts of this work have been published in [8]. In this paper, we extend that work by providing: (i) the sparse basic model with l1 norm regularization; (ii) a new application to a DAG-structured class hierarchy on a visual recognition dataset based on deep learning features; (iii) a sensitivity analysis of all the parameters; (iv) the performance of two additional kernel functions (Laplace kernel and Polynomial kernel) in the kernel model; and (v) a statistical analysis of all the experimental results.

The rest of this paper is organized as follows: in Section 2 we discuss related work. Section 3 describes the proposed FAEL models. The experimental design, results and analysis are presented in Section 4. Section 5 concludes the paper.

Section snippets

Related work

In this section, we review recent work on HMC and flat multi-label classification, especially work related to ours, and explain how our framework differs from previous approaches.

In HMC, both global and local approaches have been developed. Most global approaches extend classic single-label machine learning algorithms. Wang et al. [9] used association rules for hierarchical document categorization. Hierarchical relationships between different classes

Fully associative ensemble learning

Let S = {s_1, s_2, …, s_n} represent a hierarchical multi-label training set, which comprises n samples. Its hierarchical label set is denoted by C = {c_1, c_2, …, c_l}. There are l labels in total, and each label corresponds to one unique node in the hierarchy H. The training label matrix is defined as a binary matrix Y = {y_ij} of size n × l: if the ith sample has the jth label, y_ij = 1; otherwise y_ij = 0. As a local approach, local classifiers F = {f_1, f_2, …, f_l} are built, one on each node. The local predictions of S are
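
As a minimal sketch of this setup (assuming scikit-learn logistic regressions as the local classifiers, an illustrative choice rather than the paper’s exact implementation), the following Python snippet builds the per-node classifiers, stacks their local predictions into a matrix P, and fits the weight matrix W by ridge regression in the spirit of the Frobenius-norm basic model described in the abstract:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_fael_basic(X, Y, lam=1.0):
    """Sketch of the basic FAEL model.

    X   : (n, d) feature matrix of the training samples.
    Y   : (n, l) binary label matrix; Y[i, j] = 1 iff sample i has label j.
    lam : Frobenius-norm regularization weight (hypothetical name).

    Assumes every label column has both positive and negative samples.
    Returns the local classifiers and the (l, l) weight matrix W that maps
    local predictions to global predictions.
    """
    n, l = Y.shape
    # One local classifier per class node.
    classifiers = [LogisticRegression(max_iter=1000).fit(X, Y[:, j])
                   for j in range(l)]

    # Stack the local predictions of the training samples into P (n x l).
    P = np.column_stack([clf.predict_proba(X)[:, 1] for clf in classifiers])

    # Closed-form ridge solution: W = (P^T P + lam I)^{-1} P^T Y,
    # i.e., argmin_W ||P W - Y||_F^2 + lam ||W||_F^2.
    W = np.linalg.solve(P.T @ P + lam * np.eye(l), P.T @ Y)
    return classifiers, W

def predict_fael_basic(classifiers, W, X_test):
    """Global predictions for new samples: local predictions times W."""
    P_test = np.column_stack([clf.predict_proba(X_test)[:, 1]
                              for clf in classifiers])
    return P_test @ W
```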

Experiments

This section presents the datasets and experimental methodology used to evaluate the proposed framework and compare it to other baseline methods. The sensitivity analysis of all the parameters and statistical analysis are also discussed.

Conclusion

This paper introduces a novel HMC framework. We build a multi-variable regression model between the global and local predictions of all the nodes. The basic model is extended to the sparse model, the kernel model and the binary constraint model. Our work also raises several potential issues that we plan to address in the future. As the number of classes increases, the proposed fully associative model may suffer from both computation and performance limitations. A large-scale, fully associative

Acknowledgments

This material is based upon work supported by the U.S. Department of Homeland Security under Grant Award Number 2015-ST-061-BSH001. This grant is awarded to the Borders, Trade, and Immigration (BTI) Institute: A DHS Center of Excellence led by the University of Houston, and includes support for the project “Image and Video Person Identification in an Operational Environment: Phase I” awarded to the University of Houston. The views and conclusions contained in this document are those of the

References (58)

  • T. Fagni et al.

    On the selection of negative examples for hierarchical text categorization

    Proceedings of Language and Technology Conference, Poznań, Poland

    (2007)
  • L. Zhang et al.

    Fully associative ensemble learning for hierarchical multi-label classification

    Proceedings of British Machine Vision Conference, Nottingham, UK

    (2014)
  • K. Wang et al.

    Hierarchical classification of real life documents

    Proceedings of SIAM International Conference on Data Mining, Chicago, IL, USA

    (2001)
  • C. Vens et al.

    Decision trees for hierarchical multi-label classification

    Mach. Learn.

    (2008)
  • W. Bi et al.

    Multi-label classification on tree- and DAG-structured hierarchies

    Proceedings of International Conference on Machine Learning, Bellevue, WA

    (2011)
  • R. Cerri et al.

    A genetic algorithm for hierarchical multi-label classification

    Proceedings of Annual ACM Symposium on Applied Computing, Trento, Italy

    (2012)
  • R.C. Barros et al.

    Probabilistic clustering for hierarchical multi-label classification of protein functions

    Proceedings of Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Prague, Czech Republic

    (2013)
  • S. Dumais et al.

    Hierarchical classification of web content

    Proceedings of ACM/SIGIR International Conference on Research and Development in Information Retrieval, Athens, Greece

    (2000)
  • Z. Barutcuoglu et al.

    Hierarchical shape classification using Bayesian aggregation

    Proceedings of IEEE International Conference on Shape Modeling and Applications, Matsushima, Japan

    (2006)
  • N. Cesa-Bianchi et al.

    Incremental algorithms for hierarchical classification

    J. Mach. Learn. Res.

    (2006)
  • N. Alaydie et al.

    Exploiting label dependency for hierarchical multi-label classification

    Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining, Kuala Lumpur, Malaysia

    (2012)
  • Z. Ren et al.

    Hierarchical multi-label classification of social text streams

    Proceedings of International ACM SIGIR Conference on Research and Development in Information Retrieval, Queensland, Australia

    (2014)
  • P. Vateekul et al.

    Hierarchical multi-label classification with SVMs: a case study in gene function prediction

    Intell. Data Anal.

    (2014)
  • G. Valentini

    True path rule hierarchical ensembles for genome-wide gene function prediction

    IEEE/ACM Trans. Comput. Biol. Bioinf.

    (2011)
  • G. Valentini et al.

    Prediction of human gene-phenotype associations by exploiting the hierarchical structure of the human phenotype ontology

    Bioinformatics and Biomedical Engineering

    (2015)
  • X. Jiang et al.

    Integration of relational and hierarchical network information for protein function prediction

    BMC Bioinform.

    (2008)
  • P.N. Bennett et al.

    Refined experts: improving classification in large taxonomies

    Proceedings of ACM/SIGIR International Conference on Research and Development in Information Retrieval, Boston, MA, USA

    (2009)
  • Y. Guan et al.

    Predicting gene function in a hierarchical context with an ensemble of classifiers

    Genome Biol.

    (2008)
  • S. Ji et al.

    A shared-subspace learning framework for multi-label classification

    ACM Trans. Knowl. Discovery Data (TKDD)

    (2010)

Lingfeng Zhang received the B.S. degree in Mathematics and the M.S. degree in Computer Science from Chongqing University, Chongqing, China. He is currently a Ph.D. student in the Department of Computer Science, University of Houston, Houston, TX, USA. He joined the Computational Biomedicine Laboratory in 2012. His current research interests include machine learning, deep learning, image processing, computer vision, and big data analysis.

Shishir K. Shah received the B.S. degree in mechanical engineering and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of Texas, Austin, TX, USA. He is currently a Professor with the Department of Computer Science, University of Houston, Houston, TX, USA. He joined the department in 2005. He has co-edited one book and authored numerous papers on object recognition, sensor fusion, statistical pattern analysis, biometrics, and video analytics. He directs research at the Quantitative Imaging Laboratory. His current research interests include fundamentals of computer vision, pattern recognition, and statistical methods in image and data analysis with applications in multimodality sensing, video analytics, object recognition, biometrics, and microscope image analysis.

Ioannis A. Kakadiaris serves as the Director of the Borders, Trade, and Immigration Institute, a Department of Homeland Security Center of Excellence led by the University of Houston (UH). As director of the BTI Institute, Ioannis oversees multiple projects, undertaken with seventeen partners across nine states, which provide homeland security enterprise education and workforce development and which study complex, multi-disciplinary issues related to flows of people, goods, and data across borders. A Hugh Roy and Lillie Cranz Cullen Distinguished University Professor of Computer Science, Ioannis is also an international expert in facial recognition and data/video analytics. He earned his B.S. in physics at the University of Athens in Greece, his M.S. in computer science from Northeastern University, and his Ph.D. in computer science at the University of Pennsylvania. In addition to twice winning the UH Computer Science Research Excellence Award, Ioannis has been recognized for his work with several distinguished honors, including the NSF Early Career Development Award, the Schlumberger Technical Foundation Award, the UH Enron Teaching Excellence Award, and the James Muller Vulnerable Plaque Young Investigator Prize.
