Hierarchical Multi-label Classification using Fully Associative Ensemble Learning
Introduction
Hierarchical Multi-label Classification (HMC) is a variant of classification in which each sample has more than one label and the labels are organized hierarchically in a tree or Directed Acyclic Graph (DAG). In practice, HMC applies to many domains [1], [2], [3]. In web page classification, a website labeled “football” could also carry the higher-level label “sport”. In image annotation, an image tagged as “outdoor” might have other lower-level concept labels, such as “beach” or “garden”. In gene function prediction, a gene can be simultaneously labeled as “metabolism” and “catalytic or binding activities” by the biological process hierarchy and the molecular function hierarchy, respectively.
The rich hierarchical information in tree and DAG structured class hierarchies helps to improve classification performance [4]. Based on how this information is used, previous HMC approaches can be divided into global (big-bang) and local approaches [5]. Global approaches learn a single model for the whole class hierarchy, which keeps the model size small. However, they ignore the local modularity, which is an essential advantage of HMC. Local approaches first build multiple local classifiers on the class hierarchy. Then, hierarchical information is aggregated across the local prediction results of all the local classifiers to obtain the global prediction results for all the nodes. We refer to “local prediction result” and “global prediction result” as “local prediction” and “global prediction”, respectively. Previous local approaches suffer from three drawbacks. First, most of them focus only on the parent-child relationship. Other relationships in the hierarchy (e.g., ancestor-descendant, siblings) are ignored. Second, their models are sensitive to local prediction. Because the global prediction of each node is decided only by the local predictions of a few closely related nodes, errors in local predictions are likely to propagate to global predictions. Third, most local methods assume that the local structural constraint between two nodes will be reflected in their local predictions. However, this assumption can be undermined by different choices of features, local classification models, and positive-negative sample selection rules [6], [7]. In such situations, previous methods would fail to integrate valid structural information into local prediction.
In this paper, we propose a novel local HMC framework, Fully Associative Ensemble Learning (FAEL). We call it “fully associative ensemble” because in our model the global prediction of each node considers the relationships between the current node and all the other nodes. Specifically, a multi-variable regression model is built to minimize the empirical loss between the global predictions of all the training samples and their corresponding true label observations.
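The regression idea in the paragraph above can be sketched with a small numerical example. This is an illustrative sketch only, not the authors' implementation: the symbols `Z` (local predictions, n × l), `Y` (binary true labels, n × l), and `W` (an l × l weight matrix linking every node to every other node), the squared Frobenius loss, the ridge penalty weight `lam`, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def fit_associative_weights(Z, Y, lam=1.0):
    """Closed-form minimizer of ||Z W - Y||_F^2 + lam * ||W||_F^2.

    Z: (n, l) local predictions, Y: (n, l) binary labels (assumed names).
    Returns W of shape (l, l); column j weights the local predictions of
    all l nodes when forming the global prediction of node j.
    """
    l = Z.shape[1]
    # Normal equations: (Z^T Z + lam I) W = Z^T Y
    return np.linalg.solve(Z.T @ Z + lam * np.eye(l), Z.T @ Y)


def global_predictions(Z, W):
    """Global prediction of every node from the local predictions of all nodes."""
    return Z @ W


def objective(Z, Y, W, lam):
    """Regularized empirical loss used in this sketch."""
    return np.sum((Z @ W - Y) ** 2) + lam * np.sum(W ** 2)


# Toy hierarchy with 3 labels: node 0 is the parent of nodes 1 and 2.
rng = np.random.default_rng(0)
Y = np.array(
    [[1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 0, 0], [1, 1, 0], [1, 0, 1]],
    dtype=float,
)
# Hypothetical noisy local classifier scores clipped to [0, 1].
Z = np.clip(Y + 0.2 * rng.standard_normal(Y.shape), 0.0, 1.0)

lam = 0.1
W = fit_associative_weights(Z, Y, lam=lam)
G = global_predictions(Z, W)
```

Because `W` is a dense l × l matrix in this sketch, the global prediction of each node can draw on the local predictions of all nodes rather than only on its parent or children, which is the sense of "fully associative" described above.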
Our contributions are: we (i) developed a novel local hierarchical ensemble framework, in which all the structural relationships in the class hierarchy are used to obtain global prediction; (ii) introduced empirical loss minimization into HMC, so that the learned model can capture the most useful information from historical data; and (iii) proposed sparse, kernel, and binary constraint HMC models.
Parts of this work have been published in [8]. In this paper, we extend that work by providing: (i) the sparse basic model with ℓ1 norm; (ii) a new application of a DAG-structured class hierarchy to a visual recognition dataset based on deep learning features; (iii) the sensitivity analysis of all the parameters; (iv) the performance of two more kernel functions (Laplace kernel and Polynomial kernel) in the kernel model; and (v) statistical analysis of all the experimental results.
The rest of this paper is organized as follows: in Section 2 we discuss related work. Section 3 describes the proposed FAEL models. The experimental design, results and analysis are presented in Section 4. Section 5 concludes the paper.
Related work
In this section, we review the most recent works in HMC and flat multi-label classification, especially those that are related to our work. Also, we illustrate how our framework is different from previous ones.
In HMC, both global and local approaches have been developed. Most global approaches are extended from classic single-label machine learning algorithms. Wang et al. [9] used association rules for hierarchical document categorization. Hierarchical relationships between different classes
Fully associative ensemble learning
Consider a hierarchical multi-label training set comprising n samples, together with its hierarchical label set. There are l labels in total, and each label corresponds to one unique node in the hierarchy. The training label matrix is defined as a binary matrix of size n × l: its (i, j) entry is 1 if the ith sample has the jth label, and 0 otherwise. As a local approach, local classifiers are built on each node. The local predictions of are
Experiments
This section presents the datasets and experimental methodology used to evaluate the proposed framework and compare it to other baseline methods. The sensitivity analysis of all the parameters and statistical analysis are also discussed.
Conclusion
This paper introduces a novel HMC framework. We build a multi-variable regression model between the global and local predictions of all the nodes. The basic model is extended to the sparse model, the kernel model and the binary constraint model. Our work also raises several potential issues that we plan to address in the future. As the number of classes increases, the proposed fully associative model may suffer from both computation and performance limitations. A large-scale, fully associative
Acknowledgments
This material is based upon work supported by the U.S. Department of Homeland Security under Grant Award Number 2015-ST-061-BSH001. This grant is awarded to the Borders, Trade, and Immigration (BTI) Institute: A DHS Center of Excellence led by the University of Houston, and includes support for the project “Image and Video Person Identification in an Operational Environment: Phase I” awarded to the University of Houston. The views and conclusions contained in this document are those of the
Lingfeng Zhang received the B.S. degree in mathematics and the M.S. degree in computer science from Chongqing University, Chongqing, China. He is currently a Ph.D. student in the Department of Computer Science, University of Houston, Houston, TX, USA. He joined the Computational Biomedicine Laboratory in 2012. His current research interests include machine learning, deep learning, image processing, computer vision, and big data analysis.
References (58)
- et al. Hierarchical annotation of medical images. Pattern Recognit. (2011)
- et al. Hierarchical classification of diatom images using ensembles of predictive clustering trees. Ecol. Inform. (2012)
- et al. Hierarchical multi-label classification using local neural networks. J. Comput. Syst. Sci. (2014)
- et al. The segmented and annotated IAPR TC-12 benchmark. Comput. Vision Image Understanding (2010)
- et al. Modeling disease progression via multi-task learning. Neuroimage (2013)
- et al. Hierarchical document classification using automatically generated hierarchy. J. Intell. Inf. Syst. (2007)
- et al. Synergy of multi-label hierarchical ensembles, data fusion, and cost-sensitive methods for gene functional inference. Mach. Learn. (2012)
- et al. A hierarchical ensemble method for DAG-structured taxonomies. Multiple Classifier Systems (2015)
- et al. A survey of hierarchical classification across different application domains. Data Min. Knowl. Discov. (2011)
- et al. Novel top-down approaches for hierarchical classification and their application to automatic music genre classification. Proceedings of IEEE International Conference on Systems, Man and Cybernetics, San Antonio, Texas, USA (2009)
- On the selection of negative examples for hierarchical text categorization. Proceedings of Language and Technology Conference, Poznań, Poland
- Fully associative ensemble learning for hierarchical multi-label classification. Proceedings of British Machine Vision Conference, Nottingham, UK
- Hierarchical classification of real life documents. Proceedings of SIAM International Conference on Data Mining, Chicago, IL, USA
- Decision trees for hierarchical multi-label classification. Mach. Learn.
- Multi-label classification on tree- and DAG-structured hierarchies. Proceedings of International Conference on Machine Learning, Bellevue, WA
- A genetic algorithm for hierarchical multi-label classification. Proceedings of Annual ACM Symposium on Applied Computing, Trento, Italy
- Probabilistic clustering for hierarchical multi-label classification of protein functions. Proceedings of Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Prague, Czech Republic
- Hierarchical classification of web content. Proceedings of ACM/SIGIR International Conference on Research and Development in Information Retrieval, Athens, Greece
- Hierarchical shape classification using Bayesian aggregation. Proceedings of IEEE International Conference on Shape Modeling and Applications, Matsushima, Japan
- Incremental algorithms for hierarchical classification. J. Mach. Learn. Res.
- Exploiting label dependency for hierarchical multi-label classification. Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining, Kuala Lumpur, Malaysia
- Hierarchical multi-label classification of social text streams. Proceedings of International ACM SIGIR Conference on Research & Development in Information Retrieval, Queensland, Australia
- Hierarchical multi-label classification with SVMs: a case study in gene function prediction. Intell. Data Anal.
- True path rule hierarchical ensembles for genome-wide gene function prediction. IEEE/ACM Trans. Comput. Biol. Bioinf.
- Prediction of human gene-phenotype associations by exploiting the hierarchical structure of the human phenotype ontology. Bioinformatics and Biomedical Engineering
- Integration of relational and hierarchical network information for protein function prediction. BMC Bioinform.
- Refined experts: improving classification in large taxonomies. Proceedings of ACM/SIGIR International Conference on Research and Development in Information Retrieval, Boston, MA, USA
- Predicting gene function in a hierarchical context with an ensemble of classifiers. Genome Biol.
- A shared-subspace learning framework for multi-label classification. ACM Trans. Knowl. Discovery Data (TKDD)
Cited by (75)
Joint optimization of scoring and thresholding models for online multi-label classification
2023, Pattern Recognition
Hierarchical GAN-Tree and Bi-Directional Capsules for multi-label image classification
2022, Knowledge-Based Systems
Citation Excerpt: Implementation Details: To achieve a comparable baseline performance by the same data augmentation [27], we resize the input images to 256 × 256 and randomly crop regions into 224 × 224 with random horizontal flips. For the traditional hierarchical multi-label learning methods, i.e., TD, TPR, FAEL, and K-FAEL, all parameters involved in these baselines mentioned above follow the given values defined in the Ref. [48]. We empirically set the same value for the common parameters for classical Neural Network algorithms, thus making a fair comparison with each other.
Removal of self co-articulation and recognition of dynamic hand gestures using deep architectures
2022, Applied Soft Computing
Citation Excerpt: We opted to develop a hierarchy model using a fuzzy system that could separate the gestures in advance for the classifiers at the child node to handle this uncertainty. Also, the performance of local classifiers over global classifiers in the field of multi-class problems [4,18,33] encouraged us to use multiple local classifiers at the child node than one global classifier at the parent node. For defuzzification, the largest of maximum (lom) is used to compute the crisp value of the two groups.
Multi-Directional Multi-Label Learning
2021, Signal Processing
Citation Excerpt: 1) The discriminative classifier is chose as the Discriminative Least Squared Regression model (DLSR) [16] which learns the classifier, bias and dragging matrix simultaneously. 2) In many multi-label data, the labels will follow hierarchical structure [17,18], thus we propose a strictly hierarchical label constraints to guarantees that the predicted probability of parent label is larger than child label for each sample. 3) Based on the hierarchical constraint, we design a generalized label co-occurrence matrix to characterize the label correlations effectively.
HMATC: Hierarchical multi-label Arabic text classification model using machine learning
2021, Egyptian Informatics Journal
Shishir K. Shah received the B.S. degree in mechanical engineering and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of Texas, Austin, TX, USA. He is currently a Professor with the Department of Computer Science, University of Houston, Houston, TX, USA. He joined the department in 2005. He has co-edited one book and authored numerous papers on object recognition, sensor fusion, statistical pattern analysis, biometrics, and video analytics. He directs research at the Quantitative Imaging Laboratory. His current research interests include fundamentals of computer vision, pattern recognition, and statistical methods in image and data analysis with applications in multimodality sensing, video analytics, object recognition, biometrics, and microscope image analysis.
Ioannis A. Kakadiaris serves as the Director of the Borders, Trade, and Immigration Institute, a Department of Homeland Security Center of Excellence led by the University of Houston (UH). As director for BTI Institute, Ioannis oversees multiple projects, undertaken with seventeen partners across nine states, which provide homeland security enterprise education and workforce development and which study complex, multi-disciplinary issues related to flows of people, goods, and data across borders. A Hugh Roy and Lillie Cranz Cullen Distinguished University Professor of Computer Science, Ioannis is also an international expert in facial recognition and data/video analytics. He earned his B.S. in physics at the University of Athens in Greece, his M.S. in computer science from Northeastern University, and his Ph.D. in computer science at the University of Pennsylvania. In addition to twice winning the UH Computer Science Research Excellence Award, Ioannis has been recognized for his work with several distinguished honors, including the NSF Early Career Development Award, the Schlumberger Technical Foundation Award, the UH Enron Teaching Excellence Award, and the James Muller Vulnerable Plaque Young Investigator Prize.