Elsevier

Information Sciences

Volume 523, June 2020, Pages 63-76
Three-way decisions based blocking reduction models in hierarchical classification

https://doi.org/10.1016/j.ins.2020.02.020

Highlights

  • Two three-way decisions based hierarchical classification models are proposed.

  • The ambiguity of the hierarchy is identified, and 3WD is used to reduce the uncertainty.

  • The topic model is used to learn category relations.

  • The proposed models perform better than several hierarchical classification methods.

  • The application domain of three-way decisions is extended to fashion image classification.

Abstract

Hierarchical classification (HC) is effective when categories are organized hierarchically. However, the blocking problem greatly reduces its effectiveness. Blocking means that samples are easily misclassified by high-level classifiers, so that they are blocked at the high level of the hierarchy. This issue is caused by the inconsistency between the artificially defined hierarchy and the actual hierarchy of the raw data. Another issue is that it is imprudent to process data by strictly following the hierarchy; some uncertain data require special treatment. To address the first issue, we learn category relationships and modify the hierarchy. To address the second issue, we introduce three-way decisions (3WD) to deal specifically with the ambiguous data. We extend earlier studies and propose two HC models based on 3WD, collectively referred to as TriHC, which carefully modify the hierarchy to alleviate the blocking problem. The proposed TriHC model learns new category hierarchies in three steps: (1) mining category relations; (2) modifying category hierarchies according to the latent category relations; and (3) using 3WD to divide observed objects into three regions (positive, boundary, and negative) and making decisions based on different strategies. Specifically, depending on the category relation mining method, there are two versions of TriHC: cross-level blocking priori knowledge based TriHC (CLPK-TriHC) and expert classifier based TriHC (EC-TriHC). The CLPK-TriHC model defines a cross-level blocking distribution matrix to mine the category relations between the higher and lower levels. To better exploit hierarchical category relations, the EC-TriHC model builds expert classifiers using a topic model to learn latent category topics. Experimental results validate that the proposed methods can simultaneously reduce blocking and improve classification accuracy.

Introduction

Hierarchical classification (HC) is an effective method for solving multiclass classification problems, especially when categories are organized hierarchically. Many important real-world classification problems can naturally be treated as HC problems, such as text categorization [4], [14], protein function prediction [27], [33], and image classification [2], [6]. With HC methods, a large-scale classification task can be divided into several small-scale tasks, which reduces the difficulty of classification. Usually, an HC method uses a top-down, level-based strategy in which the class hierarchy is typically stored as a tree. Such a strategy is simple, intuitive, and interpretable. However, due to the complexity of category relationships, samples are easily misclassified by higher-level classifiers, a problem known as blocking. Blocking is one of the reasons that HC performs worse than traditional flat classification.
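To make the top-down strategy and the blocking problem concrete, the following is a minimal sketch rather than the authors' implementation. The two-level hierarchy, the category names, and the precomputed scores stored on each sample are all hypothetical; in practice the scores would come from trained base classifiers.

```python
from typing import Dict, List

# Hypothetical two-level hierarchy: high-level category -> low-level children.
HIERARCHY: Dict[str, List[str]] = {
    "H1": ["L11", "L12"],
    "H2": ["L21", "L22"],
}


def top_down_predict(sample: dict) -> List[str]:
    """Route a sample from the root to a leaf, one level at a time.

    The sample carries precomputed scores (an assumption of this sketch):
    sample["high"] maps high-level categories to scores, and
    sample["low"][h] maps the children of h to scores.
    """
    high_scores = sample["high"]
    high = max(high_scores, key=high_scores.get)
    # Only the children of the chosen high-level category are considered.
    low_scores = sample["low"][high]
    low = max(low_scores, key=low_scores.get)
    return [high, low]


def is_blocked(true_leaf: str, predicted_path: List[str]) -> bool:
    """A sample is blocked when the high-level decision excludes its true leaf."""
    return true_leaf not in HIERARCHY[predicted_path[0]]


# A sample whose true leaf is L11, but whose high-level scores favor H2.
sample = {
    "high": {"H1": 0.4, "H2": 0.6},
    "low": {
        "H1": {"L11": 0.9, "L12": 0.1},
        "H2": {"L21": 0.7, "L22": 0.3},
    },
}
path = top_down_predict(sample)
blocked = is_blocked("L11", path)
```

Here the low-level classifier under H1 would have recognized L11 with high confidence, but the sample never reaches it because the high-level decision already went to H2.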

Most existing blocking reduction strategies in HC fall into two types, depending on whether they change the category hierarchy. The first type, which changes the hierarchy, includes the Restricted Voting method (RVM) [37], the Priori Knowledge Based Hierarchical Classification method (PKHC) [40], and the data-driven hierarchical structure modification approach (Global-INF) [25]. These strategies give blocked samples another chance to return to the correct categories by changing the original hierarchy. The second type, which leaves the hierarchy unchanged, includes the Multiplicative method (MM) [7], the Extended Multiplicative method (EMM) [37], and the Threshold Reduction Method (TRM) [37]. These strategies work on the probabilities of base classifiers, taking into account the results of horizontal or vertical base classifiers to help blocked samples return to the correct category. In practice, methods of the second type bring little improvement on the blocking problem. Since blocking is mainly caused by the complexity of category relationships, methods that change the hierarchy are better suited to blocking reduction. The two blocking reduction strategies proposed in this paper also belong to the first type.

Although many methods of the first type have been presented, few of them consider the inconsistency between the artificially defined hierarchy and the actual hierarchy of the data. Some existing methods take into account the relationships between base classifiers but over-refine the hierarchical topological structure, which introduces new blocking. Take the PKHC method as an example and consider the most extreme case: high-level category H1 has two sub-categories, L11 and L12; all samples of L11 are misclassified as H2 at the high level of the hierarchy, while all samples of L12 are correctly classified as H1 (see Fig. 2). PKHC concludes that samples of H1 are easily misclassified as H2, so it adds paths from H2 to both L11 and L12, which increases training costs and the uncertainty of classifier σH2. In fact, only the path from H2 to L11 needs to be added. See Section 3 for more details.
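The extreme case above can be checked with a small counting sketch. This is an illustration only: the paper's own cross-level blocking distribution matrix is defined in Section 3 and is not reproduced here, so the function name `cross_level_paths`, the record layout, and the `min_rate` threshold are all assumptions introduced for this example.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Hypothetical record layout: (true_low, true_high, predicted_high).
Record = Tuple[str, str, str]


def cross_level_paths(records: Iterable[Record],
                      min_rate: float = 0.5) -> Set[Tuple[str, str]]:
    """Propose extra (wrong_high, true_low) paths from blocking statistics.

    For every low-level category, count how often its samples are routed
    to a wrong high-level category, and keep only the pairs whose rate
    reaches min_rate (an assumed threshold, not taken from the paper).
    """
    totals: Counter = Counter()
    misrouted: Counter = Counter()
    for true_low, true_high, predicted_high in records:
        totals[true_low] += 1
        if predicted_high != true_high:
            misrouted[(predicted_high, true_low)] += 1
    return {
        (wrong_high, low)
        for (wrong_high, low), count in misrouted.items()
        if count / totals[low] >= min_rate
    }


# The extreme case from the text: every L11 sample is sent to H2,
# while every L12 sample is correctly sent to H1.
records = [("L11", "H1", "H2")] * 5 + [("L12", "H1", "H1")] * 5
new_paths = cross_level_paths(records)
print(new_paths)
```

Because the statistics are kept per low-level category rather than per high-level category, only the path from H2 to L11 is proposed, and L12 is left alone.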

To this end, we propose a new method for blocking reduction, illustrated in Fig. 1. In a nutshell, the core of our method is to reconstruct the topology of the hierarchical classification model by learning category relations from the original category hierarchy, and then to use the 3WD method to divide observed objects into three regions (positive, boundary, and negative), paying special attention to the boundary region. We use two different methods to mine category relations. The first method takes into account cross-level blocking priori knowledge, building on the method proposed in [40], which only considers the relationships between high-level categories and neglects the relationships between high- and low-level categories. However, this first method captures only one-to-one relationships between categories. The second method therefore applies the topic model based label grouping algorithm proposed in [39] to learn many-to-many relationships among categories.
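The three-region split can be sketched with the usual pair of thresholds from probabilistic three-way decisions: accept when the probability is at least alpha, reject when it is at most beta, and defer otherwise. The threshold values below and the helper names are assumptions for illustration; how TriHC chooses its thresholds and what strategy it applies in each region are not described in this excerpt.

```python
from typing import Dict, List


def three_way_decision(prob: float, alpha: float = 0.7, beta: float = 0.3) -> str:
    """Assign one object to a region from its membership probability.

    alpha and beta are illustrative defaults, not values from the paper.
    """
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= beta < alpha <= 1")
    if prob >= alpha:
        return "positive"   # confident enough to accept
    if prob <= beta:
        return "negative"   # confident enough to reject
    return "boundary"       # uncertain: defer to a dedicated strategy


def partition(probs: Dict[str, float],
              alpha: float = 0.7, beta: float = 0.3) -> Dict[str, List[str]]:
    """Split named objects into the three regions."""
    regions: Dict[str, List[str]] = {"positive": [], "boundary": [], "negative": []}
    for name, prob in probs.items():
        regions[three_way_decision(prob, alpha, beta)].append(name)
    return regions


regions = partition({"x1": 0.92, "x2": 0.55, "x3": 0.08})
```

With these assumed thresholds, `x2` lands in the boundary region, which is where a model of this kind would apply extra care instead of committing to a high-level category immediately.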

The proposed TriHC method can effectively reduce the blocking error and has less impact on categories that are not prone to blocking. The contributions of this study can be summarized as follows.

  • We propose two hierarchical models to modify the category hierarchy to deal with blocking problems caused by the inconsistency between the artificially defined hierarchy and the actual hierarchy of the data.

  • We explore three-way decisions to alleviate the classification errors caused by data uncertainty.

Our experiments on the challenging DeepFashion dataset [23] and the Stanford Dogs dataset [13] demonstrate the effectiveness of the proposed models. We validate that our TriHC models can significantly reduce the blocking problem compared with several previous HC models. The classification accuracy can even exceed that of a well-trained deep convolutional neural network model.

Section snippets

Hierarchical classification

In recent years, the development of deep learning has produced many state-of-the-art models for the classification task [9], [15], [35], [38]. However, this kind of flat classification (FC) model ignores the hierarchical relationships between categories. In addition, when the number of categories is huge, FC models are difficult to train. Hierarchical classification methods deal with multiclass classification problems by dividing a large-scale classification task into several small-scale tasks [8], [12],

Three-way decisions based blocking reduction models

In this section, we first introduce the notation used in this paper. Then, we present two blocking reduction models based on 3WD.

Experiments

In this section, we evaluate CLPK-TriHC and EC-TriHC on the fashion image classification task. Our experiments on the DeepFashion dataset [23] and the Stanford Dogs dataset [13] demonstrate that the proposed methods perform better than several previous HC methods, and even surpass a well-trained convolutional neural network (CNN) in some cases.

Conclusion and future works

In this paper, we proposed a three-way decision based hierarchical classification model (TriHC) to alleviate the blocking problem. The TriHC model learns category relations to rebuild the category hierarchy and uses 3WD to deal specifically with uncertain data. Using different category relation mining methods, we proposed two variants of TriHC: the CLPK-TriHC model and the EC-TriHC model. Specifically, in the CLPK-TriHC model, we considered the cross-level category relationship between the blocked

CRediT authorship contribution statement

Wen Shen: Conceptualization, Investigation, Methodology, Software, Validation, Visualization, Writing - original draft, Writing - review & editing. Zhihua Wei: Supervision. Qianwen Li: Investigation. Hongyun Zhang: Supervision. Duoqian Miao: Supervision.

Declaration of Competing Interest

There is no declaration of interest statement of this paper.

Acknowledgments

The work is partially supported by the National Key Research and Development Project (No. 213), the National Natural Science Foundation of China (No. 61573259, 61976160, 61573255), the Special Project of the Ministry of Public Security (No. 20170004), and the Key Lab of Information Network Security, Ministry of Public Security (No. C18608).

References (50)

  • Y. Yao. Three-way decisions with probabilistic rough sets. Inf. Sci. (2011)

  • L. Zhang et al. Sequential three-way decision based on multi-granular autoencoder features. Inf. Sci. (2020)

  • Y. Zhang et al. A cost-sensitive three-way combination technique for ensemble learning in sentiment classification. Int. J. Approx. Reason. (2019)

  • B. Zhou et al. Cost-sensitive three-way email spam filtering. J. Intell. Inf. Syst. (2014)

  • X. Bai et al. Learning ECOC code matrix for multiclass classification with application to glaucoma diagnosis. J. Med. Syst. (2016)

  • A. Binder et al. Efficient classification of images with taxonomies. Asian Conference on Computer Vision (2009)

  • D.M. Blei et al. Latent Dirichlet allocation. J. Mach. Learn. Res. (2003)

  • S. Chakrabarti et al. Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies. VLDB J. (1998)

  • T.G. Dietterich et al. Solving multiclass learning problems via error-correcting output codes. J. Artif. Intell. Res. (1994)

  • S. Dumais et al. Hierarchical classification of web content. Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (2000)

  • T. Gao et al. Discriminative learning of relaxed hierarchy for large-scale visual recognition. 2011 International Conference on Computer Vision (2011)

  • K. He et al. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)

  • S. Jiang et al. Learning consensus representation for weak style classification. IEEE Trans. Pattern Anal. Mach. Intell. (2017)

  • F. Kamiran et al. Exploiting reject option in classification for social discrimination control. Inf. Sci. (2018)

  • D. Koller et al. Hierarchically classifying documents using very few words. International Conference on Machine Learning (1997)