
Pattern Recognition

Volume 124, April 2022, 108449

Deep collaborative multi-task network: A human decision process inspired model for hierarchical image classification

https://doi.org/10.1016/j.patcog.2021.108449

Highlights

  • We propose a deep collaborative multi-task learning framework for hierarchical classification, where each prediction problem in the hierarchy is regarded as a sub-task to obtain multi-granularity intermediate predictions.

  • To well utilize the relations among different sub-tasks, a novel fusion function is designed based on the confidence degree and the uncertainty degree, which can adaptively adjust the weights of the intermediate predictions from all the sub-tasks, acquiring better final predictions.

  • We evaluate the performance of the proposed model on three image datasets. The experimental results demonstrate that considering the relations among different sub-tasks can improve the classification results, and our proposed model can achieve state-of-the-art performance.

Abstract

Hierarchical classification is significant for big data, where the original task is divided into several sub-tasks to provide multi-granularity predictions based on a tree-shape label structure. These sub-tasks are highly correlated: results of the coarser-grained sub-tasks can reduce the candidates for the fine-grained sub-tasks, while results of the fine-grained sub-tasks provide attributes describing the coarser-grained classes. A human can integrate feedback from all the related sub-tasks instead of considering each sub-task independently. Therefore, we propose a deep collaborative multi-task network for hierarchical image classification. Specifically, we first extract the relationship matrix between every two sub-tasks defined by the hierarchical label structure. Then, the information of each sub-task is broadcast to all the related sub-tasks through the relationship matrix. Finally, to combine this information, a novel fusion function based on the task evaluation and the decision uncertainty is designed. Extensive experimental results demonstrate that our model can achieve state-of-the-art performance.

Introduction

Recently, deep learning has achieved dramatic performance in image classification [1], [2] with the increase of annotated data. The explosion of data makes the classification task more and more complicated, as hundreds or even thousands of categories must be distinguished. It is challenging to pick out the correct label from such a huge number of candidate categories. Fortunately, a tree-shape label structure is usually adopted to efficiently manage these large-scale data. For example, ImageNet contains more than 20,000 objects organized by the semantic hierarchy of WordNet [3], [4], [5], [6], and the Scene UNderstanding (SUN) database contains 908 scene categories with a manually built overcomplete three-level hierarchy [7]. These label structures make it possible to quickly retrieve data from massive samples. Moreover, they offer important knowledge about the relationships among categories at different granularities, which can improve classification performance. Fig. 1 shows an example of the tree-shape label structure, where “whale” and “shark” are difficult to distinguish. However, if “mammal” is recognized in the coarser-grained sub-task, then the fine-grained sub-task only needs to distinguish “whale” from “giraffe”. Conversely, if “lizard” and “crocodile” are highly recommended in the fine-grained sub-task, then “reptile” is promoted for the coarser-grained sub-task.
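The two directions of the Fig. 1 example can be sketched with a toy tree-shape label structure; the dictionary layout and function names below are illustrative, not the paper's data format.

```python
# Toy tree-shape label structure following the Fig. 1 example:
# each coarser-grained class maps to its fine-grained children.
hierarchy = {
    "mammal": ["whale", "giraffe"],
    "fish": ["shark"],
    "reptile": ["lizard", "crocodile"],
}

def fine_candidates(coarse_label):
    """A coarser-grained result reduces the fine-grained candidate set."""
    return hierarchy[coarse_label]

def coarse_of(fine_label):
    """A fine-grained result promotes its coarser-grained ancestor."""
    for coarse, children in hierarchy.items():
        if fine_label in children:
            return coarse
    raise KeyError(fine_label)

# Recognizing "mammal" leaves only "whale" vs. "giraffe" to distinguish,
# while choosing "lizard" promotes "reptile" at the coarser grain.
```
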

Consequently, hierarchical classification, which integrates the tree-shape label structure into the classification task, has attracted increasing attention in depth estimation [8], scalable person search [9], image classification [10] and feature selection [11]. To make use of the label structure, a top-down strategy has been proposed, which predicts a sample from the root node with a coarser-grained category down to a leaf node with a fine-grained category. For each internal node, a classifier is trained to split a coarser-grained class into several smaller concepts at the lower level. The top-down strategy works well for linear models [12], [13], but fails for deep models, because organizing and training multiple deep models hierarchically is time-consuming and memory-intensive. Moreover, since deep models following the top-down strategy are trained independently, shared knowledge among them is ignored, making the training procedure inefficient. Therefore, new paradigms are required for deep models to perform hierarchical classification. Some researchers introduce the multi-task architecture to learn the shared knowledge among different sub-tasks, which are made up of categories from the same grain in the tree-shape label structure [14], [15].
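As a concrete illustration, the top-down strategy amounts to a greedy descent along the label tree; the tree, scores, and function below are hypothetical stand-ins for the per-node classifiers described above.

```python
# Hypothetical sketch of the top-down strategy: at each internal node,
# a per-node classifier (here replaced by precomputed scores) picks one
# child, descending from the root until a fine-grained leaf is reached.
tree = {
    "root": ["mammal", "reptile"],
    "mammal": ["whale", "giraffe"],
    "reptile": ["lizard", "crocodile"],
}

def top_down_predict(scores, tree):
    node, path = "root", []
    while node in tree:  # internal nodes have children; leaves do not
        node = max(tree[node], key=lambda c: scores[c])
        path.append(node)
    return path  # coarse-to-fine prediction path

scores = {"mammal": 0.7, "reptile": 0.3, "whale": 0.6,
          "giraffe": 0.4, "lizard": 0.5, "crocodile": 0.5}
```

Note that a mistake at a coarse node cannot be recovered further down the cascade, which is one more reason to let the sub-tasks inform each other rather than decide strictly top-down.
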

One important issue for the deep multi-task architecture is how to grasp the relationships among the sub-tasks. Different from multi-task applications where the sub-tasks are quite different and have no definite relation, sub-tasks in hierarchical classification are explicitly related. This relationship can help to infer the correct category for each sub-task: results of the coarser-grained sub-tasks can reduce the candidates for the fine-grained sub-tasks, and results of the fine-grained sub-tasks provide multiple additional attributes describing the corresponding coarse-grained class. Take the image recognition process shown in Fig. 1 as an example. If the coarse-grained classifier has recognized “mammal”, then the fine-grained classifier only needs to distinguish “whale” from “giraffe” rather than the easily confused “whale” and “shark” (see the green arrows). Conversely, if the fine-grained classifier has chosen “giraffe” or one of its siblings, then the coarser-grained classifier should be more confident about “mammal” (see the red arrows). However, few efforts have been made to explore the explicit relations among sub-tasks in deep multi-task models.

In this paper, we solve this problem in a decision-fusion way. Each sub-task receives two kinds of decisions: results from its own classifier (self-determination), and candidates promoted by the other sub-tasks’ classifiers through the label structure (others-promotion). Then, we must decide how to combine them to make the final decision. People are good at integrating multiple suggestions. For example, a hypothyroidism patient feels inappetence and sleepiness (shown in Fig. 2). She wants to see a doctor and has to decide which department to register with. Before going to the hospital, she visits her family physician for advice. Due to a lack of experience, the family physician suggests the psychiatry department (an inappropriate department). She may also search on the web. Based on the descriptions from the web and her own feelings, she guesses it is hypothyroidism, which belongs to the endocrinology department. Weighing the suggestion from the family physician against her self-determination, she can make the right final decision. When integrating multiple pieces of information, the following factors are considered. (1) Confidence Degree in the Adviser. In this example, if the patient trusts the family physician more, then the physician’s advice is valued more than her own speculation. (2) Uncertainty Degree of the Decisions. Although the family physician is more reliable, she may doubt her own diagnosis. In that case, her advice should receive less attention.

To mimic the process of human decision making, we define three items. (1) Confidence degree of each sub-task classifier, to measure whose results are more reliable: sub-tasks closer to the target sub-task in the label structure should be valued more (i.e., given a higher confidence degree), as they are more relevant to the target sub-task (just as the family physician is more trusted by the patient). (2) Uncertainty degree of the self-determination, to decide whether others-promotions are important: if a sub-task can definitely recognize a sample, then others-promotions are not essential; conversely, when it is confused, others-promotions are valued more. (3) Uncertainty degree of the others-promotions, to evaluate which promotion should be valued more, reducing the impact of incorrect others-promotions. Accordingly, we propose a deep collaborative multi-task network (DC-MTN) for hierarchical classification, where the classification problem at each level of the hierarchy is regarded as a sub-task to obtain multi-granularity intermediate predictions. The intermediate prediction of each sub-task is then propagated to all the other sub-tasks through the tree-shape label structure to promote some candidates. Consequently, each sub-task receives two kinds of information: its own intermediate predictions (i.e., self-determination) and candidates promoted by all the other sub-tasks (i.e., others-promotions). Finally, each sub-task combines this information to make a final decision based on the three aforementioned degrees. In this way, the relations among different sub-tasks are well explored. The contributions of this paper are summarized as follows.
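To make the three degrees concrete, here is a minimal Python sketch of one possible fusion. The entropy-based uncertainty and the confidence-over-uncertainty weighting are illustrative assumptions for this sketch, not the paper's exact fusion function, which is defined later in Section 3.

```python
import math

def entropy(p):
    """Shannon entropy as an uncertainty degree: higher = more confused."""
    return -sum(q * math.log(q) for q in p if q > 0)

def fuse(p_self, promotions, confidences):
    """Illustrative fusion of self-determination and others-promotions.

    p_self      : this sub-task's own softmax prediction
    promotions  : candidate distributions broadcast from other sub-tasks
                  through the label structure (one per related sub-task)
    confidences : confidence degree of each promoting sub-task

    The weight (u_self * confidence / (1 + promotion uncertainty)) is a
    stand-in for the paper's learned fusion function: a confused
    self-determination gives more room to confident, trusted promotions.
    """
    u_self = entropy(p_self)
    w = [u_self * c / (1.0 + entropy(p))
         for c, p in zip(confidences, promotions)]
    fused = list(p_self)
    for wi, p in zip(w, promotions):
        fused = [f + wi * q for f, q in zip(fused, p)]
    z = sum(fused)
    return [f / z for f in fused]  # renormalize to a distribution
```

For instance, with a uniform (maximally uncertain) self-determination `[0.5, 0.5]` and a single confident promotion `[0.9, 0.1]`, the fused distribution shifts toward the promoted candidate.
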

  • We propose a deep collaborative multi-task learning framework for hierarchical classification, where each prediction problem in the hierarchy is regarded as a sub-task to obtain multi-granularity intermediate predictions.

  • To well utilize the relations among different sub-tasks, a novel fusion function is designed based on the confidence degree and the uncertainty degree, which can adaptively adjust the weights of the intermediate predictions from all the sub-tasks, acquiring better final predictions.

  • We evaluate the proposed method on three image datasets. Our model achieves the best results compared with state-of-the-art methods.

The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 details the proposed DC-MTN and the message fusion function based on label structures. Experimental results and analysis on three image datasets are presented in Section 4. Finally, we conclude our work and discuss future work in Section 5.


Related work

Our work is highly related to the top-down strategy for hierarchical classification and to deep multi-task learning models. We illustrate each of them in the following subsections.

The proposed DC-MTN

In this section, we give details of the proposed DC-MTN for hierarchical classification. First, we describe the architecture of the proposed model and its pipeline for hierarchical classification. Then, the fusion function that combines the self-determination and the others-promotions is introduced. Next, the training and inference procedures are described. Finally, we discuss how to extend DC-MTN when multiple label structures are available.

Given a dataset consisting of {xi, {yij}j=1

Experiment

In this section, we first describe the image datasets with tree-shape label structures, the experimental settings, the compared methods, and the evaluation metrics used in our experiments. Then, the proposed DC-MTN is compared with state-of-the-art methods. Next, the effect of some parameters in DC-MTN is analyzed. Finally, we show the results of DC-MTN with multiple label structures.

Conclusion and future work

In this paper, we have proposed a deep collaborative multi-task network (DC-MTN) to utilize the relationships between different sub-tasks defined by the tree-shape label structure. In the DC-MTN, the classification problem at each hierarchy along the label tree is regarded as a sub-task, and the final results are decided by its self-determination and others-promotions. To effectively combine self-determination and others-promotions, a fusion function has been designed to integrate these values

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China [grant numbers 62006221, 62106174]; the Open Research Project of the State Key Laboratory of Media Convergence and Communication, Communication University of China [grant number SKLMCC2020KF004]; the Beijing Municipal Science & Technology Commission [grant number Z191100007119002]; and the Key Research Program of Frontier Sciences, CAS [grant number ZDBS-LY-7024].

Yu Zhou holds a PhD degree from HIT, China, and is an Associate Professor and PhD supervisor at IIE, CAS, with research interests in computer vision and deep learning. He has served as an AC/SPC/PC member or reviewer for CVPR/AAAI/ICME/PR/TMM/TCSVT, etc. He has published over 40 papers in peer-reviewed journals and conferences including CVPR/AAAI/ACM MM, and PIMNet was selected as a best paper candidate at ACM MM 2021.

References (40)

  • M. Guillaumin et al., Large-scale knowledge transfer for object localization in ImageNet, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
  • D. Lin, WordNet: an electronic lexical database, Computational Linguistics, 1999.
  • J. Xiao et al., SUN database: exploring a large collection of scene categories, International Journal of Computer Vision (IJCV), 2016.
  • T. Hoyoux et al., Can computer vision problems benefit from structured hierarchical classification?, Machine Vision and Applications (MVA), 2016.
  • H. Zhao et al., Hierarchical feature selection with recursive regularization, Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI), 2017.
  • S. Bengio et al., Label embedding trees for large multi-class tasks, Proceedings of Advances in Neural Information Processing Systems (NIPS), 2010.
  • N. Zhou et al., Jointly learning visually correlated dictionaries for large-scale visual recognition applications, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2014.
  • S. Xie et al., Hyper-class augmented and regularized deep learning for fine-grained image classification, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
  • H. Hu et al., Learning structured inference neural networks with label relations, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  • M. Sun et al., Find the best path: an efficient and accurate classifier for image hierarchies, Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2013.


    Xiaoni Li is currently a graduate student at the School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China. Her research interests mainly include machine learning and pattern recognition. In these domains, she has already contributed to the work on hierarchical classification and unsupervised learning.

    Yucan Zhou received her Ph.D. degree from the College of Artificial Intelligence, Tianjin University in 2019. She is currently an assistant professor in the Institute of Information Engineering, Chinese Academy of Sciences. Her research interests include artificial intelligence, deep learning, long-tail distribution learning, and hierarchical classification.

    Yu Wang is an assistant professor in the College of Artificial Intelligence, Tianjin University. His research interests include multi-granularity modeling, open set recognition, and class incremental learning in machine learning. He has published many peer-reviewed papers in venues such as IEEE ICDM, IEEE TKDE, IEEE TFS, and PR.

    Qinghua Hu is a full professor and the Dean of the School of Artificial Intelligence, Tianjin University. He is currently supported by the Key Program of the National Natural Science Foundation of China. He has published over 200 papers. His research interests include uncertainty modeling, big data, machine learning, and intelligent unmanned systems.

    Weiping Wang is a professor and the director of the Big Data Research Laboratory in the Institute of Information Engineering, Chinese Academy of Sciences. His research interests include big data and artificial intelligence. He has undertaken more than 30 national projects and published more than 100 papers, including in TPAMI, CVPR, NIPS, AAAI, and IJCAI.

    1 These authors contributed equally to this article.
