Elsevier

Pattern Recognition

Volume 133, January 2023, 109056

Continuous label distribution learning

https://doi.org/10.1016/j.patcog.2022.109056

Highlights

  • We propose a novel LDL method named CLDL, which utilizes continuous label distribution information to construct LDL models.

  • We describe labels as a continuous distribution in the latent space, where only a few parameters need to be learned.

  • We propose an effective and scalable strategy for learning continuous label distribution based on theoretical analysis.

  • We systematically analyze the CLDL method. The analysis illustrates the superiority of CLDL over the existing LDL algorithms.

Abstract

Label distribution learning (LDL) is a suitable paradigm for dealing with label ambiguity by learning the correlations among different labels. Most existing LDL methods treat the labels as discrete and directly establish the mapping from features to labels. However, in many real-world applications, labels naturally form a continuous distribution, which existing methods ignore. As a result, the distribution information of the labels cannot be accurately described, which ultimately degrades the whole learning system. The goal of this paper is to propose a novel approach that captures the continuous distribution of different labels explicitly and effectively. Specifically, we propose Continuous Label Distribution Learning (CLDL), which describes labels as a continuous density function and learns the distribution information of the labels in the latent space. In this way, the high-order correlations among different labels can be effectively extracted, and only a few parameters describing the continuous distribution need to be learned. Extensive description degree prediction experiments on real-world datasets validate the superiority of CLDL over the existing approaches.

Introduction

Learning with ambiguity is a hot topic in recent machine learning and data mining research. A learning process is essentially the building of a mapping from instances to labels [1]. A classical paradigm for dealing with label ambiguity is multi-label learning [2]. Multi-label learning studies the problem where each example is represented by a single instance while being associated with a set of labels simultaneously, and the task is to learn a multi-label predictor which maps an instance to a relevant label set [3], [4]. Essentially, multi-label learning considers the relation between the instance and the label to be binary, i.e., whether or not the label is relevant to the instance.

At present, there are a variety of real-world tasks in which instances are associated with labels of different importance degrees [5], [6], [7]. Although multi-label learning answers the question "Which labels describe this instance?", it does not address the more general issue of label ambiguity: "How do these labels describe this instance?". Consequently, a soft label instead of a hard one seems to be a reasonable solution. Inspired by this, a novel learning paradigm, Label Distribution Learning (LDL), has recently been proposed. LDL aims at learning the relative importance of each label involved in the description of an instance, i.e., a distribution over the set of labels. Formally speaking, given an instance x, LDL assigns each label y ∈ Y a real value d_x^y (the label description degree), which indicates the importance of y to x. To make the definition proper, [1] suggests that d_x^y ∈ [0, 1] and Σ_{y∈Y} d_x^y = 1, and the real-valued function d is called the label distribution function. LDL thus extends the supervision from binary labels to label distributions, which is more applicable to real-world scenarios.
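The two constraints on the description degrees can be checked mechanically. The following sketch (a hypothetical helper, not from the paper) validates a candidate label distribution against d_x^y ∈ [0, 1] and Σ_{y∈Y} d_x^y = 1:

```python
import numpy as np

def is_label_distribution(d, tol=1e-6):
    """Check that d is a valid label distribution: every description
    degree lies in [0, 1] and the degrees sum to 1 (up to tolerance)."""
    d = np.asarray(d, dtype=float)
    return bool(np.all(d >= 0.0) and np.all(d <= 1.0)
                and abs(d.sum() - 1.0) <= tol)

# A soft label over three labels, e.g. candidate ages {24, 25, 26}:
print(is_label_distribution([0.2, 0.5, 0.3]))  # True
print(is_label_distribution([0.7, 0.7]))       # False: degrees sum to 1.4
```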

Due to its utility in dealing with ambiguity explicitly, LDL has been extensively applied to a variety of real-world problems [8], [9], [10], [11]. Many attempts have been made to develop LDL methods from various facets. SA-IIS [7], SA-BFGS [12] and EDL [6] directly optimize the distance between the ground-truth and the predicted distribution. SCE-LDL [13] adds sparsity constraints to the objective function to ameliorate the LDL model. DLDL [14] introduces deep learning into LDL to solve various tasks in computer vision. LDL Forests [15] adopts differentiable decision trees in the LDL model. Besides, many algorithms have been proposed to exploit label correlations in LDL. EDL-LRL [16] exploits local label correlation by capturing low-rank structure on clusters of samples with trace-norm regularization. LALOT [17] casts the exploration of label correlations as a ground metric learning problem. LDLSF [18] exploits the label correlations and simultaneously learns common features for all labels and specific features for each label. GD-LDL-SCL [19] and Adam-LDL-SCL [20] encode the influence of local samples and design a local correlation vector as additional features for each instance to utilize the label correlations on local samples, which has proven empirically successful.

As shown in Fig. 1, in many real-world applications, e.g., age estimation, labels naturally form a continuous distribution. Taking a close look at Fig. 1(a), one may find that the faces at neighboring ages look quite similar. This results from the fact that aging is a slow and gradual process. In this sense, although the chronological age is unambiguous, the facial appearance age is ambiguous, and the label information can naturally be considered to form a continuous distribution [7]. However, due to the difficulty of obtaining and leveraging the continuous distribution, existing label distribution learning algorithms instead regard the labels as a discrete distribution and directly establish the mapping from features to discrete labels, which ignores the natural continuous characteristics of the labels. As a result, the distribution information of the labels cannot be effectively extracted, which reduces the effective information that LDL models can utilize. If the label distribution information can be utilized without losing precision, it benefits the whole learning system. Therefore, it is essential to effectively utilize the observed data to extract and describe the continuous label distribution information and integrate it into the LDL model.

In light of the above observations, we propose Continuous Label Distribution Learning (CLDL), which utilizes the continuous label distribution hidden behind the given label information to construct the LDL model. Specifically, CLDL describes labels as a continuous distribution and learns the distribution in the latent space. CLDL first converts discrete label distributions into continuous density functions through a label encoding process. Then CLDL establishes the mapping from instances to continuous density functions. Finally, the predicted label distribution is obtained through a decoding process. In this way, LDL models can be trained accurately while effectively integrating the label correlation information. In the meantime, only a few parameters describing the continuous distribution need to be learned to extract high-order correlations among different labels. Additionally, a practical algorithm for CLDL is explored.
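The paper's actual latent-space encoding is not reproduced here; purely as a conceptual sketch of the encode/decode round trip, one can smooth the discrete degrees into a density with a Gaussian mixture and read a discrete distribution back off by evaluating the density at the label positions (the `encode` and `decode` helpers below are hypothetical illustrations, not the authors' algorithm):

```python
import numpy as np

def encode(labels, degrees, bandwidth=1.0):
    """Turn a discrete label distribution into a continuous density:
    a Gaussian mixture with one component per label (illustrative only)."""
    labels = np.asarray(labels, dtype=float)
    degrees = np.asarray(degrees, dtype=float)

    def density(t):
        t = np.atleast_1d(np.asarray(t, dtype=float))
        # Gaussian kernel of each query point against each label position.
        k = np.exp(-0.5 * ((t[:, None] - labels[None, :]) / bandwidth) ** 2)
        k /= bandwidth * np.sqrt(2.0 * np.pi)
        return (k * degrees[None, :]).sum(axis=1)

    return density

def decode(density, labels):
    """Recover a discrete label distribution by evaluating the density at
    the label positions and renormalizing to sum to 1."""
    d = density(np.asarray(labels, dtype=float))
    return d / d.sum()

ages = [24, 25, 26]
f = encode(ages, [0.2, 0.5, 0.3], bandwidth=0.5)
# Smoothing pulls a little mass toward neighboring labels,
# reflecting the gradual, continuous nature of aging.
print(np.round(decode(f, ages), 3))
```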

In summary, the main contributions of this paper can be stated as follows:

  • We propose the continuous label distribution learning (CLDL) method. CLDL is a novel label distribution learning method which utilizes continuous label distribution information to construct the LDL model.

  • We theoretically describe labels as a continuous distribution in the latent infinite-dimensional space, where only a few parameters need to be learned.

  • We propose an effective and scalable strategy for learning the continuous label distribution based on the theoretical analysis.

  • We theoretically prove the feasibility of the continuous label distribution and systematically analyze the CLDL method. The analysis illustrates the superiority of CLDL over the existing LDL algorithms.

The rest of the paper is organized as follows: Section 2 presents works related to CLDL. Section 3 introduces the proposed CLDL method and gives analyses of CLDL. Section 4 provides the empirical results of the proposed CLDL. Section 5 concludes the paper and discusses the prospects of CLDL.

Section snippets

Related work

Label distribution learning (LDL) is a novel learning paradigm which assigns an instance a label distribution and learns a mapping from instance to label distribution directly [1]. LDL has been successfully applied to many real applications, such as facial landmark detection [22], age estimation [7], [9], [23], [24], head pose estimation [25], zero-shot learning [26], emotion analysis [16], [27], [28], [29], [30], video parsing [31], autism spectrum disorder classification [32], fetal brain

Preliminary

First of all, the main notations used in this paper are listed as follows. Let X = R^q denote the input space and Y = {y_1, y_2, …, y_c} denote the complete set of labels. The instance variable is denoted by x, the particular i-th instance is denoted by x_i, and the description degree of the j-th label for x_i is denoted by d_{x_i}^{y_j}. We consider the dataset S = {(x_i, D_i)}_{i=1}^{N}, where x_i ∈ X = R^q is a q-dimensional real-valued vector, and D_i = (d_{x_i}^{y_1}, d_{x_i}^{y_2}, …, d_{x_i}^{y_c})^T ∈ D = [0, 1]^c is the corresponding c-dimensional
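Under these notations, a toy dataset S = {(x_i, D_i)} can be represented directly; the data below are randomly generated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, q, c = 5, 4, 3          # N instances, q feature dimensions, c labels

# Instances x_i in X = R^q, stacked row-wise into an N x q matrix.
X = rng.normal(size=(N, q))

# Label distributions D_i in [0, 1]^c: non-negative rows normalized
# so that each row of description degrees sums to 1.
D = rng.random(size=(N, c))
D = D / D.sum(axis=1, keepdims=True)

print(D.sum(axis=1))       # each row sums to 1
```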

Experiments

In this section, the performance of CLDL is evaluated on various LDL tasks. All the computations are performed on a GPU server with an NVIDIA GeForce RTX 2080 Ti GPU, an Intel Core i7-8700 CPU at 3.20 GHz, and 32 GB of memory.

Conclusion

In this paper, a novel label distribution learning approach named CLDL is proposed, which effectively integrates the correlation and continuous distribution information of different labels into the LDL models. CLDL can learn the continuous representation of labels to extract the high-order correlation among different labels effectively with only a few parameters and ameliorate the scalability of the model significantly. The effectiveness of CLDL is proved by both theoretical analysis and

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Key Research and Development Plan of China under Grant 2018AAA0100104, the National Science Foundation of China under Grants 62125602, 62076063 and 62206050, China Postdoctoral Science Foundation under Grant 2021M700023, and the Jiangsu Province Science Foundation for Youths under Grant BK20210220.

Xingyu Zhao received the B.Sc. and M.Sc. degrees in School of Computer Science and Technology, China University of Mining and Technology in 2016 and 2019, respectively. He is currently pursuing the Ph.D. degree in the School of Computer Science and Engineering, Southeast University. His research interests mainly include pattern recognition and machine learning.

References (67)

  • J. Wang et al.

    Multi-class ASD classification via label distribution learning with class-shared and class-specific decomposition

    Med. Image Anal.

    (2022)
  • K. Hornik

    Approximation capabilities of multilayer feedforward networks

    Neural Netw.

    (1991)
  • M.R. Boutell et al.

    Learning multi-label scene classification

    Pattern Recognit.

    (2004)
  • X. Geng

    Label distribution learning

    IEEE Trans. Knowl. Data Eng.

    (2016)
  • G. Tsoumakas et al.

    Multi-label classification: an overview

    Int. J. Data Wareh. Min.

    (2007)
  • M.-L. Zhang et al.

    A review on multi-label learning algorithms

    IEEE Trans. Knowl. Data Eng.

    (2014)
  • E. Gibaja et al.

    A tutorial on multilabel learning

    ACM Comput. Surv.

    (2015)
  • Z.-H. Zhou et al.

    Multi-instance multi-label learning with application to scene classification

    Advances in Neural Information Processing Systems 19: Annual Conference on Neural Information Processing Systems 2006

    (2006)
  • Y. Zhou et al.

    Emotion distribution recognition from facial expressions

    Proceedings of the 23rd ACM International Conference on Multimedia

    (2015)
  • X. Geng et al.

    Facial age estimation by learning from label distributions

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2013)
  • P. Hou et al.

    Semi-supervised adaptive label distribution learning for facial age estimation

    Proceedings of the 31st AAAI Conference on Artificial Intelligence

    (2017)
  • H. Zhang et al.

    Practical age estimation using deep label distribution learning

    Front. Comput. Sci.

    (2021)
  • S. Chen et al.

    Label distribution learning on auxiliary label space graphs for facial expression recognition

    Proceedings of the 36th IEEE Conference on Computer Vision and Pattern Recognition

    (2020)
  • Y. Gao et al.

    Video summarization via label distributions dual-reward

    Proceedings of the 30th International Joint Conference on Artificial Intelligence

    (2021)
  • X. Geng et al.

    Label distribution learning

Proceedings of the 13th IEEE International Conference on Data Mining

    (2013)
  • X. Yang et al.

    Sparsity conditional energy label distribution learning for age estimation

    Proceedings of the 25th International Joint Conference on Artificial Intelligence

    (2016)
  • B.-B. Gao et al.

    Deep label distribution learning with label ambiguity

    IEEE Trans. Image Process.

    (2017)
  • W. Shen et al.

    Label distribution learning forests

    Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017

    (2017)
  • X. Jia et al.

    Facial emotion distribution learning by exploiting low-rank label correlations locally

    Proceedings of the 35th IEEE Conference on Computer Vision and Pattern Recognition

    (2019)
  • P. Zhao et al.

    Label distribution learning by optimal transport

    Proceedings of the 32nd AAAI Conference on Artificial Intelligence

    (2018)
  • T. Ren et al.

    Label distribution learning with label-specific features

    Proceedings of the 28th International Joint Conference on Artificial Intelligence

    (2019)
  • X. Zheng et al.

    Label distribution learning by exploiting sample correlations locally

    Proceedings of the 32nd AAAI Conference on Artificial Intelligence

    (2018)
  • X. Jia et al.

    Label distribution learning with label correlations on local samples

    IEEE Trans. Knowl. Data Eng.

    (2021)
  • A. Lanitis et al.

    Toward automatic simulation of aging effects on face images

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2002)
  • K. Su et al.

    Soft facial landmark detection by label distribution learning

    Proceedings of the 33rd AAAI Conference on Artificial Intelligence

    (2019)
  • B.-B. Gao et al.

    Age estimation using expectation of label distribution learning

    Proceedings of the 27th International Joint Conference on Artificial Intelligence

    (2018)
  • X. Wen et al.

    Adaptive variance based label distribution learning for facial age estimation

    Proceedings of the 16th European Conference on Computer Vision

    (2020)
  • X. Geng et al.

    Head pose estimation based on multivariate label distribution

    Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition

    (2014)
  • Z.-W. Huo et al.

    Ordinal zero-shot learning

    Proceedings of the 26th International Joint Conference on Artificial Intelligence

    (2017)
  • D. Zhou et al.

    Emotion distribution learning from texts

    Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

    (2016)
  • Y. Zhang et al.

    Text emotion distribution learning via multi-task convolutional neural network

    Proceedings of the 27th International Joint Conference on Artificial Intelligence

    (2018)
  • H. Xiong et al.

    Structured and sparse annotations for image emotion distribution learning

    Proceedings of the 33rd AAAI Conference on Artificial Intelligence

    (2019)
  • J. Yang et al.

    A circular-structured representation for visual emotion distribution learning

    Proceedings of the 37th IEEE Conference on Computer Vision and Pattern Recognition

    (2021)
  • Yuexuan An received the B.Sc. degree in computer science and technology from Jiangsu Normal University in 2015 and the M.Sc. degree in computer application technology from China University of Mining and Technology in 2019. She is currently pursuing the Ph.D. degree in the School of Computer Science and Engineering, Southeast University. Her research interests include machine learning, pattern recognition, SVM, kernel functions and various applications.

    Ning Xu received the B.Sc. and M.Sc. degrees from University of Science and Technology of China and Chinese Academy of Sciences, China, respectively, and the Ph.D. degree from Southeast University, China. He is now an assistant professor in the School of Computer Science and Engineering at Southeast University, China. His research interests mainly include pattern recognition and machine learning.

    Xin Geng is a chair professor of Southeast University, China. His research interests include machine learning, pattern recognition, and computer vision. He has published over 100 refereed papers in these areas. He has been an Associate Editor of IEEE T-MM, FCS and MFC, a Steering Committee Member of PRICAI, a Program Committee Chair for conferences such as PRICAI’18, VALSE’13, etc., an Area Chair for conferences such as IJCAI, CVPR, ACMMM, ICPR, and a Senior Program Committee Member for conferences such as IJCAI, AAAI, ECAI, etc. He is a Distinguished Fellow of IETI and a Senior Member of IEEE.
