
Learn structured analysis discriminative dictionary for multi-label classification

Published in: Applied Intelligence

Abstract

Multi-label learning is a machine learning classification problem in which an example belongs to more than one class at the same time. Recently, multi-label learning has attracted a great deal of attention and has achieved notable success in text and image classification. In this paper, we propose a new method for multi-label learning, named analysis discriminative dictionary learning for multi-label classification (ADML). We first incorporate analysis discriminative dictionary learning and sparse representation into a multi-label classifier to obtain a unified model, in which an incoherence-promoting term and the reconstruction error for each label are minimized to learn the dictionary. We then add an analysis-inconsistency-promoting term to the model, which minimizes the reconstruction error of the dictionary with respect to the corresponding label of the data. Further, we learn a linear classifier that takes label relationships into account; notably, the label relationships are considered implicitly in both the analysis dictionary and the linear classifier. Finally, we conduct experiments on 15 datasets to compare the proposed ADML method with baselines. The results show that ADML delivers higher performance than previous multi-label methods.


Notes

  1. http://cse.seu.edu.cn/people/zhangml/Resources.htm#data

  2. http://languagelog.ldc.upenn.edu/nll/

  3. http://mlkd.csd.auth.gr/multilabel.html#Datasets

References

  1. Jin C, Jin S-W (2019) Multi-label automatic image annotation approach based on multiple improvement strategies. IET Image Process 13(4):623–633

  2. Wang X, Feng S, Lang C (2019) Semi-supervised dual low-rank feature mapping for multi-label image annotation. Multimed Tools Appl 78(10):13149–13168

  3. Lee J, Yu I, Park J, Kim D-W (2019) Memetic feature selection for multilabel text categorization using label frequency difference. Inform Sci 485:263–280

  4. Al-Salemi B, Ayob M, Noah SAM (2018) Feature ranking for enhancing boosting-based multi-label text categorization. Expert Syst Appl 113:531–543

  5. Chen Z, Ren J (2021) Multi-label text classification with latent word-wise label information. Appl Intell 51(2):966–979

  6. Lee J, Seo W, Park J-H, Kim D-W (2019) Compact feature subset-based multi-label music categorization for mobile devices. Multimed Tools Appl 78(4):4869–4883

  7. Ma Q, Yuan C, Zhou W, Han J, Hu S (2020) Beyond statistical relations: integrating knowledge relations into style correlations for multi-label music style classification. In: WSDM '20: the thirteenth ACM international conference on web search and data mining, Houston, TX, USA, February 3-7, 2020, pp 411–419

  8. Kostiuk B, Costa YMG, de Souza Britto A Jr, Hu X, Silla CN (2019) Multi-label emotion classification in music videos using ensembles of audio and video features. In: 31st IEEE international conference on tools with artificial intelligence (ICTAI 2019), Portland, OR, USA, November 4-6, 2019, pp 517–523

  9. Lv J, Wu T, Peng C-L, Liu Y-P, Xu N, Geng X (2020) Compact learning for multi-label classification. Pattern Recogn 113:107833

  10. Zhang M, Zhou Z (2007) ML-kNN: a lazy learning approach to multi-label learning. Pattern Recogn 40(7):2038–2048

  11. Zhang M, Zhou Z (2007) ML-kNN: a lazy learning approach to multi-label learning. Pattern Recogn 40(7):2038–2048

  12. Cheng W, Hüllermeier E (2009) Combining instance-based learning and logistic regression for multilabel classification. Mach Learn 76(2):211–225

  13. Read J, Pfahringer B, Holmes G, Frank E (2011) Classifier chains for multi-label classification. Mach Learn 85(3):333–359

  14. Teisseyre P (2021) Classifier chains for positive unlabelled multi-label learning. Knowl-Based Syst 213:106709

  15. Weng W, Wang D, Chen C-L, Wen J, Wu S (2020) Label specific features-based classifier chains for multi-label classification. IEEE Access 8:51265–51275

  16. Boutell MR, Luo J, Shen X, Brown CM (2004) Learning multi-label scene classification. Pattern Recogn 37(9):1757–1771

  17. Wu G, Tian Y, Zhang C (2018) A unified framework implementing linear binary relevance for multi-label learning. Neurocomputing 289:86–100

  18. Moral-García S, Mantas CJ, Castellano JG, Abellán J (2018) Using credal-C4.5 with binary relevance for multi-label classification. J Intell Fuzzy Syst 35(6):6501–6512

  19. Kong X, Ng MK, Zhou Z (2013) Transductive multilabel learning via label set propagation. IEEE Trans Knowl Data Eng 25(3):704–719

  20. Shan J, Hou C, Tao H, Zhuge W, Yi D (2019) Co-learning binary classifiers for LP-based multi-label classification. Cogn Syst Res 55:146–152

  21. Tsoumakas G, Vlahavas I (2007) Random k-labelsets: an ensemble method for multilabel classification. In: Machine learning: ECML 2007, 18th European conference on machine learning, Warsaw, Poland, September 17-21, 2007, pp 406–417

  22. Wu Y, Lin H (2017) Progressive random k-labelsets for cost-sensitive multi-label classification. Mach Learn 106(5):671–694

  23. Zhou T, Yang S, Wang L, Yao J, Gui G (2018) Improved cross-label suppression dictionary learning for face recognition. IEEE Access 6:48716–48725

  24. Wang Y, Liu S, Peng Y, Cao H (2018) Discriminative dictionary learning based on sample diversity for face recognition. In: 19th Pacific rim conference on multimedia, vol 2, pp 538–546

  25. Foroughi H, Shakeri M, Ray N, Zhang H (2017) Face recognition using multi-modal low-rank dictionary learning. In: International conference on image processing, pp 1082–1086

  26. Meng Y, Chang H, Luo W (2017) Discriminative analysis-synthesis dictionary learning for image classification. Neurocomputing 219:404–411

  27. Rong Y, Xiong S, Gao Y (2017) Low-rank double dictionary learning from corrupted data for robust image classification. Pattern Recogn 72:419–432

  28. Yang M, Chang H, Luo W, Yang J (2017) Fisher discrimination dictionary pair learning for image classification. Neurocomputing 269:13–20

  29. Aharon M, Elad M, Bruckstein AM (2006) K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans Signal Process 54(11):4311–4322

  30. Yang M, Liu W, Luo W, Shen L (2016) Analysis-synthesis dictionary learning for universality-particularity representation based classification. In: AAAI conference on artificial intelligence, pp 2251–2257

  31. Jing X-Y, Wu F, Li Z, Hu R, Zhang D (2016) Multi-label dictionary learning for image annotation. IEEE Trans Image Process 25(6):2712–2725

  32. Ji Z, Cui B, Li H, Jiang Y-G, Xiang T, Hospedales TM, Fu Y (2020) Deep ranking for image zero-shot multi-label classification. IEEE Trans Image Process 29:6549–6560

  33. Ma J, Zhang H, Chow TWS (2021) Multilabel classification with label-specific features and classifiers: a coarse- and fine-tuned framework. IEEE Trans Cybern 51(2):1028–1042

  34. Pereira RB, Plastino A, Zadrozny B, Merschmann LHC (2021) A lazy feature selection method for multi-label classification. Intell Data Anal 25(1):21–34

  35. Dong H, Sun J, Sun X, Ding R (2020) A many-objective feature selection for multi-label classification. Knowl-Based Syst 208:106456

  36. Almeida TB, Borges HB (2017) An adaptation of the ML-kNN algorithm to predict the number of classes in hierarchical multi-label classification. In: Modeling decisions for artificial intelligence - 14th international conference, MDAI 2017, Kitakyushu, Japan, October 18-20, 2017, pp 77–88

  37. Cheng Z, Zeng Z (2020) Joint label-specific features and label correlation for multi-label learning with missing label. Appl Intell 50(11):4029–4049

  38. Agrawal P, Whitaker RT, Elhabian SY (2020) An optimal, generative model for estimating multi-label probabilistic maps. IEEE Trans Med Imaging 39(7):2316–2326

  39. Wu G, Zheng R, Tian Y, Liu D (2020) Joint ranking SVM and binary relevance with robust low-rank learning for multi-label classification. Neural Netw 122:24–39

  40. Abdi A, Rahmati M, Ebadzadeh MM (2021) Entropy based dictionary learning for image classification, vol 110, p 107634

  41. Yang B, Guan X-P, Zhu J, Gu C, Wu K, Xu J (2021) SVMs multi-class loss feedback based discriminative dictionary learning for image classification, vol 112, p 107690

  42. Peng Y, Liu S, Wang X, Wu X (2020) Joint locality-constraint and fisher discrimination based dictionary learning for image classification. Neurocomputing 398:505–519

  43. Yang X, Jiang X, Tian C, Wang P, Zhou F, Fujita H (2020) Inverse projection group sparse representation for tumor classification: a low rank variation dictionary approach, vol 196, p 105768

  44. Luo X, Xu Y, Yang J (2019) Multi-resolution dictionary learning for face recognition. Pattern Recogn 93:283–292

  45. Lin G, Yang M, Yang J, Shen L, Xie W (2018) Robust, discriminative and comprehensive dictionary learning for face recognition. Pattern Recogn 81:341–356

  46. Ou W, Luan X, Gou J, Zhou Q, Xiao W, Xiong X, Zeng W (2018) Robust discriminative nonnegative dictionary learning for occluded face recognition. Pattern Recogn Lett 107:41–49

  47. Du H, Zhang Y, Ma L, Zhang F (2021) Structured discriminant analysis dictionary learning for pattern classification, vol 216, p 106794

  48. Wang W, Yang C, Li Q (2019) Discriminative analysis dictionary and classifier learning for pattern classification. In: 2019 IEEE international conference on image processing (ICIP), pp 385–389

  49. Song J, Xie X, Shi G, Dong W (2018) Exploiting class-wise coding coefficients: learning a discriminative dictionary for pattern classification. Neurocomputing 321:114–125

  50. Wang Q, Guo Y, Guo J, Kong X (2018) Synthesis K-SVD based analysis dictionary learning for pattern classification. Multimed Tools Appl 77(13):17023–17041

  51. Dong J, Sun C, Yang W (2015) A supervised dictionary learning and discriminative weighting model for action recognition. Neurocomputing 158:246–256

  52. Pham DS, Venkatesh S (2008) Joint learning and dictionary construction for pattern recognition. In: Computer vision and pattern recognition, pp 1–8

  53. Yang J, Yu K, Huang TS (2010) Supervised translation-invariant sparse coding. In: Computer vision and pattern recognition, pp 3517–3524

  54. Hou C, Nie F, Li X, Yi D, Wu Y (2014) Joint embedding learning and sparse regression: a framework for unsupervised feature selection. IEEE Trans Cybern 44(6):793–804

  55. Oramas S, Nieto O, Barbieri F, Serra X (2017) Multi-label music genre classification from audio, text, and images using deep features. In: Proceedings of the 18th international society for music information retrieval conference, ISMIR 2017, pp 23–30

  56. Trohidis K, Tsoumakas G, Kalliris G, Vlahavas I (2011) Multi-label classification of music by emotion. EURASIP J Audio Speech Music Process 2011:4

  57. Gorski J, Pfeuffer F, Klamroth K (2007) Biconvex sets and optimization with biconvex functions: a survey and extensions. Math Methods Oper Res 66(3):373–407

  58. Maimon O, Rokach L (eds) (2010) Data mining and knowledge discovery handbook, 2nd edn. Springer, Berlin. ISBN 978-0-387-09822-7

  59. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30

  60. Zhang Z, Jiang W, Qin J, Zhang L, Li F, Zhang M, Yan S (2018) Jointly learning structured analysis discriminative dictionary and analysis multiclass classifier. IEEE Trans Neural Netw Learn Syst 29(8):3798–3814

  61. Gu S, Zhang L, Zuo W, Feng X (2014) Projective dictionary pair learning for pattern classification. In: Advances in neural information processing systems (NeurIPS), pp 793–801


Acknowledgment

The authors would like to thank the anonymous referees for their valuable comments and suggestions. This work was supported in part by the Natural Science Foundation of China under Grants 62076074, 61876044 and 61672169, in part by the Guangdong Basic and Applied Basic Research Foundation under Grants 2020A151010670 and 2020A151011501, and in part by the Science and Technology Planning Project of Guangzhou under Grant 202002030141.

Author information

Corresponding author

Correspondence to Bo Liu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof of Theorem 1

Problem (11) is solved via its Lagrange dual function, which is expressed as follows:

$$ \begin{aligned} g(\mu)=&\inf\{\| X_{l}-D_{l}S_{l} \|_{F}^{2} +\alpha \| D_{l}\overline{S_{l}}\|_{F}^{2} \\ &+ \sum\limits_{i=1}^{k} \mu_{l,i} (\| d_{l,i}\|_{2}^{2} -1)\}, \end{aligned} $$
(29)

where μl,i is the Lagrange multiplier associated with the i-th atom. A diagonal matrix El ∈ ℝk×k is constructed with diagonal entries (El)ii = μl,i; the Lagrangian can then be rewritten as follows:

$$ \begin{aligned} L(D_{l},\mu)=&\| X_{l}-D_{l}S_{l} \|_{F}^{2} +\alpha \| D_{l}\overline{S_{l}}\|_{F}^{2} \\ &+tr(D_{l}^{T}D_{l}E_{l})-tr(E_{l}). \end{aligned} $$
(30)

Setting the derivative \(\frac {\partial L(D_{l},\mu )}{\partial D_{l}} \) to zero yields the closed-form solution for Dl:

$$ \begin{aligned} D_{l}^{*}=X_{l}{S_{l}^{T}}(S_{l}{S_{l}^{T}} +\alpha\overline{S_{l}}\,\overline{S_{l}}^{T} +E_{l})^{-1}. \end{aligned} $$
(31)

Following the work in [60], El is discarded, which reduces the computational complexity and cost. Note that \(S_{l}{S_{l}^{T}}+\alpha \overline {S_{l}}\overline {S_{l}}^{T}\) is not guaranteed to be invertible, so computing its inverse may run into a singularity issue. Therefore, as in [61], a regularization term 𝜃I (with the small constant 𝜃 = 10−4) is added to \(S_{l}{S_{l}^{T}}+\alpha \overline {S_{l}}\overline {S_{l}}^{T}\), which avoids the singularity problem and yields stable performance. □
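As a sanity check, this closed-form dictionary update can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code; the names `X_l`, `S_l`, `S_bar`, and the default values of `alpha` and `theta` are assumptions made for the example:

```python
import numpy as np

def update_dictionary(X_l, S_l, S_bar, alpha=1.0, theta=1e-4):
    """Closed-form dictionary update (Eq. (31) with E_l dropped):
    D_l = X_l S_l^T (S_l S_l^T + alpha * S_bar S_bar^T + theta * I)^{-1}.
    The theta*I term guards against a singular Gram matrix, as in [61]."""
    k = S_l.shape[0]  # number of dictionary atoms
    gram = S_l @ S_l.T + alpha * (S_bar @ S_bar.T) + theta * np.eye(k)
    return X_l @ S_l.T @ np.linalg.inv(gram)
```

When the incoherence term vanishes (`S_bar` all zeros) and `theta` is negligible, this reduces to the ordinary least-squares fit of Dl to Xl ≈ DlSl, which is a quick way to test the update.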

Proof of Theorem 2

The Lagrangian of the constrained problem (17) is:

$$ \begin{aligned} \mathcal{L}(P)=& \tau \|P_{l}X_{l}-S_{l}\|_{F}^{2}+\tau\|P_{l}\overline{X_{l}}\|_{F}^{2} \\ &-{\sum}_{i=1}^{N} p_{l}\xi_{l}-{\sum}_{i=1}^{N} q_{l}\{[M_{l}\cdot P_{l}X_{l}+\delta_{l}]{Y_{l}^{T}}-1+\xi_{l}\}, \end{aligned} $$
(32)

where ql > 0 and pl > 0 are Lagrange multipliers. Setting the derivative \(\frac {\partial {\mathscr{L}}(P)}{\partial P_{l}}\) to zero gives the closed-form solution for Pl:

$$ \begin{aligned} P_{l}^{*}=[S_{l}{X_{l}^{T}}+\frac{1}{2\tau}{\sum}_{i=1}^{N} q_{l}(M_{l}X_{l}){Y_{l}^{T}}](X_{l}{X_{l}^{T}}+\overline{X_{l}}\,\overline{X_{l}}^{T}+\theta I)^{-1}, \end{aligned} $$
(33)

where 𝜃I is a regularization term with 𝜃 = 10−4. In practice, the number of samples may be smaller than the dimension of the feature space, in which case \(X_{l}{X_{l}^{T}}\) may be singular; therefore, as in [61], the regularization term 𝜃I is added to avoid this singularity problem. □
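To illustrate why the 𝜃I term matters here, the sketch below computes the projection update in a regime where the number of samples is smaller than the feature dimension, so XlXlT alone is rank-deficient. The names are hypothetical, and the dual-multiplier sum from (33) is abstracted into a precomputed `dual_term`:

```python
import numpy as np

def update_projection(S_l, X_l, X_bar, dual_term=None, theta=1e-4):
    """Closed-form projection update in the spirit of Eq. (33).
    dual_term stands in for (1/(2*tau)) * sum_i q_i (M_l X_l) Y_l^T,
    assumed precomputed; with no active multipliers it is zero."""
    d = X_l.shape[0]  # feature dimension
    if dual_term is None:
        dual_term = np.zeros((S_l.shape[0], d))
    # X_l X_l^T + X_bar X_bar^T can be rank-deficient when N < d;
    # the theta*I regularizer keeps this matrix invertible.
    gram = X_l @ X_l.T + X_bar @ X_bar.T + theta * np.eye(d)
    return (S_l @ X_l.T + dual_term) @ np.linalg.inv(gram)
```

With, say, 8 features and only 5 columns across Xl and the complementary data, the unregularized Gram matrix has rank at most 5, while the regularized one is always invertible.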

Proof of Theorem 3

The variables Ml, δl and ξl are optimized via the Lagrangian function, from which the dual form of the optimization problem in (20) is obtained. To this end, αl > 0 and ηl > 0 are introduced as Lagrange multipliers, and the objective function in (20) is rewritten as follows:

$$ \begin{aligned} \mathcal{L}(M,\xi,\eta)=& \frac{1}{2}\|M_{l}\|_{2}^{2} +C_{l}\sum\limits_{i=1}^{N}\xi_{l}-\sum\limits_{i=1}^{N}\alpha_{l}\xi_{l} \\ &-\sum\limits_{i=1}^{N} \eta_{l}\{[M_{l}\cdot P_{l}X_{l} + \delta_{l}]{Y_{l}^{T}} -1+\xi_{l}\},\\ s.t.&\ \forall \ \ \eta_{l}>0,\ \alpha_{l}>0. \end{aligned}$$
(34)

A saddle point of the Lagrangian is a minimum with respect to the primal variables Ml, δl and ξl, and a maximum with respect to the dual variables. To obtain the minimum over the primal variables, we require:

$$ \frac{\partial(\mathcal{L})}{\partial \xi_{l}}=C_{l}-\alpha_{l}-\eta_{l}=0,$$
(35)
$$ \frac{\partial(\mathcal{L})}{\partial \delta_{l}}=\sum\limits_{i=1}^{N} \eta_{l}{Y_{l}^{T}}=0.$$
(36)

Similarly, for Ml we require,

$$ \frac{\partial(\mathcal{L})}{\partial M_{l}}= M_{l}-\sum\limits_{i=1}^{N} \eta_{l}P_{l}X_{l}\cdot {Y_{l}^{T}}=0,$$
(37)

which yields:

$$ M_{l}=\sum\limits_{i=1}^{N} \eta_{l}P_{l}X_{l}\cdot {Y_{l}^{T}}.$$
(38)

By incorporating (35), (36) and (38) into (34), we have the following optimization function:

$$ \begin{aligned} \underset{\eta_{l}}{\min}\frac{1}{2}\sum\limits_{i=1}^{N} \sum\limits_{j=1}^{N}\eta_{l}\eta_{j}{Y_{l}^{T}}{Y_{j}^{T}}(P_{l}X_{l}\cdot P_{j}X_{j})-\sum\limits_{i=1}^{N} \eta_{l},\\ s.t.\sum\limits_{i=1}^{N} \eta_{l}{Y_{l}^{T}}=0,\ \ \ 0<\eta_{l}<C_{l}. \end{aligned} $$
(39)

The complementary slackness condition of the KKT conditions is:

$$ \begin{aligned} \eta_{l}^{*}({Y_{l}^{T}}(M_{l}P_{l}X_{l}+\delta_{l})-1+\xi_{l}^{*})=0. \end{aligned} $$
(40)

According to this complementary slackness condition, we obtain:

$$ \begin{aligned} &\eta_{l}^{*}=0\Rightarrow {Y_{l}^{T}}(M_{l}P_{l}X_{l}+\delta_{l})\geq 1,\\ &0<\eta_{l}^{*}<C_{l}\Rightarrow {Y_{l}^{T}}(M_{l}P_{l}X_{l}+\delta_{l})=1,\\ &\eta_{l}^{*}=C_{l}\Rightarrow {Y_{l}^{T}}(M_{l}P_{l}X_{l}+\delta_{l})\leq1. \end{aligned} $$
(41)

□
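The stationarity condition (38) and the margin cases in (41) can be checked numerically. The sketch below uses illustrative names and assumes binary labels in {−1, +1}: it recovers the weight vector as the dual-weighted sum of projected samples, and classifies each training point by its margin value, mirroring the three cases of (41):

```python
import numpy as np

def recover_weight(eta, y, PX):
    """Eq. (38): the weight is a dual-weighted sum of projected samples.
    eta: (N,) dual variables, y: (N,) labels in {-1, +1},
    PX: (N, m) matrix whose rows are projected samples P x_i."""
    return (eta * y) @ PX

def kkt_case(eta_i, margin_i, C, tol=1e-8):
    """Check one sample against the margin cases of Eq. (41),
    where margin_i = y_i * (M . P x_i + delta)."""
    if eta_i < tol:          # inactive multiplier: on or outside the margin
        return margin_i >= 1 - tol
    if eta_i > C - tol:      # at the upper bound: inside the margin
        return margin_i <= 1 + tol
    return abs(margin_i - 1) <= tol  # strictly between: exactly on the margin
```

A dual solution is consistent with (41) exactly when `kkt_case` holds for every training sample.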


Cite this article

Liu, B., Che, Z., Song, K. et al. Learn structured analysis discriminative dictionary for multi-label classification. Appl Intell 52, 3175–3192 (2022). https://doi.org/10.1007/s10489-021-02601-1
