Collaboration based multi-modal multi-label learning

Applied Intelligence

Abstract

Complex objects can be represented by features from multiple modalities and associated with multiple labels. The major challenge in classifying such objects is how to use heterogeneous modalities jointly so that they benefit one another. Exploiting label correlations effectively is a further challenge: previous methods model these correlations by requiring that any two label-specific classifiers behave similarly on the same modality if the associated labels are similar. To address these challenges, we propose a novel modal-oriented deep learning framework named Collaboration based Multi-modal Multi-label Learning (CoM3L). Using the memory structure of the LSTM, CoM3L handles modalities sequentially, predicting the next modality to extract while simultaneously learning label correlations. On the one hand, CoM3L extracts the most useful sequence of modalities, and this sequence can differ from instance to instance. On the other hand, for each label, CoM3L combines its own prediction with the predictions of the other labels in a collaborative manner. Extensive experiments on five multi-modal multi-label datasets validate the effectiveness of the proposed CoM3L approach.
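To make the two ideas in the abstract concrete, the following is a minimal sketch, not the CoM3L architecture itself, whose details are not given in this preview. It assumes that each modality is projected to a shared dimension, that a standard LSTM cell carries the memory across steps, that the next modality is the unused one with the highest score under the current hidden state, and that each label's final score is a convex blend of its own score and a weighted sum of the other labels' scores. All names (`LSTMCell`, `collaborate`, `predict_sequentially`, `alpha`, `S`) are hypothetical, and the weights are random and untrained.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Standard LSTM cell with random, untrained weights (illustration only)."""
    def __init__(self, input_dim, hidden_dim, rng):
        # One stacked matrix holding the input, forget, candidate and output gates.
        self.W = rng.normal(0.0, 0.1, size=(input_dim + hidden_dim, 4 * hidden_dim))
        self.b = np.zeros(4 * hidden_dim)

    def step(self, x, h, c):
        z = np.concatenate([x, h]) @ self.W + self.b
        i, f, g, o = np.split(z, 4)
        c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h_new = sigmoid(o) * np.tanh(c_new)
        return h_new, c_new

def collaborate(own, S, alpha=0.5):
    """Blend each label's own score with the scores of the other labels.

    S[k, j] is an assumed influence of label k on label j; the diagonal is
    zeroed so that a label never counts as its own collaborator.
    """
    S = S.copy()
    np.fill_diagonal(S, 0.0)
    return (1.0 - alpha) * own + alpha * (own @ S)

def predict_sequentially(modalities, n_labels, max_steps, seed=0, hidden_dim=8):
    """Pick modalities one at a time and return (order, label scores).

    Assumes n_labels >= 2 so that every label has at least one collaborator.
    """
    rng = np.random.default_rng(seed)
    n_modal = len(modalities)
    # Hypothetical per-modality projections into a shared input space.
    proj = [rng.normal(0.0, 0.1, size=(m.shape[0], hidden_dim)) for m in modalities]
    cell = LSTMCell(hidden_dim, hidden_dim, rng)
    W_modal = rng.normal(0.0, 0.1, size=(hidden_dim, n_modal))   # next-modality scorer
    b_modal = rng.normal(0.0, 0.1, size=n_modal)
    W_label = rng.normal(0.0, 0.1, size=(hidden_dim, n_labels))  # per-label scorer
    # Column-normalised label influence matrix with zero diagonal.
    S = rng.uniform(size=(n_labels, n_labels))
    np.fill_diagonal(S, 0.0)
    S = S / S.sum(axis=0, keepdims=True)

    h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
    used = np.zeros(n_modal, dtype=bool)
    order, scores = [], np.full(n_labels, 0.5)
    for _ in range(min(max_steps, n_modal)):
        # Score the modalities from the current memory and mask those already used.
        modal_scores = np.where(used, -np.inf, h @ W_modal + b_modal)
        nxt = int(np.argmax(modal_scores))
        used[nxt] = True
        order.append(nxt)
        # Feed the chosen modality into the LSTM, then predict the labels.
        h, c = cell.step(modalities[nxt] @ proj[nxt], h, c)
        scores = collaborate(sigmoid(h @ W_label), S)
    return order, scores

if __name__ == "__main__":
    data_rng = np.random.default_rng(1)
    instance = [data_rng.normal(size=d) for d in (4, 6, 3)]  # three modalities
    print(predict_sequentially(instance, n_labels=3, max_steps=2))
```

Because the hidden state depends on the features seen so far, two instances can end up with different modality orders, which is the instance-specific behaviour the abstract describes. In a trained model the scorers and the influence matrix would be learned rather than sampled.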

References

  1. Boutell MR, Luo J, Shen X, Brown CM (2004) Learning multi-label scene classification. Pattern Recognition 37(9):1757–1771

  2. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7(Jan):1–30

  3. Fang Z, Zhang Z (2012) Simultaneously combining multi-view multi-label learning with maximum margin classification. In: 2012 IEEE 12th international conference on data mining, IEEE, pp 864–869

  4. Feng L, An B, He S (2019) Collaboration based multi-label learning. In: Thirty-Third AAAI conference on artificial intelligence, pp 3550–3557

  5. Fürnkranz J, Hüllermeier E, Mencía EL, Brinker K (2008) Multilabel classification via calibrated label ranking. Mach Learn 73(2):133–153

  6. Gibaja E, Ventura S (2015) A tutorial on multilabel learning. ACM Computing Surveys (CSUR) 47(3):52

  7. Baltrusaitis T, Ahuja C, Morency L-P (2019) Multimodal machine learning: A survey and taxonomy. IEEE Trans Pattern Anal Mach Intell 41(2):423–443

  8. Weng W, Li Y-W, Liu J-H, Wu S-X, Chen C-L (2021) Multi-label classification review and opportunities. J Netw Intell 6(2):255–275

  9. Xu C, Tao D, Xu C (2013) A survey on multi-view learning. arXiv preprint arXiv:1304.5634

  10. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780

  11. Huang J, Li G, Huang Q, Wu X (2015) Learning label specific features for multi-label classification. In: 2015 IEEE international conference on data mining, IEEE, pp 181–190

  12. Huang J, Li G, Huang Q, Wu X (2017) Joint feature selection and classification for multilabel learning. IEEE Trans Cybern 48(3):876–889

  13. Huang SJ, Yu Y, Zhou ZH (2012) Multi-label hypothesis reuse. In: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp 525–533

  14. Huang SJ, Zhou ZH (2012) Multi-label learning by exploiting label correlations locally. In: Twenty-sixth AAAI conference on artificial intelligence

  15. Jiang YG, Wu Z, Wang J, Xue X, Chang SF (2018) Exploiting feature and class relationships in video categorization with regularized deep neural networks. IEEE Trans Pattern Anal Mach Intell 40(2):352–364

  16. Li CL, Lin HT (2014) Condensed filter tree for cost-sensitive multi-label classification. In: International conference on machine learning, pp 423–431

  17. Liu T, Yuan Z, Sun J, Wang J, Zheng N, Tang X, Shum HY (2010) Learning to detect a salient object. IEEE Trans Pattern Anal Mach Intell 33(2):353–367

  18. Read J, Pfahringer B, Holmes G, Frank E (2011) Classifier chains for multi-label classification. Mach Learn 85(3):333

  19. Schroff F, Criminisi A, Zisserman A (2010) Harvesting image databases from the web. IEEE Trans Pattern Anal Mach Intell 33(4):754–766

  20. Wang J, Trapeznikov K, Saligrama V (2015) Efficient learning by directed acyclic graph for resource constrained prediction. In: Advances in neural information processing systems, pp 2152–2160

  21. Wu F, Jing XY, Zhou J, Ji Y, Lan C, Huang Q, Wang R (2019) Semi-supervised multi-view individual and sharable feature learning for webpage classification. In: The World Wide Web conference, ACM, pp 3349–3355

  22. Yang P, Yang H, Fu H, Zhou D, Ye J, Lappas T, He J (2016) Jointly modeling label and feature heterogeneity in medical informatics. ACM Transactions on Knowledge Discovery from Data (TKDD) 10(4):39

  23. Yang Y, Wu YF, Zhan DC, Liu ZB, Jiang Y (2018) Complex object classification: A multi-modal multi-instance multi-label deep network with optimal transport. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining, ACM, pp 2594–2603

  24. Yang Y, Zhan DC, Fan Y, Jiang Y (2017) Instance specific discriminative modal pursuit: A serialized approach. In: Asian conference on machine learning, pp 65–80

  25. Ye HJ, Zhan DC, Li X, Huang ZC, Jiang Y (2016) College student scholarships and subsidies granting: A multi-modal multi-label approach. In: 2016 IEEE 16th international conference on data mining (ICDM), IEEE, pp 559–568

  26. Zeiler MD (2012) Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701

  27. Zhang C, Yu Z, Hu Q, Zhu P, Liu X, Wang X (2018) Latent semantic aware multi-view multi-label classification. In: Thirty-Second AAAI conference on artificial intelligence

  28. Zhang ML, Zhou ZH (2007) Ml-knn: A lazy learning approach to multi-label learning. Pattern Recognition 40(7):2038–2048

  29. Zhang ML, Zhou ZH (2014) A review on multi-label learning algorithms. IEEE Trans Knowl Data Eng 26(8):1819–1837

  30. Zhang Y, Zeng C, Cheng H, Wang C, Zhang L (2019) Many could be better than all: A novel instance-oriented algorithm for multi-modal multi-label problem. In: 2019 IEEE international conference on multimedia and expo (ICME), IEEE, pp 838–843

  31. Zhu Y, Kwok JT, Zhou ZH (2018) Multi-label learning with global and local label correlation. IEEE Trans Knowl Data Eng 30(6):1081–1094

Acknowledgements

This work was supported by the National Key Research and Development Program of China (Grant No. 2018YFB1403400), the National Natural Science Foundation of China (Grant No. 61876080), the Key Research and Development Program of Jiangsu (Grant No. BE2019105), and the Collaborative Innovation Center of Novel Software Technology and Industrialization at Nanjing University.

Author information

Corresponding author

Correspondence to Chongjun Wang.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is an extended version of the paper by Yi Zhang et al., “Many Could be Better Than All: A Novel Instance-Oriented Algorithm for Multi-modal Multi-label Problem”, which was presented at the 2019 IEEE International Conference on Multimedia and Expo (ICME).

About this article

Cite this article

Zhang, Y., Zhu, Y., Zhang, Z. et al. Collaboration based multi-modal multi-label learning. Appl Intell 52, 14204–14217 (2022). https://doi.org/10.1007/s10489-021-03130-7

  • DOI: https://doi.org/10.1007/s10489-021-03130-7