Collaboration based multi-modal multi-label learning

Zhang, Yi; Zhu, Yinlong; Zhang, Zhecheng; Wang, Chongjung

doi:10.1007/s10489-021-03130-7


Published: 04 March 2022

Applied Intelligence, Volume 52, pages 14204–14217 (2022)

Yi Zhang¹ (ORCID: orcid.org/0000-0002-6874-3658), Yinlong Zhu¹, Zhecheng Zhang¹ & Chongjung Wang¹


Abstract

Complex objects can be represented by features from multiple modalities and associated with multiple labels. The major challenge in complex object classification is how to jointly exploit heterogeneous modalities in a mutually beneficial way. Exploiting label correlations effectively is a further challenge. Previous methods model label correlations by requiring that any two label-specific classifiers behave similarly on the same modality if their associated labels are similar. To address these challenges, we propose a novel modal-oriented deep learning framework named Collaboration based Multi-modal Multi-label Learning (CoM3L). Using the memory structure of an LSTM, CoM3L processes modalities sequentially, predicting the next modality to extract while simultaneously learning label correlations. On the one hand, CoM3L extracts the most useful modal sequence for each instance, so different instances may use different modal sequences. On the other hand, for each label, CoM3L combines its own prediction with the predictions of the other labels. Extensive experiments on 5 multi-modal multi-label datasets validate the effectiveness of the proposed CoM3L approach.
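
As a rough illustration of the collaboration idea described in the abstract, the sketch below blends each label's own score with a weighted sum of the other labels' scores. The function name `collaborate`, the correlation matrix `corr`, and the mixing weight `alpha` are assumptions made for this example. The abstract does not state how CoM3L actually computes this combination, and the LSTM-based modality selection is not shown.

```python
from typing import List


def collaborate(own: List[float],
                corr: List[List[float]],
                alpha: float = 0.5) -> List[float]:
    """Blend each label's own score with scores of the other labels.

    own[j]     -- score the model assigns to label j on its own
    corr[i][j] -- hypothetical weight of label i's score when predicting label j
    alpha      -- hypothetical mixing weight between own and collaborative parts
    """
    n = len(own)
    final = []
    for j in range(n):
        # Contribution of all labels except j itself.
        others = sum(corr[i][j] * own[i] for i in range(n) if i != j)
        final.append((1 - alpha) * own[j] + alpha * others)
    return final


# Toy example with three labels; labels 0 and 1 are assumed to be correlated.
scores = [0.9, 0.2, 0.6]
corr = [
    [0.0, 0.5, 0.0],
    [0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
final = collaborate(scores, corr, alpha=0.5)
print(final)
```

In this toy setup the weak score of label 1 is pulled up by the strong score of the correlated label 0, while label 2, which has no correlated labels, is simply scaled by `1 - alpha`.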



Acknowledgements

This paper is supported by the National Key Research and Development Program of China (Grant No. 2018YFB1403400), the National Natural Science Foundation of China (Grant No. 61876080), the Key Research and Development Program of Jiangsu (Grant No. BE2019105), and the Collaborative Innovation Center of Novel Software Technology and Industrialization at Nanjing University.

Author information

Authors and Affiliations

Department of Computer Science and Technology, State Key Laboratory for Novel Software Technology, Nanjing University, 210023, Nanjing, China
Yi Zhang, Yinlong Zhu, Zhecheng Zhang & Chongjung Wang


Corresponding author

Correspondence to Chongjung Wang.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is an extended version of the paper by Yi Zhang et al., “Many Could be Better Than All: A Novel Instance-Oriented Algorithm for Multi-modal Multi-label Problem”, which was presented at the 2019 IEEE International Conference on Multimedia and Expo (ICME).


About this article

Cite this article

Zhang, Y., Zhu, Y., Zhang, Z. et al. Collaboration based multi-modal multi-label learning. Appl Intell 52, 14204–14217 (2022). https://doi.org/10.1007/s10489-021-03130-7


Accepted: 18 December 2021
Published: 04 March 2022
Issue Date: September 2022
DOI: https://doi.org/10.1007/s10489-021-03130-7
