Abstract
Complex objects can be represented by features from multiple modalities and associated with multiple labels. The major challenge in classifying such objects is how to jointly exploit heterogeneous modalities in a mutually beneficial way; effectively exploiting label correlations is a further challenge. Previous methods model label correlations by requiring that any two label-specific classifiers behave similarly on the same modality if their associated labels are similar. To address these challenges, we propose a novel modal-oriented deep learning framework named Collaboration based Multi-modal Multi-label Learning (CoM3L). Using the memory structure of an LSTM, CoM3L handles modalities sequentially, predicting the next modality to extract while simultaneously learning label correlations. On the one hand, CoM3L extracts the most useful modality sequence for each instance, so different instances may use different sequences. On the other hand, for each label, CoM3L combines its own prediction with the predictions of the other labels in a collaborative manner. Extensive experiments on five multi-modal multi-label datasets validate the effectiveness of the proposed CoM3L approach.
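The collaboration idea in the abstract, where each label's final prediction mixes its own score with the scores of the other labels, can be illustrated with a minimal NumPy sketch. This is not the paper's formulation, which the abstract does not give: the function name `collaborate`, the correlation matrix `corr`, and the mixing weight `alpha` are all hypothetical, and a simple linear blend is assumed.

```python
import numpy as np

def collaborate(own_pred, corr, alpha=0.5):
    """Blend each label's own score with the scores of the other labels.

    own_pred: (n_instances, n_labels) per-label scores.
    corr:     (n_labels, n_labels) hypothetical label-correlation weights;
              corr[j, k] is how much label j contributes to label k.
    alpha:    hypothetical mixing weight in [0, 1].
    """
    own_pred = np.asarray(own_pred, dtype=float)
    corr = np.array(corr, dtype=float)   # copy so the caller's matrix is untouched
    np.fill_diagonal(corr, 0.0)          # a label does not collaborate with itself
    others = own_pred @ corr             # contribution of the other labels
    return (1.0 - alpha) * own_pred + alpha * others

# One instance, three labels; labels 0 and 1 are assumed to be correlated.
scores = np.array([[0.9, 0.2, 0.1]])
corr = np.array([[0.0, 1.0, 0.0],
                 [1.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0]])
final = collaborate(scores, corr, alpha=0.5)
```

In this toy setting the confident score for label 0 raises the score for the correlated label 1, while the uncorrelated label 2 is only scaled down by the mixing weight. Setting `alpha=0` recovers the independent per-label predictions.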
Acknowledgements
This paper is supported by the National Key Research and Development Program of China (Grant No. 2018YFB1403400), the National Natural Science Foundation of China (Grant No. 61876080), the Key Research and Development Program of Jiangsu (Grant No. BE2019105), and the Collaborative Innovation Center of Novel Software Technology and Industrialization at Nanjing University.
Additional information
This article is an extended version of the paper by Yi Zhang et al., “Many Could be Better Than All: A Novel Instance-Oriented Algorithm for Multi-modal Multi-label Problem”, which was presented at the 2019 IEEE International Conference on Multimedia and Expo (ICME).
About this article
Cite this article
Zhang, Y., Zhu, Y., Zhang, Z. et al. Collaboration based multi-modal multi-label learning. Appl Intell 52, 14204–14217 (2022). https://doi.org/10.1007/s10489-021-03130-7
DOI: https://doi.org/10.1007/s10489-021-03130-7