SGW-Based Multi-task Learning in Vision Tasks

Zhang, Ruiyuan; Chen, Yuyao; Liu, Jiaxiang; Xi, Dianbing; Huo, Yuchi; Liu, Jie; Wu, Chao

doi:10.1007/978-981-96-0911-6_8

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15475)

Included in the following conference series:

Asian Conference on Computer Vision

Abstract

Multi-task learning (MTL) is a multi-target optimization task in which a neural network tries to realize each target using a shared interpretative space. However, as datasets grow and tasks become more complex, knowledge sharing becomes increasingly challenging. In this paper, we first re-examine previous cross-attention MTL methods from the perspective of noise. We analyze this issue theoretically and identify it as a flaw in the cross-attention mechanism. To address it, we propose an information-bottleneck knowledge extraction module (KEM). This module reduces inter-task interference by constraining the flow of information, which also reduces computational complexity. Furthermore, we employ neural collapse to stabilize the knowledge-selection process: before features are input to the KEM, we project them into an equiangular tight frame (ETF) space. This mapping makes our method more robust. We implemented this method and conducted comparative experiments on multiple datasets. The results demonstrate that our approach significantly outperforms existing methods in multi-task learning.
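To make the ETF projection mentioned in the abstract concrete, the following is a minimal sketch of the standard simplex equiangular tight frame used in the neural collapse literature. The abstract does not specify how the authors build or apply their ETF space, so the function names `simplex_etf` and `project_to_etf`, the random orthonormal basis, and the dimensions are all assumptions made for illustration.

```python
import numpy as np


def simplex_etf(num_classes: int, dim: int, seed: int = 0) -> np.ndarray:
    """Return a (dim, num_classes) matrix whose columns form a simplex ETF.

    Hypothetical helper: uses the textbook construction
    M = sqrt(K / (K - 1)) * U @ (I - (1/K) * ones), with U^T U = I.
    """
    if dim < num_classes:
        raise ValueError("this construction needs dim >= num_classes")
    rng = np.random.default_rng(seed)
    # Reduced QR of a random matrix gives U with orthonormal columns.
    u, _ = np.linalg.qr(rng.standard_normal((dim, num_classes)))
    k = num_classes
    centering = np.eye(k) - np.ones((k, k)) / k
    return np.sqrt(k / (k - 1)) * u @ centering


def project_to_etf(features: np.ndarray, etf: np.ndarray) -> np.ndarray:
    """Map (n, dim) features to (n, num_classes) coordinates along the ETF directions.

    Assumption: a plain linear projection; the paper's actual mapping may differ.
    """
    return features @ etf


etf = simplex_etf(num_classes=4, dim=16)
gram = etf.T @ etf  # unit diagonal, off-diagonal entries equal to -1 / (K - 1)
projected = project_to_etf(np.ones((8, 16)), etf)  # shape (8, 4)
```

The columns of such a frame have unit norm and identical pairwise inner products of -1/(K-1), which is the maximally separated configuration that neural collapse describes. A fixed geometry of this kind is one plausible reason a projection into it could stabilize a downstream selection step.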

R. Zhang and Y. Chen—Equal contributions.

Acknowledgments

This work was supported by the National Key Research and Development Project of China (2021ZD0110505), the Zhejiang Provincial Key Research and Development Project (2023C01043), and the Academy of Social Governance, Zhejiang University.

Author information

Authors and Affiliations

Zhejiang University, Hangzhou, Zhejiang, China
Ruiyuan Zhang, Yuyao Chen, Jiaxiang Liu, Dianbing Xi, Yuchi Huo, Jie Liu & Chao Wu

Corresponding author

Correspondence to Chao Wu.

Editor information

Editors and Affiliations

Pohang University of Science and Technology (POSTECH), Pohang, Korea (Republic of)
Minsu Cho
Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
Ivan Laptev
Google, Mountain View, CA, USA
Du Tran
National University of Singapore, Singapore, Singapore
Angela Yao
Peking University, Beijing, China
Hongbin Zha

About this paper

Cite this paper

Zhang, R. et al. (2025). SGW-Based Multi-task Learning in Vision Tasks. In: Cho, M., Laptev, I., Tran, D., Yao, A., Zha, H. (eds) Computer Vision – ACCV 2024. ACCV 2024. Lecture Notes in Computer Science, vol 15475. Springer, Singapore. https://doi.org/10.1007/978-981-96-0911-6_8

DOI: https://doi.org/10.1007/978-981-96-0911-6_8
Published: 08 December 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-96-0910-9
Online ISBN: 978-981-96-0911-6
eBook Packages: Computer Science, Computer Science (R0)
