SGW-Based Multi-task Learning in Vision Tasks

  • Conference paper
Computer Vision – ACCV 2024 (ACCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15475)

Abstract

Multi-task learning (MTL) is a multi-objective optimization problem: a neural network tries to achieve each objective through a shared interpretative space. However, as datasets grow and tasks become more complex, knowledge sharing becomes increasingly difficult. In this paper, we first re-examine previous cross-attention MTL methods from the perspective of noise. We analyze this issue theoretically and identify it as a flaw in the cross-attention mechanism. To address it, we propose an information-bottleneck knowledge extraction module (KEM). This module aims to reduce inter-task interference by constraining the flow of information, which in turn reduces computational complexity. Furthermore, we employ neural collapse to stabilize the knowledge-selection process: before features are fed into the KEM, we project them into an equiangular tight frame (ETF) space. This mapping makes our method more robust. We implemented the method and ran comparative experiments on multiple datasets. The results show that our approach significantly outperforms existing multi-task learning methods.
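The abstract does not say how the ETF projection is computed. As a rough illustration only, the sketch below builds the standard simplex equiangular tight frame (ETF) used in the neural collapse literature and projects a batch of features onto its directions. The function names `simplex_etf` and `project_to_etf`, the random orthonormal basis, and the requirement that the feature dimension be at least the number of frame vectors are all assumptions of this sketch, not details taken from the paper.

```python
import numpy as np


def simplex_etf(num_vectors: int, dim: int, seed: int = 0) -> np.ndarray:
    """Return a (dim, num_vectors) matrix whose columns form a simplex ETF.

    Hypothetical helper: uses the textbook construction
    M = sqrt(K / (K - 1)) * U (I - 11^T / K), with U having orthonormal columns.
    """
    if num_vectors < 2:
        raise ValueError("a simplex ETF needs at least two vectors")
    if dim < num_vectors:
        raise ValueError("this construction needs dim >= num_vectors")
    rng = np.random.default_rng(seed)
    # Reduced QR of a random matrix gives dim x K orthonormal columns.
    u, _ = np.linalg.qr(rng.standard_normal((dim, num_vectors)))
    k = num_vectors
    centering = np.eye(k) - np.ones((k, k)) / k
    return np.sqrt(k / (k - 1)) * (u @ centering)


def project_to_etf(features: np.ndarray, etf: np.ndarray) -> np.ndarray:
    """Map features of shape (n, dim) to their coordinates on the K frame directions."""
    return features @ etf


if __name__ == "__main__":
    etf = simplex_etf(num_vectors=4, dim=16)
    gram = etf.T @ etf
    # Columns have unit norm; every pair has cosine -1 / (K - 1).
    print(np.round(gram, 3))
    feats = np.random.default_rng(1).standard_normal((5, 16))
    print(project_to_etf(feats, etf).shape)
```

In this construction every frame vector has unit norm and every pair of vectors has the same cosine, -1/(K-1), which is the maximally separated configuration that neural collapse converges to. Whether the paper fixes such a frame or learns the projection is not stated in the abstract.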

R. Zhang and Y. Chen—Equal contributions.

Acknowledgments

This work was supported by the National Key Research and Development Project of China (2021ZD0110505), the Zhejiang Provincial Key Research and Development Project (2023C01043), and the Academy of Social Governance, Zhejiang University.

Author information

Corresponding author

Correspondence to Chao Wu.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Zhang, R. et al. (2025). SGW-Based Multi-task Learning in Vision Tasks. In: Cho, M., Laptev, I., Tran, D., Yao, A., Zha, H. (eds) Computer Vision – ACCV 2024. ACCV 2024. Lecture Notes in Computer Science, vol 15475. Springer, Singapore. https://doi.org/10.1007/978-981-96-0911-6_8

  • DOI: https://doi.org/10.1007/978-981-96-0911-6_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-96-0910-9

  • Online ISBN: 978-981-96-0911-6

  • eBook Packages: Computer Science (R0)
