Abstract
Multi-task learning (MTL) is a multi-objective optimization problem: within MTL, a neural network tries to achieve every objective using a shared interpretative space. However, as datasets grow in scale and tasks grow in complexity, sharing knowledge across tasks becomes increasingly difficult. In this paper, we first re-examine previous cross-attention MTL methods from the perspective of noise. We analyze this noise theoretically and attribute it to a flaw in the cross-attention mechanism. To address it, we propose an information-bottleneck knowledge extraction module (KEM), which reduces inter-task interference by constraining the flow of information between tasks, thereby also reducing computational complexity. Furthermore, we employ neural collapse to stabilize the knowledge-selection process: before features are fed into the KEM, we project them into an equiangular tight frame (ETF) space. This mapping makes our method more robust. We implemented the method and compared it against existing approaches on multiple datasets. The results demonstrate that our approach significantly outperforms existing multi-task learning methods.
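The abstract describes the pipeline only at a high level, so the NumPy example below is a minimal sketch under stated assumptions, not the authors' KEM. It builds a simplex equiangular tight frame (whose unit-norm columns have pairwise cosine -1/(K-1)), softly maps features onto those fixed ETF anchors, and then lets tasks exchange information through a small set of shared slots instead of full pairwise cross-attention. The function names `simplex_etf`, `project_to_etf`, and `bottleneck_exchange`, the temperature `tau`, the soft-assignment projection, and the slot-based write/read scheme (loosely modeled on a shared global workspace) are all hypothetical choices for illustration.

```python
import numpy as np


def simplex_etf(num_vectors: int, dim: int, seed: int = 0) -> np.ndarray:
    """Return a (dim, num_vectors) simplex ETF: unit columns, pairwise cosine -1/(K-1)."""
    if dim < num_vectors:
        raise ValueError("this sketch assumes dim >= num_vectors")
    rng = np.random.default_rng(seed)
    # Partial orthogonal matrix U with orthonormal columns (U^T U = I).
    u, _ = np.linalg.qr(rng.standard_normal((dim, num_vectors)))
    k = num_vectors
    centering = np.eye(k) - np.ones((k, k)) / k
    return np.sqrt(k / (k - 1)) * u @ centering


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def project_to_etf(feats: np.ndarray, etf: np.ndarray, tau: float = 0.1) -> np.ndarray:
    """Hypothetical ETF mapping: a convex combination of fixed ETF anchors per feature."""
    f = feats / np.linalg.norm(feats, axis=-1, keepdims=True)
    weights = softmax(f @ etf / tau, axis=-1)  # (n, K) soft assignment to anchors
    return weights @ etf.T                     # (n, dim)


def bottleneck_exchange(task_feats, slots):
    """Hypothetical bottleneck: tasks write into a few shared slots, then read back."""
    d = slots.shape[-1]
    tokens = np.concatenate(task_feats, axis=0)              # (T*n, d)
    write = softmax(slots @ tokens.T / np.sqrt(d), axis=-1)  # (S, T*n) slot queries
    workspace = write @ tokens                               # (S, d) shared slots
    outputs = []
    for f in task_feats:
        read = softmax(f @ workspace.T / np.sqrt(d), axis=-1)  # (n, S)
        outputs.append(f + read @ workspace)                   # residual read-out
    return outputs


rng = np.random.default_rng(0)
dim, n_tokens, n_tasks, n_slots, n_anchors = 16, 8, 3, 4, 6
etf = simplex_etf(n_anchors, dim)
feats = [project_to_etf(rng.standard_normal((n_tokens, dim)), etf) for _ in range(n_tasks)]
slots = rng.standard_normal((n_slots, dim))
out = bottleneck_exchange(feats, slots)
print([o.shape for o in out])  # [(8, 16), (8, 16), (8, 16)]
```

The bottleneck is where the complexity saving comes from in this sketch: with T tasks of n tokens each and S slots, full cross-task attention scores (T*n)^2 token pairs, while the write and read steps together score 2*S*T*n pairs, which is 576 versus 192 for the sizes used above.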
R. Zhang and Y. Chen—Equal contributions.
Acknowledgments
This work was supported by the National Key Research and Development Project of China (2021ZD0110505), the Zhejiang Provincial Key Research and Development Project (2023C01043), and the Academy of Social Governance, Zhejiang University.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhang, R. et al. (2025). SGW-Based Multi-task Learning in Vision Tasks. In: Cho, M., Laptev, I., Tran, D., Yao, A., Zha, H. (eds) Computer Vision – ACCV 2024. ACCV 2024. Lecture Notes in Computer Science, vol 15475. Springer, Singapore. https://doi.org/10.1007/978-981-96-0911-6_8
DOI: https://doi.org/10.1007/978-981-96-0911-6_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-96-0910-9
Online ISBN: 978-981-96-0911-6
eBook Packages: Computer Science, Computer Science (R0)