DOI: 10.1145/3664647.3681183

Task-Interaction-Free Multi-Task Learning with Efficient Hierarchical Feature Representation

Published: 28 October 2024

Abstract

Traditional multi-task learning often relies on explicit task-interaction mechanisms to improve multi-task performance. However, these approaches suffer from negative transfer when jointly learning multiple weakly correlated tasks. They also process encoded features at large scales, which increases computational complexity in order to sustain dense-prediction performance. In this study, we introduce a Task-Interaction-Free Network (TIF) for multi-task learning that diverges from explicitly designed task-interaction mechanisms. First, we present a Scale Attentive-Feature Fusion Module (SAFF) that enriches each scale of the shared encoder with task-agnostic encoded features. Our task- and scale-specific decoders then efficiently decode the enhanced features shared across tasks without requiring task-interaction modules. Concretely, we use a Self-Feature Distillation Module (SFD) to explore task-specific features at lower scales and a Low-To-High Scale Feature Diffusion Module (LTHD) to diffuse global pixel relationships from low-level to high-level scales. Experiments on publicly available multi-task learning datasets validate that TIF attains state-of-the-art performance.
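To make the low-to-high diffusion idea concrete, here is a minimal NumPy sketch under the assumption that "diffusing global pixel relationships from low-level to high-level scales" amounts to upsampling a coarse (globally-informed) feature map and gating it into the next finer scale. The function names, shapes, and the fixed `gate_weight` are illustrative stand-ins, not the paper's actual LTHD implementation (which would use learned parameters and attention):

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour 2x upsampling of a (C, H, W) feature map.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def diffuse_low_to_high(low, high, gate_weight=0.5):
    """Fuse a low-scale feature map into the next higher scale.

    low:  (C, H, W)    coarse features carrying global pixel relations
    high: (C, 2H, 2W)  finer features from the shared encoder
    gate_weight is a stand-in for a learned per-channel gate.
    """
    up = upsample2x(low)
    # Convex combination of the diffused global context and local detail.
    return gate_weight * up + (1.0 - gate_weight) * high

# Toy two-scale pyramid with 8 channels.
low = np.random.rand(8, 4, 4)
high = np.random.rand(8, 8, 8)
fused = diffuse_low_to_high(low, high)
print(fused.shape)  # (8, 8, 8)
```

In a real network the gate would be learned and the fusion repeated scale by scale, so global context accumulated at the coarsest level propagates to the full-resolution prediction without any cross-task exchange.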


Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
October 2024
11719 pages
ISBN:9798400706868
DOI:10.1145/3664647
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. dense prediction
  2. multi-scale features
  3. multi-task learning
  4. vision transformer

Qualifiers

  • Research-article

Conference

MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne VIC, Australia

Acceptance Rates

MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;
Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

