
Adaptive Transfer Learning via Fine-grained Multi-task Pre-training

Published: 25 February 2022

Abstract

Nowadays, the pre-training paradigm is widely adopted in deep learning-based applications. When multiple pre-training tasks are involved, conventional methods process them with naive Multi-Task Learning (MTL). The resulting pre-trained models inevitably suffer from the well-known negative transfer phenomenon of MTL, which in turn harms downstream tasks. To address this problem, we propose a novel Adaptive Pre-Training (APT) framework. In the pre-training stage, we adopt a task-sensitive MTL technique called the Fine-grained Sharing Network (FSN), which trains a subnet mask for each task in addition to the network weights. In the fine-tuning stage, the most suitable subnet is selected for each downstream task. Different downstream tasks may therefore be fine-tuned on different network structures and benefit from the outcome of their most closely related pre-training task. In our experiments, the proposed APT outperforms the conventional pre-training baseline by a clear margin. Moreover, even without subnet adaptation, the network weights pre-trained by FSN alone are of much higher quality and can be plugged into the conventional pre-training framework to improve performance.
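The abstract does not spell out how the per-task subnet masks are obtained, so the following is only a minimal PyTorch-style sketch of the idea, assuming binary masks produced by thresholding per-task trainable scores with a straight-through estimator (in the spirit of dynamic sparse training). The names MaskedLinear, FSNBackbone, and select_subnet are illustrative and not taken from the paper.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class MaskedLinear(nn.Module):
        """Linear layer whose weights are shared across tasks, with one trainable
        score tensor per pre-training task. Thresholding the scores yields a binary
        subnet mask; a straight-through estimator keeps the scores trainable."""

        def __init__(self, in_dim, out_dim, num_tasks, threshold=0.0):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(out_dim, in_dim) * 0.02)  # shared weights
            self.bias = nn.Parameter(torch.zeros(out_dim))
            # one real-valued score tensor per pre-training task (hypothetical init)
            self.scores = nn.Parameter(
                torch.empty(num_tasks, out_dim, in_dim).uniform_(-0.01, 0.01)
            )
            self.threshold = threshold

        def forward(self, x, task_id):
            scores = self.scores[task_id]
            hard = (scores > self.threshold).float()   # binary subnet mask for this task
            mask = hard + scores - scores.detach()     # straight-through gradient to scores
            return F.linear(x, self.weight * mask, self.bias)


    class FSNBackbone(nn.Module):
        """Toy two-layer backbone: shared weights, one subnet mask per task."""

        def __init__(self, in_dim, hidden_dim, out_dim, num_tasks):
            super().__init__()
            self.fc1 = MaskedLinear(in_dim, hidden_dim, num_tasks)
            self.fc2 = MaskedLinear(hidden_dim, out_dim, num_tasks)

        def forward(self, x, task_id):
            return self.fc2(torch.relu(self.fc1(x, task_id)), task_id)


    def select_subnet(model, num_tasks, downstream_score):
        """Fine-tuning stage (sketch): evaluate each pre-training task's subnet on the
        downstream task and keep the one with the best score."""
        return max(range(num_tasks), key=lambda t: downstream_score(model, t))

Under this reading of the abstract, each pre-training mini-batch would update the shared weights together with the mask scores of its own task only; at fine-tuning time the selected mask fixes the network structure while the weights continue to adapt to the downstream task.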



Published In

ACAI '21: Proceedings of the 2021 4th International Conference on Algorithms, Computing and Artificial Intelligence
December 2021
699 pages
ISBN: 9781450385053
DOI: 10.1145/3508546

Publisher

Association for Computing Machinery

New York, NY, United States



