
Adaptive Transfer Learning via Fine-grained Multi-task Pre-training

Published: 25 February 2022

Abstract

Nowadays, the pre-training paradigm is widely adopted in deep learning-based applications. When multiple pre-training tasks are involved, conventional methods process them with naive Multi-Task Learning (MTL). The resulting pre-trained models inevitably suffer from the well-known negative transfer phenomenon of MTL, which in turn harms downstream tasks. To address this problem, we propose a novel Adaptive Pre-Training (APT) framework. In the pre-training stage, we adopt a task-sensitive MTL technique called the Fine-grained Sharing Network (FSN), which trains a subnet mask for each task in addition to the network weights. In the fine-tuning stage, the most suitable subnet is selected for each downstream task. Different downstream tasks may therefore be fine-tuned on different network structures and benefit from the outcome of their most closely related pre-training task. In our experiments, the proposed APT outperforms the conventional pre-training baseline by a clear margin. Moreover, even without subnet adaptation, the network weights pre-trained by FSN alone are of much higher quality and can be plugged into the conventional pre-training framework to improve performance.
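The abstract does not spell out how the per-task subnet masks are obtained, so the following is only a minimal PyTorch-style sketch of the idea, assuming binary masks produced by thresholding per-task trainable scores with a straight-through estimator (in the spirit of dynamic sparse training). The names MaskedLinear, FSNBackbone, and select_subnet are illustrative and not taken from the paper.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class MaskedLinear(nn.Module):
        """Linear layer whose weights are shared across tasks, with one trainable
        score tensor per pre-training task. Thresholding the scores yields a binary
        subnet mask; a straight-through estimator keeps the scores trainable."""

        def __init__(self, in_dim, out_dim, num_tasks, threshold=0.0):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(out_dim, in_dim) * 0.02)  # shared weights
            self.bias = nn.Parameter(torch.zeros(out_dim))
            # one real-valued score tensor per pre-training task (hypothetical init)
            self.scores = nn.Parameter(
                torch.empty(num_tasks, out_dim, in_dim).uniform_(-0.01, 0.01)
            )
            self.threshold = threshold

        def forward(self, x, task_id):
            scores = self.scores[task_id]
            hard = (scores > self.threshold).float()   # binary subnet mask for this task
            mask = hard + scores - scores.detach()     # straight-through gradient to scores
            return F.linear(x, self.weight * mask, self.bias)


    class FSNBackbone(nn.Module):
        """Toy two-layer backbone: shared weights, one subnet mask per task."""

        def __init__(self, in_dim, hidden_dim, out_dim, num_tasks):
            super().__init__()
            self.fc1 = MaskedLinear(in_dim, hidden_dim, num_tasks)
            self.fc2 = MaskedLinear(hidden_dim, out_dim, num_tasks)

        def forward(self, x, task_id):
            return self.fc2(torch.relu(self.fc1(x, task_id)), task_id)


    def select_subnet(model, num_tasks, downstream_score):
        """Fine-tuning stage (sketch): evaluate each pre-training task's subnet on the
        downstream task and keep the one with the best score."""
        return max(range(num_tasks), key=lambda t: downstream_score(model, t))

Under this reading of the abstract, each pre-training mini-batch would update the shared weights together with the mask scores of its own task only; at fine-tuning time the selected mask fixes the network structure while the weights continue to adapt to the downstream task.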



Published In

ACAI '21: Proceedings of the 2021 4th International Conference on Algorithms, Computing and Artificial Intelligence
December 2021
699 pages
ISBN: 9781450385053
DOI: 10.1145/3508546

Publisher

Association for Computing Machinery

New York, NY, United States



