
Deep multi-task learning with flexible and compact architecture search

  • Regular Paper
  • Published in: International Journal of Data Science and Analytics

Abstract

Multi-task learning has been applied successfully in a variety of applications. Recent research shows that the performance of multi-task learning methods can be improved by sharing model architectures appropriately. However, existing work either identifies the multi-task architecture manually based on prior knowledge, or simply uses an identical model structure for all tasks with a parameter-sharing mechanism. In this paper, we propose a novel architecture search method that automatically discovers flexible and compact architectures for deep multi-task learning, which both extends the expressiveness of existing reinforcement learning-based neural architecture search methods and enhances the flexibility of existing hand-crafted multi-task learning methods. The discovered architecture shares structure and parameters adaptively to handle different levels of task relatedness, which improves effectiveness. Specifically, we first propose an architecture search space for deep multi-task learning that combines partially shared modules at the low-level layers with a set of task-specific modules of various depths at the high-level layers. Second, we propose a parameter generation mechanism that not only explores all possible cross-layer connections but also reduces the search cost. Third, we propose a task-specific shadow batch normalization mechanism that stabilizes the training process and improves search effectiveness. Finally, an auxiliary module is designed to guide the model training process. Experimental results demonstrate that the learned architectures outperform state-of-the-art methods with fewer learnable parameters.
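To make the shadow batch normalization idea concrete, here is a minimal sketch (not the authors' released code) of how task-specific shadow BN could be implemented in a PyTorch-style setup: convolution weights are shared across tasks, while each task keeps its own BatchNorm2d with separate affine parameters and running statistics. The class and argument names below are illustrative assumptions.

```python
# Minimal sketch (assumed names, not the paper's code): a shared conv
# layer with one "shadow" batch norm per task. Conv weights are shared
# to keep the model compact; per-task BN isolates feature statistics.
import torch
import torch.nn as nn

class SharedConvWithShadowBN(nn.Module):
    """A shared 3x3 convolution followed by one BatchNorm2d per task."""

    def __init__(self, in_ch: int, out_ch: int, num_tasks: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3,
                              padding=1, bias=False)
        # One shadow BN per task: separate affine parameters and
        # separate running mean/variance estimates.
        self.shadow_bns = nn.ModuleList(
            [nn.BatchNorm2d(out_ch) for _ in range(num_tasks)]
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        # Every task passes through the same shared convolution...
        h = self.conv(x)
        # ...but is normalized with its own task-specific statistics.
        h = self.shadow_bns[task_id](h)
        return self.act(h)

if __name__ == "__main__":
    layer = SharedConvWithShadowBN(in_ch=3, out_ch=16, num_tasks=2)
    x = torch.randn(4, 3, 32, 32)
    y0 = layer(x, task_id=0)  # normalized with task 0's statistics
    y1 = layer(x, task_id=1)  # normalized with task 1's statistics
    print(y0.shape, y1.shape)  # torch.Size([4, 16, 32, 32]) twice
```

Keeping normalization statistics separate per task prevents tasks with mismatched feature distributions from corrupting each other's running estimates, which is consistent with the stabilizing role the abstract ascribes to this mechanism.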



Acknowledgements

We thank the reviewers for their constructive comments on this work. This work is supported by the National Key Technology Support Program of China under Grant No. 2019YFF0301302-2, Guangxi Innovation-Driven Development Special Fund Project under Grant No. AA18118053, the National Science Foundation of China under Grant No. 51991395, the National Key R&D Program of China under Grant No. 2018YFB2101003, and the Beijing Science and Technology Project under Grant No. Z181100009018010.

Author information

Corresponding author

Correspondence to Bowen Du.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zhao, J., Lv, W., Du, B. et al. Deep multi-task learning with flexible and compact architecture search. Int J Data Sci Anal 15, 187–199 (2023). https://doi.org/10.1007/s41060-021-00274-0

