
Deep multi-task learning with flexible and compact architecture search

  • Regular Paper
  • Published in: International Journal of Data Science and Analytics

Abstract

Multi-task learning has been applied successfully in a variety of applications. Recent research shows that the performance of multi-task learning methods can be improved by sharing model architectures appropriately. However, existing work either identifies the multi-task architecture manually based on prior knowledge, or simply uses an identical model structure for all tasks with a parameter-sharing mechanism. In this paper, we propose a novel architecture search method that automatically discovers flexible and compact architectures for deep multi-task learning, which both extends the expressiveness of existing reinforcement learning-based neural architecture search methods and enhances the flexibility of existing hand-crafted multi-task learning methods. The discovered architecture shares structure and parameters adaptively to handle different levels of task relatedness, which improves effectiveness. Specifically, we first propose an architecture search space for deep multi-task learning that combines partially shared modules at the low-level layers with a set of task-specific modules of various depths at the high-level layers. Second, we propose a parameter generation mechanism that not only explores all possible cross-layer connections but also reduces the search cost. Third, we propose a task-specific shadow batch normalization mechanism that stabilizes the training process and improves search effectiveness. Finally, an auxiliary module is designed to guide the model training process. Experimental results demonstrate that the learned architectures outperform state-of-the-art methods with fewer learnable parameters.
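To make the shadow batch normalization idea concrete, here is a minimal sketch (not the authors' released code) of how task-specific shadow BN could be implemented in a PyTorch-style setup: convolution weights are shared across tasks, while each task keeps its own BatchNorm2d with separate affine parameters and running statistics. The class and argument names below are illustrative assumptions.

```python
# Minimal sketch (assumed names, not the paper's code): a shared conv
# layer with one "shadow" batch norm per task. Conv weights are shared
# to keep the model compact; per-task BN isolates feature statistics.
import torch
import torch.nn as nn

class SharedConvWithShadowBN(nn.Module):
    """A shared 3x3 convolution followed by one BatchNorm2d per task."""

    def __init__(self, in_ch: int, out_ch: int, num_tasks: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3,
                              padding=1, bias=False)
        # One shadow BN per task: separate affine parameters and
        # separate running mean/variance estimates.
        self.shadow_bns = nn.ModuleList(
            [nn.BatchNorm2d(out_ch) for _ in range(num_tasks)]
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        # Every task passes through the same shared convolution...
        h = self.conv(x)
        # ...but is normalized with its own task-specific statistics.
        h = self.shadow_bns[task_id](h)
        return self.act(h)

if __name__ == "__main__":
    layer = SharedConvWithShadowBN(in_ch=3, out_ch=16, num_tasks=2)
    x = torch.randn(4, 3, 32, 32)
    y0 = layer(x, task_id=0)  # normalized with task 0's statistics
    y1 = layer(x, task_id=1)  # normalized with task 1's statistics
    print(y0.shape, y1.shape)  # torch.Size([4, 16, 32, 32]) twice
```

Keeping normalization statistics separate per task prevents tasks with mismatched feature distributions from corrupting each other's running estimates, which is consistent with the stabilizing role the abstract ascribes to this mechanism.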



Acknowledgements

We thank the reviewers for their constructive comments on this work. This work is supported by the National Key Technology Support Program of China under Grant No. 2019YFF0301302-2, Guangxi Innovation-Driven Development Special Fund Project under Grant No. AA18118053, the National Science Foundation of China under Grant No. 51991395, the National Key R&D Program of China under Grant No. 2018YFB2101003, and the Beijing Science and Technology Project under Grant No. Z181100009018010.

Author information

Corresponding author

Correspondence to Bowen Du.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zhao, J., Lv, W., Du, B. et al. Deep multi-task learning with flexible and compact architecture search. Int J Data Sci Anal 15, 187–199 (2023). https://doi.org/10.1007/s41060-021-00274-0

