
Adversarial multi-task deep learning for signer-independent feature representation


Abstract

Previous research has achieved remarkable progress in Sign Language Recognition (SLR). However, robust open-set SLR applications require solving signer-independent SLR. This paper proposes a novel adversarial multi-task deep learning (MTL) framework that incorporates multiple modalities for isolated SLR. By employing identity recognition as a task that competes with the target SLR task, the proposed model effectively extracts signer-independent features by diverting the optimization direction of the competing task. Furthermore, the proposed adversarial multi-modality MTL framework jointly incorporates positive and negative task learning alongside the target task. By combining multiple modalities within the adversarial MTL framework, our model extracts robust signer-independent representations. We evaluate our method on multiple benchmark datasets covering different sign languages. The experimental results demonstrate that the proposed adversarial multi-modality MTL model effectively realizes signer-independent SLR through compensation from relevant tasks and competition with irrelevant tasks.
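
The abstract does not detail the implementation, but the described competition between the target SLR task and the identity recognition task is closely related to gradient-reversal adversarial training (as in domain-adversarial networks). The PyTorch sketch below illustrates that general idea only; the gradient-reversal layer, head names, feature dimensions, and class counts are illustrative assumptions, not the authors' architecture.

```python
# Minimal sketch (assumption): a gradient-reversal adversarial head that
# discourages signer-identity information in the shared features, in the
# spirit of the competing task described in the abstract. Names, dimensions,
# and the backbone are hypothetical placeholders.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) gradients in backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class AdversarialMTLHead(nn.Module):
    def __init__(self, feat_dim, num_signs, num_signers, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.sign_classifier = nn.Linear(feat_dim, num_signs)      # target SLR task
        self.signer_classifier = nn.Linear(feat_dim, num_signers)  # competing identity task

    def forward(self, features):
        sign_logits = self.sign_classifier(features)
        # Reverse gradients so the shared features are pushed to be
        # uninformative about signer identity.
        reversed_feat = GradReverse.apply(features, self.lambd)
        signer_logits = self.signer_classifier(reversed_feat)
        return sign_logits, signer_logits

# Usage: features come from any video backbone (e.g., pooled 3D-CNN clip
# features); both heads are trained with cross-entropy, and the reversal
# makes the signer loss act adversarially on the shared representation.
head = AdversarialMTLHead(feat_dim=512, num_signs=500, num_signers=50)
features = torch.randn(8, 512)  # batch of pooled clip features
sign_logits, signer_logits = head(features)
loss = nn.functional.cross_entropy(sign_logits, torch.randint(0, 500, (8,))) \
     + nn.functional.cross_entropy(signer_logits, torch.randint(0, 50, (8,)))
loss.backward()
```

In this sketch the shared features learn to classify signs while the reversed gradient from the signer head pushes them to discard signer identity, mirroring the compensation-versus-competition idea stated above; how the paper actually couples the positive, negative, and multi-modality branches is described in the full text.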



Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grants 61976132, 61991411, and U1811461, and by the Natural Science Foundation of Shanghai under Grant 19ZR1419200. We thank the High Performance Computing Center of Shanghai University and the Shanghai Engineering Research Center of Intelligent Computing System (No. 19DZ2252600) for providing the computing resources.

Author information


Corresponding author

Correspondence to Yuchun Fang.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Fang, Y., Xiao, Z., Cai, S. et al. Adversarial multi-task deep learning for signer-independent feature representation. Appl Intell 53, 4380–4392 (2023). https://doi.org/10.1007/s10489-022-03649-3

