Research Article · DOI: 10.1145/3503161.3547957

DualSign: Semi-Supervised Sign Language Production with Balanced Multi-Modal Multi-Task Dual Transformation

Published: 10 October 2022

Abstract

Sign Language Production (SLP) aims to translate a spoken language description into its corresponding continuous sign language sequence. A prevailing solution to this problem is two-staged: it formulates SLP as two sub-tasks, i.e., Text to Gloss (T2G) translation and Gloss to Pose (G2P) animation, with gloss annotations as pivots. Although two-staged approaches achieve better performance than their direct-translation counterparts, the requirement of gloss intermediaries creates a parallel data bottleneck. In this paper, to reduce the reliance on gloss annotations in two-staged approaches, we propose DualSign, a semi-supervised two-staged SLP framework that can effectively utilize partially gloss-annotated text-pose pairs and monolingual gloss data. The key component of DualSign is a novel Balanced Multi-Modal Multi-Task Dual Transformation (BM3T-DT) method, in which two well-designed models, i.e., a Multi-Modal T2G model (MM-T2G) and a Multi-Task G2P model (MT-G2P), are jointly trained by leveraging their task duality and unlabeled data. After applying BM3T-DT, we derive the desired uni-modal T2G model from the well-trained MM-T2G via knowledge distillation. Because the MM-T2G may suffer from modality imbalance when decoding with multiple input modalities, we devise a cross-modal balancing loss that further boosts the system's overall performance. Extensive experiments on the PHOENIX14T dataset show the effectiveness of our approach in the semi-supervised setting. By training with additionally collected unlabeled data, DualSign substantially improves over previous state-of-the-art SLP methods.
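
To make the dual-transformation idea concrete, the sketch below shows, in toy PyTorch, how a T2G model and a G2P model could be trained jointly: supervised losses on gloss-annotated triples, plus pseudo glosses predicted by T2G that supervise G2P on text-pose pairs lacking gloss labels. The module definitions, tensor shapes, vocabulary sizes, and loss weighting here are illustrative assumptions, not the DualSign architecture or the authors' training recipe (which uses multi-modal and multi-task variants of both models).

```python
# Toy sketch of dual-transformation-style semi-supervised training for
# two-staged SLP. All names, shapes, and models are hypothetical placeholders.
import torch
import torch.nn as nn

VOCAB_TEXT, VOCAB_GLOSS, POSE_DIM, HID = 200, 100, 150, 64

class T2G(nn.Module):
    """Text -> gloss: per-step gloss classifier over a recurrent encoding."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB_TEXT, HID)
        self.rnn = nn.GRU(HID, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB_GLOSS)
    def forward(self, text_ids):
        h, _ = self.rnn(self.emb(text_ids))
        return self.out(h)                      # (B, T, VOCAB_GLOSS)

class G2P(nn.Module):
    """Gloss -> pose: per-step regression to skeleton keypoint coordinates."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB_GLOSS, HID)
        self.rnn = nn.GRU(HID, HID, batch_first=True)
        self.out = nn.Linear(HID, POSE_DIM)
    def forward(self, gloss_ids):
        h, _ = self.rnn(self.emb(gloss_ids))
        return self.out(h)                      # (B, T, POSE_DIM)

t2g, g2p = T2G(), G2P()
opt = torch.optim.Adam(list(t2g.parameters()) + list(g2p.parameters()), lr=1e-3)
ce, mse = nn.CrossEntropyLoss(), nn.MSELoss()

def train_step(labeled, unlabeled):
    """One joint update: supervised T2G and G2P losses on gloss-annotated
    triples, plus a pseudo-gloss bridge on un-annotated text-pose pairs."""
    text, gloss, pose = labeled
    loss = ce(t2g(text).flatten(0, 1), gloss.flatten()) + mse(g2p(gloss), pose)

    # Dual transformation: T2G produces pseudo glosses for unlabeled pairs,
    # which then act as inputs for training G2P against the real poses.
    u_text, u_pose = unlabeled
    with torch.no_grad():
        pseudo_gloss = t2g(u_text).argmax(-1)
    loss = loss + mse(g2p(pseudo_gloss), u_pose)

    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Toy batch: 2 sequences of length 5 (text/gloss lengths equal only for brevity).
labeled = (torch.randint(0, VOCAB_TEXT, (2, 5)),
           torch.randint(0, VOCAB_GLOSS, (2, 5)),
           torch.randn(2, 5, POSE_DIM))
unlabeled = (torch.randint(0, VOCAB_TEXT, (2, 5)), torch.randn(2, 5, POSE_DIM))
print(train_step(labeled, unlabeled))
```

The reverse direction, exploiting monolingual gloss data through the G2P side, would follow the same pseudo-labeling pattern.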

Supplementary Material

MP4 File (MM22-fp0857.mp4)
This video presents our paper "DualSign: Semi-Supervised Sign Language Production with Balanced Multi-Modal Multi-Task Dual Transformation". In the paper, we propose to break the parallel data bottleneck caused by the requirement of gloss annotations in two-staged SLP systems, an issue that has rarely been investigated in SLP. Our DualSign framework, trained with a novel balanced multi-modal multi-task dual transformation method, fully exploits partially gloss-annotated text-pose pairs and monolingual gloss data to improve both sub-tasks of two-staged SLP. The proposed cross-modal balancing loss further boosts the system's overall performance by alleviating the modality imbalance problem. Extensive experiments demonstrate the effectiveness of our DualSign framework.
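
The abstract also mentions deriving the final uni-modal T2G model from the well-trained MM-T2G via knowledge distillation. The sketch below illustrates one standard soft-label distillation setup (temperature-scaled KL divergence mixed with hard-label cross-entropy, following Hinton et al.) from a multi-modal teacher to a text-only student; the modules, shapes, temperature, and loss weighting are illustrative assumptions rather than the paper's exact configuration.

```python
# Toy sketch: distilling a text-only T2G student from a multi-modal teacher
# with standard soft-label knowledge distillation. Hypothetical shapes/modules.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_TEXT, VOCAB_GLOSS, POSE_DIM, HID, TAU = 200, 100, 150, 64, 2.0

class MultiModalT2G(nn.Module):
    """Teacher: fuses text embeddings with projected pose features."""
    def __init__(self):
        super().__init__()
        self.text_emb = nn.Embedding(VOCAB_TEXT, HID)
        self.pose_proj = nn.Linear(POSE_DIM, HID)
        self.out = nn.Linear(HID, VOCAB_GLOSS)
    def forward(self, text_ids, pose):
        return self.out(self.text_emb(text_ids) + self.pose_proj(pose))

class UniModalT2G(nn.Module):
    """Student: text only, so it needs no pose input at inference time."""
    def __init__(self):
        super().__init__()
        self.text_emb = nn.Embedding(VOCAB_TEXT, HID)
        self.out = nn.Linear(HID, VOCAB_GLOSS)
    def forward(self, text_ids):
        return self.out(self.text_emb(text_ids))

teacher, student = MultiModalT2G().eval(), UniModalT2G()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

def distill_step(text_ids, pose, gloss_ids, alpha=0.5):
    """Mix temperature-scaled KL against teacher soft labels with the usual
    hard-label cross-entropy on the reference glosses."""
    with torch.no_grad():
        t_logits = teacher(text_ids, pose)
    s_logits = student(text_ids)
    kd = F.kl_div(F.log_softmax(s_logits / TAU, dim=-1),
                  F.softmax(t_logits / TAU, dim=-1),
                  reduction="batchmean") * TAU ** 2
    ce = F.cross_entropy(s_logits.flatten(0, 1), gloss_ids.flatten())
    loss = alpha * kd + (1 - alpha) * ce
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

text = torch.randint(0, VOCAB_TEXT, (2, 5))
pose = torch.randn(2, 5, POSE_DIM)
gloss = torch.randint(0, VOCAB_GLOSS, (2, 5))
print(distill_step(text, pose, gloss))
```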

Cited By

  • (2024) Semantic-driven diffusion for sign language production with gloss-pose latent spaces alignment. Computer Vision and Image Understanding 246, 104050. DOI: 10.1016/j.cviu.2024.104050. Online publication date: Sep 2024.
  • (2024) From rule-based models to deep learning transformers architectures for natural language processing and sign language translation systems: survey, taxonomy and performance evaluation. Artificial Intelligence Review 57(10). DOI: 10.1007/s10462-024-10895-z. Online publication date: 29 Aug 2024.
  • (2024) A Simple Baseline for Spoken Language to Sign Language Translation with 3D Avatars. Computer Vision – ECCV 2024, 36-54. DOI: 10.1007/978-3-031-72967-6_3. Online publication date: 3 Nov 2024.

      Published In

      MM '22: Proceedings of the 30th ACM International Conference on Multimedia
      October 2022
      7537 pages
      ISBN:9781450392037
      DOI:10.1145/3503161

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. multi-modal learning
      2. sign language production
      3. task duality

      Conference

      MM '22

      Acceptance Rates

      Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
