Fine-grained sequence-to-sequence lip reading based on self-attention and self-distillation

Xue, Junxiao; Huang, Shibo; Song, Huawei; Shi, Lei

doi:10.1007/s11704-023-2230-x

Letter
Published: 31 March 2023

Volume 17, article number 176344 (2023)

Frontiers of Computer Science

Conclusion

In this paper, we proposed a seq2seq model based on self-attention and self-distillation for sentence-level lip reading. The model comprises a CNN front-end, pixel-wise learning, temporal learning, and a decoder. We apply the CNN front-end to capture shallow spatial features in the image sequence, and employ the Resformer module to model the deep spatial correlation between pixels in each frame (pixel-wise learning). The encoder then learns the temporal features (temporal learning). Finally, the decoder decodes the visual information into a text prediction. In addition, self-distillation is applied to further improve the model's performance. In experiments on GRID, LRW and LRW-1000, the proposed model achieves competitive results on the WER, CER and Acc metrics. However, our work is limited by its model complexity, which remains to be addressed in future work.
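The conclusion reports results in terms of WER, CER and Acc but does not define them here. The following is a minimal sketch of the standard definitions, assuming WER and CER are edit distance normalized by reference length and Acc is exact-match word accuracy. The function names and the example sentence are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = ref.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref: str, hyp: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def accuracy(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Fraction of predictions that exactly match their reference label."""
    if len(refs) != len(hyps) or not refs:
        raise ValueError("refs and hyps must be non-empty and equally long")
    return sum(r == h for r, h in zip(refs, hyps)) / len(refs)


# Hypothetical sentence-level example (one word misread).
reference = "bin blue at f two now"
prediction = "bin blue at f too now"
print(f"WER = {wer(reference, prediction):.3f}")
print(f"CER = {cer(reference, prediction):.3f}")
```

Sentence-level benchmarks are typically scored with WER and CER, while word-classification benchmarks are typically scored with accuracy; which metric the paper pairs with which dataset is not stated on this page.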

Author information

Authors and Affiliations

Research Institute of Artificial Intelligence, Zhejiang Lab, Hangzhou, 311121, China
Junxiao Xue
College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
Shibo Huang
School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou, 450002, China
Huawei Song & Lei Shi

Corresponding author

Correspondence to Shibo Huang.
