Abstract
A Chinese character is composed of a spatial arrangement of strokes. A portion of these strokes combines to form the phonetic component, which provides a clue to the pronunciation of the entire character; the remaining strokes combine to form the semantic component, which carries semantic-level information. How close is the connection between the internal strokes of Chinese characters and speech? In this paper, we propose Speech2Stroke, an end-to-end model that exploits the phonetic- and morphologic-level information of pictographic words. Specifically, Speech2Stroke generates strokes directly from speech. Its performance is evaluated by the stroke error rate (SER); the best model achieves an SER of 20.61%. Through experiments and analysis, we show that our model is able to capture the alignment between audio and the internal structures of pictographic characters.
This work was supported by NSFC Grant No. 61832008, 61772413, 61802299, and 61672424.
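The stroke error rate mentioned in the abstract is presumably computed like character or word error rate: the Levenshtein edit distance between the predicted and reference stroke sequences, normalized by the reference length. A minimal sketch, assuming that convention (the stroke labels used in the usage note are illustrative placeholders, not the paper's actual stroke alphabet):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences; insertions,
    deletions, and substitutions all cost 1."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))  # dp[j] = distance(ref[:i], hyp[:j])
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i  # prev holds dp[j-1] from the previous row
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                              # deletion
                        dp[j - 1] + 1,                          # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))      # substitution
            prev = cur
    return dp[n]

def stroke_error_rate(ref_strokes, hyp_strokes):
    """SER = edit distance / reference length, assuming the usual
    error-rate normalization."""
    return edit_distance(ref_strokes, hyp_strokes) / len(ref_strokes)
```

For example, `stroke_error_rate(list("一丨丿"), list("一丿"))` returns 1/3: one deleted stroke out of three reference strokes.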
Copyright information
© 2021 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
Cite this paper
Zhang, Y. et al. (2021). Speech2Stroke: Generate Chinese Character Strokes Directly from Speech. In: Gao, H., Wang, X., Iqbal, M., Yin, Y., Yin, J., Gu, N. (eds) Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 349. Springer, Cham. https://doi.org/10.1007/978-3-030-67537-0_6
Print ISBN: 978-3-030-67536-3
Online ISBN: 978-3-030-67537-0