
Design and Implementation of Speech Generation and Demonstration Research Based on Deep Learning

  • Conference paper
Data Science (ICPCSEE 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1879)

Abstract

Because complex and changing factors such as the speech topic and the setting make it difficult for a speaker to prepare a speech text on short notice, this paper proposes a speech generation and demonstration system based on deep learning. The system is built on the PyTorch deep learning framework and trained on the basis of GPT-2 and an open-source pretrained model to generate multiple candidate speeches from topics given by users; it then produces the final speech and the corresponding spoken demonstration audio through text revision, speech synthesis, and related techniques, helping users quickly obtain the target document and audio. Experiments show that the text generated by the model is fluent and practical, which shortens speakers' preparation time and improves the confidence of impromptu speakers. In addition, the paper explores the application prospects of text generation and provides a useful reference.
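The abstract describes generating several candidate speeches from a user-supplied topic with an open-source pretrained GPT-2 model on top of PyTorch. The snippet below is a minimal illustrative sketch (not the authors' code) of such topic-conditioned generation using the Hugging Face Transformers library; the checkpoint name, prompt format, and sampling parameters are assumptions chosen for illustration.

```python
# Minimal sketch: generate multiple speech drafts for a topic with a
# pretrained GPT-2 model (assumed checkpoint and prompt template).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

MODEL_NAME = "gpt2"  # assumed open-source checkpoint; the paper does not name one

tokenizer = GPT2Tokenizer.from_pretrained(MODEL_NAME)
model = GPT2LMHeadModel.from_pretrained(MODEL_NAME)
model.eval()

def generate_speech_drafts(topic: str, num_drafts: int = 3, max_new_tokens: int = 200):
    """Return several candidate speech drafts for the given topic."""
    prompt = f"Topic: {topic}\nSpeech:"  # assumed prompt format
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            do_sample=True,            # sampling yields distinct drafts
            top_p=0.9,
            temperature=0.8,
            max_new_tokens=max_new_tokens,
            num_return_sequences=num_drafts,
            pad_token_id=tokenizer.eos_token_id,
        )
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

if __name__ == "__main__":
    for i, draft in enumerate(generate_speech_drafts("the value of lifelong learning"), 1):
        print(f"--- Draft {i} ---\n{draft}\n")
```

In the system described by the paper, drafts of this kind would subsequently pass through text revision and a speech-synthesis stage to produce the final speech and its demonstration audio.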



Author information

Corresponding author

Correspondence to Yanqing Wang.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Luo, W., Wang, Y., Liu, Y., Xu, Y. (2023). Design and Implementation of Speech Generation and Demonstration Research Based on Deep Learning. In: Yu, Z., et al. Data Science. ICPCSEE 2023. Communications in Computer and Information Science, vol 1879. Springer, Singapore. https://doi.org/10.1007/978-981-99-5968-6_33


  • DOI: https://doi.org/10.1007/978-981-99-5968-6_33

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-5967-9

  • Online ISBN: 978-981-99-5968-6

  • eBook Packages: Computer Science, Computer Science (R0)
