Abstract
Complex and changeable factors such as the speech topic and the speaking environment make it difficult for a speaker to prepare a speech text in a short time. To address this problem, this paper proposes a speech generation and demonstration system based on deep learning. Built on the PyTorch deep learning framework, the system is trained on the basis of GPT-2 and an open-source pretrained model to generate multiple candidate speeches from a topic given by the user; it then produces the final speech text and the corresponding voice demonstration audio through text revision, speech synthesis and other techniques, helping users quickly obtain the target document and audio. Experiments show that the text generated by the model is fluent and practical, shortening the speaker's preparation time and improving the confidence of impromptu speakers. In addition, the paper explores the application prospects of text generation and has certain reference value.
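A minimal sketch of the topic-conditioned generation step is given below, using an open-source GPT-2 checkpoint through the Hugging Face transformers library. The checkpoint name, prompt template and sampling parameters are illustrative assumptions, not the authors' exact configuration.

# Sketch: generate several candidate speech drafts from a user topic with a
# pretrained GPT-2 model (Hugging Face transformers). Checkpoint, prompt
# format and sampling settings are assumptions for illustration only.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

MODEL_NAME = "gpt2"  # assumption: any open-source GPT-2 checkpoint

tokenizer = GPT2Tokenizer.from_pretrained(MODEL_NAME)
model = GPT2LMHeadModel.from_pretrained(MODEL_NAME)
model.eval()

def generate_drafts(topic: str, n_drafts: int = 3, max_new_tokens: int = 200):
    """Return several candidate speech openings for the given topic."""
    prompt = f"Topic: {topic}\nSpeech:"  # hypothetical prompt template
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,            # sampling yields diverse candidate drafts
        top_k=50,
        top_p=0.95,
        temperature=0.9,
        num_return_sequences=n_drafts,
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,
    )
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

if __name__ == "__main__":
    for i, draft in enumerate(generate_drafts("the value of lifelong learning"), 1):
        print(f"--- Draft {i} ---\n{draft}\n")

In the full system, the drafts returned by such a routine would be revised and then passed to a speech-synthesis back end to produce the demonstration audio.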
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Luo, W., Wang, Y., Liu, Y., Xu, Y. (2023). Design and Implementation of Speech Generation and Demonstration Research Based on Deep Learning. In: Yu, Z., et al. Data Science. ICPCSEE 2023. Communications in Computer and Information Science, vol 1879. Springer, Singapore. https://doi.org/10.1007/978-981-99-5968-6_33
DOI: https://doi.org/10.1007/978-981-99-5968-6_33
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-5967-9
Online ISBN: 978-981-99-5968-6
eBook Packages: Computer Science, Computer Science (R0)