Research Article

Generation of Voice Signal Tone Sandhi and Melody Based on Convolutional Neural Network

Published: 08 May 2023

Abstract

There is a need to prevent the use of modulated voice signals in criminal activities. This article proposes voice signal change detection based on a convolutional neural network. Three commonly used voice processing tools (Audacity, CoolEdit, and RTISI) are used to change the tones of voices in speech libraries. Each voice is raised at five semitone levels, recorded as +4, +5, +6, +7, and +8, and likewise lowered at five semitone levels, recorded as −4, −5, −6, −7, and −8. Through experiments, the convolutional neural network corresponding to network b-3 is selected as the final classifier. Its average accuracy A1 over the three categories exceeds 97%, its detection accuracy A2 for electronically tone-shifted speech exceeds 97%, and its false alarm rate on original speech is below 1.9%. These results show that the proposed detection algorithm is effective and has good generalization ability.
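A semitone shift of n steps corresponds to a frequency ratio of 2^(n/12). The following is a minimal numpy sketch of how pitch-shifted variants of a signal might be generated by resampling; it is an illustration under stated assumptions, not the authors' pipeline, and real tools such as Audacity, CoolEdit, and RTISI additionally apply time-scale modification so that duration is preserved:

```python
import numpy as np

def semitone_ratio(n_steps: int) -> float:
    """Frequency ratio for a shift of n_steps semitones (e.g., +4 ... +8 or -4 ... -8)."""
    return 2.0 ** (n_steps / 12.0)

def naive_pitch_shift(signal: np.ndarray, n_steps: int) -> np.ndarray:
    """Shift pitch by resampling with linear interpolation.

    Note: this naive approach also shortens (or lengthens) the signal;
    production tools combine resampling with time-scale modification
    so the duration of the speech is unchanged.
    """
    ratio = semitone_ratio(n_steps)
    old_idx = np.arange(len(signal))
    # Reading the samples faster raises the pitch; slower lowers it.
    new_idx = np.arange(0, len(signal), ratio)
    return np.interp(new_idx, old_idx, signal)
```

For example, shifting a 440 Hz tone by +12 semitones (ratio 2.0) moves its dominant frequency to roughly 880 Hz, at the cost of halving its duration.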



    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 22, Issue 5
    May 2023
    653 pages
    ISSN:2375-4699
    EISSN:2375-4702
    DOI:10.1145/3596451

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 May 2023
    Online AM: 19 September 2022
    Accepted: 06 June 2022
    Revised: 06 May 2022
    Received: 01 March 2022
    Published in TALLIP Volume 22, Issue 5


    Author Tag

    1. Detection algorithm
