Research Article

Generation of Voice Signal Tone Sandhi and Melody Based on Convolutional Neural Network

Published: 08 May 2023

Abstract

There is a need to prevent the use of modulated voice signals in criminal activities. This article proposes voice signal change detection based on a convolutional neural network. Three commonly used voice processing tools (Audacity, CoolEdit, and RTISI) are used to change the tones of voices in speech libraries. Each voice is raised at five semitone levels, recorded as +4, +5, +6, +7, and +8, and likewise lowered at five semitone levels, recorded as −4, −5, −6, −7, and −8. Through experiments, the convolutional neural network corresponding to network b-3 is selected as the final classifier. Its average accuracy A1 over the three categories exceeds 97%, its detection accuracy A2 for electronically tone-shifted speech exceeds 97%, and its false alarm rate on original speech is below 1.9%. These results show that the proposed detection algorithm is effective and has good generalization ability.
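A semitone shift of n steps corresponds to a frequency ratio of 2^(n/12). The following is a minimal numpy sketch of how pitch-shifted variants of a signal might be generated by resampling; it is an illustration under stated assumptions, not the authors' pipeline, and real tools such as Audacity, CoolEdit, and RTISI additionally apply time-scale modification so that duration is preserved:

```python
import numpy as np

def semitone_ratio(n_steps: int) -> float:
    """Frequency ratio for a shift of n_steps semitones (e.g., +4 ... +8 or -4 ... -8)."""
    return 2.0 ** (n_steps / 12.0)

def naive_pitch_shift(signal: np.ndarray, n_steps: int) -> np.ndarray:
    """Shift pitch by resampling with linear interpolation.

    Note: this naive approach also shortens (or lengthens) the signal;
    production tools combine resampling with time-scale modification
    so the duration of the speech is unchanged.
    """
    ratio = semitone_ratio(n_steps)
    old_idx = np.arange(len(signal))
    # Reading the samples faster raises the pitch; slower lowers it.
    new_idx = np.arange(0, len(signal), ratio)
    return np.interp(new_idx, old_idx, signal)
```

For example, shifting a 440 Hz tone by +12 semitones (ratio 2.0) moves its dominant frequency to roughly 880 Hz, at the cost of halving its duration.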



    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 22, Issue 5
    May 2023
    653 pages
    ISSN:2375-4699
    EISSN:2375-4702
    DOI:10.1145/3596451

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 May 2023
    Online AM: 19 September 2022
    Accepted: 06 June 2022
    Revised: 06 May 2022
    Received: 01 March 2022
    Published in TALLIP Volume 22, Issue 5


    Author Tag

    1. Detection algorithm
