Speech analysis and synthesis with a refined adaptive sinusoidal representation

Tabet, Youcef; Boughazi, Mohamed; Afifi, Saddek

doi:10.1007/s10772-018-9519-4

Published: 15 May 2018

Volume 21, pages 581–588, (2018)

International Journal of Speech Technology

Abstract

This paper reviews common speech signal representations and briefly describes their corresponding analysis–synthesis stages. The main focus is on adaptive sinusoidal representations, for which a refined model of speech is proposed. This model is referred to as the Refined adaptive Sinusoidal Representation (R_aSR). Building on the performance of recently proposed adaptive sinusoidal models of speech, significant refinements are introduced at both the analysis and adaptive stages. First, a quasi-harmonic representation of speech is used in the analysis stage to obtain an initial estimate of the instantaneous model parameters. Next, in the adaptive stage, an adaptive scheme combined with an iterative frequency correction mechanism allows a robust estimation of the model parameters (amplitudes, frequencies, and phases). Finally, the speech signal is reconstructed as the sum of its estimated time-varying instantaneous components after an interpolation scheme. Objective evaluation tests show that the proposed R_aSR achieves high-quality reconstruction of voiced speech signals compared with state-of-the-art models. Moreover, listening tests indicate that the R_aSR attains transparent perceived quality.
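The final reconstruction step described in the abstract, a sum of time-varying sinusoidal components obtained after interpolation, can be illustrated with a minimal Python sketch. This is not the authors' R_aSR implementation: the function name `synthesize`, its arguments, the use of linear interpolation, and the running-sum phase integration are all assumptions chosen for illustration, since the abstract does not specify the interpolation scheme.

```python
import numpy as np

def synthesize(frame_times, amps, freqs, phase0, fs, n_samples):
    """Hypothetical sum-of-sinusoids synthesis from frame-wise parameters.

    frame_times : (F,) analysis instants in seconds, increasing
    amps, freqs : (F, K) amplitude and frequency (Hz) per frame and component
    phase0      : (K,) initial phase of each component in radians
    fs          : sampling rate in Hz
    """
    t = np.arange(n_samples) / fs
    out = np.zeros(n_samples)
    for k in range(amps.shape[1]):
        # Assumption: linear interpolation of frame parameters to the sample rate.
        a = np.interp(t, frame_times, amps[:, k])
        f = np.interp(t, frame_times, freqs[:, k])
        # Instantaneous phase = initial phase + 2*pi * running sum of frequency.
        phase = phase0[k] + 2.0 * np.pi * (np.cumsum(f) - f[0]) / fs
        out += a * np.cos(phase)
    return out
```

For a component with constant amplitude and frequency, this sketch reduces to an ordinary stationary sinusoid; the time-varying case simply lets the amplitude and frequency tracks change between analysis instants.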

Author information

Authors and Affiliations

Faculté des Sciences de l’Ingéniorat, Université Badji Mokhtar, Annaba, Algérie
Youcef Tabet, Mohamed Boughazi & Saddek Afifi

Corresponding author

Correspondence to Youcef Tabet.

About this article

Cite this article

Tabet, Y., Boughazi, M. & Afifi, S. Speech analysis and synthesis with a refined adaptive sinusoidal representation. Int J Speech Technol 21, 581–588 (2018). https://doi.org/10.1007/s10772-018-9519-4

Received: 22 December 2017
Accepted: 04 May 2018
Published: 15 May 2018
Issue Date: September 2018
DOI: https://doi.org/10.1007/s10772-018-9519-4
