A Method of Real-Time Non-uniform Speech Stretching

  • Conference paper
E-Business and Telecommunications (ICETE 2011)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 314)

Abstract

A method for real-time non-uniform speech stretching is presented. The proposed solution is based on the well-known SOLA (Synchronous Overlap and Add) algorithm. Non-uniform time-scale modification is achieved by adjusting the time-scaling factor according to the signal content: its value is selected depending on the speech unit (vowel or consonant), the instantaneous rate of speech (ROS), and the presence of speech in the signal. This keeps the difference between the durations of the input and output signals as small as possible while preserving high naturalness and quality of the modified speech. In the experimental part of the paper, the accuracy of the proposed ROS estimator is examined. The quality of speech stretched using the proposed method is assessed in subjective tests.
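The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the ROS threshold, the scaling-factor values, the frame sizes, and the frame labels are all hypothetical, and the abstract does not state how pauses are treated. The sketch assumes that vowels are stretched more than consonants and that non-speech frames are shortened to win back time. It pairs a plain SOLA overlap-add with a per-frame scaling factor.

```python
import math


def select_scale(is_speech, is_vowel, ros, ros_threshold=4.0):
    """Return a time-scaling factor for one frame (>1 stretches, <1 shortens).

    The threshold and factor values are illustrative assumptions,
    not the values used in the paper.
    """
    if not is_speech:
        return 0.5  # assumed: shorten pauses to recover time spent stretching
    if ros < ros_threshold:
        return 1.0  # assumed: speech that is already slow is left unchanged
    return 2.0 if is_vowel else 1.3  # assumed: vowels stretched more


def sola_stretch(x, scales, frame_len=400, hop=100, search=20):
    """Basic SOLA with a per-frame scaling factor.

    x is a list of samples; scales[m - 1] applies to analysis frame m.
    """
    out = list(x[:frame_len])
    target = 0.0  # nominal output position of the current frame
    n_frames = max(0, (len(x) - frame_len) // hop)
    for m in range(1, n_frames + 1):
        frame = x[m * hop : m * hop + frame_len]
        target += scales[m - 1] * hop  # synthesis hop = factor * analysis hop
        nominal = int(target)
        best_start = min(max(nominal, 0), len(out))
        best_corr = -math.inf
        # Search around the nominal position for the shift that maximises
        # the normalised cross-correlation in the overlap region.
        for k in range(-search, search + 1):
            start = nominal + k
            if start < 0 or start >= len(out):
                continue
            ov = min(len(out) - start, frame_len)
            num = sum(out[start + i] * frame[i] for i in range(ov))
            e_out = sum(out[start + i] ** 2 for i in range(ov))
            e_frm = sum(frame[i] ** 2 for i in range(ov))
            den = math.sqrt(e_out * e_frm)
            corr = num / den if den > 0 else 0.0
            if corr > best_corr:
                best_corr, best_start = corr, start
        # Linear cross-fade over the overlap, then append the rest of the frame.
        ov = min(len(out) - best_start, frame_len)
        blended = []
        for i in range(ov):
            w = (i + 1) / (ov + 1)
            blended.append((1 - w) * out[best_start + i] + w * frame[i])
        out = out[:best_start] + blended + list(frame[ov:])
    return out


# Usage with made-up labels: 12 fast vowel frames followed by 24 pause frames.
x = [math.sin(2 * math.pi * n / 50) for n in range(4000)]
scales = [select_scale(True, True, 5.0)] * 12 + [select_scale(False, False, 0.0)] * 24
y = sola_stretch(x, scales)
print(len(x), len(y))
```

With these assumed factors, the time added by stretching the vowel frames is offset by the time saved on the pause frames, so the output length stays within the search range of the input length. This is the duration-balancing behaviour the abstract describes.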

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kupryjanow, A., Czyzewski, A. (2012). A Method of Real-Time Non-uniform Speech Stretching. In: Obaidat, M.S., Sevillano, J.L., Filipe, J. (eds) E-Business and Telecommunications. ICETE 2011. Communications in Computer and Information Science, vol 314. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35755-8_25

  • DOI: https://doi.org/10.1007/978-3-642-35755-8_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35754-1

  • Online ISBN: 978-3-642-35755-8

  • eBook Packages: Computer Science, Computer Science (R0)
