Vocal Emotion Conversion Using WSOLA and Linear Prediction

  • Conference paper
Speech and Computer (SPECOM 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10458)

Abstract

The paper addresses speech emotion conversion using Waveform Similarity Overlap-Add (WSOLA) followed by linear prediction (LP) analysis for spectral transformation. Duration is modified according to the ratio between the segment durations of neutral and target speech. After WSOLA modification, the duration-modified source speech is time-aligned with the target and subjected to linear prediction analysis to yield the LP coefficients (LPCs). The target emotion is re-synthesised from the prosody-manipulated residual and the LPCs from the source. The waveform similarity property of WSOLA is exploited to produce output with minimal distortion. The proposed algorithm is evaluated subjectively and objectively alongside the popular TD-PSOLA algorithm. With the proposed technique, the correlation between synthesised and real target speech shows an average improvement of 60% across all emotions.
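
The LP analysis and re-synthesis step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which LP estimation method the authors use, so the autocorrelation method with the Levinson-Durbin recursion is an assumption here. The function names, the order-2 model and the toy frame are hypothetical. The residual is left unmodified so that the analysis/synthesis round trip can be checked; in the paper the residual is prosody-manipulated before synthesis.

```python
def autocorr(x, order):
    """Autocorrelation lags r[0..order] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations.

    Returns (a, err) where a[0] = 1 and the inverse filter is
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # updated prediction error energy
    return a, err


def lp_residual(x, a):
    """Inverse-filter x with A(z) to obtain the excitation (residual)."""
    p = len(a) - 1
    return [sum(a[j] * x[n - j] for j in range(min(p, n) + 1))
            for n in range(len(x))]


def lp_synthesize(e, a):
    """Drive the all-pole filter 1/A(z) with the excitation e."""
    p = len(a) - 1
    y = []
    for n in range(len(e)):
        y.append(e[n] - sum(a[j] * y[n - j] for j in range(1, min(p, n) + 1)))
    return y


# Toy frame (hypothetical data): impulse response of a known 2-pole resonator.
frame = lp_synthesize([1.0] + [0.0] * 199, [1.0, -1.3, 0.6])

a, err = levinson_durbin(autocorr(frame, 2), 2)   # LP coefficients
residual = lp_residual(frame, a)                  # excitation signal
rebuilt = lp_synthesize(residual, a)              # re-synthesis from residual + LPCs
max_err = max(abs(u - v) for u, v in zip(frame, rebuilt))
print([round(c, 4) for c in a], max_err)
```

For this toy frame the estimated coefficients should come out close to the resonator's own, and `max_err` should be near machine precision, since synthesis with an unmodified residual simply inverts the analysis filter. Real speech would need framing, windowing and a higher LP order, none of which is shown here.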

Author information

Corresponding author

Correspondence to Susmitha Vekkot.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Vekkot, S., Tripathi, S. (2017). Vocal Emotion Conversion Using WSOLA and Linear Prediction. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_78

  • DOI: https://doi.org/10.1007/978-3-319-66429-3_78

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66428-6

  • Online ISBN: 978-3-319-66429-3

  • eBook Packages: Computer Science, Computer Science (R0)
