Recent Advances in Nonlinear Speech Processing: Directions and Challenges

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 48)

Abstract

Humans have very high requirements and expectations when communicating through speech, beyond simplicity, flexibility, and ease of interaction. This is because voice interactions do not require cognitive effort, attention, or memory resources. However, voice technologies are still constrained to specific use cases and scenarios, given the existing limitations of speech synthesis and recognition systems. What is the status of nonlinear speech processing techniques, and what steps have been taken toward cross-fertilization among disciplines? This chapter provides a short overview that attempts to answer these questions.

Notes

  1. Here “language” is intended as “the verbal language”, as opposed to other general meanings of the term. The interpretation of a “language” as a code can be found in De Saussure [9].

References

  1. Arjona Ramírez, M., Minami, M.: Technology and standards for low-bit-rate vocoding methods. In: Bidgoli, H. (ed.) The Handbook of Computer Networks, vol. 2, pp. 447–467. Wiley, New York (2011)

  2. Arjona Ramírez, M., Minami, M.: Low bit rate speech coding. In: Proakis, J.G. (ed.) Wiley Encyclopedia of Telecommunications, vol. 3, pp. 1299–1308. Wiley, New York (2003)

  3. Atassi, H., Esposito, A., Smekal, Z.: Analysis of high-level features for vocal emotion recognition. In: Proceedings of 34th IEEE International Conference on Telecommunication and Signal Processing (TSP), pp. 361–366 (2011)

  4. Atassi, H., Riviello, M.T., Smekal, Z., Hussain, A., Esposito, A.: Emotional vocal expressions recognition using the COST 2102 Italian database of emotional speech. In: Esposito, A., et al. (eds.) Development of Multimodal Interfaces: Active Listening and Synchrony, LNCS 5967, pp. 255–267. Springer, Berlin, Heidelberg (2010)

  5. Atassi, H., Esposito, A.: Speaker independent approach to the classification of emotional vocal expressions. In: Proceedings of IEEE Conference on Tools with Artificial Intelligence (ICTAI 2008), vol. 1, pp. 487–494 (2008)

  6. Butterworth, B.L., Beattie, G.W.: Gestures and silence as indicators of planning in speech. In: Smith, P.T., Campbell, R.N. (eds.) Recent Advances in the Psychology of Language, pp. 347–360. Plenum Press, New York (1978)

  7. Chafe, W.L.: Cognitive constraints on information flow. In: Tomlin, R. (ed.) Coherence and Grounding in Discourse, pp. 20–51. John Benjamins, Amsterdam (1987)

  8. Cordasco, G., Esposito, M., Masucci, F., Riviello, M.T., Esposito, A., Chollet, G., Schlögl, S., Milhorat, P., Pelosi, G.: Assessing voice user interfaces: the vAssist system prototype. In: Proceedings of 5th IEEE International Conference on Cognitive InfoCommunications (CogInfoCom 2014), pp. 91–96. Vietri sul Mare, Italy, 5–7 Nov 2014

  9. De Saussure, F.: Cours de linguistique générale. Editions Payot, Paris (1922)

  10. Esposito, A., Esposito, A.M., Vogel, C.: Needs and challenges in human computer interaction for processing social emotional information. Pattern Recogn. Lett. 66, 41–51 (2015)

  11. Esposito, A., Esposito, A.M., Likforman, L., Maldonato, M.N., Vinciarelli, A.: On the significance of speech pauses in depressive disorders: results on read and spontaneous narratives. In this volume (2015)

  12. Esposito, A.: The situated multimodal facets of human communication. In: Rojc, M., Campbell, N. (eds.) Coverbal Synchrony in Human-Machine Interaction, ch. 7, pp. 173–202. CRC Press, Taylor & Francis Group, Boca Raton, FL (2013)

  13. Esposito, A., Marinaro, M.: What pauses can tell us about speech and gesture partnership. In: Esposito, A., et al. (eds.) Fundamentals of Verbal and Nonverbal Communication and the Biometric Issue. NATO Publishing Series, vol. 18, pp. 45–57. IOS Press, The Netherlands (2007)

  14. Esposito, A., Bourbakis, N.G.: The role of timing in speech perception and speech production processes and its effects on language impaired individuals. In: Proceedings of the 6th International IEEE Symposium on BioInformatics and BioEngineering (BIBE), pp. 348–356 (2006)

  15. Esposito, A.: The importance of data for training intelligent devices. In: Apolloni, B., Kurfess, C. (eds.) From Synapses to Rules: Discovering Symbolic Knowledge from Neural Processed Data, pp. 229–250. Kluwer Academic Press, Dordrecht (2002)

  16. Esposito, A.: Approaching speech signal problems: a unifying viewpoint for the speech recognition process. In: Suarez Garcia, S., Baron Fernandez, R. (eds.) Memoria of Taller Internacional de Tratamiento del Habla, Procesamiento de Voz y el Lenguaje, CIC-IPN Obra Completa (2000). ISBN: 970-18-4936-1

  17. Galanis, D., Karabetsos, S., Koutsombogera, M., Papageorgiou, H., Esposito, A., Riviello, M.T.: Classification of emotional speech units in call centre interactions. In: Proceedings of 4th IEEE International Conference on Cognitive Infocommunications (CogInfoCom2013), pp. 403–406. Budapest, Hungary, 2–5 Dec 2013

  18. Kendon, A.: Gesture: Visible Action as Utterance. Cambridge University Press, Cambridge (2004)

  19. Kiss, G., Tulics, M.G., Sztahó, D., Esposito, A., Vicsi, K.: Language independent detection possibilities of depression by speech. In this volume (2015)

  20. Kroon, P.: Evaluation of speech coders. In: Paliwal, K.K., Kleijn, W.B. (eds.) Speech Coding and Synthesis, pp. 467–494. Elsevier Science, Amsterdam (1995)

  21. Gibson, J.D.: Speech coding methods, standards, and applications. IEEE Circuits Syst. Mag. 5(4), 30–49 (2005)

  22. Faundez-Zanuy, M., Janer, L., Esposito, A., Satue-Villar, A., Roure, J., Espinosa-Duro, V. (eds.): Nonlinear Analyses and Algorithms for Speech Processing, LNAI 3817. Springer, Berlin, Heidelberg (2006)

  23. Lindblom, B.: Explaining phonetic variation: a sketch of the H&H theory. In: Hardcastle, W., Marchal, A. (eds.) Speech Production and Speech Modeling, pp. 403–439. Kluwer, Dordrecht (1990)

  24. Meena, R., Skantze, G., Gustafson, J.: Data-driven models for timing feedback responses in a map task dialogue system. Comput. Speech Lang. 28, 903–922 (2014)

  25. Milhorat, P., Schlögl, S., Chollet, G., Boudy, J., Esposito, A., Pelosi, G.: Building the next generation of personal digital assistants. In: Proceedings of 1st IEEE International Conference on Advanced Technologies for Signal and Image Processing (ATSIP’2014), pp. 458–463. Sousse, Tunisia, 17–19 Mar 2014. ISBN 978-1-4799-4888-8

  26. Park, N., Rhoads, M., Hou, J., Lee, K.M.: Understanding the acceptance of teleconferencing systems among employees: an extension of the technology acceptance model. Comput. Hum. Behav. 39, 118–127 (2014)

  27. Ringeval, F., Eyben, F., Kroupi, E., Yuce, A., Thiran, J.P., Ebrahimi, T., Lalanne, D., Schuller, B.: Prediction of asynchronous dimensional emotion ratings from audiovisual and physiological data. Pattern Recogn. Lett. Elsevier (2014)

  28. Schuller, B.: Deep learning our everyday emotions: a short overview. In: Bassis et al. (eds.) Advances in Neural Networks: Computational and Theoretical Issues. Smart Innovation, Systems and Technologies, vol. 37, pp. 339–346. Springer, Berlin, Heidelberg (2015)

  29. Scherer, S., Stratou, G., Lucas, G., Mahmoud, M., Boberg, J., Gratch, J., Rizzo, A., Morency, L.P.: Automatic audio-visual behaviour descriptors for psychological disorder analysis. Image Vis. Comput. 32(10), 648–658 (2014). Special Issue on Best of Face and Gesture 2013

  30. Skantze, G., Hjalmarsson, A.: Towards incremental speech generation in conversational systems. Comput. Speech Lang. 27, 243–262 (2013)

  31. Stylianou, Y., Faundez-Zanuy, M., Esposito, A. (eds.): Progress in Nonlinear Speech Processing, LNCS 4391. Springer, Berlin, Heidelberg (2007)

Author information

Corresponding author

Correspondence to Anna Esposito.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Esposito, A. et al. (2016). Recent Advances in Nonlinear Speech Processing: Directions and Challenges. In: Esposito, A., et al. Recent Advances in Nonlinear Speech Processing. Smart Innovation, Systems and Technologies, vol 48. Springer, Cham. https://doi.org/10.1007/978-3-319-28109-4_2

  • DOI: https://doi.org/10.1007/978-3-319-28109-4_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-28107-0

  • Online ISBN: 978-3-319-28109-4

  • eBook Packages: Engineering, Engineering (R0)