Recent Advances in Nonlinear Speech Processing: Directions and Challenges

Esposito, Anna; Faundez-Zanuy, Marcos; Esposito, Antonietta M.; Cordasco, Gennaro; Drugman, Thomas; Solé-Casals, Jordi; Morabito, Francesco Carlo

doi:10.1007/978-3-319-28109-4_2

Chapter
First Online: 23 January 2016

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 48)

Abstract

Humans have very high requirements and expectations when communicating through speech, beyond simplicity, flexibility, and ease of interaction. This is because voice interaction does not require cognitive effort, attention, or memory resources. Voice technologies, however, are still constrained to particular use cases and scenarios, given the existing limitations of speech synthesis and recognition systems. What is the status of nonlinear speech processing techniques, and what steps have been made toward cross-fertilization among disciplines? This chapter provides a short overview that tries to answer these questions.

Notes

1. Here “language” is intended to be “the verbal language” as opposed to other general meanings of the term. The interpretation of a “language” as a code can be found in De Saussure [9].

References

1. Arjona Ramírez, M., Minami, M.: Technology and standards for low-bit-rate vocoding methods. In: Bidgoli, H. (ed.) The Handbook of Computer Networks, vol. 2, pp. 447–467. Wiley, New York (2011)
2. Arjona Ramírez, M., Minami, M.: Low bit rate speech coding. In: Proakis, J.G. (ed.) Wiley Encyclopedia of Telecommunications, vol. 3, pp. 1299–1308. Wiley, New York (2003)
3. Atassi, H., Esposito, A., Smekal, Z.: Analysis of high-level features for vocal emotion recognition. In: Proceedings of 34th IEEE International Conference on Telecommunication and Signal Processing (TSP), pp. 361–366 (2011)
4. Atassi, H., Riviello, M.T., Smekal, Z., Hussain, A., Esposito, A.: Emotional vocal expressions recognition using the COST 2102 Italian database of emotional speech. In: Esposito, A., et al. (eds.) Development of Multimodal Interfaces: Active Listening and Synchrony, LNCS 5967, pp. 255–267. Springer, Berlin, Heidelberg (2010)
5. Atassi, H., Esposito, A.: Speaker independent approach to the classification of emotional vocal expressions. In: Proceedings of IEEE Conference on Tools with Artificial Intelligence (ICTAI 2008), vol. 1, pp. 487–494 (2008)
6. Butterworth, B.L., Beattie, G.W.: Gestures and silence as indicator of planning in speech. In: Smith, P.T., Campbell, R.N. (eds.) Recent Advances in the Psychology of Language, pp. 347–360. Plenum Press, New York (1978)
7. Chafe, W.L.: Cognitive constraint on information flow. In: Tomlin, R. (ed.) Coherence and Grounding in Discourse, pp. 20–51. John Benjamins, Amsterdam (1987)
8. Cordasco, G., Esposito, M., Masucci, F., Riviello, M.T., Esposito, A., Chollet, G., Schlögl, S., Milhorat, P., Pelosi, G.: Assessing voice user interfaces: the vAssist system prototype. In: 5th IEEE International Conference on Cognitive InfoCommunications, pp. 91–96. Vietri sul Mare, 5–7 Nov 2014
9. De Saussure, F.: Cours de linguistique générale. Editions Payot, Paris (1922)
10. Esposito, A., Esposito, A.M., Vogel, C.: Needs and challenges in human computer interaction for processing social emotional information. Pattern Recogn. Lett. 66, 41–51 (2015)
11. Esposito, A., Esposito, A.M., Likforman, L., Maldonato, M.N., Vinciarelli, A.: On the significance of speech pauses in depressive disorders: results on read and spontaneous narratives. In this volume (2015)
12. Esposito, A.: The situated multimodal facets of human communication. In: Rojc, M., Campbell, N. (eds.) Coverbal Synchrony in Human-Machine Interaction, ch. 7, pp. 173–202. CRC Press, Taylor & Francis Group, Boca Raton, FL (2013)
13. Esposito, A., Marinaro, M.: What pauses can tell us about speech and gesture partnership. In: Esposito, A., et al. (eds.) Fundamentals of Verbal and Nonverbal Communication and the Biometric Issue. NATO Publishing Series, vol. 18, pp. 45–57. IOS Press, The Netherlands (2007)
14. Esposito, A., Bourbakis, N.G.: The role of timing in speech perception and speech production processes and its effects on language impaired individuals. In: Proceedings of the 6th International IEEE Symposium on BioInformatics and BioEngineering (BIBE), pp. 348–356 (2006)
15. Esposito, A.: The importance of data for training intelligent devices. In: Apolloni, B., Kurfess, C. (eds.) From Synapses to Rules: Discovering Symbolic Knowledge from Neural Processed Data, pp. 229–250. Kluwer Academic Press, Dordrecht (2002)
16. Esposito, A.: Approaching speech signal problems: an unifying viewpoint for the speech recognition process. In: Suarez Garcia, S., Baron Fernandez, R. (eds.) Memoria of Taller Internacional de Tratamiento del Habla, Procesamiento de Vos y el Language, CIC-IPN Obra Compleata (2000). ISBN: 970-18-4936-1
17. Galanis, D., Karabetsos, S., Koutsombogera, M., Papageorgiou, H., Esposito, A., Riviello, M.T.: Classification of emotional speech units in call centre interactions. In: Proceedings of 4th IEEE International Conference on Cognitive Infocommunications (CogInfoCom2013), pp. 403–406. Budapest, Hungary, 2–5 Dec 2013
18. Kendon, A.: Gesture: Visible Action as Utterance. Cambridge University Press, Cambridge (2004)
19. Kiss, G., Tulics, M.G., Sztahó, D., Esposito, A., Vicsi, K.: Language independent detection possibilities of depression by speech. In this volume (2015)
20. Kroon, P.: Evaluation of speech coders. In: Paliwal, K.K., Bastiaan Kleijn, W. (eds.) Speech Coding and Synthesis, pp. 467–494. Elsevier Science, Amsterdam (1995)
21. Gibson, J.D.: Speech coding methods, standards, and applications. IEEE Circuits Syst. Mag. 5(4), 30–49 (2005)
22. Faundez-Zanuy, M., Janer, L., Esposito, A., Satue-Villar, A., Roure, J., Espinosa-Duro, V. (eds.): Nonlinear Analyses and Algorithms for Speech Processing, LNAI 3817. Springer, Berlin, Heidelberg (2006)
23. Lindblom, B.: Explaining phonetic variation: a sketch of the H&H theory. In: Hardcastle, W., Marchal, A. (eds.) Speech Production and Speech Modeling, pp. 403–439. Kluwer, Dordrecht (1990)
24. Meena, R., Skantze, G., Gustafson, J.: Data-driven models for timing feedback responses in a map task dialogue system. Comput. Speech Lang. 28, 903–922 (2014)
25. Milhorat, P., Schlögl, S., Chollet, G., Boudy, J., Esposito, A., Pelosi, G.: Building the next generation of personal digital assistants. In: Proceedings of 1st IEEE International Conference on Advanced Technologies for Signal and Image Processing (ATSIP’2014), pp. 458–463. Sousse, Tunisia, 17–19 Mar 2014. ISSN 978-1-4799-4888-8/14/
26. Park, N., Rhoads, M., Hou, J., Lee, K.M.: Understanding the acceptance of teleconferencing systems among employees: an extension of the technology acceptance model. Comput. Hum. Behav. 39, 118–127 (2014)
27. Ringeval, F., Eyben, F., Kroupi, E., Yuce, A., Thiran, J.P., Ebrahimi, T., Lalanne, D., Schuller, B.: Prediction of asynchronous dimensional emotion ratings from audiovisual and physiological data. Pattern Recogn. Lett. Elsevier (2014)
28. Schuller, B.: Deep learning our everyday emotions: a short overview. In: Bassis et al. (eds.) Advances in Neural Networks: Computational and Theoretical Issues. Series: SIST Series, vol. 37, pp. 339–346. Springer, Berlin, Heidelberg (2015)
29. Scherer, S., Stratou, G., Lucas, G., Mahmoud, M., Boberg, J., Gratch, J., Rizzo, A., Morency, L.P.: Automatic audio-visual behaviour descriptors for psychological disorder analysis. Special Issue on Best of Face and Gesture 2013: Image Vis. Comput. 32(10), 648–658 (2014)
30. Skantze, G., Hjalmarsson, A.: Towards incremental speech generation in conversational systems. Comput. Speech Lang. 27, 243–262 (2013)
31. Stylianou, Y., Faundez-Zanuy, M., Esposito, A. (eds.): Progress in Nonlinear Speech Processing, LNCS 4391. Springer, Berlin, Heidelberg (2007)

Author information

Authors and Affiliations

Department of Psychology, Seconda Università di Napoli and IIASS, Caserta, Italy
Anna Esposito
Escola Superior Politècnica Tecnocampus (Pompeu Fabra University), Mataró, Spain
Marcos Faundez-Zanuy
Istituto Nazionale di Geofisica e Vulcanologia, sezione di Napoli Osservatorio Vesuviano, Rome, Italy
Antonietta M. Esposito
Department of Psychology, Seconda Università di Napoli and IIASS, Caserta, Italy
Gennaro Cordasco
University of Mons, TCTS Lab.31, Boulevard Dolez, Mons, Belgium
Thomas Drugman
Data and Signal Processing Research Group, University of Vic, Barcelona, Spain
Jordi Solé-Casals
Università degli Studi “Mediterranea” di Reggio Calabria, Reggio Calabria, Italy
Francesco Carlo Morabito

Corresponding author

Correspondence to Anna Esposito.

Editor information

Editors and Affiliations

Department of Psychology, Seconda Università di Napoli and IIASS, Caserta, Italy
Anna Esposito
(Pompeu Fabra University), Escola Superior Politècnica Tecnocampus, Mataró, Spain
Marcos Faundez-Zanuy
sezione di Napoli Osservatorio, Istituto Nazionale di Geofisica e Vulcan, Napoli, Italy
Antonietta M. Esposito
Department of Psychology, Seconda Università di Napoli and IIASS, Caserta, Italy
Gennaro Cordasco
Boulevard Dolez, University of Mons, TCTS Lab.31, Mons, Belgium
Thomas Drugman
Data and Signal Processing Research Grou, University of Vic, Vic, Spain
Jordi Solé-Casals
NeuroLab, Università degli Studi "Mediterranea" di, Reggio Calabria, Italy
Francesco Carlo Morabito

About this chapter

Cite this chapter

Esposito, A. et al. (2016). Recent Advances in Nonlinear Speech Processing: Directions and Challenges. In: Esposito, A., et al. Recent Advances in Nonlinear Speech Processing. Smart Innovation, Systems and Technologies, vol 48. Springer, Cham. https://doi.org/10.1007/978-3-319-28109-4_2

DOI: https://doi.org/10.1007/978-3-319-28109-4_2
Published: 23 January 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28107-0
Online ISBN: 978-3-319-28109-4
eBook Packages: Engineering, Engineering (R0)
