Abstract
Humans bring very high requirements and expectations to spoken communication, beyond simplicity, flexibility, and ease of interaction. This is because voice interactions do not require cognitive effort, attention, or memory resources. Voice technologies, however, are still constrained to specific use cases and scenarios, given the existing limitations of speech synthesis and recognition systems. What is the status of nonlinear speech processing techniques, and what steps have been made toward cross-fertilization among disciplines? This chapter provides a short overview that tries to answer these questions.
Notes
1. Here “language” is intended to mean “the verbal language”, as opposed to other general meanings of the term. The interpretation of a “language” as a code can be found in De Saussure [9].
References
Arjona Ramírez, M., Minami, M.: Technology and standards for low-bit-rate vocoding methods. In: Bidgoli, H. (ed.) The Handbook of Computer Networks, vol. 2, pp. 447–467. Wiley, New York (2011)
Arjona Ramírez, M., Minami, M.: Low bit rate speech coding. In: Proakis, J.G. (ed.) Wiley Encyclopedia of Telecommunications, vol. 3, pp. 1299–1308. Wiley, New York (2003)
Atassi, H., Esposito, A., Smekal, Z.: Analysis of high-level features for vocal emotion recognition. In: Proceedings of 34th IEEE International Conference on Telecommunication and Signal Processing (TSP), pp. 361–366 (2011)
Atassi, H., Riviello, M.T., Smekal, Z., Hussain, A., Esposito, A.: Emotional vocal expressions recognition using the COST 2102 Italian database of emotional speech. In: Esposito, A., et al. (eds.) Development of Multimodal Interfaces: Active Listening and Synchrony, LNCS 5967, pp. 255–267. Springer, Berlin, Heidelberg (2010)
Atassi, H., Esposito, A.: Speaker independent approach to the classification of emotional vocal expressions. In: Proceedings of IEEE Conference on Tools with Artificial Intelligence (ICTAI 2008), vol. 1, pp. 487–494 (2008)
Butterworth, B.L., Beattie, G.W.: Gestures and silence as indicator of planning in speech. In: Smith, P.T., Campbell, R.N. (eds.) Recent Advances in the Psychology of Language, pp. 347–360. Plenum Press, New York (1978)
Chafe, W.L.: Cognitive constraint on information flow. In: Tomlin, R. (ed.) Coherence and Grounding in Discourse, pp. 20–51. John Benjamins, Amsterdam (1987)
Cordasco, G., Esposito, M., Masucci, F., Riviello, M.T., Esposito, A., Chollet, G., Schlögl, S., Milhorat, P., Pelosi, G.: Assessing voice user interfaces: the vAssist system prototype. In: 5th IEEE International Conference on Cognitive InfoCommunications, pp. 91–96. Vietri sul Mare, 5–7 Nov 2014
De Saussure, F.: Cours de linguistique générale. Editions Payot, Paris (1922)
Esposito, A., Esposito, A.M., Vogel, C.: Needs and challenges in human computer interaction for processing social emotional information. Pattern Recogn. Lett. 66, 41–51 (2015)
Esposito, A., Esposito, A.M., Likforman, L., Maldonato, M.N., Vinciarelli, A.: On the significance of speech pauses in depressive disorders: results on read and spontaneous narratives. In this volume (2015)
Esposito, A.: The situated multimodal facets of human communication. In: Rojc, M., Campbell, N. (eds.) Coverbal Synchrony in Human-Machine Interaction, ch. 7, pp. 173–202. CRC Press, Taylor & Francis Group, Boca Raton, FL (2013)
Esposito, A., Marinaro, M.: What pauses can tell us about speech and gesture partnership. In: Esposito, A., et al. (eds.) Fundamentals of Verbal and Nonverbal Communication and the Biometric Issue. NATO Publishing Series, vol. 18, pp. 45–57. IOS Press, The Netherlands (2007)
Esposito, A., Bourbakis, N.G.: The role of timing in speech perception and speech production processes and its effects on language impaired individuals. In: Proceedings of the 6th International IEEE Symposium on BioInformatics and BioEngineering (BIBE), pp. 348–356 (2006)
Esposito, A.: The importance of data for training intelligent devices. In: Apolloni, B., Kurfess, C. (eds.) From Synapses to Rules: Discovering Symbolic Knowledge from Neural Processed Data, pp. 229–250. Kluwer Academic Press, Dordrecht (2002)
Esposito, A.: Approaching speech signal problems: an unifying viewpoint for the speech recognition process. In: Suarez Garcia, S., Baron Fernandez, R. (eds.) Memoria of Taller Internacional de Tratamiento del Habla, Procesamiento de Voz y el Lenguaje, CIC-IPN Obra Completa (2000). ISBN: 970-18-4936-1
Galanis, D., Karabetsos, S., Koutsombogera, M., Papageorgiou, H., Esposito, A., Riviello, M.T.: Classification of emotional speech units in call centre interactions. In: Proceedings of 4th IEEE International Conference on Cognitive Infocommunications (CogInfoCom2013), pp. 403–406. Budapest, Hungary, 2–5 Dec 2013
Kendon, A.: Gesture: Visible Action as Utterance. Cambridge University Press, Cambridge (2004)
Kiss, G., Tulics, M.G., Sztahó, D., Esposito, A., Vicsi, K.: Language independent detection possibilities of depression by speech. In this volume (2015)
Kroon, P.: Evaluation of speech coders. In: Paliwal, K.K., Bastiaan Kleijn, W. (eds.) Speech Coding and Synthesis, pp. 467–494. Elsevier Science, Amsterdam (1995)
Gibson, J.D.: Speech coding methods, standards, and applications. IEEE Circuits Syst. Mag. 5(4), 30–49 (2005)
Faundez-Zanuy, M., Janer, L., Esposito, A., Satue-Villar, A., Roure, J., Espinosa-Duro, V. (eds.): Nonlinear Analyses and Algorithms for Speech Processing, LNAI 3817. Springer, Berlin, Heidelberg (2006)
Lindblom, B.: Explaining phonetic variation: a sketch of the H&H theory. In: Hardcastle, W., Marchal, A. (eds.) Speech Production and Speech Modeling, pp. 403–439. Kluwer, Dordrecht (1990)
Meena, R., Skantze, G., Gustafson, J.: Data-driven models for timing feedback responses in a map task dialogue system. Comput. Speech Lang. 28, 903–922 (2014)
Milhorat, P., Schlögl, S., Chollet, G., Boudy, J., Esposito, A., Pelosi, G.: Building the next generation of personal digital assistants. In: Proceedings of 1st IEEE International Conference on Advanced Technologies for Signal and Image Processing–ATSIP’2014, pp. 458–463. Sousse, Tunisia, 17–19 Mar 2014. ISSN 978-1-4799-4888-8/14/
Park, N., Rhoads, M., Hou, J., Lee, K.M.: Understanding the acceptance of teleconferencing systems among employees: an extension of the technology acceptance model. Comput. Hum. Behav. 39, 118–127 (2014)
Ringeval, F., Eyben, F., Kroupi, E., Yuce, A., Thiran, J.P., Ebrahimi, T., Lalanne, D., Schuller, B.: Prediction of asynchronous dimensional emotion ratings from audiovisual and physiological data. Pattern Recogn. Lett. Elsevier (2014)
Schuller, B.: Deep learning our everyday emotions: a short overview. In: Bassis et al. (eds.) Advances in Neural Networks: Computational and Theoretical Issues. Series: SIST Series, vol. 37, pp. 339–346. Springer, Berlin, Heidelberg (2015)
Scherer, S., Stratou, G., Lucas, G., Mahmoud, M., Boberg, J., Gratch, J., Rizzo, A., Morency, L.P.: Automatic audio-visual behaviour descriptors for psychological disorder analysis. Special Issue on Best of Face and Gesture 2013: Image Vis. Comput. 32(10), 648–658 (2014)
Skantze, G., Hjalmarsson, A.: Towards incremental speech generation in conversational systems. Comput. Speech Lang. 27, 243–262 (2013)
Stylianou, Y., Faundez-Zanuy, M., Esposito, A. (eds.): Progress in Nonlinear Speech Processing, LNCS 4391. Springer, Berlin, Heidelberg (2007)
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Esposito, A. et al. (2016). Recent Advances in Nonlinear Speech Processing: Directions and Challenges. In: Esposito, A., et al. Recent Advances in Nonlinear Speech Processing. Smart Innovation, Systems and Technologies, vol 48. Springer, Cham. https://doi.org/10.1007/978-3-319-28109-4_2
DOI: https://doi.org/10.1007/978-3-319-28109-4_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28107-0
Online ISBN: 978-3-319-28109-4
eBook Packages: Engineering, Engineering (R0)