Comparing the Rhythmical Characteristics of Speech and Music – Theoretical and Practical Issues

Hübler, Stephan; Hoffmann, Rüdiger

doi:10.1007/978-3-642-18184-9_33

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6456)

Abstract

Comparing the features of music and speech in intelligent audio signal processing could benefit both research fields. Both music and speech are means by which humans express themselves. The aim of this study is to show similarities and differences between music and speech by comparing their hierarchical structures, with an emphasis on rhythm. Examining the temporal structure of music and speech in particular could yield new features that improve existing technology. For example, the use of rhythm in synthetic speech is still an open issue, and rhythmic features for music need to be improved for semantic search and music similarity retrieval. Theoretical aspects of rhythm in speech and music are discussed, along with practical issues in speech and music research. To show that common approaches are feasible, an onset detection algorithm is applied to both speech and musical signals.

Author information

Authors and Affiliations

Laboratory of Acoustics and Speech Communication, Technische Universität Dresden, 01062, Dresden, Germany
Stephan Hübler & Rüdiger Hoffmann

Editor information

Editors and Affiliations

Institute for Advanced Scientific Studies, Second University of Naples, and IIASS, Via Pellegrino 19, 84019, Vietri sul Mare (SA), Italy
Anna Esposito
Istituto Nazionale di Geofisica e Vulcanologia, Osservatorio Vesuviano, Via Diocleziano 328, 80124, Napoli, Italy
Antonietta M. Esposito
Dipartimento di Ingegneria dell'Informazione, Seconda Università di Napoli, Via Roma 29, 81031, Aversa (CE), Italy
Raffaele Martone
Department of Humanities and Social Sciences, Anatolia College/ACT, Kennedy Street, 55510, Pylaia, Greece
Vincent C. Müller
Department of Physics "E.R. Caianiello", University of Salerno and IIASS, International Institute for Advanced Scientific Studies, 84081, Baronissi (SA), Italy
Gaetano Scarpetta

About this chapter

Cite this chapter

Hübler, S., Hoffmann, R. (2011). Comparing the Rhythmical Characteristics of Speech and Music – Theoretical and Practical Issues. In: Esposito, A., Esposito, A.M., Martone, R., Müller, V.C., Scarpetta, G. (eds) Toward Autonomous, Adaptive, and Context-Aware Multimodal Interfaces. Theoretical and Practical Issues. Lecture Notes in Computer Science, vol 6456. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-18184-9_33

DOI: https://doi.org/10.1007/978-3-642-18184-9_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-18183-2
Online ISBN: 978-3-642-18184-9
eBook Packages: Computer Science, Computer Science (R0)
