Automatic lyrics alignment for Cantonese popular music

Wong, Chi Hang; Szeto, Wai Man; Wong, Kin Hong

doi:10.1007/s00530-006-0055-8

Regular paper
Published: 12 September 2006

Multimedia Systems, Volume 12, pages 307–323 (2007)

Abstract

From lyrics display on electronic music players and karaoke videos to surtitles for live Chinese opera performances, one feature is common to all these everyday functionalities: temporal synchronization of the written text with its corresponding musical phrase. Our goal is to automate lyrics alignment, a procedure that, to date, is still handled manually in the Cantonese popular song (Cantopop) industry. In our system, a vocal signal enhancement algorithm extracts the vocal signal from a CD recording so that the onsets of the sung syllables can be detected and their corresponding pitches determined. The proposed system is designed specifically for Cantonese, in which the contour of the musical melody and the tonal contour of the lyrics must match perfectly. Building on this prerequisite, we use a dynamic time warping algorithm to align the lyrics. Experimental results support the robustness of this approach: the system was evaluated with 70 twenty-second music segments, and most samples had their lyrics aligned correctly.
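The abstract names dynamic time warping (DTW) as the alignment step and depends on the correspondence between Cantonese lexical tones and the melodic contour. The sketch below is a minimal illustration of that idea. It is not the authors' implementation. It rests on several assumptions. The tone number of each lyric syllable is known. A hypothetical, simplified tone-to-level table (`TONE_LEVEL`) stands in for a real tone model. The earlier enhancement and detection stages have already produced one pitch (in semitones, for example MIDI note numbers) and one onset time per sung note. Both contours are normalised and then aligned with the basic unconstrained DTW recurrence. Each syllable receives the onset of the first note it is matched to. All function and table names are illustrative.

```python
import math

# Hypothetical, simplified mapping from Cantonese tone number (1-6) to a
# relative pitch level. This is an illustrative assumption, not the paper's model.
TONE_LEVEL = {1: 5.0, 2: 4.0, 3: 3.0, 4: 1.0, 5: 2.5, 6: 2.0}


def normalize(values):
    """Centre a contour on zero and scale it to unit standard deviation."""
    mean = sum(values) / len(values)
    centred = [v - mean for v in values]
    std = math.sqrt(sum(c * c for c in centred) / len(centred))
    return [c / std for c in centred] if std > 0 else centred


def dtw(a, b):
    """Basic DTW with steps (i-1, j), (i, j-1), (i-1, j-1).

    Returns (total cost, warping path as a list of (index_a, index_b)).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance between contour points
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from (n, m) to the origin to recover the optimal path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


def align_lyrics(tones, note_pitches, note_onsets):
    """Give each syllable the onset time of the first note aligned to it."""
    levels = normalize([TONE_LEVEL[t] for t in tones])
    pitches = normalize(note_pitches)
    _, path = dtw(levels, pitches)
    onsets = [None] * len(tones)
    for i, j in path:
        if onsets[i] is None:
            onsets[i] = note_onsets[j]
    return onsets
```

Consider `align_lyrics([1, 3, 4, 1], [72, 69, 65, 65, 72], [0.0, 0.5, 1.0, 1.25, 1.5])`. It returns `[0.0, 0.5, 1.0, 1.5]`, with the third syllable absorbing the repeated low note. The sketch compares normalised contours rather than absolute pitches, so transposing the melody to another key does not change the alignment. A practical system would still need to handle missed or spurious onsets, which this sketch ignores.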

Author information

Authors and Affiliations

Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China
Chi Hang Wong, Wai Man Szeto & Kin Hong Wong

Corresponding author

Correspondence to Chi Hang Wong.

About this article

Cite this article

Wong, C.H., Szeto, W.M. & Wong, K.H. Automatic lyrics alignment for Cantonese popular music. Multimedia Systems 12, 307–323 (2007). https://doi.org/10.1007/s00530-006-0055-8

Issue Date: March 2007
DOI: https://doi.org/10.1007/s00530-006-0055-8

Similar content being viewed by others

An Efficient DTW-Based Approach for Melodic Similarity in Flamenco Singing

Towards Automatic Music Performance Comparison with the Multiple Sequence Alignment Technique

Recognition of score words in freestyle kayaking using improved DTW matching
