Automatic lyrics alignment for Cantonese popular music

  • Regular paper
  • Multimedia Systems

Abstract

From lyrics display on electronic music players and karaoke videos to surtitles for live Chinese opera performances, one feature is common to all these everyday functionalities: temporal synchronization of the written text with its corresponding musical phrase. Our goal is to automate lyrics alignment, a procedure that, to date, is still performed manually in the Cantonese popular song (Cantopop) industry. In our system, a vocal signal enhancement algorithm extracts the vocal signal from a CD recording so that the onsets of the sung syllables can be detected and their pitches determined. The system is designed specifically for Cantonese, in which the contour of the musical melody and the tonal contour of the lyrics must match perfectly. Exploiting this constraint, we align the lyrics with a dynamic time warping (DTW) algorithm. Experimental results support the robustness of this approach: the system was evaluated on 70 twenty-second music segments, and the lyrics of most samples were aligned correctly.
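The final alignment step can be illustrated with a minimal sketch. It assumes that vocal extraction, onset detection and pitch estimation have already produced one onset time and one fundamental frequency (F0) per sung note, and that each lyric syllable is annotated with its Cantonese tone number (1 to 6). The tone-level table (the mean of each tone's Chao tone numerals), the z-normalisation of both contours and the absolute-difference local cost are illustrative assumptions, not the paper's exact formulation; `TONE_LEVEL`, `align_lyrics` and the other names are hypothetical.

```python
import math

# Hypothetical tone-to-level table: mean of each tone's Chao tone numerals
# (T1 55, T2 35, T3 33, T4 21, T5 23, T6 22). An illustrative assumption,
# not the mapping used by the authors.
TONE_LEVEL = {1: 5.0, 2: 4.0, 3: 3.0, 4: 1.5, 5: 2.5, 6: 2.0}


def hz_to_semitones(f0_hz):
    """Convert a fundamental frequency in Hz to a MIDI-style semitone number."""
    return 69.0 + 12.0 * math.log2(f0_hz / 440.0)


def znorm(xs):
    """Zero-mean, unit-variance normalisation so only the contour shape matters."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    std = math.sqrt(var) or 1.0  # guard against a perfectly flat contour
    return [(x - mean) / std for x in xs]


def dtw(a, b):
    """Unconstrained DTW with an absolute-difference local cost.

    Returns (total_cost, path), where path lists (i, j) index pairs from
    (0, 0) to (len(a) - 1, len(b) - 1).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Standard three-step recursion: diagonal, vertical, horizontal.
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Backtrack from the end cell to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): D[i - 1][j - 1],
            (i - 1, j): D[i - 1][j],
            (i, j - 1): D[i][j - 1],
        }
        i, j = min(steps, key=steps.get)
    path.reverse()
    return D[n][m], path


def align_lyrics(tones, note_onsets, note_f0s):
    """Give each syllable the onset time (s) of the first note it is warped onto."""
    lyric_contour = znorm([TONE_LEVEL[t] for t in tones])
    melody_contour = znorm([hz_to_semitones(f) for f in note_f0s])
    _, path = dtw(lyric_contour, melody_contour)
    times = [None] * len(tones)
    for i, j in path:
        if times[i] is None:
            times[i] = note_onsets[j]
    return times


if __name__ == "__main__":
    # Hypothetical input: four syllables sung over five notes.
    times = align_lyrics(
        tones=[1, 3, 4, 3],
        note_onsets=[0.0, 0.5, 0.8, 1.2, 1.6],
        note_f0s=[440.0, 392.0, 392.0, 330.0, 392.0],
    )
    print(times)  # [0.0, 0.5, 1.2, 1.6]
```

In this toy input the second syllable (tone 3) is warped onto two consecutive notes, the way a melisma stretches one syllable over several notes, and takes its timestamp from the first of them. Normalising both contours lets DTW compare relative pitch movement rather than absolute pitch, which matters because a singer's key has no fixed relation to the speech tone levels.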

Author information

Corresponding author

Correspondence to Chi Hang Wong.

About this article

Cite this article

Wong, C.H., Szeto, W.M. & Wong, K.H. Automatic lyrics alignment for Cantonese popular music. Multimedia Systems 12, 307–323 (2007). https://doi.org/10.1007/s00530-006-0055-8

  • DOI: https://doi.org/10.1007/s00530-006-0055-8