short-paper

Aligning Incomplete Lyrics of Korean Folk Song Dataset using Whisper

Authors:
Danbinaerin Han, Department of Art & Technology, Sogang University, South Korea (ORCID 0009-0008-1048-5466)
Daewoong Kim, Department of Artificial Intelligence, Sogang University, South Korea (ORCID 0009-0004-0841-2314)
Dasaem Jeong, Department of Art & Technology, Sogang University, South Korea (ORCID 0009-0002-3655-1181)
DLfM '23: Proceedings of the 10th International Conference on Digital Libraries for Musicology, November 2023, pages 7–11. https://doi.org/10.1145/3625135.3625154

Published: 10 November 2023


ABSTRACT

In this study, we introduce a method for time-aligning lyrics to Korean folk song audio using a transformer encoder-decoder model designed specifically to exploit incomplete lyric data. Analyzing the characteristics of Korean folk song lyrics, we found discrepancies between the available lyrics and the corresponding audio recordings. To address these challenges while making the most of existing transcriptions, we introduce RefWhisper, a variant of OpenAI’s Whisper with an additional encoder module and cross-attention layer that let the model consult incomplete lyrics during transcription. The added cross-attention layer aligns the reference text not only with the predicted transcription but also with the audio. We publicly release the transcriptions, together with sentence-level and word-level timestamps, for a corpus of 13,801 Korean folk songs.
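The abstract does not specify where the extra cross-attention layer sits inside the decoder. The NumPy sketch below assumes one plausible arrangement: after Whisper's usual causal self-attention and audio cross-attention, each decoder block adds a third attention step whose keys and values come from an extra encoder run over the (possibly incomplete) reference lyrics. The names `attend` and `ref_decoder_block`, the layer order, and the random inputs are all hypothetical; learned projections, layer norms, MLPs, and multi-head splitting are omitted to keep the data flow visible.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(q, k, v, mask=None):
    """Single-head scaled dot-product attention; returns (output, weights)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, -np.inf, scores)  # True = position hidden
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


def ref_decoder_block(x, audio_mem, ref_mem):
    """Hypothetical RefWhisper-style decoder block (assumed layer order).

    x:         (T, d) embeddings of the tokens decoded so far
    audio_mem: (F, d) output of the audio encoder
    ref_mem:   (R, d) output of the extra encoder over the reference lyrics
    """
    T = x.shape[0]
    causal = np.triu(np.ones((T, T), dtype=bool), k=1)  # hide future tokens
    h, _ = attend(x, x, x, mask=causal)
    x = x + h
    # Whisper-style cross-attention: decoder queries attend to audio frames.
    h, audio_w = attend(x, audio_mem, audio_mem)
    x = x + h
    # Assumed extra cross-attention: decoder queries attend to reference lyrics.
    h, ref_w = attend(x, ref_mem, ref_mem)
    x = x + h
    return x, audio_w, ref_w


rng = np.random.default_rng(0)
d = 16
tokens = rng.normal(size=(5, d))  # 5 decoded tokens (random stand-ins)
audio = rng.normal(size=(50, d))  # 50 audio-encoder frames
ref = rng.normal(size=(12, d))    # 12 reference-lyric tokens
out, audio_w, ref_w = ref_decoder_block(tokens, audio, ref)
print(out.shape, audio_w.shape, ref_w.shape)  # (5, 16) (5, 50) (5, 12)
# Most-attended reference-lyric position for each decoded token:
print(ref_w.argmax(axis=1))
```

Under this assumption, `ref_w` has one row per decoded token and one column per reference-lyric token, so its rows act as a soft alignment between the transcription and the incomplete lyrics, while `audio_w` links the same decoded tokens to audio frames.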

Index Terms

1. Applied computing
  1. Computers in other domains
    1. Digital libraries and archives
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Speech recognition

Published in
DLfM '23: Proceedings of the 10th International Conference on Digital Libraries for Musicology
November 2023
139 pages
ISBN: 9798400708336
DOI: 10.1145/3625135
Editor: Martha E. Thomae, McGill University / Universidade NOVA de Lisboa
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
DNN
Korean folk song
datasets
lyric alignment
lyric transcription
Qualifiers
- short-paper
- Research
- Refereed limited
Acceptance Rates
Overall acceptance rate: 27 of 48 submissions (56%)