
A method of music autotagging based on audio and lyrics

Multimedia Tools and Applications

Abstract

With the development of the Internet, online music platforms and music streaming services are flourishing, and information overload caused by the abundance of digital music has become a common problem for many users. Social tags are known to be helpful for music recommendation. However, label sparsity and the cold start problem, commonly observed with social tags, limit their effectiveness in supporting recommender systems. A music autotagging system is therefore an alternative for supplementing the shortage of tags. Most prior studies on automatic labeling analyzed only audio data. However, some studies have suggested that lyrics provide complementary information that improves the overall accuracy of music classification; alongside lyrics, audio remains an essential resource for extracting musical features. This paper therefore proposes a music autotagging system that relies on both audio and lyrics to address the above problems. Owing to advances in deep learning in recent years, many researchers have used neural networks effectively to extract audio and textual features, and some have also exploited the structure of lyrics to extract features that further improve classification. For lyric feature extraction, this study employs two types of deep learning models, convolutional neural networks (CNNs) and recurrent neural networks (RNNs), in a feature extraction architecture motivated by and characterized by the structure of lyrics. In addition, a multitask learning method is adopted to learn correlations between tags. The experiments show that a multitask learning classifier that combines audio and lyric information outperforms the single-task classifiers of previous studies that use only audio data.
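The overall architecture can be sketched compactly. The snippet below is a minimal illustration, not the authors' exact model: a CNN-plus-GRU branch encodes the lyric token sequence, a small 2-D CNN encodes an audio mel-spectrogram, and the concatenated features feed a shared head with one sigmoid output per tag, the simplest multi-label form of multitask learning, trained with binary cross-entropy. All layer sizes, the vocabulary size, and the tag count are illustrative assumptions.

```python
# Minimal sketch (assumed sizes, not the paper's exact architecture) of a
# multimodal, multi-label music autotagger combining lyrics and audio.
import torch
import torch.nn as nn

class LyricBranch(nn.Module):
    def __init__(self, vocab_size=20000, emb_dim=128, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # A 1-D convolution over the word sequence captures local phrases;
        # a GRU (an RNN variant) then models longer-range lyric structure.
        self.conv = nn.Conv1d(emb_dim, hidden, kernel_size=3, padding=1)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embed(token_ids)                      # (batch, seq_len, emb)
        x = torch.relu(self.conv(x.transpose(1, 2)))   # (batch, hidden, seq_len)
        _, h = self.gru(x.transpose(1, 2))             # h: (1, batch, hidden)
        return h.squeeze(0)                            # (batch, hidden)

class AudioBranch(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),   # global pooling over time and frequency
        )
        self.proj = nn.Linear(32, hidden)

    def forward(self, mel):                            # (batch, 1, n_mels, frames)
        return torch.relu(self.proj(self.net(mel).flatten(1)))

class MultimodalTagger(nn.Module):
    def __init__(self, n_tags=50, hidden=64):
        super().__init__()
        self.lyrics = LyricBranch(hidden=hidden)
        self.audio = AudioBranch(hidden=hidden)
        # One sigmoid unit per tag over a shared fused representation.
        self.head = nn.Linear(2 * hidden, n_tags)

    def forward(self, token_ids, mel):
        z = torch.cat([self.lyrics(token_ids), self.audio(mel)], dim=1)
        return self.head(z)            # raw logits; apply sigmoid at inference

# Toy usage: 4 songs, 120-token lyrics, 96-band mel patches of 128 frames.
model = MultimodalTagger()
logits = model(torch.randint(1, 20000, (4, 120)), torch.randn(4, 1, 96, 128))
loss = nn.BCEWithLogitsLoss()(logits, torch.randint(0, 2, (4, 50)).float())
```

A shared trunk with per-tag sigmoid outputs lets tags share learned statistics; the paper's multitask setup additionally models correlations between tags, which this sketch approximates only through the shared fusion layer.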





Acknowledgements

This research is based on work supported by the Taiwan Ministry of Science and Technology under Grant Nos. MOST 107-2410-H-006-040-MY3 and MOST 108-2511-H-006-009. We would like to thank the Center of Innovative Fintech Business Models for a research grant supporting this work.

Author information

Corresponding author

Correspondence to Hei-Chia Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Wang, HC., Syu, SW. & Wongchaisuwat, P. A method of music autotagging based on audio and lyrics. Multimed Tools Appl 80, 15511–15539 (2021). https://doi.org/10.1007/s11042-020-10381-y

