A Computational Approach to Author Identification from Bengali Song Lyrics

  • Conference paper
Proceedings of International Joint Conference on Computational Intelligence

Abstract

Music is one of the truest forms of art. People listen to music both for entertainment and for relaxation, and every country or region has its own forms and styles. Bangladesh is no exception: it has a rich musical history and a songwriting tradition that spans centuries. Although the songs themselves are popular among enthusiasts, their authors receive little recognition, which makes identifying the author of a song, and more specifically of its lyrics, an important and practical problem. Authorship attribution is one way of identifying the author of a text from a linguistic corpus. This paper presents a procedure for identifying the author of a Bengali song from its lyrics using machine learning. It is the first work on a machine learning-based computational approach to authorship attribution from Bengali song lyrics. Six machine learning methods were applied to three data sets, D2A, D4A, and D7A, built from Bengali song lyrics, and the methods achieved high accuracy. The Naive Bayes (NB) classifier outperformed the other methods, reaching accuracies of 93.9%, 85%, and 86.7% on D2A, D4A, and D7A, respectively, when stop words are considered.
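To make the classification step concrete, the following is a minimal sketch of a multinomial Naive Bayes author classifier over word counts, with stop-word removal as an optional switch. It is an illustration only, not the authors' pipeline: the abstract does not describe the features, tokenization, or NB variant used, so whitespace tokenization, add-one smoothing, the toy English "lyrics", the author labels, and the stop-word list are all assumptions introduced here.

```python
import math
from collections import Counter, defaultdict


def tokenize(text, stop_words=frozenset()):
    """Lowercase, split on whitespace, and optionally drop stop words."""
    return [w for w in text.lower().split() if w not in stop_words]


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)  # author -> word frequencies
        self.doc_counts = Counter(labels)        # author -> number of songs
        self.vocab = set()
        for tokens, author in zip(docs, labels):
            self.word_counts[author].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_scores(self, tokens):
        """Return log P(author) + sum of log P(word | author) per author."""
        scores = {}
        vocab_size = len(self.vocab)
        for author, n_docs in self.doc_counts.items():
            counts = self.word_counts[author]
            total = sum(counts.values())
            score = math.log(n_docs / self.total_docs)
            for w in tokens:
                if w in self.vocab:  # skip words never seen in training
                    score += math.log((counts[w] + 1) / (total + vocab_size))
            scores[author] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Hypothetical toy corpus: placeholder lines, not real Bengali lyrics.
TRAIN = [
    ("river boat golden river moon", "author_a"),
    ("boat on the river at night", "author_a"),
    ("storm fire drum of the storm", "author_b"),
    ("fire and drum in the street", "author_b"),
]
# Hypothetical stop-word list for the toy corpus.
STOP_WORDS = frozenset({"the", "of", "on", "at", "and", "in"})


def train(stop_words=frozenset()):
    docs = [tokenize(text, stop_words) for text, _ in TRAIN]
    labels = [author for _, author in TRAIN]
    return MultinomialNB().fit(docs, labels)


model_with = train()               # stop words kept as features
model_without = train(STOP_WORDS)  # stop words removed

query = "the moon on the river"
pred_with = model_with.predict(tokenize(query))
pred_without = model_without.predict(tokenize(query, STOP_WORDS))
print(pred_with, pred_without)
```

Training the same model twice, once with and once without the stop-word filter, mirrors the kind of comparison the abstract reports; on real data the two settings would be compared by held-out accuracy rather than by a single query.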


Author information

Corresponding author

Correspondence to Ashraful Islam.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Ontika, N.N., Kabir, M.F., Islam, A., Ahmed, E., Huda, M.N. (2020). A Computational Approach to Author Identification from Bengali Song Lyrics. In: Uddin, M., Bansal, J. (eds) Proceedings of International Joint Conference on Computational Intelligence. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-13-7564-4_31
