ABSTRACT
Online activities such as social networking, shopping, and consuming multimedia create digital traces that are often used to improve user experience and increase revenue, e.g., through better-fitting recommendations and targeted marketing. We investigate to what extent the music listening habits of users of the social music platform Last.fm can be used to predict their age, gender, and nationality. We propose a TF-IDF-like feature modeling approach for artist listening information and artist tags, combined with additionally extracted features. We show that this approach substantially outperforms a majority-voting baseline and is competitive with existing approaches. Further, regarding the relation between prediction accuracy and the amount of available listening data, we show that even a single listening event per user is enough to outperform the baseline in all prediction tasks. We conclude that personal information can be derived from music listening information, which can indeed help to better tailor recommendations.
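To make the idea of TF-IDF-like artist features and a majority-voting baseline more concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not give the exact weighting formula, so the normalized play count (term frequency) and the logarithmic inverse user frequency used here are assumptions, as are the function names `tfidf_user_features` and `majority_baseline` and the toy play counts.

```python
import math
from collections import Counter


def tfidf_user_features(playcounts):
    """Map each user to TF-IDF-like artist weights.

    playcounts: dict of user -> dict of artist -> number of listening events.
    Assumed weighting (not taken from the paper):
    tf = artist share of the user's listening events,
    idf = log(number of users / number of users who listened to the artist).
    """
    n_users = len(playcounts)
    # "Document frequency": how many users listened to each artist at least once.
    df = Counter()
    for artists in playcounts.values():
        df.update(a for a, c in artists.items() if c > 0)

    features = {}
    for user, artists in playcounts.items():
        total = sum(artists.values())
        features[user] = {
            a: (c / total) * math.log(n_users / df[a])
            for a, c in artists.items()
            if c > 0
        }
    return features


def majority_baseline(labels):
    """Return the most frequent label, i.e. what a majority-voting baseline predicts."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical toy data, for illustration only.
playcounts = {
    "u1": {"Metallica": 8, "Adele": 2},
    "u2": {"Adele": 5},
    "u3": {"Metallica": 1, "Mozart": 3},
}
features = tfidf_user_features(playcounts)
baseline_gender = majority_baseline(["m", "f", "m"])
```

In this sketch, an artist that only one user listens to receives a higher weight than an artist shared by many users, which is the intuition behind borrowing TF-IDF from text retrieval. The resulting per-user weight vectors could then be passed to any standard classifier or regressor.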
Index Terms
- Prediction of User Demographics from Music Listening Habits