Abstract
With the growing volume of information generated on the web, most web service providers try to personalize their services. Users interact with web-based systems in multiple ways and express their interests and preferences by rating the items offered. In this paper, we propose a framework that predicts users’ demographics from the ratings they register in a system. To the best of our knowledge, this is the first time that item ratings have been employed for the user demographic prediction problem, which has been studied extensively in recommendation systems and service personalization. We apply the framework to the ratings in the MovieLens dataset and predict users’ age and gender. The experimental results show that using all ratings registered by users improves prediction accuracy by at least 16% compared with previously studied models. Moreover, by classifying items as popular or unpopular, we eliminate the ratings belonging to 95% of the items and still reach an acceptable level of accuracy. This significantly reduces the update cost in a time-varying environment. Besides this classification, we propose other methods to reduce data volume while keeping the predictions accurate.
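The popular/unpopular reduction mentioned above can be illustrated with a minimal sketch. It assumes that popularity is measured by the number of ratings an item has received, which the abstract does not state; the function name `keep_popular_ratings` and the parameter `keep_fraction` are hypothetical and are not taken from the paper.

```python
from collections import Counter
import math


def keep_popular_ratings(ratings, keep_fraction=0.05):
    """Keep only the ratings that belong to the most-rated items.

    ratings: iterable of (user_id, item_id, rating) tuples.
    keep_fraction: share of items treated as popular (assumed parameter;
        0.05 mirrors discarding the ratings of 95% of the items).
    """
    ratings = list(ratings)
    # Assumption: an item's popularity is its number of ratings.
    counts = Counter(item_id for _, item_id, _ in ratings)
    if not counts:
        return []
    # Keep at least one item so the reduced dataset is never empty.
    n_keep = max(1, math.ceil(keep_fraction * len(counts)))
    popular = {item_id for item_id, _ in counts.most_common(n_keep)}
    return [r for r in ratings if r[1] in popular]
```

Under this assumption, only the ratings of the retained items would need to be tracked when new ratings arrive, which is one way to read the reduced update cost described above.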





Data Availability
The data used in this research are readily available online and are cited within the text of this paper. Specifically, we used two publicly accessible datasets: the Internet Movie Database (IMDb) dataset and the MovieLens dataset. Both datasets can be accessed through the links given in Section 2.1. Details of how we used these datasets, including data preprocessing and citation, are provided in the relevant sections of this paper. Researchers interested in reproducing or extending our findings are encouraged to refer to these sources for access to the data used in this study.
Ethics declarations
Conflict of interest
The authors declare that there are no competing financial interests or personal relationships that influenced the work reported in this paper.
About this article
Cite this article
Shafiloo, R., Kaedi, M. & Pourmiri, A. Predicting user demographics based on interest analysis in movie dataset. Multimed Tools Appl 83, 69973–69987 (2024). https://doi.org/10.1007/s11042-024-18422-6
DOI: https://doi.org/10.1007/s11042-024-18422-6