Abstract
Many applications need to perform classification on large sparse datasets. Classifying cold-start users, who have provided very little feedback, remains a challenging task. Previous work has applied active learning to classification with partially observed data. However, for large and sparse data, the number of candidate feedback queries is huge and many of them are invalid (the user cannot answer them). In this paper, we develop an active classification framework that addresses these challenges by leveraging online Matrix Factorization models. We first identify a step-wise data acquisition heuristic that is useful for active classification. We then use the estimates of online Probabilistic Matrix Factorization to compute this heuristic function. To reduce the number of invalid queries, we further estimate, with online Poisson Factorization, the probability that a query can be answered by the cold-start user. During active learning, a query is selected based on the current knowledge learned by these two online factorization models. Experiments on real-world movie rating datasets show that our framework is highly effective: it not only achieves larger improvements in classification, but also reduces the number of invalid queries.
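The query-selection step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how the acquisition heuristic and the answer probability are combined, so the product used here is an assumption, as are the function names `poisson_answer_prob` and `select_query` and all numeric values. The only modeling fact relied on is that, under a Poisson model with rate equal to the inner product of non-negative user and item factors, the probability of a non-zero observation is 1 - exp(-rate).

```python
import math


def poisson_answer_prob(user_vec, item_vec):
    """Probability of a non-zero observation under a Poisson model.

    Assumption: the rate is the inner product of non-negative user and
    item factors, and "the query can be answered" is read as "count >= 1".
    """
    rate = sum(u * v for u, v in zip(user_vec, item_vec))
    return 1.0 - math.exp(-rate)


def select_query(gains, answer_probs, asked):
    """Return the index of the unasked item with the largest expected gain.

    `gains` stands in for the acquisition heuristic (hypothetical values);
    weighting it by the answer probability is an illustrative choice.
    Returns None when every item has already been asked.
    """
    best, best_score = None, float("-inf")
    for i, (gain, prob) in enumerate(zip(gains, answer_probs)):
        if i in asked:
            continue
        score = gain * prob
        if score > best_score:
            best, best_score = i, score
    return best


# Hypothetical factors and heuristic values, for illustration only.
user = [0.1, 0.2]
items = [[0.1, 0.1], [2.0, 3.0], [1.0, 1.0]]
gains = [0.9, 0.5, 0.2]
probs = [poisson_answer_prob(user, item) for item in items]
first = select_query(gains, probs, asked=set())
```

In this toy setting the item with the highest heuristic value is not selected first, because the user is unlikely to be able to answer a query about it; this is the kind of trade-off the answer-probability estimate is meant to capture.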
Notes
- 1. If a feedback value is missing, we treat it as 0 by convention.
- 2. Our analysis can be extended to other scenarios such as regression and multi-label classification.
- 3.
- 4.
Acknowledgement
This work is supported by the National Key Research and Development Program of China (2018YFB1004502), the National Natural Science Foundation of China (61702532) and the Key Program of National Natural Science Foundation of China (61532001).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, X., Li, X., Wang, T. (2020). Active Classification of Cold-Start Users in Large Sparse Datasets. In: Wang, X., Zhang, R., Lee, YK., Sun, L., Moon, YS. (eds) Web and Big Data. APWeb-WAIM 2020. Lecture Notes in Computer Science, vol 12317. Springer, Cham. https://doi.org/10.1007/978-3-030-60259-8_1
DOI: https://doi.org/10.1007/978-3-030-60259-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60258-1
Online ISBN: 978-3-030-60259-8
eBook Packages: Computer Science, Computer Science (R0)