Active Classification of Cold-Start Users in Large Sparse Datasets

  • Conference paper
Web and Big Data (APWeb-WAIM 2020)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 12317))

Abstract

Many applications need to perform classification on large sparse datasets. Classifying cold-start users, who have provided very little feedback, remains a challenging task. Previous work has applied active learning to classification with partially observed data. However, for large and sparse data, the number of candidate feedback queries is huge, and many of them are invalid because the user cannot answer them. In this paper, we develop an active classification framework that addresses these challenges by leveraging online Matrix Factorization models. We first identify a step-wise data acquisition heuristic that is useful for active classification. We then use the estimates of online Probabilistic Matrix Factorization to compute this heuristic function. To reduce the number of invalid queries, we further estimate, with online Poisson Factorization, the probability that a query can be answered by the cold-start user. During active learning, a query is selected based on the current knowledge learned in these two online factorization models. We demonstrate on real-world movie rating datasets that our framework is highly effective: it not only yields larger improvements in classification but also reduces the number of invalid queries.
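The query-selection step described in the abstract can be illustrated with a minimal sketch. It is not the paper's actual heuristic, which the abstract does not specify. It assumes a hypothetical linear classifier with weights `w`, a Gaussian posterior covariance `Sigma_u` over the cold-start user's factor in the Probabilistic Matrix Factorization model, and Poisson Factorization factors `beta` and `theta_u`. All function and variable names are illustrative. The only standard fact used is that a Poisson variable with rate λ is non-zero with probability 1 − exp(−λ).

```python
import numpy as np

def pmf_uncertainty(V, Sigma_u):
    # Predictive variance v_j^T Sigma_u v_j of each item's rating under a
    # Gaussian posterior on the user factor (assumed stand-in heuristic).
    return np.einsum("ik,kl,il->i", V, Sigma_u, V)

def answer_probability(beta, theta_u):
    # Poisson rate per item; P(feedback exists) = 1 - exp(-rate).
    rate = beta @ theta_u
    return 1.0 - np.exp(-rate)

def select_query(V, Sigma_u, w, beta, theta_u, asked=()):
    # Hypothetical acquisition value: classifier weight magnitude times
    # rating uncertainty, discounted by the chance the user can answer.
    value = np.abs(w) * pmf_uncertainty(V, Sigma_u)
    score = value * answer_probability(beta, theta_u)
    for j in asked:
        score[j] = -np.inf  # never repeat a query
    return int(np.argmax(score))

# Toy example: 3 items, 2 latent dimensions.
V = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
Sigma_u = np.eye(2)
w = np.ones(3)
beta = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
theta_u = np.array([0.5, 0.5])
first = select_query(V, Sigma_u, w, beta, theta_u)
second = select_query(V, Sigma_u, w, beta, theta_u, asked={first})
```

In an online setting, both `Sigma_u` and `theta_u` would be updated after each answered (or unanswered) query before the next selection. The update rules are omitted here because the abstract does not describe them.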

Notes

  1. If a certain feedback is missing, we treat it as 0 by convention.

  2. Our analysis can be extended to other scenarios such as regression and multi-label classification.

  3. http://grouplens.org/datasets/movielens/.

  4. http://webscope.sandbox.yahoo.com/catalog.php?datatype=r.

Acknowledgement

This work is supported by the National Key Research and Development Program of China (2018YFB1004502), the National Natural Science Foundation of China (61702532) and the Key Program of National Natural Science Foundation of China (61532001).

Author information

Corresponding author

Correspondence to Xiao Li.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Li, X., Li, X., Wang, T. (2020). Active Classification of Cold-Start Users in Large Sparse Datasets. In: Wang, X., Zhang, R., Lee, YK., Sun, L., Moon, YS. (eds) Web and Big Data. APWeb-WAIM 2020. Lecture Notes in Computer Science, vol 12317. Springer, Cham. https://doi.org/10.1007/978-3-030-60259-8_1

  • DOI: https://doi.org/10.1007/978-3-030-60259-8_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60258-1

  • Online ISBN: 978-3-030-60259-8

  • eBook Packages: Computer Science (R0)
