Learning from Crowds via Joint Probabilistic Matrix Factorization and Clustering in Latent Space

Yao, Wuguannan; Lee, Wonjung; Wang, Junhui

doi:10.1007/978-3-030-67667-4_33

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12460)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

Learning from noisy labels has attracted growing attention in the era of big data. In crowdsourcing practice, however, extracting ground truth labels from the noisy labels supplied by crowds remains a challenging task. In this paper, we propose a latent variable model, built on a probabilistic logistic matrix factorization model and the classical Gaussian mixture model, for inferring ground truth labels from noisy, crowdsourced ones. In contrast to previous work, the proposed model incorporates item heterogeneity and allows for vector space embeddings of both items and worker labels. We derive a tractable mean-field variational inference algorithm to approximate the model posterior, and we also investigate the related MAP approximation to the model posterior to identify links to existing work. Empirically, we demonstrate that the proposed method achieves good inference accuracy while preserving meaningful uncertainty measures in the embeddings, and therefore better reflects the intrinsic structure of the data.
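
To make the modeling idea in the abstract more concrete, the following Python sketch simulates crowd labels from one possible generative story that combines a Gaussian mixture over item embeddings with a logistic link on item-worker inner products. It is only an illustration under stated assumptions and is not the model specification from the paper: the function name simulate_crowd_labels, the binary label setting, the two fixed cluster centers, the unit variances, and the standard normal worker vectors are all hypothetical choices.

```python
import math
import random


def simulate_crowd_labels(n_items=6, n_workers=4, dim=2, seed=0):
    """Sample binary crowd labels from an assumed generative story.

    Assumptions (not taken from the paper): two classes, one Gaussian
    cluster per class, unit variances, standard normal worker vectors.
    """
    rng = random.Random(seed)

    # Assumed cluster centers in latent space, one per ground truth class.
    centers = [[-2.0] * dim, [2.0] * dim]

    # Ground truth class of each item, drawn uniformly at random.
    true_labels = [rng.randrange(2) for _ in range(n_items)]

    # Item embeddings: Gaussian noise around the center of the item's class
    # (the Gaussian mixture part of the sketch).
    items = [
        [rng.gauss(centers[z][d], 1.0) for d in range(dim)]
        for z in true_labels
    ]

    # Worker vectors: standard normal draws (an assumption).
    workers = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_workers)]

    # Observed labels: Bernoulli draws whose success probability is the
    # logistic function of the item-worker inner product
    # (the logistic matrix factorization part of the sketch).
    labels = []
    for u in items:
        row = []
        for v in workers:
            score = sum(a * b for a, b in zip(u, v))
            prob = 1.0 / (1.0 + math.exp(-score))
            row.append(1 if rng.random() < prob else 0)
        labels.append(row)

    return true_labels, items, workers, labels


if __name__ == "__main__":
    truth, _, _, observed = simulate_crowd_labels()
    for z, row in zip(truth, observed):
        print("true class", z, "worker labels", row)
```

In this sketch the item heterogeneity mentioned in the abstract shows up as per-item embeddings that differ even within a class, so two items with the same ground truth label can receive different patterns of worker labels. Inference, which the paper addresses with mean-field variational methods, is not attempted here.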

Acknowledgements

We thank the reviewers for providing valuable comments. Junhui Wang’s research is supported in part by HKRGC Grants GRF-11303918 and GRF-11300919.

Author information

Authors and Affiliations

Department of Mathematics, City University of Hong Kong, Kowloon, Hong Kong
Wuguannan Yao & Wonjung Lee
School of Data Science, City University of Hong Kong, Kowloon, Hong Kong
Junhui Wang

Corresponding author

Correspondence to Wuguannan Yao.

Editor information

Editors and Affiliations

Microsoft Research, Redmond, WA, USA
Yuxiao Dong
Jožef Stefan Institute, Ljubljana, Slovenia
Dunja Mladenić
Amazon Alexa Knowledge, Cambridge, UK
Craig Saunders

About this paper

Cite this paper

Yao, W., Lee, W., Wang, J. (2021). Learning from Crowds via Joint Probabilistic Matrix Factorization and Clustering in Latent Space. In: Dong, Y., Mladenić, D., Saunders, C. (eds) Machine Learning and Knowledge Discovery in Databases: Applied Data Science Track. ECML PKDD 2020. Lecture Notes in Computer Science, vol 12460. Springer, Cham. https://doi.org/10.1007/978-3-030-67667-4_33

DOI: https://doi.org/10.1007/978-3-030-67667-4_33
Published: 25 February 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67666-7
Online ISBN: 978-3-030-67667-4
eBook Packages: Computer Science, Computer Science (R0)
