Research Article · CIKM Conference Proceedings
DOI: 10.1145/3340531.3411971

Deep Generative Positive-Unlabeled Learning under Selection Bias

Published: 19 October 2020

Abstract

Learning in the positive-unlabeled (PU) setting is prevalent in real-world applications. Many previous works depend on the Selected Completely At Random (SCAR) assumption to utilize unlabeled data, but SCAR often fails to hold in practice because of selection bias in label observations. This paper presents the first generative PU learning model that does not rely on the SCAR assumption. Specifically, we derive a PU risk function without the SCAR assumption, and we generate a set of virtual PU examples to train the classifier. Although our PU risk function is more general, it requires PU instances that do not exist in the observations. We therefore introduce VAE-PU, a variant of the variational autoencoder that separates two latent variables, one generating features and the other generating observation indicators. The separated latent information enables the model to generate virtual PU instances. We evaluate VAE-PU on benchmark datasets with and without the SCAR assumption. The results indicate that VAE-PU is superior when selection bias exists and remains competitive under the SCAR assumption. Because it models the selection bias explicitly, VAE-PU is also effective when only a few positive-labeled instances are available.
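For context on the risk function the paper generalizes: the standard SCAR-based baseline is the non-negative PU (nnPU) risk estimator of Kiryo et al., which rewrites the negative-class risk in terms of labeled positives and unlabeled data and clips it at zero. The sketch below is that well-known baseline, not the paper's bias-aware risk; the function names and the toy data are illustrative assumptions.

```python
import numpy as np

def sigmoid_loss(z):
    """Surrogate loss l(z) in (0, 1), where z = y * g(x) is the margin."""
    return 1.0 / (1.0 + np.exp(z))

def nnpu_risk(scores_pos, scores_unl, pi):
    """Non-negative PU risk estimate under SCAR (Kiryo et al., 2017).

    scores_pos: classifier outputs g(x) on labeled positives
    scores_unl: classifier outputs g(x) on unlabeled data
    pi: class prior P(y = +1)
    """
    # Positives treated as positives: estimate of E_p[l(g(x))]
    r_p_plus = sigmoid_loss(scores_pos).mean()
    # Positives treated as negatives: estimate of E_p[l(-g(x))]
    r_p_minus = sigmoid_loss(-scores_pos).mean()
    # Unlabeled data treated as negatives: estimate of E_u[l(-g(x))]
    r_u_minus = sigmoid_loss(-scores_unl).mean()
    # Negative-class risk, clipped at zero to prevent it going negative
    r_neg = max(0.0, r_u_minus - pi * r_p_minus)
    return pi * r_p_plus + r_neg

# Toy usage: positives score high; unlabeled data mixes both classes.
rng = np.random.default_rng(0)
pos = rng.normal(2.0, 1.0, 100)
unl = np.concatenate([rng.normal(2.0, 1.0, 50), rng.normal(-2.0, 1.0, 50)])
risk = nnpu_risk(pos, unl, pi=0.5)
```

The decomposition relies on SCAR, i.e. that labeled positives are an unbiased sample of all positives; the paper's contribution is a risk function (and a generative model supplying the required virtual PU instances) that drops exactly this assumption.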

Supplementary Material

MP4 File (3340531.3411971.mp4)
This presentation introduces the CIKM 2020 full research paper, Deep Generative Positive-Unlabeled Learning under Selection Bias. In this paper, we propose a generative positive-unlabeled (PU) learning method, VAE-PU, that does not require the selected completely at random (SCAR) assumption. To this end, we derive the risk function without the SCAR assumption and design a deep generative model that virtually generates the PU instances. Experimental results indicate that VAE-PU is superior when selection bias exists and remains competitive under the SCAR assumption. A generative approach is a natural fit here, because generating the missing PU instances is what makes learning from such biased observations possible.



Published In

CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
October 2020
3619 pages
ISBN:9781450368599
DOI:10.1145/3340531
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. positive-unlabeled learning
  2. selection bias
  3. variational autoencoders

Qualifiers

  • Research-article

Conference

CIKM '20

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Article Metrics

  • Downloads (last 12 months): 83
  • Downloads (last 6 weeks): 7
Reflects downloads up to 13 Feb 2025


Cited By

  • (2024) Positive and Unlabeled Learning with Controlled Probability Boundary Fence. Proceedings of the 41st International Conference on Machine Learning, 27641-27652. DOI: 10.5555/3692070.3693176. Online publication date: 21-Jul-2024.
  • (2023) Beyond Myopia. Proceedings of the 37th International Conference on Neural Information Processing Systems, 67589-67602. DOI: 10.5555/3666122.3669077. Online publication date: 10-Dec-2023.
  • (2023) GradPU. Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence, 7296-7303. DOI: 10.1609/aaai.v37i6.25889. Online publication date: 7-Feb-2023.
  • (2023) Conditional Generative Positive and Unlabeled Learning. Expert Systems with Applications, 120046. DOI: 10.1016/j.eswa.2023.120046. Online publication date: Apr-2023.
  • (2021) Asymmetric Loss for Positive-Unlabeled Learning. 2021 IEEE International Conference on Multimedia and Expo (ICME), 1-6. DOI: 10.1109/ICME51207.2021.9428350. Online publication date: 5-Jul-2021.
