DOI: 10.1145/3580305.3599307
research-article

Dependence and Model Selection in LLP: The Problem of Variants

Published: 04 August 2023

ABSTRACT

The problem of Learning from Label Proportions (LLP) has received considerable research attention and has numerous practical applications. In LLP, a hypothesis assigning labels to items is learned using knowledge of only the proportion of labels found in predefined groups, called bags. While a number of algorithmic approaches to learning in this context have been proposed, very little work has addressed the model selection problem for LLP. Nonetheless, it is not obvious how to extend straightforward model selection approaches to LLP, in part because of the lack of item labels. More fundamentally, we argue that a careful approach to model selection for LLP requires consideration of the dependence structure that exists between bags, items, and labels. In this paper we formalize this structure and show how it affects model selection. This analysis leads to improved model selection methods, which we demonstrate outperform the state of the art over a wide range of datasets and LLP algorithms.
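To make the LLP setting described in the abstract concrete, the following is a minimal sketch of bags, their label proportions, and a naive bag-level score for comparing candidate hypotheses when item labels are unavailable. It is not the method proposed in the paper: the toy data, the helper names (`label_proportion`, `bag_proportion_error`, `threshold_at`), and the choice of mean absolute proportion error as the selection score are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence

Item = Sequence[float]
Bag = List[Item]


def label_proportion(labels: Sequence[int]) -> float:
    """Fraction of positive (1) labels among the given binary labels."""
    return sum(labels) / len(labels)


def bag_proportion_error(
    hypothesis: Callable[[Item], int],
    bags: List[Bag],
    proportions: List[float],
) -> float:
    """Mean absolute gap between predicted and known bag proportions.

    Only bag-level proportions are used; item labels are never consulted.
    """
    total = 0.0
    for bag, known in zip(bags, proportions):
        predicted = label_proportion([hypothesis(x) for x in bag])
        total += abs(predicted - known)
    return total / len(bags)


# Hypothetical toy data: two bags of one-dimensional items.
bags: List[Bag] = [
    [[0.1], [0.2], [0.9], [0.8]],
    [[0.3], [0.4], [0.7], [0.1]],
]

# Item labels exist but are hidden from the learner; only the
# per-bag proportions derived from them are observable in LLP.
hidden_labels = [[0, 0, 1, 1], [0, 0, 1, 0]]
proportions = [label_proportion(y) for y in hidden_labels]


def threshold_at(t: float) -> Callable[[Item], int]:
    """Hypothetical candidate model: predict 1 when the feature exceeds t."""
    return lambda x: int(x[0] > t)


candidates: Dict[str, Callable[[Item], int]] = {
    "t=0.5": threshold_at(0.5),
    "t=0.05": threshold_at(0.05),
}

# Score each candidate using bag proportions alone and keep the best one.
scores = {
    name: bag_proportion_error(h, bags, proportions)
    for name, h in candidates.items()
}
best = min(scores, key=scores.get)
```

A score of this kind can only use bag-level information, which is one reason the abstract notes that straightforward model selection approaches do not carry over directly to LLP.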



Published in

KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2023
5996 pages
ISBN: 9798400701030
DOI: 10.1145/3580305

Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
