Research article · DOI: 10.1145/3534678.3539318

Sample-Efficient Kernel Mean Estimator with Marginalized Corrupted Data

Published: 14 August 2022

Abstract

Estimating the kernel mean in a reproducing kernel Hilbert space is central to many kernel-based learning algorithms. Given a finite sample, the empirical average is the standard estimator of the target kernel mean. Prior work has shown that better estimators can be constructed by shrinkage methods. In this work, we propose to corrupt data examples with noise from known distributions and present a new kernel mean estimator, called the marginalized kernel mean estimator, which estimates the kernel mean under the corrupted distributions. Theoretically, we justify that the marginalized kernel mean estimator introduces implicit regularization into kernel mean estimation. Empirically, on a variety of tasks, we show that the marginalized kernel mean estimator is sample-efficient and achieves much lower estimation error than existing estimators.

    Published In

    KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2022, 5033 pages
ISBN: 9781450393850
DOI: 10.1145/3534678

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. corrupted distributions
    2. kernel mean embedding
    3. kernel methods
    4. marginalized approaches
5. reproducing kernel Hilbert space

    Conference

    KDD '22

    Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions (13%)
