Research article · DOI: 10.1145/3534678.3539318

Sample-Efficient Kernel Mean Estimator with Marginalized Corrupted Data

Published: 14 August 2022

Abstract

Estimating the kernel mean in a reproducing kernel Hilbert space is central to many kernel-based learning algorithms. Given a finite sample, the empirical average is the standard estimator of the target kernel mean. Prior work has shown that better estimators can be constructed by shrinkage methods. In this work, we propose to corrupt data examples with noise from known distributions and present a new kernel mean estimator, called the marginalized kernel mean estimator, which estimates the kernel mean under the corrupted distributions. Theoretically, we justify that the marginalized kernel mean estimator introduces implicit regularization into kernel mean estimation. Empirically, on a variety of tasks, we show that the marginalized kernel mean estimator is sample-efficient and achieves much lower estimation error than existing estimators.

    Published In

    KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2022, 5033 pages
ISBN: 9781450393850
DOI: 10.1145/3534678

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. corrupted distributions
    2. kernel mean embedding
    3. kernel methods
    4. marginalized approaches
5. reproducing kernel Hilbert space

    Conference

    KDD '22

    Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions (13%)
