Abstract
In supervised learning, the gap between the ground-truth label and the model output is typically characterized by an error function, and a fixed error function corresponds to a specific noise distribution assumed during model optimization. In practice, however, the actual noise usually has a much more complex structure. To fit it better, we propose a robust noise model that embeds a mixture-of-Gaussians (MoG) noise modeling strategy into a baseline classification model, chosen here as the Gaussian mixture model (GMM). To select the number of mixture components automatically, we apply a penalized likelihood method, and we use an alternating strategy to update the parameters of the noise model and the underlying GMM classifier. From a meta-learning perspective, the proposed model offers a novel way to define the hyperparameters from the error representation. We compare the proposed approach with three conventional and related classification methods on a synthetic dataset, two benchmark handwriting recognition datasets, and the Yale Face dataset, and we further embed the noise modeling strategy into a semantic segmentation task. The numerical results show that our approach achieves the best performance and confirm the effectiveness of MoG noise modeling.
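To make the error-modeling idea concrete, the following is a minimal NumPy sketch, not the exact GAME formulation: it fits a zero-mean MoG to the residuals between predictions and labels via EM and chooses the number of components with a BIC-style penalized likelihood. The penalty form, the initialization, and all function names here are illustrative assumptions.

```python
# Minimal sketch of MoG error modeling (illustrative, not the paper's exact algorithm):
# fit a zero-mean, K-component Gaussian mixture to 1-D residuals with EM, then pick K
# by a BIC-style penalized likelihood.
import numpy as np


def fit_mog_residuals(residuals, n_components, n_iter=100, eps=1e-8):
    """EM for a zero-mean Gaussian mixture over 1-D residuals."""
    r = np.asarray(residuals, dtype=float).ravel()
    n = r.size
    pi = np.full(n_components, 1.0 / n_components)         # mixing weights
    var = np.var(r) * np.arange(1, n_components + 1)        # spread-out initial variances
    for _ in range(n_iter):
        # E-step: responsibility of each component for each residual
        dens = np.exp(-0.5 * r[:, None] ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = pi * dens + eps
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update mixing weights and (zero-mean) component variances
        nk = resp.sum(axis=0)
        pi = nk / n
        var = (resp * r[:, None] ** 2).sum(axis=0) / (nk + eps)
    dens = np.exp(-0.5 * r[:, None] ** 2 / var) / np.sqrt(2 * np.pi * var)
    loglik = np.log((pi * dens).sum(axis=1) + eps).sum()
    return pi, var, loglik


def select_n_components(residuals, max_k=5):
    """Choose K by penalized likelihood; a zero-mean K-component mixture has 2K - 1 free parameters."""
    n = len(residuals)
    best_k, best_score = 1, -np.inf
    for k in range(1, max_k + 1):
        _, _, loglik = fit_mog_residuals(residuals, k)
        score = loglik - 0.5 * (2 * k - 1) * np.log(n)      # BIC-style penalty
        if score > best_score:
            best_k, best_score = k, score
    return best_k


# Toy usage: residuals drawn from a two-component noise distribution
rng = np.random.default_rng(0)
res = np.concatenate([rng.normal(0.0, 0.1, 900), rng.normal(0.0, 1.0, 100)])
print(select_n_components(res))  # usually selects 2 for this mixture
```

In the full approach, such MoG noise parameters would be updated in alternation with the parameters of the baseline GMM classifier rather than fit once to fixed residuals.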
Notes
MNIST and USPS Data availability statements are released at https://cs.nyu.edu/~roweis/data.html.
Yale Data availability statement is released at http://cvc.yale.edu/projects/yalefaces/yalefaces.html.
GTA5 Data availability statement is released at https://download.visinf.tu-darmstadt.de/data/from_games/.
CityScapes Data availability statement is released at https://download.visinf.tu-darmstadt.de/data/from_games/.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This work was supported by the National Natural Science Foundation of China (No. 11971296) and the National Key R & D Program of China (No. 2021YFA1003004).
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Dong, J., Shi, J., Gao, Y. et al. GAME: GAussian Mixture Error-based meta-learning architecture. Neural Comput & Applic 35, 20445–20461 (2023). https://doi.org/10.1007/s00521-023-08843-z