
GAME: GAussian Mixture Error-based meta-learning architecture

  • Review
  • Published in: Neural Computing and Applications

Abstract

In supervised learning, the gap between the ground-truth label and the model output is typically measured by an error function, and a fixed error function implicitly corresponds to a specific noise distribution that drives model optimization. In practice, however, the actual noise usually has a much more complex structure. To fit it better, in this paper we propose a robust noise model that embeds a mixture-of-Gaussians (MoG) noise modeling strategy into a baseline classification model, chosen here as the Gaussian mixture model (GMM). Further, to enable automatic selection of the number of mixture components, we apply a penalized likelihood method. We then use an alternating strategy to update the parameters of the noise model and of the underlying GMM classifier. From a meta-learning perspective, the proposed model offers a novel way to define hyperparameters from the error representation. Finally, we compare the proposed approach with three conventional and related classification methods on a synthetic dataset, two benchmark handwriting recognition datasets, and the Yale Face dataset. In addition, we embed the noise modeling strategy into a semantic segmentation task. The numerical results show that our approach achieves the best performance and confirm the effectiveness of MoG noise modeling.



Notes

  1. The MNIST and USPS datasets are available at https://cs.nyu.edu/~roweis/data.html.

  2. The Yale Face dataset is available at http://cvc.yale.edu/projects/yalefaces/yalefaces.html.

  3. The GTA5 dataset is available at https://download.visinf.tu-darmstadt.de/data/from_games/.

  4. The Cityscapes dataset is available at https://www.cityscapes-dataset.com/.


Author information


Corresponding author

Correspondence to Shihui Ying.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by the National Natural Science Foundation of China (No. 11971296) and the National Key R & D Program of China (No. 2021YFA1003004).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Dong, J., Shi, J., Gao, Y. et al. GAME: GAussian Mixture Error-based meta-learning architecture. Neural Comput & Applic 35, 20445–20461 (2023). https://doi.org/10.1007/s00521-023-08843-z

