How Much Training Data Is Memorized in Overparameterized Autoencoders? An Inverse Problem Perspective on Memorization Evaluation

  • Conference paper
  • First Online:
Machine Learning and Knowledge Discovery in Databases. Research Track (ECML PKDD 2024)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14942)

Abstract

Overparameterized autoencoder models often memorize their training data. For image data, memorization is often examined by using the trained autoencoder to recover missing regions in its training images (which were used only in their complete form during training). In this paper, we propose an inverse problem perspective for the study of memorization. Given a degraded training image, we define the recovery of the original training image as an inverse problem and formulate it as an optimization task. In our inverse problem, we use the trained autoencoder to implicitly define a regularizer for the particular training dataset that we aim to retrieve from. We develop the intricate optimization task into a practical method that iteratively applies the trained autoencoder and relatively simple computations that estimate and address the unknown degradation operator. We evaluate our method for blind inpainting, where the goal is to recover training images from a degradation of many missing pixels in an unknown pattern. We examine various deep autoencoder architectures, such as fully connected and U-Net (with various nonlinearities and at diverse train loss values), and show that our method significantly outperforms previous memorization-evaluation methods that recover training data from autoencoders. Importantly, our method also greatly improves the recovery performance in settings that were previously considered highly challenging, and even impractical, for such recovery and memorization evaluation.

Notes

  1. Appendices E-H are also available at https://arxiv.org/pdf/2310.02897.

  2. In our case where the original image pixel values are in [0, 1]: \(PSNR=10\log _{10}\left( {\frac{1}{MSE}}\right) \).
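A minimal NumPy sketch of this PSNR computation (assuming the images are stored as arrays with pixel values in [0, 1]; the function name is illustrative):

```python
import numpy as np

def psnr(original, recovered):
    """PSNR for images with pixel values in [0, 1]: 10 * log10(1 / MSE)."""
    mse = np.mean((original - recovered) ** 2)
    return 10.0 * np.log10(1.0 / mse)
```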

References

  1. Afonso, M.V., Bioucas-Dias, J.M., Figueiredo, M.A.T.: Fast image recovery using variable splitting and constrained optimization. IEEE Trans. Image Process. 19(9), 2345–2356 (2010)

  2. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)

  3. Brifman, A., Romano, Y., Elad, M.: Turning a denoiser into a super-resolver using plug and play priors. In: 2016 IEEE International Conference on Image Processing (ICIP) (2016)

  4. Carlini, N., Chien, S., Nasr, M., Song, S., Terzis, A., Tramèr, F.: Membership inference attacks from first principles. In: 2022 IEEE Symposium on Security and Privacy (SP), pp. 1897–1914 (2022)

  5. Chan, S.H., Wang, X., Elgendy, O.A.: Plug-and-play ADMM for image restoration: fixed-point convergence and applications. IEEE Trans. Comput. Imag. 3(1), 84–98 (2017)

  6. Dar, Y., Bruckstein, A.M., Elad, M., Giryes, R.: Postprocessing of compressed images via sequential denoising. IEEE Trans. Image Process. 25(7), 3044–3058 (2016)

  7. Dar, Y., Mayer, P., Luzi, L., Baraniuk, R.G.: Subspace fitting meets regression: the effects of supervision and orthonormality constraints on double descent of generalization errors. In: International Conference on Machine Learning (ICML), pp. 2366–2375 (2020)

  8. Hertrich, J., Neumayer, S., Steidl, G.: Convolutional proximal neural networks and plug-and-play algorithms. Linear Algebra Appl. 631, 203–234 (2021)

  9. Hu, H., Salcic, Z., Sun, L., Dobbie, G., Yu, P.S., Zhang, X.: Membership inference attacks on machine learning: a survey. ACM Comput. Surv. 54(11s), 1–37 (2022)

  10. Jiang, Y., Pehlevan, C.: Associative memory in iterated overparameterized sigmoid autoencoders. In: Proceedings of the 37th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 119, pp. 4828–4838. PMLR (13–18 Jul 2020)

  11. Kamilov, U.S., Mansour, H., Wohlberg, B.: A plug-and-play priors approach for solving nonlinear imaging inverse problems. IEEE Signal Process. Lett. 24(12), 1872–1876 (2017)

  12. Moreau, J.J.: Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. France 93, 273–299 (1965)

  13. Nouri, A., Seyyedsalehi, S.A.: Eigen value based loss function for training attractors in iterated autoencoders. Neural Netw. 161, 575–588 (2023)

  14. Radhakrishnan, A., Belkin, M., Uhler, C.: Overparameterized neural networks implement associative memory. Proc. Natl. Acad. Sci. 117(44), 27162–27170 (2020)

  15. Radhakrishnan, A., Uhler, C., Belkin, M.: Downsampling leads to image memorization in convolutional autoencoders (2018)

  16. Radhakrishnan, A., Yang, K., Belkin, M., Uhler, C.: Memorization in overparameterized autoencoders. arXiv preprint arXiv:1810.10333 (2018)

  17. Rond, A., Giryes, R., Elad, M.: Poisson inverse problems by the plug-and-play scheme. J. Vis. Commun. Image Represent. 41, 96–108 (2016)

  18. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28

  19. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60(1–4), 259–268 (1992)

  20. Shokri, R., Stronati, M., Song, C., Shmatikov, V.: Membership inference attacks against machine learning models. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 3–18. IEEE (2017)

  21. Sreehari, S., et al.: Plug-and-play priors for bright field electron tomography and sparse interpolation. IEEE Trans. Comput. Imaging 2(4), 408–423 (2016)

  22. Venkatakrishnan, S.V., Bouman, C.A., Wohlberg, B.: Plug-and-play priors for model based reconstruction. In: IEEE GlobalSIP (2013)

  23. Wang, Y., Yu, J., Zhang, J.: Zero-shot image restoration using denoising diffusion null-space model. In: International Conference on Learning Representations (ICLR) (2023)

Acknowledgements

This work was supported by the Lynn and William Frankel Center for Computer Science at Ben-Gurion University, and by the Israeli Council for Higher Education (CHE) via the Data Science Research Center, Ben-Gurion University of the Negev.

Author information

Corresponding author

Correspondence to Koren Abitbul.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 6795 KB)

Appendices

A Proof of Theorem 1

In this Appendix, we prove Theorem 1.

Lemma A.1

Given a 2-layer tied autoencoder, f, which can be formulated as

$$\begin{aligned} f(\textbf{x}) = \textbf{W}^{T} \rho (\textbf{W} \textbf{x}) \end{aligned}$$

for \(\textbf{x}\in \mathbb {R}^d\), where \(\textbf{W} \in \mathbb {R}^{m \times d}\) and \(\rho : \mathbb {R}^{m}\rightarrow \mathbb {R}^{m}\) is an activation function with the (separable) componentwise form

$$\begin{aligned} \rho (\textbf{z})=[\bar{\rho }(z_1),\dots ,\bar{\rho }(z_m)]^T ~~~\text {for}~~ \textbf{z}\in \mathbb {R}^m \end{aligned}$$
(21)

where \(z_j\) denotes the \(j^\textrm{th}\) component of \(\textbf{z}\triangleq \textbf{W} \textbf{x}\), and \(\bar{\rho }:\mathbb {R}\rightarrow \mathbb {R}\) is a scalar activation function.

Then, the Jacobian matrix of f is

$$\begin{aligned} \frac{df(\textbf{x})}{d\textbf{x}} = \textbf{W}^{T} \mathrm{{diag}}\left( \frac{d\bar{\rho }(z_1)}{d z_1}, \frac{d\bar{\rho }(z_2)}{d z_2}, \ldots , \frac{d\bar{\rho }(z_m)}{d z_m }\right) \textbf{W}. \end{aligned}$$
(22)

where \(\text {diag}(\cdot )\) represents a diagonal matrix with the given components along the main diagonal.

Proof

Let us define auxiliary variables. In addition to \(\textbf{z}\triangleq \textbf{W} \textbf{x}\), we also define \(\textbf{a}\triangleq \rho (\textbf{z})\) and \(\boldsymbol{\xi }\triangleq f(\textbf{x}) = \textbf{W}^{T} \textbf{a}\).

Then, by the chain rule, we get

$$\begin{aligned} \frac{df(\textbf{x})}{d\textbf{x}} = \frac{d\boldsymbol{\xi }}{d\textbf{a}} \cdot \frac{d\textbf{a}}{d\textbf{z}} \cdot \frac{d\textbf{z}}{d\textbf{x}} = \textbf{W}^{T} \cdot \frac{d\textbf{a}}{d\textbf{z}} \cdot \textbf{W} \end{aligned}$$
(23)

Next, by definition, \(\frac{d\textbf{a}}{d\textbf{z}} = \frac{d\rho (\textbf{z})}{d\textbf{z}}\), and since \(\rho (\textbf{z})\) is a vector of componentwise activation functions (see (21)), this Jacobian is an \(m\times m\) diagonal matrix of the form

$$\begin{aligned} \frac{d\rho (\textbf{z})}{d\textbf{z}} = \mathrm{{diag}}\left( \frac{d\bar{\rho }(z_1)}{d z_1}, \frac{d\bar{\rho }(z_2)}{d z_2}, \ldots , \frac{d\bar{\rho }(z_m)}{d z_m }\right) \end{aligned}$$
(24)

Substituting (24) back into (23) gives the Jacobian formula of (22).
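As a quick numerical sanity check of the Jacobian formula (22), the following Python sketch (with illustrative dimensions and a softplus activation, whose derivative lies in (0, 1); these choices are assumptions for illustration only) compares the closed-form Jacobian to a finite-difference approximation:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 6, 10
W = rng.standard_normal((m, d)) * 0.1
x = rng.standard_normal(d)

softplus = lambda z: np.log1p(np.exp(z))             # scalar activation rho_bar
softplus_grad = lambda z: 1.0 / (1.0 + np.exp(-z))   # its derivative, in (0, 1)

f = lambda x: W.T @ softplus(W @ x)                  # 2-layer tied autoencoder

# Closed-form Jacobian from (22): W^T diag(rho_bar'(z)) W, with z = W x.
J_closed = W.T @ np.diag(softplus_grad(W @ x)) @ W

# Central finite-difference approximation of the Jacobian.
eps = 1e-6
J_fd = np.column_stack([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                        for e in np.eye(d)])

assert np.allclose(J_closed, J_fd, atol=1e-5)
```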

Corollary A.1

Let \(\bar{\rho }:\mathbb {R}\rightarrow \mathbb {R}\) be a scalar activation function that is differentiable and has derivatives in [0, 1], namely, \(\frac{d \bar{\rho }(z)}{dz} \in [0,1]\) for any \(z\in \mathbb {R}\). Then, for such an activation function, a 2-layer tied autoencoder has a Jacobian of the form \(\textbf{W}^{T} \textbf{D} \textbf{W}\), where \(\textbf{D}\) is a diagonal matrix whose values are in [0, 1].

Lemma A.2

Let \(\textbf{W}\in \mathbb {R}^{c_2\times c_1}\) and let \(\textbf{D}\) be a \(c_2\times c_2\) diagonal matrix with values in [0, 1]. Then, \(\textbf{W}^T \textbf{D} \textbf{W}\) is a symmetric positive semi-definite matrix.

Proof

First, we show that the matrix is symmetric:

$$\begin{aligned} (\textbf{W}^T \textbf{D} \textbf{W})^T = \textbf{W}^T \textbf{D}^T \textbf{W} = \textbf{W}^T \textbf{D} \textbf{W}, \end{aligned}$$

where \(\textbf{D}^T=\textbf{D}\) since \(\textbf{D}\) is diagonal and hence symmetric.

Now, we prove that the matrix \(\textbf{W}^T \textbf{D} \textbf{W}\) is positive semi-definite, namely, that \(\textbf{r}^T\textbf{W}^T\textbf{D}\textbf{W}\textbf{r} \ge 0\) for any \(\textbf{r} \in \mathbb {R}^{c_1}\). Defining \(\widetilde{\textbf{r}} \triangleq \textbf{W} \textbf{r}\), it suffices to show that \(\widetilde{\textbf{r}}^T\textbf{D}\widetilde{\textbf{r}} \ge 0\). Denoting by \(\widetilde{r}_i\) the \(i^\textrm{th}\) component of \(\widetilde{\textbf{r}}\) and by \(\textbf{D}_{i,i}\) the \(i^\textrm{th}\) main-diagonal component of \(\textbf{D}\), we get \(\widetilde{\textbf{r}}^T\textbf{D}\widetilde{\textbf{r}}=\sum _{i=1}^{c_2} \textbf{D}_{i,i}\widetilde{r}_i^2 \ge 0\) because \(\textbf{D}_{i,i}\in [0,1]\) for every i.

Now we proceed to prove Theorem 1, i.e., that a tied autoencoder f from the class described in the theorem is a Moreau proximity operator.

Proof

We prove that \(f(\textbf{x})\) is a Moreau proximity operator by showing that the Jacobian matrix of \(f(\textbf{x})\) w.r.t. any \(\textbf{x} \in \mathbb {R}^d\) satisfies two properties: (i) the Jacobian is a symmetric matrix, and (ii) all the eigenvalues of the Jacobian matrix are real and in the range \([0,1]\). Note that previous works on plug-and-play priors used conditions (i)-(ii) to prove that special types of denoisers are Moreau proximity operators; see, e.g., [21].

Consider a 2-layer tied autoencoder, f, which can be formulated as

$$\begin{aligned} f(\textbf{x}) = \textbf{W}^{T} \rho (\textbf{W} \textbf{x}) \end{aligned}$$

where \(\textbf{W} \in \mathbb {R}^{m \times d}\) has all its singular values in [0, 1], and \(\rho : \mathbb {R}^{m}\rightarrow \mathbb {R}^{m}\) is a componentwise activation function as in (21) that is based on a differentiable scalar activation function \(\bar{\rho }:\mathbb {R}\rightarrow \mathbb {R}\) whose derivative is in [0, 1].

We will now prove that f is a Moreau proximity operator.

From Corollary A.1 and Lemma A.2, we get that f has a Jacobian matrix \(\textbf{W}^{T} \textbf{D} \textbf{W}\), which is symmetric and positive semi-definite.

We will now prove that the eigenvalues of \(\textbf{W}^{T} \textbf{D} \textbf{W}\) are all in \([0,1]\). First, since \(\textbf{D}\) is diagonal with entries in [0, 1], its singular values equal its diagonal entries and are therefore in [0, 1]. In addition, the singular values of \(\textbf{W}^{T}\) are the same as the singular values of \(\textbf{W}\), which are in [0, 1] by the assumption of Theorem 1. Hence, the singular values of each of the matrices in the product \(\textbf{W}^{T} \textbf{D} \textbf{W}\) are in [0, 1]. It is also well known that for every two matrices \(\textbf{A}\in \mathbb {R}^{q_1\times q_2}\), \(\textbf{B}\in \mathbb {R}^{q_2\times q_3}\),

$$\begin{aligned} \sigma _i(\textbf{AB}) \le \sigma _1(\textbf{A}) \sigma _i(\textbf{B}) \end{aligned}$$
(25)

where \(\sigma _i\) denotes the \(i^\textrm{th}\) largest singular value of a corresponding matrix. Hence, for \(\textbf{C}\in \mathbb {R}^{q_3\times q_4}\),

$$\begin{aligned} \sigma _i(\textbf{ABC}) \le \sigma _1(\textbf{A}) \sigma _1(\textbf{B}) \sigma _i(\textbf{C}). \end{aligned}$$

In our case, \(\sigma _1(\textbf{W}^{T})\le 1\), \(\sigma _1(\textbf{D})\le 1\), \(\sigma _i(\textbf{W})\le 1\), and therefore

$$\begin{aligned} \sigma _i(\textbf{W}^{T} \textbf{D} \textbf{W}) \le \sigma _1(\textbf{W}^{T}) \sigma _1(\textbf{D}) \sigma _i(\textbf{W}) \le 1. \end{aligned}$$
(26)

Consequently, all the singular values of the Jacobian matrix are in [0, 1]. Moreover, for real symmetric matrices, the absolute values of the eigenvalues are equal to the singular values. Since the Jacobian of our 2-layer tied autoencoder is real and symmetric, by (26) we get that the eigenvalues of this Jacobian are in \([-1,1]\). Moreover, by Lemma A.2, the Jacobian is symmetric positive semi-definite and therefore its eigenvalues are non-negative; accordingly, all the eigenvalues of the Jacobian are in \([0,1]\).

To sum up, we showed that a 2-layer tied autoencoder that satisfies the conditions in Theorem 1 has a symmetric positive semi-definite Jacobian with eigenvalues in [0, 1]; therefore, such a 2-layer autoencoder is a Moreau proximity operator.
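The conclusion of Theorem 1 can also be checked numerically. The Python sketch below uses a random \(\textbf{W}\) rescaled so that its largest singular value is at most 1 and a diagonal \(\textbf{D}\) with entries drawn from [0, 1] (both illustrative choices, not values from the paper), and verifies that the eigenvalues of \(\textbf{W}^{T} \textbf{D} \textbf{W}\) are real and lie in [0, 1]:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 8, 20

# Random W rescaled so that its largest singular value is at most 1.
W = rng.standard_normal((m, d))
W /= np.linalg.norm(W, 2)          # spectral norm = largest singular value

# Diagonal D with entries in [0, 1], standing in for the activation derivatives.
D = np.diag(rng.uniform(0.0, 1.0, size=m))

J = W.T @ D @ W                    # Jacobian form from Corollary A.1
eigvals = np.linalg.eigvalsh(J)    # real eigenvalues of the symmetric matrix

assert np.all(eigvals >= -1e-12) and np.all(eigvals <= 1 + 1e-12)
```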

B Proof of Equation (17)

Recall the notations in (8). The optimization problem (17), i.e.,

$$\begin{aligned} \widehat{\boldsymbol{\xi }}^{(k)} = \mathop {\mathrm {arg\,min}}\limits _{\textbf{x}\in \mathbb {R}^d} \left\| {{ \mathbf {\Theta } \textbf{x} - \textbf{y}}}\right\| _2^2 + \frac{\gamma }{2} \left\| {{ \textbf{x} - \widetilde{\textbf{v}}^{(k)}}}\right\| _2^2 \end{aligned}$$
(27)

has a closed form solution

$$\begin{aligned} \widehat{\boldsymbol{\xi }}^{(k)} = \left( {\mathbf {\Theta }^T \mathbf {\Theta } + \frac{\gamma }{2}\textbf{I}}\right) ^{-1}\left( {\mathbf {\Theta }\textbf{y} + \frac{\gamma }{2}\widetilde{\textbf{v}}^{(k)}}\right) \end{aligned}$$
(28)

Then, the diagonal structure of \(\mathbf {\Theta }\) with zeros and ones on its main diagonal implies that \(\mathbf {\Theta }^T \mathbf {\Theta } = \mathbf {\Theta }\) and, therefore,

$$\begin{aligned} \widehat{\boldsymbol{\xi }}^{(k)} = \left( {\mathbf {\Theta } + \frac{\gamma }{2}\textbf{I}}\right) ^{-1}\left( {\mathbf {\Theta }\textbf{y} + \frac{\gamma }{2}\widetilde{\textbf{v}}^{(k)}}\right) \end{aligned}$$
(29)

that can be further simplified to the componentwise form of

$$ \widehat{\xi _i}^{(k)} = {\left\{ \begin{array}{ll} \widetilde{v}^{(k)}_i, & \text {if } \mathbf {\Theta }_{i,i} = 0 \\ \frac{y_i + \frac{\gamma }{2}\widetilde{v}^{(k)}_i}{1 + \frac{\gamma }{2}}, & \text {if } \mathbf {\Theta }_{i,i} = 1 \end{array}\right. } $$

where \(\widehat{\xi }_i^{(k)}\), \(y_i\), \(\widetilde{v}^{(k)}_i\) are the \(i^\textrm{th}\) components of the vectors \(\widehat{\boldsymbol{\xi }}^{(k)}\), \(\textbf{y}\), \(\widetilde{\textbf{v}}^{(k)}\), respectively.
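For a quick check that the componentwise form matches the matrix-inverse solution (28), the following sketch uses illustrative dimensions, a random 0/1 diagonal \(\mathbf {\Theta }\), and an arbitrary \(\gamma \) (all assumptions for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
d, gamma = 12, 0.5

theta_diag = rng.integers(0, 2, size=d).astype(float)  # 0/1 mask on the diagonal
Theta = np.diag(theta_diag)
y = rng.standard_normal(d)
v = rng.standard_normal(d)                              # stands in for v_tilde^(k)

# Matrix form (28): (Theta^T Theta + (gamma/2) I)^{-1} (Theta y + (gamma/2) v)
xi_matrix = np.linalg.solve(Theta.T @ Theta + (gamma / 2) * np.eye(d),
                            Theta @ y + (gamma / 2) * v)

# Componentwise form: keep v where the pixel is missing, weighted average otherwise.
xi_comp = np.where(theta_diag == 0,
                   v,
                   (y + (gamma / 2) * v) / (1 + gamma / 2))

assert np.allclose(xi_matrix, xi_comp)
```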

C The Examined Autoencoder Architectures

We trained two fully connected (FC) autoencoder architectures, one with 10 layers and one with 20 layers (see Figure C.1 in the Supplementary Material). We also trained a U-Net autoencoder model (see Figure C.2 in the Supplementary Material). The activation functions used for training the models were Leaky ReLU, PReLU, and Softplus. To ensure reproducibility, all experiments were run with random seed 42.
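Since the exact layer widths appear only in Figure C.1 of the Supplementary Material, the following PyTorch sketch only illustrates the general structure of such a fully connected autoencoder; the hidden width of 512 and the uniform layer layout are assumptions for illustration, not the architecture used in the paper:

```python
import torch.nn as nn

def fc_autoencoder(num_layers: int = 10, in_dim: int = 64 * 64 * 3,
                   hidden_dim: int = 512) -> nn.Sequential:
    """Generic fully connected autoencoder sketch (illustrative widths only)."""
    dims = [in_dim] + [hidden_dim] * (num_layers - 1) + [in_dim]
    layers = []
    for i in range(num_layers):
        layers.append(nn.Linear(dims[i], dims[i + 1]))
        if i < num_layers - 1:                 # no activation after the last layer
            layers.append(nn.LeakyReLU())      # or nn.PReLU() / nn.Softplus()
    return nn.Sequential(*layers)

model = fc_autoencoder(num_layers=10)  # 10 linear layers with LeakyReLU in between
```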

1.1 C.1 Perfect Fitting Regime

In the experiments that reach the perfect fitting regime, the FC models were trained on 600 images from Tiny ImageNet (at \(64 \times 64 \times 3\) pixel size) down to an MSE train loss of \(10^{-8}\), which can be considered numerically perfect fitting. During training, intermediate models at higher train loss values were saved and used later for evaluating the recovery at lower overfitting levels (see, e.g., Figure F.2). For each of the 10-layer and 20-layer FC architectures, we trained two versions, one with Leaky ReLU and one with PReLU activations.

The U-Net model was trained on 50 images from the SVHN dataset (at \(32 \times 32 \times 3\) pixel size) also to an MSE train loss of \(10^{{-8}}\), while saving intermediate models at higher train loss values. We examined U-Net architectures for three different activation functions: Leaky ReLU, PReLU, and Softplus.
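A minimal sketch of this train-to-threshold procedure with intermediate checkpointing is given below; the Adam optimizer, learning rate, full-batch updates, and checkpoint thresholds are illustrative assumptions, not the paper's exact training configuration:

```python
import torch

def train_to_threshold(model, images, target_loss=1e-8,
                       checkpoint_losses=(1e-2, 1e-4, 1e-6)):
    """Fit an autoencoder to its training images, saving intermediate models."""
    x = images.flatten(start_dim=1)      # for the FC models; U-Net keeps images as-is
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    remaining = sorted(checkpoint_losses, reverse=True)
    loss = torch.tensor(float("inf"))
    while loss.item() > target_loss:
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), x)
        loss.backward()
        optimizer.step()
        if remaining and loss.item() <= remaining[0]:    # save an intermediate model
            torch.save(model.state_dict(), f"ae_loss_{remaining.pop(0):.0e}.pt")
    torch.save(model.state_dict(), "ae_perfect_fit.pt")
```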

1.2 C.2 Moderate Overfitting Regime

In the moderate overfitting regime, we trained a 20-layer FC model on a larger subset of 25,000 images from Tiny ImageNet (at \(64 \times 64 \times 3\) pixel size) with Leaky ReLU activations to a loss of \(10^{{-4}}\). This achieves moderate overfitting, yet not perfect fitting of the training data.

The U-Net model was trained on 1000 images from the CIFAR-10 dataset (at \(32 \times 32 \times 3\) pixel size) to an MSE train loss of \(10^{{-4}}\). We examined U-Net architectures for three different activation functions: Leaky ReLU, PReLU, and Softplus.

D The Proposed Method: Additional Implementation Details

1.1 D.1 Stopping Criterion of the Proposed Method

The stopping criterion for the ADMM via Algorithm 1 (which solves equation (6)) is a predefined number of iterations. We set this to 40 iterations. Each alternating minimization iteration in our overall algorithm includes one ADMM procedure.

The stopping criterion for the entire recovery algorithm via alternating minimization, (6)-(7), is that the MSE between successive estimates \(\widehat{\textbf{x}}^{(t)}\) must be below a threshold for 3 consecutive iterations; we set this MSE threshold to \(10^{-9}\).
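A sketch of this stopping rule follows, assuming a user-supplied `recovery_step` function that performs one alternating-minimization iteration of (6)-(7), including its inner ADMM; the function and variable names are illustrative:

```python
import numpy as np

def run_alternating_minimization(recovery_step, x_init,
                                 mse_threshold=1e-9, patience=3, max_iters=500):
    """Iterate until the MSE between successive estimates stays below the
    threshold for `patience` consecutive iterations."""
    x_prev = x_init
    consecutive = 0
    for _ in range(max_iters):
        x_next = recovery_step(x_prev)   # one iteration of (6)-(7), incl. the ADMM
        mse = np.mean((x_next - x_prev) ** 2)
        consecutive = consecutive + 1 if mse < mse_threshold else 0
        if consecutive >= patience:
            break
        x_prev = x_next
    return x_next
```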

1.2 D.2 \(\gamma \) Values for the Proposed Method

The \(\gamma \) value of the ADMM (Algorithm 1) was set as follows: \(\gamma =0.5\) for the 10-layer FC autoencoder with Leaky ReLU activations; \(\gamma =0.1\) for the 10-layer FC autoencoder with PReLU activations and for the 20-layer FC autoencoder; \(\gamma =1\) for the U-Net architecture.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Abitbul, K., Dar, Y. (2024). How Much Training Data Is Memorized in Overparameterized Autoencoders? An Inverse Problem Perspective on Memorization Evaluation. In: Bifet, A., Davis, J., Krilavičius, T., Kull, M., Ntoutsi, E., Žliobaitė, I. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2024. Lecture Notes in Computer Science, vol 14942. Springer, Cham. https://doi.org/10.1007/978-3-031-70344-7_19

  • DOI: https://doi.org/10.1007/978-3-031-70344-7_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70343-0

  • Online ISBN: 978-3-031-70344-7

  • eBook Packages: Computer Science, Computer Science (R0)
