Abstract
Overparameterized autoencoder models often memorize their training data. For image data, memorization is often examined by using the trained autoencoder to recover missing regions in its training images (which were used only in their complete forms during training). In this paper, we propose an inverse problem perspective for the study of memorization. Given a degraded training image, we define the recovery of the original training image as an inverse problem and formulate it as an optimization task. In our inverse problem, the trained autoencoder implicitly defines a regularizer for the particular training dataset that we aim to retrieve from. We develop this intricate optimization task into a practical method that iteratively applies the trained autoencoder together with relatively simple computations that estimate and address the unknown degradation operator. We evaluate our method on blind inpainting, where the goal is to recover training images from degradations with many missing pixels in an unknown pattern. We examine various deep autoencoder architectures, such as fully connected and U-Net (with various nonlinearities and at diverse train loss values), and show that our method significantly outperforms previous memorization-evaluation methods that recover training data from autoencoders. Importantly, our method also greatly improves recovery performance in settings previously considered highly challenging, or even impractical, for such recovery and memorization evaluation.
Notes
1. Appendices E-H are also available at https://arxiv.org/pdf/2310.02897.
2. In our case, where the original image pixel values are in [0, 1]: \(PSNR=10\log _{10}\left( {\frac{1}{MSE}}\right) \).
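For instance, a minimal NumPy computation of this PSNR for images with pixel values in [0, 1] (the example arrays are illustrative):

```python
import numpy as np

def psnr(original, reconstructed):
    """PSNR (in dB) for images with pixel values in [0, 1]."""
    mse = np.mean((original - reconstructed) ** 2)
    return 10 * np.log10(1.0 / mse)

# Illustrative usage with random 64x64x3 images
x = np.random.rand(64, 64, 3)
x_hat = np.clip(x + 0.01 * np.random.randn(64, 64, 3), 0.0, 1.0)
print(f"PSNR: {psnr(x, x_hat):.2f} dB")
```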
References
Afonso, M.V., Bioucas-Dias, J.M., Figueiredo, M.A.T.: Fast image recovery using variable splitting and constrained optimization. IEEE Trans. Image Process. 19(9), 2345–2356 (2010)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Brifman, A., Romano, Y., Elad, M.: Turning a denoiser into a super-resolver using plug and play priors. In: 2016 IEEE International Conference on Image Processing (ICIP) (2016)
Carlini, N., Chien, S., Nasr, M., Song, S., Terzis, A., Tramèr, F.: Membership inference attacks from first principles. In: 2022 IEEE Symposium on Security and Privacy (SP), pp. 1897–1914 (2022)
Chan, S.H., Wang, X., Elgendy, O.A.: Plug-and-play ADMM for image restoration: fixed-point convergence and applications. IEEE Trans. Comput. Imag. 3(1), 84–98 (2017)
Dar, Y., Bruckstein, A.M., Elad, M., Giryes, R.: Postprocessing of compressed images via sequential denoising. IEEE Trans. Image Process. 25(7), 3044–3058 (2016)
Dar, Y., Mayer, P., Luzi, L., Baraniuk, R.G.: Subspace fitting meets regression: the effects of supervision and orthonormality constraints on double descent of generalization errors. In: International Conference on Machine Learning (ICML), pp. 2366–2375 (2020)
Hertrich, J., Neumayer, S., Steidl, G.: Convolutional proximal neural networks and plug-and-play algorithms. Linear Algebra Appl. 631, 203–234 (2021)
Hu, H., Salcic, Z., Sun, L., Dobbie, G., Yu, P.S., Zhang, X.: Membership inference attacks on machine learning: a survey. ACM Comput. Surv. 54(11s), 1–37 (2022)
Jiang, Y., Pehlevan, C.: Associative memory in iterated overparameterized sigmoid autoencoders. In: Proceedings of the 37th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 119, pp. 4828–4838. PMLR (13–18 Jul 2020)
Kamilov, U.S., Mansour, H., Wohlberg, B.: A plug-and-play priors approach for solving nonlinear imaging inverse problems. IEEE Signal Process. Lett. 24(12), 1872–1876 (2017)
Moreau, J.J.: Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. France 93, 273–299 (1965)
Nouri, A., Seyyedsalehi, S.A.: Eigen value based loss function for training attractors in iterated autoencoders. Neural Netw. 161, 575–588 (2023)
Radhakrishnan, A., Belkin, M., Uhler, C.: Overparameterized neural networks implement associative memory. Proc. Natl. Acad. Sci. 117(44), 27162–27170 (2020)
Radhakrishnan, A., Uhler, C., Belkin, M.: Downsampling leads to image memorization in convolutional autoencoders (2018)
Radhakrishnan, A., Yang, K., Belkin, M., Uhler, C.: Memorization in overparameterized autoencoders. arXiv preprint arXiv:1810.10333 (2018)
Rond, A., Giryes, R., Elad, M.: Poisson inverse problems by the plug-and-play scheme. J. Vis. Commun. Image Represent. 41, 96–108 (2016)
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60(1–4), 259–268 (1992)
Shokri, R., Stronati, M., Song, C., Shmatikov, V.: Membership inference attacks against machine learning models. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 3–18. IEEE (2017)
Sreehari, S., et al.: Plug-and-play priors for bright field electron tomography and sparse interpolation. IEEE Trans. Comput. Imaging 2(4), 408–423 (2016)
Venkatakrishnan, S.V., Bouman, C.A., Wohlberg, B.: Plug-and-play priors for model based reconstruction. In: IEEE GlobalSIP (2013)
Wang, Y., Yu, J., Zhang, J.: Zero-shot image restoration using denoising diffusion null-space model. In: International Conference on Learning Representations (ICLR) (2023)
Acknowledgements
This work was supported by the Lynn and William Frankel Center for Computer Science at Ben-Gurion University, and by the Israeli Council for Higher Education (CHE) via the Data Science Research Center, Ben-Gurion University of the Negev.
Appendices
A Proof of Theorem 1
In this Appendix, we prove Theorem 1.
Lemma A.1
Given a 2-layer tied autoencoder, f, which can be formulated as
$$ f(\textbf{x}) = \textbf{W}^{T} \rho \left( \textbf{W} \textbf{x} \right) \qquad (20) $$
for \(\textbf{x}\in \mathbb {R}^d\), where \(\textbf{W} \in \mathbb {R}^{m \times d}\), \(\rho : \mathbb {R}^{m}\rightarrow \mathbb {R}^{m}\). Specifically, the activation function has a (separable) componentwise form
$$ \rho (\textbf{z}) = \left[ \bar{\rho }(z_1), \dots , \bar{\rho }(z_m) \right] ^{T} \qquad (21) $$
for \(\textbf{z}\in \mathbb {R}^{m}\), whose \(j^\textrm{th}\) component is denoted as \(z_j\), and for a scalar activation function \(\bar{\rho }:\mathbb {R}\rightarrow \mathbb {R}\). We denote \(\textbf{z}\triangleq \textbf{W} \textbf{x}\).
Then, the Jacobian matrix of f is
$$ \frac{d f(\textbf{x})}{d \textbf{x}} = \textbf{W}^{T} \, \text {diag}\left( \bar{\rho }'(z_1), \dots , \bar{\rho }'(z_m) \right) \textbf{W} , \qquad (22) $$
where \(\text {diag}(\cdot )\) represents a diagonal matrix with the given components along the main diagonal, and \(\bar{\rho }'\) denotes the derivative of \(\bar{\rho }\).
Proof
Let us define auxiliary variables. In addition to \(\textbf{z}\triangleq \textbf{W} \textbf{x}\), we also define \(\textbf{a}\triangleq \rho (\textbf{z})\) and \(\boldsymbol{\xi }\triangleq f(\textbf{x}) = \textbf{W}^{T} \textbf{a}\).
Then, by the chain rule, we get
$$ \frac{d f(\textbf{x})}{d \textbf{x}} = \frac{d \boldsymbol{\xi }}{d \textbf{a}} \cdot \frac{d \textbf{a}}{d \textbf{z}} \cdot \frac{d \textbf{z}}{d \textbf{x}} = \textbf{W}^{T} \, \frac{d \textbf{a}}{d \textbf{z}} \, \textbf{W} . \qquad (23) $$
Next, by definition, \(\frac{d\textbf{a}}{d\textbf{z}} = \frac{d\rho (\textbf{z})}{d\textbf{z}}\), and since \(\rho (\textbf{z})\) is a vector of componentwise activation functions (see (21)), this Jacobian is an \(m\times m\) diagonal matrix of the form
$$ \frac{d \textbf{a}}{d \textbf{z}} = \text {diag}\left( \bar{\rho }'(z_1), \dots , \bar{\rho }'(z_m) \right) . \qquad (24) $$
Substituting (24) back into (23) gives the Jacobian formula of (22).
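As a quick numerical illustration of (22), the closed-form Jacobian can be compared against automatic differentiation for a small tied autoencoder; the dimensions and the sigmoid activation in the sketch below are arbitrary choices, not taken from the experiments.

```python
import torch

torch.manual_seed(0)
d, m = 5, 8                              # toy dimensions (arbitrary)
W = torch.randn(m, d)

def f(x):
    # 2-layer tied autoencoder f(x) = W^T rho(W x), here with a sigmoid activation
    return W.T @ torch.sigmoid(W @ x)

x = torch.randn(d)
J_autograd = torch.autograd.functional.jacobian(f, x)

# Closed form of (22): W^T diag(rho'(z_1), ..., rho'(z_m)) W with z = W x
z = W @ x
rho_prime = torch.sigmoid(z) * (1 - torch.sigmoid(z))    # sigmoid derivative, in [0, 1]
J_closed = W.T @ torch.diag(rho_prime) @ W

print(torch.allclose(J_autograd, J_closed, atol=1e-6))   # True
```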
Corollary A.1
Let \(\bar{\rho }:\mathbb {R}\rightarrow \mathbb {R}\) be a scalar activation function that is differentiable and whose derivative is in [0, 1], namely, \(\frac{d \bar{\rho }(z)}{dz} \in [0,1]\) for any \(z\in \mathbb {R}\). Then, for such an activation function, a 2-layer tied autoencoder has a Jacobian of the form \(\textbf{W}^{T} \textbf{D} \textbf{W}\), where \(\textbf{D}\) is a diagonal matrix whose values are in [0, 1].
Lemma A.2
Let \(\textbf{W}\in \mathbb {R}^{c_2\times c_1}\) and let \(\textbf{D}\) be a \(c_2\times c_2\) diagonal matrix with values in [0, 1]. Then, \(\textbf{W}^T \textbf{D} \textbf{W}\) is a symmetric positive semi-definite matrix.
Proof
First, we show that the matrix is symmetric:
$$ \left( \textbf{W}^T \textbf{D} \textbf{W} \right) ^T = \textbf{W}^T \textbf{D}^T \textbf{W} = \textbf{W}^T \textbf{D} \textbf{W} , $$
where \(\textbf{D}^T=\textbf{D}\) due to the symmetry of a diagonal matrix.
Now, we prove that the matrix \(\textbf{W}^T \textbf{D} \textbf{W}\) is positive semi-definite. Namely, we need to show that for any \(\textbf{r} \in \mathbb {R}^{c_1}\), \(\textbf{r}^T\textbf{W}^T\textbf{D}\textbf{W}\textbf{r} \ge 0\). Define \(\widetilde{\textbf{r}} \triangleq \textbf{W} \textbf{r} \in \mathbb {R}^{c_2}\); then we need to show that \(\widetilde{\textbf{r}}^T\textbf{D}\widetilde{\textbf{r}} \ge 0\). Indeed, denoting \(\widetilde{r}_i\) as the \(i^\textrm{th}\) component of \(\widetilde{\textbf{r}}\) and \(\textbf{D}_{i,i}\) as the \(i^\textrm{th}\) main diagonal component of \(\textbf{D}\), we get \(\widetilde{\textbf{r}}^T\textbf{D}\widetilde{\textbf{r}}=\sum _{i=1}^{c_2} \textbf{D}_{i,i}\widetilde{r}_i^2 \ge 0\) because \(\textbf{D}_{i,i}\in [0,1]\) for any i.
Now we proceed to prove Theorem 1, i.e., that a tied autoencoder f from the class described in the theorem is a Moreau proximity operator.
Proof
We prove that \(f(\textbf{x})\) is a Moreau proximity operator by showing that the Jacobian matrix of \(f(\textbf{x})\) w.r.t. any \(\textbf{x} \in \mathbb {R}^d\) satisfies two properties: (i) the Jacobian is a symmetric matrix, and (ii) all the eigenvalues of the Jacobian are real and in the range \([0,1]\). Note that previous works on plug-and-play priors used conditions (i)-(ii) to prove that special types of denoisers are Moreau proximity operators, for example, see [21].
Consider a 2-layer tied autoencoder, f, which can be formulated as
$$ f(\textbf{x}) = \textbf{W}^{T} \rho \left( \textbf{W} \textbf{x} \right) , \qquad (25) $$
where \(\textbf{W} \in \mathbb {R}^{m \times d}\) has all its singular values in [0, 1], and \(\rho : \mathbb {R}^{m}\rightarrow \mathbb {R}^{m}\) is a componentwise activation function as in (21) that is based on a differentiable scalar activation function \(\bar{\rho }:\mathbb {R}\rightarrow \mathbb {R}\) whose derivative is in [0, 1].
We now show that f satisfies properties (i) and (ii).
From Corollary A.1 and Lemma A.2, we get that f has a Jacobian matrix \(\textbf{W}^{T} \textbf{D} \textbf{W}\), which is symmetric and positive semi-definite.
We will now prove that the eigenvalues of \(\textbf{W}^{T} \textbf{D} \textbf{W}\) are all in \([0,1]\). First, notice that the singular values of \(\textbf{D}\) coincide with its eigenvalues, which are its main diagonal elements and therefore in \([0,1]\). In addition, the singular values of \(\textbf{W}^{T}\) are the same as the singular values of \(\textbf{W}\), which are in [0, 1] by the assumption of Theorem 1. Hence, the singular values of each of the matrices in the product \(\textbf{W}^{T} \textbf{D} \textbf{W}\) are in [0, 1]. It is also well known that for every two matrices, \(\textbf{A}\in \mathbb {R}^{q_1\times q_2}\), \(\textbf{B}\in \mathbb {R}^{q_2\times q_3}\),
$$ \sigma _i\left( \textbf{A}\textbf{B} \right) \le \sigma _1(\textbf{A}) \, \sigma _i(\textbf{B}), $$
where \(\sigma _i\) denotes the \(i^\textrm{th}\) largest singular value of a corresponding matrix. Hence, for \(\textbf{C}\in \mathbb {R}^{q_3\times q_4}\),
$$ \sigma _i\left( \textbf{A}\textbf{B}\textbf{C} \right) \le \sigma _1(\textbf{A}) \, \sigma _1(\textbf{B}) \, \sigma _i(\textbf{C}). $$
In our case, \(\sigma _1(\textbf{W}^{T})\le 1\), \(\sigma _1(\textbf{D})\le 1\), \(\sigma _i(\textbf{W})\le 1\), and therefore
$$ \sigma _i\left( \textbf{W}^{T} \textbf{D} \textbf{W} \right) \le \sigma _1(\textbf{W}^{T}) \, \sigma _1(\textbf{D}) \, \sigma _i(\textbf{W}) \le 1. \qquad (26) $$
Consequently, all the singular values of the Jacobian matrix are in [0, 1]. Moreover, for real symmetric matrices, the absolute values of the eigenvalues equal the singular values. Since the Jacobian of our 2-layer tied autoencoder is real and symmetric, (26) implies that its eigenvalues are in \([-1,1]\). Furthermore, by Lemma A.2, the Jacobian is symmetric positive semi-definite and therefore its eigenvalues are non-negative; accordingly, all the eigenvalues of the Jacobian are in \([0,1]\).
To sum up, we showed that a 2-layer tied autoencoder that satisfies the conditions of Theorem 1 has a symmetric positive semi-definite Jacobian with eigenvalues in [0, 1]; therefore, such a 2-layer autoencoder is a Moreau proximity operator.
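The conclusion can also be illustrated numerically: the sketch below (with arbitrary dimensions and a sigmoid activation) rescales a random \(\textbf{W}\) so that its singular values are in [0, 1], and checks that the Jacobian \(\textbf{W}^{T}\textbf{D}\textbf{W}\) from Lemma A.1 is symmetric with eigenvalues in [0, 1].

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 6, 10                                    # toy dimensions (arbitrary)
W = rng.standard_normal((m, d))
W /= np.linalg.svd(W, compute_uv=False).max()   # enforce singular values in [0, 1]

x = rng.standard_normal(d)
z = W @ x
sig = 1.0 / (1.0 + np.exp(-z))
rho_prime = sig * (1.0 - sig)                   # sigmoid derivative, values in [0, 1]
J = W.T @ np.diag(rho_prime) @ W                # Jacobian from Lemma A.1

eigvals = np.linalg.eigvalsh(J)                 # real eigenvalues of the symmetric J
print(np.allclose(J, J.T))                      # symmetric: True
print(eigvals.min(), eigvals.max())             # within [0, 1] up to numerical precision
```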
B Proof of Equation (17)
Recall the notations in (8). The optimization problem (17), i.e.,
has a closed form solution
Then, the diagonal structure of \(\mathbf {\Theta }\) with zeros and ones on its main diagonal implies that \(\mathbf {\Theta }^T \mathbf {\Theta } = \mathbf {\Theta }\) and, therefore,
that can be further simplified to the componentwise form of
where \(\widehat{\xi }_i^{(k)}\), \(y_i\), \(\widetilde{v}^{(k)}_i\) are the \(i^\textrm{th}\) components of the vectors \(\widehat{\boldsymbol{\xi }}^{(k)}\), \(\textbf{y}\), \(\widetilde{\textbf{v}}^{(k)}\), respectively.
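For illustration, a minimal sketch of a componentwise update of this kind is given below, assuming the common plug-and-play ADMM form in which a masked least-squares data-fidelity term is balanced against a quadratic proximity term with weight \(\gamma \); the function and variable names, and the exact weighting relative to (17), are assumptions of the sketch.

```python
import numpy as np

def masked_ls_update(y, v_tilde, theta_diag, gamma):
    """Componentwise masked least-squares + quadratic-proximity update (sketch).

    theta_diag is the main diagonal of the 0/1 operator Theta (1 for observed
    pixels, 0 for missing pixels); the gamma-weighting is an assumption here.
    """
    # Observed pixels: weighted average of the measurement and the ADMM variable.
    # Missing pixels (theta_diag == 0): the update reduces to v_tilde.
    return (theta_diag * y + gamma * v_tilde) / (theta_diag + gamma)

# Illustrative usage with random vectors
rng = np.random.default_rng(0)
theta_diag = (rng.random(12) > 0.5).astype(float)    # estimated 0/1 mask
y = theta_diag * rng.random(12)                      # degraded observation
v_tilde = rng.random(12)                             # current ADMM estimate
xi_hat = masked_ls_update(y, v_tilde, theta_diag, gamma=0.5)
```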
C The Examined Autoencoder Architectures
We trained two fully connected (FC) autoencoder architectures, one with 10 layers and one with 20 layers (see Figure C.1 in the Supplementary Material). We also trained a U-Net autoencoder model (see Figure C.2 in the Supplementary Material). The activation functions used for training the models were Leaky ReLU, PReLU, and Softplus. To ensure reproducibility, all experiments were run with random seed 42.
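As a rough sketch of such an FC architecture (the layer widths, the uniform hidden dimension, and the placement of the activations are illustrative assumptions; the exact configurations appear in Figure C.1 of the Supplementary Material):

```python
import torch.nn as nn

def fc_autoencoder(num_layers=10, in_dim=64 * 64 * 3, hidden_dim=2048,
                   activation=nn.LeakyReLU):
    """Fully connected autoencoder sketch; widths and depth pattern are illustrative."""
    dims = [in_dim] + [hidden_dim] * (num_layers - 1) + [in_dim]
    layers = []
    for i in range(num_layers):
        layers.append(nn.Linear(dims[i], dims[i + 1]))
        if i < num_layers - 1:           # assumed: no activation after the output layer
            layers.append(activation())
    return nn.Sequential(*layers)

# 10-layer and 20-layer variants with Leaky ReLU / PReLU activations
model_10 = fc_autoencoder(num_layers=10, activation=nn.LeakyReLU)
model_20 = fc_autoencoder(num_layers=20, activation=nn.PReLU)
```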
C.1 Perfect Fitting Regime
In the experiments that reach the perfect fitting regime, the FC models were trained on 600 images from Tiny ImageNet (at \(64 \times 64 \times 3\) pixel size) down to an MSE train loss of \(10^{-8}\), which can be considered numerically perfect fitting. During training, intermediate models at higher train loss values were saved and used later for evaluating the recovery at lower overfitting levels (see, e.g., Figure F.2). For each of the 10-layer and 20-layer fully connected architectures, we trained two versions, one with Leaky ReLU and one with PReLU activations.
The U-Net model was trained on 50 images from the SVHN dataset (at \(32 \times 32 \times 3\) pixel size), also to an MSE train loss of \(10^{-8}\), while saving intermediate models at higher train loss values. We examined the U-Net architecture with three different activation functions: Leaky ReLU, PReLU, and Softplus.
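A minimal sketch of such a training loop, saving intermediate checkpoints as the train loss crosses decreasing thresholds (the optimizer, learning rate, and threshold grid are illustrative assumptions):

```python
import torch

def train_to_perfect_fit(model, images, target_loss=1e-8, lr=1e-4,
                         checkpoint_losses=(1e-2, 1e-4, 1e-6)):
    """Train an autoencoder to numerically perfect fitting; hyperparameters are illustrative."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    remaining = sorted(checkpoint_losses, reverse=True)     # thresholds, largest first
    checkpoints = {}
    loss_value = float("inf")
    while loss_value > target_loss:
        optimizer.zero_grad()
        loss = torch.mean((model(images) - images) ** 2)    # MSE reconstruction loss
        loss.backward()
        optimizer.step()
        loss_value = loss.item()
        # Save intermediate models at higher train-loss values for later evaluation
        if remaining and loss_value <= remaining[0]:
            key = remaining.pop(0)
            checkpoints[key] = {k: v.detach().clone() for k, v in model.state_dict().items()}
    return checkpoints
```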
C.2 Moderate Overfitting Regime
In the moderate overfitting regime, we trained a 20-layer FC model on a larger subset of 25,000 images from Tiny ImageNet (at \(64 \times 64 \times 3\) pixel size) with Leaky ReLU activations to an MSE train loss of \(10^{-4}\). This achieves moderate overfitting, yet not perfect fitting of the training data.
The U-Net model was trained on 1000 images from the CIFAR-10 dataset (at \(32 \times 32 \times 3\) pixel size) to an MSE train loss of \(10^{-4}\). We examined the U-Net architecture with three different activation functions: Leaky ReLU, PReLU, and Softplus.
D The Proposed Method: Additional Implementation Details
D.1 Stopping Criterion of the Proposed Method
The stopping criterion for the ADMM in Algorithm 1 (which solves equation (6)) is a predefined number of iterations, which we set to 40. Each alternating minimization iteration of our overall algorithm includes one ADMM procedure.
The stopping criterion for the entire recovery algorithm via alternating minimization, (6)-(7), is that the MSE between successive estimates \(\widehat{\textbf{x}}^{(t)}\) must be below a threshold for 3 consecutive iterations; we set this MSE threshold to \(10^{-9}\).
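A minimal sketch of this stopping rule for the outer alternating minimization (function and variable names are illustrative):

```python
import numpy as np

MSE_THRESHOLD = 1e-9   # threshold on the MSE between successive estimates
PATIENCE = 3           # required number of consecutive iterations below the threshold

def should_stop(estimates):
    """Stop when the MSE between successive estimates x^(t) stays below
    MSE_THRESHOLD for PATIENCE consecutive iterations."""
    if len(estimates) < PATIENCE + 1:
        return False
    recent = [np.mean((estimates[t] - estimates[t - 1]) ** 2)
              for t in range(len(estimates) - PATIENCE, len(estimates))]
    return all(m < MSE_THRESHOLD for m in recent)
```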
D.2 \(\gamma \) Values for the Proposed Method
The \(\gamma \) value of the ADMM (Algorithm 1) was set as follows: \(\gamma =0.5\) for the 10-layer FC autoencoder with Leaky ReLU activations; \(\gamma =0.1\) for the 10-layer FC autoencoder with PReLU activations and for the 20-layer FC autoencoder; and \(\gamma =1\) for the U-Net architecture.
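Collected as a small configuration mapping (the keys are illustrative labels):

```python
# ADMM gamma values per trained architecture (Appendix D.2); keys are illustrative labels
ADMM_GAMMA = {
    "fc10_leaky_relu": 0.5,   # 10-layer FC autoencoder, Leaky ReLU activations
    "fc10_prelu": 0.1,        # 10-layer FC autoencoder, PReLU activations
    "fc20": 0.1,              # 20-layer FC autoencoder
    "unet": 1.0,              # U-Net architecture
}
```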