Abstract
Quantum machine learning offers a transformative approach to solving complex problems, but inherent noise hinders its practical implementation on near-term quantum devices and complicates the analysis of the generalization capabilities of quantum circuit models. Designing robust quantum machine learning models under noise requires a principled understanding of complexity and generalization that extends beyond classical capacity measures. This study investigates the generalization properties of parameterized quantum machine learning models under the influence of noise. We present a data-dependent generalization bound grounded in the quantum Fisher information matrix. Using statistical learning theory, we relate parameter space volumes and training set sizes to the generalization capability of the trained model. By integrating local parameter neighborhoods and effective dimensions defined through the eigenvalues of the quantum Fisher information matrix, we provide a structured characterization of complexity in quantum models. We analyze the tightness of the bound and discuss the trade-off between model expressiveness and generalization performance.


Acknowledgements
This research was conducted while P.R. and B.K. were funded by the National Science Foundation under NSF CISE-CNS Awards 2136961 and 2210091.
Appendices
Appendix A
A.1 Parameter space geometry
In the main text, we introduced the Fisher information matrix (FIM) \({\mathcal {F}}(\theta )\) as a Riemannian metric on the parameter space \(\Theta \subset {\mathbb {R}}^d\). This induces a natural geometric structure on \(\Theta\): the geodesic distance between \(\theta\) and \(\theta ^{\prime }\) measures how different these parameter points are in terms of their influence on the model’s predictions, rather than just their Euclidean distance.
For a parameterized model and its associated FIM, we consider the volume element induced by \({\mathcal {F}}(\theta )\):
$$dV_{{\mathcal {F}}} = \sqrt{\det ({\mathcal {F}}(\theta ))}\, d^d\theta .$$
A small ball \(B(\theta ,\epsilon ) = \{\theta ^{\prime } \in \Theta : \Vert \theta - \theta ^{\prime }\Vert \le \epsilon \}\) of radius \(\epsilon\) centered at \(\theta\), measured with this volume element, has volume
$$\mathrm {Vol}\bigl (B(\theta ,\epsilon )\bigr ) = \int _{B(\theta ,\epsilon )} \sqrt{\det ({\mathcal {F}}(\theta ^{\prime }))}\, d^d\theta ^{\prime }.$$
If \(\sqrt{\det ({\mathcal {F}}(\theta ))}\) is bounded below by a positive constant \(m>0\), then for small \(\epsilon\) we can approximate
$$\mathrm {Vol}\bigl (B(\theta ,\epsilon )\bigr ) \approx \sqrt{\det ({\mathcal {F}}(\theta ))}\; V_d\, \epsilon ^d \;\ge \; m\, V_d\, \epsilon ^d ,$$
where
$$V_d = \frac{\pi ^{d/2}}{\Gamma \left( \frac{d}{2} + 1\right) }$$
is the volume of the unit ball in \({\mathbb {R}}^d\). Intuitively, a lower bound on \(\sqrt{\det ({\mathcal {F}}(\theta ))}\) ensures that these balls are not “too small,” implying fewer truly distinct parameter configurations at resolution \(\epsilon\). This geometric insight underpins the covering number bound we discuss next.
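As a concrete numerical illustration (a minimal Python sketch; the matrix F, the radius eps, and the helper names are illustrative placeholders rather than quantities from our experiments), one can evaluate \(\sqrt{\det ({\mathcal {F}}(\theta ))}\) and the small-ball approximation \(m\, V_d\, \epsilon ^d\) as follows.

import numpy as np
from scipy.special import gamma

def unit_ball_volume(d):
    # Volume V_d of the Euclidean unit ball in R^d.
    return np.pi ** (d / 2) / gamma(d / 2 + 1)

def fisher_ball_volume(fisher, eps):
    # Small-radius approximation Vol(B(theta, eps)) ~ sqrt(det F(theta)) * V_d * eps^d.
    d = fisher.shape[0]
    return np.sqrt(np.linalg.det(fisher)) * unit_ball_volume(d) * eps ** d

# Toy 2x2 FIM (any symmetric positive-definite matrix serves the purpose).
F = np.array([[1.5, 0.2],
              [0.2, 0.8]])
eps = 0.1

print("sqrt(det F):", np.sqrt(np.linalg.det(F)))        # plays the role of the lower bound m
print("Vol(B(theta, 0.1)) ~", fisher_ball_volume(F, eps))

For this toy matrix, \(\sqrt{\det F} \approx 1.08\), so a ball of radius 0.1 carries a Fisher volume of roughly \(1.08 \cdot \pi \cdot 0.01 \approx 0.034\).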
A.2 Covering numbers of a parameter space
The covering number of a parameter space \(\Theta\) measures how many balls of radius \(\epsilon\) are required to cover the entire space. Since the FIM provides a natural measure of distinguishability, a lower bound on \(\sqrt{\det ({\mathcal {F}}(\theta ))}\) relates volumes of small balls to parameter distinguishability.
Lemma A.1
(Covering Number and Volume) Let \(\Theta \subset {\mathbb {R}}^d\) be a compact parameter space with volume \(V_\Theta\). Assume that the determinant of the Fisher information matrix \({\mathcal {F}}(\theta )\) satisfies
$$\sqrt{\det ({\mathcal {F}}(\theta ))} \ge m > 0 \quad \text {for all } \theta \in \Theta .$$
Then, for any \(\epsilon > 0\), the covering number \(N(\epsilon ,\Theta ,\Vert \cdot \Vert )\) of \(\Theta\) with respect to the Euclidean norm \(\Vert \cdot \Vert\) satisfies
$$\log N(\epsilon ,\Theta ,\Vert \cdot \Vert ) \le C + d \log \frac{1}{\epsilon },$$
where \(C = \log V_\Theta - \log V_d - \log m\) and \(V_d = \frac{\pi ^{\frac{d}{2}}}{\Gamma \left( \frac{d}{2} + 1\right) }\) is the volume of a unit ball in \({\mathbb {R}}^d\).
Proof
Consider an \(\epsilon\)-ball around \(\theta\),
$$B(\theta ,\epsilon ) = \{\theta ^{\prime } \in \Theta : \Vert \theta - \theta ^{\prime }\Vert \le \epsilon \}.$$
Since \(\sqrt{\det ({\mathcal {F}}(\theta ))} \ge m\), each such ball carries a Fisher volume of at least \(m\, V_d\, \epsilon ^d\). A minimal cover of \(\Theta\) consists of essentially disjoint balls, whose total Fisher volume cannot exceed \(V_\Theta\). So we have
$$N(\epsilon ,\Theta ,\Vert \cdot \Vert )\; m\, V_d\, \epsilon ^d \le V_\Theta , \qquad \text {i.e.,} \qquad N(\epsilon ,\Theta ,\Vert \cdot \Vert ) \le \frac{V_\Theta }{m\, V_d\, \epsilon ^d}.$$
Taking the logarithm,
$$\log N(\epsilon ,\Theta ,\Vert \cdot \Vert ) \le \log V_\Theta - \log V_d - \log m + d \log \frac{1}{\epsilon } = C + d \log \frac{1}{\epsilon }.$$
This completes the proof of Lemma A.1. \(\square\)
The covering number inequality of Lemma A.1 shows how covering numbers scale with the dimension d and the resolution \(\epsilon\). Smaller \(\epsilon\) or larger d requires more balls, and thus more complex model classes. Conversely, a larger m reduces complexity by ensuring a certain “geometric rigidity” in the parameter space.
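The following short sketch (illustrative values only; \(V_\Theta\), m, d, and \(\epsilon\) are made-up placeholders, not quantities from this work) evaluates the right-hand side of Lemma A.1 and shows how the bound grows as d increases or \(\epsilon\) shrinks.

import numpy as np
from scipy.special import gamma

def log_covering_number_bound(V_Theta, m, d, eps):
    # log N(eps, Theta) <= C + d*log(1/eps), with C = log V_Theta - log V_d - log m.
    V_d = np.pi ** (d / 2) / gamma(d / 2 + 1)
    C = np.log(V_Theta) - np.log(V_d) - np.log(m)
    return C + d * np.log(1.0 / eps)

d = 8                       # number of trainable parameters
V_Theta = (2 * np.pi) ** d  # e.g. each angle ranges over [0, 2*pi)
m = 1e-3                    # assumed lower bound on sqrt(det F(theta))

for eps in (0.5, 0.1, 0.01):
    print(f"eps={eps:5.2f}  log N <= {log_covering_number_bound(V_Theta, m, d, eps):8.2f}")

For fixed \(\epsilon < 1\), the bound grows essentially linearly in d, which is the dependence that later enters Lemma B.1 through \(C^{\prime }\).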
Appendix B
B.1 Bounding empirical Rademacher complexity
We now relate covering numbers to the empirical Rademacher complexity \(\hat{{\mathcal {R}}}_N({\mathcal {F}}_\Theta )\) of the model class \({\mathcal {F}}_\Theta = \{f_{\theta ,p}:\theta \in \Theta \}\). The Rademacher complexity quantifies the model’s capacity to fit random noise and thus provides an upper bound on generalization error.
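The definition of \(\hat{{\mathcal {R}}}_N\) used below can also be made concrete numerically. The following sketch (a toy one-parameter class \(f_\theta (x) = \sin (\theta x)\) on synthetic inputs, not a model studied in this work) estimates the empirical Rademacher complexity by Monte-Carlo sampling of the Rademacher variables and approximates the supremum over \(\Theta\) with a grid search.

import numpy as np

rng = np.random.default_rng(0)

N = 50                                    # number of samples
x = rng.uniform(-1.0, 1.0, size=N)        # fixed sample D (labels are irrelevant here)
thetas = np.linspace(-np.pi, np.pi, 401)  # grid standing in for the parameter space Theta

# Model outputs for every grid point: shape (len(thetas), N).
outputs = np.sin(np.outer(thetas, x))

num_draws = 2000
vals = np.empty(num_draws)
for k in range(num_draws):
    sigma = rng.choice([-1.0, 1.0], size=N)   # Rademacher variables
    vals[k] = np.max(outputs @ sigma) / N     # sup_theta (1/N) sum_i sigma_i f_theta(x_i)

print("estimated empirical Rademacher complexity:", vals.mean())

Increasing N shrinks the estimate, mirroring the \(1/\sqrt{N}\) behavior captured by Lemma B.1.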
Lemma B.1
(Empirical Rademacher Complexity Bound) Let \({\mathcal {F}}_\Theta = \{f_{\theta ,p}: \theta \in \Theta \}\) be a model function class and let \({\mathcal {D}} = \{x_i,y_i\}_{i=1}^N\) be a dataset of N samples drawn from the distribution P. Assume that \(f_{\theta ,p}\) is Lipschitz in \(\theta\) with constant \(L_f^p\) and that the quantum Fisher information matrix satisfies \(\sqrt{\det ({\mathcal {F}}(\theta ))} \ge m\). Then the empirical Rademacher complexity of the model class \({\mathcal {F}}_\Theta\) is bounded by
$$\hat{{\mathcal {R}}}_N({\mathcal {F}}_\Theta ) \le c\, \sqrt{\frac{d}{N}}\; e^{C^{\prime }/d}$$
for a universal constant \(c>0\) arising from Dudley’s entropy integral, where \(C^\prime = \log V_\Theta - \log V_d - \log m + d \log L_f^p\).
Proof
Given a dataset \({\mathcal {D}} = \{x_i,y_i\}_{i=1}^N\), the empirical Rademacher complexity of a model class \({\mathcal {F}}_\Theta\) is defined as
$$\hat{{\mathcal {R}}}_N({\mathcal {F}}_\Theta ) = {\mathbb {E}}_\sigma \left[ \sup _{\theta \in \Theta } \frac{1}{N} \sum _{i=1}^N \sigma _i f_{\theta ,p}(x_i) \right] ,$$
where \(\sigma = \{\sigma _i\}_{i=1}^N\) are i.i.d. Rademacher random variables taking values in \(\{-1,+1\}\) with equal probabilities. Dudley’s entropy integral provides an upper bound on the empirical Rademacher complexity in terms of covering numbers. Letting \(\Vert \cdot \Vert _{2,{\mathcal {D}}}\) denote the empirical \(L_2\)-norm over the sample \({\mathcal {D}}\), we have
$$\hat{{\mathcal {R}}}_N({\mathcal {F}}_\Theta ) \le \frac{c_0}{\sqrt{N}} \int _0^{\epsilon _{max}} \sqrt{\log N(\epsilon , {\mathcal {F}}_\Theta , \Vert \cdot \Vert _{2,{\mathcal {D}}})}\; d\epsilon$$
for a universal constant \(c_0 > 0\), where \(N(\epsilon , {\mathcal {F}}_\Theta , \Vert \cdot \Vert _{2,{\mathcal {D}}})\) is the covering number of \({\mathcal {F}}_\Theta\) with respect to the empirical \(L_2\) metric \(\Vert f - g \Vert _{2,{\mathcal {D}}} = \left( \frac{1}{N} \sum \limits _{i=1}^N (f(x_i) - g(x_i))^2 \right) ^{\frac{1}{2}}\) and \(\epsilon _{max} = \sup \limits _{f \in {\mathcal {F}}_\Theta } \Vert f\Vert _{2,{\mathcal {D}}}\). Since \(f_{\theta ,p}\) is Lipschitz in \(\theta\) with constant \(L_f^p\), we can write \(\Vert f_{\theta ,p} - f_{\theta ^{\prime },p}\Vert _{2,{\mathcal {D}}} \le L_f^p \Vert \theta - \theta ^{\prime }\Vert\) for all \(\theta , \theta ^{\prime } \in \Theta\). This implies that an \(\epsilon\)-cover of \(\Theta\) with respect to the Euclidean norm \(\Vert \cdot \Vert\) induces an \(\epsilon L_f^p\)-cover of \({\mathcal {F}}_\Theta\) with respect to the empirical \(L_2\) metric. Using Lemma A.1, we can write
$$\log N(\epsilon , {\mathcal {F}}_\Theta , \Vert \cdot \Vert _{2,{\mathcal {D}}}) \le \log N\!\left( \frac{\epsilon }{L_f^p}, \Theta , \Vert \cdot \Vert \right) \le C + d \log \frac{L_f^p}{\epsilon } = C^{\prime } + d \log \frac{1}{\epsilon }.$$
Substituting this result into Dudley’s bound gives
$$\hat{{\mathcal {R}}}_N({\mathcal {F}}_\Theta ) \le \frac{c_0}{\sqrt{N}} \int _0^{\epsilon _{max}} \sqrt{C^{\prime } + d \log \frac{1}{\epsilon }}\; d\epsilon .$$
The remaining steps are elementary calculus; readers who trust the computation may skip directly to the final bound.
Let \(t = - \log \epsilon\), so \(\epsilon = e^{-t}\) and \(d\epsilon = -e^{-t}\, dt\). The limits change as follows: when \(\epsilon = \epsilon _{max}\), \(t = -\log \epsilon _{max}\), and as \(\epsilon\) decreases from \(\epsilon _{max}\) to 0, t increases from \(-\log \epsilon _{max}\) to \(\infty\). Thus
$$\int _0^{\epsilon _{max}} \sqrt{C^{\prime } + d \log \frac{1}{\epsilon }}\; d\epsilon = \int _{-\log \epsilon _{max}}^{\infty } \sqrt{C^{\prime } + dt}\; e^{-t}\, dt .$$
Assuming \(\epsilon _{max} = 1\) (since \(f_{\theta ,p}(x)\) is bounded in [0,1]), we have \(t=0\) at \(\epsilon = 1\), so
$$\hat{{\mathcal {R}}}_N({\mathcal {F}}_\Theta ) \le \frac{c_0}{\sqrt{N}} \int _0^{\infty } \sqrt{C^{\prime } + dt}\; e^{-t}\, dt .$$
To simplify the derivation, define
$$I := \int _0^{\infty } \sqrt{C^{\prime } + dt}\; e^{-t}\, dt .$$
Let \(S = C^{\prime } + dt\); then \(t = \frac{S-C^{\prime }}{d}\), \(dt = \frac{dS}{d}\), and \(e^{-t} = e^{-\left( \frac{S-C^{\prime }}{d}\right) } = e^{\frac{C^{\prime }}{d}} e^{\frac{-S}{d}}\). Since \(C^{\prime }\) and d are constants, we can write
$$I = \frac{e^{C^{\prime }/d}}{d} \int _{C^{\prime }}^{\infty } \sqrt{S}\; e^{-S/d}\, dS .$$
Let \(u = \frac{S}{d} \implies S = du \text { and } dS = d\, du\). Thus,
$$I = e^{C^{\prime }/d}\, \sqrt{d} \int _{C^{\prime }/d}^{\infty } u^{1/2}\, e^{-u}\, du .$$
This integral is an upper incomplete gamma function, so
$$I = e^{\frac{C^{\prime }}{d}}\sqrt{d}\; \Gamma \left( \frac{3}{2}, \frac{C^{\prime }}{d}\right) .$$
Since \(\Gamma (s,a) \le \Gamma (s)\) for \(a > 0\), and \(\Gamma \left( \frac{3}{2}\right) = \frac{\sqrt{\pi }}{2}\), we can write
$$I \le e^{C^{\prime }/d}\, \sqrt{d}\; \frac{\sqrt{\pi }}{2} .$$
Substituting this back into Dudley’s bound, we get
$$\hat{{\mathcal {R}}}_N({\mathcal {F}}_\Theta ) \le \frac{c_0 \sqrt{\pi }}{2} \sqrt{\frac{d}{N}}\; e^{C^{\prime }/d} = c\, \sqrt{\frac{d}{N}}\; e^{C^{\prime }/d}, \qquad c := \frac{c_0 \sqrt{\pi }}{2}.$$
This completes the proof of Lemma B.1. \(\square\)
This lemma encapsulates how geometry (via quantum FIM) and parameter space volume control Rademacher complexity. As N grows, the complexity term diminishes, indicating improved generalization potential.
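The elementary calculus carried out in the proof can be checked numerically. The sketch below (arbitrary illustrative values for \(C^{\prime }\) and d, not quantities computed in this work) compares the entropy integral \(\int _0^1 \sqrt{C^{\prime } + d\log (1/\epsilon )}\, d\epsilon\) with the closed form \(e^{C^{\prime }/d}\sqrt{d}\,\Gamma (3/2, C^{\prime }/d)\) and with the cruder bound obtained from \(\Gamma (3/2) = \sqrt{\pi }/2\); note that scipy's gammaincc is the regularized upper incomplete gamma, hence the extra factor of \(\Gamma (3/2)\).

import numpy as np
from scipy.integrate import quad
from scipy.special import gamma, gammaincc

d = 6          # parameter-space dimension
C_prime = 2.0  # stands in for C' = log V_Theta - log V_d - log m + d*log L_f^p

# Left-hand side: the entropy integral in epsilon.
lhs, _ = quad(lambda eps: np.sqrt(C_prime + d * np.log(1.0 / eps)), 0.0, 1.0)

# Right-hand side: closed form e^{C'/d} * sqrt(d) * Gamma(3/2, C'/d).
rhs = np.exp(C_prime / d) * np.sqrt(d) * gamma(1.5) * gammaincc(1.5, C_prime / d)

print(f"integral = {lhs:.6f}, closed form = {rhs:.6f}")
# The crude bound in the proof replaces Gamma(3/2, C'/d) by Gamma(3/2) = sqrt(pi)/2:
print("upper bound:", np.exp(C_prime / d) * np.sqrt(d) * np.sqrt(np.pi) / 2)

For these values the integral and the closed form agree to numerical precision, while the \(\sqrt{\pi }/2\) bound is slightly larger, as expected.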
Next, we discuss how the Rademacher complexity bound can be used to derive a generalization bound for quantum machine learning models.
B.2 Generalization bound
We now derive the generalization bound proposed in Theorem 5.1 in the main text. The bound follows from standard statistical learning theory and the Rademacher complexity result obtained above.
Theorem B.2
(Restatement of Theorem 5.1) Let \(d,N \in {\mathbb {N}}\), \(\delta \in (0,1)\), and consider a parameter space \(\Theta \subset {\mathbb {R}}^d\). The quantum model class \({\mathcal {F}}_\Theta =\{f_{\theta ,p}:\theta \in \Theta \}\) with noise parameter \(p\in [0,1)\) satisfies \(f_{\theta ,p}(x)=\eta (p)f_\theta (x)\) with \(\eta (0)=1\). Assume:
- The loss \(l:{\mathcal {Y}}\times {\mathbb {R}}\rightarrow [0,1]\) is Lipschitz continuous in its second argument with constant \(L \le 1\).
- The model gradients are bounded as \(\Vert \nabla _\theta f_{\theta ,p}(x)\Vert \le L_f^p\).
- The quantum FIM satisfies \(\sqrt{\det ({\mathcal {F}}(\theta ))}\ge m>0\).
Define \(C^{\prime } = \log V_\Theta - \log V_d - \log m + d \log L_f^p\).
Then with probability at least 1-\(\delta\) over an i.i.d. sample \(D=\{x_i,y_i\}_{i=1}^N\) of size N,
uniformly for all \(\theta \in \Theta\).
Proof
A standard result in learning theory states that with probability at least \(1-\delta\):
where \({\mathcal {L}}=\{(x,y)\mapsto l(y,f_{\theta ,p}(x)):\theta \in \Theta \}\).
Since \(l(y,{\hat{y}})\) is Lipschitz in \({\hat{y}}\) with constant L, the contraction property of Rademacher complexity gives \(\hat{{\mathcal {R}}}_N({\mathcal {L}}) \le L\, \hat{{\mathcal {R}}}_N({\mathcal {F}}_\Theta )\). Applying Lemma B.1:
$$\hat{{\mathcal {R}}}_N({\mathcal {L}}) \le L\, c\, \sqrt{\frac{d}{N}}\; e^{C^{\prime }/d}.$$
For \(L\le 1\), this simplifies directly to
$$\hat{{\mathcal {R}}}_N({\mathcal {L}}) \le c\, \sqrt{\frac{d}{N}}\; e^{C^{\prime }/d}.$$
Substitute back into the generalization inequality:
This gives:
matching the statement of Theorem 5.1. \(\square\)
This completes the proof of the generalization bound for Theorem 5.1. The interplay between Fisher information geometry, covering numbers, and Rademacher complexity provides a path from parameter space properties to explicit generalization guarantees. In practice, after training, restricting the bound to a local parameter region or replacing d with the effective dimension often yields sharper estimates.
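To make the geometric quantities entering the bound tangible, the following sketch (a deliberately simple single-qubit example chosen for transparency, not one of the circuits analyzed in the main text) estimates the quantum Fisher information matrix of the pure state \(|\psi (\theta )\rangle = R_Z(\theta _2) R_Y(\theta _1)|0\rangle\) by finite differences and reports \(\sqrt{\det ({\mathcal {F}}(\theta ))}\), the quantity assumed to be bounded below by m. Analytically, this QFIM is \(\mathrm {diag}(1, \sin ^2\theta _1)\), so the determinant degenerates as \(\theta _1 \rightarrow 0\).

import numpy as np

def state(theta):
    # |psi(theta)> = RZ(theta[1]) RY(theta[0]) |0>.
    t1, t2 = theta
    ry = np.array([[np.cos(t1 / 2), -np.sin(t1 / 2)],
                   [np.sin(t1 / 2),  np.cos(t1 / 2)]], dtype=complex)
    rz = np.diag([np.exp(-1j * t2 / 2), np.exp(1j * t2 / 2)])
    return rz @ ry @ np.array([1.0, 0.0], dtype=complex)

def qfim(theta, h=1e-5):
    # Pure-state QFIM: F_ij = 4 Re(<d_i psi|d_j psi> - <d_i psi|psi><psi|d_j psi>).
    psi = state(theta)
    d = len(theta)
    grads = []
    for i in range(d):
        e = np.zeros(d)
        e[i] = h
        grads.append((state(theta + e) - state(theta - e)) / (2 * h))
    F = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            term = np.vdot(grads[i], grads[j]) - np.vdot(grads[i], psi) * np.vdot(psi, grads[j])
            F[i, j] = 4 * term.real
    return F

theta = np.array([0.7, 1.3])
F = qfim(theta)
print(F)                                          # analytically diag(1, sin^2(theta_1))
print("sqrt(det F):", np.sqrt(np.linalg.det(F)))  # shrinks toward 0 as theta_1 -> 0

Near such degenerate points the assumption \(\sqrt{\det ({\mathcal {F}}(\theta ))} \ge m > 0\) fails, which illustrates why restricting attention to well-conditioned local parameter regions, as discussed above, matters in practice.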