Data-dependent generalization bounds for parameterized quantum models under noise

Khanal, Bikram; Rivas, Pablo

doi:10.1007/s11227-025-06966-9

Data-dependent generalization bounds for parameterized quantum models under noise

Published: 13 March 2025

Volume 81, article number 611, (2025)
Cite this article

The Journal of Supercomputing Aims and scope Submit manuscript

133 Accesses
Explore all metrics

Abstract

Quantum machine learning offers a transformative approach to solving complex problems, but the inherent noise hinders its practical implementation in near-term quantum devices. This obstacle makes it challenging to understand the generalization capabilities of quantum circuit models. Designing robust quantum machine learning models under noise requires a principled understanding of complexity and generalization, extending beyond classical capacity measures. This study investigates the generalization properties of parameterized quantum machine learning models under the influence of noise. We present a data-dependent generalization bound grounded in the quantum Fisher information matrix. We leverage statistical learning theory to relate the parameter space volumes and training sizes to estimate the generalization capability of the trained model. By integrating local parameter neighborhoods and effective dimensions defined through quantum Fisher information matrix eigenvalues, we provide a structured characterization of complexity in quantum models. We analyze the tightness of the bound and discuss the trade-off between model expressiveness and generalization performance.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+

from $39.99 /Month

Starting from 10 chapters or articles per month
Access and download chapters and articles from more than 300k books and 2,500 journals
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Understanding quantum machine learning also requires rethinking generalization

Article Open access 13 March 2024

Out-of-distribution generalization for learning quantum dynamics

Article Open access 05 July 2023

Generalization in quantum machine learning from few training data

Article Open access 22 August 2022

References

Schuld M, Petruccione F (2021) Machine Learning with Quantum Computers, 2nd edn. Quantum Science and Technology. Springer, Cham, Switzerland. https://doi.org/10.1007/978-3-030-83098-4
Biamonte J, Wittek P, Pancotti N, Rebentrost P, Wiebe N, Lloyd S (2017) Quantum machine learning. Nature 549(7671):195–202
Article MATH Google Scholar
Wang Y, Liu J (2024) A comprehensive review of quantum machine learning: from nisq to fault tolerance. arXiv preprint arXiv:2401.11351
Preskill J (2018) Quantum computing in the nisq era and beyond. Quantum 2:79
Article MATH Google Scholar
Torlai G, Melko RG (2020) Machine-learning quantum states in the nisq era. Annu Rev Condens Matter Phys 11(1):325–344
Article MATH Google Scholar
Khanal B, Rivas P, Sanjel A, Sooksatra K, Quevedo E, Rodriguez A (2024) Generalization error bound for quantum machine learning in nisq era-a survey. Quant Mach Intell 6(2):1–20
Google Scholar
Havlíček V, Córcoles AD, Temme K, Harrow AW, Kandala A, Chow JM, Gambetta JM (2019) Supervised learning with quantum-enhanced feature spaces. Nature 567(7747):209–212
Article Google Scholar
Cerezo M, Arrasmith A, Babbush R, Benjamin SC, Endo S, Fujii K, McClean JR, Mitarai K, Yuan X, Cincio L et al (2021) Variational quantum algorithms. Nat Rev Phys 3(9):625–644
Article Google Scholar
Lloyd S, Rebentrost P, Mohseni M (2013) Quantum algorithms for supervised and unsupervised machine learning. arXiv preprint arXiv:1307.0411
Aïmeur E, Brassard G, Gambs S (2007) Quantum clustering algorithms. In: Proceedings of the 24th International Conference on Machine Learning, pp 1–8
Mohri M, Rostamizadeh A, Talwalkar A (2018) Foundations of Machine Learning. Second Edition. https://mitpress.ublish.com/ebook/foundations-of-machine-learning--2-preview/7093/Cover
Abu-Mostafa YS, Magdon-Ismail M, Lin HT (2012) Learning from data: a short course
Shalev-Shwartz S, Ben-David S (2014) Understanding machine learning: From theory to algorithms
Emami M, Sahraee-Ardakan M, Pandit P, Rangan S, Fletcher A (2020) Generalization error of generalized linear models in high dimensions. In: International Conference on Machine Learning, pp 2892–2901. PMLR
Jakubovitz D, Giryes R, Rodrigues MR (2019) Generalization error in deep learning. In: Compressed Sensing and Its Applications: Third International MATHEON Conference 2017, pp 153–193. Springer
Nadeau C, Bengio Y (1999) Inference for the generalization error. Adv Neural Inform Process Syst 12
Banchi L, Pereira J, Pirandola S (2021) Generalization in quantum machine learning: a quantum information standpoint. PRX Quant 2(4):040321
Article MATH Google Scholar
Gil-Fuster E, Eisert J, Bravo-Prieto C (2023) Understanding quantum machine learning also requires rethinking generalization. arXiv preprint arXiv:2306.13461
Caro MC, Gil-Fuster E, Meyer JJ, Eisert J, Sweke R (2021) Encoding-dependent generalization bounds for parametrized quantum circuits. Quantum 5:582
Article MATH Google Scholar
Khanal B, Rivas P (2023) Evaluating the impact of noise on variational quantum circuits in nisq era devices. In: 2023 Congress in Computer Science, Computer Engineering, & Applied Computing (CSCE)
Caro MC, Huang H-Y, Cerezo M, Sharma K, Sornborger A, Cincio L, Coles PJ (2022) Generalization in quantum machine learning from few training data. Nat Commun 13(1):4919
Article MATH Google Scholar
Caro MC, Gur T, Rouzé C, Franca DS, Subramanian S (2024) Information-theoretic generalization bounds for learning from quantum data. In: The Thirty Seventh Annual Conference on Learning Theory, pp 775–839. PMLR
Haug T, Kim M (2023) Generalization with quantum geometry for learning unitaries. arXiv preprint arXiv:2303.13462
Canatar A, Peters E, Pehlevan C, Wild SM, Shaydulin R (2022) Bandwidth enables generalization in quantum kernel models. arXiv preprint arXiv:2206.06686
Caro MC, Huang H-Y, Ezzell N, Gibbs J, Sornborger AT, Cincio L, Coles PJ, Holmes Z (2023) Out-of-distribution generalization for learning quantum dynamics. Nat Commun 14(1):3751
Article Google Scholar
Abbas A, Sutter D, Zoufal C, Lucchi A, Figalli A, Woerner S (2021) The power of quantum neural networks. Nat Comput Sci 1(6):403–409
Article Google Scholar
Hur T, Park DK (2024) Understanding generalization in quantum machine learning with margins. arXiv preprint arXiv:2411.06919
Martinis JM, Nam S, Aumentado J, Lang K, Urbina C (2003) Decoherence of a superconducting qubit due to bias noise. Phys Rev B 67(9):094510
Article Google Scholar
Wang S, Fontana E, Cerezo M, Sharma K, Sone A, Cincio L, Coles PJ (2021) Noise-induced barren plateaus in variational quantum algorithms. Nat Commun 12(1):6961
Article MATH Google Scholar
Heyraud V, Li Z, Denis Z, Le Boité A, Ciuti C (2022) Noisy quantum kernel machines. Phys Rev A 106(5):052421
Article MathSciNet Google Scholar
Shor PW (1995) Scheme for reducing decoherence in quantum computer memory. Phys Rev A 52(4):2493
Article MATH Google Scholar
Khanal B, Rivas P (2024) Learning robust observable to address noise in quantum machine learning. arXiv preprint arXiv:2409.07632
Shaib A, Naim MH, Fouda ME, Kanj R, Kurdahi F (2023) Efficient noise mitigation technique for quantum computing. Sci Rep 13(1):3912
Article Google Scholar
Ferracin S, Hashim A, Ville J-L, Naik R, Carignan-Dugas A, Qassim H, Morvan A, Santiago DI, Siddiqi I, Wallman JJ (2024) Efficiently improving the performance of noisy quantum computers. Quantum 8:1410
Article Google Scholar
Qi J, Yang C-HH, Chen P-Y, Hsieh M-H (2023) Theoretical error performance analysis for variational quantum circuit based functional regression. npj Quant Inform 9(1):4
Article MATH Google Scholar
Huang H-Y, Broughton M, Mohseni M, Babbush R, Boixo S, Neven H, McClean JR (2021) Power of data in quantum machine learning. Nat Commun 12(1):2631
Article MATH Google Scholar
Liu Y, Arunachalam S, Temme K (2021) A rigorous and robust quantum speed-up in supervised machine learning. Nat Phys 17(9):1013–1017
Article MATH Google Scholar
Du Y, Tu Z, Yuan X, Tao D (2022) Efficient measure for the expressivity of variational quantum algorithms. Phys Rev Lett 128(8):080506
Article MathSciNet MATH Google Scholar
Gentinetta G, Thomsen A, Sutter D, Woerner S (2024) The complexity of quantum support vector machines. Quantum 8:1225
Article Google Scholar
Kübler J, Buchholz S, Schölkopf B (2021) The inductive bias of quantum kernels. Adv Neural Inf Process Syst 34:12661–12673
MATH Google Scholar
Thanasilp S, Wang S, Cerezo M, Holmes Z (2024) Exponential concentration in quantum kernel methods. Nat Commun 15(1):5200
Article MATH Google Scholar
Czarnik P, Arrasmith A, Coles PJ, Cincio L (2021) Error mitigation with clifford quantum-circuit data. Quantum 5:592
Article MATH Google Scholar
Cerezo M, Verdon G, Huang H-Y, Cincio L, Coles PJ (2022) Challenges and opportunities in quantum machine learning. Nat Comput Sci 2(9):567–576
Article MATH Google Scholar
Holmes Z, Sharma K, Cerezo M, Coles PJ (2022) Connecting ansatz expressibility to gradient magnitudes and barren plateaus. PRX Quant 3(1):010313
Article Google Scholar
Zhao C, Gao X-S (2021) Analyzing the barren plateau phenomenon in training quantum neural networks with the zx-calculus. Quantum 5:466
Article MATH Google Scholar
McClean JR, Boixo S, Smelyanskiy VN, Babbush R, Neven H (2018) Barren plateaus in quantum neural network training landscapes. Nat Commun 9(1):4812
Article MATH Google Scholar
Arrasmith A, Cerezo M, Czarnik P, Cincio L, Coles PJ (2021) Effect of barren plateaus on gradient-free optimization. Quantum 5:558
Article Google Scholar
Nielsen MA, Chuang I (2002) Quantum computation and quantum information. American Association of Physics Teachers
Shalev-Shwartz S, Shamir O, Srebro N, Sridharan K (2010) Learnability, stability and uniform convergence. The J Mach Learn Res 11:2635–2670
MathSciNet MATH Google Scholar
Valle-Pérez G, Louis AA (2020) Generalization bounds for deep learning. arXiv preprint arXiv:2012.04115
Johansson FD, Shalit U, Kallus N, Sontag D (2022) Generalization bounds and representation learning for estimation of potential outcomes and causal effects. J Mach Learn Res 23(166):1–50
MathSciNet MATH Google Scholar
Pape AD, Kurtz KJ, Sayama H (2015) Complexity measures and concept learning. J Math Psychol 64:66–75
Article MathSciNet MATH Google Scholar
Bialek W, Nemenman I, Tishby N (2001) Predictability, complexity, and learning. Neural Comput 13(11):2409–2463
Article MATH Google Scholar
Benedetti M, Lloyd E, Sack S, Fiorentini M (2019) Parameterized quantum circuits as machine learning models. Quant Sci Technol 4(4):043001
Article Google Scholar
Holevo AS (2019) Quantum Systems, Channels, Information: a Mathematical Introduction. Walter de Gruyter GmbH & Co KG
Guţă M, Kahn J (2006) Local asymptotic normality for qubit states. Phys Rev A-Atom Mol Opt Phys 73(5):052108
Article MathSciNet MATH Google Scholar
Magesan E, Gambetta JM, Emerson J (2011) Scalable and robust randomized benchmarking of quantum processes. Phys Rev Lett 106(18):180504
Article MATH Google Scholar
Preskill J (1998) Lecture notes for physics 229: quantum information and computation. Calif Inst Technol 16(1):1–8
MATH Google Scholar
Khanal B, Rivas P (2024) A modified depolarization approach for efficient quantum machine learning. Mathematics 12(9):1385
Article MATH Google Scholar
Haug T, Kim M (2024) Generalization of quantum machine learning models using quantum Fisher information metric. Phys Rev Lett 133(5):050603
Article MathSciNet MATH Google Scholar
Kasatkin V, Mozgunov E, Ezzell N, Lidar D (2024) Detecting quantum and classical phase transitions via unsupervised machine learning of the Fisher information metric. arXiv preprint arXiv:2408.03418
Bharti K (2021) Fisher information: a crucial tool for nisq research. Quant Views 5:61
Article MATH Google Scholar
Meyer JJ (2021) Fisher information in noisy intermediate-scale quantum applications. Quantum 5:539
Article MATH Google Scholar
Amari S-I (1998) Natural gradient works efficiently in learning. Neural Comput 10(2):251–276
Article MATH Google Scholar
Haug T, Bharti K, Kim M (2021) Capacity and quantum geometry of parametrized quantum circuits. PRX Quant 2(4):040309
Article MATH Google Scholar
Martens J (2020) New insights and perspectives on the natural gradient method. J Mach Learn Res 21(146):1–76
MathSciNet MATH Google Scholar
Stokes J, Izaac J, Killoran N, Carleo G (2020) Quantum natural gradient. Quantum 4:269
Article MATH Google Scholar
Baumgratz T, Nüßeler A, Cramer M, Plenio MB (2013) A scalable maximum likelihood method for quantum state tomography. New J Phys 15(12):125004
Article MATH Google Scholar
Braunstein SL, Caves CM (1994) Statistical distance and the geometry of quantum states. Phys Rev Lett 72(22):3439
Article MathSciNet MATH Google Scholar
Liu J, Yuan H, Lu X-M, Wang X (2020) Quantum Fisher information matrix and multiparameter estimation. J Phys A: Math Theor 53(2):023001
Article MathSciNet MATH Google Scholar
Fujiwara A (2001) Quantum channel identification problem. Phys Rev A 63(4):042304
Article MATH Google Scholar
Petz D, Ghinea C (2011) Introduction to quantum Fisher information. In: Quantum Probability and Related Topics, pp 261–281. World Scientific
Yamamoto N (2019) On the natural gradient for variational quantum eigensolver. arXiv preprint arXiv:1909.05074
Liu J, Xiong H-N, Song F, Wang X (2014) Fidelity susceptibility and quantum Fisher information for density operators with arbitrary ranks. Physica A 410:167–173
Article MathSciNet MATH Google Scholar
Abbas A, Sutter D, Figalli A, Woerner S (2021) Effective dimension of machine learning models. arXiv preprint arXiv:2112.04807
Schuld M, Sweke R, Meyer JJ (2021) Effect of data encoding on the expressive power of variational quantum-machine-learning models. Phys Rev A 103(3):032430
Article MathSciNet MATH Google Scholar
Vapnik V (1998) Statistical learning theory. John Wiley & Sons google schola 2:831–842
Helstrom CW (1969) Quantum detection and estimation theory. J Stat Phys 1:231–252
Article MathSciNet MATH Google Scholar
Paris MG (2009) Quantum estimation for quantum technology. Int J Quant Inform 7(supp01):125–137
Article MATH Google Scholar
Bartlett PL, Mendelson S (2002) Rademacher and Gaussian complexities: risk bounds and structural results. J Mach Learn Res 3:463–482
MathSciNet MATH Google Scholar
Larocca M, Ju N, García-Martín D, Coles PJ, Cerezo M (2023) Theory of overparametrization in quantum neural networks. Nat Comput Sci 3(6):542–551
Article Google Scholar
Ciliberto C, Herbster M, Ialongo AD, Pontil M, Rocchetto A, Severini S, Wossnig L (2018) Quantum machine learning: a classical perspective. Proc R Soc A: Math Phys Eng Sci 474(2209):20170551
Article MathSciNet MATH Google Scholar
Schuld M, Killoran N (2019) Quantum machine learning in feature Hilbert spaces. Phys Rev Lett 122(4):040504
Article MATH Google Scholar

Download references

Acknowledgements

This research was executed while P.R. and B.K. were funded by the National Science Foundation under grants NSF CISE-CNS Award 2136961 and 2210091.

Author information

Authors and Affiliations

School of Engineering & Computer Science, Baylor University, One Bear Place, Waco, 76798, TX, USA
Bikram Khanal & Pablo Rivas

Authors

Bikram Khanal
View author publications
Search author on:PubMed Google Scholar
Pablo Rivas
View author publications
Search author on:PubMed Google Scholar

Corresponding author

Correspondence to Pablo Rivas.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

1.1 A.1: Parameter space geometry

In the main text, we introduced the Fisher information matrix (FIM) ${\mathcal {F}}(\theta )$ as a Riemannian metric on the parameter space $\Theta \subset {\mathbb {R}}^d$. This induces a natural geometric structure on $\Theta$: the geodesic distance between $\theta$ and $\theta ^{\prime }$ measures how different these parameter points are in terms of their influence on the model’s predictions, rather than just their Euclidean distance.

For a parameterized model and its associated FIM, we consider the volume element induced by ${\mathcal {F}}(\theta )$:

$$\begin{aligned} dV(\theta ) = \sqrt{\det ({\mathcal {F}}(\theta ))}\, d\theta ^1 \cdots d\theta ^d. \end{aligned}$$

(A1)

A geodesic ball $B(\theta ,\epsilon )$ of radius $\epsilon$ centered at $\theta$ has volume

$$\begin{aligned} V(\theta ,\epsilon ) = \int _{B(\theta ,\epsilon )} dV(\theta ^{\prime }) = \int _{B(\theta ,\epsilon )} \sqrt{\det ({\mathcal {F}}(\theta ^{\prime }))}\, d\theta ^{\prime }. \end{aligned}$$

(A2)

If $\sqrt{\det ({\mathcal {F}}(\theta ))}$ is bounded by a positive constant $m>0$, then for small $\epsilon$, we can approximate:

$$\begin{aligned} V(\theta ,\epsilon ) \approx V_d \epsilon ^d \sqrt{\det ({\mathcal {F}}(\theta ))} \ge V_d \epsilon ^d m, \end{aligned}$$

(A3)

where

$$\begin{aligned} V_d = \frac{\pi ^{d/2}}{\Gamma \left( \tfrac{d}{2} + 1\right) } \end{aligned}$$

is the volume of the unit ball in ${\mathbb {R}}^d$. Intuitively, a lower bound on $\sqrt{\det ({\mathcal {F}}(\theta ))}$ ensures that geodesic balls are not “too small,” implying fewer truly distinct parameter configurations at $\epsilon$. This geometric insight underpins the covering number bound we discuss next.

1.2 A.2: Covering numbers of a parameter space

The covering number of a parameter space $\Theta$ measures how many balls of radius $\epsilon$ are required to cover the entire space. Since the FIM provides a natural measure of distinguishability, a lower bound on $\sqrt{\det ({\mathcal {F}}(\theta ))}$ relates volumes of small balls to parameter distinguishability.

Lemma A.1

(Covering Number and Volume) Let $\Theta \subset {\mathbb {R}}^d$ be a compact parameter space with volume $V_\Theta$. Assume that the determinant of the Fisher Information Matrix ${\mathcal {F}}(\theta )$ satisfies:

$$\begin{aligned} \sqrt{\det ({\mathcal {F}}(\theta ))} \ge m> 0, \quad \forall \theta \in \Theta . \end{aligned}$$

Then, for any $\epsilon> 0$, the covering number $N(\epsilon ,\Theta ,||\cdot ||)$ of $\Theta$ with respect to the Euclidean norm $||\cdot ||$ satisfies:

$$\begin{aligned} \log N(\epsilon ,\Theta ,||\cdot ||) \le C - d \log \epsilon , \end{aligned}$$

(A4)

where $C = \log V_\Theta - \log V_d - \log m$ and $V_d = \frac{\pi ^{\frac{d}{2}}}{\Gamma \left( \frac{d}{2} + 1\right) }$ is the volume of a unit ball in ${\mathbb {R}}^d$.

Proof

Consider an $\epsilon$-ball around $\theta$:

$$\begin{aligned} V(\theta ,\epsilon ) \ge V_d \epsilon ^d m. \end{aligned}$$

To cover $\Theta$, the total volume of N such balls must be at least $V_\Theta$. So we have:

$$\begin{aligned} N (\epsilon , \Theta , ||\cdot ||) \le \frac{V_\Theta }{V_d \epsilon ^d m}. \end{aligned}$$

(A5)

Taking the logarithm:

$$\begin{aligned} \log N(\epsilon , \Theta , | \cdot | )&\le \log V_\Theta - \log \left( V_d \epsilon ^d m \right) \\&= \log V_\Theta - \log V_d - \log m - d \log \epsilon \\&= C - d \log \epsilon . \end{aligned}$$

This completes the proof of Lemma (A.1). $\square$

The inequality (A4) shows how covering numbers scale with dimension d and resolution $\epsilon$. Smaller $\epsilon$ or larger d requires more balls, thus more complex model classes. Conversely, a larger m reduces complexity by ensuring a certain “geometric rigidity” in the parameter space.

Appendix B

1.1 B.1: Bounding empirical Rademacher complexity

We now relate covering numbers to the empirical Rademacher complexity $\hat{{\mathcal {R}}}_N({\mathcal {F}}_\Theta )$ of the model class ${\mathcal {F}}_\Theta = \{f_{\theta ,p}:\theta \in \Theta \}$. The Rademacher complexity quantifies the model’s capacity to fit random noise and thus provides an upper bound on generalization error.

Lemma B.1

(Empirical Rademacher Complexity Bound) Let ${\mathcal {F}}_\Theta = \{f_{\theta ,p}: \theta \in \Theta \}$ be a model function class. Let ${\mathcal {D}} = \{x_i,y_i\}_{i=1}^N$ be a dataset of N samples drawn from the distribution P. Assume that $f_{\theta ,p}$ is bounded by the Lipschitz constant $L_f^p$ and the determinant of the quantum Fisher information matrix is at least m. Then the empirical Rademacher complexity of the model class ${\mathcal {F}}_\Theta$ is bounded by:

$$\begin{aligned} {\hat{ \mathcal R}}_N({\mathcal {F}}_\Theta ) \le \frac{6 \sqrt{\pi d} \exp \left( \frac{C^\prime }{d} \right) }{\sqrt{N}} \end{aligned}$$

(B6)

where $C^\prime = \log V_\Theta - \log V_d - \log m + d \log L_f^p$.

Proof

Given a dataset ${\mathcal {D}} = \{x_i,y_i\}_{i=1}^N$ the empirical Rademacher complexity of a model class ${\mathcal {F}}_\Theta$ is defined as:

$$\begin{aligned} {\hat{\mathcal R}}_N({\mathcal {F}}_\Theta ) = {\mathbb {E}}_{\sigma } \left[ \sup \limits _{f \in F_\Theta } \sum \limits _{i=1}^N \sigma _i f(x_i) \right] , \end{aligned}$$

(B7)

where $\sigma = \{\sigma _i\}_{i=1}^N$ are i.i.d Rademacher random variables taking values in $\{-1,+1\}$ with equal probabilities. Dudley’s entropy integral provides an upper bound on the empirical Rademacher complexity in terms of the covering numbers. If we let $\Vert \cdot \Vert _{2,{\mathcal {D}}}$ be the empirical $L_2$-norm over the sample ${\mathcal {D}}$, we have:

$$\begin{aligned} {\hat{R}}_N({\mathcal {F}}_\Theta ) \le \frac{12}{\sqrt{N}} \int _0^{\epsilon _{max}} \sqrt{\log N(\epsilon , {\mathcal {F}}_\Theta , ||\cdot ||_{2,{\mathcal {D}}})} \, d\epsilon . \end{aligned}$$

(B8)

where $N(\epsilon , {\mathcal {F}}_\Theta , ||\cdot ||_{2,{\mathcal {D}}})$ is the covering number of ${\mathcal {F}}_\Theta$ with respect empirical $L_2$ metric: $||f - g ||_{2,{\mathcal {D}}} = \left( \frac{1}{N} \sum \limits _{i=1}^N (f(x_i) - g(x_i)) \right) ^{\frac{1}{2}}$ and $\epsilon _{max} = \sup \limits _{f \in {\mathcal {F}}_\Theta } ||f||_{2,{\mathcal {D}}}$. Since $f_{\theta ,p}$ is bounded by the Lipschitz constant $L_f^p$, we can write: $||f_{\theta ,p} - f_{\theta ^{\prime },p}||_{2,{\mathcal {D}}} \le L_f^p ||\theta - \theta ^{\prime }||, \quad \forall \theta , \theta ^{\prime } \in \Theta$. This implies that an $\epsilon$-cover of $\Theta$ with respect to the Euclidean norm $||\cdot ||$ is also an $\epsilon L_f^p$-cover of ${\mathcal {F}}_\Theta$ with respect to the empirical $L_2$ metric. Using Lemma A.1, we can write:

$$\begin{aligned}&\log N(\epsilon , {\mathcal {F}}_\Theta , ||\cdot ||_{2,{\mathcal {D}}}) \le \log N(\frac{\epsilon }{L_f^p}, \Theta , ||\cdot ||)\nonumber \\&\quad \le C - d \log \left( \frac{\epsilon }{L_f^p}\right) = C - d \left( \log \epsilon - \log L_f^p \right) \nonumber \\&\quad \implies \log N(\epsilon , {\mathcal {F}}_\Theta , ||\cdot ||_{2,{\mathcal {D}}}) \le C^\prime - d \log \epsilon , \end{aligned}$$

(B9)

Substituting this result into Eq. (B8) gives:

$$\begin{aligned} {\hat{R}}_N({\mathcal {F}}_\Theta ) \le \frac{12}{\sqrt{N}} \int _0^{\epsilon _{max}} \sqrt{C^\prime - d \log \epsilon } \, d\epsilon . \end{aligned}$$

(B10)

We perform the elementary calculus in the following derivations and if the reader trusts our calculations, they may skip the following steps.

Let $t = - \log \epsilon$, so $\epsilon = e^{-t}$ and $d\epsilon = -e^{-t} dt$. The limit changes as: when $\epsilon = \epsilon _{max}, t = \log \epsilon _{max}$ and as $\epsilon$ decreases from $\epsilon _{max}$ to 0, t increases from $-\log \epsilon _{max}$ to 0. Thus:

$$\begin{aligned} {\hat{R}}_N({\mathcal {F}}_\Theta )&\le \frac{12}{\sqrt{N}} \int _{-\log \epsilon _{max}}^0 \sqrt{C^\prime - d \log e^{-t}} \, -e^{-t} dt \end{aligned}$$

Assuming $\epsilon _{max} = 1$ (since $f_{\theta ,p}(x)$ is bounded in [0,1]), we have $t=0$ at $\epsilon = 1$, so:

$$\begin{aligned} {\hat{R}}_N({\mathcal {F}}_\Theta )&\le \frac{12}{\sqrt{N}} \int _{0}^{\infty } \sqrt{C^\prime + d t} \, e^{-t} dt \end{aligned}$$

To make the derivation easier let us define:

$$\begin{aligned} I = \int _{0}^{\infty } \sqrt{C'+dt} e^{-t} dt \end{aligned}$$

(B11)

Let $S = C' +dt$, then $t = \frac{S-C'}{d}$ and $\frac{dt}{dS} = \frac{d}{dS} \left( \frac{S - C'}{d} \right) \implies dt = \frac{dS}{d}$ and $e^{-t} = e^{-\left( \frac{S-C'}{d}\right) } = e^{\frac{C'}{d}}e^{\frac{-S}{d}}$. Since, $C' \text { and } d$ are constants, we can write:

$$\begin{aligned} I = \int _{C'}^{\infty } \sqrt{S} e^{\frac{C'}{d}} e^{\frac{-S}{d}} \frac{dS}{d} = \frac{e^{\frac{C'}{d}}}{d} \int _{C'}^{\infty } \sqrt{S} e^{\frac{-S}{d}} dS \end{aligned}$$

Let $u = \frac{s}{d} \implies s = du \text { and } ds = d du$. Thus,

$$\begin{aligned} I = \frac{e^{\frac{C'}{d}}}{d} \int _{\frac{C'}{d}}^{\infty } \sqrt{du} e^{-u} d(du) = e^{\frac{C'}{d}} \sqrt{d} \int _{\frac{C'}{d}}^{\infty } u^{\frac{1}{2}} e^{-u} du \end{aligned}$$

This integral is the gamma function and can be written as:

$I = {e^{\frac{C'}{d}}}\sqrt{d} \Gamma \left( \frac{3}{2}, \frac{C'}{d}\right)$.

Since $\Gamma (s,a) \le \Gamma (s)$ for $a> 0$, and $\Gamma \left( \frac{3}{2}\right) \ frac{\sqrt{\pi }}{2}$ we can write:

$$\begin{aligned} I \le {e^{\frac{C'}{d}}}\sqrt{d} \frac{\sqrt{\pi }}{2} \end{aligned}$$

(B12)

Substituting this back into Eq. (B8), we get:

$$\begin{aligned} \hat{{\mathcal {R}}}_N({\mathcal {F}}_\Theta ) \le \frac{6\sqrt{\pi d} \quad e^{\left( \frac{C'}{d}\right) }}{\sqrt{n}} \end{aligned}$$

(B13)

This completes the proof of the Lemma (B.1). $\square$

This lemma encapsulates how geometry (via quantum FIM) and parameter space volume control Rademacher complexity. As N grows, the complexity term diminishes, indicating improved generalization potential.

Next, we discuss how the Rademacher complexity bound can be used to derive a generalization bound for quantum machine learning models.

1.2 B.2: Generalization bound

We now derive the generalization bound proposed in Theorem 5.1 in the main text. The bound follows from standard statistical learning theory and the Rademacher complexity result obtained above.

Theorem B.2

(Restatement of Theorem 5.1) Let $d,N \in {\mathbb {N}}, \delta \in (0,1)$, and consider a parameter space $\Theta \subset {\mathbb {R}}^d$. The quantum model class ${\mathcal {F}}_\Theta =\{f{\theta ,p}:\theta \in \Theta \}$ with noise parameter $p\in [0,1)$ satisfies $f_{\theta ,p}(x)=\eta (p)f_\theta (x)$ with $\eta (0)=1$. Assume:

The loss $l:{\mathcal {Y}}\times {\mathbb {R}}\rightarrow [0,1]$ is Lipschitz continuous in its second argument with constant $L \le 1$.
The model gradients are bounded as: $\Vert \nabla _\theta f_{\theta ,p}(x)\Vert \le L_f^p$.
The quantum FIM satisfies $\sqrt{\det ({\mathcal {F}}(\theta ))}\ge m>0$.

Define $C^{\prime } = \log V_\Theta - \log V_d - \log m + d \log L_f^p$.

Then with probability at least 1-$\delta$ over an i.i.d. sample $D=\{x_i,y_i\}_{i=1}^N$ of size N,

$$\begin{aligned} R(\theta ) \le {\hat{R}}_N(\theta ) + \frac{12\sqrt{\pi d}\exp (C^\prime /d)}{\sqrt{N}} + 3\sqrt{\frac{\log (2/\delta )}{2N}}, \end{aligned}$$

(B14)

uniformly for all $\theta \in \Theta$.

Proof

A standard result in learning theory states that with probability at least $1-\delta$:

$$\begin{aligned} R(\theta ) \le {\hat{R}}_N(\theta ) + 2\hat{{\mathcal {R}}}N({\mathcal {L}}) + 3\sqrt{\frac{\log (2/\delta )}{2N}}, \end{aligned}$$

(B15)

where ${\mathcal {L}}=\{(x,y)\mapsto l(y,f_{\theta ,p}(x)):\theta \in \Theta \}$.

Since $l(y,{\hat{y}})$ is Lipschitz with constant L and $f_{\theta ,p}(x)$ is bounded by $L_f^p$, it follows that ${\mathcal {L}}$ has Rademacher complexity at most $L\cdot L_f^p$. Applying Lemma B.1:

$$\begin{aligned} \hat{{\mathcal {R}}}_N({\mathcal {L}}) \le L L_f^p \frac{6\sqrt{\pi d}\text { exp}{\left( C^{\prime }/d\right) }}{\sqrt{N}}. \end{aligned}$$

For $L\le 1$, this simplifies directly to:

$$\begin{aligned} \hat{{\mathcal {R}}}_N({\mathcal {L}}) \le \frac{6\sqrt{\pi d}\exp{\left( C^{\prime }/d\right) }}{\sqrt{N}}. \end{aligned}$$

Substitute back into the generalization inequality:

$$\begin{aligned} R(\theta ) \le {\hat{R}}_N(\theta ) + 2\cdot \frac{6\sqrt{\pi d}\text { exp}{\left( C^{\prime }/d\right) }}{\sqrt{N}} + 3\sqrt{\frac{\log (2/\delta )}{2N}}. \end{aligned}$$

This gives:

$$\begin{aligned} R(\theta ) \le {\hat{R}}_N(\theta ) + \frac{12\sqrt{\pi d}\text { exp}{\left( C^{\prime }/d\right) }}{\sqrt{N}} + 3\sqrt{\frac{\log (2/\delta )}{2N}}, \end{aligned}$$

matching the statement of Theorem 5.1. $\square$

This completes the proof of the generalization bound for Theorem 5.1. The interplay between Fisher information geometry, covering numbers, and Rademacher complexity provides a path from parameter space properties to explicit generalization guarantees. In practice, after training, restricting to local regions or reducing dimension to the effective dimension often yields sharper bounds.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Khanal, B., Rivas, P. Data-dependent generalization bounds for parameterized quantum models under noise. J Supercomput 81, 611 (2025). https://doi.org/10.1007/s11227-025-06966-9

Download citation

Accepted: 19 January 2025
Published: 13 March 2025
DOI: https://doi.org/10.1007/s11227-025-06966-9

Keywords

Profiles

Pablo Rivas View author profile

Part of a collection:

Section - Quantum Computing

Access this article

Log in via an institution

Subscribe and save

Springer+

from $39.99 /Month

Starting from 10 chapters or articles per month
Access and download chapters and articles from more than 300k books and 2,500 journals
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Data-dependent generalization bounds for parameterized quantum models under noise

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Understanding quantum machine learning also requires rethinking generalization

Out-of-distribution generalization for learning quantum dynamics

Generalization in quantum machine learning from few training data

Explore related subjects

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendices

Appendix A

1.1 A.1: Parameter space geometry

1.2 A.2: Covering numbers of a parameter space

Lemma A.1

Proof

Appendix B

1.1 B.1: Bounding empirical Rademacher complexity

Lemma B.1

Proof

1.2 B.2: Generalization bound

Theorem B.2

Proof

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Profiles

Subscribe and save

Buy Now