
Data-dependent generalization bounds for parameterized quantum models under noise

The Journal of Supercomputing

Abstract

Quantum machine learning offers a transformative approach to solving complex problems, but inherent noise hinders its practical implementation on near-term quantum devices. This obstacle makes it challenging to understand the generalization capabilities of quantum circuit models. Designing robust quantum machine learning models under noise requires a principled understanding of complexity and generalization that extends beyond classical capacity measures. This study investigates the generalization properties of parameterized quantum machine learning models under the influence of noise. We present a data-dependent generalization bound grounded in the quantum Fisher information matrix. We leverage statistical learning theory to relate parameter space volumes and training set sizes to the generalization capability of the trained model. By integrating local parameter neighborhoods and effective dimensions defined through the eigenvalues of the quantum Fisher information matrix, we provide a structured characterization of complexity in quantum models. We analyze the tightness of the bound and discuss the trade-off between model expressiveness and generalization performance.



Acknowledgements

This research was conducted while P.R. and B.K. were funded by the National Science Foundation under CISE-CNS Awards 2136961 and 2210091.

Author information

Corresponding author

Correspondence to Pablo Rivas.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

1.1 A.1: Parameter space geometry

In the main text, we introduced the Fisher information matrix (FIM) \({\mathcal {F}}(\theta )\) as a Riemannian metric on the parameter space \(\Theta \subset {\mathbb {R}}^d\). This induces a natural geometric structure on \(\Theta\): the geodesic distance between \(\theta\) and \(\theta ^{\prime }\) measures how different these parameter points are in terms of their influence on the model’s predictions, rather than just their Euclidean distance.

For a parameterized model and its associated FIM, we consider the volume element induced by \({\mathcal {F}}(\theta )\):

$$\begin{aligned} dV(\theta ) = \sqrt{\det ({\mathcal {F}}(\theta ))}\, d\theta ^1 \cdots d\theta ^d. \end{aligned}$$
(A1)

A geodesic ball \(B(\theta ,\epsilon )\) of radius \(\epsilon\) centered at \(\theta\) has volume

$$\begin{aligned} V(\theta ,\epsilon ) = \int _{B(\theta ,\epsilon )} dV(\theta ^{\prime }) = \int _{B(\theta ,\epsilon )} \sqrt{\det ({\mathcal {F}}(\theta ^{\prime }))}\, d\theta ^{\prime }. \end{aligned}$$
(A2)

If \(\sqrt{\det ({\mathcal {F}}(\theta ))}\) is bounded below by a positive constant \(m>0\), then for small \(\epsilon\) we can approximate:

$$\begin{aligned} V(\theta ,\epsilon ) \approx V_d \epsilon ^d \sqrt{\det ({\mathcal {F}}(\theta ))} \ge V_d \epsilon ^d m, \end{aligned}$$
(A3)

where

$$\begin{aligned} V_d = \frac{\pi ^{d/2}}{\Gamma \left( \tfrac{d}{2} + 1\right) } \end{aligned}$$

is the volume of the unit ball in \({\mathbb {R}}^d\). Intuitively, a lower bound on \(\sqrt{\det ({\mathcal {F}}(\theta ))}\) ensures that geodesic balls are not “too small,” implying that fewer truly distinct parameter configurations fit within \(\Theta\) at resolution \(\epsilon\). This geometric insight underpins the covering number bound we discuss next.
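To make the volume computation concrete, the following minimal sketch (our illustration, not code from the paper) evaluates the unit-ball volume \(V_d\) and the small-\(\epsilon\) approximation \(V_d\,\epsilon ^d \sqrt{\det ({\mathcal {F}}(\theta ))}\) for a hypothetical two-parameter Fisher information matrix:

import numpy as np
from math import gamma, pi

def unit_ball_volume(d):
    # V_d = pi^(d/2) / Gamma(d/2 + 1), the volume of the unit ball in R^d.
    return pi ** (d / 2) / gamma(d / 2 + 1)

def geodesic_ball_volume(fim, eps):
    # Small-eps approximation V(theta, eps) ~ V_d * eps^d * sqrt(det F(theta)).
    d = fim.shape[0]
    return unit_ball_volume(d) * eps ** d * np.sqrt(np.linalg.det(fim))

# Hypothetical QFIM at some parameter point theta (illustrative values only).
fim = np.array([[0.50, 0.10],
                [0.10, 0.25]])
print(geodesic_ball_volume(fim, eps=0.1))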

1.2 A.2: Covering numbers of a parameter space

The covering number of a parameter space \(\Theta\) measures how many balls of radius \(\epsilon\) are required to cover the entire space. Since the FIM provides a natural measure of distinguishability, a lower bound on \(\sqrt{\det ({\mathcal {F}}(\theta ))}\) relates volumes of small balls to parameter distinguishability.

Lemma A.1

(Covering Number and Volume) Let \(\Theta \subset {\mathbb {R}}^d\) be a compact parameter space with volume \(V_\Theta\). Assume that the Fisher information matrix \({\mathcal {F}}(\theta )\) satisfies:

$$\begin{aligned} \sqrt{\det ({\mathcal {F}}(\theta ))} \ge m> 0, \quad \forall \theta \in \Theta . \end{aligned}$$

Then, for any \(\epsilon> 0\), the covering number \(N(\epsilon ,\Theta ,||\cdot ||)\) of \(\Theta\) with respect to the Euclidean norm \(||\cdot ||\) satisfies:

$$\begin{aligned} \log N(\epsilon ,\Theta ,||\cdot ||) \le C - d \log \epsilon , \end{aligned}$$
(A4)

where \(C = \log V_\Theta - \log V_d - \log m\) and \(V_d = \frac{\pi ^{\frac{d}{2}}}{\Gamma \left( \frac{d}{2} + 1\right) }\) is the volume of a unit ball in \({\mathbb {R}}^d\).

Proof

Consider an \(\epsilon\)-ball around \(\theta\):

$$\begin{aligned} V(\theta ,\epsilon ) \ge V_d \epsilon ^d m. \end{aligned}$$

To cover \(\Theta\), the total volume of N such balls must be at least \(V_\Theta\). So we have:

$$\begin{aligned} N (\epsilon , \Theta , ||\cdot ||) \le \frac{V_\Theta }{V_d \epsilon ^d m}. \end{aligned}$$
(A5)

Taking the logarithm:

$$\begin{aligned} \log N(\epsilon , \Theta , ||\cdot || )&\le \log V_\Theta - \log \left( V_d \epsilon ^d m \right) \\&= \log V_\Theta - \log V_d - \log m - d \log \epsilon \\&= C - d \log \epsilon . \end{aligned}$$

This completes the proof of Lemma A.1. \(\square\)

The inequality (A4) shows how covering numbers scale with the dimension d and the resolution \(\epsilon\). Smaller \(\epsilon\) or larger d requires more balls and thus corresponds to a more complex model class. Conversely, a larger m reduces complexity by ensuring a certain “geometric rigidity” in the parameter space.
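As a numerical illustration of Lemma A.1 (a sketch under assumed toy values for \(V_\Theta\), m, and d, none of which are reported in the paper), the bound \(\log N(\epsilon ) \le C - d\log \epsilon\) can be evaluated as follows:

from math import gamma, log, pi

def covering_log_bound(V_Theta, d, m, eps):
    # Lemma A.1: log N(eps, Theta, ||.||) <= C - d*log(eps),
    # with C = log V_Theta - log V_d - log m.
    V_d = pi ** (d / 2) / gamma(d / 2 + 1)
    C = log(V_Theta) - log(V_d) - log(m)
    return C - d * log(eps)

# Toy parameter space Theta = [0, 2*pi]^4 and an assumed lower bound m.
d, V_Theta, m = 4, (2 * pi) ** 4, 1e-2
for eps in (0.5, 0.1, 0.01):
    print(eps, covering_log_bound(V_Theta, d, m, eps))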

Appendix B

1.1 B.1: Bounding empirical Rademacher complexity

We now relate covering numbers to the empirical Rademacher complexity \(\hat{{\mathcal {R}}}_N({\mathcal {F}}_\Theta )\) of the model class \({\mathcal {F}}_\Theta = \{f_{\theta ,p}:\theta \in \Theta \}\). The Rademacher complexity quantifies the model’s capacity to fit random noise and thus provides an upper bound on generalization error.

Lemma B.1

(Empirical Rademacher Complexity Bound) Let \({\mathcal {F}}_\Theta = \{f_{\theta ,p}: \theta \in \Theta \}\) be a model function class and let \({\mathcal {D}} = \{x_i,y_i\}_{i=1}^N\) be a dataset of N samples drawn from the distribution P. Assume that \(f_{\theta ,p}\) is Lipschitz continuous in \(\theta\) with constant \(L_f^p\) and that the quantum Fisher information matrix satisfies \(\sqrt{\det ({\mathcal {F}}(\theta ))} \ge m > 0\). Then the empirical Rademacher complexity of the model class \({\mathcal {F}}_\Theta\) is bounded by:

$$\begin{aligned} {\hat{ \mathcal R}}_N({\mathcal {F}}_\Theta ) \le \frac{6 \sqrt{\pi d} \exp \left( \frac{C^\prime }{d} \right) }{\sqrt{N}} \end{aligned}$$
(B6)

where \(C^\prime = \log V_\Theta - \log V_d - \log m + d \log L_f^p\).

Proof

Given a dataset \({\mathcal {D}} = \{x_i,y_i\}_{i=1}^N\), the empirical Rademacher complexity of the model class \({\mathcal {F}}_\Theta\) is defined as:

$$\begin{aligned} {\hat{\mathcal R}}_N({\mathcal {F}}_\Theta ) = {\mathbb {E}}_{\sigma } \left[ \sup \limits _{f \in {\mathcal {F}}_\Theta } \frac{1}{N} \sum \limits _{i=1}^N \sigma _i f(x_i) \right] , \end{aligned}$$
(B7)

where \(\sigma = \{\sigma _i\}_{i=1}^N\) are i.i.d. Rademacher random variables taking values in \(\{-1,+1\}\) with equal probability. Dudley’s entropy integral provides an upper bound on the empirical Rademacher complexity in terms of the covering numbers. If we let \(\Vert \cdot \Vert _{2,{\mathcal {D}}}\) be the empirical \(L_2\)-norm over the sample \({\mathcal {D}}\), we have:

$$\begin{aligned} {\hat{R}}_N({\mathcal {F}}_\Theta ) \le \frac{12}{\sqrt{N}} \int _0^{\epsilon _{max}} \sqrt{\log N(\epsilon , {\mathcal {F}}_\Theta , ||\cdot ||_{2,{\mathcal {D}}})} \, d\epsilon . \end{aligned}$$
(B8)

where \(N(\epsilon , {\mathcal {F}}_\Theta , ||\cdot ||_{2,{\mathcal {D}}})\) is the covering number of \({\mathcal {F}}_\Theta\) with respect to the empirical \(L_2\) metric \(||f - g ||_{2,{\mathcal {D}}} = \left( \frac{1}{N} \sum \limits _{i=1}^N (f(x_i) - g(x_i))^2 \right) ^{\frac{1}{2}}\), and \(\epsilon _{max} = \sup \limits _{f \in {\mathcal {F}}_\Theta } ||f||_{2,{\mathcal {D}}}\). Since \(f_{\theta ,p}\) is Lipschitz in \(\theta\) with constant \(L_f^p\), we have \(||f_{\theta ,p} - f_{\theta ^{\prime },p}||_{2,{\mathcal {D}}} \le L_f^p ||\theta - \theta ^{\prime }||\) for all \(\theta , \theta ^{\prime } \in \Theta\). This implies that an \(\epsilon\)-cover of \(\Theta\) with respect to the Euclidean norm \(||\cdot ||\) is also an \(\epsilon L_f^p\)-cover of \({\mathcal {F}}_\Theta\) with respect to the empirical \(L_2\) metric. Using Lemma A.1, we can write:

$$\begin{aligned}&\log N(\epsilon , {\mathcal {F}}_\Theta , ||\cdot ||_{2,{\mathcal {D}}}) \le \log N(\frac{\epsilon }{L_f^p}, \Theta , ||\cdot ||)\nonumber \\&\quad \le C - d \log \left( \frac{\epsilon }{L_f^p}\right) = C - d \left( \log \epsilon - \log L_f^p \right) \nonumber \\&\quad \implies \log N(\epsilon , {\mathcal {F}}_\Theta , ||\cdot ||_{2,{\mathcal {D}}}) \le C^\prime - d \log \epsilon , \end{aligned}$$
(B9)

Substituting this result into Eq. (B8) gives:

$$\begin{aligned} {\hat{R}}_N({\mathcal {F}}_\Theta ) \le \frac{12}{\sqrt{N}} \int _0^{\epsilon _{max}} \sqrt{C^\prime - d \log \epsilon } \, d\epsilon . \end{aligned}$$
(B10)

The remaining steps are elementary calculus; readers who wish to skip the details may proceed directly to the final bound in (B13).

Let \(t = - \log \epsilon\), so \(\epsilon = e^{-t}\) and \(d\epsilon = -e^{-t} dt\). The limits change as follows: when \(\epsilon = \epsilon _{max}\), \(t = -\log \epsilon _{max}\), and as \(\epsilon\) decreases from \(\epsilon _{max}\) to 0, t increases from \(-\log \epsilon _{max}\) to \(\infty\). Thus:

$$\begin{aligned} {\hat{R}}_N({\mathcal {F}}_\Theta )&\le \frac{12}{\sqrt{N}} \int _{-\log \epsilon _{max}}^{\infty } \sqrt{C^\prime - d \log e^{-t}} \; e^{-t}\, dt \end{aligned}$$

Assuming \(\epsilon _{max} = 1\) (since \(f_{\theta ,p}(x)\) is bounded in [0,1]), we have \(t=0\) at \(\epsilon = 1\), so:

$$\begin{aligned} {\hat{R}}_N({\mathcal {F}}_\Theta )&\le \frac{12}{\sqrt{N}} \int _{0}^{\infty } \sqrt{C^\prime + d t} \, e^{-t} dt \end{aligned}$$

To simplify the derivation, let us define:

$$\begin{aligned} I = \int _{0}^{\infty } \sqrt{C'+dt} e^{-t} dt \end{aligned}$$
(B11)

Let \(S = C' + d\,t\); then \(t = \frac{S-C'}{d}\), \(dt = \frac{dS}{d}\), and \(e^{-t} = e^{-\left( \frac{S-C'}{d}\right) } = e^{\frac{C'}{d}}e^{\frac{-S}{d}}\). Since \(C'\) and d are constants, we can write:

$$\begin{aligned} I = \int _{C'}^{\infty } \sqrt{S} e^{\frac{C'}{d}} e^{\frac{-S}{d}} \frac{dS}{d} = \frac{e^{\frac{C'}{d}}}{d} \int _{C'}^{\infty } \sqrt{S} e^{\frac{-S}{d}} dS \end{aligned}$$

Next, let \(u = \frac{S}{d}\), so that \(S = d\,u\) and \(dS = d\, du\). Thus,

$$\begin{aligned} I = \frac{e^{\frac{C'}{d}}}{d} \int _{\frac{C'}{d}}^{\infty } \sqrt{d\,u}\; e^{-u}\, d\, du = e^{\frac{C'}{d}} \sqrt{d} \int _{\frac{C'}{d}}^{\infty } u^{\frac{1}{2}} e^{-u}\, du \end{aligned}$$

This integral is the upper incomplete gamma function, so

\(I = {e^{\frac{C'}{d}}}\sqrt{d}\, \Gamma \left( \frac{3}{2}, \frac{C'}{d}\right)\).

Since \(\Gamma (s,a) \le \Gamma (s)\) for \(a> 0\), and \(\Gamma \left( \frac{3}{2}\right) = \frac{\sqrt{\pi }}{2}\), we can write:

$$\begin{aligned} I \le {e^{\frac{C'}{d}}}\sqrt{d} \frac{\sqrt{\pi }}{2} \end{aligned}$$
(B12)

Substituting this back into Eq. (B10), we get:

$$\begin{aligned} \hat{{\mathcal {R}}}_N({\mathcal {F}}_\Theta ) \le \frac{6\sqrt{\pi d}\, e^{\left( \frac{C'}{d}\right) }}{\sqrt{N}} \end{aligned}$$
(B13)

This completes the proof of Lemma B.1. \(\square\)
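As an independent numerical check of the incomplete-gamma step above (our own verification using SciPy, not part of the original derivation), one can confirm that \(\Gamma \left( \tfrac{3}{2}, a\right) \le \Gamma \left( \tfrac{3}{2}\right) = \tfrac{\sqrt{\pi }}{2}\) for a few values of \(a > 0\):

import numpy as np
from scipy.special import gamma, gammaincc

# gammaincc(s, a) is the regularized upper incomplete gamma Gamma(s, a) / Gamma(s).
for a in (0.1, 1.0, 5.0):
    upper_incomplete = gammaincc(1.5, a) * gamma(1.5)
    assert upper_incomplete <= np.sqrt(np.pi) / 2 + 1e-12
    print(a, upper_incomplete, np.sqrt(np.pi) / 2)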

This lemma encapsulates how geometry (via the quantum FIM) and parameter space volume control the Rademacher complexity. As N grows, the complexity term diminishes, indicating improved generalization potential.
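The decay of this complexity term with N can be seen directly; the following sketch (with hypothetical values for \(V_\Theta\), m, and \(L_f^p\) chosen only for illustration) evaluates the Lemma B.1 bound for increasing sample sizes:

from math import exp, gamma, log, pi, sqrt

def rademacher_bound(N, d, V_Theta, m, L_f):
    # Lemma B.1: 6*sqrt(pi*d)*exp(C'/d)/sqrt(N),
    # with C' = log V_Theta - log V_d - log m + d*log L_f.
    V_d = pi ** (d / 2) / gamma(d / 2 + 1)
    C_prime = log(V_Theta) - log(V_d) - log(m) + d * log(L_f)
    return 6 * sqrt(pi * d) * exp(C_prime / d) / sqrt(N)

d, V_Theta, m, L_f = 4, (2 * pi) ** 4, 1e-2, 1.0   # illustrative values only
for N in (10**3, 10**4, 10**5, 10**6):
    print(N, rademacher_bound(N, d, V_Theta, m, L_f))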

Next, we discuss how the Rademacher complexity bound can be used to derive a generalization bound for quantum machine learning models.

1.2 B.2: Generalization bound

We now derive the generalization bound proposed in Theorem 5.1 in the main text. The bound follows from standard statistical learning theory and the Rademacher complexity result obtained above.

Theorem B.2

(Restatement of Theorem 5.1) Let \(d,N \in {\mathbb {N}}, \delta \in (0,1)\), and consider a parameter space \(\Theta \subset {\mathbb {R}}^d\). The quantum model class \({\mathcal {F}}_\Theta =\{f_{\theta ,p}:\theta \in \Theta \}\) with noise parameter \(p\in [0,1)\) satisfies \(f_{\theta ,p}(x)=\eta (p)f_\theta (x)\) with \(\eta (0)=1\). Assume:

  • The loss \(l:{\mathcal {Y}}\times {\mathbb {R}}\rightarrow [0,1]\) is Lipschitz continuous in its second argument with constant \(L \le 1\).

  • The model gradients are bounded as: \(\Vert \nabla _\theta f_{\theta ,p}(x)\Vert \le L_f^p\).

  • The quantum FIM satisfies \(\sqrt{\det ({\mathcal {F}}(\theta ))}\ge m>0\).

Define \(C^{\prime } = \log V_\Theta - \log V_d - \log m + d \log L_f^p\).

Then, with probability at least \(1-\delta\) over an i.i.d. sample \(D=\{x_i,y_i\}_{i=1}^N\) of size N,

$$\begin{aligned} R(\theta ) \le {\hat{R}}_N(\theta ) + \frac{12\sqrt{\pi d}\exp (C^\prime /d)}{\sqrt{N}} + 3\sqrt{\frac{\log (2/\delta )}{2N}}, \end{aligned}$$
(B14)

uniformly for all \(\theta \in \Theta\).

Proof

A standard result in learning theory states that with probability at least \(1-\delta\):

$$\begin{aligned} R(\theta ) \le {\hat{R}}_N(\theta ) + 2\hat{{\mathcal {R}}}_N({\mathcal {L}}) + 3\sqrt{\frac{\log (2/\delta )}{2N}}, \end{aligned}$$
(B15)

where \({\mathcal {L}}=\{(x,y)\mapsto l(y,f_{\theta ,p}(x)):\theta \in \Theta \}\).

Since \(l(y,{\hat{y}})\) is Lipschitz in its second argument with constant L, Talagrand’s contraction lemma gives \(\hat{{\mathcal {R}}}_N({\mathcal {L}}) \le L\, \hat{{\mathcal {R}}}_N({\mathcal {F}}_\Theta )\). Applying Lemma B.1:

$$\begin{aligned} \hat{{\mathcal {R}}}_N({\mathcal {L}}) \le L\, \frac{6\sqrt{\pi d}\exp {\left( C^{\prime }/d\right) }}{\sqrt{N}}. \end{aligned}$$

Since \(L\le 1\), this simplifies to:

$$\begin{aligned} \hat{{\mathcal {R}}}_N({\mathcal {L}}) \le \frac{6\sqrt{\pi d}\exp {\left( C^{\prime }/d\right) }}{\sqrt{N}}. \end{aligned}$$

Substitute back into the generalization inequality:

$$\begin{aligned} R(\theta ) \le {\hat{R}}_N(\theta ) + 2\cdot \frac{6\sqrt{\pi d}\exp {\left( C^{\prime }/d\right) }}{\sqrt{N}} + 3\sqrt{\frac{\log (2/\delta )}{2N}}. \end{aligned}$$

This gives:

$$\begin{aligned} R(\theta ) \le {\hat{R}}_N(\theta ) + \frac{12\sqrt{\pi d}\exp {\left( C^{\prime }/d\right) }}{\sqrt{N}} + 3\sqrt{\frac{\log (2/\delta )}{2N}}, \end{aligned}$$

matching the statement of Theorem 5.1. \(\square\)

This completes the proof of the generalization bound stated in Theorem 5.1. The interplay between Fisher information geometry, covering numbers, and Rademacher complexity provides a path from parameter space properties to explicit generalization guarantees. In practice, restricting attention after training to local parameter regions, or reducing the dimension d to the effective dimension, often yields sharper bounds.
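For completeness, the full right-hand side of Eq. (B14) can be assembled in the same way; the sketch below (again with hypothetical training error, parameter values, and confidence level, none taken from the paper's experiments) combines the empirical risk, the complexity term, and the confidence term:

from math import exp, gamma, log, pi, sqrt

def generalization_bound(emp_risk, N, d, V_Theta, m, L_f, delta):
    # Right-hand side of Eq. (B14):
    #   R_hat + 12*sqrt(pi*d)*exp(C'/d)/sqrt(N) + 3*sqrt(log(2/delta)/(2N)).
    V_d = pi ** (d / 2) / gamma(d / 2 + 1)
    C_prime = log(V_Theta) - log(V_d) - log(m) + d * log(L_f)
    complexity = 12 * sqrt(pi * d) * exp(C_prime / d) / sqrt(N)
    confidence = 3 * sqrt(log(2 / delta) / (2 * N))
    return emp_risk + complexity + confidence

# Hypothetical numbers for illustration only.
print(generalization_bound(emp_risk=0.05, N=10**5, d=4,
                           V_Theta=(2 * pi) ** 4, m=1e-2,
                           L_f=1.0, delta=0.05))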

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Khanal, B., Rivas, P. Data-dependent generalization bounds for parameterized quantum models under noise. J Supercomput 81, 611 (2025). https://doi.org/10.1007/s11227-025-06966-9

