Abstract
We investigate linear regression problems in which one has the additional possibility of controlling the conditional variance of the output given the input, by varying the computational time dedicated to the supervision of each example. For a given upper bound on the total computational time available for supervision, we optimize the trade-off between the number of examples and their precision (the reciprocal of the conditional variance of the output), by formulating and solving suitable optimization problems based on large-sample approximations of the outputs of the classical ordinary least squares and weighted least squares regression algorithms. Considering a specific functional form for that precision, we prove that there are cases in which “many but bad” examples provide a smaller generalization error than “few but good” ones, but also that the converse can occur, depending on the “returns to scale” of the precision with respect to the computational time assigned to the supervision of each example. Hence, the results of this study highlight that increasing the size of the dataset is not always beneficial, if one has the possibility to collect a smaller number of more reliable examples. We conclude by presenting numerical results that validate the theory and by discussing extensions of the proposed framework to other optimization problems.
Notes
Having test examples independent from the training set is a very common assumption in machine learning, made to get a fair estimate of the generalization capability of the trained model (e.g., for the specific case considered in this work, of the performance index reported later in Eq. (4)). Replacing the test examples with the training ones to get such an estimate would produce misleading results in case of overfitting of the trained learning machine on the training set.
We use the expressions “decreasing returns to scale”, “constant returns to scale”, and “increasing returns to scale” when modeling the precision of supervision as a function of the computational time per example to refer to the cases in which such precision is modeled, respectively, as a strictly concave and increasing function, a linear and increasing function, and a strictly convex and increasing function of the computational time per example. Given the connections between the subject of the paper and econometrics, this terminology has been adopted to provide an interpretation of the measurement error model that can be of interest to readers with a background in economics.
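For illustration only (this parametrization is an assumption made here for exposition, not necessarily the specific functional form adopted in the paper), a power-law precision model makes the three regimes explicit:

```latex
\[
\mathrm{precision}(\varDelta T) \;=\; k_1 \left( \frac{\varDelta T}{k_2} \right)^{\alpha},
\qquad \varDelta T > 0, \quad k_1, k_2 > 0.
\]
% 0 < \alpha < 1: strictly concave and increasing -> decreasing returns to scale
% \alpha = 1:     linear and increasing           -> constant returns to scale
% \alpha > 1:     strictly convex and increasing  -> increasing returns to scale
```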
This topic is clearly in line with the call for papers of the special issue “Optimization Models and Solution Techniques” of the journal Optimization Letters, from which we report the following paragraph: “Recent advances in information technology enable the treatment of big data volumes, devising effective solution methods toward better decisions”.
Optimization also plays an important role in this interaction, as witnessed by the large number of papers dealing with the application of optimization techniques to machine learning, published in journals such as Operations Research, Mathematics of Operations Research, and Optimization Letters. As a further example, the International Annual Conference on Machine Learning, Optimization and Data Science (LOD) is specifically devoted to the interaction between optimization and machine learning.
Both ordinary least squares and weighted least squares (considered later in this section) implement optimal solutions of related unconstrained convex quadratic optimization problems; see, e.g., [15, Chapters 2 and 18].
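As a purely illustrative sketch (not part of the original text; the NumPy implementation, the simulated data, and all variable names are assumptions made here), the closed-form minimizers of the two convex quadratic objectives can be computed as follows:

```python
# Minimal sketch: OLS and WLS as closed-form minimizers of convex
# quadratic objectives (illustrative only; data and names are assumptions).
import numpy as np

rng = np.random.default_rng(0)
N, p = 200, 3
X = rng.normal(size=(N, p))                  # design matrix
beta_true = np.array([1.0, -2.0, 0.5])
sigma = 0.3 + rng.uniform(size=N)            # per-example output noise levels
y = X @ beta_true + sigma * rng.normal(size=N)

# OLS: argmin_beta ||y - X beta||^2  ->  beta = (X'X)^{-1} X'y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# WLS: argmin_beta (y - X beta)' W (y - X beta), with W = diag(1/sigma_i^2),
# i.e., weights equal to the precision (reciprocal of the conditional variance)
w = 1.0 / sigma**2
beta_wls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

print("OLS:", beta_ols)
print("WLS:", beta_wls)
```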
Although the paper assumes that all the training examples are available simultaneously to the learning machine (e.g., at the end of the time interval used for all their supervisions), the analysis could be extended to online learning, by replacing T with the current time and applying results such as [7, Proposition 5].
That is, for every \(\varepsilon >0\), \(\mathrm{Prob} \left( \left\| \frac{X_{N(\varDelta T)}' X_{N(\varDelta T)}}{N(\varDelta T)} - \mathbb {E} \left\{ \underline{x} \, \underline{x}'\right\} \right\| > \varepsilon \right) \) tends to 0 as \(N(\varDelta T)\) tends to \(+\infty \), where \(\Vert \cdot \Vert \) denotes an arbitrary matrix norm.
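The following minimal numerical check (our own illustration, not part of the paper; the Gaussian input distribution and all parameter choices are assumptions) shows the stated convergence of the sample second-moment matrix:

```python
# Quick numerical check (illustrative): the sample second-moment matrix
# X'X / N approaches E[x x'] as N grows.
import numpy as np

rng = np.random.default_rng(1)
p = 3
A = rng.uniform(size=(p, p))
cov = A @ A.T                      # Var(x); here the mean of x is zero,
true_moment = cov                  # so E[x x'] = Var(x)

for N in (100, 1_000, 10_000, 100_000):
    X = rng.multivariate_normal(np.zeros(p), cov, size=N)
    sample_moment = X.T @ X / N
    print(N, np.linalg.norm(sample_moment - true_moment))  # shrinks with N
```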
This assumption has been introduced only in order to avoid the pathological case for which the set of \(\varepsilon \)-optimal solutions of one of the optimization problems (13) or (14) coincides trivially with the whole admissible domain \([\varDelta T_{\mathrm{min}}, \varDelta T_{\mathrm{max}}]\).
For a better understanding of this part of the proof, Fig. 1 shows the behavior of the rescaled objective functions \(T \frac{p C (\varDelta T)^{-\alpha }}{\left\lfloor \frac{T}{\varDelta T} \right\rfloor }\) and \(p C (\varDelta T)^{1-\alpha }\) for the three cases \(0< \alpha = 0.5 < 1\), \(\alpha = 1.5 > 1\), and \(\alpha = 1\) (the values of the other parameters are \(p=10\), \(T=10\) sec, \(\varDelta T_{\mathrm{min}}=0.3\) sec, \(\varDelta T_{\mathrm{max}}=0.7\) sec, \(k_1=1\), and \(k_2=1\) sec).
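A minimal sketch of this computation (not the code used for the paper; setting \(C=1\) is an assumption, consistent here with \(k_1=1\) and \(k_2=1\) sec) is reported below:

```python
# Sketch of the behavior described for Fig. 1 (illustrative code; C = 1 is
# an assumption, consistent with k_1 = 1 and k_2 = 1 sec).
import numpy as np

p, T = 10, 10.0
C = 1.0
dT = np.linspace(0.3, 0.7, 401)            # [Delta T_min, Delta T_max]

for alpha in (0.5, 1.5, 1.0):
    exact = T * p * C * dT**(-alpha) / np.floor(T / dT)   # rescaled objective
    approx = p * C * dT**(1.0 - alpha)                    # smooth approximation
    print(f"alpha = {alpha}: minimizer of the approximation at dT = "
          f"{dT[np.argmin(approx)]:.2f}, of the exact objective at dT = "
          f"{dT[np.argmin(exact)]:.2f}")

# For alpha < 1 the approximation is minimized at Delta T_min ("many but bad"),
# for alpha > 1 at Delta T_max ("few but good"); for alpha = 1 it is constant.
```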
Simulation results similar to the ones reported in this section have been obtained also for other choices of the parameters of the problem. Given the limited space, the choice \(p=10\) has been made for illustrative purposes, to achieve a good compromise between too small and too large choices for the dimension of the parameter vector.
This choice of the covariance matrix has been obtained by setting \(\mathrm{Var}\left( \underline{x}\right) =A A'\), where the elements of \(A \in \mathbb {R}^{p \times p}\) have been randomly and independently generated according to a uniform probability density on the interval [0, 1].
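A minimal sketch of this construction (the use of NumPy and the random seed are our own choices, not taken from the paper) follows:

```python
# Sketch of the described construction of the covariance matrix Var(x) = A A'.
import numpy as np

p = 10
rng = np.random.default_rng(2)
A = rng.uniform(low=0.0, high=1.0, size=(p, p))   # i.i.d. Uniform[0, 1] entries
cov_x = A @ A.T                                   # symmetric positive semidefinite
print(np.linalg.eigvalsh(cov_x).min())            # nonnegative (up to rounding)
```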
References
Athey, S., Imbens, G.: Recursive partitioning for heterogeneous causal effects. Proc. Natl. Acad. Sci. 113, 7353–7360 (2016)
Bacigalupo, A., Gnecco, G.: Metamaterial filter design via surrogate optimization. J. Phys.: Conf. Ser. 1092, 4 (2018)
Bacigalupo, A., Gnecco, G., Lepidi, M., Gambarotta, L.: Optimal design of low-frequency band gaps in anti-tetrachiral lattice meta-materials. Compos. Part B Eng. 115, 341–359 (2017)
Bacigalupo, A., Lepidi, M., Gnecco, G., Gambarotta, L.: Optimal design of auxetic hexachiral metamaterials with local resonators. Smart Mater. Struct. 25(5), 19 (2016)
Bargagli Stoffi, F.J., Gnecco, G.: Estimating heterogeneous causal effects in the presence of irregular assignment mechanisms. In: Proceedings of the 5th IEEE International Conference on Data Science and Advanced Analytics (IEEE DSAA 2018), Turin, Italy, pp. 1–10 (2018)
Barlow, R.J.: Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences, 1st edn. Wiley, London (1989)
Gnecco, G., Bemporad, A., Gori, M., Sanguineti, M.: LQG online learning. Neural Comput. 29, 2203–2291 (2017)
Gnecco, G., Nutarelli, F.: On the trade-off between number of examples and precision of supervision in regression problems. In: Proceedings of the 4th International Conference of the International Neural Network Society on Big Data and Deep Learning (INNS BDDL 2019), Sestri Levante, Italy, pp. 1–6 (2019)
Gnecco, G., Nutarelli, F.: On the trade-off between sample size and precision of supervision in the fixed effects panel data model. In: Proceedings of the 5th International Conference on Machine Learning, Optimization, and Data Science (LOD 2019), Certosa di Pontignano (Siena), Italy, pp. 1–12 (2019)
Greene, W.H.: Econometric Analysis, 5th edn. Pearson Education Inc., London (2003)
Groves, R.M., Fowler, F.J.J., Couper, M.P., Lepkowski, J.M., Singer, E., Tourangeau, R.: Survey Methodology, 1st edn. Wiley-Interscience, London (2004)
Hamming, R.: Numerical Methods for Scientists and Engineers, 2nd edn. McGraw-Hill, New York (1973)
Korolev, V.Y., Shevtsova, I.G.: On the upper bound for the absolute constant in the Berry-Esseen inequality. Theory Probab. Appl. 54(4), 638 (2010)
Maddala, G.S.: Introduction to Econometrics, 2nd edn. Macmillan Publishing Company, London (1992)
Ruud, P.A.: An Introduction to Classical Econometric Theory, 1st edn. Oxford University Press, Oxford (2000)
Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis, 1st edn. Cambridge University Press, Cambridge (2004)
Vapnik, V.N.: Statistical Learning Theory, 1st edn. Wiley, London (1998)
Varian, H.R.: Big data: new tricks for econometrics. J. Econ. Perspect. 28, 3–28 (2014)
Wilkinson, J.H.: The evaluation of the zeros of ill-conditioned polynomials. Part I. Numer. Math. 1, 150–166 (1959)
Cite this article
Gnecco, G., Nutarelli, F. On the trade-off between number of examples and precision of supervision in machine learning problems. Optim Lett 15, 1711–1733 (2021). https://doi.org/10.1007/s11590-019-01486-x