Optimal Trade-Off Between Sample Size and Precision of Supervision for the Fixed Effects Panel Data Model

Gnecco, Giorgio; Nutarelli, Federico

doi:10.1007/978-3-030-37599-7_44


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11943)

Included in the following conference series: International Conference on Machine Learning, Optimization, and Data Science


Abstract

We investigate a modification of the classical fixed effects panel data model (a linear regression model able to represent unobserved heterogeneity in the data), in which one has the additional possibility of controlling the conditional variance of the output given the input, by varying the cost associated with the supervision of each training example. Assuming an upper bound on the total supervision cost, we analyze and optimize the trade-off between the sample size and the precision of supervision (the reciprocal of the conditional variance of the output), by formulating and solving a suitable optimization problem, based on a large-sample approximation of the output of the classical algorithm used to estimate the parameters of the fixed effects panel data model. Considering a specific functional form for that precision, we prove that, depending on the “returns to scale” of the precision with respect to the supervision cost per example, in some cases “many but bad” examples provide a smaller generalization error than “few but good” ones, whereas in other cases the opposite occurs. The results extend to the fixed effects panel data model the ones we obtained in recent works for a simpler linear regression model. We conclude discussing possible applications of our results, and extensions of the proposed optimization framework to other panel data models.
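The trade-off described in the abstract can be illustrated with a small numerical sketch. Everything specific in it is an assumption made for illustration, since the abstract does not state the formulas: the conditional variance is taken to be `k * c**(-alpha)` for a supervision cost `c` per example, the budget is taken to satisfy `C = N * T * c`, and the large-sample error is taken to be proportional to the variance divided by the number `T` of observations per unit. The names `error_proxy` and `best_cost` are hypothetical.

```python
def error_proxy(c, alpha, k=1.0, N=10, C=1000.0):
    """Illustrative large-sample error proxy (assumed form, not the paper's formula)."""
    T = C / (N * c)               # observations per unit affordable at cost c (assumption)
    variance = k * c ** (-alpha)  # assumed conditional variance of the output
    return variance / T           # proportional to c ** (1 - alpha)


def best_cost(alpha, c_min=0.5, c_max=2.0, steps=200):
    """Grid search for the cost per example that minimizes the proxy on [c_min, c_max]."""
    grid = [c_min + i * (c_max - c_min) / steps for i in range(steps + 1)]
    return min(grid, key=lambda c: error_proxy(c, alpha))


# alpha < 1: decreasing returns to scale of the precision in the cost
cheap = best_cost(alpha=0.5)
# alpha > 1: increasing returns to scale of the precision in the cost
costly = best_cost(alpha=1.5)
print("alpha=0.5 -> cost per example:", cheap)
print("alpha=1.5 -> cost per example:", costly)
```

Under these assumptions the proxy is monotone in `c`, so the minimizer sits at one end of the admissible interval: the cheapest supervision ("many but bad" examples) when `alpha < 1`, and the most expensive supervision ("few but good" examples) when `alpha > 1`.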


Notes

1. For simplicity of exposition, here the model is not presented in its most general form (e.g., the disturbances \(\varepsilon _{n,t}\)’s are simply assumed to be mutually independent).
2. The case of finite T and large N is of more interest for microeconometrics, and will be investigated in future research.
3. E.g., if the \(\varvec{x}_{n,t}\)’s are independent, identically distributed, and have finite moments up to the order 4.
4. We recall that a sequence of random real matrices \(\varvec{M}_{T}\), \(T=1,\ldots ,+\infty \) converges in probability to the real matrix \(\varvec{M}\) if, for every \(\varepsilon >0\), \(\mathrm{Prob} \left( \left\| \varvec{M}_{T} - \varvec{M}\right\| > \varepsilon \right) \) (where \(\Vert \cdot \Vert \) is an arbitrary matrix norm) tends to 0 as T tends to \(+\infty \). In this case, one writes \(\mathrm{plim}_{T \rightarrow +\infty } \varvec{M}_T=\varvec{M}\).
5. The existence of the probability limit (20) and the assumed positive definiteness of the matrix \(\varvec{A}_N\) guarantee that the invertibility of the matrix \(\sum _{n=1}^N \varvec{X}_n' \varvec{Q} \varvec{X}_n=\sum _{n=1}^N \varvec{X}_n' \varvec{Q}' \varvec{Q} \varvec{X}_n\) (see Sect. 2) holds with probability near 1 for large T.
6. This is obtained taking also into account that, as a consequence of the Continuous Mapping Theorem [4, Theorem 7.33], the probability limit of the product of two random variables equals the product of their probability limits, when the latter two exist.
7. By an argument similar to that used in [6], one can show that the approximation is exact, at optimality, when C is a multiple of both \(N c_\mathrm{min}\) and \(N c_\mathrm{max}\).
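
The identity used in note 5, \(\varvec{X}_n' \varvec{Q} \varvec{X}_n = \varvec{X}_n' \varvec{Q}' \varvec{Q} \varvec{X}_n\), holds whenever \(\varvec{Q}\) is symmetric and idempotent. The definition of \(\varvec{Q}\) is given in Sect. 2 and is not reproduced on this page, so the sketch below assumes the standard within (demeaning) matrix of the fixed effects model, \(\varvec{Q} = \varvec{I}_T - \frac{1}{T} \varvec{1} \varvec{1}'\). It checks both properties numerically for a small T. All helper names are hypothetical.

```python
def within_matrix(T):
    """Assumed form of Q: identity minus the T x T matrix with all entries 1/T."""
    return [[(1.0 if i == j else 0.0) - 1.0 / T for j in range(T)] for i in range(T)]


def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def close(A, B, tol=1e-12):
    """Entrywise comparison up to a small tolerance."""
    return all(abs(a - b) <= tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


Q = within_matrix(4)
QtQ = matmul(transpose(Q), Q)
is_projection = close(Q, transpose(Q)) and close(QtQ, Q)  # symmetric and idempotent

# Q maps a constant column to zero, which is how the fixed effects are removed.
ones = [[1.0] for _ in range(4)]
kills_constants = close(matmul(Q, ones), [[0.0] for _ in range(4)])
print(is_projection, kills_constants)
```

With this choice of \(\varvec{Q}\), multiplying by \(\varvec{Q}\) subtracts the time average of each unit, so any term that is constant over time for a given unit drops out of the transformed regression.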

References

1. Andreß, H.-J., Golsch, K., Schmidt, A.W.: Applied Panel Data Analysis for Economic and Social Surveys. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-32914-2
2. Athey, S., Imbens, G.: Recursive partitioning for heterogeneous causal effects. Proc. Nat. Acad. Sci. 113, 7353–7360 (2016)
3. Chen, C.-H., Lee, L.H.: Stochastic Simulation Optimization: An Optimal Computing Budget Allocation. World Scientific, Singapore (2010)
4. Florescu, I.: Probability and Stochastic Processes. Wiley, Hoboken (2015)
5. Frees, E.W.: Longitudinal and Panel Data: Analysis and Applications in the Social Sciences. Cambridge University Press, Cambridge (2004)
6. Gnecco, G., Nutarelli, F.: On the trade-off between number of examples and precision of supervision in regression. In: Oneto, L., Navarin, N., Sperduti, A., Anguita, D. (eds.) INNSBDDL 2019. PINNS, vol. 1, pp. 1–6. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-16841-4_1
7. Greene, W.H.: Econometric Analysis. Pearson Education, London (2003)
8. Groves, R.M., Fowler Jr., F.J., Couper, M.P., Lepkowski, J.M., Singer, E., Tourangeau, R.: Survey Methodology. Wiley, Hoboken (2004)
9. Nguyen, H.T., Kosheleva, O., Kreinovich, V., Ferson, S.: Trade-off between sample size and accuracy: case of measurements under interval uncertainty. Int. J. Approx. Reason. 50, 1164–1176 (2009)
10. Ruud, P.A.: An Introduction to Classical Econometric Theory. Oxford University Press, Oxford (2000)
11. Vapnik, V.N.: Statistical Learning Theory. Wiley, Hoboken (1998)
12. Varian, H.R.: Big Data: new tricks for econometrics. J. Econ. Perspect. 28, 3–38 (2014)
13. Wooldridge, J.M.: Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge (2002)


Author information

Authors and Affiliations

IMT School for Advanced Studies, Lucca, Italy
Giorgio Gnecco & Federico Nutarelli


Corresponding author

Correspondence to Giorgio Gnecco.

Editor information

Editors and Affiliations

University of Cambridge, Cambridge, UK
Giuseppe Nicosia
University of Florida, Gainesville, FL, USA
Panos Pardalos
Harvard University, Cambridge, MA, USA
Renato Umeton
Università di Catania, Catania, Italy
Giovanni Giuffrida
Almawave, Rome, Italy
Vincenzo Sciacca

Appendix: Derivation of Eq. (18)

(27)


About this paper

Cite this paper

Gnecco, G., Nutarelli, F. (2019). Optimal Trade-Off Between Sample Size and Precision of Supervision for the Fixed Effects Panel Data Model. In: Nicosia, G., Pardalos, P., Umeton, R., Giuffrida, G., Sciacca, V. (eds) Machine Learning, Optimization, and Data Science. LOD 2019. Lecture Notes in Computer Science, vol 11943. Springer, Cham. https://doi.org/10.1007/978-3-030-37599-7_44


DOI: https://doi.org/10.1007/978-3-030-37599-7_44
Published: 03 January 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37598-0
Online ISBN: 978-3-030-37599-7
eBook Packages: Computer Science, Computer Science (R0)
