Abstract
We investigate a modification of the classical fixed effects panel data model (a linear regression model able to represent unobserved heterogeneity in the data), in which one can additionally control the conditional variance of the output given the input by varying the cost associated with the supervision of each training example. Assuming an upper bound on the total supervision cost, we analyze and optimize the trade-off between the sample size and the precision of supervision (the reciprocal of the conditional variance of the output). To this end, we formulate and solve a suitable optimization problem, based on a large-sample approximation of the output of the classical algorithm used to estimate the parameters of the fixed effects panel data model. Considering a specific functional form for that precision, we prove that, depending on the “returns to scale” of the precision with respect to the supervision cost per example, in some cases “many but bad” examples provide a smaller generalization error than “few but good” ones, whereas in other cases the opposite occurs. These results extend to the fixed effects panel data model those we obtained in recent works for a simpler linear regression model. We conclude by discussing possible applications of our results and extensions of the proposed optimization framework to other panel data models.
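The trade-off described in the abstract can be illustrated with a minimal numerical sketch. It is not the chapter's optimization problem: the functional form of the precision is not stated on this page, so the sketch assumes a power-law noise variance \(k c^{-\alpha}\), a number of observations per unit \(T = C/(N c)\) under the total budget \(C\), and a generalization-error proxy proportional to variance divided by \(T\). The names `proxy_error` and `best_cost` and all numerical values are hypothetical.

```python
import numpy as np

def proxy_error(c, alpha, C=1000.0, N=10, k=1.0):
    """Illustrative large-sample proxy for the generalization error.

    ASSUMPTIONS (not taken from the chapter): the noise variance is
    k * c**(-alpha), and the budget C buys T = C / (N * c) observations
    for each of the N units at supervision cost c per example.
    """
    T = C / (N * c)                 # observations per unit (treated as continuous)
    variance = k * c ** (-alpha)    # assumed conditional variance of the output
    return variance / T             # proportional to c**(1 - alpha)

def best_cost(alpha, c_min=1.0, c_max=10.0, num=10):
    """Return the cost per example on a grid that minimizes the proxy error."""
    grid = np.linspace(c_min, c_max, num)
    return grid[np.argmin(proxy_error(grid, alpha))]

# alpha < 1 mimics decreasing returns to scale of the precision,
# alpha > 1 mimics increasing returns to scale.
cheap = best_cost(alpha=0.5)    # lowest cost wins: "many but bad" examples
costly = best_cost(alpha=1.5)   # highest cost wins: "few but good" examples
```

Under these assumptions the proxy is monotone in \(c\), so the minimizer sits at one end of the admissible cost interval, and which end depends only on whether \(\alpha\) is below or above 1.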
Notes
1. For simplicity of exposition, the model is not presented here in its most general form (e.g., the disturbances \(\varepsilon _{n,t}\) are simply assumed to be mutually independent).
2. The case of finite T and large N is of more interest for microeconometrics, and will be investigated in future research.
3. E.g., if the \(\varvec{x}_{n,t}\) are independent, identically distributed, and have finite moments up to order 4.
4. We recall that a sequence of random real matrices \(\varvec{M}_{T}\), \(T=1,2,\ldots \), converges in probability to the real matrix \(\varvec{M}\) if, for every \(\varepsilon >0\), \(\mathrm{Prob} \left( \left\| \varvec{M}_{T} - \varvec{M}\right\| > \varepsilon \right) \) (where \(\Vert \cdot \Vert \) is an arbitrary matrix norm) tends to 0 as T tends to \(+\infty \). In this case, one writes \(\mathrm{plim}_{T \rightarrow +\infty } \varvec{M}_T=\varvec{M}\).
5. The existence of the probability limit (20) and the assumed positive definiteness of the matrix \(\varvec{A}_N\) guarantee that the matrix \(\sum _{n=1}^N \varvec{X}_n' \varvec{Q} \varvec{X}_n=\sum _{n=1}^N \varvec{X}_n' \varvec{Q}' \varvec{Q} \varvec{X}_n\) (see Sect. 2) is invertible with probability close to 1 for large T.
6. This is obtained by also taking into account that, as a consequence of the Continuous Mapping Theorem [4, Theorem 7.33], the probability limit of the product of two random variables equals the product of their probability limits, when the latter two exist.
7. By an argument similar to that used in [6], one can show that the approximation is exact, at optimality, when C is a multiple of both \(N c_\mathrm{min}\) and \(N c_\mathrm{max}\).
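The matrix \(\sum _{n=1}^N \varvec{X}_n' \varvec{Q} \varvec{X}_n\) in the notes above is the one inverted by the standard within (fixed effects) estimator. The sketch below assumes, as in textbook treatments, that \(\varvec{Q}\) is the symmetric idempotent demeaning matrix \(\varvec{I}_T - \frac{1}{T}\varvec{1}\varvec{1}'\); Sect. 2 is not reproduced on this page, so this definition, the function names, and the simulated data are assumptions made for illustration only.

```python
import numpy as np

def within_matrix(T):
    """Demeaning matrix Q = I_T - (1/T) 1 1' (assumed definition of Q)."""
    return np.eye(T) - np.ones((T, T)) / T

def fixed_effects_estimate(X, y):
    """Within estimator for X of shape (N, T, p) and y of shape (N, T)."""
    N, T, p = X.shape
    Q = within_matrix(T)
    A = np.zeros((p, p))
    b = np.zeros(p)
    for n in range(N):
        QX = Q @ X[n]          # time-demeaned regressors of unit n
        A += X[n].T @ QX       # accumulates sum_n X_n' Q X_n
        b += QX.T @ y[n]       # accumulates sum_n X_n' Q y_n (Q is symmetric)
    return np.linalg.solve(A, b)

def simulate_panel(N, T, beta, noise_std, rng):
    """Hypothetical data generator: y = X beta + unit effect + noise."""
    p = beta.shape[0]
    X = rng.normal(size=(N, T, p))
    eta = rng.normal(size=(N, 1))   # one unobserved fixed effect per unit
    y = X @ beta + eta + noise_std * rng.normal(size=(N, T))
    return X, y

rng = np.random.default_rng(0)
beta = np.array([1.0, -2.0])
# With zero noise, demeaning removes the fixed effects and beta is recovered.
X, y = simulate_panel(N=5, T=50, beta=beta, noise_std=0.0, rng=rng)
beta_hat = fixed_effects_estimate(X, y)
```

Because \(\varvec{Q}\) is symmetric and idempotent, \(\varvec{X}_n' \varvec{Q} \varvec{X}_n = \varvec{X}_n' \varvec{Q}' \varvec{Q} \varvec{X}_n\), which is the identity used in note 5; increasing `noise_std` shows how the estimate degrades as the precision of supervision drops.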
References
1. Andreß, H.-J., Golsch, K., Schmidt, A.W.: Applied Panel Data Analysis for Economic and Social Surveys. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-32914-2
2. Athey, S., Imbens, G.: Recursive partitioning for heterogeneous causal effects. Proc. Nat. Acad. Sci. 113, 7353–7360 (2016)
3. Chen, C.-H., Lee, L.H.: Stochastic Simulation Optimization: An Optimal Computing Budget Allocation. World Scientific, Singapore (2010)
4. Florescu, I.: Probability and Stochastic Processes. Wiley, Hoboken (2015)
5. Frees, E.W.: Longitudinal and Panel Data: Analysis and Applications in the Social Sciences. Cambridge University Press, Cambridge (2004)
6. Gnecco, G., Nutarelli, F.: On the trade-off between number of examples and precision of supervision in regression. In: Oneto, L., Navarin, N., Sperduti, A., Anguita, D. (eds.) INNSBDDL 2019. PINNS, vol. 1, pp. 1–6. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-16841-4_1
7. Greene, W.H.: Econometric Analysis. Pearson Education, London (2003)
8. Groves, R.M., Fowler Jr., F.J., Couper, M.P., Lepkowski, J.M., Singer, E., Tourangeau, R.: Survey Methodology. Wiley, Hoboken (2004)
9. Nguyen, H.T., Kosheleva, O., Kreinovich, V., Ferson, S.: Trade-off between sample size and accuracy: case of measurements under interval uncertainty. Int. J. Approx. Reason. 50, 1164–1176 (2009)
10. Ruud, P.A.: An Introduction to Classical Econometric Theory. Oxford University Press, Oxford (2000)
11. Vapnik, V.N.: Statistical Learning Theory. Wiley, Hoboken (1998)
12. Varian, H.R.: Big Data: new tricks for econometrics. J. Econ. Perspect. 28, 3–38 (2014)
13. Wooldridge, J.M.: Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge (2002)
Appendix: Derivation of Eq. (18)
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Gnecco, G., Nutarelli, F. (2019). Optimal Trade-Off Between Sample Size and Precision of Supervision for the Fixed Effects Panel Data Model. In: Nicosia, G., Pardalos, P., Umeton, R., Giuffrida, G., Sciacca, V. (eds) Machine Learning, Optimization, and Data Science. LOD 2019. Lecture Notes in Computer Science, vol 11943. Springer, Cham. https://doi.org/10.1007/978-3-030-37599-7_44
DOI: https://doi.org/10.1007/978-3-030-37599-7_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37598-0
Online ISBN: 978-3-030-37599-7
eBook Packages: Computer Science, Computer Science (R0)