Optimal Trade-Off Between Sample Size and Precision of Supervision for the Fixed Effects Panel Data Model

  • Conference paper
Machine Learning, Optimization, and Data Science (LOD 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11943)

Abstract

We investigate a modification of the classical fixed effects panel data model (a linear regression model able to represent unobserved heterogeneity in the data), in which one can additionally control the conditional variance of the output given the input by varying the cost associated with the supervision of each training example. Assuming an upper bound on the total supervision cost, we analyze and optimize the trade-off between the sample size and the precision of supervision (the reciprocal of the conditional variance of the output). To do so, we formulate and solve a suitable optimization problem, based on a large-sample approximation of the output of the classical algorithm used to estimate the parameters of the fixed effects panel data model. Considering a specific functional form for that precision, we prove that, depending on the “returns to scale” of the precision with respect to the supervision cost per example, in some cases “many but bad” examples provide a smaller generalization error than “few but good” ones, whereas in other cases the opposite occurs. These results extend to the fixed effects panel data model those we obtained in recent works for a simpler linear regression model. We conclude by discussing possible applications of our results, and extensions of the proposed optimization framework to other panel data models.
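The trade-off described in the abstract can be illustrated with a small numerical sketch. The abstract does not state the functional form of the precision, so the power law used here (conditional variance `k * c**(-alpha)` for a supervision cost per example `c`) is an assumption made only for illustration. The names `error_proxy` and `best_cost`, the budget values, and the use of "variance divided by the number of affordable examples" as a stand-in for the large-sample generalization error are likewise hypothetical.

```python
import math

def error_proxy(c, C, N, alpha, k=1.0):
    """Illustrative proxy for the large-sample error: noise variance / number of examples.

    Assumption (not from the source): conditional variance = k * c**(-alpha).
    """
    T = math.floor(C / (N * c))       # observations per unit affordable under budget C
    if T < 1:
        raise ValueError("budget too small for this cost per example")
    variance = k * c ** (-alpha)      # assumed model: precision = c**alpha / k
    return variance / (N * T)

def best_cost(costs, C, N, alpha):
    """Return the candidate cost per example with the smallest proxy error."""
    return min(costs, key=lambda c: error_proxy(c, C, N, alpha))

costs = [1.0, 2.0, 4.0]               # hypothetical grid between c_min = 1 and c_max = 4
C, N = 1200.0, 10                     # hypothetical total budget and number of units

cheap = best_cost(costs, C, N, alpha=0.5)   # decreasing returns to scale
costly = best_cost(costs, C, N, alpha=2.0)  # increasing returns to scale
print(cheap, costly)
```

Under this assumed model, `alpha < 1` favors the cheapest supervision (“many but bad” examples), `alpha > 1` favors the most expensive (“few but good” examples), and at `alpha = 1` the proxy does not depend on `c`.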

Notes

  1. For simplicity of exposition, here the model is not presented in its most general form (e.g., the disturbances \(\varepsilon _{n,t}\)’s are simply assumed to be mutually independent).

  2. The case of finite T and large N is of more interest for microeconometrics, and will be investigated in future research.

  3. E.g., if the \(\varvec{x}_{n,t}\)’s are independent, identically distributed, and have finite moments up to the order 4.

  4. We recall that a sequence of random real matrices \(\varvec{M}_{T}\), \(T=1,2,\ldots \), converges in probability to the real matrix \(\varvec{M}\) if, for every \(\varepsilon >0\), \(\mathrm{Prob} \left( \left\| \varvec{M}_{T} - \varvec{M}\right\| > \varepsilon \right) \) (where \(\Vert \cdot \Vert \) is an arbitrary matrix norm) tends to 0 as T tends to \(+\infty \). In this case, one writes \(\mathrm{plim}_{T \rightarrow +\infty } \varvec{M}_T=\varvec{M}\).

  5. The existence of the probability limit (20) and the assumed positive definiteness of the matrix \(\varvec{A}_N\) guarantee that the matrix \(\sum _{n=1}^N \varvec{X}_n' \varvec{Q} \varvec{X}_n=\sum _{n=1}^N \varvec{X}_n' \varvec{Q}' \varvec{Q} \varvec{X}_n\) (see Sect. 2) is invertible with probability near 1 for large T.

  6. This is obtained by also taking into account that, as a consequence of the Continuous Mapping Theorem [4, Theorem 7.33], the probability limit of the product of two random variables equals the product of their probability limits, when the latter two exist.

  7. By an argument similar to that used in [6], one can show that the approximation is exact, at optimality, when C is a multiple of both \(N c_\mathrm{min}\) and \(N c_\mathrm{max}\).
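The definition of convergence in probability recalled in note 4 can be checked numerically with a rough Monte Carlo sketch. The setting here is an illustrative choice, not taken from the paper: \(\varvec{M}_T\) is the sample second-moment matrix of T independent standard Gaussian vectors in dimension 2, so that \(\varvec{M}\) is the identity, and the norm is the Frobenius norm. All function names are hypothetical.

```python
import math
import random

def sample_second_moment(T, rng):
    """Return M_T = (1/T) * sum_t x_t x_t' for x_t ~ N(0, I_2), as a 2x2 nested list."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(T):
        x = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        for i in range(2):
            for j in range(2):
                m[i][j] += x[i] * x[j] / T
    return m

def frobenius_distance(a, b):
    """Frobenius norm of the difference of two 2x2 matrices."""
    return math.sqrt(sum((a[i][j] - b[i][j]) ** 2
                         for i in range(2) for j in range(2)))

def exceed_probability(T, eps, trials, rng):
    """Monte Carlo estimate of Prob(||M_T - M|| > eps), with M = I_2."""
    identity = [[1.0, 0.0], [0.0, 1.0]]
    hits = sum(
        frobenius_distance(sample_second_moment(T, rng), identity) > eps
        for _ in range(trials)
    )
    return hits / trials

rng = random.Random(0)                # fixed seed so the run is reproducible
p_small_T = exceed_probability(T=20, eps=0.3, trials=200, rng=rng)
p_large_T = exceed_probability(T=2000, eps=0.3, trials=200, rng=rng)
print(p_small_T, p_large_T)
```

In this sketch the estimated probability of a deviation larger than `eps` shrinks as T grows, which is the behavior the definition requires for every fixed \(\varepsilon >0\).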

References

  1. Andreß, H.-J., Golsch, K., Schmidt, A.W.: Applied Panel Data Analysis for Economic and Social Surveys. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-32914-2

  2. Athey, S., Imbens, G.: Recursive partitioning for heterogeneous causal effects. Proc. Nat. Acad. Sci. 113, 7353–7360 (2016)

  3. Chen, C.-H., Lee, L.H.: Stochastic Simulation Optimization: An Optimal Computing Budget Allocation. World Scientific, Singapore (2010)

  4. Florescu, I.: Probability and Stochastic Processes. Wiley, Hoboken (2015)

  5. Frees, E.W.: Longitudinal and Panel Data: Analysis and Applications in the Social Sciences. Cambridge University Press, Cambridge (2004)

  6. Gnecco, G., Nutarelli, F.: On the trade-off between number of examples and precision of supervision in regression. In: Oneto, L., Navarin, N., Sperduti, A., Anguita, D. (eds.) INNSBDDL 2019. PINNS, vol. 1, pp. 1–6. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-16841-4_1

  7. Greene, W.H.: Econometric Analysis. Pearson Education, London (2003)

  8. Groves, R.M., Fowler Jr., F.J., Couper, M.P., Lepkowski, J.M., Singer, E., Tourangeau, R.: Survey Methodology. Wiley, Hoboken (2004)

  9. Nguyen, H.T., Kosheleva, O., Kreinovich, V., Ferson, S.: Trade-off between sample size and accuracy: case of measurements under interval uncertainty. Int. J. Approx. Reason. 50, 1164–1176 (2009)

  10. Ruud, P.A.: An Introduction to Classical Econometric Theory. Oxford University Press, Oxford (2000)

  11. Vapnik, V.N.: Statistical Learning Theory. Wiley, Hoboken (1998)

  12. Varian, H.R.: Big Data: new tricks for econometrics. J. Econ. Perspect. 28, 3–38 (2014)

  13. Wooldridge, J.M.: Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge (2002)

Corresponding author

Correspondence to Giorgio Gnecco.

Appendix: Derivation of Eq. (18)


(27)

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Gnecco, G., Nutarelli, F. (2019). Optimal Trade-Off Between Sample Size and Precision of Supervision for the Fixed Effects Panel Data Model. In: Nicosia, G., Pardalos, P., Umeton, R., Giuffrida, G., Sciacca, V. (eds) Machine Learning, Optimization, and Data Science. LOD 2019. Lecture Notes in Computer Science, vol 11943. Springer, Cham. https://doi.org/10.1007/978-3-030-37599-7_44

  • DOI: https://doi.org/10.1007/978-3-030-37599-7_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-37598-0

  • Online ISBN: 978-3-030-37599-7

  • eBook Packages: Computer Science, Computer Science (R0)
