Penalized multiply robust estimation in high-order autoregressive processes with missing explanatory variables

https://doi.org/10.1016/j.jmva.2021.104867

Abstract

Multiply robust estimation with missing data is an important field in statistics; it incorporates information by weighting multiple candidate models and relaxes the requirements on model specification. Nevertheless, in high-dimensional cases a more flexible hypothesis is the “true structure” beyond the correct model. In this paper, we study parametric estimation for high-order autoregressive processes with a lagged-dependent binary explanatory variable that is missing at random (MAR). Based on the “true structure” specification, we propose a penalized multiply robust estimation equation in the presence of multiple candidate model sets. The selection criterion for the optimal tuning parameters is modified for model identification with incomplete data. We show that our tuning criterion asymptotically distinguishes the true nonzero autoregressive coefficients from zero, and that the estimators of the population parameters enjoy the oracle properties. Simulations are carried out, and we apply the method to fit a model to the U.S. Industrial Production Index data and produce out-of-sample forecasts to confirm the rationality of the results.

Introduction

Time series models with explanatory variables have received considerable attention because they describe dynamic linear relationships flexibly. Let $\{\varepsilon_t, t \in \mathbb{Z}\}$ be a sequence of independent and identically distributed (i.i.d.) random variables with mean zero and variance $\sigma^2$ ($\sigma > 0$). The $p$-order autoregressive process with a binary explanatory variable is defined as

$$X_t = \alpha^\top \mathbf{X}_{t-1} + \beta Z_t + \varepsilon_t, \quad t \in \mathbb{Z}, \tag{1}$$

where $\mathbf{X}_{t-1} = (X_{t-1}, \ldots, X_{t-p}, 1)^\top$, and $\alpha = (\alpha_1, \ldots, \alpha_p, \alpha_{p+1})^\top$ and $\beta$ are the coefficients to be estimated. $Z_t$ is a one-dimensional binary explanatory variable satisfying

$$Z_t \mid \mathcal{F}_t \sim B(w_t(\gamma)), \quad t \in \mathbb{Z}, \tag{2}$$

where $w_t(\gamma)$ has a parametric form with parameter $\gamma$ and $\mathcal{F}_t = \sigma(X_0, X_1, \ldots, X_t)$. Model (1) with such $Z_t$ can characterize a number of realistic problems. For instance, clinicians need to follow up an indicator for a patient over a period of time, and this indicator may be affected by the status of a complication at that time. In this case, $X_t$ denotes the outcome and $Z_t$ stands for the actual incidence for the individual at time $t$. Condition (2) explains the potential relevance between $Z_t$ and $X_t$: the complication is influenced by the state of the patient at earlier times. Some economic time series with a binary covariate of this type exhibit a similar relationship. The following diagram depicts this relationship for $p=1$:

[Diagram relating the sequences $\ldots, X_{t-1}, X_t, X_{t+1}, X_{t+2}, \ldots$ and $\ldots, Z_{t-1}, Z_t, Z_{t+1}, Z_{t+2}, \ldots$]
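To make model (1) and condition (2) concrete, the following is a minimal simulation sketch. It rests on assumptions that are not taken from the paper: the success probability $w_t(\gamma)$ is taken to be logistic in $X_{t-1}$ only, the start-up values are zero, and the function name `simulate_arx` and its arguments are hypothetical.

```python
import numpy as np

def simulate_arx(n, alpha, beta, gamma, sigma=1.0, burn_in=200, seed=0):
    """Simulate X_t = alpha'(X_{t-1},...,X_{t-p},1) + beta*Z_t + eps_t with binary Z_t.

    ASSUMPTION (illustrative, not from the paper): Z_t is Bernoulli with
    success probability w_t = 1 / (1 + exp(-(gamma[0] + gamma[1] * X_{t-1}))).
    """
    rng = np.random.default_rng(seed)
    alpha = np.asarray(alpha, dtype=float)
    p = alpha.size - 1                  # last entry of alpha is the intercept
    total = n + burn_in
    x = np.zeros(total + p)             # p zero start-up values (assumption)
    z = np.zeros(total + p, dtype=int)
    for t in range(p, total + p):
        lags = x[t - p:t][::-1]         # (X_{t-1}, ..., X_{t-p})
        w_t = 1.0 / (1.0 + np.exp(-(gamma[0] + gamma[1] * x[t - 1])))
        z[t] = rng.binomial(1, w_t)     # binary explanatory variable
        x[t] = alpha[:p] @ lags + alpha[p] + beta * z[t] + sigma * rng.normal()
    return x[-n:], z[-n:]               # drop start-up and burn-in values

# Example with p = 1: slope 0.5, intercept 0, beta = 1 (hypothetical values).
x, z = simulate_arx(300, alpha=[0.5, 0.0], beta=1.0, gamma=[0.0, 0.5])
```

The burn-in period is discarded so that the returned series is approximately a draw from the stationary regime rather than from the arbitrary zero start.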

Let $\theta = (\alpha^\top, \beta)^\top$ denote the vector of all parameters in the model. There is a large literature on estimating $\theta$ from complete data. For example, Crowder [2] established the consistency and asymptotic normality of the conditional least-squares (CLS) estimator. Zhao and Wang [26] investigated the estimation procedure and properties of the coefficients via empirical likelihood (EL). Yang and Wang [23] studied the related Bayesian inference based on independently and identically distributed explanatory variables.

In practice, it is common to consider high-order processes when modeling. Variable selection is fundamental for shrinking the coefficients of insignificant orders to 0 and thereby improving the overall prediction accuracy. For instance, Huang and Yang [10] advanced a lag-selection method for non-linear additive autoregressive models, and Wang et al. [19] proposed a modified LASSO for linear regression with autoregressive errors (REGAR). Zhang et al. [25] studied inference for penalized conditional least squares estimation in generalized integer-valued autoregressive (GINAR) processes. Wang et al. [21] applied penalized conditional maximum likelihood estimation to the Poisson autoregression (PAR) model, and Kwon et al. [11] studied the selection criterion for the tuning parameter with the adaptive LASSO (ALASSO) penalty function in autoregressive (AR) processes.
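As an illustration of how a penalty shrinks the coefficients of insignificant lags exactly to zero, the sketch below fits a generic adaptive LASSO to an AR process by coordinate descent. It is a textbook-style sketch for complete data, not the method of any of the cited papers; the function names, the AR(4) coefficients and the value of `lam` are hypothetical choices, and the series is assumed to have mean zero so that no intercept is needed.

```python
import numpy as np

def soft_threshold(v, thr):
    """Scalar soft-thresholding operator sign(v) * max(|v| - thr, 0)."""
    return np.sign(v) * max(abs(v) - thr, 0.0)

def adaptive_lasso(design, y, lam, n_iter=500):
    """Minimise (1/2n)||y - design @ b||^2 + lam * sum_j |b_j| / |b_ols_j|."""
    n, d = design.shape
    ols = np.linalg.lstsq(design, y, rcond=None)[0]
    weights = 1.0 / np.maximum(np.abs(ols), 1e-8)    # adaptive weights from OLS
    coef = ols.copy()
    col_ms = (design ** 2).sum(axis=0) / n           # mean square of each column
    for _ in range(n_iter):
        for j in range(d):
            # residual with the contribution of coordinate j added back
            partial = y - design @ coef + design[:, j] * coef[j]
            rho = design[:, j] @ partial / n
            coef[j] = soft_threshold(rho, lam * weights[j]) / col_ms[j]
    return coef

# Hypothetical AR(4) example in which only lags 1 and 3 are active.
rng = np.random.default_rng(0)
n, p = 2000, 4
true_coef = np.array([0.5, 0.0, -0.3, 0.0])
x = np.zeros(n + p)
for t in range(p, n + p):
    x[t] = true_coef @ x[t - p:t][::-1] + rng.normal()
y = x[p:]
# column k holds the lag-k values X_{t-k}
design = np.column_stack([x[p - k:n + p - k] for k in range(1, p + 1)])
coef_hat = adaptive_lasso(design, y, lam=0.01)
```

Because the penalty weight of lag $j$ is the reciprocal of its unpenalized estimate, lags whose least-squares estimates are close to zero receive large penalties and tend to be removed, while clearly nonzero lags are shrunk only slightly.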

Missing data are frequently encountered in empirical studies. Ignoring drop-out information often destroys the representativeness of the sample and leads to biased conclusions. Let $\{(x_t, z_t): t \in \{1, \ldots, n\}\}$ be a finite sample from (1), where $x_t$ is always available and $z_t$ is subject to missingness, with a response indicator $\delta_t$ such that $\delta_t = 1$ if $z_t$ is observed and $\delta_t = 0$ otherwise. The response mechanism is denoted by

$$\pi_t(\phi) = \Pr(\delta_t = 1 \mid \mathcal{F}_t, Z_t; \phi) \tag{3}$$

with an unknown parameter vector $\phi$. When $Z_t \perp \delta_t \mid \mathcal{F}_t$, the mechanism is called missing at random (MAR) (see Little and Rubin [13]), and (3) can be rewritten as $\Pr(\delta_t = 1 \mid \mathcal{F}_t; \phi)$. One of the most general estimation methods for MAR data, which weights the complete cases by the inverse of the selection probabilities, is the inverse probability weighted (IPW) method (Rosenbaum and Rubin [16]). Beyond IPW estimation, Robins et al. [15] proposed a doubly robust estimator of $\theta$ obtained by solving

$$\frac{1}{n} \sum_{t=1}^{n} \left[ \frac{\delta_t}{\pi_t(\phi)} q_t(\theta) + \left(1 - \frac{\delta_t}{\pi_t(\phi)}\right) E\{q_t(\theta) \mid \mathcal{F}_t\} \right] = 0, \tag{4}$$

where $q_t(\theta) = \partial U_t(\theta)/\partial\theta$, $U_t(\theta) = \{X_t - E(X_t \mid \mathcal{G}_{t-1})\}^2$ and $\mathcal{G}_{t-1}$ is the $\sigma$-field of events generated by $\{X_0, X_1, Z_1, \ldots, X_{t-1}, Z_{t-1}, Z_t\}$. In the independent and identically distributed (i.i.d.) case, Chan [1], Han [7], and Duan and Wang [4] separately generalized (4) to a multiply robust (MR) estimator, which remains valid if either $\pi_t(\phi)$ or $w_t(\gamma)$ is correctly specified within its candidate model set. Li et al. [12] discussed the relevance between double robustness and multiple robustness, and proposed a model mixing method for multiple candidate models to improve the performance of the doubly robust estimation.
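The IPW idea can be sketched for the autoregressive model as follows. Because the conditional mean of $X_t$ is linear in the parameters, the IPW equation $\sum_t \{\delta_t/\pi_t\}\, q_t(\theta) = 0$ has a closed-form weighted least-squares root. This is only the plain IPW estimator, not the doubly or multiply robust one, and the sketch makes several assumptions that are not from the paper: the true response probabilities are treated as known, they are logistic in $X_t$ (so the mechanism is MAR), $w_t$ is logistic in $X_{t-1}$, and all names and numeric values are hypothetical.

```python
import numpy as np

def ipw_cls(x, z, delta, pi, p):
    """IPW conditional least squares: root of sum_t (delta_t / pi_t) q_t(theta) = 0."""
    n = len(x)
    rows = np.arange(p, n)
    lags = np.column_stack([x[rows - k] for k in range(1, p + 1)])
    z_obs = np.where(delta[rows] == 1, z[rows], 0)   # a missing z_t never enters
    design = np.column_stack([lags, np.ones(rows.size), z_obs])
    w = delta[rows] / pi[rows]                       # zero weight when z_t is missing
    a = design.T @ (w[:, None] * design)
    b = design.T @ (w * x[rows])
    return np.linalg.solve(a, b)                     # (alpha_1..alpha_p, intercept, beta)

# Hypothetical data: p = 1, slope 0.5, intercept 0, beta = 1.
rng = np.random.default_rng(1)
n = 5000
x = np.zeros(n)
z = np.zeros(n, dtype=int)
for t in range(1, n):
    w_t = 1.0 / (1.0 + np.exp(-0.5 * x[t - 1]))      # assumed form of w_t
    z[t] = rng.binomial(1, w_t)
    x[t] = 0.5 * x[t - 1] + 0.0 + 1.0 * z[t] + rng.normal()

pi = 1.0 / (1.0 + np.exp(-(1.0 + 0.5 * x)))          # assumed MAR response model
delta = rng.binomial(1, pi)                          # 1 if z_t is observed
theta_hat = ipw_cls(x, z, delta, pi, p=1)
```

In practice the response probabilities are unknown and must be estimated from a working model, which is where misspecification, and hence the motivation for double and multiple robustness, enters.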

However, it is difficult to determine the exact models for $\pi_t(\phi)$ and $w_t(\gamma)$ from limited implicit information, especially in high-order processes. Suppose that $P = \{\pi_t^{(j)}(\phi^{(j)}): j \in \{1, \ldots, J\}\}$ and $W = \{w_t^{(k)}(\gamma^{(k)}): k \in \{1, \ldots, K\}\}$ are the nonnested multiple candidate model sets for $\pi_t(\phi)$ and $w_t(\gamma)$, with corresponding parameter vectors $\phi^{(j)}$ and $\gamma^{(k)}$. In general, we can formulate a model with a large number of lag-dependent variables to attenuate modeling biases; this is the “true structure” model (see Definition 1). To deal with the problem, we consider a variable selection approach with determined penalties. One can prove that the estimators of the “true structure” models obey the asymptotic theory of White [22] and also enjoy the oracle properties. Furthermore, we study parametric estimation for (1) and (2) with missing $Z_t$. Under the MAR mechanism, we propose a penalized multiply robust estimation equation (PMREE) with a sparse $\theta$. The proposed method identifies the significant lags in strictly stationary autoregressive models and outperforms the general multiply robust estimation. To improve the accuracy of the result, we develop a modified tuning parameter selection criterion to recover the information of the missing part. We also briefly show that our selection criterion is consistent.

The rest of the paper is organized as follows. In Section 2.1 we discuss the penalized estimation for candidate models, which is $\sqrt{n}$-consistent for the extremum of the Kullback–Leibler Information Criterion (KLIC). In Section 2.2 we propose the PMREE for $\theta$ by means of the empirical probability mass and LS weights. The multiply robust property and the oracle properties of the PMREE estimator are derived in Sections 3.1 and 3.2, respectively. The selection criterion for the tuning parameter is modified in the multiply robust framework in Section 3.3. Simulation studies are conducted in Section 4.1 to demonstrate the validity of the method, and in Section 4.2 we fit the monthly U.S. Industrial Production Index time series data to illustrate the usability of the result. Concluding remarks are presented in Section 5. Related assumptions and technical proofs are given in the Appendix, and all tables and figures of the numerical studies are provided in the Supplementary Material.

Estimation for multiply candidate models

The multiply robust property implies that it is necessary to estimate the parameters of every model in $P$ and $W$, even though some of them are incorrectly specified. Without loss of generality, suppose that the “true structure” models in $P$ and $W$ are $\pi_t^{(1)}(\phi)$ and $w_t^{(1)}(\gamma)$, and that the “true values” of the parameters are $\phi_0^{(1)}$ and $\gamma_0^{(1)}$, respectively. We first introduce the concept of the “true structure” specification in the following definition.

Definition 1

Let $g(X; \theta_0)$ be the true density function of $X$ with a parameter vector $\theta_0$ on the

Sample properties and oracle properties

In this section, we explore the asymptotic theory for the PMREE estimator. We prove that $\hat{\theta}_n$ from (14) is multiply robust, i.e., $\hat{\theta}_n$ converges to the “true value” $\theta_0$ as long as either $P$ or $W$ covers the “true structure” model. The oracle properties of the PMREE estimator are discussed in the variable selection framework. Based on the existence theorem of the original function, the following theorem is sufficient for the large sample properties of the estimator.

Theorem 3

For a given penalty function $p_\lambda(\cdot)$, if (

Simulation

To assess the performance of the parameter estimation and the variable selection described in the previous sections, we conduct simulations for model (1) with $p=6$. The simulations are based on 1000 independent Monte Carlo replications, and the sample size $n$ is either 200 or 400.

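First, observations are generated by $X_t = \alpha^\top \mathbf{X}_{t-1} + \beta Z_t + \varepsilon_t$ with $\varepsilon_t \sim N(0,1)$ and $X_6 = 0$, where the true parameter is $\theta_0 = (0.45, 0, 0, 0, 0, 0.45, 0, 1)$. For a given $\mathbf{X}_{t-1}$, the success probability of the Bernoulli distribution $Z_t \mid \mathcal{F}_{t-1}$ is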

Conclusion

This paper investigates a model selection and parametric estimation for high-order autoregressive processes with a binary missing explanatory variable. The specifications for potential models are extended to the “true structure” case to improve the multiple robustness. The method effectively eliminates insignificant lags and identifies a parsimonious autoregressive process in high-dimensional modeling. Also, large sample properties of the PMREE estimator are studied, simulations and an

CRediT authorship contribution statement

Wei Xiong: Writing of the manuscript, Development of methodology, Execution of simulation work. Dehui Wang: Development of methodology, Execution of simulation work. Dianliang Deng: Writing of the revised version of the manuscript, Query and analysis of the empirical example, Supported Wei Xiong’s work by providing research venue. Xinyang Wang: Designing of the variable selection part in simulation studies, Writing of the manuscript. Wanying Zhang: Organization, Writing of the manuscript.

Acknowledgments

We thank the Editor, Associate Editor and referees for their insightful comments. We also thank Prof. Kai Yang for helpful discussions. The research is supported by the National Natural Science Foundation of China (Nos. 11871028, 11731015, 11901053, 12101417), the Educational Commission of Liaoning Province of China (No. LJKZ1003), the China Scholarship Council and the Natural Sciences and Engineering Research Council of Canada (NSERC).

All authors approved the version of the manuscript to be published.

References (27)

  • Han P., Calibration and multiple robustness when data are missing not at random, Statist. Sinica (2018)
  • Han P. et al., Estimation with missing data: beyond double robustness, Biometrika (2013)
  • Huang J.Z. et al., Identification of non-linear additive autoregressive models, J. R. Stat. Soc. Ser. B Stat. Methodol. (2004)