Penalized multiply robust estimation in high-order autoregressive processes with missing explanatory variables
Introduction
Time series models with mixed explanatory variables have received considerable attention because they describe dynamic linear relationships flexibly. Assume that be a sequence of independent identically distributed (i.i.d.) random variables with mean zero and variance , the -order autoregressive processes with a binary explanatory variable are defined as , and are the coefficients that we are interested in estimating. is a one-dimensional binary explanatory variable which satisfies where has a parametric form with the parameter , . Model (1) with such can better characterize a number of realistic problems. For instance, clinicians may need to follow an indicator for a patient over a period of time, and that indicator may be affected by the status of a complication at each time point. In this case, denotes the outcome and stands for the actual incidence of the complication for the individual at time . Condition (2) explains the potential dependence between and , that is, the complication is influenced by the state of the patient at earlier times. Some economic time series with this type of binary covariate exhibit a similar dependence. The following legend simply depicts this relationship with :
Let denote the vector of all parameters in the model. A large literature addresses the estimation of this parameter vector from full data. For example, Crowder [2] established the consistency and asymptotic normality of the conditional least-squares (CLS) estimator. Zhao and Wang [26] investigated the estimation of the coefficient and its properties via empirical likelihood (EL). Yang and Wang [23] studied the related Bayesian inference based on independently and identically distributed explanatory variables.
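As a rough illustration of conditional least squares with full data, the sketch below simulates a first-order process with a binary covariate and recovers its coefficients. The model form Y_t = phi*Y_{t-1} + beta*X_t + eps_t, the logistic link for X_t, and all names and parameter values are assumptions made for this example, because equations (1) and (2) are not reproduced here.

```python
import math
import random


def simulate(n, phi, beta, a, b, seed=0, burn_in=200):
    """Simulate Y_t = phi*Y_{t-1} + beta*X_t + eps_t with binary X_t.

    ASSUMPTION: X_t ~ Bernoulli(expit(a + b*Y_{t-1})) is an illustrative
    logistic link, not necessarily the parametric form used in the paper.
    """
    rng = random.Random(seed)
    y_prev = 0.0
    ys, xs = [], []
    for t in range(n + burn_in):
        p = 1.0 / (1.0 + math.exp(-(a + b * y_prev)))
        x = 1 if rng.random() < p else 0
        y = phi * y_prev + beta * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:  # drop the burn-in so the start value has little effect
            ys.append(y)
            xs.append(x)
        y_prev = y
    return ys, xs


def cls_fit(ys, xs):
    """Conditional least squares for (phi, beta): regress Y_t on (Y_{t-1}, X_t).

    Solves the 2x2 normal equations in closed form (no intercept term).
    """
    s_yy = s_xx = s_yx = s_y = s_x = 0.0
    for t in range(1, len(ys)):
        lag, x, y = ys[t - 1], xs[t], ys[t]
        s_yy += lag * lag
        s_xx += x * x
        s_yx += lag * x
        s_y += lag * y
        s_x += x * y
    det = s_yy * s_xx - s_yx * s_yx
    phi_hat = (s_y * s_xx - s_x * s_yx) / det
    beta_hat = (s_x * s_yy - s_y * s_yx) / det
    return phi_hat, beta_hat


ys, xs = simulate(5000, phi=0.5, beta=1.0, a=-0.5, b=0.5, seed=1)
print(cls_fit(ys, xs))  # estimates of (phi, beta) from the simulated path
```

The closed-form solve keeps the example dependency-free; a higher-order process would need a general linear solver for the normal equations.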
In practice, it is common to consider high-order processes when modeling. Variable selection, which shrinks the coefficients of insignificant orders to zero, is fundamental for improving overall prediction accuracy. For instance, Huang and Yang [10] proposed a lag-selection method for non-linear additive autoregressive models, and Wang et al. [19] proposed the modified LASSO for linear regression with autoregressive errors (REGAR). Zhang et al. [25] studied penalized conditional least squares estimation in generalized integer-valued autoregressive (GINAR) processes. Wang et al. [21] applied penalized conditional maximum likelihood estimation to the Poisson autoregression (PAR) model, and Kwon et al. [11] studied tuning-parameter selection criteria for the adaptive LASSO (ALASSO) penalty in autoregressive (AR) processes.
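To make the idea of lag selection concrete, here is a minimal sketch of penalized least squares for an AR(p) process, solved by coordinate descent. It uses the plain LASSO penalty as a stand-in; the cited works use other penalties (modified or adaptive LASSO, for example), and the function names, true coefficients, and tuning value below are illustrative assumptions only.

```python
import random


def simulate_ar(n, coefs, seed=0, burn_in=200):
    """Simulate an AR(p) path with standard normal innovations."""
    rng = random.Random(seed)
    y = [0.0] * len(coefs)
    for _ in range(n + burn_in):
        mean = sum(c * y[-k - 1] for k, c in enumerate(coefs))
        y.append(mean + rng.gauss(0.0, 1.0))
    return y[-n:]


def soft_threshold(z, lam):
    """Soft-thresholding operator: the scalar LASSO solution."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_ar(y, p, lam, sweeps=200):
    """Minimize (1/(2n))*sum (y_t - sum_k b_k*y_{t-k})^2 + lam*sum_k |b_k|.

    Coordinate descent on the Gram matrix of the lagged series.
    """
    n = len(y) - p
    lagged = [[y[t - k - 1] for t in range(p, len(y))] for k in range(p)]
    target = y[p:]
    gram = [[sum(u * v for u, v in zip(lagged[j], lagged[k])) / n
             for k in range(p)] for j in range(p)]
    corr = [sum(u * v for u, v in zip(lagged[j], target)) / n for j in range(p)]
    b = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of lag j with the partial residual excluding lag j.
            rho = corr[j] - sum(gram[j][k] * b[k] for k in range(p) if k != j)
            b[j] = soft_threshold(rho, lam) / gram[j][j]
    return b


# Hypothetical example: only lags 1 and 4 matter in a fitted order of 6.
y = simulate_ar(3000, [0.5, 0.0, 0.0, -0.3, 0.0, 0.0], seed=3)
print([round(v, 3) for v in lasso_ar(y, p=6, lam=0.05)])
```

The penalty shrinks every coefficient toward zero and sets small ones exactly to zero, which is why the retained coefficients come out slightly smaller in magnitude than their true values; penalties such as the adaptive LASSO are designed to reduce that bias.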
Missing data are frequently encountered in empirical studies. Ignoring drop-out information often destroys the representativeness of the sample and leads to biased conclusions. Let be the finite sample from (1), is always available and is subject to missingness with a response indicator satisfying that if is observed and otherwise. The response mechanism is denoted by with an unknown parameter vector . When , the mechanism is called missing at random (MAR) (see Little and Rubin [13]), and (3) can be rewritten as . A widely used approach for MAR data, which weights the complete cases by the inverse of the selection probabilities, is the inverse probability weighted (IPW) method (Rosenbaum and Rubin [16]). Beyond IPW estimation, Robins et al. [15] proposed a doubly robust estimation for by solving where , and is a -field of events generated by . In the independent and identically distributed (i.i.d.) case, Chan [1], Han [7], and Duan and Wang [4] separately generalized (4) to a multiply robust (MR) estimator, which remains consistent if either or is correctly specified within its candidate model set. Li et al. [12] discussed the relationship between double robustness and multiple robustness, and proposed a model mixing method for multiple candidate models to improve the performance of the doubly robust estimation.
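The effect of inverse probability weighting under MAR can be seen in a small i.i.d. toy example. The sketch below is not the estimator of this paper: the data-generating model, the logistic response mechanism, and the use of the true selection probabilities (which would be estimated in practice) are all assumptions chosen for illustration.

```python
import math
import random


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate_mar(n, seed=0):
    """Draw (y, x, delta, pi): y always observed, binary x observed iff delta == 1.

    ASSUMPTION: P(delta = 1 | y, x) = expit(0.5 + y) depends on y only,
    so the missingness is MAR given y.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)
        x = 1 if rng.random() < expit(y) else 0  # E[x] = 0.5 by symmetry
        pi = expit(0.5 + y)
        delta = 1 if rng.random() < pi else 0
        rows.append((y, x, delta, pi))
    return rows


def complete_case_mean(rows):
    """Average x over observed cases only (ignores the missingness mechanism)."""
    observed = [x for _, x, delta, _ in rows if delta == 1]
    return sum(observed) / len(observed)


def ipw_mean(rows):
    """Normalized IPW estimate of E[x]: weight each observed case by 1/pi."""
    num = sum(x / pi for _, x, delta, pi in rows if delta == 1)
    den = sum(1.0 / pi for _, _, delta, pi in rows if delta == 1)
    return num / den


rows = simulate_mar(20000, seed=7)
print("complete case:", round(complete_case_mean(rows), 3))
print("IPW:          ", round(ipw_mean(rows), 3))
```

Because cases with large y are both more likely to be observed and more likely to have x = 1, the complete-case average overstates E[x], while the weighted average corrects for the selection. The doubly and multiply robust estimators discussed above add outcome-model information on top of this weighting.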
However, it is difficult to determine the exact models for and from limited implicit information, especially in high-order processes. Suppose that and are the non-nested multiple candidate model sets for and with corresponding parameter vectors and . In general, we can formulate a model with a large number of lag-dependent variables as the “true structure” model (see Definition 1) to attenuate modeling biases. To deal with this problem, we consider a variable selection approach with specified penalties. One can prove that the estimators of the “true structure” models obey the asymptotic theory of White [22] and also enjoy the oracle properties. Furthermore, we study the parametric estimation for (1), (2) with missing . Under the MAR mechanism, we propose a penalized multiply robust estimation (PMREE) with a sparse . The proposed method identifies the significant lags in strictly stationary autoregressive models and outperforms the general multiply robust estimation. To improve the accuracy of the result, we develop a modified tuning parameter selection criterion to recover the information of the missing part. We briefly point out that our selection criterion is consistent.
The rest of the paper is organized as follows. In Section 2.1 we discuss the penalized estimation for candidate models, which is -consistent with the extremum of the Kullback–Leibler Information Criterion (KLIC). In Section 2.2 we propose the PMREE for by virtue of the empirical probability mass and LS weights. The multiply robust property and the oracle properties of the PMREE estimator are derived in Sections 3.1 and 3.2, respectively. The selection criterion for the tuning parameter is modified within the multiply robust framework in Section 3.3. Simulation studies are conducted in Section 4.1 to demonstrate the validity of the method, and Section 4.2 illustrates its practical use by fitting the monthly U.S. Industrial Production Index time series. Concluding remarks are presented in Section 5. Related assumptions and technical proofs are given in the Appendix, and all tables and figures of the numerical studies are provided in the Supplementary Material.
Estimation for multiply candidate models
The multiply robust property implies that it is necessary to estimate each parameter in and even though some of the models are incorrectly specified. Without loss of generality, suppose that the “true structure” models in , are and , and the “true values” of the parameters are , respectively. We first introduce the concept of the “true structure” specification in the following definition.
Definition 1 Let be the true density function of with a parameter vector on the
Sample properties and oracle properties
In this section, we explore the asymptotic theory for the PMREE estimator. We prove that from (14) is multiply robust, i.e., converges to the “true value” as long as either or covers the “true structure” model. The oracle properties of the PMREE estimator are discussed in the variable selection framework. Based on the existence theorem of the original function, the following theorem suffices to establish the large-sample properties of the estimator.
Theorem 3 For a given penalty function , if (
Simulation
To assess the performance of the parameter estimation and the variable selection described in the previous sections, we conduct simulations for model (1) with . The simulations are based on 1000 independent Monte Carlo replications, and the sample size is either or .
Firstly, observations are generated by with and , where the true parameter . For a given , the success probability of the Bernoulli distribution is
Conclusion
This paper investigates model selection and parametric estimation for high-order autoregressive processes with a missing binary explanatory variable. The specifications for potential models are extended to the “true structure” case to improve the multiple robustness. The method effectively eliminates insignificant lags and identifies a parsimonious autoregressive process in high-dimensional modeling. Also, large sample properties of the PMREE estimator are studied, simulations and an
CRediT authorship contribution statement
Wei Xiong: Writing of the manuscript, Development of methodology, Execution of simulation work. Dehui Wang: Development of methodology, Execution of simulation work. Dianliang Deng: Writing of the revised version of the manuscript, Query and analysis of the empirical example, Supported Wei Xiong’s work by providing research venue. Xinyang Wang: Designing of the variable selection part in simulation studies, Writing of the manuscript. Wanying Zhang: Organization, Writing of the manuscript.
Acknowledgments
We thank the Editor, Associate Editor and referees for their insightful comments. We also thank Prof. Kai Yang for helpful discussions. The research is supported by National Natural Science Foundation of China (Nos. 11871028, 11731015, 11901053, 12101417), Educational Commission of Liaoning Province of China (No. LJKZ1003), China Scholarship Council and Natural Sciences and Engineering Research Council of Canada (NSERC).
All authors approved the version of the manuscript to be published.
References
- Kwon et al., Tuning parameter selection for the adaptive LASSO in the autoregressive model, J. Korean Statist. Soc. (2017)
- Wang et al., Poisson autoregressive process modeling via the penalized conditional maximum likelihood procedure, Statist. Papers (2017)
- Zhang et al., Regularized estimation in GINAR(p) process, J. Korean Statist. Soc. (2017)
- Chan, A simple multiple robust estimator for missing response problem, Stat (2013)
- Crowder, On the asymptotic properties of least-squares estimators in autoregression, Ann. Statist. (1980)
- Dicker et al., Variable selection and estimation with the seamless-L0 penalty, Statist. Sinica (2013)
- Duan and Wang, A fusion of least squares and empirical likelihood for regression models with a missing binary covariate, Sci. China (Math.) (2016)
- Fan and Li, Variable selection via nonconcave penalized likelihood and its oracle properties, J. Amer. Statist. Assoc. (2001)
- Hall and Heyde, Martingale Limit Theory and Its Application (1980)
- Han, Multiply robust estimation in regression analysis with missing data, J. Amer. Statist. Assoc. (2014)