Cumulant-based order selection of non-Gaussian autoregressive moving average models: the corner method

doi:10.1016/j.sigpro.2004.10.011

Signal Processing

Volume 85, Issue 3, March 2005, Pages 449-456

https://doi.org/10.1016/j.sigpro.2004.10.011

Abstract

This paper presents a new corner location method for model order selection of an autoregressive moving average (ARMA) model. The criterion is determined in terms of the minimum eigenvalue of the third-order cumulant matrix derived from the observed data sequence. The observed sequence is modeled as the output of an ARMA system that is excited by an unobservable input and corrupted by zero-mean additive Gaussian noise. The system is driven by a zero-mean, independent and identically distributed (i.i.d.) non-Gaussian sequence. The method extends recent results based on the third-order cumulant (TOC) by Al-Smadi and Wilkes. Simulations verify the performance of the proposed method even when the observed signal is heavily corrupted by additive noise. In computer simulations, the proposed estimator is found to outperform the TOC estimator of Al-Smadi and Wilkes.

Introduction

Autoregressive moving average (ARMA) models are mathematical models of persistence in time series analysis. A time series is a sequence of observations that are ordered in time or space. If observations are made on some phenomenon throughout time, it is most sensible to display the data in the order in which they arose. This is reasonable since successive observations will probably be dependent. Time series models, or ARMA models, are excellent for random data if the model type and the model order are known [8]. ARMA models find applications in many diverse fields such as signal modeling, spectrum estimation, communications, biomedical signal processing, speech signal processing, system identification, and adaptive control. ARMA models describe a stationary stochastic process very accurately if the right number of parameters is used [5]. For the estimation of an ARMA model of a stationary stochastic process, three basic problems can be distinguished: estimation of the model order, estimation of the model parameters, and estimation of the expected fit of a selected model to future data [7]. Successful identification of ARMA parameters depends on correct model order selection. Selecting the order of an ARMA model is one of the most difficult problems in developing a linear model for data [13]. The determination of the number of parameters in a model used to fit a data set is a well-known and well-researched problem.

Model order determination of ARMA models is an area of research to which many efforts have been devoted in the past. This problem has been of considerable interest for some time and has a long and continuous history. The reason for this interest is twofold: the relevance of the issue in many practical applications and the dissatisfaction of users with the existing methods. This dissatisfaction comes from the fact that order determination is an ill-posed problem; that is, the desirable features of an algorithm can hardly be written in mathematical form. In most practical cases, the model order is not known. In many of the commonly employed ARMA modeling algorithms, this crucial step is ignored, the order is chosen rather arbitrarily, or the order is assumed to be available. For example, in spectrum analysis and modeling, the problem of model order selection is of utmost importance [15], because the accuracy of the frequency estimates depends on the estimated order of the prediction filter [14].

Pioneering work on order selection has been done by Akaike [1], Rissanen [11], Schwarz [12], and Parzen [10]. Much attention has been given in the literature to the determination of the autoregressive (AR) order. However, the problem of ARMA model order determination is much more difficult [15]. One method by Liang et al. [9] is shown to yield a level of performance for general ARMA model order estimation never before achieved. This method is derived from the minimum description length (MDL) principle [11], [12]. It is based on the minimum eigenvalue (MEV) of a family of covariance matrices computed from the observed data. Liang et al. showed that the MDL does not work well at low signal-to-noise ratio (SNR) and is computationally expensive. This is because the prediction error used in computing the MDL is directly affected by the accuracy of the parameter estimates.

An extensive review of the literature on order determination methods was done by Al-Smadi and Wilkes [4]. In their paper, they also formulated four practical criteria for a successful model order selection technique. Among the desirable features is robustness to additive noise. This is the main point of their paper, and it is addressed by taking advantage of the insensitivity of higher order statistics to Gaussian noise. The paper is strongly based on the work by Liang [9]; more specifically, the properties that are developed there for the signal samples are extended in [4] to the third-order cumulant (TOC) sequence. Another important contribution of [4] is its ability to drop the white noise assumption of [9]. That is, the authors extended the original results of Liang to the case of colored Gaussian noise. Although the problem of model order determination is an “old” problem and widely considered to be solved, recent results by Liang [9] and Al-Smadi and Wilkes [4] indicate that this is far from the case. In fact, the much higher accuracy of these new algorithms calls for a re-examination of this important problem.
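The insensitivity of third-order statistics to Gaussian noise can be illustrated numerically. The sketch below is not taken from the paper: it assumes the standard zero-mean definition c3(m, n) = E[x(t) x(t+m) x(t+n)], and the helper name `third_order_cumulant`, the exponential test signal, and the sample size are all illustrative choices. It estimates the zero-lag third-order cumulant of a skewed sequence with and without added Gaussian noise.

```python
import numpy as np

def third_order_cumulant(x, m, n):
    """Sample estimate of c3(m, n) = E[x(t) x(t+m) x(t+n)] for lags m, n >= 0.

    Hypothetical helper: the sample mean is removed first, so the triple
    product average estimates the third-order cumulant.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    length = len(x) - max(m, n)          # number of usable triple products
    return np.mean(x[:length] * x[m:m + length] * x[n:n + length])

rng = np.random.default_rng(0)
N = 200_000                               # large record so the estimates are stable

# Zero-mean, skewed (non-Gaussian) sequence; Exp(1) has third cumulant 2.
signal = rng.exponential(1.0, N) - 1.0
# Zero-mean Gaussian noise; its third-order cumulant is theoretically zero.
noise = rng.normal(0.0, 1.0, N)

c3_clean = third_order_cumulant(signal, 0, 0)
c3_noisy = third_order_cumulant(signal + noise, 0, 0)
c3_noise = third_order_cumulant(noise, 0, 0)

print(f"clean signal : {c3_clean:.3f}")
print(f"signal+noise : {c3_noisy:.3f}")
print(f"noise only   : {c3_noise:.3f}")
```

Under these assumptions the first two estimates should be close to each other while the noise-only estimate stays near zero, which is the property that cumulant-based order selection relies on. With short records the estimates fluctuate considerably more.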

In this paper, we present a new, theoretically motivated approach to the problem of ARMA model order estimation. The proposed algorithm is based on the minimum eigenvalue of a third-order cumulant matrix derived from the observed data sequence. The observed sequence is modeled as the output of an ARMA system that is excited by an unobservable input and corrupted by zero-mean additive Gaussian noise. The proposed method is compared with the TOC method [4] for different SNRs on the output signal. The TOC method is briefly reviewed in Section 2. Section 3 describes the proposed method. Section 4 contains examples of the proposed algorithm. Section 5 is devoted to concluding remarks.

Section snippets

Problem formulation

Let x(t) denote a real-valued stationary ARMA(p,q) signal given by $\sum_{i=0}^{p} a_{i} x(t-i) = \sum_{i=0}^{q} b_{i} w(t-i),$ where w(t) is the excitation sequence and x(t) is the noiseless output signal. The excitation signal w(t) is assumed to be a zero-mean, non-Gaussian, independent and identically distributed (i.i.d.) process. The parameters a₀,…,a_p are the AR parameters, and p is the AR order. The parameters b₀,…,b_q are the MA parameters, and q is the MA order. We model the noisy output as $x_{0}(t) = x(t) + v(t)$
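The signal model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `simulate_arma`, the ARMA(2,1) coefficients, the exponential driving sequence, and the noise standard deviation are hypothetical choices, not values from the paper, and v(t) is taken to be zero-mean white Gaussian noise.

```python
import numpy as np

def simulate_arma(a, b, w):
    """Solve sum_i a[i] x(t-i) = sum_i b[i] w(t-i) for x(t), zero initial conditions.

    Hypothetical helper: a = [a_0, ..., a_p], b = [b_0, ..., b_q].
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros(len(w))
    for t in range(len(w)):
        acc = 0.0
        for i in range(len(b)):          # moving average part
            if t - i >= 0:
                acc += b[i] * w[t - i]
        for i in range(1, len(a)):       # autoregressive part (past outputs)
            if t - i >= 0:
                acc -= a[i] * x[t - i]
        x[t] = acc / a[0]
    return x

rng = np.random.default_rng(1)
N = 1500                                  # record length (illustrative)

# Zero-mean, i.i.d., non-Gaussian excitation (assumed: centered exponential).
w = rng.exponential(1.0, N) - 1.0

# Illustrative stable ARMA(2,1) coefficients (not from the paper).
a = [1.0, -0.5, 0.25]
b = [1.0, 0.4]

x = simulate_arma(a, b, w)                # noiseless output x(t)
v = rng.normal(0.0, 0.5, N)               # zero-mean Gaussian noise v(t), assumed white
x0 = x + v                                # noisy observation x_0(t)
```

Only `x0` would be available to an order selection method; `w`, `x`, and `v` are kept here so that the simulation can be checked.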

Proposed algorithm

Now, the J_MEV and J_TOC criteria are theoretically sound. However, the row/column ratio table method was reported in [9] as a method that works, but without any kind of mathematical proof. Even though this method provides good estimates of the true model order [9], [4], there is no justification of why it works. We will now investigate a new method to locate the corner that works and can be justified. The method is based on theoretical viewpoints and is derived from the cost function, J_TOC, in Eq.

Simulation examples

In this section, we present simulation results concerning the proposed approach to model order selection from only the observed noisy output data. To study the robustness of the algorithm, a number of experiments were performed. In these experiments, the proposed method has been compared with the TOC method. The computations were performed in MATLAB. A finite length of $N = 1500$ points was considered in each experiment. The driving input sequence is not observed. However, it is needed for
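A noisy test record of this kind can be produced as sketched below. The snippet does not show how the paper defines SNR, so the sketch assumes the common definition SNR = 10 log10(signal power / noise power) on the output signal. The helper `add_gaussian_noise`, the 5 dB target, and the stand-in signal are hypothetical.

```python
import numpy as np

def add_gaussian_noise(x, snr_db, rng):
    """Add zero-mean white Gaussian noise scaled to a target output SNR in dB.

    Hypothetical helper, assuming SNR = 10*log10(mean(x^2) / noise variance).
    Returns the noisy signal and the noise realization.
    """
    x = np.asarray(x, dtype=float)
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    v = rng.normal(0.0, np.sqrt(noise_power), size=x.shape)
    return x + v, v

rng = np.random.default_rng(2)
N = 1500                                   # record length quoted in the text

# Stand-in for a noiseless ARMA output (assumed: centered exponential samples).
x = rng.exponential(1.0, N) - 1.0

x0, v = add_gaussian_noise(x, snr_db=5.0, rng=rng)

# Empirical SNR of this realization; it fluctuates slightly around the target.
measured_snr_db = 10.0 * np.log10(np.mean(x ** 2) / np.mean(v ** 2))
print(f"measured SNR: {measured_snr_db:.2f} dB")
```

Repeating this over many independent noise realizations and several SNR values gives the kind of Monte Carlo comparison the section describes.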

Conclusion

In this paper, the problem of estimating the model order of a general ARMA process has been investigated. The method presented is an extension of the results by Al-Smadi and Wilkes. As in the TOC method, we look for a corner in the tabulation of the cost function J_TOC. The corner is detected by transforming the J_TOC matrix into a row vector and a column vector. Each vector defines one side of the corner, i.e., the AR and MA model orders. The proposed method demonstrated superior performance over

Acknowledgments

The author would like to thank the anonymous referees whose comments and suggestions contributed to the improvement of the paper.

References (15)

A. Al-Smadi et al., Fitting ARMA models to linear non-Gaussian processes using higher order statistics, Signal Process. Internat. J. (November 2002)
J. Rissanen, Modeling by shortest data description, Automatica (1978)
H. Akaike, A new look at statistical model identification, IEEE Trans. Automat. Control (1974)
A. Al-Smadi, D.M. Wilkes, On estimating ARMA model orders, IEEE International Symposium on Circuits and Systems, May...
A. Al-Smadi et al., Robust and accurate ARX and ARMA model order estimation of non-Gaussian processes, IEEE Trans. Signal Process. (March 2002)
N. Beamish et al., A study of autoregressive and window spectral estimation, Appl. Stat. (1981)
P. Broersen, The quality of models for ARMA processes, IEEE Trans. Signal Process. (June 1998)

There are more references available in the full text version of this article.

Cited by (10)

Estimation of ARMA-model parameters to describe pathological conditions in cardiovascular system models
2020, Informatics in Medicine Unlocked
Cardiovascular diseases cause one of three deaths worldwide. Among these diseases, especially aortic aneurysms are a highly underestimated problem. There are some diagnostic methods known in the literature (transesophageal echocardiography, doppler sonography or CT/MRT), but none is suitable as an easily available non-invasive screening method, which is inexpensive and independent of the medical examiner. Within this study we present a first step towards a novel screening method using artificial intelligence:
The objective of this study is to simulate healthy and diseased conditions of cardiovascular blood flow by means of numerical models, using a distributed zero-dimensional lumped approach based on the Windkessel model, in order to regard pressure-pressure transfer functions between two systemic measurement locations. The coefficients of the transfer function were estimated by an AutoRegressive-MovingAverage (ARMA)-model. The numerical estimation of the ARMA-coefficients $(a_{k}, b_{k})$ of order $l = 45$ was performed via a Subspace Gauss-Newton search method. The ARMA-coefficients were estimated using artificial zero-mean signals from the arteria brachialis and femoralis in four cases: Besides the control group, the estimations were performed on signals of two aneurysms located in the thoracic (TAA1 and TAA2) and one in the abdominal aorta (AAA). Finally, we quantified the difference between the estimated coefficients in each pathological case, using a distance measure based on the mean and the standard deviation. The largest deviation between the pathological conditions and the control group was found for the coefficients $a_{2}, a_{3}, a_{4}$ and $a_{7}$ . The findings suggest a reasonable situation to distinguish the pathological state of the four underlying pathological cases from the estimated coefficients; therefore we propose to diagnose the pathological states from the control group using a classification algorithm.
A new approach for geological pattern recognition using high-order spatial cumulants
2010, Computers and Geosciences
Citation Excerpt :
High-order cumulants are combinations of moment statistical parameters that allow the characterization of non-Gaussian random variables (Billinger and Rosenblatt, 1966), and may be seen as an extension of the well-known covariance function. They are critical contributors to non-Gaussian and non-linear modelling, where related developments include cumulants for signal filtering and deconvolution (Al-Smadi, 2004; Nikias and Petropulu, 1993; Sadler et al., 1995; Delopoulos and Giannakis, 1996; Dembélé and Favier, 1998; Zhang, 2005), or for estimating the gravitational evolution of the cosmic distribution function (Gaztanaga et al., 2000) and conditional cumulants and high-order statistics in the so-called high-precision astronomy (Bernardeau et al., 2002). A key justification for the use of cumulants is the wealth of information they contain compared to second-order statistical measures (Pan and Szapudi, 2005).
Spatially distributed natural phenomena represent complex non-linear and non-Gaussian systems. Currently, their spatial distributions are typically studied using second-order spatial statistical models, which are limiting considering the spatial complexity of natural phenomena such as geological applications. High-order geostatistics is a new area of research based on higher-order spatial connectivity measures, especially spatial cumulants as suitable for non-Gaussian and non-linear phenomena. This paper presents HOSC or High-order spatial cumulants, an algorithm for calculating spatial cumulants, including anisotropic experimental cumulants based on spatial templates. High-order cumulants are calculated on two- and three-dimensional synthetic training images so as to elaborate on their characteristics. Spatial cumulants up to and including the fifth-order are found to be efficient in characterizing patterns on both binary and continuous images. The behaviour of spatial cumulants is shown to characterize well the behaviour of the spatial architecture of geological data, including the degree of homogeneity and connectivity. The high-order cumulants are found to be relatively insensitive to the number of data used, and relatively small data sets are sufficient to provide cumulant maps. HOSC has been coded in FORTRAN 90 and is easily integrated to the S-GeMS open source platform.
Parameter estimation of DSSS signals in non-cooperative communication system
2007, Journal of Systems Engineering and Electronics
A new adaptive estimator for direct sequence spread spectrum (DSSS) signals using fourth-order cumulant based adaptive method is considered. The general higher-order statistics may not be easily applied in signal processing with too complex computation. Based on the fourth-order cumulant with 1-D slices and adaptive filters, an efficient algorithm is proposed to solve the problem and is extended for nonstationary stochastic processes. In order to achieve the accurate parameter estimation of direct sequence spread spectrum (DSSS) signals, the first step uses the modified fourth-order cumulant to reduce the computing complexity. While the second step employs an adaptive recursive system to estimate the power spectrum in the frequency domain. In the case of intercepted signals without large enough data samples, the estimator provides good performance in parameter estimation and white Gaussian noise suppression. Computer simulations are included to corroborate the theoretical development with different signal-to-noise ratio conditions and recursive coefficients.
A new technique for arma-system identification based on qr-decomposition of third order cumulants matrix
2021, International Journal of Circuits, Systems and Signal Processing
A robust method for the identification of non-Gaussian autoregressive systems in colored Gaussian noise
2020, Transactions of the Institute of Measurement and Control
Seismic wavelet extraction based on auto-regressive and moving average model and particle swarm optimization
2011, Zhongguo Shiyou Daxue Xuebao (Ziran Kexue Ban)/Journal of China University of Petroleum (Edition of Natural Science)
