Bayesian multiscale feature detection of log-spectral densities

https://doi.org/10.1016/j.csda.2009.03.020

Abstract

A fully automatic Bayesian visualization tool for identifying periodic components of evenly sampled stationary time series is presented. The method applies the multiscale ideas of the SiZer-methodology to the log-spectral density of a given series. The idea is to detect significant peaks in the true underlying curve viewed at different resolutions or scales. The results are displayed in significance maps, which show for which scales and frequencies peaks in the log-spectral density are detected as significant. The inference needed to produce the significance maps is performed using the recently developed simplified Laplace approximation. This is a deterministic Bayesian approach that gives accurate estimates of posterior marginals for latent Gaussian Markov random fields at low computational cost, avoiding the use of Markov chain Monte Carlo techniques. The exploratory tool is illustrated by analyzing both synthetic and real time series.

Introduction

Many natural phenomena show repetitive or regular behavior over time, best described in terms of periodic variations. Periodicities of stationary time series are most easily studied using a frequency-domain approach, analyzing the series by means of their spectral representations. The spectral representation of a stationary time series essentially decomposes the series into a sum of sines and cosines with uncorrelated random coefficients. The spectral density function, also called the power spectrum, reflects the periodic nature of a time series and describes the variance of the series in terms of frequency. Peaks in the power spectrum indicate which periodic components the time series is composed of, giving valuable information about the underlying physical process. Spectral density analysis is an important subject in various fields, for example statistical signal processing, economics and climatology. For a thorough mathematical introduction to spectral density analysis, see for example Brockwell and Davis (1991) and Brillinger (2001).

The power spectrum of a zero-mean stationary real-valued stochastic process $\{x_t,\ t\in\mathbb{Z}\}$ is defined by

$$f(\omega)=\sum_{h=-\infty}^{\infty}\gamma(h)\,e^{-2\pi i\omega h},\qquad -1/2\le\omega\le 1/2,$$

where the autocovariance function (ACF) of the series, $\gamma(h)=E(x_t x_{t+h})$, $h\in\mathbb{Z}$, is assumed to be absolutely summable, i.e. $\sum_{h=-\infty}^{\infty}|\gamma(h)|<\infty$. We assume wide-sense stationarity, i.e. an ACF that depends only on the lag $h$. The power spectrum, being the Fourier transform of the ACF, carries exactly the same information about the second-order properties of the series as the ACF. We also note that the total variance equals the spectrum integrated over all frequencies, $\gamma(0)=\int_{-1/2}^{1/2}f(\omega)\,d\omega$.
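As a minimal illustration of the definition and of the variance identity, the Python sketch below uses a causal AR(1) process $x_t=\phi x_{t-1}+w_t$ with white-noise variance $\sigma^2$. This process, its parameter values and the helper names are assumptions made for the example, not taken from the paper. For it, $\gamma(h)=\sigma^2\phi^{|h|}/(1-\phi^2)$ and $f(\omega)=\sigma^2/(1-2\phi\cos 2\pi\omega+\phi^2)$, so a truncated version of the defining sum can be checked against the closed form, and the integral of $f$ against $\gamma(0)$.

```python
import math


def ar1_acf(h, phi, sigma2):
    """gamma(h) for a causal AR(1) process x_t = phi * x_{t-1} + w_t, Var(w_t) = sigma2."""
    return sigma2 * phi ** abs(h) / (1.0 - phi ** 2)


def spectrum_from_acf(omega, phi, sigma2, max_lag=500):
    """Truncated sum of gamma(h) * exp(-2*pi*i*omega*h) over |h| <= max_lag.
    gamma is symmetric, so the imaginary parts cancel and only cosines remain."""
    total = ar1_acf(0, phi, sigma2)
    for h in range(1, max_lag + 1):
        total += 2.0 * ar1_acf(h, phi, sigma2) * math.cos(2.0 * math.pi * omega * h)
    return total


def ar1_spectrum(omega, phi, sigma2):
    """Closed form f(omega) = sigma2 / |1 - phi * exp(-2*pi*i*omega)|^2."""
    return sigma2 / (1.0 - 2.0 * phi * math.cos(2.0 * math.pi * omega) + phi ** 2)


def integrate_spectrum(phi, sigma2, m=2000):
    """Midpoint rule for the integral of f over [-1/2, 1/2]."""
    step = 1.0 / m
    return step * sum(ar1_spectrum(-0.5 + (k + 0.5) * step, phi, sigma2) for k in range(m))


phi, sigma2 = 0.7, 1.0
print(spectrum_from_acf(0.1, phi, sigma2), ar1_spectrum(0.1, phi, sigma2))
print(integrate_spectrum(phi, sigma2), ar1_acf(0, phi, sigma2))
```

Because $\gamma$ is symmetric, the complex exponentials pair up into cosines, which is why the sum only needs non-negative lags.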

A central aim in spectral density analysis is to estimate the power spectrum of a random signal from an observed time series. The range of proposed estimators is extensive, including parametric, nonparametric and Bayesian methods. The classical nonparametric periodogram estimator is given by

$$I(\omega)=\frac{1}{2n}\left|\sum_{t=1}^{2n}x_t\,e^{-2\pi i\omega t}\right|^2,\qquad -1/2\le\omega\le 1/2, \tag{1}$$

where $x_1,\dots,x_{2n}$ denote the observed time series. The raw periodogram is known to be asymptotically unbiased but not consistent, giving a highly fluctuating estimate. In practice, the periodogram is therefore usually smoothed, both to obtain a consistent estimator and to make the resulting estimate easier to interpret.
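The sketch below evaluates (1) directly at the positive Fourier frequencies $j/(2n)$, $j=1,\dots,n$. It is a plain $O(n^2)$ loop written for clarity (an FFT would normally be used in practice), and the noise-free test signal is an assumed example, not data from the paper.

```python
import cmath
import math


def periodogram(x):
    """Periodogram (1) at the positive Fourier frequencies omega_k = k/N,
    k = 1..N//2, where N = len(x) plays the role of 2n."""
    N = len(x)
    freqs, values = [], []
    for k in range(1, N // 2 + 1):
        omega = k / N
        # Discrete Fourier transform at omega, with time index t = 1..N as in (1)
        s = sum(xt * cmath.exp(-2j * math.pi * omega * t)
                for t, xt in enumerate(x, start=1))
        freqs.append(omega)
        values.append(abs(s) ** 2 / N)
    return freqs, values


# Assumed test signal: a unit cosine at 8/128 plus a weaker sine at 21/128.
N = 128
x = [math.cos(2 * math.pi * 8 * t / N) + 0.1 * math.sin(2 * math.pi * 21 * t / N)
     for t in range(1, N + 1)]
freqs, I = periodogram(x)
peak = freqs[max(range(len(I)), key=I.__getitem__)]
print(peak, I[7])
```

The peak is located at the frequency 0.0625 of the cosine, where the ordinate is approximately $2n/4=32$; the weaker sine gives an ordinate of about 0.32 at $21/128$.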

In most of the early literature on consistent estimation of the power spectrum, the periodogram is smoothed directly using weighted local averages, see for example Blackman and Tukey (1958), Parzen (1961) and Priestley (1981). Other well-known consistent nonparametric estimators include procedures based on the Whittle (1962) likelihood, as well as least-squares methods for the log-periodogram, see for example Wahba (1980). The presented method smooths the log-periodogram within a Bayesian framework, using an integrated Wiener process as the prior. Other Bayesian approaches to smoothing the periodogram include, among others, Carter and Kohn (1997), Liseo et al. (2001), Choudhuri et al. (2004) and Tonellato (2007), each based on a different choice of prior model.
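The paper's smoother is Bayesian, with an integrated Wiener process prior. As a simpler, non-Bayesian stand-in of the least-squares type mentioned above, the sketch below fits a penalized least-squares curve to the log-periodogram with a squared second-difference roughness penalty (a discrete analogue of a cubic smoothing spline). It adds Euler's constant $\gamma_E\approx0.5772$ because, when $I(\omega_j)/f(\omega_j)$ is standard exponential, $E[\log I(\omega_j)]=\log f(\omega_j)-\gamma_E$. The function names and the penalty weight `lam` are hypothetical choices for illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329  # -E[log E] for E ~ Exp(1): bias of the log-periodogram


def second_diff_penalty(n):
    """Dense matrix D^T D, where D is the (n-2) x n second-difference operator."""
    P = [[0.0] * n for _ in range(n)]
    for r in range(n - 2):
        coeffs = {r: 1.0, r + 1: -2.0, r + 2: 1.0}
        for a, ca in coeffs.items():
            for b, cb in coeffs.items():
                P[a][b] += ca * cb
    return P


def solve_spd(A, b):
    """Gaussian elimination without pivoting (adequate for symmetric positive definite A)."""
    n = len(b)
    A = [row[:] for row in A]
    b = b[:]
    for k in range(n):
        for i in range(k + 1, n):
            factor = A[i][k] / A[k][k]
            if factor != 0.0:
                for j in range(k, n):
                    A[i][j] -= factor * A[k][j]
                b[i] -= factor * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x


def smooth_log_periodogram(I, lam):
    """Minimise sum (y_j - g_j)^2 + lam * sum (second differences of g)^2 with
    y_j = log I_j + EULER_GAMMA, i.e. solve (Id + lam * D^T D) g = y."""
    n = len(I)
    y = [math.log(v) + EULER_GAMMA for v in I]
    P = second_diff_penalty(n)
    A = [[(1.0 if i == j else 0.0) + lam * P[i][j] for j in range(n)] for i in range(n)]
    return solve_spd(A, y)
```

Larger values of `lam` give smoother curves; refitting over a grid of `lam` values is one loose way to mimic the range of scales used in a SiZer-type analysis.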

The aim of the presented method is to identify significant peaks in the log-spectral density of stationary time series, using a multiscale approach. Multiscale methods are often referred to as scale-space methods, which are well known in computer vision; see Lindeberg (1994) for an introduction. Multiscale ideas applied to nonparametric function estimation were introduced in Chaudhuri and Marron (1999), presenting the SiZer-methodology (SIgnificant ZERo-crossings of derivatives). The idea of SiZer is to detect significant trends in the true underlying curve viewed at different levels of smoothing, also referred to as scales. The results are visualized in SiZer maps, which show significant trends of the true underlying curve for all scales simultaneously; see Erästö (2005) for a discussion of SiZer maps. A theoretical justification of the SiZer-methodology is given in Chaudhuri and Marron (2000).

Originally, quantiles needed to evaluate the SiZer map were computed applying Gaussian distributional assumptions or bootstrapping, see Chaudhuri and Marron (1999). Some computational improvements, based on asymptotic theory, were developed in Hannig and Marron (2006). Bayesian versions of the SiZer-methodology were given in Erästö and Holmström (2005), presenting both an analytically solvable regression model and an approach using Gibbs sampling. In Øigård et al. (2006), a Bayesian approach was presented for Gaussian observational models, calculating quantiles exactly.
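In a Bayesian version, each cell of the map can be coloured from the posterior probability that the derivative is positive or negative. The sketch below assumes, for simplicity, Gaussian posterior marginals with given means and standard deviations for the derivative of the log-spectral density at each frequency and scale, and uses a pointwise threshold `alpha` without any simultaneous-inference correction; the input format, scale labels and threshold are hypothetical. A peak is flagged where a significant increase is followed, possibly after non-significant frequencies, by a significant decrease.

```python
from statistics import NormalDist


def classify(mean, sd, alpha=0.05):
    """Symbol for one map cell from a Gaussian posterior marginal N(mean, sd^2)
    of the derivative: '+' if P(g' > 0) > 1 - alpha, '-' if P(g' < 0) > 1 - alpha."""
    p_pos = 1.0 - NormalDist(mean, sd).cdf(0.0)
    if p_pos > 1.0 - alpha:
        return "+"
    if 1.0 - p_pos > 1.0 - alpha:
        return "-"
    return "0"


def significance_map(posterior, alpha=0.05):
    """posterior: dict mapping a scale label to a list of (mean, sd) pairs,
    one per frequency. Returns one row of symbols per scale."""
    return {scale: "".join(classify(m, s, alpha) for m, s in cells)
            for scale, cells in posterior.items()}


def significant_peaks(row):
    """(i_up, i_down) index pairs where a '+' is followed, possibly after '0's,
    by a '-', i.e. a significant increase and then a significant decrease."""
    peaks, last_plus = [], None
    for i, symbol in enumerate(row):
        if symbol == "+":
            last_plus = i
        elif symbol == "-" and last_plus is not None:
            peaks.append((last_plus, i))
            last_plus = None
    return peaks


# Hypothetical posterior summaries at a single scale labelled "h=0.05".
sig_map = significance_map({"h=0.05": [(2.0, 0.5), (0.1, 1.0), (-3.0, 1.0)]})
print(sig_map, significant_peaks(sig_map["h=0.05"]))
```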

We extend the application of the SiZer-methodology to spectral density estimation and to a non-Gaussian observational model, using Bayesian approximate inference. Applying the integrated Wiener process as a prior model, the vector of discretely observed log-spectral densities augmented with first derivatives is a Gaussian Markov random field (GMRF). The simplified Laplace approximation, given in Rue et al. (2009), can then be applied to obtain accurate estimates of the posterior marginals needed to estimate the log-spectral density and its first derivative for different scales. Inference is summarized in color maps similar to SiZer maps, here named significance maps.
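The integrated Wiener process has a simple state-space form, which is what makes the augmented vector of function values and first derivatives a GMRF. Given $(g(s), g'(s))$, the pair $(g(s+\delta), g'(s+\delta))$ is Gaussian with mean $(g(s)+\delta g'(s),\ g'(s))$ and covariance $\kappa^{-1}\begin{pmatrix}\delta^3/3 & \delta^2/2\\ \delta^2/2 & \delta\end{pmatrix}$, where $\kappa$ is a precision parameter. Because each state depends only on its predecessor, the joint precision matrix of the stacked states is block tridiagonal. The simulation below is a minimal sketch of this prior on an equally spaced grid; the names, the zero starting state and the parameter values are assumptions for illustration.

```python
import math
import random


def iwp_transition_cov(delta, kappa):
    """Covariance of (g(s+delta), g'(s+delta)) given (g(s), g'(s)) for an
    integrated Wiener process; g' is a Wiener process with variance 1/kappa per unit s."""
    return [[delta ** 3 / (3.0 * kappa), delta ** 2 / (2.0 * kappa)],
            [delta ** 2 / (2.0 * kappa), delta / kappa]]


def simulate_iwp(n, delta, kappa, seed=1):
    """Simulate states (g_i, g'_i), i = 0..n-1, on a grid with spacing delta,
    starting from (0, 0). Each state depends only on its predecessor."""
    rng = random.Random(seed)
    S = iwp_transition_cov(delta, kappa)
    # Cholesky factor of the 2x2 covariance: S = L L^T with L lower triangular
    l11 = math.sqrt(S[0][0])
    l21 = S[1][0] / l11
    l22 = math.sqrt(S[1][1] - l21 ** 2)
    g, dg = 0.0, 0.0
    path = [(g, dg)]
    for _ in range(n - 1):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Conditional mean (g + delta * g', g') plus correlated Gaussian noise;
        # the right-hand side is evaluated with the previous state.
        g, dg = g + delta * dg + l11 * z1, dg + l21 * z1 + l22 * z2
        path.append((g, dg))
    return path


path = simulate_iwp(n=50, delta=0.1, kappa=2.0)
print(len(path), path[-1])
```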

The computations are performed efficiently using numerical algorithms that exploit the special conditional independence structure of GMRFs. This class of models is widely used in Bayesian analysis, in both temporal and spatial contexts. Traditionally, Bayesian inference for such models has been performed by Markov chain Monte Carlo (MCMC) techniques. However, MCMC algorithms can have a high computational cost, especially when accurate estimates of tail probabilities are needed: the Monte Carlo error is additive, so small tail probabilities are estimated with a large relative error unless the number of samples is very large. In comparison, the simplified Laplace approximation has a relative error, so the tails of a distribution are also accurately estimated.
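The effect of an additive Monte Carlo error on tail probabilities can be quantified with a short calculation. A plain Monte Carlo estimate of a probability $p$ from $N$ independent draws has standard error $\sqrt{p(1-p)/N}$, so its relative error $\sqrt{(1-p)/(Np)}$ grows as $p$ shrinks. The sketch below evaluates this for Gaussian upper-tail probabilities; the sample size $N=10\,000$ is an arbitrary choice, and independent draws are assumed (correlated MCMC draws would do worse).

```python
import math
from statistics import NormalDist


def mc_relative_se(p, n_samples):
    """Relative standard error of the plain Monte Carlo estimate of a
    probability p from n_samples independent draws: sqrt(p(1-p)/N) / p."""
    return math.sqrt(p * (1.0 - p) / n_samples) / p


N = 10_000
for z in (1.0, 2.0, 3.0, 4.0):
    p = 1.0 - NormalDist().cdf(z)  # upper-tail probability P(Z > z)
    print(f"z = {z}: p = {p:.2e}, relative MC error = {mc_relative_se(p, N):.3f}")
```

With $N=10\,000$, the relative error for $P(Z>4)\approx3.2\times10^{-5}$ exceeds 1, i.e. the estimate is dominated by noise.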

Details on prior choices, the simplified Laplace approximation, implementation and the evaluation of probabilities used to find significant peaks are given in Section 2. Examples illustrating the performance of the method are presented in Section 3. Some concluding remarks are given in Section 4.

Section snippets

Methodology

In order to produce a significance map for the log-spectral density, we evaluate posterior probabilities for the first derivative using a wide range of scales or smoothing parameters. The spectral density is symmetric around 0, and the analysis is restricted to the set of positive Fourier or fundamental frequencies, $\Omega=\{\omega_j=j/(2n),\ j=1,\dots,n\}$, where $2n$ denotes the length of a given time series. The presented method relies on large-sample properties of the periodogram in (1), assuming $I(\omega_j)\sim f(\omega_j)\,\{\tfrac{1}{2}\chi^2_2$
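Under this large-sample assumption, the log-spectral density enters a non-Gaussian likelihood. If $I(\omega_j)/f(\omega_j)$ is standard exponential (the $\tfrac12\chi^2_2$ case) and $g_j=\log f(\omega_j)$, then $\log p(I(\omega_j)\mid g_j)=-g_j-I(\omega_j)e^{-g_j}$. Laplace-type approximations work with the gradient and curvature of such terms, so the sketch below returns both. The source text is cut off after the $\tfrac12\chi^2_2$ case, so only that case is covered here; the function names are hypothetical.

```python
import math


def loglik_term(I_j, g_j):
    """Log-likelihood of one periodogram ordinate I_j given g_j = log f(omega_j),
    assuming I_j / f(omega_j) ~ Exp(1), i.e. (1/2) chi^2_2.
    Returns (value, first derivative, second derivative) with respect to g_j."""
    r = I_j * math.exp(-g_j)
    return -g_j - r, -1.0 + r, -r


def total_loglik(I, g):
    """Sum of the terms over frequencies (any endpoint case is not handled here)."""
    return sum(loglik_term(i_j, g_j)[0] for i_j, g_j in zip(I, g))


value, grad, hess = loglik_term(2.5, math.log(2.5))
print(value, grad, hess)  # the gradient vanishes at g_j = log I_j
print(total_loglik([1.0, 2.0], [0.0, 0.0]))
```

The negative second derivative, $I(\omega_j)e^{-g_j}$, is always positive, so each term is concave in $g_j$, which is convenient for Gaussian approximations around the mode.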

Examples

In order to illustrate the performance of the exploratory tool, we have included three simulated examples: the analysis of a multiplicative seasonal model (Section 3.1), of an AR(24) process (Section 3.2) and of a mixture of periodic series (Section 3.3). In Section 3.4, we analyze the well-known Wolfer sunspot series given in Waldmeier (1960–1978, 1961). We conclude by analyzing a time series of monthly mean sea level pressures for Tahiti, which possibly can be related to

Concluding remarks

The presented method is an exploratory tool for identifying periodicities in evenly sampled stationary time series. We apply a Bayesian multiscale approach, extending the application of the SiZer-methodology introduced in Chaudhuri and Marron (1999) to spectral density estimation. Our aim is to identify peaks in the true underlying log-spectral density corresponding to periodic components of a given time series. In order to illustrate whether peaks in the log-spectral density are

References (29)

  • H. Rue et al.

    Approximate Bayesian inference for hierarchical Gaussian Markov random field models

    J. Statist. Plann. Inference

    (2007)
  • S.F. Tonellato

    Random field priors for spectral density functions

    J. Statist. Plann. Inference

    (2007)
  • T.A. Øigård et al.

    Bayesian multiscale analysis for time series data

    Comput. Stat. Data Anal.

    (2006)
  • R.B. Blackman et al.

    The Measurement of Power Spectra from the Viewpoint of Communications Engineering

    (1958)
  • D.R. Brillinger

    Time Series: Data Analysis and Theory

    (2001)
  • P.J. Brockwell et al.

    Time Series: Theory and Methods

    (1991)
  • C.K. Carter et al.

    Semiparametric Bayesian inference for time series with mixed spectra

    J. R. Statist. Soc. B

    (1997)
  • P. Chaudhuri et al.

    SiZer for exploration of structures in curves

    J. Amer. Statist. Assoc.

    (1999)
  • P. Chaudhuri et al.

    Scale space view of curve estimation

    Ann. Statist.

    (2000)
  • N. Choudhuri et al.

    Bayesian estimation of the spectral density of a time series

    J. Amer. Statist. Assoc.

    (2004)
  • P. Erästö

    Studies in trend detection of scatter plots with visualization

    PhD thesis, Dept. of Mathematics and...

    (2005)
  • P. Erästö et al.

    Bayesian multiscale smoothing for making inferences about features in scatterplots

    J. Comput. Graph. Statist.

    (2005)
  • J. Hannig et al.

    Advanced distribution theory for SiZer

    J. Amer. Statist. Assoc.

    (2006)
  • T. Lindeberg

    Scale-space theory: A basic tool for analyzing structures at different scales

    J. Appl. Statist.

    (1994)
  • Cited by (12)

    • Urban–Rural Disparities in Deaths of Despair: A County-Level Analysis 2004–2016 in the U.S.

      2023, American Journal of Preventive Medicine
      Citation Excerpt :

      To obtain the posterior distribution for parameters, Markov Chain Monte Carlo algorithm method is often used but can be intensive in terms of computation and time. Owing to its fast computation, efficient method, and relative accuracy, the integrated nested Laplace approximation (INLA) method was implemented with R package, R-INLA, to obtain the posterior distributions of parameters and random effects.48,49 The mean RR for each predictor variable was computed by exponentiating the posterior mean estimates.

    • Nonparametric estimation of a log-variance function in scale-space

      2013, Journal of Statistical Planning and Inference
      Citation Excerpt :

      SiZer has been also applied to time series data (Park et al., 2004; Rondonotti et al., 2007; Park et al., 2007, 2009a, 2009b). In addition, various Bayesian multiscale smoothing techniques have been proposed (Erästö and Holmström, 2005; Godtliebsen and Oigard, 2005; Oigard et al., 2006; Erästö and Holmström, 2007; Sørbye et al., 2009). Note that most of these SiZer tools aim to discover the important features in the mean or median structure of the data.

    • Statistical inference and visualization in scale-space using local likelihood

      2013, Computational Statistics and Data Analysis
      Citation Excerpt :

      Park et al. (2009b) introduced a SiZer that puts forth a method for comparing two or more time series. In addition, various Bayesian versions of SiZer have also been proposed as an approach to Bayesian multiscale smoothing (Erästö and Holmström, 2005; Godtliebsen and Oigard, 2005; Oigard et al., 2006; Erästö and Holmström, 2007; Sørbye et al., 2009). Note that all of these tools are restricted to data with a continuous response variable, and thus they are not readily applicable to discrete data.

    • Statistical Scale Space Methods

      2017, International Statistical Review