Discovering focal regions of slightly-aggregated sparse signals

  • Original Paper
  • Computational Statistics

Abstract

The characteristic dynamic distortions that slightly-aggregating sparse signals induce in a long time series of i.i.d. pure noise are summarized by a much shorter process: the recurrence times of a chosen extreme event. We first use the Kolmogorov–Smirnov statistic to compare the empirical recurrence time distribution with the geometric distribution that holds under the null hypothesis of no signal in the original time series. The power of this test depends on the degree of aggregation of the sparse signals, which ranges from singletons scattered completely at random to batches of various sizes spread over the entire temporal span. We demonstrate that the Kolmogorov–Smirnov statistic captures the dynamic distortions due to slightly-aggregating sparse signals better than Tukey’s Higher Criticism statistic does, even when the batch size is as small as five. Second, after confirming the presence of signals in the noise time series, we apply the hierarchical factor segmentation (HFS) algorithm, again based on the recurrence time process, to compute focal segments that contain a significantly higher intensity of signals than the remaining temporal regions. In a computer experiment with a fixed number of signals, the focal segments identified by the HFS algorithm have a signal intensity many times that of the remaining regions, and this ratio also depends critically on the degree of aggregation of the sparse signals. This ratio information can facilitate better sensitivity, equivalent to a smaller false discovery rate, if the signal-discovering protocol implemented within the computed focal regions is different from that used outside them. We also numerically compute the specificity as the total number of signals contained in the computed collection of focal regions, which indicates the inherent difficulty of sparse signal discovery.
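
The first step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: signals are modeled as a mean shift added to standard normal noise, the extreme event is exceedance of the upper 1% null quantile, and the series length, number of signals, batch size, and shift are arbitrary. Under pure noise the exceedances are independent with probability p, so the recurrence times are Geometric(p) on {1, 2, ...}; the sketch measures the largest gap between the empirical and null distribution functions.

```python
from statistics import NormalDist

import numpy as np


def embed_signals(noise, n_signals, batch_size, shift, rng):
    """Add `shift` to `n_signals` positions grouped into non-overlapping batches.

    The mean-shift signal model is an assumption of this sketch.
    """
    x = noise.copy()
    n_batches = n_signals // batch_size
    # Draw batch starts on a grid of width `batch_size` so batches never overlap.
    starts = rng.choice(x.size // batch_size, size=n_batches, replace=False) * batch_size
    for s in starts:
        x[s:s + batch_size] += shift
    return x


def recurrence_times(x, threshold):
    """Gaps between successive exceedances of `threshold` (the extreme event)."""
    hits = np.flatnonzero(x > threshold)
    return np.diff(hits)


def ks_vs_geometric(gaps, p):
    """Largest distance between the empirical CDF of `gaps` and the Geometric(p) CDF.

    Both CDFs are step functions that jump only at positive integers,
    so it is enough to compare them at k = 1, ..., max(gaps).
    """
    gaps = np.sort(np.asarray(gaps))
    k = np.arange(1, gaps[-1] + 1)
    ecdf = np.searchsorted(gaps, k, side="right") / gaps.size
    null_cdf = 1.0 - (1.0 - p) ** k
    return float(np.max(np.abs(ecdf - null_cdf)))


rng = np.random.default_rng(0)
p = 0.01                                   # assumed exceedance probability under the null
threshold = NormalDist().inv_cdf(1.0 - p)  # upper 1% quantile of N(0, 1)

noise = rng.standard_normal(200_000)
with_signals = embed_signals(noise, n_signals=1000, batch_size=5, shift=3.0, rng=rng)

ks_null = ks_vs_geometric(recurrence_times(noise, threshold), p)
ks_signal = ks_vs_geometric(recurrence_times(with_signals, threshold), p)
print(f"KS distance, pure noise:       {ks_null:.3f}")
print(f"KS distance, batches of five:  {ks_signal:.3f}")
```

In this toy setting, batching produces an excess of very short recurrence times, which is what moves the empirical distribution away from the geometric null. Turning the distance into a test would still require a reference distribution for the statistic, for example one obtained by simulating pure noise series of the same length.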

Fig. 1

References

  • Abramovich F, Benjamini Y (1996) Adaptive thresholding of wavelet coefficients. Comput Stat Data Anal 22:351–361

  • Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 57:289–300

  • Cai T, Jin J, Low M (2007) Estimation and confidence set for sparse normal mixtures. Ann Stat 35:2421–2449

  • Chang L-B, Goswami A, Hsieh F, Hwang C-R (2013) An invariance for the large sample empirical distribution of waiting time between successive extremes. Under review for a special volume on stochastic calculus. In: Hwang CR et al (ed) (2013) Festschrift in honor of Professor S. R. Srinivasa Varadhan on the occasion of his 70th birthday, Academia Sinica, Taipei, Taiwan

  • Donoho D, Jin J (2004) Higher criticism for detecting sparse heterogeneous mixtures. Ann Stat 32:962–994

  • Donoho D, Jin J (2008) Higher criticism thresholding: optimal feature selection when useful features are rare and weak. Proc Natl Acad Sci 105:14790–14795

  • Efron B (2004) Large-scale simultaneous hypothesis testing: the choice of a null hypothesis. J Am Stat Assoc 96:1151–1160

  • Fushing H, Hwang CR, Lee HC, Lan YC, Horng SB (2006) Testing and mapping non-stationarity in animal behavioral processes: a case study on an individual female bean weevil. J Theor Biol 238:805–816

  • Fushing H, Chen SC, Pollard KS (2009) A nearly exhaustive search for CpG islands on whole chromosome. Int J Biostat 5, Article 14

  • Fushing H, Chen S-C, Hwang C-R (2010a) Non-parametric decoding on discrete time series and its application in bioinformatics. Stat Biosci 2:18–40

  • Fushing H, Chen SC, Lee HJ (2010b) Computing circadian rhythmic patterns and beyond: a new non-Fourier analysis. Comput Stat 24:409–430

  • Fushing H, Chen SC, Lee HJ (2010c) Statistical computations on biological rhythms I: dissecting variable cycles and measuring phase shifts in activity event time series. J Comput Graph Stat 19:221–239

  • Fushing H, Ferrer E, Chen SC, Chow SM (2010d) Dynamics of dyadic interaction I: exploring non-stationarity of intra- and inter-individual affective processes via hierarchical segmentation and stochastic small-world networks. Psychometrika 75:351–372

  • Fushing H, Chen SC, Hwang C-R (2012) Discovering stock dynamics through multidimensional volatility-phases. Quant Financ 12:213–230

  • Hall P, Jin J (2008) Properties of higher criticism under strong dependence. Ann Stat 36:381–402

  • Jeng XJ, Cai T, Li H (2010) Robust identification of sparse segments in ultra-high dimensional data analysis. J Am Stat Assoc 105:1156–1166

  • Jin J (2007) Proportion of nonzero normal means: universal oracle equivalences and uniformly consistent estimates. J R Stat Soc Ser B 70:461–493

  • Jin J, Cai T (2007) Estimating the null and the proportion of non-null effects in large scale multiple comparison. J Am Stat Assoc 102:496–506

  • Kac M (1947) On the notion of recurrence in discrete stochastic processes. Bull Am Math Soc 53:1002–1010

  • Tukey J (1989) Higher criticism for individual significance in several tables or parts of tables. Princeton University, Princeton (Internal working paper)

Author information

Corresponding author

Correspondence to Hsieh Fushing.

Additional information

This research is supported in part by the NSF under Grant DMS 1007219 (co-funded by Cyber-enabled Discovery and Innovation (CDI) program).

About this article

Cite this article

Chen, SC., Fushing, H. & Hwang, CR. Discovering focal regions of slightly-aggregated sparse signals. Comput Stat 28, 2295–2308 (2013). https://doi.org/10.1007/s00180-013-0407-8

  • DOI: https://doi.org/10.1007/s00180-013-0407-8
