
Most recent changepoint detection in censored panel data

  • Original paper
Computational Statistics

Abstract

This study aims to detect the most recent changepoint in censored panel data, both while ignoring dependence within and between segments and while taking serial autocorrelation into account. Different methods for detecting the most recent changepoint in censored data are compared. Censoring rates of 20%, 50%, and 90% are considered for right and left censoring, and rates of (10%, 10%), (25%, 25%), and (40%, 50%) for interval censoring. The methods compared are most recent changepoint (MRC), double cumulative sum binary segmentation, nonparametric changepoint detection (ECP), multiple changepoints in multivariate time series, analyzing each series in the panel independently, and analyzing aggregated data (AGG). The censoring rate is observed to have a significant effect on the detection of changepoints in high-dimensional data, and the MRC method outperforms the competing methods considered in this study. In addition to investigating the impact of penalties, the performance of the MRC and AGG methods is compared using water quality data from the Niagara River. A data set on the survival time of stroke patients is also included in this study. The R package “cpcens” is available on the Comprehensive R Archive Network (CRAN) to replicate the results of this article.



Acknowledgements

The authors would like to thank the editor, associate editor, and anonymous reviewers for their constructive and critical comments, which improved the quality of the article.

Author information

Corresponding author

Correspondence to Sajid Ali.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 269 KB)

Appendix

A. The double CUSUM binary segmentation algorithm

Let \({\mathcal {I}}_{s,d} = [s,s + e_{T}] \cup [d - e_{T},d]\) denote the two end portions of the interval \([s,d]\). Changepoints are not searched for on \({\mathcal {I}}_{s,d}\), in order to account for possible bias. Let the index u denote the level and v the location of the node at each level. Then carry out the following steps.

Step 1:

Set (u,v) = (1,1), \(s_{u,v}\) = 1 and \(d_{u,v}\) = T

Step 2:

At the current level u, repeat the following steps for all v.

Step 2.1:

Let \(s = s_{u,v}\) and \(d = d_{u,v}\). Obtain the CUSUM series \({\chi ^{(k)}_{s,b,d}}\) for \(b \in [s,d)\) and \(k = 1,\ldots ,n\), on which \(D_{m}^{\varphi }({\{|\chi ^{(k)}_{s,b,d}|}\} _{k=1}^{n})\) is computed over all b and m.

Step 2.2:

Obtain the test statistic

$$\begin{aligned} T_{s,d}^{\varphi } = \underset{b\in [s,d] \setminus {\mathcal {I}}_{s,d}}{\max }\ \underset{1\le m\le n}{\max }\, D_{m}^{\varphi }({\{|\chi ^{(k)}_{s,b,d}|}\}_{k=1}^{n}), \end{aligned}$$

where

$$\begin{aligned} D_{m}^{\varphi }({\{|\chi ^{(k)}_{s,b,d}|}\}_{k=1}^{n})&= \Big \{\frac{m(2n-m)}{2n}\Big \}^{\varphi }\Bigg (\frac{1}{m}\sum \limits _{k=1}^{m}|\chi _{s,b,d}^{(k)}|-\frac{1}{2n-m}\sum \limits _{k=m+1}^{n}|\chi _{s,b,d}^{(k)}|\Bigg ) \\&= \Big \{\frac{m(2n-m)}{2n}\Big \}^{\varphi }\frac{1}{m}\sum \limits _{k=1}^{m}\Bigg (|\chi _{s,b,d}^{(k)}|-\frac{1}{2n-m}\sum \limits _{k=m+1}^{n}|\chi _{s,b,d}^{(k)}|\Bigg ) \end{aligned}$$

Here the DC operator \(D^{\varphi }_{m}\) takes as its input the ordered CUSUM values \(|\chi _{s,b,d}^{(1)}|\ge |\chi _{s,b,d}^{(2)}|\ge \cdots \ge |\chi _{s,b,d}^{(n)}|\) at each b, for some \(\varphi \in [0,1]\).

Step 2.3:

If \(T^{\varphi }_{s,d} \le \pi ^{\varphi }_{n,T}\), where \(\pi ^{\varphi }_{n,T}\) is the threshold, stop searching for changepoints on the interval [s,d]. Otherwise, if \(T^{\varphi }_{s,d} > \pi ^{\varphi }_{n,T}\), locate

$$\begin{aligned} {\widehat{\eta }} = \underset{b\in [s,d] \setminus {\mathcal {I}}_{s,d}}{\arg \max }\ \underset{1\le m\le n}{\max }\, D_{m}^{\varphi }({\{|\chi ^{(k)}_{s,b,d}|}\}_{k=1}^{n}) \end{aligned}$$

and proceed to Step 2.4.

Step 2.4:

Add \({\widehat{\eta }}\) to the set of estimated changepoints and divide the interval \([s_{u,v},d_{u,v}]\) into two sub-intervals \([s_{u+1,2v-1},d_{u+1,2v-1}]\) and \([s_{u+1,2v},d_{u+1,2v}]\), where \(s_{u+1,2v-1} = s_{u,v}\), \(d_{u+1,2v-1} = {\widehat{\eta }}\), \(s_{u+1,2v} = {\widehat{\eta }} + 1\) and \(d_{u+1,2v} = d_{u,v}\).

Step 3:

Once \([s_{u,v},d_{u,v}]\) has been examined for all v at level u, set u \(\leftarrow \) u + 1 and go to Step 2. Step 2.3 provides the stopping rule for the DCBS algorithm: the search for further changepoints ends when \(T^{\varphi }_{s,d} \le \pi ^{\varphi }_{n,T}\) on every interval [s,d] defined by two adjacent estimated changepoints.
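The steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in the article or in the "cpcens" package. The appendix does not give the formula for the CUSUM series, so the standard mean-change CUSUM is assumed here, without any variance scaling. The function names (`cusum`, `dc_operator`, `dc_statistic`, `dcbs`), the 0-based indexing, and the default values of `phi` and `trim` (standing in for \(\varphi\) and \(e_T\)) are hypothetical choices made for the example.

```python
import math

def cusum(x, s, b, d):
    """Assumed mean-change CUSUM of x[s..d] (inclusive) split after index b."""
    n_left, n_right = b - s + 1, d - b
    left = sum(x[s:b + 1]) / n_left
    right = sum(x[b + 1:d + 1]) / n_right
    return math.sqrt(n_left * n_right / (n_left + n_right)) * (left - right)

def dc_operator(abs_cusums, m, phi):
    """D_m^phi applied to the |CUSUM| values of the n series at one b."""
    n = len(abs_cusums)
    z = sorted(abs_cusums, reverse=True)      # ordered values, largest first
    top = sum(z[:m]) / m                      # mean of the m largest
    rest = sum(z[m:]) / (2 * n - m)           # scaled sum of the remainder
    return (m * (2 * n - m) / (2 * n)) ** phi * (top - rest)

def dc_statistic(panel, s, d, phi, trim):
    """Steps 2.1-2.2: maximise D_m^phi over b outside I_{s,d} and over m."""
    best, best_b = -math.inf, None
    # Skip the excluded end portions [s, s+trim] and [d-trim, d].
    for b in range(s + trim + 1, d - trim):
        z = [abs(cusum(x, s, b, d)) for x in panel]
        value = max(dc_operator(z, m, phi) for m in range(1, len(panel) + 1))
        if value > best:
            best, best_b = value, b
    return best, best_b

def dcbs(panel, threshold, phi=0.5, trim=1):
    """Steps 1-3: level-by-level binary segmentation with the DC statistic."""
    found = []
    level = [(0, len(panel[0]) - 1)]          # Step 1: whole series
    while level:                              # Step 3: move to the next level
        next_level = []
        for s, d in level:                    # Step 2: every node at this level
            stat, eta = dc_statistic(panel, s, d, phi, trim)
            if eta is None or stat <= threshold:
                continue                      # Step 2.3: stop on this interval
            found.append(eta)                 # Step 2.4: record and split
            next_level += [(s, eta), (eta + 1, d)]
        level = next_level
    return sorted(found)

# Toy panel: two of three series shift in mean after index 9 (0-based).
panel = [[0.0] * 10 + [5.0] * 10, [0.0] * 10 + [3.0] * 10, [0.0] * 20]
print(dcbs(panel, threshold=1.0))  # -> [9]
```

In this noise-free toy panel the statistic peaks exactly at the shared shift, and both sub-intervals are constant, so the search stops after one split. With real data the threshold \(\pi ^{\varphi }_{n,T}\) has to be chosen with care; the value 1.0 above is arbitrary.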


Cite this article

Siddiqa, H., Ali, S. & Shah, I. Most recent changepoint detection in censored panel data. Comput Stat 36, 515–540 (2021). https://doi.org/10.1007/s00180-020-01028-5


  • DOI: https://doi.org/10.1007/s00180-020-01028-5
