Abstract
This paper proposes a new nonparametric algorithm for testing homogeneity and detecting change points in random sequences. The algorithm is based on the Klyushin–Petunin test for sample heterogeneity, which handles both absolutely continuous distributions and distributions with ties. It can be implemented either online or offline. It allows small chunks of a data stream to be compared at a significance level below 0.05. Comparisons show that the proposed algorithm is more sensitive and robust than its counterparts. Unlike the counterpart tests (Kolmogorov–Smirnov and Wilcoxon), the proposed algorithm reliably assesses the homogeneity of samples both from distributions that differ in mean but have the same variance and from distributions with the same mean but different variances. The algorithm also has a wide field of applications, from the detection of concept drift in texts to tracking the health parameters and coordinates of patients obtained from wearable gadgets.
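The abstract gives no formulas, so the following Python sketch is only an illustration of how a window-based homogeneity check of this kind might look, not the chapter's implementation. It assumes the commonly described form of the p-statistic: for the order statistics of one sample, the probability that a value from the same distribution falls between x_(i) and x_(j) is taken to be (j - i)/(n + 1) (Hill's assumption), and the statistic is the share of such intervals for which a Wilson confidence interval, built from the second sample's observed frequency, covers that probability. The function names, the multiplier `g = 3`, the window size, and the decision threshold `0.8` are all hypothetical choices made for this example, and ties receive no special treatment here.

```python
import math


def wilson_interval(h, m, g=3.0):
    """Wilson score interval for a proportion h observed in m trials.

    g is the normal-quantile multiplier (g = 3 is an assumed choice).
    """
    denom = m + g * g
    center = h * m + g * g / 2.0
    half = g * math.sqrt(h * (1.0 - h) * m + g * g / 4.0)
    return (center - half) / denom, (center + half) / denom


def p_statistic(x, y, g=3.0):
    """Share of intervals (x_(i), x_(j)) whose Wilson interval, computed
    from the frequency of y-values inside the interval, covers
    (j - i) / (n + 1). Values near 1 suggest homogeneous samples."""
    xs = sorted(x)
    n, m = len(xs), len(y)
    covered = 0
    total = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            # Expected probability under homogeneity (i, j are 0-based,
            # but only the difference j - i matters).
            p_ij = (j - i) / (n + 1.0)
            # Observed frequency of y-values strictly inside the interval.
            h_ij = sum(1 for v in y if xs[i] < v < xs[j]) / m
            lo, hi = wilson_interval(h_ij, m, g)
            covered += lo <= p_ij <= hi
            total += 1
    return covered / total


def first_change(stream, window=40, threshold=0.8):
    """Compare each non-overlapping window with the one before it and
    return the start index of the first window whose p-statistic drops
    below the (hypothetical) threshold, or None if no window does."""
    for start in range(window, len(stream) - window + 1, window):
        ref = stream[start - window:start]
        cur = stream[start:start + window]
        if p_statistic(ref, cur) < threshold:
            return start
    return None


if __name__ == "__main__":
    # Two identical windows followed by a window shifted far away.
    stable = [float(i % 40) for i in range(80)]
    shifted = [1000.0 + i for i in range(40)]
    print(first_change(stable + shifted))
```

The pairwise loop costs on the order of n²·m operations per comparison, which stays cheap for the small chunks the abstract has in mind; an online variant would call `p_statistic` on each newly completed window instead of scanning a stored stream.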
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Klyushin, D., Martynenko, I. (2020). Novel Nonparametric Test for Homogeneity and Change-Point Detection in Data Stream. In: Babichev, S., Peleshko, D., Vynokurova, O. (eds) Data Stream Mining & Processing. DSMP 2020. Communications in Computer and Information Science, vol 1158. Springer, Cham. https://doi.org/10.1007/978-3-030-61656-4_23
DOI: https://doi.org/10.1007/978-3-030-61656-4_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61655-7
Online ISBN: 978-3-030-61656-4
eBook Packages: Computer Science, Computer Science (R0)