Abstract
In this paper, regression function estimation methods based on Parzen kernels are investigated. Both the modeled function and the variance of the noise are assumed to be time-varying. The well-known kernel estimator is extended with two tools commonly applied in concept-drifting data stream scenarios. The first is a sliding window, in which only a fixed number of the most recently received data elements affect the estimator. The second is a forgetting factor, under which past data become progressively less important at each time step. These heuristic approaches are compared experimentally with the basic, mathematically justified estimator and demonstrate similar accuracy.
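The three estimators described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual formulation: it assumes a Gaussian Parzen kernel, a Nadaraya-Watson-style weighted average, and the parameter names (`h` for bandwidth, `window` for the sliding-window size, `lam` for the forgetting factor) are hypothetical choices for the sketch.

```python
import numpy as np

def gaussian_kernel(u):
    # Gaussian Parzen kernel (one possible kernel choice)
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kernel_estimate(x, xs, ys, h, weights=None):
    """Kernel regression estimate of the function value at point x.

    xs, ys : past observations (inputs and noisy outputs), oldest first
    h      : bandwidth of the kernel
    weights: optional per-observation weights (used by the forgetting
             factor variant below)
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    k = gaussian_kernel((x - xs) / h)
    if weights is not None:
        k = k * np.asarray(weights, dtype=float)
    denom = k.sum()
    return k @ ys / denom if denom > 0 else 0.0

def sliding_window_estimate(x, xs, ys, h, window):
    # Only the `window` most recently received elements affect the estimate.
    return kernel_estimate(x, xs[-window:], ys[-window:], h)

def forgetting_factor_estimate(x, xs, ys, h, lam):
    # Each observation is down-weighted by lam^(age), 0 < lam < 1,
    # so past data become less and less important at each time step.
    n = len(xs)
    ages = np.arange(n - 1, -1, -1)  # newest observation has age 0
    return kernel_estimate(x, xs, ys, h, weights=lam ** ages)
```

Under this sketch, the sliding window adapts to drift by discarding old data outright, while the forgetting factor retains all data but decays its influence exponentially; the basic estimator corresponds to calling `kernel_estimate` with no window and no weights.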
M. Pawlak carried out this research at USS during his sabbatical leave from University of Manitoba.
Acknowledgments
This work was supported by the Polish National Science Center under Grant No. 2014/15/B/ST7/05264.
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Jaworski, M., Duda, P., Rutkowski, L., Najgebauer, P., Pawlak, M. (2017). Heuristic Regression Function Estimation Methods for Data Streams with Concept Drift. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2017. Lecture Notes in Computer Science(), vol 10246. Springer, Cham. https://doi.org/10.1007/978-3-319-59060-8_65
Print ISBN: 978-3-319-59059-2
Online ISBN: 978-3-319-59060-8