Dm-KDE: dynamical kernel density estimation by sequences of KDE estimators with fixed number of components over data streams

  • Research Article
  • Published in: Frontiers of Computer Science

Abstract

In many data stream mining applications, traditional density estimation methods such as kernel density estimation (KDE) and reduced set density estimation cannot be applied to data streams because of their high computational burden, long processing time, and intensive memory requirements. To reduce time and space complexity, we propose Dm-KDE, a novel density estimation method over data streams. It builds on the proposed algorithm m-KDE, which designs a KDE estimator with a fixed number of kernel components for a dataset. In this method, Dm-KDE sequence entries are created by m-KDE rather than from all the kernels obtained by other density estimation methods. To further reduce storage space, Dm-KDE sequence entries can be merged according to their KL divergences. Finally, the probability density function over an arbitrary time period or over the entire time span can be estimated from the resulting model. Compared with the state-of-the-art algorithm SOMKE, the distinctive advantage of Dm-KDE is that it achieves the same accuracy with a much smaller, fixed number of kernel components, which makes it suitable for scenarios that require more intensive online kernel density estimation over data streams. We compare Dm-KDE with SOMKE and M-kernel in terms of density estimation accuracy and running time on various stationary datasets. We also apply Dm-KDE to evolving data streams. Experimental results illustrate the effectiveness of the proposed method.
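The abstract does not state how m-KDE selects its components or how sequence entries are merged, so the following Python sketch is only a rough illustration of the general idea, not the authors' algorithm. It assumes 1-D data, Gaussian kernels, a k-means-style reduction to m centres, a rule-of-thumb bandwidth, and a grid-based KL estimate. All function names and the merge threshold are hypothetical.

```python
import numpy as np

def reduce_to_m(points, weights, m, h, n_iter=20, seed=0):
    """Summarise weighted 1-D points with m weighted kernel centres.
    Assumption: weighted Lloyd (k-means) iterations stand in for m-KDE."""
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    rng = np.random.default_rng(seed)
    centers = rng.choice(points, size=m, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(points[:, None] - centers[None, :]), axis=1)
        for j in range(m):
            mask = labels == j
            if weights[mask].sum() > 0:  # leave empty clusters where they are
                centers[j] = np.average(points[mask], weights=weights[mask])
    labels = np.argmin(np.abs(points[:, None] - centers[None, :]), axis=1)
    w = np.bincount(labels, weights=weights, minlength=m)
    return {"centers": centers, "weights": w / w.sum(),
            "h": h, "n": float(weights.sum())}

def fit_window(x, m):
    """Build one sequence entry from a window of stream samples."""
    x = np.asarray(x, dtype=float)
    h = 1.06 * x.std() * len(x) ** (-0.2)  # assumed rule-of-thumb bandwidth
    return reduce_to_m(x, np.ones(len(x)), m, h)

def density(model, grid):
    """Evaluate the m-component Gaussian mixture on a grid."""
    z = (grid[:, None] - model["centers"][None, :]) / model["h"]
    kernels = np.exp(-0.5 * z ** 2) / (model["h"] * np.sqrt(2 * np.pi))
    return kernels @ model["weights"]

def kl_divergence(p_model, q_model, grid, eps=1e-12):
    """Approximate KL(p || q) by a Riemann sum on a uniform grid."""
    p, q = density(p_model, grid), density(q_model, grid)
    dx = grid[1] - grid[0]
    return float(np.sum(p * np.log((p + eps) / (q + eps))) * dx)

def merge_entries(a, b, m):
    """Merge two entries into one entry that still has m components."""
    points = np.concatenate([a["centers"], b["centers"]])
    weights = np.concatenate([a["n"] * a["weights"], b["n"] * b["weights"]])
    h = (a["n"] * a["h"] + b["n"] * b["h"]) / (a["n"] + b["n"])
    return reduce_to_m(points, weights, m, h)

def maybe_merge(a, b, m, grid, threshold=0.1):
    """Merge only when the two entries describe similar densities."""
    if kl_divergence(a, b, grid) < threshold:
        return [merge_entries(a, b, m)]
    return [a, b]
```

In this sketch each window costs O(m) storage regardless of its size, and two adjacent entries collapse into one whenever their estimated densities are close, which is the storage-saving behaviour the abstract describes. The threshold value and the choice of a symmetric or asymmetric divergence are left open here because the abstract does not specify them.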

Author information

Corresponding author

Correspondence to Shitong Wang.

Additional information

Min Xu received the BS degree from Suzhou University, China in 2002 and the MS degree from Jiangnan University, China in 2009. She is currently a lecturer and is pursuing a PhD degree in the School of Digital Media, Jiangnan University, China. Her current research interests include pattern recognition and information retrieval.

Hisao Ishibuchi received the BS, MS and PhD degrees in industrial engineering from Osaka Prefecture University, Osaka, Japan. Since 1999, he has been a full professor with Osaka Prefecture University. His research interests include artificial intelligence, neural fuzzy systems, and data mining. Dr. Ishibuchi is on the editorial boards of several journals, including the IEEE Transactions on Fuzzy Systems and the IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics.

Xin Gu received the BS degree from Southeast University, China in 2001 and the MS degree from Jiangnan University, China in 2009. He is currently pursuing a PhD degree in the School of Digital Media, Jiangnan University, China. His current research interests include pattern recognition and information retrieval.

Shitong Wang received the MS degree in computer science from Nanjing University of Aeronautics and Astronautics, China in 1987. He spent over six years as a visiting research scientist at London University and Bristol University in the UK, Hiroshima International University in Japan, Hong Kong University of Science and Technology, and Hong Kong Polytechnic University. He is currently a full professor in the School of Digital Media, Jiangnan University, China. His research interests include artificial intelligence, neuro-fuzzy systems, pattern recognition, and image processing. He has published about 80 papers in international and national journals and has authored seven books.

About this article

Cite this article

Xu, M., Ishibuchi, H., Gu, X. et al. Dm-KDE: dynamical kernel density estimation by sequences of KDE estimators with fixed number of components over data streams. Front. Comput. Sci. 8, 563–580 (2014). https://doi.org/10.1007/s11704-014-3105-y

  • DOI: https://doi.org/10.1007/s11704-014-3105-y
