Research on Concept Drift Detection for Decision Tree Algorithm in the Stream of Big Data

  • Conference paper
Parallel Architecture, Algorithm and Programming (PAAP 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 729)

Abstract

With the rapid development of information technology, industries must process ever-growing volumes of data. Compared with traditional static data, stream data in a big data environment is rapid, continuous, and changes over time. At the same time, changes in the implicit distribution of the data stream give rise to concept drift. This paper proposes a concept drift detection algorithm for stream data, named ADDS (Anti-concept Drift Detection Algorithm), which is mainly used to detect and handle hidden concept drift in non-stationary data streams in a big data environment. ADDS focuses on improving traditional classification algorithms in an incremental way so that they meet the demands of streaming data processing. The experimental results show that ADDS achieves a better concept drift detection effect.
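The abstract does not describe how ADDS works internally, so the following is not the authors' algorithm. It is only a minimal, generic sketch of what "detecting concept drift" in a classifier's error stream can look like, under these assumptions: the classifier emits a 0/1 error indicator per sample, a fixed-size reference window is compared with a sliding recent window, and the alarm threshold comes from the Hoeffding inequality. The names `WindowDriftDetector`, `hoeffding_epsilon`, `window`, and `delta` are hypothetical and chosen purely for illustration.

```python
import math
from collections import deque


def hoeffding_epsilon(n: int, delta: float) -> float:
    """Two-sided Hoeffding deviation for the mean of n values in [0, 1].

    With probability at least 1 - delta, the sample mean lies within
    this distance of the true mean.
    """
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


class WindowDriftDetector:
    """Illustrative detector (an assumption, not ADDS).

    Flags drift when the error rate of the recent window exceeds the
    error rate of the reference window by more than twice the
    Hoeffding deviation of a single window.
    """

    def __init__(self, window: int = 100, delta: float = 0.01):
        self.window = window
        self.delta = delta
        self.reference = deque(maxlen=window)  # errors under the old concept
        self.recent = deque(maxlen=window)     # sliding window of new errors

    def update(self, error: int) -> bool:
        """Feed one 0/1 error indicator; return True if drift is flagged."""
        # Fill the reference window first.
        if len(self.reference) < self.window:
            self.reference.append(error)
            return False

        self.recent.append(error)
        if len(self.recent) < self.window:
            return False

        ref_rate = sum(self.reference) / self.window
        rec_rate = sum(self.recent) / self.window
        threshold = 2.0 * hoeffding_epsilon(self.window, self.delta)

        if rec_rate - ref_rate > threshold:
            # Treat the recent window as the new concept and start over.
            self.reference = deque(self.recent, maxlen=self.window)
            self.recent.clear()
            return True
        return False


if __name__ == "__main__":
    # Synthetic stream: the classifier is always right, then always wrong.
    detector = WindowDriftDetector(window=100, delta=0.01)
    stream = [0] * 200 + [1] * 100
    alarms = [i for i, err in enumerate(stream) if detector.update(err)]
    print("drift flagged at sample indices:", alarms)
```

In an incremental decision tree setting, an alarm from such a detector would typically trigger rebuilding or replacing the affected part of the model. How ADDS reacts to detected drift is not stated in the abstract.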

Acknowledgement

This paper was supported in part by the National Key Research and Development Program of China (2017YFB0202200); the National Natural Science Foundation of China (61373017, 61572261, 61170065); the Outstanding Young Fund Project of the Jiangsu Natural Science Foundation of China (BK20170100); the Jiangsu Key Research and Development Program (BE2017166); the Open-End Fund of the Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks (WSNLBZY201514); and the Research Project of Nanjing University of Posts and Telecommunications (NY214067).

Author information

Corresponding author

Correspondence to Yimu Ji.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd

About this paper

Cite this paper

Liu, S., Lu, L., Zhang, Y., Xin, T., Ji, Y., Wang, R. (2017). Research on Concept Drift Detection for Decision Tree Algorithm in the Stream of Big Data. In: Chen, G., Shen, H., Chen, M. (eds) Parallel Architecture, Algorithm and Programming. PAAP 2017. Communications in Computer and Information Science, vol 729. Springer, Singapore. https://doi.org/10.1007/978-981-10-6442-5_21

  • DOI: https://doi.org/10.1007/978-981-10-6442-5_21

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6441-8

  • Online ISBN: 978-981-10-6442-5

  • eBook Packages: Computer Science, Computer Science (R0)
