Evaluating the Four-Way Performance Trade-Off for Stream Classification

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11484)

Abstract

Machine Learning (ML) solutions need to deal efficiently with the huge amounts of data now available, addressing scalability concerns without sacrificing predictive performance. Moreover, this data arrives as a continuous, evolving stream, which imposes new constraints such as limited memory and energy resources. Energy-aware ML algorithms are likewise gaining relevance because of the power constraints of hardware platforms in several real-life applications, such as the Internet of Things (IoT). Many algorithms have been proposed to cope with the mutable nature of data streams, with the Very Fast Decision Tree (VFDT) being one of the most widely used. An adaptation of the VFDT, called Strict VFDT (SVFDT), can significantly reduce memory usage without compromising predictive performance or time efficiency. However, the energy consumption of the VFDT and SVFDT during data stream processing has been overlooked. In this work, we compare the four-way relationship between predictive performance, memory costs, time efficiency and energy consumption, tuning the hyperparameters of the algorithms to optimise the resources devoted to them. Experiments over 23 benchmark datasets revealed that the SVFDT-I is the most energy-friendly algorithm and greatly reduces memory consumption, being statistically superior to the VFDT.

Notes

  1. https://github.com/vturrisi/pystream

  2. https://software.intel.com/ai-academy/tools/devcloud

Acknowledgement

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001 and the National Council for Scientific and Technological Development - Brazil (CNPq) - Grant of Project 420562/2018-4.

Author information

Corresponding author

Correspondence to Victor G. Turrisi da Costa.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Turrisi da Costa, V.G., Santana, E.J., Lopes, J.F., Barbon, S. (2019). Evaluating the Four-Way Performance Trade-Off for Stream Classification. In: Miani, R., Camargos, L., Zarpelão, B., Rosas, E., Pasquini, R. (eds) Green, Pervasive, and Cloud Computing. GPC 2019. Lecture Notes in Computer Science, vol 11484. Springer, Cham. https://doi.org/10.1007/978-3-030-19223-5_1

  • DOI: https://doi.org/10.1007/978-3-030-19223-5_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-19222-8

  • Online ISBN: 978-3-030-19223-5

  • eBook Packages: Computer Science, Computer Science (R0)
