Abstract
Machine Learning (ML) solutions must deal efficiently with the huge amount of data available, addressing scalability concerns without sacrificing predictive performance. Moreover, this data arrives as a continuous and evolving stream, which imposes new constraints, e.g., limited memory and energy resources. Likewise, energy-aware ML algorithms are gaining relevance due to the power constraints of hardware platforms in several real-life applications, such as the Internet of Things (IoT). Many algorithms have been proposed to cope with the mutable nature of data streams, the Very Fast Decision Tree (VFDT) being one of the most widely used. An adaptation of the VFDT, called Strict VFDT (SVFDT), can significantly reduce memory usage without sacrificing predictive performance or time efficiency. However, the energy consumption of the VFDT and SVFDT when processing data streams has been overlooked. In this work, we evaluate the four-way relationship between predictive performance, memory costs, time efficiency and energy consumption, tuning the hyperparameters of the algorithms to optimise the resources they consume. Experiments over 23 benchmark datasets revealed that the SVFDT-I variant is the most energy-friendly algorithm and greatly reduces memory consumption, being statistically superior to the VFDT.
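The abstract mentions tuning the algorithms' hyperparameters without naming them. As background, the following is a minimal Python sketch of the Hoeffding-bound split test at the core of VFDT-style trees, exposing three hyperparameters usually tuned for them: the grace period, the split confidence δ and the tie threshold τ. The function names are hypothetical, and the defaults (200, 1e-7, 0.05) are assumed to match values commonly used for Hoeffding trees; they are not necessarily the settings evaluated in this paper. The extra split restrictions that distinguish the SVFDT are not modelled here.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that, with probability 1 - delta, the true mean of a
    variable with the given range is within epsilon of its mean over n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def attempt_split(best_gain: float, second_gain: float, n_seen: int,
                  n_classes: int, grace_period: int = 200,
                  delta: float = 1e-7, tau: float = 0.05) -> bool:
    """Hypothetical VFDT-style split test at a single leaf (illustrative sketch).

    best_gain / second_gain: information gain of the two best candidate
    attributes; n_seen: number of instances the leaf has observed so far.
    Default hyperparameter values are assumptions, not the paper's settings.
    """
    # Split checks only happen every `grace_period` instances, which saves
    # time at the cost of possibly delaying a split.
    if n_seen == 0 or n_seen % grace_period != 0:
        return False
    # For information gain the range R is log2(#classes), using at least 2.
    value_range = math.log2(max(n_classes, 2))
    eps = hoeffding_bound(value_range, delta, n_seen)
    # Split when the best attribute beats the runner-up by more than epsilon,
    # or when epsilon has shrunk below the tie threshold tau (the two
    # candidates are then treated as effectively tied).
    return (best_gain - second_gain) > eps or eps < tau


if __name__ == "__main__":
    # After 200 instances on a binary problem, epsilon is roughly 0.20.
    print(round(hoeffding_bound(1.0, 1e-7, 200), 3))
    print(attempt_split(0.30, 0.05, 200, n_classes=2))  # gap 0.25 exceeds eps
    print(attempt_split(0.15, 0.05, 200, n_classes=2))  # gap 0.10 is below eps
```

In this sketch, a larger grace period or a smaller δ delays splits, which tends to produce smaller trees (less memory, fewer split evaluations) at a possible cost in accuracy; this is one way hyperparameter tuning can shift the balance between the four criteria discussed above.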
Acknowledgement
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001 and the National Council for Scientific and Technological Development - Brazil (CNPq) - Grant of Project 420562/2018-4.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Turrisi da Costa, V.G., Santana, E.J., Lopes, J.F., Barbon, S. (2019). Evaluating the Four-Way Performance Trade-Off for Stream Classification. In: Miani, R., Camargos, L., Zarpelão, B., Rosas, E., Pasquini, R. (eds) Green, Pervasive, and Cloud Computing. GPC 2019. Lecture Notes in Computer Science, vol. 11484. Springer, Cham. https://doi.org/10.1007/978-3-030-19223-5_1
DOI: https://doi.org/10.1007/978-3-030-19223-5_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-19222-8
Online ISBN: 978-3-030-19223-5
eBook Packages: Computer Science, Computer Science (R0)