Abstract
Image data stream classification presents several challenges, such as the evolution of the concepts of known classes (concept drift) and the emergence of new classes (open set). Many studies of image data stream classification focus on the classifier but do not explore other important issues, such as evaluation methods specific to data stream scenarios, the evolution of the image feature descriptor, and the updating of the decision model, while considering the characteristics of real application environments. This article aims to help close these gaps through an experimental study that adopts a new evaluation method for the classification of image streams and examines important issues connected to this task. To this end, algorithms from the literature were evaluated in order to identify how their performance degrades in real-world scenarios. The experiments explored the refinement of the feature descriptor and the updating of the model in the presence of concept drift and open set, in addition to the use of latency and active learning strategies. The results show that the more realistic the experimental scenario, the greater the degradation of the results.
Availability of data and materials
The datasets analyzed during the current study are publicly available and may also be obtained from the corresponding author on reasonable request. The EVISClass code is available on GitHub.
Notes
The implementation of the EVISClass framework is available at https://github.com/EVISClass/EVISClass.
Funding
This work has been supported by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES)—Finance Code 001.
Author information
Contributions
All authors participated in the definition of the split policies and in the design of the experiments. All authors helped to draft the manuscript and read and approved its final version. YSS prepared the experiments described in Sect. 5.2. MCL was responsible for coding and executing the experiments.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Consent for publication
The authors consent to the publication of the manuscript in Journal of Intelligent Information Systems.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Mateus C. de Lima: This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES)—Finance Code 001.
About this article
Cite this article
de Lima, M.C., Souza, Y., Faria, E.R. et al. A comprehensive analysis of the diverse aspects inherent to image data stream classification. Knowl Inf Syst 64, 2215–2238 (2022). https://doi.org/10.1007/s10115-022-01717-1
DOI: https://doi.org/10.1007/s10115-022-01717-1