Abstract
Multi-label classification addresses problems in which each data example is associated with multiple class labels. Data streams pose new challenges to this task because of the massive amounts of structured data being produced. In fact, most existing batch-mode methods may not cope with this setting. Therefore, this paper proposes two multi-label classification methods, based on rule learning and ensemble learning, that learn from a continuous flow of data. These methods are derived from a multi-target regression algorithm. The main contribution of this work is rule specialization on subsets of class labels, instead of the usual local (an individual model for each output) or global (one model for all outputs) approaches. A prequential evaluation compared the global, local and subset operation modes against other online classifiers found in the literature, using six real-world data sets. The evaluation showed that subset specialization achieves competitive performance compared with the local and global approaches and with the online classifiers found in the literature.
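As an illustration of the prequential (test-then-train) protocol mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes binary label vectors, uses Hamming loss as the error measure, and uses a hypothetical stand-in learner (`MajorityLabelBaseline`) in place of the rule-based methods; all names and the toy stream are assumptions introduced here for illustration.

```python
from typing import Iterable, List, Sequence, Tuple

# One stream example: (feature vector, binary label vector).
Example = Tuple[Sequence[float], Sequence[int]]


class MajorityLabelBaseline:
    """Hypothetical stand-in learner: predicts, for each label,
    the value (0 or 1) seen most often so far. Features are ignored."""

    def __init__(self, n_labels: int) -> None:
        self.positives: List[int] = [0] * n_labels
        self.seen = 0

    def predict(self, x: Sequence[float]) -> List[int]:
        if self.seen == 0:
            return [0] * len(self.positives)
        return [1 if 2 * p > self.seen else 0 for p in self.positives]

    def learn(self, x: Sequence[float], y: Sequence[int]) -> None:
        self.seen += 1
        for j, value in enumerate(y):
            self.positives[j] += value


def hamming_loss(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of labels predicted incorrectly for one example."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def prequential_error(stream: Iterable[Example], model) -> float:
    """Each example is first used to test the model, then to train it."""
    total, count = 0.0, 0
    for x, y in stream:
        total += hamming_loss(y, model.predict(x))  # test first
        model.learn(x, y)                           # then train
        count += 1
    return total / count if count else 0.0


# Toy stream with two labels (made-up values, for illustration only).
stream = [([0.0], [1, 0]), ([1.0], [1, 0]), ([2.0], [1, 1]), ([3.0], [1, 0])]
error = prequential_error(stream, MajorityLabelBaseline(n_labels=2))
```

This sketch averages the loss over the whole stream; prequential evaluations in the stream-mining literature often use sliding windows or fading factors instead, which would change only the accumulation step.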
Acknowledgements
This work is financed by the European Regional Development Fund (ERDF) through the Operational Programme for Competitiveness and Internationalisation - COMPETE 2020 Programme within project POCI-01-0145-FEDER-006961.
Cite this article
Sousa, R., Gama, J. Multi-label classification from high-speed data streams with adaptive model rules and random rules. Prog Artif Intell 7, 177–187 (2018). https://doi.org/10.1007/s13748-018-0142-z
DOI: https://doi.org/10.1007/s13748-018-0142-z