Abstract
Learning and mining from data streams with incremental feature spaces, where data may expand over time in both volume and feature dimensions, have recently attracted extensive attention. Existing approaches usually assume that every incoming instance receives its true label. However, in many real-world applications, e.g., environment monitoring, acquiring true labels is costly because annotating the data requires human effort. To tackle this problem, we propose a novel incremental Feature spaces Learning with Label Scarcity (FLLS) algorithm, together with two variants. When data streams arrive with augmented features, we first use margin-based online active learning to select valuable instances to be labeled, and thus build superior predictive models with minimal supervision. After receiving the labels, we combine the online passive-aggressive update rule with the maximum-margin principle to jointly update the dynamic classifier in the shared and augmented feature spaces. Finally, we use the projected truncation technique to build a sparse but efficient model. We theoretically analyze the error bounds of FLLS and its two variants. We also conduct experiments on synthetic data and real-world applications to further validate the effectiveness of the proposed algorithms.
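The three steps named in the abstract (margin-based querying, a passive-aggressive update over shared and new features, and truncation for sparsity) can be illustrated with a minimal Python sketch. This is not the authors' FLLS algorithm: the query rule b / (b + |margin|), the standard PA-I step size, the plain top-k truncation (used here in place of projected truncation), and all names and parameters (`should_query`, `pa_update`, `truncate`, `run_stream`, `b`, `C`, `keep`) are assumptions chosen for illustration. The sketch also assumes the feature count never shrinks along the stream.

```python
import random


def predict_margin(w, x):
    """Signed margin w.x; zip stops at the shorter list, so only the
    features the model already knows (the shared space) contribute."""
    return sum(wi * xi for wi, xi in zip(w, x))


def should_query(margin, b, rng):
    """Assumed randomized rule: request a label with probability
    b / (b + |margin|), so low-margin instances are queried more often."""
    return rng.random() < b / (b + abs(margin))


def pa_update(w, x, y, C=1.0):
    """Standard PA-I step on label y in {-1, +1}. The weight vector is
    zero-padded to len(x), so newly arrived features start at weight 0.
    Assumes len(x) >= len(w), i.e. the feature space only grows."""
    w = w + [0.0] * (len(x) - len(w))
    loss = max(0.0, 1.0 - y * predict_margin(w, x))
    sq_norm = sum(xi * xi for xi in x)
    if loss == 0.0 or sq_norm == 0.0:
        return w  # passive: the margin constraint is already satisfied
    tau = min(C, loss / sq_norm)  # aggressive step size, capped by C
    return [wi + tau * y * xi for wi, xi in zip(w, x)]


def truncate(w, keep):
    """Plain top-k truncation (a stand-in for projected truncation):
    keep the `keep` largest-magnitude weights and zero the rest."""
    order = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    top = set(order[:keep])
    return [wi if i in top else 0.0 for i, wi in enumerate(w)]


def run_stream(stream, b=1.0, C=1.0, keep=10, seed=0):
    """Process (x, y) pairs in order; y is consulted only when queried."""
    rng = random.Random(seed)
    w, queried = [], 0
    for x, y in stream:
        if should_query(predict_margin(w, x), b, rng):
            queried += 1
            w = truncate(pa_update(w, x, y, C), keep)
    return w, queried
```

Calling `run_stream` on a list of `(features, label)` pairs whose feature lists grow over time returns the sparse weight vector together with the number of labels requested, which makes the trade-off between labeling cost and model updates easy to inspect.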