Abstract:
In the literature, it is often assumed that, in a data stream, every instance receives its correct label immediately after the prediction, so that performance can be evaluated (i.e., the null latency scenario). However, this assumption does not hold in several real applications. For example, in the problem of predicting whether it will rain tomorrow, the actual label will only be available the day after tomorrow; that is, the label arrives with an unavoidable delay. This scenario is called intermediate latency. Unfortunately, learning with intermediate latency has been little investigated in the literature. Furthermore, a data stream can be non-stationary, i.e., its data distribution can change over time. This phenomenon is called concept drift. A classification model in a data stream with concept drift therefore needs a set of recent instances with their respective labels to adapt to the new concept. In an intermediate latency scenario, however, the delayed labels may not arrive in time for concept drift detection. As a result, the classification model cannot adapt immediately, which reduces its performance. In this paper, we propose a framework that can improve the performance of a classification model in a data stream with intermediate latency through a domain adaptation approach called importance weighting. The framework is called NIW-DSIL (Naive Importance Weighting for Data Stream with Intermediate Latency). The experiments showed that our approach is promising in dealing with this scenario: on the real datasets, NIW-DSIL achieved better results in 6 of 8 intermediate latency scenarios.
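To make the effect of intermediate latency concrete, the following is a minimal sketch (not NIW-DSIL, and without any importance weighting) of test-then-train (prequential) evaluation when each label is released a fixed number of steps after its instance. The helper names `delayed_label_stream` and `prequential_accuracy`, the fixed integer `delay`, and the toy learner (a majority vote over the last `window` revealed labels, which ignores the features) are all hypothetical choices made for illustration. With `delay = 0` the sketch reproduces the null latency scenario.

```python
from collections import Counter, deque
from typing import Deque, Hashable, Iterable, Iterator, List, Optional, Tuple

Pair = Tuple[float, Hashable]  # (features, label); features kept scalar for brevity


def delayed_label_stream(
    stream: Iterable[Pair], delay: int
) -> Iterator[Tuple[float, Optional[Pair]]]:
    """Yield x_t together with the labeled pair released at step t (hypothetical helper).

    The pair released at step t is the one observed at step t - delay, so
    delay = 0 is the null latency scenario and delay > 0 is intermediate latency.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    pending: Deque[Pair] = deque()
    for x, y in stream:
        pending.append((x, y))
        # Release the oldest pending label once it has waited `delay` steps.
        revealed = pending.popleft() if len(pending) > delay else None
        yield x, revealed


def prequential_accuracy(pairs: List[Pair], delay: int, window: int = 3) -> float:
    """Test-then-train accuracy of a toy sliding-window majority-class learner."""
    recent: Deque[Hashable] = deque(maxlen=window)  # most recently revealed labels
    correct = 0
    # The true label y_true is used only by the evaluator, never by the learner.
    for (x, revealed), (_, y_true) in zip(delayed_label_stream(pairs, delay), pairs):
        # Test: predict using only the labels revealed so far (x is ignored by this toy).
        pred = Counter(recent).most_common(1)[0][0] if recent else None
        correct += int(pred == y_true)
        # Train: update with whichever label arrived at this step, if any.
        if revealed is not None:
            recent.append(revealed[1])
    return correct / len(pairs)


if __name__ == "__main__":
    # Abrupt concept drift: the label switches from "a" to "b" halfway through.
    pairs = [(float(t), "a" if t < 10 else "b") for t in range(20)]
    print(prequential_accuracy(pairs, delay=0))  # 0.85
    print(prequential_accuracy(pairs, delay=5))  # 0.35
```

With no delay, the toy learner recovers two steps after the drift; with a delay of 5 steps, it keeps predicting the old concept until the post-drift labels finally arrive, so its accuracy drops. This mirrors, on a tiny synthetic stream, the performance loss that the abstract attributes to delayed labels.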
Date of Conference: 05-07 December 2021
Date Added to IEEE Xplore: 24 January 2022