Abstract:
We present a method for incrementally clustering data streams, designed to discover all valid density peaks in a single pass in a non-parametric fashion. It detects emerging clusters along the stream by dynamically placing kernels in the most promising areas and running a Stochastic Mean Shift procedure to find cluster centers. We present a density estimation approach for dynamic initialization that treats every sub-stream following 'emerging data' as a sample set and applies hypothesis testing (p-value approach) to estimate its local density. The sub-stream size and the p-value are chosen so as to provide a provable accuracy guarantee. We compare our method with the state of the art on realistic and complex datasets, and show that it outperforms not only stream algorithms but also the more complex, non-stream paradigms they are founded on.
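The abstract does not spell out the Stochastic Mean Shift update, so the following is only a minimal sketch of one common online variant, not the authors' procedure. It assumes a Gaussian kernel, a fixed bandwidth, and a single kernel whose center is a kernel-weighted running mean of the points seen so far. The names `gaussian_weight`, `stochastic_mean_shift`, and `bandwidth` are hypothetical and chosen for illustration.

```python
import math
import random


def gaussian_weight(x, center, bandwidth):
    """Gaussian kernel weight of point x relative to a kernel center."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def stochastic_mean_shift(stream, center, bandwidth=1.0):
    """Update one kernel center point by point, in a single pass.

    Assumption (not from the paper): the initial center counts as one
    pseudo-observation of weight 1, and each arriving point pulls the
    center toward itself in proportion to its kernel weight.
    """
    center = list(center)
    total_weight = 1.0  # weight of the initial center
    for x in stream:
        w = gaussian_weight(x, center, bandwidth)
        total_weight += w
        step = w / total_weight  # shrinking step size
        center = [c + step * (xi - c) for c, xi in zip(center, x)]
    return center


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stream: points scattered around a density peak at (5, 5).
    points = [(rng.gauss(5.0, 0.3), rng.gauss(5.0, 0.3)) for _ in range(2000)]
    print(stochastic_mean_shift(points, center=(4.0, 4.0)))
```

Because each point is consumed once and only the center and an accumulated weight are stored, the update fits a single-pass setting. The initialization of kernels via hypothesis testing on sub-streams, which the abstract describes, is not modeled here.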
Date of Conference: 20-24 August 2018
Date Added to IEEE Xplore: 29 November 2018
ISBN Information:
Print on Demand (PoD) ISSN: 1051-4651