ABSTRACT
Given the increasing size and dimensionality of streaming data produced by, for example, industrial sensors or wireless sensor networks (WSNs), it is important to monitor not only the data itself but also the relationships between data sources. To solve this task, we present DIMID, an online algorithm for monitoring dependencies in high-dimensional streaming data. DIMID uses an entropy-based measure that generalizes to non-linear as well as complex functional relationships, is non-parametric, and can be computed incrementally.
To deal with streaming, possibly infinite data, DIMID combines a dimensionality-reducing projection method with an entropy estimator that uses the local density of data points. This also allows the algorithm to update the current relationships as new data becomes available, instead of recomputing them on the complete batch after every update.
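As an illustration of what a density-based entropy estimator can look like, the following minimal sketch computes mutual information from nearest-neighbour distances. It is not DIMID itself: it assumes the well-known Kozachenko-Leonenko k-nearest-neighbour estimator as a stand-in for the estimator mentioned above, uses a brute-force batch neighbour search rather than incremental updates, and omits the projection step. The function names (`digamma_int`, `knn_entropy`, `mutual_information`) and the synthetic data are hypothetical and chosen only for this example.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def knn_entropy(points, k=3):
    """Kozachenko-Leonenko entropy estimate in nats, using the max norm.

    points: list of equal-length tuples. The neighbour search is brute
    force, O(n^2), which is an assumption made for brevity only.
    """
    n, d = len(points), len(points[0])
    log_r_sum = 0.0
    for i, p in enumerate(points):
        # Max-norm distances from p to every other point, ascending.
        dists = sorted(
            max(abs(a - b) for a, b in zip(p, q))
            for j, q in enumerate(points)
            if j != i
        )
        # Distance to the k-th neighbour; guard against log(0) on duplicates.
        log_r_sum += math.log(max(dists[k - 1], 1e-12))
    # The unit ball of the max norm in d dimensions has volume 2^d.
    return digamma_int(n) - digamma_int(k) + d * math.log(2.0) + d * log_r_sum / n


def mutual_information(xs, ys, k=3):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), each term estimated with knn_entropy."""
    hx = knn_entropy([(x,) for x in xs], k)
    hy = knn_entropy([(y,) for y in ys], k)
    hxy = knn_entropy(list(zip(xs, ys)), k)
    return hx + hy - hxy


# Synthetic data: a non-linear (quadratic) dependency versus independent noise.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(300)]
ys_dep = [x * x + rng.gauss(0.0, 0.05) for x in xs]
ys_ind = [rng.uniform(-1.0, 1.0) for _ in xs]

mi_dep = mutual_information(xs, ys_dep)
mi_ind = mutual_information(xs, ys_ind)
print(f"dependent pair:   {mi_dep:.3f} nats")
print(f"independent pair: {mi_ind:.3f} nats")
```

The quadratic pair has a linear correlation close to zero, yet its estimated mutual information is clearly larger than that of the independent pair, which is the property that makes an entropy-based measure suitable for non-linear dependencies. A streaming variant would maintain the neighbour distances incrementally instead of re-sorting all distances for every point.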
Comparisons with three other state-of-the-art algorithms for dependency monitoring on a variety of time series data sets with linear and non-linear dependencies showed significant (p < 0.01) improvements in the AUC and F1 measures. We also achieve a reduction in run-time from linear to logarithmic in the number of observed samples.
Index Terms
- Fast mutual information computation for dependency-monitoring on data streams