DOI: 10.1145/3019612.3019669
research-article

Fast mutual information computation for dependency-monitoring on data streams

Published: 03 April 2017

ABSTRACT

Given the increasing size and dimensionality of streaming data made available by, for example, industrial sensors or wireless sensor networks (WSNs), it is important to monitor not only the data itself but also the relationships between data sources. To solve this task, we present DIMID, an online algorithm for monitoring dependencies in high-dimensional streaming data. DIMID uses an entropy-based measure that generalizes to non-linear as well as complex functional relationships, is non-parametric, and can be computed incrementally.
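
For orientation, the entropy-based measure named in the title, mutual information, can be written in terms of entropies as follows. The symbols X, Y, H and p are standard textbook notation and do not appear in the abstract; this is the general definition, not necessarily the exact form DIMID evaluates.

```latex
\begin{align}
  I(X;Y) &= H(X) + H(Y) - H(X,Y) \\
         &= \iint p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,\mathrm{d}x\,\mathrm{d}y ,
\end{align}
% I(X;Y) >= 0, with equality if and only if X and Y are independent.
```

Because the definition only involves densities, it makes no assumption about the functional form of the relationship, which is why such a measure can capture non-linear as well as linear dependencies.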

To deal with streaming, possibly infinite data, DIMID combines a dimensionality-reducing projection method with an entropy estimator based on the local density of data points. This also allows the algorithm to update the current relationships as new data becomes available, instead of recomputing on the complete batch after every update.
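
The two ingredients named in this paragraph can be illustrated with a minimal batch sketch. It is not the DIMID algorithm: the abstract does not specify the projection or the estimator, so this assumes a Gaussian random projection and a generic k-nearest-neighbour (Kozachenko-Leonenko style) entropy estimate, and it recomputes from scratch rather than updating incrementally. All function names and the parameter `k` are hypothetical.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329


def digamma_int(n: int) -> float:
    """Digamma at a positive integer: psi(n) = -gamma + sum_{i=1}^{n-1} 1/i."""
    return -EULER_GAMMA + sum(1.0 / i for i in range(1, n))


def random_projection(data: np.ndarray, target_dim: int, rng) -> np.ndarray:
    """Project rows of `data` to `target_dim` dimensions with a Gaussian matrix.

    Assumption: a plain Gaussian random projection stands in for the
    projection method mentioned in the abstract.
    """
    matrix = rng.standard_normal((data.shape[1], target_dim)) / np.sqrt(target_dim)
    return data @ matrix


def knn_entropy(x, k: int = 3) -> float:
    """k-nearest-neighbour entropy estimate (in nats) using the max norm.

    The local density around each sample is inferred from the distance to its
    k-th nearest neighbour. Assumes continuous data without duplicate points
    (a zero distance would give log(0)).
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of samples")
    # Pairwise max-norm distances, O(n^2) memory: fine for a sketch only.
    dist = np.max(np.abs(x[:, None, :] - x[None, :, :]), axis=2)
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k holds the distance to the k-th nearest neighbour.
    rho = np.sort(dist, axis=1)[:, k]
    # The unit ball of the max norm in d dimensions has volume 2^d.
    return float(
        digamma_int(n) - digamma_int(k) + d * np.log(2.0) + d * np.mean(np.log(rho))
    )


def knn_mutual_information(x, y, k: int = 3) -> float:
    """Estimate I(X;Y) = H(X) + H(Y) - H(X,Y) from paired samples."""
    joint = np.column_stack([x, y])
    return knn_entropy(x, k) + knn_entropy(y, k) - knn_entropy(joint, k)
```

As a usage example, drawing `x` from `np.random.default_rng(0).standard_normal(1000)` and comparing `knn_mutual_information(x, x**2 + noise)` with the value for an independent `y` should show a clearly larger estimate for the dependent pair, even though the two are linearly uncorrelated. A streaming variant would need a data structure that maintains neighbour distances under insertions, which this sketch deliberately leaves out.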

Comparisons with three other state-of-the-art algorithms for dependency monitoring on a variety of time series data sets with linear and non-linear dependencies showed significant (p < 0.01) improvements in AUC and F1 measure. We also achieve a reduction in run-time from linear to logarithmic in the number of observed samples.


      • Published in

        SAC '17: Proceedings of the Symposium on Applied Computing
        April 2017
        2004 pages
        ISBN:9781450344869
        DOI:10.1145/3019612

        Copyright © 2017 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%
