Abstract
This paper considers the problem of monitoring vehicle data streams in a resource-constrained environment. It focuses on a monitoring task that requires frequent computation of correlation matrices on lightweight on-board computing devices. The paper motivates the problem in the context of the MineFleet Real-Time system and offers a randomized algorithm for fast monitoring of correlation (FMC) matrices, as well as inner product and Euclidean distance matrices, among others. Unlike existing approaches, which compute every entry of these matrices from a data set, the proposed technique uses a divide-and-conquer approach. The paper presents a probabilistic test for quickly detecting whether a subset of coefficients contains a significant one, that is, one whose magnitude is greater than a user-specified threshold. This test is used to quickly identify the portions of the space that contain significant coefficients. The proposed algorithm is particularly suitable for monitoring correlation and related matrices computed from continuous data streams.
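To make the divide-and-conquer idea concrete, the following is a minimal sketch and not the paper's FMC test. It assumes the data are given as a list of columns that are already normalized, so that the inner product of two columns is their coefficient. As a stand-in for the probabilistic test it uses a random-sign second-moment estimate: for two disjoint groups of columns, the squared inner product of their random +/-1 combinations has an expectation equal to the sum of squared coefficients in that block, so a block whose estimate falls below the squared threshold is unlikely to contain a significant coefficient. All names (`block_energy`, `cross_pairs`, `significant_pairs`, `draws`, `tau`) are hypothetical.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def signed_sum(cols, group, rng):
    """Random +/-1 combination of the columns listed in `group`."""
    acc = [0.0] * len(cols[group[0]])
    for j in group:
        sign = rng.choice((-1.0, 1.0))
        for r, value in enumerate(cols[j]):
            acc[r] += sign * value
    return acc


def block_energy(cols, A, B, rng, draws=8):
    """Estimate the sum of squared coefficients between groups A and B.

    Each draw costs time proportional to len(A) + len(B) columns,
    instead of the len(A) * len(B) inner products of the full block.
    """
    total = 0.0
    for _ in range(draws):
        z = dot(signed_sum(cols, A, rng), signed_sum(cols, B, rng))
        total += z * z
    return total / draws


def cross_pairs(cols, A, B, tau, rng, out):
    """Search the block A x B, pruning blocks that fail the test."""
    if len(A) == 1 and len(B) == 1:
        c = dot(cols[A[0]], cols[B[0]])  # exact coefficient
        if abs(c) > tau:
            out.append((A[0], B[0], c))
        return
    if block_energy(cols, A, B, rng) < tau * tau:
        return  # probably no significant coefficient here (may miss some)
    if len(A) >= len(B):  # split the larger group in half
        half = len(A) // 2
        cross_pairs(cols, A[:half], B, tau, rng, out)
        cross_pairs(cols, A[half:], B, tau, rng, out)
    else:
        half = len(B) // 2
        cross_pairs(cols, A, B[:half], tau, rng, out)
        cross_pairs(cols, A, B[half:], tau, rng, out)


def significant_pairs(cols, tau, seed=0):
    """Return (i, j, coefficient) for pairs i < j with |coefficient| > tau."""
    rng = random.Random(seed)
    out = []

    def within(group):
        if len(group) < 2:
            return
        half = len(group) // 2
        within(group[:half])
        within(group[half:])
        cross_pairs(cols, group[:half], group[half:], tau, rng, out)

    within(list(range(len(cols))))
    return sorted(out)
```

For example, with eight orthonormal columns in which column 5 is replaced by the normalized sum of the third and sixth basis vectors, calling `significant_pairs(cols, 0.5)` reports only the pair (2, 5). Because the test is randomized, a real deployment would need to choose `draws` and the pruning cutoff to balance missed coefficients against wasted work; the values here are illustrative only.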
Cite this article
Kargupta, H., Puttagunta, V., Klein, M. et al. On-board Vehicle Data Stream Monitoring Using MineFleet and Fast Resource Constrained Monitoring of Correlation Matrices. New Gener. Comput. 25, 5–32 (2006). https://doi.org/10.1007/s00354-006-0002-4
DOI: https://doi.org/10.1007/s00354-006-0002-4