SCENT: Scalable compressed monitoring of evolving multirelational social networks

Published: 04 November 2011

Abstract

We propose SCENT, an innovative, scalable spectral analysis framework for Internet-scale monitoring of multirelational social media data encoded in the form of tensor streams. In particular, a significant challenge is to detect, sufficiently quickly, key changes in the social media data that could reflect important events in the real world. Social media data have three challenging characteristics. First, data sizes are enormous; recent technological advances allow hundreds of millions of users to create and share content within online social networks. Second, social data are often multifaceted (i.e., have many dimensions of potential interest, from the textual content to user metadata). Finally, the data are dynamic; structural changes can occur at multiple time scales and be localized to a subset of users. Consequently, a framework for extracting useful information from social media data needs to scale with data volume and with the number and diversity of the facets of the data. In SCENT, we focus on the computational cost of structural change detection in tensor streams. We extend compressed sensing (CS) to tensor data. We show that, through the use of randomized tensor ensembles, SCENT is able to encode the observed tensor streams in the form of compact descriptors. We show that these descriptors allow very fast detection of significant spectral changes in the tensor stream and also reduce data collection, storage, and processing costs. Experiments over synthetic and real data show that SCENT is faster (17.7x--159x for change detection) and more accurate (above 0.9 F-score) than baseline methods.
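
The general idea of encoding a tensor stream through a randomized ensemble can be illustrated with a minimal sketch. This is not the SCENT algorithm itself, which the abstract does not specify in detail. It assumes a Gaussian random ensemble, descriptors formed from inner products between each snapshot and the ensemble tensors, and a plain distance threshold between consecutive descriptors in place of the paper's spectral change measure. The function names make_ensemble, describe, and detect_changes are hypothetical.

```python
import numpy as np


def make_ensemble(shape, num_measurements, seed=0):
    """Draw random Gaussian tensors, one per measurement (assumed ensemble)."""
    rng = np.random.default_rng(seed)
    # Scaling by 1/sqrt(m) preserves squared norms in expectation.
    ensemble = rng.standard_normal((num_measurements, *shape))
    return ensemble / np.sqrt(num_measurements)


def describe(snapshot, ensemble):
    """Compress one tensor snapshot into a vector of m inner products."""
    # Contract every mode of the snapshot against each ensemble tensor.
    return np.tensordot(ensemble, snapshot, axes=snapshot.ndim)


def detect_changes(stream, ensemble, threshold):
    """Return time indices whose descriptor moved more than `threshold`."""
    changes = []
    prev = None
    for t, snapshot in enumerate(stream):
        cur = describe(snapshot, ensemble)
        # Only the compact descriptors are compared, never the full tensors.
        if prev is not None and np.linalg.norm(cur - prev) > threshold:
            changes.append(t)
        prev = cur
    return changes


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    base = rng.random((4, 5, 6))      # e.g. user x term x time-of-day counts
    changed = base.copy()
    changed[0, :, :] += 5.0           # a localized structural change
    stream = [base] * 5 + [changed] * 5
    ensemble = make_ensemble(base.shape, num_measurements=16)
    print(detect_changes(stream, ensemble, threshold=1e-6))
```

In this sketch each 120-entry snapshot is reduced to 16 numbers, so the comparison cost depends on the number of measurements rather than on the tensor size. The threshold and the number of measurements are illustrative choices, not values from the paper.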

      • Published in

        ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 7S, Issue 1
        Special section on ACM Multimedia 2010 best paper candidates, and issue on social media
        October 2011
        246 pages
        ISSN: 1551-6857
        EISSN: 1551-6865
        DOI: 10.1145/2037676

        Copyright © 2011 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 4 November 2011
        • Accepted: 1 July 2011
        • Revised: 1 March 2011
        • Received: 1 September 2010

        Qualifiers

        • research-article
        • Research
        • Refereed