Abstract
We propose SCENT, an innovative, scalable spectral analysis framework for Internet-scale monitoring of multirelational social media data encoded as tensor streams. A key challenge is to detect, sufficiently quickly, significant changes in social media data that may reflect important real-world events. Social media data have three challenging characteristics. First, data sizes are enormous: recent technological advances allow hundreds of millions of users to create and share content within online social networks. Second, social data are often multifaceted, i.e., they have many dimensions of potential interest, from textual content to user metadata. Finally, the data are dynamic: structural changes can occur at multiple time scales and can be localized to a subset of users. Consequently, a framework for extracting useful information from social media data must scale with data volume as well as with the number and diversity of the data's facets. In SCENT, we focus on the computational cost of structural change detection in tensor streams. We extend compressed sensing (CS) to tensor data and show that, through the use of randomized tensor ensembles, SCENT can encode the observed tensor streams as compact descriptors. These descriptors allow very fast detection of significant spectral changes in the tensor stream and also reduce data collection, storage, and processing costs. Experiments on synthetic and real data show that SCENT is faster (17.7x to 159x for change detection) and more accurate (F-score above 0.9) than baseline methods.
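The core idea in the abstract is to encode each tensor snapshot with a randomized ensemble into a small descriptor and then detect change on the descriptors instead of the full tensor. The sketch below illustrates that general idea under our own assumptions. It is not SCENT's algorithm. It multiplies each 3-way snapshot along every mode by an independent Gaussian matrix with N(0, 1/r) entries (a Kronecker-structured random projection). This preserves the squared Frobenius norm in expectation, and because the map is linear, the sketch of a difference equals the difference of the sketches. Change is scored as the relative Frobenius distance between consecutive sketches, a simple stand-in for the spectral change detection the paper describes. The function names, tensor and sketch sizes, threshold, and synthetic burst are all hypothetical.

```python
import math
import random


def gaussian_matrix(rows, cols, rng):
    """rows x cols matrix with i.i.d. N(0, 1/rows) entries, so E[U^T U] = I."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def sketch(x, u1, u2, u3):
    """Y = X x1 U1 x2 U2 x3 U3 for a 3-way tensor stored as nested lists x[i][j][k]."""
    I, J, K = len(x), len(x[0]), len(x[0][0])
    # Mode-1 product: (I, J, K) -> (r1, J, K)
    t1 = [[[sum(u1[a][i] * x[i][j][k] for i in range(I)) for k in range(K)]
           for j in range(J)] for a in range(len(u1))]
    # Mode-2 product: (r1, J, K) -> (r1, r2, K)
    t2 = [[[sum(u2[b][j] * t1[a][j][k] for j in range(J)) for k in range(K)]
           for b in range(len(u2))] for a in range(len(t1))]
    # Mode-3 product: (r1, r2, K) -> (r1, r2, r3)
    return [[[sum(u3[c][k] * t2[a][b][k] for k in range(K)) for c in range(len(u3))]
             for b in range(len(t2[0]))] for a in range(len(t2))]


def frobenius(t):
    """Frobenius norm of a 3-way nested-list tensor."""
    return math.sqrt(sum(v * v for plane in t for row in plane for v in row))


def subtract(p, q):
    """Elementwise p - q for two 3-way nested-list tensors of equal shape."""
    return [[[a - b for a, b in zip(rp, rq)] for rp, rq in zip(pp, pq)]
            for pp, pq in zip(p, q)]


def detect_changes(sketches, threshold):
    """Indices t where ||Y_t - Y_{t-1}||_F / ||Y_{t-1}||_F exceeds threshold."""
    flagged = []
    for t in range(1, len(sketches)):
        prev = sketches[t - 1]
        score = frobenius(subtract(sketches[t], prev)) / max(frobenius(prev), 1e-12)
        if score > threshold:
            flagged.append(t)
    return flagged


# Hypothetical stream: I x J x K snapshots (e.g. users x terms x hours).
rng = random.Random(0)
I, J, K = 20, 15, 10
r1, r2, r3 = 8, 8, 6  # sketch holds 384 numbers instead of 3000
u1 = gaussian_matrix(r1, I, rng)
u2 = gaussian_matrix(r2, J, rng)
u3 = gaussian_matrix(r3, K, rng)

base = [[[rng.random() for _ in range(K)] for _ in range(J)] for _ in range(I)]
stream = []
for t in range(8):
    snap = [[[base[i][j][k] + rng.gauss(0.0, 0.001) for k in range(K)]
             for j in range(J)] for i in range(I)]
    if t >= 4:  # synthetic burst: a 5x5x5 block of activity appears at t = 4 and persists
        for i in range(5):
            for j in range(5):
                for k in range(5):
                    snap[i][j][k] += 50.0
    stream.append(snap)

sketches = [sketch(x, u1, u2, u3) for x in stream]
flagged = detect_changes(sketches, threshold=0.5)
print(flagged)
```

On this synthetic stream the detector flags the step at which the burst appears, while small per-step noise stays well below the threshold. Only the sketches need to be kept between steps, which mirrors the storage and processing savings the abstract attributes to compact descriptors.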
Index Terms
- SCENT: Scalable compressed monitoring of evolving multirelational social networks