Extraction, characterization and utility of prototypical communication groups in the blogosphere

Published: 27 December 2010

Abstract

This article analyzes communication within a set of individuals to extract representative, prototypical groups, and provides a novel framework to establish the utility of such groups. Corporations may want to identify representative groups (those indicative of the overall communication set) because tracking a few prototypical groups is easier than tracking the entire set. Such groups can be useful for advertising, for identifying “hot” spots of resource consumption, and for mining the representative moods or temperature of a community. Our framework has three parts: extraction, characterization, and utility of prototypical groups. First, we extract groups by developing features that represent the communication dynamics of the individuals. Second, to characterize the overall communication set, we identify a subset of groups within the community as the prototypical groups. Third, we justify the utility of these prototypical groups by using them as predictors of related external phenomena: specifically, the stock market movement of technology companies and the political polls of Presidential candidates in the 2008 U.S. elections.
We have conducted extensive experiments on two popular blogs, Engadget and Huffington Post. We observe that the prototypical groups predict stock market movement and political polls satisfactorily, with a mean error rate of 20.32%. Further, our method outperforms baselines built on alternative group extraction and prototypical group identification methods. We evaluate the quality of the extracted groups using their conductance and coverage measures, and we develop two metrics, predictivity and resilience, to evaluate their ability to predict a related external time-series variable (stock market movement or political polls). These results suggest that the communication dynamics of individuals are essential for extracting groups in a community, and that the prototypical groups extracted by our method are meaningful in characterizing the overall communication set.
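
The abstract names conductance and coverage as group-quality measures but does not define them. The sketch below uses the standard graph-theoretic definitions as an assumption (conductance as cut edges over the smaller side's volume, coverage as the fraction of intra-group edges); the article's exact formulation may differ. The toy graph and the function names are hypothetical and serve only as illustration.

```python
def conductance(edges, group):
    """Edges leaving `group` divided by the smaller of the two side volumes.

    Assumed definition: cut(S, V\\S) / min(vol(S), vol(V\\S)), where vol is
    the sum of node degrees. Lower values indicate a better-separated group.
    """
    group = set(group)
    cut = vol_in = vol_out = 0
    for u, v in edges:
        u_in, v_in = u in group, v in group
        if u_in != v_in:
            cut += 1                      # edge crosses the group boundary
        vol_in += u_in + v_in             # endpoints inside the group
        vol_out += (not u_in) + (not v_in)
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0


def coverage(edges, groups):
    """Fraction of edges whose two endpoints fall in the same group."""
    label = {node: i for i, g in enumerate(groups) for node in g}
    if not edges:
        return 0.0
    intra = sum(
        1 for u, v in edges
        if u in label and v in label and label[u] == label[v]
    )
    return intra / len(edges)


# Hypothetical communication graph: two triangles joined by one edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
groups = [{"a", "b", "c"}, {"d", "e", "f"}]

print(round(conductance(edges, groups[0]), 3))  # 0.143 (1 cut edge / volume 7)
print(round(coverage(edges, groups), 3))        # 0.857 (6 of 7 edges are intra-group)
```

Under these assumed definitions, a well-extracted group has low conductance, and a good overall grouping has high coverage.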

Cited By

  • (2014) Analyzing firm-specific social media and market. Decision Support Systems 67:C (30-39). DOI: 10.1016/j.dss.2014.08.001. Online publication date: 1-Nov-2014
  • (2012) Proactive Personalized Mobile Multi-Blogging Service on Smart-M3. Journal of Computing and Information Technology 20:3. DOI: 10.2498/cit.1002094. Online publication date: 2012

Published In

ACM Transactions on Information Systems  Volume 29, Issue 1
December 2010
232 pages
ISSN:1046-8188
EISSN:1558-2868
DOI:10.1145/1877766
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 December 2010
Accepted: 01 July 2010
Revised: 01 March 2010
Received: 01 September 2009
Published in TOIS Volume 29, Issue 1

Author Tags

  1. Blogosphere
  2. Engadget
  3. Huffington Post
  4. communication dynamics
  5. political polls
  6. prototypical groups
  7. social communication
  8. social network analysis
  9. stock market movement

Qualifiers

  • Research-article
  • Research
  • Refereed
