Abstract
Structural analysis of social networks can reveal clusters and important nodes, but it says nothing about the content-based reasons for a node’s importance or a cluster’s commonality. Gaining that additional level of insight requires sampling content from nodes and processing it. Human analysts do this effectively, but as networks grow to Big Data scale, manual analysis is no longer feasible. This raises the question of whether automated techniques can reproduce the results that humans find. In this paper, we demonstrate how topic modeling can be applied, filtered, and adapted to produce easy-to-understand keywords that represent important clusters in a network. Those keywords reflect the insights reached by human analysts performing a manual content-based analysis of the network features. While humans should never be removed from the analysis process, this work shows how automated techniques can be integrated to scale humans’ ability to gain insights into large networks.
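
The general shape of such a pipeline (find clusters structurally, then summarize each cluster's content as keywords) can be illustrated with a minimal sketch. This is not the authors' implementation. To stay dependency-free, it makes two simplifying assumptions: connected components stand in for a real community detection algorithm, and a TF-IDF-style score stands in for topic modeling. The stopword list, the function names `connected_components` and `cluster_keywords`, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list used as the "filter" step.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "for", "on"}


def connected_components(edges):
    """Group nodes into clusters (a stand-in for real community detection)."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, clusters = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collects every node reachable from `start`.
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(adjacency[node] - members)
        seen |= members
        clusters.append(members)
    return clusters


def cluster_keywords(clusters, node_text, top_k=3):
    """Rank terms per cluster by frequency, down-weighting terms shared across clusters."""
    term_counts = []
    for members in clusters:
        counts = Counter()
        for node in members:
            for token in node_text.get(node, "").lower().split():
                if token not in STOPWORDS:
                    counts[token] += 1
        term_counts.append(counts)
    # For each term, the number of clusters in which it appears.
    cluster_freq = Counter(term for counts in term_counts for term in counts)
    n = len(clusters)
    keywords = []
    for counts in term_counts:
        scores = {t: c * math.log((1 + n) / cluster_freq[t]) for t, c in counts.items()}
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(ranked[:top_k])
    return keywords


# Toy network: two disconnected groups of nodes with short text per node.
edges = [("a", "b"), ("b", "c"), ("x", "y")]
node_text = {
    "a": "soccer match and the soccer league",
    "b": "soccer goal in the match",
    "c": "league table for soccer",
    "x": "election vote and the election poll",
    "y": "vote in the election",
}
clusters = connected_components(edges)
keywords = cluster_keywords(clusters, node_text)
print(keywords)
```

On real data, the component step would be replaced by a community detection method and the scoring step by a topic model, with a human analyst reviewing the resulting keywords.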








Acknowledgements
This work was conducted with the support of the National Science Foundation award 1546829.
Cite this article
Golbeck, J., Gerhard, J., O’Colman, F. et al. Scaling Up Integrated Structural and Content-Based Network Analysis. Inf Syst Front 20, 1191–1202 (2018). https://doi.org/10.1007/s10796-017-9783-x
DOI: https://doi.org/10.1007/s10796-017-9783-x