
Centrality-Based Group Profiling: A Comparative Study in Co-authorship Networks

  • Special Feature
New Generation Computing

Abstract

Group profiling methods aim to construct a descriptive profile for communities in social networks. This task is similar to the traditional cluster labeling task, commonly adopted in document clustering to identify tags that characterize each derived cluster. This similarity encourages the direct application of cluster labeling methods to group profiling problems. However, group profiling can leverage an important additional source of information: the links among the clustered individuals. This work extends our previous work by incorporating relational information to better describe communities. The proposed approach, called Centrality-based Group Profiling, uses network centrality measures to select the nodes used for the characterization, i.e., nodes that generalize the content of the observed communities. Using relational information to select relevant nodes in a community significantly reduces the complexity of the profiling task while retaining enough representative content to produce a good characterization. Experiments were conducted on a co-authorship network to evaluate different profiling strategies. The results demonstrated the ability of the proposed approach to produce good profiles for the observed groups with both group profiling and standard cluster labeling methods, at a considerably lower computational cost.
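
The idea of selecting central nodes before profiling can be illustrated with a minimal sketch. It is not the authors' implementation: this excerpt does not state which centrality measures or labeling methods were used, so the sketch assumes degree centrality restricted to links inside the community and a plain term-frequency profile. The function names, the toy co-authorship graph, and the document strings are all hypothetical.

```python
from collections import Counter


def degree_centrality(adjacency, members):
    """Degree of each member, counting only links inside the community.

    Assumption: degree is used here for simplicity; any other centrality
    measure could be substituted.
    """
    members = set(members)
    return {v: len(adjacency.get(v, set()) & members) for v in members}


def select_central_nodes(adjacency, members, k):
    """Return the k most central members (ties broken by node name)."""
    scores = degree_centrality(adjacency, members)
    ranked = sorted(scores, key=lambda v: (-scores[v], v))
    return ranked[:k]


def profile(adjacency, members, documents, k=2, n_terms=2):
    """Build a profile from the text of the k most central members only.

    Assumption: the profile is the n_terms most frequent terms, a stand-in
    for whichever labeling method is actually applied.
    """
    counts = Counter()
    for v in select_central_nodes(adjacency, members, k):
        counts.update(documents.get(v, "").lower().split())
    return [term for term, _ in counts.most_common(n_terms)]


# Hypothetical toy co-authorship graph (undirected, as adjacency sets).
adjacency = {
    "ana": {"bruno", "carla", "davi"},
    "bruno": {"ana", "carla"},
    "carla": {"ana", "bruno"},
    "davi": {"ana", "eva"},
    "eva": {"davi"},
}
community = ["ana", "bruno", "carla", "davi"]  # "eva" lies outside
documents = {
    "ana": "community detection social networks",
    "bruno": "community profiling networks",
    "carla": "document clustering labels",
    "davi": "graph visualization",
}

central = select_central_nodes(adjacency, community, k=2)
terms = profile(adjacency, community, documents, k=2, n_terms=2)
```

Only the text of the selected nodes is processed, which is where the reduction in computational cost comes from: the labeling step sees k documents instead of one per community member.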

Notes

  1. The fringe of a community P is defined as the set of vertices that do not belong to P, but have at least one connection to members of P.

  2. https://arxiv.org/

  3. http://www.blogcatalog.com/

  4. http://www.livejournal.com/

  5. Here we consider \(N(v_i) \subseteq V_P\).

  6. A stopword list contains the non-informative terms of a document, usually prepositions, articles, adverbs, numbers, pronouns, and punctuation.

  7. https://github.com/andrecamara/dataset_ngc

  8. A paper is considered cohesive if it presents great similarity to the content found in the group.

  9. As we noticed in a pilot study, subjects tend to assign random ratings if the task takes too long.

  10. Is a very general model for coordination among multiple agents.

  11. Method to automatically identify the most relevant target variables in forming its explanation.

  12. By best profiles, we refer to the ones indicated by the users in the evaluation described in section “Comparative Study: Differentiation-Based Methods”.

Acknowledgements

The authors would like to thank CNPq (Brazilian Agency) for its financial support.

Author information

Corresponding author

Correspondence to João E. A. Gomes.

About this article

Cite this article

Gomes, J.E.A., Prudêncio, R.B.C. & Nascimento, A.C.A. Centrality-Based Group Profiling: A Comparative Study in Co-authorship Networks. New Gener. Comput. 36, 59–89 (2018). https://doi.org/10.1007/s00354-017-0028-9

  • DOI: https://doi.org/10.1007/s00354-017-0028-9
