Centrality-Based Group Profiling: A Comparative Study in Co-authorship Networks

Gomes, João E. A.; Prudêncio, Ricardo B. C.; Nascimento, André C. A.

doi:10.1007/s00354-017-0028-9

Special Feature
Published: 21 November 2017

Volume 36, pages 59–89, (2018)

New Generation Computing



Abstract

Group profiling methods aim to construct a descriptive profile for communities in social networks. This task is similar to the traditional cluster labeling task, commonly adopted in document clustering to identify tags that characterize each derived cluster. This similarity encourages the direct application of cluster labeling methods to group profiling problems. In group profiling, however, an important additional source of information can be leveraged: the links among the clustered individuals. This work extends our previous work by incorporating relational information to better describe communities. The proposed approach, called Centrality-based Group Profiling, uses network centrality measures to select the nodes used for the characterization, i.e., nodes that generalize the content of the observed communities. Using relational information to select relevant nodes in a community significantly reduces the complexity of the profiling task while retaining enough representative content to produce a good characterization. Experiments were conducted on a co-authorship network to evaluate different profiling strategies. The results demonstrate that the proposed approach produces good profiles for the observed groups with both group profiling and standard cluster labeling methods, at a considerably lower computational cost.
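The idea of selecting central nodes before profiling can be illustrated with a minimal sketch. It is not the authors' implementation: degree centrality stands in for whichever centrality measures the paper evaluates, plain term frequency stands in for the profiling or cluster labeling method, and the names `degree_centrality`, `centrality_based_profile`, `top_nodes`, and `top_terms`, as well as the toy data, are assumptions made for illustration only.

```python
from collections import Counter


def degree_centrality(adjacency, members):
    """Degree of each member, counting only links inside the community."""
    members = set(members)
    return {v: len(adjacency.get(v, set()) & members) for v in members}


def centrality_based_profile(adjacency, members, documents,
                             top_nodes=2, top_terms=2):
    """Profile a community using only the terms of its most central nodes.

    Assumption: term frequency is used as a simple stand-in for a real
    profiling method.
    """
    scores = degree_centrality(adjacency, members)
    # Highest centrality first; ties are broken by node name for determinism.
    central = sorted(scores, key=lambda v: (-scores[v], v))[:top_nodes]
    counts = Counter()
    for v in central:
        counts.update(documents.get(v, []))
    # Most frequent terms first; ties are broken alphabetically.
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    return central, ranked[:top_terms]


# Hypothetical toy co-authorship network (undirected, symmetric adjacency).
adjacency = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
# Hypothetical terms extracted from each author's papers.
documents = {
    "a": ["graph", "mining", "graph"],
    "b": ["graph", "community"],
    "c": ["text"],
    "d": ["biology"],
}

central, profile = centrality_based_profile(
    adjacency, {"a", "b", "c", "d"}, documents
)
print(central, profile)
```

Only the documents of the selected nodes are scanned, which is where the reduction in profiling cost described in the abstract would come from in a sketch like this one.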

Notes

The fringe of a community P is defined as the set of vertices that do not belong to P, but have at least one connection to members of P.
https://arxiv.org/
http://www.blogcatalog.com/
http://www.livejournal.com/
Here we consider \(N(v_i) \subseteq V_P\).
Stopwords are the non-informative terms in a document, usually prepositions, articles, adverbs, numbers, pronouns, and punctuation.
https://github.com/andrecamara/dataset_ngc
A paper is considered cohesive if its content is highly similar to the content found in the group.
As we noticed in a pilot study, subjects tend to assign random ratings if the task takes too long.
Is a very general model for coordination among multiple agents.
Method to automatically identify the most relevant target variables in forming its explanation.
By best profiles, we refer to the ones chosen by the users in the evaluation described in section “Comparative Study: Differentiation-Based Methods”.
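The definition of the fringe of a community P given in the first note can be sketched as a short set computation. This is an illustrative sketch only: the function name `fringe`, the adjacency-dictionary representation, and the toy graph are assumptions, not taken from the article.

```python
def fringe(adjacency, community):
    """Vertices outside the community with at least one link to a member."""
    community = set(community)
    neighbours = set()
    for v in community:
        # Collect every neighbour of every community member.
        neighbours |= adjacency.get(v, set())
    # Keep only the neighbours that are not members themselves.
    return neighbours - community


# Hypothetical undirected graph: "e" touches the community, "f" does not.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d", "f"},
    "f": {"e"},
}

print(fringe(graph, {"a", "b", "c", "d"}))
```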


Acknowledgements

The authors would like to thank CNPq (Brazilian Agency) for its financial support.

Author information

Authors and Affiliations

Campus Serra Talhada, Instituto Federal do Sertão Pernambucano (IFSertão-PE), 78, Serra Talhada, PE, Brazil
João E. A. Gomes
Centro de Informática, Universidade Federal de Pernambuco (UFPE), Recife, PE, 50740-560, Brazil
João E. A. Gomes & Ricardo B. C. Prudêncio
Dep. de Estatística e Informática, Universidade Federal Rural de Pernambuco (UFRPE), Recife, PE, 52171-900, Brazil
André C. A. Nascimento

Corresponding author

Correspondence to João E. A. Gomes.

About this article

Cite this article

Gomes, J.E.A., Prudêncio, R.B.C. & Nascimento, A.C.A. Centrality-Based Group Profiling: A Comparative Study in Co-authorship Networks. New Gener. Comput. 36, 59–89 (2018). https://doi.org/10.1007/s00354-017-0028-9

Received: 03 February 2017
Accepted: 01 November 2017
Published: 21 November 2017
Issue Date: January 2018
DOI: https://doi.org/10.1007/s00354-017-0028-9
