An entropy-based social network community detecting method and its application to scientometrics

Scientometrics

Abstract

Community structure is an important property of social networks in general and of citation networks in scientometrics in particular. Most existing methods are not suited to detecting communities in directed networks, which hinders their application to citation networks. In this paper, we propose a novel method that not only overcomes this limitation but also has relatively low time complexity, which facilitates its application to large-scale networks. We use Shannon entropy to measure a network’s information and treat community detection as a process of information loss. Based on this idea, we develop an optimization model that describes the community detection process and apply the principle of dynamic programming to solve it. A simulation test is designed to examine the model’s accuracy in discovering the community structure and identifying the optimal number of communities. Finally, we apply the method to a citation network from the journal Scientometrics and, from the detected communities, offer several insights into promising research topics.
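The idea of treating community detection as information loss can be illustrated with a generic property of Shannon entropy: merging outcomes of a distribution never increases its entropy. The sketch below is only an illustration under stated assumptions. The paper's actual entropy definition is not given in this excerpt, and `node_probs`, `labels`, and `coarse_grain` are hypothetical names introduced here.

```python
import math
from collections import defaultdict


def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def coarse_grain(probs, labels):
    """Merge node probabilities that share a community label."""
    merged = defaultdict(float)
    for p, label in zip(probs, labels):
        merged[label] += p
    return list(merged.values())


# Hypothetical example: four nodes with equal weight, grouped into two communities.
node_probs = [0.25, 0.25, 0.25, 0.25]
labels = ["A", "A", "B", "B"]

h_nodes = shannon_entropy(node_probs)                        # node-level entropy
h_comms = shannon_entropy(coarse_grain(node_probs, labels))  # community-level entropy
loss = h_nodes - h_comms                                     # information lost by merging
```

In this toy case the node-level description carries 2 bits and the two-community description 1 bit, so the partition loses 1 bit. A method of the kind the abstract describes would presumably choose merges that keep such a loss small, but that selection rule is not shown here.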

Acknowledgments

This study was partly funded by the China Scholarship Council (No. 201306120159) and the National Natural Science Foundation of China (Nos. 71271070, 71201039, and 71172157).

Author information

Corresponding author

Correspondence to Guijie Zhang.

Appendices

Appendix 1

Table 3 lists the time complexity of the community detection methods mentioned in the Introduction.

Table 3 Time complexity of the algorithms mentioned in the Introduction

Appendix 2

This appendix analyzes the time complexity of the proposed method. Recall from Table 1 that the second step of Initialization has time complexity \(O(n^2)\). The first step of Loop has time complexity \(O(n-1)\): it consists of refreshing the new community’s information, with time complexity \(O(n-1)\), and finding the optimal pair of communities to merge, also with time complexity \(O(n-1)\). The second step of Loop has time complexity \(O(n-2)\). Similarly, the \(m\)th step of Loop, at which the community number is \(n-m\), has time complexity \(O(n-m)\), where \(m\) ranges from 1 to \(n-1\). The overall time complexity is obtained by summing the complexity of all these steps: \(O(n^2) + \sum_{m=1}^{n-1} O(n-m) = O(n^2) + O(n(n-1)/2)\). As a result, the time complexity of the whole algorithm is \(O(n^2)\).
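The step costs described in this appendix can be tallied numerically as a sanity check. The sketch below assumes every hidden constant equals 1 and simply adds the initialization cost to the per-step loop costs. The function names `total_cost` and `closed_form` are hypothetical, and nothing here reproduces the algorithm of Table 1 itself.

```python
def total_cost(n: int) -> int:
    """Add up the per-step costs from the appendix, with all constants set to 1."""
    init_cost = n * n                             # Initialization: O(n^2)
    loop_cost = sum(n - m for m in range(1, n))   # m-th Loop step: O(n - m), m = 1..n-1
    return init_cost + loop_cost


def closed_form(n: int) -> int:
    """Same total in closed form: n^2 + n(n - 1)/2."""
    return n * n + n * (n - 1) // 2


if __name__ == "__main__":
    for n in (10, 100, 1000):
        cost = total_cost(n)
        # The ratio cost / n^2 stays below 1.5, consistent with O(n^2) growth.
        print(n, cost, cost / (n * n))
```

Because the loop contributes \(n(n-1)/2\) unit operations on top of the \(n^2\) initialization, the total stays within a constant factor of \(n^2\) for every \(n\).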

About this article

Cite this article

Li, Y., Zhang, G., Feng, Y. et al. An entropy-based social network community detecting method and its application to scientometrics. Scientometrics 102, 1003–1017 (2015). https://doi.org/10.1007/s11192-014-1377-5

  • DOI: https://doi.org/10.1007/s11192-014-1377-5
