
Discovering communities based on mention distance

Scientometrics

Abstract

Scholarly community detection has important applications in various fields. Current studies rely heavily on structured scholarly networks, which are computationally expensive to analyze and difficult to construct in practice. We propose a novel approach that detects both disjoint and overlapping scholarly communities directly from large textual corpora. To the best of our knowledge, this is the first study designed to detect communities directly from unstructured text. Academic articles routinely mention related work and researchers, and researchers who are more closely related tend to be mentioned closer together in the text. Based on this observation, we propose an intuitive method that measures the mutual relatedness of researchers by their textual distance. First, we extract and disambiguate researcher names from academic articles. Then, we embed each researcher as a latent vector and measure the relatedness of researchers by the distance between their vectors. Finally, we identify communities as clusters of these vectors. We develop and evaluate our method on several real-world datasets. The experimental results show that our method achieves performance comparable to that of several state-of-the-art methods.
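The abstract names the pipeline stages (mention distance, researcher vectors, vector distance, clustering) but not the embedding model, distance weighting, or clustering algorithm. The following is therefore only a minimal sketch under stated assumptions, not the authors' implementation. It assumes mentions are already extracted and disambiguated into `(token_position, researcher_id)` pairs; it weights each co-mention by `1 / (1 + token distance)` inside a hypothetical `window`; it uses each researcher's affinity row (plus a self-weight) as a crude stand-in for a learned embedding; and it groups researchers whose cosine similarity reaches a hypothetical `threshold` via union-find, which yields disjoint communities only. All function names and parameters are illustrative.

```python
import math
from collections import defaultdict
from itertools import combinations


def mention_affinity(docs, window=50):
    """Sum 1 / (1 + token distance) over pairs of distinct researchers
    mentioned within `window` tokens of each other (assumed weighting).

    `docs`: list of documents, each a list of (token_position, researcher_id)
    mentions, assumed to be already extracted and disambiguated.
    """
    aff = defaultdict(float)
    for mentions in docs:
        for (p1, a), (p2, b) in combinations(sorted(mentions), 2):
            dist = p2 - p1  # non-negative because mentions are sorted
            if a != b and dist <= window:
                aff[frozenset((a, b))] += 1.0 / (1.0 + dist)
    return aff


def researcher_vectors(aff):
    """Stand-in embedding: each researcher's row of the affinity matrix,
    with the diagonal set to the row maximum so co-mentioned researchers
    share non-zero coordinates. Researchers never co-mentioned are omitted."""
    ids = sorted({r for pair in aff for r in pair})
    index = {r: i for i, r in enumerate(ids)}
    vecs = {r: [0.0] * len(ids) for r in ids}
    for pair, w in aff.items():
        a, b = sorted(pair)
        vecs[a][index[b]] += w
        vecs[b][index[a]] += w
    for r in ids:
        vecs[r][index[r]] = max(vecs[r])  # self-weight on the diagonal
    return vecs


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0


def detect_communities(docs, window=50, threshold=0.5):
    """Group researchers whose vectors have cosine similarity >= threshold
    (single-link grouping via union-find); returns sorted disjoint groups."""
    vecs = researcher_vectors(mention_affinity(docs, window))
    parent = {r: r for r in vecs}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    ids = sorted(vecs)
    for a, b in combinations(ids, 2):
        if cosine(vecs[a], vecs[b]) >= threshold:
            parent[find(a)] = find(b)
    groups = defaultdict(list)
    for r in ids:
        groups[find(r)].append(r)
    return sorted(groups.values())
```

On a toy corpus in which researchers A, B, C are always mentioned near one another and X, Y, Z likewise, but the two groups sit hundreds of tokens apart, `detect_communities` returns `[["A", "B", "C"], ["X", "Y", "Z"]]`. Detecting overlapping communities, as the paper also does, would require a soft or multi-membership clustering step instead of union-find.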


Notes

  1. http://aan.how/

  2. http://aclweb.org/anthology/


Acknowledgements

We sincerely thank the reviewers for their valuable feedback, and Feng Xu for his helpful suggestions and discussions. This research was supported by the Exploratory Research Foundation of the State Key Laboratory of Software Development Environment (SKLSDE-2017ZX-03), the Research Foundation of Beijing Information Science and Technology University (2035015), the CNCERT Key Foundation for Youths (2020Q08), the Basic Research Project of the Military Commission of Science and Technology (2017-JCJQ-ZD-043-04), and the National Key Research and Development Project (2016QY04W0901).

Author information

Correspondence to Ming Liu.


About this article


Cite this article

Zhang, L., Liu, M., Wang, B. et al. Discovering communities based on mention distance. Scientometrics 126, 1945–1967 (2021). https://doi.org/10.1007/s11192-021-03863-9


  • DOI: https://doi.org/10.1007/s11192-021-03863-9
