Abstract
Scholarly community detection has important applications in many fields. Previous studies have relied heavily on structured scholar networks, which are computationally expensive to process and difficult to construct in practice. We propose a novel alternative that identifies scholarly communities directly from large textual corpora. To our knowledge, this is the first study to detect communities directly from unstructured text. Academic articles routinely mention related work and researchers, and researchers who are more closely related to each other tend to be mentioned closer together in the text. Based on this correlation, we develop an intuitive method that measures the mutual relatedness of researchers through their textual distance. First, we extract and disambiguate researcher names from academic articles. Then, we embed each researcher as an implicit vector and measure the relatedness of researchers by their vector distance. Finally, we identify communities as clusters of these vectors. We implement and evaluate our method on three real-world datasets. The experimental results demonstrate that our method achieves better performance than state-of-the-art methods.
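The abstract only outlines the pipeline, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes researcher names have already been extracted and disambiguated into canonical tokens. In place of a learned implicit embedding, it swaps in an explicit distance-weighted co-occurrence vector (each pair of mentions within a hypothetical `window` of tokens adds weight 1/distance), uses cosine similarity as the relatedness measure, and uses single-linkage threshold clustering as a stand-in for the vector clustering step. The function names and the `window` and `threshold` parameters are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def cooccurrence_vectors(docs, researchers, window=10):
    """Distance-weighted co-occurrence vector per researcher (illustrative stand-in
    for a learned embedding; not the paper's model).

    docs: token lists in which researcher mentions are ASSUMED to be already
          extracted and disambiguated into canonical IDs.
    researchers: set of canonical researcher IDs.
    window: hypothetical maximum token distance for two mentions to count.
    """
    ids = sorted(researchers)
    index = {r: i for i, r in enumerate(ids)}
    weights = [[0.0] * len(ids) for _ in ids]
    for tokens in docs:
        # (position, researcher) pairs in ascending position order
        mentions = [(pos, tok) for pos, tok in enumerate(tokens) if tok in index]
        for (p1, a), (p2, b) in combinations(mentions, 2):
            dist = p2 - p1
            if a != b and dist <= window:
                # closer mentions contribute more relatedness
                w = 1.0 / dist
                weights[index[a]][index[b]] += w
                weights[index[b]][index[a]] += w
    # self-weight, so two researchers who mention only each other
    # still end up with similar vectors
    for i, row in enumerate(weights):
        row[i] = max(row) if any(row) else 1.0
    return {r: weights[index[r]] for r in ids}


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster(vectors, threshold=0.5):
    """Single-linkage clustering: merge any pair with cosine >= threshold."""
    parent = {r: r for r in vectors}

    def find(r):
        # union-find root lookup with path halving
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in combinations(sorted(vectors), 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            parent[find(a)] = find(b)
    groups = defaultdict(set)
    for r in vectors:
        groups[find(r)].add(r)
    return sorted(groups.values(), key=lambda g: sorted(g))


if __name__ == "__main__":
    docs = [
        ["alice", "extends", "bob", "and", "carol"],
        ["dave", "improves", "erin"],
        ["bob", "cites", "carol"],
    ]
    people = {"alice", "bob", "carol", "dave", "erin"}
    # two communities: {alice, bob, carol} and {dave, erin}
    print(cluster(cooccurrence_vectors(docs, people), threshold=0.5))
```

Single linkage keeps the sketch deterministic and short, but it can chain loosely related researchers into one large group; k-means or average-linkage clustering over the same vectors would be a reasonable alternative under the same assumptions.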
Acknowledgments
This research was supported by the Foundation of the State Key Laboratory of Software Development Environment (No. SKLSDE-2017ZX-03).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Liu, M., Chen, Y., Lang, B., Zhang, L., Niu, H. (2018). Identifying Scholarly Communities from Unstructured Texts. In: Cai, Y., Ishikawa, Y., Xu, J. (eds) Web and Big Data. APWeb-WAIM 2018. Lecture Notes in Computer Science, vol 10987. Springer, Cham. https://doi.org/10.1007/978-3-319-96890-2_7
DOI: https://doi.org/10.1007/978-3-319-96890-2_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96889-6
Online ISBN: 978-3-319-96890-2
eBook Packages: Computer Science, Computer Science (R0)