Identifying Scholarly Communities from Unstructured Texts

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10987)

Abstract

Scholarly community detection has important applications in various fields. Previous studies have relied heavily on structured scholar networks, which are computationally expensive to process and challenging to construct in practice. We propose a novel alternative that identifies scholarly communities directly from large textual corpora. To our knowledge, this is the first study intended to detect communities directly from unstructured texts. Academic articles generally mention related work and the researchers behind it, and researchers who are more closely related to each other tend to be mentioned closer together in the text. Based on this correlation, we develop an intuitive method that measures the mutual relatedness of researchers through their textual distance. First, we extract and disambiguate the researcher names from academic articles. Then, we embed each researcher as an implicit vector and measure the relatedness of researchers by their vector distance. Finally, the communities are identified as clusters of vectors. We implement and evaluate our method on three real-world datasets. The experimental results demonstrate that our method achieves better performance than state-of-the-art methods.
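The three steps named in the abstract (name extraction, embedding, clustering) can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the names are already extracted and disambiguated, windowed co-occurrence counts stand in for the learned embedding vectors, and connected components over a cosine-similarity threshold stand in for the clustering step. The function names, the `window` and `threshold` parameters, and the toy names are all hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(docs, window=2):
    """Represent each researcher by counts of names mentioned within `window` positions.

    `docs` is a list of documents, each an ordered list of (already
    disambiguated) researcher-name mentions. Assumption: these count
    vectors are a stand-in for learned embeddings.
    """
    vectors = defaultdict(Counter)
    for names in docs:
        for i, name in enumerate(names):
            vectors[name][name] += 1  # self count keeps direct neighbours similar
            lo, hi = max(0, i - window), min(len(names), i + window + 1)
            for j in range(lo, hi):
                if j != i and names[j] != name:
                    vectors[name][names[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def cluster(vectors, threshold=0.5):
    """Group researchers whose similarity reaches `threshold` (union-find components)."""
    names = sorted(vectors)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = defaultdict(set)
    for n in names:
        groups[find(n)].add(n)
    return sorted(sorted(g) for g in groups.values())


# Toy corpus: each inner list is the sequence of name mentions in one article.
docs = [
    ["Ada", "Bo", "Cy"],
    ["Bo", "Ada", "Cy"],
    ["Dee", "Eli"],
    ["Eli", "Dee"],
]
vectors = cooccurrence_vectors(docs)
communities = cluster(vectors)
print(communities)
```

On this toy corpus, researchers who are mentioned near one another end up in the same group, while names that never share a window have zero similarity and stay apart. A real pipeline would replace the count vectors with trained embeddings and the threshold rule with a proper clustering algorithm.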

Notes

  1. http://clair.si.umich.edu/homepage/downloads/aan_relational/.

  2. http://clair.si.umich.edu/homepage/downloads/aan_session/.

  3. http://aclweb.org/anthology/.

  4. http://clair.eecs.umich.edu/aan/index.php.

Acknowledgments

This research was supported by the Foundation of the State Key Laboratory of Software Development Environment (No. SKLSDE-2017ZX-03).

Author information

Corresponding author

Correspondence to Ming Liu.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Liu, M., Chen, Y., Lang, B., Zhang, L., Niu, H. (2018). Identifying Scholarly Communities from Unstructured Texts. In: Cai, Y., Ishikawa, Y., Xu, J. (eds) Web and Big Data. APWeb-WAIM 2018. Lecture Notes in Computer Science, vol 10987. Springer, Cham. https://doi.org/10.1007/978-3-319-96890-2_7

  • DOI: https://doi.org/10.1007/978-3-319-96890-2_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-96889-6

  • Online ISBN: 978-3-319-96890-2

  • eBook Packages: Computer Science, Computer Science (R0)
