Explore semantic topics and author communities for citation recommendation in bipartite bibliographic network

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

Citation recommendation is the task of suggesting a list of references for an author's manuscript. It matters for academic research because it offers an efficient and easy way to find relevant literature. In this paper, we propose a novel probabilistic topic model that automatically recommends citations to researchers. The model considers both the textual similarity between papers and the community relevance among authors. To fully exploit the content and the diverse link information in a bibliographic network, we extend LDA with matrix factorization, so that semantic topic learning and community detection reinforce each other during parameter estimation. We also develop a flexible way to generate a family of citation link probability functions, which can substantially increase model capacity. Experimental results on the AAN and DBLP datasets show that our model outperforms baseline algorithms for citation recommendation and can generate high-quality author communities and topics.
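To make the idea of a citation link probability function concrete, the following is a minimal illustrative sketch and not the model proposed in the paper. It assumes each paper already has a topic proportion vector (`theta`) and an author community membership vector (`comm`), and it scores a candidate citation by squashing a weighted sum of the two similarities into a probability. All names, weights, and toy values (`link_score`, `recommend`, `w_topic`, `w_comm`, `saturating`) are hypothetical; swapping the `squash` argument is one simple way to obtain different link functions from the same template.

```python
import math


def sigmoid(x):
    """Logistic squashing function; maps any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def saturating(x):
    """Alternative squashing function, 1 - exp(-x), clipped at 0 for x < 0."""
    return 1.0 - math.exp(-max(x, 0.0))


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def link_score(query, cand, w_topic=1.0, w_comm=1.0, bias=0.0, squash=sigmoid):
    """Hypothetical link probability between a query paper and a candidate.

    Combines topic similarity and author community similarity linearly,
    then applies the chosen squashing function.
    """
    raw = (w_topic * dot(query["theta"], cand["theta"])
           + w_comm * dot(query["comm"], cand["comm"])
           + bias)
    return squash(raw)


def recommend(query, candidates, top_k=2, **kwargs):
    """Rank candidate papers by link score and return the top_k (name, score)."""
    scored = [(name, link_score(query, c, **kwargs))
              for name, c in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy data (assumed values, two topics and two communities).
query = {"theta": [0.8, 0.2], "comm": [0.9, 0.1]}
candidates = {
    "A": {"theta": [0.7, 0.3], "comm": [0.8, 0.2]},
    "B": {"theta": [0.1, 0.9], "comm": [0.2, 0.8]},
    "C": {"theta": [0.6, 0.4], "comm": [0.1, 0.9]},
}

top = recommend(query, candidates, top_k=2)
top_alt = recommend(query, candidates, top_k=2, squash=saturating)
```

In this toy setting, paper A ranks first because it is close to the query in both topics and author communities, while paper C has similar topics but its authors belong mostly to a different community. Because both squashing functions are increasing, they yield the same ranking here and differ only in the probabilities they assign.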

Notes

  1. http://clair.eecs.umich.edu/aan/index.php.

  2. https://cn.aminer.org/data.

Acknowledgements

The work described in this paper was partially supported by the National Natural Science Foundation of China (Project No. 61373046) and the Natural Science Basic Research Plan in Shaanxi Province of China (Project No. S2015YFJM2129).

Author information

Corresponding author

Correspondence to Li Zhu.

About this article

Cite this article

Dai, T., Zhu, L., Cai, X. et al. Explore semantic topics and author communities for citation recommendation in bipartite bibliographic network. J Ambient Intell Human Comput 9, 957–975 (2018). https://doi.org/10.1007/s12652-017-0497-1

  • DOI: https://doi.org/10.1007/s12652-017-0497-1
