
Group topic modeling for academic knowledge discovery

Applied Intelligence

Abstract

Conference mining and expert finding are useful academic knowledge discovery problems from the standpoint of academic recommendation. Group-level (GL) topic modeling captures richer text semantics and relationships than element-level (EL) or document-level (DL) topic modeling. It therefore produces denser topics, which are more useful for academic discovery tasks than the sparser topics produced at the EL or DL. Previous methods performed academic knowledge discovery using network connectivity (links only, ignoring the text of documents), keyword-based matching (no semantics), or the semantics-based intrinsic structure of words across documents (semantics at the DL). They ignored the semantics-based intrinsic structure of words and the relationships among conferences (semantics at the GL). In this paper, we model at the GL to capture the semantics-based intrinsic structure of words and the relationships present in conferences, which provides richer text semantics and relationships. We propose group topic modeling methods based on Latent Dirichlet Allocation (LDA). Detailed empirical evaluation shows that the proposed GL methods significantly outperform DL methods on conference mining and expert finding.
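The abstract does not give the model equations. The following is therefore only a minimal Python sketch under stated assumptions. It treats each conference as a group, pools all of that conference's papers into one pseudo-document, and fits plain LDA to those pseudo-documents with collapsed Gibbs sampling. It then ranks conferences for a query by the query likelihood P(q|c) = ∏_w Σ_z P(w|z) P(z|c). The pooling step and the ranking function are illustrative assumptions, not necessarily the authors' group topic models. The function names (`group_documents`, `gibbs_lda`, `rank_groups`) and the toy data are hypothetical.

```python
import math
import random
from collections import defaultdict


def group_documents(papers):
    """Hypothetical helper. Assumption: GL modeling is approximated by pooling the
    tokens of all papers sharing a group id (e.g. a conference) into one pseudo-document.
    `papers` is a list of (group_id, tokens)."""
    groups = defaultdict(list)
    for group_id, tokens in papers:
        groups[group_id].extend(tokens)
    return dict(groups)


def gibbs_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Standard LDA fitted by collapsed Gibbs sampling over `docs` (dict: id -> tokens).
    Returns (theta, phi): theta[g][k] estimates P(topic k | group g),
    phi[k][w] estimates P(word w | topic k)."""
    rng = random.Random(seed)
    vocab = sorted({w for toks in docs.values() for w in toks})
    V = len(vocab)
    n_gk = {g: [0] * num_topics for g in docs}            # group-topic counts
    n_kw = [defaultdict(int) for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                                # tokens assigned to each topic
    z = {}                                                # topic assignment of every token

    def update(g, w, k, delta):
        n_gk[g][k] += delta
        n_kw[k][w] += delta
        n_k[k] += delta

    # Random initial topic assignment for every token.
    for g, toks in docs.items():
        z[g] = [rng.randrange(num_topics) for _ in toks]
        for w, k in zip(toks, z[g]):
            update(g, w, k, +1)

    topics = range(num_topics)
    for _ in range(iters):
        for g, toks in docs.items():
            for i, w in enumerate(toks):
                update(g, w, z[g][i], -1)  # exclude the current token from the counts
                # Full conditional P(z_i = k | all other assignments), unnormalized.
                weights = [(n_gk[g][k] + alpha) * (n_kw[k][w] + beta) / (n_k[k] + V * beta)
                           for k in topics]
                z[g][i] = rng.choices(topics, weights=weights)[0]
                update(g, w, z[g][i], +1)

    theta = {g: [(n_gk[g][k] + alpha) / (len(toks) + num_topics * alpha) for k in topics]
             for g, toks in docs.items()}
    phi = [{w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab} for k in topics]
    return theta, phi


def rank_groups(query, theta, phi):
    """Assumed ranking: order groups by sum over query words w of
    log sum_k P(w | k) P(k | g). Words outside the vocabulary are skipped."""
    scores = {}
    for g, th in theta.items():
        scores[g] = sum(math.log(sum(phi[k][w] * th[k] for k in range(len(th))))
                        for w in query if w in phi[0])
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical toy corpus: (conference, paper tokens).
    papers = [
        ("SIGIR", ["retrieval", "query", "ranking", "index"]),
        ("SIGIR", ["query", "retrieval", "relevance"]),
        ("ICML", ["learning", "gradient", "model", "training"]),
        ("ICML", ["model", "learning", "kernel"]),
    ]
    theta, phi = gibbs_lda(group_documents(papers), num_topics=2)
    print(rank_groups(["retrieval", "query"], theta, phi))
```

In this simplified reading, each conference contributes far more co-occurring words to a single topic mixture than any one paper does. This is one way to picture the "denser" GL topics the abstract describes. The same `rank_groups` scoring could rank authors for expert finding if papers were grouped by author instead of by conference.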

Author information

Corresponding author

Correspondence to Ali Daud.

About this article

Cite this article

Daud, A., Muhammad, F. Group topic modeling for academic knowledge discovery. Appl Intell 36, 870–886 (2012). https://doi.org/10.1007/s10489-011-0302-3

  • DOI: https://doi.org/10.1007/s10489-011-0302-3
