Abstract
Conference mining and expert finding are useful academic knowledge discovery problems from an academic recommendation point of view. Group level (GL) topic modeling captures richer text semantics and relationships, which results in denser topics. Denser topics are more useful for academic knowledge discovery than the sparser topics produced by element level (EL) or document level (DL) topic modeling. Previous methods performed academic knowledge discovery by using network connectivity (only links, not the text of documents), keyword-based matching (no semantics), or the semantics-based intrinsic structure of the words present across documents (semantics at DL), while ignoring the semantics-based intrinsic structure of the words and relationships between conferences (semantics at GL). In this paper, we consider the semantics-based intrinsic structure of the words and relationships present in conferences (richer text semantics and relationships) by modeling at GL. We propose group topic modeling methods based on Latent Dirichlet Allocation (LDA). Detailed empirical evaluation shows that our proposed GL methods significantly outperform DL methods on conference mining and expert finding problems.
Cite this article
Daud, A., Muhammad, F. Group topic modeling for academic knowledge discovery. Appl Intell 36, 870–886 (2012). https://doi.org/10.1007/s10489-011-0302-3
DOI: https://doi.org/10.1007/s10489-011-0302-3