Abstract
This paper addresses the problem of semantics-based maven search in the research community, that is, identifying a person with a given expertise. Traditional approaches ignored either semantic knowledge or temporal information, so some relevant mavens cannot be identified effectively, either because the query keywords do not occur in their work or because time effects are not exploited. We propose a novel semantics and temporal information based maven search (STMS) approach that discovers latent topics (semantically related soft clusters of words) among authors, venues (conferences or journals) and time simultaneously. In the proposed approach, each author in a venue is represented as a probability distribution over topics, and each topic is represented as a probability distribution over words and over the year of the venue for that topic. Through the discovered latent topics, we can search for mavens by implicitly modeling word-author, author-author and author-venue correlations with continuous time effects. We explain the inference procedure for the topics and authors of new venues. We also show how correlations between authors can be discovered, and how topic sparseness degrades retrieval performance. Experimental results on a corpus downloaded from DBLP show that the proposed approach significantly outperforms the baseline approach, owing to its ability to produce less sparse topics.
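To make the representation described above concrete, the following is a minimal sketch of how authors could be ranked for a query once author-topic and topic-word distributions are available. It is not the STMS model itself: the scoring rule (query likelihood marginalized over topics), the names `theta`, `phi`, `query_log_likelihood` and `rank_authors`, and all the numbers are assumptions made for illustration, and the venue and time components are left out.

```python
import math

# Hypothetical toy parameters, invented for illustration (not from the paper).
# theta[a][z]: P(topic z | author a); phi[z][w]: P(word w | topic z).
theta = {
    "author_a": [0.9, 0.1],
    "author_b": [0.2, 0.8],
}
phi = [
    {"retrieval": 0.5, "indexing": 0.4, "clustering": 0.1},
    {"retrieval": 0.1, "indexing": 0.1, "clustering": 0.8},
]


def query_log_likelihood(author, query_words, eps=1e-12):
    """Return log P(q | a) = sum over words of log sum_z P(w | z) P(z | a)."""
    total = 0.0
    for word in query_words:
        # Marginalize over topics, so an author can match a query word
        # through a shared topic even without having used the word.
        p_word = sum(
            topic_words.get(word, 0.0) * p_topic
            for topic_words, p_topic in zip(phi, theta[author])
        )
        total += math.log(p_word + eps)  # eps guards against log(0)
    return total


def rank_authors(query_words):
    """Sort authors from most to least likely to have generated the query."""
    return sorted(
        theta,
        key=lambda author: query_log_likelihood(author, query_words),
        reverse=True,
    )


if __name__ == "__main__":
    print(rank_authors(["retrieval", "indexing"]))
    print(rank_authors(["clustering"]))
```

Because the score sums over topics rather than matching keywords directly, an author whose topic mixture favors a query's topic can rank highly even when the exact query terms are rare in that author's own text, which is the kind of keyword non-occurrence the abstract refers to.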
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Daud, A., Li, J., Zhou, L., Muhammad, F. (2009). A Generalized Topic Modeling Approach for Maven Search. In: Li, Q., Feng, L., Pei, J., Wang, S.X., Zhou, X., Zhu, QM. (eds) Advances in Data and Web Management. APWeb WAIM 2009. Lecture Notes in Computer Science, vol 5446. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00672-2_14
DOI: https://doi.org/10.1007/978-3-642-00672-2_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00671-5
Online ISBN: 978-3-642-00672-2
eBook Packages: Computer Science, Computer Science (R0)