Skip to main content

A Generalized Topic Modeling Approach for Maven Search

  • Conference paper
Advances in Data and Web Management (APWeb 2009, WAIM 2009)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 5446))

Abstract

This paper addresses the problem of semantics-based maven search in research community, which means identifying a person with some given expertise. Traditional approaches either ignored semantic knowledge or temporal information, resulting in some right mavens that cannot be effectively identified because of non-occurrence of keywords and un-exploitation of time effects. In this paper, we propose a novel semantics and temporal information based maven search (STMS) approach to discover latent topics (semantically related soft clusters of words) between the authors, venues (conferences or journals) and time simultaneously. In the proposed approach, each author in a venue is represented as a probability distribution over topics, and each topic is represented as a probability distribution over words and year of the venue for that topic. Through discovered latent topics we can search mavens by implicitly modeling word-author, author-author and author-venue correlations with continuous time effects. Inference making procedure for topics and authors of new venues is explained. We also show how authors’ correlations can be discovered and the bad effect of topics sparseness on the retrieval performance. Experimental results on the corpus downloaded from DBLP show that proposed approach significantly outperformed the baseline approach, due to its ability to produce less sparse topics.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Andrieu, C., Freitas, N.D., Doucet, A., Jordan, M.: An Introduction to MCMC for Machine Learning. Journal of Machine Learning 50, 5–43 (2003)

    Article  MATH  Google Scholar 

  2. Azzopardi, L., Girolami, M., Risjbergen, K.V.: Investigating the Relationship between Language Model Perplexity and IR Precision-Recall Measures. In: Proc. of the 26th ACM SIGIR, Toronto, Canada, July 28-August 1 (2003)

    Google Scholar 

  3. Balog, K., Azzopardi, L., de Rijke, M.: Formal Models for Expert Finding in Enterprise Corpora. In: Proc. of SIGIR, pp. 43–55 (2006)

    Google Scholar 

  4. Balog, K., Bogers, T., Azzopardi, L., Rijke, M., Bosch, A.: Broad Expertise Retrieval in Sparse Data Environments. In: Proc. of SIGIR, pp. 551–558 (2007)

    Google Scholar 

  5. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet Allocation. Journal of Machine Learning Research 3, 993–1022 (2003)

    MATH  Google Scholar 

  6. Cao, Y., Liu, J., Bao, S., Li, H.: Research on Expert Search at Enterprise Track of TREC (2005)

    Google Scholar 

  7. DBLP Bibliography Database, http://www.informatik.uni-trier.de/~ley/db/

  8. Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman, R.: Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science 41(6), 391–407 (1990)

    Article  Google Scholar 

  9. Griffiths, T.L., Steyvers, M.: Finding Scientific Topics. In: Proc. of the National Academy of Sciences, USA, pp. 5228–5235 (2004)

    Google Scholar 

  10. Hawking, D.: Challenges in Enterprise Search. In: Proc. of the 15th Conference on Australasian Database, vol. 27, pp. 15–24 (2004)

    Google Scholar 

  11. Hofmann, T.: Probabilistic Latent Semantic Analysis. In: Proc. of the 15th Annual Conference on UAI, Stockholm, Sweden, July 30-August 1 (1999)

    Google Scholar 

  12. Hofmann, T., Puzicha, J., Jordan, M.I.: Learning from Dyadic Data. In: Advances in Neural Information Processing Systems (NIPS), vol. 11. MIT Press, Cambridge (1999)

    Google Scholar 

  13. Mimno, D., McCallum, A.: Expertise Modeling for Matching Papers with Reviewers. In: Proc. of the 13th ACM SIGKDD, pp. 500–509 (2007)

    Google Scholar 

  14. Nie, Z., Ma, Y., Shi, S., Wen, J., Ma, W.: Web Object Retrieval. In: Proc. of World Wide Web (WWW), pp. 81–90 (2007)

    Google Scholar 

  15. Petkova, D., Croft, W.B.: Generalizing the Language Modeling Framework for Named Entity Retrieval. In: Proc. of SIGIR (2007)

    Google Scholar 

  16. Rosen-Zvi, M., Griffiths, T., Steyvers, M., Smyth, P.: The Author-Topic Model for Authors and Documents. In: Proc. of the 20th International Conference on UAI, Canada (2004)

    Google Scholar 

  17. Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., Su, Z.: ArnetMiner: Extraction and Mining of Academic Social Networks. In: Proc. of the 14th ACM SIGKDD (2008)

    Google Scholar 

  18. Teh, Y.W., Jordan, M.I., Beal, M.J., Blei, D.M.: Hierarchical Dirichlet Processes. Technical Report 653, Department of Statistics, UC Berkeley (2004)

    Google Scholar 

  19. Zhai, C., Lafferty, J.: A Study of Smoothing Methods for Language Models Applied to Ad-hoc Information Retrieval. In: Proc. of the 24th ACM SIGIR, pp. 334–342 (2001)

    Google Scholar 

  20. Zhang, J., Tang, J., Liu, L., Li, J.: A Mixture Model for Expert Finding. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds.) PAKDD 2008. LNCS (LNAI), vol. 5012, pp. 466–478. Springer, Heidelberg (2008)

    Chapter  Google Scholar 

  21. Zhang, J., Tang, J., Li, J.: Expert Finding in a Social Network. In: Kotagiri, R., Radha Krishna, P., Mohania, M., Nantajeewarawat, E. (eds.) DASFAA 2007. LNCS, vol. 4443, pp. 1066–1069. Springer, Heidelberg (2007)

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Daud, A., Li, J., Zhou, L., Muhammad, F. (2009). A Generalized Topic Modeling Approach for Maven Search. In: Li, Q., Feng, L., Pei, J., Wang, S.X., Zhou, X., Zhu, QM. (eds) Advances in Data and Web Management. APWeb WAIM 2009 2009. Lecture Notes in Computer Science, vol 5446. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00672-2_14

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-00672-2_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00671-5

  • Online ISBN: 978-3-642-00672-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics