ABSTRACT
We consider the problem of a user navigating an unfamiliar corpus of text documents where document metadata is limited or unavailable, the domain is specialized, and the user base is small. These challenging conditions may hold, for example, within an organization such as a business or government agency. We propose to augment standard keyword search with user feedback on latent topics. These topics are automatically learned from the corpus in an unsupervised manner and presented alongside search results. User feedback is then used to reformulate the original query, resulting in improved information retrieval performance in our experiments.
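The abstract does not spell out how topics are chosen for display or how feedback reformulates the query. The sketch below illustrates the general idea under our own assumptions. It is not the authors' method.

The assumptions are:
- `doc_topic` (per-document topic proportions) and `topic_word` (per-topic word probabilities) come from an already-trained topic model such as LDA.
- Candidate topics are picked by summing topic proportions over the top-ranked results.
- A selected topic's top words are linearly mixed into the original query with a fixed weight `topic_weight`.

All function names and parameters are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical helpers (not from the paper): doc_topic[d][t] is the proportion of
# topic t in document d; topic_word[t] maps words to their probability under
# topic t. Both are assumed to come from an already-trained topic model (e.g. LDA).


def candidate_topics(doc_topic: Sequence[Sequence[float]],
                     ranked_docs: Sequence[int],
                     k_docs: int = 10,
                     n_topics: int = 3) -> List[int]:
    """Choose topics to display beside the results by summing the topic
    proportions of the top-k retrieved documents."""
    scores: Counter = Counter()
    for d in ranked_docs[:k_docs]:
        for t, p in enumerate(doc_topic[d]):
            scores[t] += p
    return [t for t, _ in scores.most_common(n_topics)]


def reformulate(query_terms: Sequence[str],
                topic_word: Sequence[Dict[str, float]],
                selected: Sequence[int],
                n_words: int = 5,
                topic_weight: float = 0.3) -> Dict[str, float]:
    """Build a weighted query: the original terms share (1 - topic_weight) of
    the mass, and the top words of each user-selected topic share the rest.
    With no feedback, the original terms keep all of the mass."""
    weights: Dict[str, float] = {}
    query_share = (1.0 - topic_weight) if selected else 1.0
    for w in query_terms:
        weights[w] = weights.get(w, 0.0) + query_share / len(query_terms)
    for t in selected:
        top = sorted(topic_word[t].items(), key=lambda kv: kv[1], reverse=True)[:n_words]
        total = sum(p for _, p in top)
        if total <= 0:
            continue  # skip empty topics
        for w, p in top:
            # Renormalize within the truncated word list so each non-empty
            # selected topic contributes topic_weight / len(selected) in total.
            weights[w] = weights.get(w, 0.0) + (topic_weight / len(selected)) * p / total
    return weights


def score(doc_tokens: Sequence[str], weights: Dict[str, float]) -> float:
    """Toy weighted term-frequency score standing in for a real retrieval model."""
    if not doc_tokens:
        return 0.0
    counts = Counter(doc_tokens)
    return sum(wt * counts[w] / len(doc_tokens) for w, wt in weights.items())


if __name__ == "__main__":
    # Tiny made-up example: two topics, three documents.
    doc_topic = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
    topic_word = [{"gene": 0.5, "protein": 0.3, "cell": 0.2},
                  {"court": 0.6, "law": 0.4}]
    docs = [["gene", "expression", "data"],
            ["court", "law", "ruling"],
            ["protein", "gene", "cell"]]

    shown = candidate_topics(doc_topic, ranked_docs=[0, 2, 1], k_docs=2, n_topics=1)
    expanded = reformulate(["expression"], topic_word, selected=shown,
                           n_words=2, topic_weight=0.4)
    for i, doc in enumerate(docs):
        print(i, round(score(doc, expanded), 3))
```

In the toy run, the third document contains none of the original query terms. It still receives a nonzero score once the user selects the gene-related topic.

Linear interpolation between the original query and the topic words mirrors classic relevance-feedback query expansion. `topic_weight` limits how far the reformulated query can drift from what the user typed.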
Index Terms
- Latent topic feedback for information retrieval