DOI: 10.1145/2020408.2020503
research-article

Latent topic feedback for information retrieval

Published: 21 August 2011

ABSTRACT

We consider the problem of a user navigating an unfamiliar corpus of text documents where document metadata is limited or unavailable, the domain is specialized, and the user base is small. These challenging conditions may hold, for example, within an organization such as a business or government agency. We propose to augment standard keyword search with user feedback on latent topics. These topics are automatically learned from the corpus in an unsupervised manner and presented alongside search results. User feedback is then used to reformulate the original query, resulting in improved information retrieval performance in our experiments.
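
The abstract does not say how topic feedback is turned into a reformulated query, so the following is only a minimal sketch under stated assumptions: each topic is a word-to-probability mapping, the user selects one or more topics, and the most probable words of each selected topic are added to the query at a reduced weight. The names `top_words`, `reformulate`, `score`, and the `feedback_weight` parameter are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical representation: a topic maps each word to its probability.
# In practice these would come from an unsupervised topic model fitted to the corpus.
Topic = Dict[str, float]


def top_words(topic: Topic, k: int) -> List[str]:
    """Return the k most probable words of a topic (ties broken alphabetically)."""
    ranked = sorted(topic.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


def reformulate(query: List[str], selected: List[Topic], k: int = 3,
                feedback_weight: float = 0.5) -> Dict[str, float]:
    """Build a weighted query from the original terms and user-selected topics.

    Original terms get weight 1.0; the top-k words of each selected topic are
    added with `feedback_weight`. This weighting scheme is an assumption made
    for illustration only.
    """
    weights: Dict[str, float] = {term: 1.0 for term in query}
    for topic in selected:
        for word in top_words(topic, k):
            # Never down-weight a term the user typed explicitly.
            weights[word] = max(weights.get(word, 0.0), feedback_weight)
    return weights


def score(doc: List[str], weighted_query: Dict[str, float]) -> float:
    """Toy relevance score: sum of query weights over the document's tokens."""
    return sum(weighted_query.get(token, 0.0) for token in doc)
```

With a toy topic such as `{"budget": 0.4, "fiscal": 0.3, "audit": 0.2, "tax": 0.1}` and the query `["audit", "report"]`, `reformulate` keeps the typed terms at full weight and adds `budget` and `fiscal` at the feedback weight, so a document that mentions only topic vocabulary can still receive a nonzero score.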

        Published in

        KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
        August 2011
        1446 pages
        ISBN: 9781450308137
        DOI: 10.1145/2020408

        Copyright © 2011 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
