Latent topic feedback for information retrieval

Authors:
David Andrzejewski, Lawrence Livermore National Laboratory, Livermore, CA, USA
David Buttler, Lawrence Livermore National Laboratory, Livermore, CA, USA

KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, August 2011, Pages 600–608
https://doi.org/10.1145/2020408.2020503

Published: 21 August 2011

ABSTRACT

We consider the problem of a user navigating an unfamiliar corpus of text documents where document metadata is limited or unavailable, the domain is specialized, and the user base is small. These challenging conditions may hold, for example, within an organization such as a business or government agency. We propose to augment standard keyword search with user feedback on latent topics. These topics are automatically learned from the corpus in an unsupervised manner and presented alongside search results. User feedback is then used to reformulate the original query, resulting in improved information retrieval performance in our experiments.
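
The query reformulation step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how feedback is folded into the query, so the weighting scheme, the function name `expand_query`, the parameters `top_n` and `topic_weight`, and the toy topics are all assumptions made for illustration. The sketch assumes topics have already been learned (for example with LDA) and are available as word-to-probability mappings.

```python
def expand_query(query_terms, topic_word_weights, selected_topics,
                 top_n=3, topic_weight=0.5):
    """Return a weighted query as a dict mapping term -> weight.

    Original query terms get weight 1.0. For each topic the user
    selected, its top_n most probable words share `topic_weight`
    in proportion to their probabilities. (Hypothetical scheme.)
    """
    weighted = {term: 1.0 for term in query_terms}
    for topic_id in selected_topics:
        words = topic_word_weights[topic_id]
        # Most probable words first, truncated to top_n.
        top = sorted(words.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
        total = sum(prob for _, prob in top)
        if total == 0:
            continue  # nothing usable in this topic
        for word, prob in top:
            weighted[word] = weighted.get(word, 0.0) + topic_weight * prob / total
    return weighted


# Toy topics (hypothetical): topic id -> {word: probability}.
example_topics = {
    0: {"reactor": 0.4, "fuel": 0.3, "neutron": 0.2, "core": 0.1},
    1: {"budget": 0.5, "funding": 0.3, "grant": 0.2},
}

# The user searched for "safety" and marked topic 0 as relevant.
expanded = expand_query(["safety"], example_topics, [0], top_n=2)
```

The resulting weighted terms could then be passed to any retrieval engine that accepts weighted queries. Keeping the original terms at full weight means the feedback refines the user's query rather than replacing it.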

Index Terms

Latent topic feedback for information retrieval
1. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results
      1. Relevance assessment
    2. Information retrieval query processing


Published in
KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2011
1446 pages
ISBN: 9781450308137
DOI: 10.1145/2020408
General Chair: Chid Apte (IBM Research)
Program Chairs: Joydeep Ghosh (UT Austin), Padhraic Smyth (UC Irvine)
Copyright © 2011 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
latent topic models
user feedback
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
