research-article

Learning topics and related passages in books

Authors:
David Newman, University of California, Irvine, Irvine, CA, USA
Youn Noh, Yale University, New Haven, CT, USA
Kat Hagedorn, University Libraries, University of Michigan
Arun Balagopalan, Computer Science, University of California, Irvine

JCDL '12: Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries, June 2012, Pages 195–198. https://doi.org/10.1145/2232817.2232854

Published: 10 June 2012

ABSTRACT

The number of books available online is increasing, but user interfaces may not be taking full advantage of advances in machine learning techniques that could help users navigate, explore, discover and understand interesting and useful content in books. Using a group of ten students and over one thousand crowdsourced judgments, we conducted multiple user studies to evaluate topics and related passages in books, all learned by topic modeling. Using ten books, selected from humanities (e.g. Plato's Republic), social sciences (e.g. Marx's Capital) and sciences (e.g. Einstein's Relativity), and four different evaluation experiments, we show that users agree that the learned topics are coherent and important to the book, and related to the automatically generated passages. We show how crowdsourced evaluations are useful, and can complement more focused evaluations using students who have studied the texts. This work provides a framework for (1) learning topics and related passages in books, and (2) evaluating those learned topics and passages, and moves one step toward automatic annotation to support topic navigation of books.
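The abstract states that topics and related passages are learned by topic modeling but gives no implementation details here. As a rough illustration only, the sketch below splits a text into fixed-length passages, fits a small Latent Dirichlet Allocation (LDA) model with collapsed Gibbs sampling, summarizes each topic by its most frequent words, and ranks passages by their estimated proportion of a topic. The fixed-length segmentation, the hyperparameters (`alpha`, `beta`, `iterations`), and all function names (`split_passages`, `lda_gibbs`, `top_words`, `related_passages`) are hypothetical choices for this example, not the authors' method or settings.

```python
import random


def split_passages(text, words_per_passage=50):
    """Split text into consecutive fixed-length word windows (an assumed, simple segmentation)."""
    words = [w.strip('.,;:!?"\'()').lower() for w in text.split()]
    words = [w for w in words if w]
    return [words[i:i + words_per_passage] for i in range(0, len(words), words_per_passage)]


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (hypothetical default hyperparameters).

    Returns (ndk, nkw, vocab): per-passage topic counts, per-topic word counts,
    and the sorted vocabulary that indexes the columns of nkw.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * num_topics for _ in docs]        # tokens of each topic in each passage
    nkw = [[0] * V for _ in range(num_topics)]    # occurrences of each word in each topic
    nk = [0] * num_topics                          # total tokens assigned to each topic
    z = []                                         # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], z[d][i]
                # Remove this token from the counts before resampling its topic.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [(ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                           for t in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1
    return ndk, nkw, vocab


def top_words(nkw, vocab, k, n=10):
    """The n most frequent words in topic k, used as the topic's readable summary."""
    return [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -nkw[k][i])[:n]]


def related_passages(ndk, k, alpha=0.1, n=3):
    """Indices of the n passages with the highest smoothed proportion of topic k."""
    if not ndk:
        return []
    num_topics = len(ndk[0])
    theta = [(row[k] + alpha) / (sum(row) + num_topics * alpha) for row in ndk]
    return sorted(range(len(ndk)), key=lambda d: -theta[d])[:n]
```

For a whole book, one might pass the full text to `split_passages`, fit a chosen number of topics with `lda_gibbs`, and show `top_words` for each topic next to the passages returned by `related_passages`. Fixed-length windows keep the sketch short; a real system could substitute a dedicated text-segmentation method to obtain more natural passage boundaries.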

Index Terms

1. Information systems
  1. Information systems applications

Recommendations

Group topic model: organizing topics into groups
Abstract
Latent Dirichlet allocation defines hidden topics to capture latent semantics in text documents. However, it assumes that all the documents are represented by the same topics, resulting in the “forced topic” problem. To solve this problem, we ...
Text, Topics, and Turkers: A Consensus Measure for Statistical Topics
HT '15: Proceedings of the 26th ACM Conference on Hypertext & Social Media

Topic modeling is an important tool in social media analysis, allowing researchers to quickly understand large text corpora by investigating the topics underlying them. One of the fundamental problems of topic models lies in how to assess the quality of ...
Extractive text summarization using clustering-based topic modeling
Abstract
Text summarization is the process of converting the input document into a short form, provided that it preserves the overall meaning associated with it. Primarily, text summarization is achieved in two ways, i.e., abstractive and extractive. ...

Published in
JCDL '12: Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
June 2012
458 pages
ISBN:9781450311540
DOI:10.1145/2232817
General Chairs:
Karim B. Boughida, The George Washington University, USA
Barrie Howard, The Library of Congress, USA
Program Chairs:
Michael L. Nelson, Old Dominion University, USA
Herbert Van de Sompel, Los Alamos National Laboratory, USA
Ingeborg Sølvberg, Norwegian University of Science & Technology, Norway
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
topic modeling
Acceptance Rates
Overall Acceptance Rate: 415 of 1,482 submissions, 28%
Article Metrics
- Total Citations: 1
- Total Downloads: 205
- Downloads (Last 12 months): 1
- Downloads (Last 6 weeks): 0