Identifying Emerging Topics and Content Change from Evolving Document Sets

Parvathi Chundi
Copyright: © 2017 | Volume: 7 | Issue: 4 | Pages: 18
ISSN: 2155-6393 | EISSN: 2155-6407 | EISBN13: 9781522514305 | DOI: 10.4018/IJKBO.2017100101
MLA

Chundi, Parvathi. "Identifying Emerging Topics and Content Change from Evolving Document Sets." IJKBO vol.7, no.4 2017: pp.1-18. http://doi.org/10.4018/IJKBO.2017100101

APA

Chundi, P. (2017). Identifying Emerging Topics and Content Change from Evolving Document Sets. International Journal of Knowledge-Based Organizations (IJKBO), 7(4), 1-18. http://doi.org/10.4018/IJKBO.2017100101

Chicago

Chundi, Parvathi. "Identifying Emerging Topics and Content Change from Evolving Document Sets," International Journal of Knowledge-Based Organizations (IJKBO) 7, no.4: 1-18. http://doi.org/10.4018/IJKBO.2017100101

Abstract

Document sets whose content evolves over time are common in organizations. Organizations update policy documents periodically, and a news story evolves over a period of time. When a document set evolves, some of the old content may remain unchanged while new content is added. Depending on the amount of change, users may need to read or analyze the new content again. Evolving content can make it hard for users to track individual changes and to form a global view of what has changed. In this paper, we consider document sets consisting of documents published at two different points in time and develop a measure that captures the change in content between them. We divide a document set into two subsets: one containing the documents published at the earlier date and another containing the documents published at the later date. We use Latent Dirichlet Allocation to extract topic and word distributions for each of the two subsets. We then compute the similarity between the sets of topics extracted for the two subsets to measure the amount of change in the content. We study the effectiveness of the method on two data sets, a set of privacy policy documents and a set of Reuters news articles extracted from the TDT-Pilot Corpus, and present the experimental results.
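
The topic-comparison step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure the paper uses, so everything here is an assumption made for illustration: topics are represented as word-to-probability dictionaries (as a Latent Dirichlet Allocation run on each subset might yield), similarity is cosine similarity, each later topic is matched to its most similar earlier topic, and the function names `content_change` and `emerging_topics` and the 0.5 threshold are hypothetical.

```python
import math


def cosine(p, q):
    """Cosine similarity between two sparse word distributions (word -> weight)."""
    dot = sum(w * q.get(word, 0.0) for word, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def best_matches(old_topics, new_topics):
    """For each later topic, the highest similarity to any earlier topic.

    Assumes old_topics is non-empty.
    """
    return [max(cosine(new, old) for old in old_topics) for new in new_topics]


def content_change(old_topics, new_topics):
    """Assumed change score: 1 minus the mean best-match similarity.

    Close to 0 when every later topic has a near-identical earlier topic,
    close to 1 when no later topic resembles any earlier one.
    """
    scores = best_matches(old_topics, new_topics)
    return 1.0 - sum(scores) / len(scores)


def emerging_topics(old_topics, new_topics, threshold=0.5):
    """Indices of later topics whose best match is below an assumed threshold."""
    scores = best_matches(old_topics, new_topics)
    return [i for i, s in enumerate(scores) if s < threshold]


# Toy topic-word distributions standing in for LDA output on each subset.
old_topics = [
    {"privacy": 0.5, "data": 0.3, "cookies": 0.2},
    {"account": 0.6, "password": 0.4},
]
new_topics = [
    {"privacy": 0.5, "data": 0.3, "cookies": 0.2},  # unchanged topic
    {"location": 0.7, "tracking": 0.3},             # no counterpart in old_topics
]

change = content_change(old_topics, new_topics)
emerging = emerging_topics(old_topics, new_topics)
```

In this toy input one later topic is identical to an earlier topic and the other shares no words with any earlier topic, so the score sits halfway between "no change" and "complete change", and only the second later topic is flagged as emerging. A divergence such as Jensen-Shannon could be substituted for cosine similarity without changing the overall structure.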
