Skip to main content

Topic Chains for Understanding a News Corpus

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2011)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 6609))

Abstract

The Web is a great resource and archive of news articles for the world. We present a framework, based on probabilistic topic modeling, for uncovering the meaningful structure and trends of important topics and issues hidden within the news archives on the Web. Central in the framework is a topic chain, a temporal organization of similar topics. We experimented with various topic similarity metrics and present our insights on how best to construct topic chains. We discuss how to interpret the topic chains to understand the news corpus by looking at long-term topics, temporary issues, and shifts of focus in the topic chains. We applied our framework to nine months of Korean Web news corpus and present our findings.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Allan, J.: Introduction to topic detection and tracking. In: Topic Detection and Tracking, Event-based Information Organization, pp. 1–16 (2002)

    Google Scholar 

  2. Blei, D., Lafferty, J.: Dynamic topic models. In: Proceedings of the 23rd International Conference on Machine Learning (2006)

    Google Scholar 

  3. Wang, X., McCallum, A.: Topics over time: a non-markov continuous-time model of topical trends. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2006)

    Google Scholar 

  4. Blei, D., Ng, A., Jordan, M.: Latent dirichlet allocation. The Journal of Machine Learning Research, 993–1022 (2003)

    Google Scholar 

  5. Nallapati, R., Feng, A., Peng, F., Allan, J.: Event threading within news topics. In: Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management (2004)

    Google Scholar 

  6. Feng, A., Allan, J.: Finding and linking incidents in news. In: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (2007)

    Google Scholar 

  7. Allan, J., Gupta, R., Khandelwal, V.: Temporal summaries of new topics. In: Proceedings of the 24th Annual International Conference on ACM SIGIR Research and Development in Information Retrieval (2001)

    Google Scholar 

  8. Blei, D., Lafferty, J.: Topic models. In: Text Mining: Theory and Applications, pp. 71–93 (2009)

    Google Scholar 

  9. Wang, X., Zhang, K., Jin, X., Shen, D.: Mining common topics from multiple asynchronous text streams. In: Proceedings of the Second ACM International Conference on Web Search and Data Mining (2009)

    Google Scholar 

  10. Bolelli, L., Ertekin, Ş., Giles, C.: Topic and trend detection in text collections using latent dirichlet allocation. In: Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (eds.) ECIR 2009. LNCS, vol. 5478, pp. 776–780. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  11. Leskovec, J., Backstrom, L., Kleinberg, J.: Meme-tracking and the dynamics of the news cycle. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2009)

    Google Scholar 

  12. Shahaf, D., Guestrin, C.: Connecting the dots between news articles. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2010)

    Google Scholar 

  13. Mei, Q., Zhai, C.: Discovering evolutionary theme patterns from text: an exploration of temporal text mining. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2005)

    Google Scholar 

  14. Mei, Q., Liu, C., Su, H., Zhai, C.: A probabilistic approach to spatiotemporal theme pattern mining on weblogs. In: Proceedings of the 15th International Conference on World Wide Web (2006)

    Google Scholar 

  15. He, Q., Chen, B., Pei, J., Qiu, B., Mitra, P.: Detecting topic evolution in scientific literature: how can citations help? In: Proceeding of the 18th ACM Conference on Information and Knowledge Management (2009)

    Google Scholar 

  16. Newman, D., Asuncion, A., Smyth, P.: Distributed algorithms for topic models. The Journal of Machine Learning Research 10, 1801–1828 (2009)

    MATH  Google Scholar 

  17. Chang, J., Boyd-graber, J., Gerrish, S., Wang, C., Blei, D.M.: Reading tea leaves: How humans interpret topic models. In: Advances in Neural Information Processing Systems (2010)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kim, D., Oh, A. (2011). Topic Chains for Understanding a News Corpus. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6609. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19437-5_13

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-19437-5_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19436-8

  • Online ISBN: 978-3-642-19437-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics