Domain-Informed Topic Detection

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2945)

Abstract

We discuss Topic Detection, a sub-task of the Topic Detection and Tracking (TDT) Project, and present a system that uses the linguistic and temporal features of news reportage to enhance the discovery of events in a collection of news articles. We describe an online application of these techniques that constructs topical clusters from live news feeds. We conclude that these approaches promise more coherent and useful clusters and suggest some areas of future work.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Flynn, C., Dunnion, J. (2004). Domain-Informed Topic Detection. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2004. Lecture Notes in Computer Science, vol 2945. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24630-5_76

  • DOI: https://doi.org/10.1007/978-3-540-24630-5_76

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21006-1

  • Online ISBN: 978-3-540-24630-5

  • eBook Packages: Springer Book Archive
