Abstract
We discuss Topic Detection, a sub-task of the Topic Detection and Tracking (TDT) Project, and present a system that uses the linguistic and temporal features of news reportage to enhance the discovery of events in a collection of news articles. We describe an online application of these techniques that constructs topical clusters from live news feeds. We conclude that these approaches promise more coherent and useful clusters and suggest some areas of future work.
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Flynn, C., Dunnion, J. (2004). Domain-Informed Topic Detection. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2004. Lecture Notes in Computer Science, vol 2945. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24630-5_76
DOI: https://doi.org/10.1007/978-3-540-24630-5_76
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21006-1
Online ISBN: 978-3-540-24630-5
eBook Packages: Springer Book Archive