Abstract
The distinctive features of the Bveritas online news integration archive are: 1) automatic clustering of related news documents into themes; 2) organization of these news clusters in a theme map; 3) extraction of meaningful labels for each news cluster; and 4) generation of links to related news articles. Several ways of retrieving news stories from the Bveritas archive are described. The retrieval methods range from the usual query box and links to related stories, to an interactive world map that allows news retrieval by country, to an interactive theme map. Querying and browsing are mediated by a Scatter/Gather interface. The user selects interesting clusters, the documents in those clusters are gathered and re-clustered, and the resulting clusters are displayed for the user to inspect visually. The user is then asked to select new interesting clusters. This alternating selection/clustering process continues until the user decides to view the individual news story titles.
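The alternating select/gather/re-cluster loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Bveritas implementation. The abstract does not say which clustering algorithm is used, so plain k-means over bag-of-words term-frequency vectors (cosine similarity, first-k seeding) serves as a stand-in. Cluster labels are simply the highest-weighted centroid terms, which is a hypothetical substitute for the paper's label-extraction method. All function names (`scatter`, `gather`, `label`, `scatter_gather`) and the `choose` callback that plays the role of the user are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased, split on whitespace)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term->weight mappings."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)  # adds weights term by term
    return {t: w / len(vectors) for t, w in total.items()}


def scatter(doc_ids, vectors, k, iterations=10):
    """Partition doc_ids into at most k clusters with a simple k-means.

    Assumption: k-means with the first k documents as seeds stands in for
    whatever clustering the real system uses.
    """
    k = min(k, len(doc_ids))
    centers = [vectors[d] for d in doc_ids[:k]]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in doc_ids:
            # Assign each document to its most similar center (ties go to the lowest index).
            best = max(range(k), key=lambda i: cosine(vectors[d], centers[i]))
            clusters[best].append(d)
        # Recompute centers; an empty cluster keeps its previous center.
        centers = [centroid([vectors[d] for d in c]) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return [c for c in clusters if c]


def label(cluster, vectors, n_terms=3):
    """Hypothetical label: the n_terms heaviest centroid terms (ties broken alphabetically)."""
    c = centroid([vectors[d] for d in cluster])
    ranked = sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:n_terms]]


def gather(clusters, selected):
    """Union of the documents in the clusters the user selected (by index)."""
    return [d for i in selected for d in clusters[i]]


def scatter_gather(docs, k, choose):
    """Run a Scatter/Gather session over docs ({id: text}).

    choose(clusters, labels) stands in for the user: it returns the indices of
    the interesting clusters, or None/[] to stop and view the remaining titles.
    """
    vectors = {d: tf_vector(t) for d, t in docs.items()}
    current = list(docs)
    while True:
        clusters = scatter(current, vectors, k)
        labels = [label(c, vectors) for c in clusters]
        selected = choose(clusters, labels)
        if not selected:
            return current
        current = gather(clusters, selected)
```

For example, with four short stories about elections and football and `k=2`, a `choose` callback that picks the cluster labelled with "football" in the first round and then stops returns only the two football stories. Passing the user's decision in as a callback keeps the clustering logic independent of any particular interface, whether a theme map, a world map, or a plain list.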
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Azcarraga, A.P., Seng Chua, T., Tan, J. (2002). Retrieving News Stories from a News Integration Archive. In: Lim, E.P., et al. Digital Libraries: People, Knowledge, and Technology. ICADL 2002. Lecture Notes in Computer Science, vol 2555. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36227-4_22
DOI: https://doi.org/10.1007/3-540-36227-4_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00261-1
Online ISBN: 978-3-540-36227-2
eBook Packages: Springer Book Archive