Abstract
The singular value decomposition, or SVD, has been studied as a tool for detecting and understanding patterns in document collections. We show how to interpret the matrices produced by the SVD, which lets us spot the patterns of characters that indicate particular topics in a corpus. A test collection of two days of AP newswire traffic serves as a running example.
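To make the idea concrete, here is a minimal sketch in Python with NumPy. It rests on assumptions the abstract does not state: documents are represented by overlapping character trigrams (n=3 is our choice), each document column is scaled to unit length, and no centroid is subtracted. The hypothetical helpers `char_ngrams`, `ngram_document_matrix`, and `spot_topics` build an n-gram-by-document matrix A and factor it as A = U Σ V^T with `numpy.linalg.svd`. For each leading component, the sketch lists the n-grams with the largest positive and largest negative weights in the corresponding column of U, and reports the scaled row of V^T, which shows how strongly each document loads on that component. It illustrates the general technique and is not the authors' procedure.

```python
import numpy as np
from collections import Counter

# Hypothetical sketch: trigram counts, unit-length columns, no centroid
# subtraction. These are illustrative assumptions, not details from the paper.


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lower-cased string."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_document_matrix(docs, n=3):
    """Build an (n-gram x document) matrix whose columns are unit-length count vectors."""
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    vocab = sorted(set().union(*counts))
    row = {g: i for i, g in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, c in enumerate(counts):
        for g, k in c.items():
            A[row[g], j] = k
    norms = np.linalg.norm(A, axis=0)
    norms[norms == 0] = 1.0  # leave documents shorter than n as zero columns
    return A / norms, vocab


def spot_topics(docs, n=3, k=2, top=5):
    """Factor A = U S V^T; for each of the first k components, list the
    n-grams with the largest positive and largest negative weights in U."""
    A, vocab = ngram_document_matrix(docs, n)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    topics = []
    for i in range(min(k, len(s))):
        u, v = U[:, i], Vt[i]
        # Singular vectors are defined only up to sign; flip u and v together
        # so the largest-magnitude n-gram weight is positive.
        if u[np.argmax(np.abs(u))] < 0:
            u, v = -u, -v
        order = np.argsort(u)
        topics.append({
            "positive": [vocab[j] for j in order[::-1][:top]],
            "negative": [vocab[j] for j in order[:top]],
            "doc_loadings": s[i] * v,  # equals A^T u for this component
        })
    return A, (U, s, Vt), topics


if __name__ == "__main__":
    docs = [  # a tiny made-up corpus, not the AP collection
        "Stocks fell as markets slid on rate fears.",
        "Markets rallied and stocks rose on earnings.",
        "The storm flooded the coast overnight.",
        "Hurricane winds battered the coast as the storm hit.",
    ]
    _, (_, s, _), topics = spot_topics(docs)
    for i, t in enumerate(topics):
        print(i, round(float(s[i]), 3), t["positive"], t["negative"],
              np.round(t["doc_loadings"], 2))
```

Because singular vectors are determined only up to sign, the sketch fixes a sign convention before ranking. N-grams at the two ends of a component often reflect contrasting topics, but how cleanly they separate depends on the corpus and on weighting choices such as the unit-length scaling assumed here.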
Contact author: Charles Nicholas, Department of Computer Science and Electrical Engineering, UMBC, 1000 Hilltop Circle, Baltimore, MD 21250 USA; phone 410-455-2594, fax 410-455-3969; nicholas@cs.umbc.edu
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nicholas, C., Dahlberg, R. (1998). Spotting Topics with the Singular Value Decomposition. In: Munson, E.V., Nicholas, C., Wood, D. (eds) Principles of Digital Document Processing. PODDP 1998. Lecture Notes in Computer Science, vol 1481. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49654-8_7
DOI: https://doi.org/10.1007/3-540-49654-8_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65086-7
Online ISBN: 978-3-540-49654-0
eBook Packages: Springer Book Archive