ABSTRACT
Latent semantic analysis (LSA) has been intensively studied because of its wide application to Information Retrieval and Natural Language Processing. However, traditional models such as LSA examine only one (the current) version of a document. With the recent proliferation of collaboratively generated content such as threads in online forums, Collaborative Question Answering archives, Wikipedia, and other versioned content, the document generation process is now directly observable. In this study, we explore how this additional temporal information about document evolution can be used to improve the identification of latent document topics. Specifically, we propose a novel hidden-topic modeling algorithm, temporal Latent Semantic Analysis (tLSA), which extends LSA to model document revision history using tensor decomposition. Our experiments show that tLSA outperforms LSA on word relatedness estimation using benchmark data, and we explore applications of tLSA to other tasks.
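To make the contrast between the two models concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy term × document × revision count tensor and stands in for the tensor decomposition with a truncated SVD of the term-mode unfolding (a HOSVD-style factor); the abstract does not say which decomposition tLSA uses. The function names `lsa_term_vectors`, `temporal_term_vectors`, and `cosine` are hypothetical.

```python
import numpy as np


def lsa_term_vectors(X, k):
    """Rank-k LSA term embeddings from a term x document matrix."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]


def temporal_term_vectors(T, k):
    """Rank-k term embeddings from a term x document x revision tensor.

    Assumption: the tensor is unfolded along the term mode and factored
    with a truncated SVD. This is an illustrative stand-in, not the
    decomposition used in the paper.
    """
    unfolded = T.reshape(T.shape[0], -1)  # terms x (documents * revisions)
    U, s, _ = np.linalg.svd(unfolded, full_matrices=False)
    return U[:, :k] * s[:k]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


# Toy data (hypothetical): 4 terms, 3 documents, 2 revisions per document.
T = np.zeros((4, 3, 2))
T[0, 0, :] = [1, 2]  # term 0 grows in document 0 across revisions
T[1, 0, :] = [1, 2]  # term 1 has the same revision history as term 0
T[2, 1, :] = [3, 1]  # term 2 appears only in document 1 and shrinks
T[3, 0, :] = [1, 2]  # term 3 combines both histories
T[3, 1, :] = [3, 1]

X_current = T[:, :, -1]  # classic LSA sees only the latest revision

V_lsa = lsa_term_vectors(X_current, k=2)
V_temporal = temporal_term_vectors(T, k=2)

relatedness_lsa = cosine(V_lsa[0], V_lsa[3])
relatedness_temporal = cosine(V_temporal[0], V_temporal[3])
```

Word relatedness is read off as the cosine between term vectors. In the temporal variant two terms count as related only if their counts move together across revisions, not just in the final version.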
Index Terms
- Temporal latent semantic analysis for collaboratively generated content: preliminary results