ABSTRACT
Organizing and visualizing the large number of text messages stored on a mobile phone is a challenging problem: unlike emails, text messages can hardly be organized into threads, because they lack the necessary metadata such as "subject" and "reply-to" fields. In this paper, we propose an innovative approach based on clustering algorithms and natural language processing methods. We first cluster the text messages into candidate conversations based on their temporal attributes, and then analyze them further using a semantic model based on Latent Dirichlet Allocation (LDA). Because text messages are usually short and sparse, we train the model on large-scale external data collected from Twitter-like web sites and then apply it to the text messages. Finally, the text messages are organized into conversations based on their topics. We evaluated our approach on 122,359 text messages collected from 50 university students over 6 months.
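The first stage, grouping messages into candidate conversations by their temporal attributes, might look roughly like the following minimal sketch. The abstract does not specify the clustering algorithm, so the fixed time-gap rule, the function name `split_by_time_gap`, the 30-minute threshold, and the sample messages are all assumptions made for illustration, not the method used in the paper.

```python
from datetime import datetime, timedelta


def split_by_time_gap(messages, max_gap=timedelta(minutes=30)):
    """Group (timestamp, text) pairs into candidate conversations.

    Assumed rule (not from the paper): a new candidate conversation
    starts whenever the gap to the previous message exceeds max_gap.
    """
    ordered = sorted(messages, key=lambda m: m[0])
    conversations = []
    for timestamp, text in ordered:
        # Compare against the timestamp of the last message in the last group.
        if conversations and timestamp - conversations[-1][-1][0] <= max_gap:
            conversations[-1].append((timestamp, text))
        else:
            conversations.append([(timestamp, text)])
    return conversations


# Hypothetical sample data: two short exchanges separated by several hours.
messages = [
    (datetime(2010, 3, 1, 9, 0), "Lunch today?"),
    (datetime(2010, 3, 1, 9, 5), "Sure, noon?"),
    (datetime(2010, 3, 1, 18, 30), "Did you finish the homework?"),
    (datetime(2010, 3, 1, 18, 40), "Almost, one problem left."),
]
groups = split_by_time_gap(messages)
```

Each resulting group would then be passed to the topic model for the semantic analysis step; a purely temporal split like this one cannot separate interleaved conversations, which is why the second stage is needed.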
Index Terms
- Topic detection and organization of mobile text messages