Abstract
We conjecture that mapping messages into a large context makes it possible to compute the distances between them and, from those distances, to classify them. We test this conjecture on Twitter messages: each message is mapped onto its most similar Wikipedia page, and the distances between pages are used as a proxy for the distances between messages. This technique classifies a set of Twitter messages more accurately than alternative techniques based on string edit distance and latent semantic analysis.
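The pipeline described in the abstract can be illustrated with a small Python sketch. The abstract does not say how message-to-page similarity or page-to-page distance is computed, so everything here is an assumption made for illustration: the four-page toy corpus `PAGES`, the hypothetical link graph `LINKS`, bag-of-words cosine similarity for the mapping step, shortest-path length for the page distance, and a nearest-neighbour rule for classification. It is not the authors' implementation.

```python
import math
from collections import Counter, deque

# Hypothetical miniature "Wikipedia": page title -> page text (illustrative only).
PAGES = {
    "Basketball": "basketball team game players score points court nba",
    "Association football": "football soccer team goal players match league",
    "Stock market": "stock market shares investors trading prices exchange",
    "Bank": "bank money loans deposits interest financial institution",
}

# Hypothetical undirected graph linking pages through category nodes.
LINKS = {
    "Basketball": ["Sports"],
    "Association football": ["Sports"],
    "Stock market": ["Finance"],
    "Bank": ["Finance"],
    "Sports": ["Basketball", "Association football", "Root"],
    "Finance": ["Stock market", "Bank", "Root"],
    "Root": ["Sports", "Finance"],
}


def bag(text):
    """Lower-cased bag-of-words representation of a text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bags of words."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def most_similar_page(message):
    """Map a message onto the page whose text it resembles most (assumed measure)."""
    m = bag(message)
    return max(PAGES, key=lambda title: cosine(m, bag(PAGES[title])))


def page_distance(start, goal):
    """Shortest-path length between two nodes of LINKS, found by breadth-first search."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in LINKS.get(node, []):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return math.inf


def message_distance(m1, m2):
    """Distance between messages, using the distance between their pages as a proxy."""
    return page_distance(most_similar_page(m1), most_similar_page(m2))


def classify(message, labeled):
    """Give the message the label of its nearest labeled example (1-nearest neighbour)."""
    return min(labeled, key=lambda pair: message_distance(message, pair[0]))[1]


labeled = [
    ("nba players score points", "sports"),
    ("investors trading shares", "finance"),
]
print(classify("great soccer match and a late goal", labeled))
```

In this toy setup the soccer message shares no words with either labeled example, so a direct word-overlap comparison cannot separate the two labels; routing both messages through their pages is what makes a distance between them available.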
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Genc, Y., Sakamoto, Y., Nickerson, J.V. (2011). Discovering Context: Classifying Tweets through a Semantic Transform Based on Wikipedia. In: Schmorrow, D.D., Fidopiastis, C.M. (eds) Foundations of Augmented Cognition. Directing the Future of Adaptive Systems. FAC 2011. Lecture Notes in Computer Science, vol 6780. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21852-1_55
DOI: https://doi.org/10.1007/978-3-642-21852-1_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21851-4
Online ISBN: 978-3-642-21852-1
eBook Packages: Computer Science (R0)