Abstract
This paper describes the design and implementation of tools to extract, analyze, and explore an arbitrarily large number of public messages from diverse sources. Our aim is to support sentiment analysis flexibly, adapting quickly to different use cases, languages, and message sources. First, we implemented a highly parallel scraper whose behavior the user can customize through scripting, which lets it handle dynamically loaded content. Second, we developed a novel framework that supports agile programming, building, and validation of a sentiment classifier. Finally, a web application allows analysis results to be selected in real time and projected onto different dimensions in an OLAP fashion.
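To make the OLAP-style projection concrete, the following is a minimal sketch of a roll-up over classified messages. It is not the authors' implementation: the record fields (`source`, `language`, `day`, `score`), the sample data, and the `roll_up` helper are all assumptions introduced here for illustration.

```python
from collections import defaultdict

# Hypothetical records: each classified message carries a few dimensions
# (source, language, day) and a sentiment score in [-1, 1].
MESSAGES = [
    {"source": "forum", "language": "en", "day": "2013-05-01", "score": 0.5},
    {"source": "forum", "language": "it", "day": "2013-05-01", "score": -0.5},
    {"source": "blog", "language": "en", "day": "2013-05-02", "score": 1.0},
    {"source": "blog", "language": "en", "day": "2013-05-02", "score": 0.0},
]


def roll_up(messages, dimensions):
    """Group messages by the chosen dimensions.

    Returns a dict mapping each dimension tuple to (mean score, count).
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of scores, count]
    for msg in messages:
        key = tuple(msg[d] for d in dimensions)
        totals[key][0] += msg["score"]
        totals[key][1] += 1
    return {key: (s / n, n) for key, (s, n) in totals.items()}


# Coarse view: one cell per source.
by_source = roll_up(MESSAGES, ["source"])
# Drill-down: one cell per (source, language) pair.
by_source_language = roll_up(MESSAGES, ["source", "language"])
```

Choosing a shorter or longer list of dimensions corresponds to rolling up or drilling down; a real system would presumably compute such aggregates in a database rather than in memory.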
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Farina, J., Mazuran, M., Quintarelli, E. (2014). Extraction, Sentiment Analysis and Visualization of Massive Public Messages. In: Catania, B., et al. New Trends in Databases and Information Systems. Advances in Intelligent Systems and Computing, vol 241. Springer, Cham. https://doi.org/10.1007/978-3-319-01863-8_18
DOI: https://doi.org/10.1007/978-3-319-01863-8_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01862-1
Online ISBN: 978-3-319-01863-8
eBook Packages: Engineering (R0)