ABSTRACT
The social sciences strive to understand the political, social, and cultural world around us, but have been impaired by limited access to the quantitative data sources enjoyed by the hard sciences. Careful analysis of Web document streams holds enormous potential to solve longstanding problems in a variety of social science disciplines through massive data analysis. This paper introduces the TextMap Access system, which provides ready access to a wealth of statistics on millions of people, places, and things across a number of Web corpora. The system is powered by a flexible and scalable distributed statistics computation framework built on Hadoop. Its continually updated corpora include newspapers, blogs, patent records, legal documents, and scientific abstracts, totaling well over a terabyte of raw text and growing daily. The Lydia TextMap Access system, available at http://www.textmap.com/access, provides instant access for students and scholars through a convenient web user interface. We describe the architecture of the TextMap Access system and its impact on current research in political science, sociology, and business/marketing.