DOI: 10.1145/1772690.1772889
demonstration

Access: news and blog analysis for the social sciences

Published: 26 April 2010

ABSTRACT

The social sciences strive to understand the political, social, and cultural world around us, but have been hampered by limited access to the quantitative data sources enjoyed by the hard sciences. Careful analysis of Web document streams holds enormous potential to solve longstanding problems in a variety of social science disciplines through massive data analysis. This paper introduces the TextMap Access system, which provides ready access to a wealth of statistics on millions of people, places, and things across a number of web corpora. The system is powered by a flexible and scalable distributed statistics computation framework built on Hadoop. Its continually updated corpora include newspapers, blogs, patent records, legal documents, and scientific abstracts, totaling well over a terabyte of raw text and growing daily. The Lydia TextMap Access system, available through http://www.textmap.com/access, provides instant access for students and scholars through a convenient web user interface. We describe the architecture of the TextMap Access system and its impact on current research in political science, sociology, and business/marketing.
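The abstract does not describe how the Hadoop-based statistics framework is implemented. As a rough illustration only, the kind of per-entity statistic it mentions could be computed in a map/reduce style along the following lines. This is a minimal single-process sketch, not the TextMap Access code: the entity list, the `(date, text)` input format, and all function names are assumptions made for the example, and plain substring matching stands in for real entity recognition.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical entity list; a real system would detect entities automatically.
ENTITIES = ["Barack Obama", "Stony Brook"]

Key = Tuple[str, str]  # (entity, date)


def map_mentions(date: str, text: str) -> Iterator[Tuple[Key, int]]:
    """Map step: emit ((entity, date), count) for one document."""
    lowered = text.lower()
    for entity in ENTITIES:
        count = lowered.count(entity.lower())  # naive case-insensitive match
        if count:
            yield (entity, date), count


def reduce_counts(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, int]:
    """Reduce step: sum the emitted counts for each (entity, date) key."""
    totals: Dict[Key, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def mention_counts(docs: Iterable[Tuple[str, str]]) -> Dict[Key, int]:
    """Run the map step over every document, then reduce the results."""
    pairs = (pair for date, text in docs for pair in map_mentions(date, text))
    return reduce_counts(pairs)
```

Because the map step looks at one document at a time and the reduce step only sums values that share a key, the same logic could in principle be distributed across many machines, which is the property a framework such as Hadoop relies on.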


Published in

WWW '10: Proceedings of the 19th international conference on World wide web
April 2010
1407 pages
ISBN: 9781605587998
DOI: 10.1145/1772690

Copyright © 2010 International World Wide Web Conference Committee (IW3C2)

Publisher

Association for Computing Machinery
New York, NY, United States

Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
