
Using cocitation information to estimate political orientation in web documents

  • Regular Paper
Knowledge and Information Systems

Abstract

This paper introduces a simple method for estimating cultural orientation, the affiliation of online entities in a polarized field of discourse. In particular, cocitation information is used to estimate the political orientation of hypertext documents. A type of cultural orientation, the political orientation of a document is the degree to which it participates in traditionally left- or right-wing beliefs. Estimating documents' political orientation is of interest for personalized information retrieval and recommender systems. In its application to politics, the method uses a simple probabilistic model to estimate the strength of association between a document and left- and right-wing communities. The model estimates the likelihood of cocitation between a document of interest and a small number of documents of known orientation. The model is tested on three sets of data: 695 partisan web documents, 162 political weblogs, and 198 nonpartisan documents. The cocitation model achieves accuracy above 90%, outperforming lexically based classifiers at statistically significant levels.
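The abstract does not give the model's exact form, so the following is only a minimal sketch of one way a cocitation-based orientation score could be computed. It assumes a smoothed log-ratio in the style of pointwise mutual information: the rate at which a document is cocited with left-wing exemplars is compared with the rate for right-wing exemplars. The function names, the smoothing constant, the in-link normalization, and the sign convention (positive leans right) are all hypothetical choices made for illustration, not details taken from the paper.

```python
import math


def orientation_score(cocite_left, cocite_right,
                      inlinks_left, inlinks_right,
                      smoothing=0.5):
    """Return a log2 ratio of cocitation rates (hypothetical formulation).

    cocite_left / cocite_right: number of pages that link both to the
        document of interest and to a left- / right-wing exemplar.
    inlinks_left / inlinks_right: total pages linking to the left- /
        right-wing exemplars, used to normalize for exemplar popularity.
    smoothing: small constant added to the counts to avoid log(0).

    Positive values suggest a right-leaning document and negative values a
    left-leaning one (an arbitrary sign convention assumed here).
    """
    if cocite_left < 0 or cocite_right < 0:
        raise ValueError("cocitation counts must be non-negative")
    if inlinks_left <= 0 or inlinks_right <= 0:
        raise ValueError("exemplar in-link counts must be positive")

    left_rate = (cocite_left + smoothing) / inlinks_left
    right_rate = (cocite_right + smoothing) / inlinks_right
    return math.log2(right_rate / left_rate)


def classify(score, threshold=0.0):
    """Map a score to a label; scores within the threshold stay undetermined."""
    if score > threshold:
        return "right"
    if score < -threshold:
        return "left"
    return "undetermined"


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    score = orientation_score(cocite_left=40, cocite_right=5,
                              inlinks_left=1000, inlinks_right=1000)
    print(classify(score))
```

Normalizing by each exemplar set's in-link total keeps a very popular exemplar from dominating the score. A nonzero threshold leaves weakly associated documents, such as nonpartisan ones, unlabeled.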




About this article

Cite this article

Efron, M. Using cocitation information to estimate political orientation in web documents. Knowl Inf Syst 9, 492–511 (2006). https://doi.org/10.1007/s10115-005-0214-9


  • DOI: https://doi.org/10.1007/s10115-005-0214-9
