Abstract
This paper introduces a simple method for estimating cultural orientation: the affiliation of online entities within a polarized field of discourse. In particular, cocitation information is used to estimate the political orientation of hypertext documents. Political orientation, one type of cultural orientation, is the degree to which a document participates in traditionally left- or right-wing beliefs. Estimating the political orientation of documents is of interest for personalized information retrieval and recommender systems. Applied to politics, the method uses a simple probabilistic model to estimate the strength of association between a document and left- and right-wing communities. The model estimates the likelihood of cocitation between a document of interest and a small number of documents of known orientation. The model is tested on three data sets: 695 partisan web documents, 162 political weblogs, and 198 nonpartisan documents. The cocitation model achieves accuracy above 90%, outperforming lexically based classifiers at statistically significant levels.
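The idea of scoring a document by its cocitation with documents of known orientation can be illustrated with a minimal sketch. The abstract does not give the model's exact form, so everything below is an assumption made for illustration: the smoothed log-ratio score, the function name `orientation_score`, the toy link data, the seed names, and the convention that positive scores indicate a left-leaning association. It should not be read as the paper's actual estimator.

```python
import math

def orientation_score(doc, links, left_seeds, right_seeds, smoothing=0.5):
    """Illustrative cocitation score (an assumption, not the paper's exact model).

    links maps each citing page to the set of targets it links to.
    Returns the log ratio of two smoothed estimates:
      P(page cites doc | page cites a left seed) and
      P(page cites doc | page cites a right seed).
    Positive values suggest a left association, negative a right one.
    """
    co_left = co_right = n_left = n_right = 0
    for targets in links.values():
        hits_left = bool(targets & left_seeds)    # page cites some left seed
        hits_right = bool(targets & right_seeds)  # page cites some right seed
        n_left += hits_left
        n_right += hits_right
        if doc in targets:
            # the page cocites the document of interest with a seed
            co_left += hits_left
            co_right += hits_right
    # additive smoothing keeps both estimates strictly between 0 and 1
    p_left = (co_left + smoothing) / (n_left + 2 * smoothing)
    p_right = (co_right + smoothing) / (n_right + 2 * smoothing)
    return math.log(p_left / p_right)

# Hypothetical toy data: five citing pages and the targets they link to.
links = {
    "p1": {"docA", "L1"},
    "p2": {"docA", "L2"},
    "p3": {"docB", "R1"},
    "p4": {"L1"},
    "p5": {"R1", "R2"},
}
left_seeds = {"L1", "L2"}
right_seeds = {"R1", "R2"}

score_a = orientation_score("docA", links, left_seeds, right_seeds)
score_b = orientation_score("docB", links, left_seeds, right_seeds)
print(score_a > 0, score_b < 0)
```

In this toy data, docA is cocited only with left seeds and docB only with a right seed, so the two scores take opposite signs. The smoothing term is one simple way to avoid taking the logarithm of zero when a document is never cocited with one side.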
Cite this article
Efron, M. Using cocitation information to estimate political orientation in web documents. Knowl Inf Syst 9, 492–511 (2006). https://doi.org/10.1007/s10115-005-0214-9
DOI: https://doi.org/10.1007/s10115-005-0214-9