Abstract
We study the properties of the principal eigenvector for the adjacency matrix (and related matrices) for a general directed graph. In particular—motivated by the use of the eigenvector for estimating the “importance” of the nodes in the graph—we focus on the distribution of positive weight in this eigenvector, and give a coherent picture which builds upon and unites earlier results. We also propose a simple method—“T-Rank”—for generating importance scores. T-Rank generates authority scores via a one-level, non-normalized matrix, and is thus distinct from known methods such as PageRank (normalized), HITS (two-level), and SALSA (two-level and normalized). We show, using our understanding of the principal eigenvector, that T-Rank has a much less severe “sink problem” than does PageRank. Also, we offer numerical results which quantify the “tightly-knit community” or TKC effect. We find that T-Rank has a stronger TKC effect than PageRank, and we offer a novel interpolation method which allows for continuous tuning of the strength of this TKC effect. Finally, we propose two new “sink remedies”, i.e., methods for ensuring that the principal eigenvector is positive everywhere. One of our sink remedies (source pumping) is unique among sink remedies, in that it gives a positive eigenvector without rendering the graph strongly connected. We offer a preliminary evaluation of the effects and possible applications of these new sink remedies.
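The contrast drawn above between a non-normalized and a normalized one-level matrix can be sketched with plain power iteration. This is a minimal illustration, not the authors' implementation. It assumes that "one-level, non-normalized" means the principal eigenvector of the transposed adjacency matrix, and that the normalized counterpart divides each node's out-links by its out-degree (PageRank-style, here without damping or any sink remedy). The function names `power_iteration`, `unnormalized_scores`, and `normalized_scores`, and the four-node example graph, are hypothetical.

```python
import numpy as np

def power_iteration(M, iters=1000, tol=1e-12):
    """Approximate the dominant eigenvector of a non-negative matrix M, scaled to sum to 1."""
    n = M.shape[0]
    v = np.full(n, 1.0 / n)            # uniform, strictly positive start vector
    for _ in range(iters):
        w = M @ v
        total = w.sum()
        if total == 0.0:               # all weight vanished (e.g. acyclic graph)
            return w
        w = w / total                  # rescale so the entries sum to 1
        if np.abs(w - v).sum() < tol:  # stop once the vector no longer changes
            return w
        v = w
    return v

def unnormalized_scores(A):
    """Assumed non-normalized variant: each node sums the scores of the nodes linking to it."""
    return power_iteration(A.T.astype(float))

def normalized_scores(A):
    """Assumed normalized variant: each node splits its score evenly over its out-links."""
    out = A.sum(axis=1).astype(float)
    out[out == 0] = 1.0                # avoid division by zero; sink columns stay zero
    P = (A / out[:, None]).T           # column-stochastic when there are no sinks
    return power_iteration(P)

# Hypothetical example graph: A[i, j] = 1 means node i links to node j.
# It is strongly connected and aperiodic, so both iterations converge.
A = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
])

t = unnormalized_scores(A)
p = normalized_scores(A)
print("non-normalized:", np.round(t, 4))
print("normalized:    ", np.round(p, 4))
```

Because this example graph is strongly connected, both vectors are positive everywhere. On a graph with sinks, the same sketch can be used to see where each variant concentrates its weight.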
Additional information
Responsible editor: R. Bayardo.
About this article
Cite this article
Bjelland, J., Burgess, M., Canright, G. et al. Eigenvectors of directed graphs and importance scores: dominance, T-Rank, and sink remedies. Data Min Knowl Disc 20, 98–151 (2010). https://doi.org/10.1007/s10618-009-0154-1
DOI: https://doi.org/10.1007/s10618-009-0154-1