
A preconditioned conjugate gradient algorithm for GeneRank with application to microarray data mining

Data Mining and Knowledge Discovery

Abstract

The problem of identifying key genes is of fundamental importance in biology and medicine. The GeneRank model uses connectivity data to produce a prioritization of the genes in a microarray experiment that is less susceptible to variation caused by experimental noise than a ranking based on expression levels alone. Computing GeneRank amounts to solving a nonsymmetric linear system. However, when the matrix in question is very large, the GeneRank algorithm is inefficient and can even be infeasible. Moreover, although the adjacency matrix in the GeneRank model is symmetric, the original GeneRank algorithm does not exploit this symmetric structure. In this paper, we show that the GeneRank problem can be rewritten as a symmetric positive definite linear system, and we propose a preconditioned conjugate gradient algorithm to solve it. Numerical experiments support our theoretical results and show the superiority of the new algorithm.
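The symmetric reformulation described in the abstract can be illustrated with a short sketch. It assumes the GeneRank model in the form given by Morrison et al. (2005), (I - d W D^{-1}) r = (1 - d) ex, where W is a symmetric 0/1 adjacency matrix with zero diagonal, D = diag(deg_1, ..., deg_n) holds the gene degrees, ex is a nonnegative expression-change vector and 0 <= d < 1 is the damping factor. Substituting x = D^{-1} r gives (D - d W) x = (1 - d) ex. The matrix D - d W is symmetric and strictly diagonally dominant with a positive diagonal, hence symmetric positive definite, so conjugate gradient applies, and the scores are recovered as r = D x. This substitution is one way to reach such a system and may differ in detail from the paper's derivation. The abstract does not name the preconditioner, so a Jacobi (diagonal) preconditioner is used here only as a stand-in; the function name generank_spd_pcg and the convention of giving isolated genes degree 1 are illustrative assumptions.

```python
import numpy as np


def generank_spd_pcg(W, ex, d=0.85, tol=1e-10, max_iter=1000):
    """Hypothetical sketch: GeneRank scores via Jacobi-preconditioned CG.

    Assumes (I - d W D^{-1}) r = (1 - d) ex with W symmetric, 0/1 and
    zero-diagonal. With x = D^{-1} r this becomes (D - d W) x = (1 - d) ex,
    an SPD system for 0 <= d < 1; the scores are returned as r = D x.
    """
    W = np.asarray(W, dtype=float)
    ex = np.asarray(ex, dtype=float)
    deg = W.sum(axis=1)
    deg[deg == 0] = 1.0               # assumption: isolated genes get degree 1
    A = np.diag(deg) - d * W          # symmetric positive definite matrix
    b = (1.0 - d) * ex
    m_inv = 1.0 / np.diag(A)          # Jacobi preconditioner: M = diag(A)

    x = np.zeros_like(b)
    res = b - A @ x                   # residual of the SPD system
    z = m_inv * res                   # preconditioned residual M^{-1} res
    p = z.copy()                      # first search direction
    rz = res @ z
    stop = tol * np.linalg.norm(b)
    for _ in range(max_iter):
        if np.linalg.norm(res) <= stop:
            break
        Ap = A @ p
        alpha = rz / (p @ Ap)         # step length along p
        x += alpha * p
        res -= alpha * Ap
        z = m_inv * res
        rz_new = res @ z
        p = z + (rz_new / rz) * p     # next A-conjugate search direction
        rz = rz_new
    return deg * x                    # recover GeneRank scores r = D x


if __name__ == "__main__":
    # Small 4-gene example: compare with a direct solve of the original
    # nonsymmetric system (I - d W D^{-1}) r = (1 - d) ex.
    W = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    ex = np.array([0.9, 0.1, 0.4, 0.7])
    d = 0.85
    deg = W.sum(axis=1)
    r_direct = np.linalg.solve(np.eye(4) - d * W / deg, (1 - d) * ex)
    print(np.allclose(generank_spd_pcg(W, ex, d), r_direct))  # True
```

Only matrix-vector products with D - d W are needed inside the loop, so for large gene networks the dense matrix would normally be replaced by a sparse one (for example scipy.sparse.csr_matrix); the dense NumPy version is kept here for brevity.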

Author information

Correspondence to Wei Xu.

Additional information

Responsible editor: Pierre Baldi.

Dedicated to Prof. Zhi-hao Cao on the occasion of his 75th birthday.

About this article

Cite this article

Wu, G., Xu, W., Zhang, Y. et al. A preconditioned conjugate gradient algorithm for GeneRank with application to microarray data mining. Data Min Knowl Disc 26, 27–56 (2013). https://doi.org/10.1007/s10618-011-0245-7

  • DOI: https://doi.org/10.1007/s10618-011-0245-7
