Skip to main content

Clustering Nodes in Large-Scale Biological Networks Using External Memory Algorithms

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2011)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 7017))

Abstract

Novel analytical techniques have dramatically enhanced our understanding of many application domains including biological networks inferred from gene expression studies. However, there are clear computational challenges associated to the large datasets generated from these studies. The algorithmic solution of some NP-hard combinatorial optimization problems that naturally arise on the analysis of large networks is difficult without specialized computer facilities (i.e. supercomputers). In this work, we address the data clustering problem of large-scale biological networks with a polynomial-time algorithm that uses reasonable computing resources and is limited by the available memory. We have adapted and improved the MSTkNN graph partitioning algorithm and redesigned it to take advantage of external memory (EM) algorithms. We evaluate the scalability and performance of our proposed algorithm on a well-known breast cancer microarray study and its associated dataset.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Inostroza-Ponta, M.: An Integrated and Scalable Approach Based on Combinatorial Optimization Techniques for the Analysis of Microarray Data, PhD thesis, The University of Newcastle, Australia (2008)

    Google Scholar 

  2. Gonzalez-Barrios, J.M., Quiroz, A.J.: A clustering procedure based on the comparison between the k nearest neighbors graph and the minimal spanning tree. Statistics and Probability Letters 62(3), 23–34 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  3. Inostroza-Ponta, M., Mendes, A., Berretta, R., Moscato, P.: An integrated QAP-based approach to visualize patterns of gene expression similarity. In: Randall, M., Abbass, H.A., Wiles, J. (eds.) ACAL 2007. LNCS (LNAI), vol. 4828, pp. 156–167. Springer, Heidelberg (2007)

    Chapter  Google Scholar 

  4. Dementiev, R., Sanders, P., Schultes, D., Sibeyn, J.: Engineering an external memory minimum spanning tree algorithm. In: 3rd IFIP Intl. Conf. on Theoretical Computer Science, pp. 195–208 (2004)

    Google Scholar 

  5. Sibeyn, J.: External Connected Components. In: Hagerup, T., Katajainen, J. (eds.) SWAT 2004. LNCS, vol. 3111, pp. 468–479. Springer, Heidelberg (2004)

    Chapter  Google Scholar 

  6. Schultes, D.: External memory spanning forests and connected components, Technical report (2004), http://algo2.iti.kit.edu/dementiev/files/cc.pdf

  7. Vitter, J.S.: External memory algorithms and data structures: Dealing with massive data. ACM Computing Surveys 33 (2001)

    Google Scholar 

  8. Xu, Y., Olman, V., Xu, D.: Clustering Gene Expression Data Using a Graph-Theoretic Approach: An Application of Minimum Spanning Tree. Bioinformatics 18(4), 526–535 (2002)

    Article  Google Scholar 

  9. Grygorash, O., Zhou, Y., Jorgensen, Z.: Minimum Spanning Tree Based Clustering Algorithms. In: Proc. of the 18th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2006), pp. 73–81. IEEE Computer Society, Washington, DC, USA (2006)

    Google Scholar 

  10. Doowang, J.: An external memory approach to computing the maximal repeats across classes of dna sequences. Asian Journal of Health and Information Sciences 1(3), 276–295 (2006)

    Google Scholar 

  11. Choi, J.H., Cho, H.G.: Analysis of common k-mers for whole genome sequences using SSB-tree. Japanese Society for Bioinformatics 13, 30–41 (2002)

    Google Scholar 

  12. Chiang, Y., Goodrich, M.T., Grove, E.F., Tamassia, R., Vengroff, D.E., et al.: External-memory graph algorithms, In. In: SODA 1995: Proceedings of the Sixth Annual ACM-SIAM, pp. 139–149. Society for IAM, Philadelphia (1995)

    Google Scholar 

  13. Abello, J., Buchsbaum, A.L., Westbrook, J.R.: A functional approach to external graph algorithms. Algorithmica, 332–343 (1998)

    Google Scholar 

  14. van de Vijver, M.J., He, Y.D., van’t Veer, L.J., Dai, H., et al.: A gene-expression signature as a predictor of survival in breast cancer. N. Engl. J. Med. 347(25) (2002)

    Google Scholar 

  15. Fayyad, U.M., Irarni, K.B.: Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning. In: IJCAI, pp. 1022–1029 (1993)

    Google Scholar 

  16. Cotta, C., Sloper, C., Moscato, P.: Evolutionary Search of Thresholds for Robust Feature Set Selection: Application to the Analysis of Microarray Data. In: Raidl, G.R., Cagnoni, S., Branke, J., Corne, D.W., Drechsler, R., Jin, Y., Johnson, C.G., Machado, P., Marchiori, E., Rothlauf, F., Smith, G.D., Squillero, G. (eds.) EvoWorkshops 2004. LNCS, vol. 3005, pp. 21–30. Springer, Heidelberg (2004)

    Chapter  Google Scholar 

  17. Rocha de Paula, M., Ravetti, M.G., Rosso, O.A., Berretta, R., Moscato, P.: Differences in abundances of cell-signalling proteins in blood reveal novel biomarkers for early detection of clinical Alzheimer’s disease. PLoS ONE 6(e17481) (2011)

    Google Scholar 

  18. Jiang, X.P., Elliot, R.L., Head, J.F.: Manipulation of iron transporter genes results in the suppression of human and mouse mammary adenocarcinomas. Anticancer Res. 30(3), 759–765 (2010)

    Google Scholar 

  19. Shamir, R., Sharan, R.: CLICK: A Clustering Algorithm with Applications to Gene Expression Analysis. In: Proc. of ISMB, pp. 307–316 (2000)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Arefin, A.S., Inostroza-Ponta, M., Mathieson, L., Berretta, R., Moscato, P. (2011). Clustering Nodes in Large-Scale Biological Networks Using External Memory Algorithms. In: Xiang, Y., Cuzzocrea, A., Hobbs, M., Zhou, W. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2011. Lecture Notes in Computer Science, vol 7017. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24669-2_36

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-24669-2_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24668-5

  • Online ISBN: 978-3-642-24669-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics