
Adenine: A HPC-Oriented Tool for Biological Data Exploration

  • Conference paper in Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB 2017)

Abstract

adenine is a machine learning framework designed for biological data exploration and visualization. Its goal is to help bioinformaticians achieve a first, quick overview of the main structures underlying their data. The tool encompasses state-of-the-art techniques for missing-value imputation, data preprocessing, dimensionality reduction and clustering. adenine has a scalable architecture that works seamlessly on single workstations as well as on high-performance computing (HPC) facilities. It generates publication-ready plots along with quantitative descriptions of the results. In this paper we provide an example of exploratory analysis on a publicly available gene expression data set of colorectal cancer samples. The software and its documentation are available at https://github.com/slipguru/adenine under the FreeBSD license.
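The abstract describes a chain of stages (imputation, preprocessing, dimensionality reduction, clustering) that is run for several configurations and then summarized quantitatively. The sketch below illustrates that idea with NumPy only; it is not adenine's code or configuration interface, and every helper name (`impute_mean`, `preprocess`, `run_pipeline`, `explore`, ...) is hypothetical. Further assumptions: column-mean imputation stands in for the tool's imputation methods, PCA and k-means stand in for its dimensionality-reduction and clustering options, the silhouette score is used as one possible quantitative summary, and a local thread pool stands in for distributed execution.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def impute_mean(X):
    """Replace each NaN with its column mean (hypothetical stand-in for the tool's imputers)."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X


def preprocess(X, method):
    """'center' removes column means; 'standardize' also scales to unit variance."""
    if method not in ("center", "standardize"):
        raise ValueError(f"unknown preprocessing: {method}")
    Xc = X - X.mean(axis=0)
    if method == "standardize":
        std = X.std(axis=0)
        return Xc / np.where(std == 0, 1.0, std)  # leave constant features unscaled
    return Xc


def pca(X, n_components):
    """Project the centered data onto its top principal directions via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; keeps the lowest-inertia restart."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        # k-means++: draw each new center with probability proportional to its
        # squared distance from the nearest center chosen so far.
        centers = [X[rng.integers(len(X))]]
        for _ in range(1, k):
            d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
            centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centers = np.array(centers)
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        inertia = ((X - centers[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def silhouette(X, labels):
    """Mean silhouette coefficient with Euclidean distances."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return 0.0  # undefined for a single cluster; scored as 0 in this sketch
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            scores.append(0.0)  # singleton clusters get 0 by convention
            continue
        a = D[i, same].sum() / (same.sum() - 1)  # the -1 excludes D[i, i] = 0
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


def run_pipeline(X_raw, prep, n_components, k):
    """One pipeline: impute -> preprocess -> reduce -> cluster -> score."""
    Z = pca(preprocess(impute_mean(X_raw), prep), n_components)
    labels = kmeans(Z, k)
    return {"preprocessing": prep, "n_components": n_components, "k": k,
            "silhouette": silhouette(Z, labels)}


def explore(X_raw, grid, max_workers=4):
    """Run every configuration in the grid independently, best silhouette first."""
    configs = itertools.product(grid["preprocessing"], grid["n_components"], grid["k"])
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda cfg: run_pipeline(X_raw, *cfg), configs))
    return sorted(results, key=lambda r: r["silhouette"], reverse=True)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic stand-in for an expression matrix: 3 groups of 20 samples x 50 features,
    # with roughly 5% of the entries knocked out.
    X = np.vstack([c + rng.normal(size=(20, 50)) for c in rng.normal(scale=5.0, size=(3, 50))])
    X[rng.random(X.shape) < 0.05] = np.nan
    grid = {"preprocessing": ["center", "standardize"], "n_components": [2, 5], "k": [2, 3, 4]}
    for r in explore(X, grid):
        print(r)
```

Because every configuration in the grid is independent of the others, the `explore` loop is embarrassingly parallel; on an HPC facility the same pattern would presumably hand each pipeline to a separate worker or node instead of a local thread.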



Acknowledgments

We would like to acknowledge Dr. Davide Marini for his help, assistance and support in using the high-performance computing (HPC) systems operated by the Ligurian Cluster for Marine Technologies (DLTM - http://www.dltm.it).

Author information

Corresponding author: Samuele Fiorini.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Fiorini, S., Tomasi, F., Squillario, M., Barla, A. (2019). Adenine: A HPC-Oriented Tool for Biological Data Exploration. In: Bartoletti, M., et al. Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2017. Lecture Notes in Computer Science, vol 10834. Springer, Cham. https://doi.org/10.1007/978-3-030-14160-8_6


  • DOI: https://doi.org/10.1007/978-3-030-14160-8_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-14159-2

  • Online ISBN: 978-3-030-14160-8
