Adenine: A HPC-Oriented Tool for Biological Data Exploration

Fiorini, Samuele; Tomasi, Federico; Squillario, Margherita; Barla, Annalisa

doi:10.1007/978-3-030-14160-8_6

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10834)

Included in the following conference series:

International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics

Abstract

adenine is a machine learning framework designed for biological data exploration and visualization. Its goal is to help bioinformaticians obtain a quick first overview of the main structures underlying their data. The tool encompasses state-of-the-art techniques for missing-value imputation, data preprocessing, dimensionality reduction and clustering. adenine has a scalable architecture that works seamlessly on single workstations as well as on high-performance computing facilities. adenine can generate publication-ready plots along with quantitative descriptions of the results. In this paper we provide an example of exploratory analysis on a publicly available gene expression data set of colorectal cancer samples. The software and its documentation are available at https://github.com/slipguru/adenine under the FreeBSD license.
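The four stages named in the abstract (imputation, preprocessing, dimensionality reduction, clustering) can be illustrated with a minimal sketch. This is not adenine's API: it uses only the Python standard library, every function name below is hypothetical, and the specific methods (column-mean imputation, standardization, projection onto the leading principal axis by power iteration, two-cluster k-means) are simple stand-ins chosen for illustration rather than the techniques the tool actually ships. The toy matrix is made up.

```python
import math

def impute_mean(rows):
    """Replace None entries with the mean of the observed values in that column."""
    means = []
    for col in zip(*rows):
        observed = [v for v in col if v is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def standardize(rows):
    """Center each column to zero mean and scale it to unit variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def leading_component_scores(rows, n_iter=100):
    """Project centered rows onto the leading principal axis (power iteration)."""
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]
    axis = [1.0] * d  # assumed starting vector, not orthogonal to the leading axis here
    for _ in range(n_iter):
        w = [sum(cov[i][j] * axis[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        axis = [x / norm for x in w]
    return [sum(a * b for a, b in zip(r, axis)) for r in rows]

def two_means(values, n_iter=20):
    """Two-cluster k-means on scalars, initialised at the smallest and largest value."""
    centers = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
                  for v in values]
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels

# Hypothetical toy matrix: 6 samples x 3 features, two missing entries (None).
data = [
    [1.0, 2.0, 1.5],
    [1.2, None, 1.4],
    [0.9, 2.1, 1.6],
    [5.0, 6.0, 5.5],
    [5.2, 6.1, None],
    [4.9, 5.9, 5.4],
]

imputed = impute_mean(data)                 # 1. missing-value imputation
scaled = standardize(imputed)               # 2. preprocessing
scores = leading_component_scores(scaled)   # 3. dimensionality reduction (3 -> 1)
labels = two_means(scores)                  # 4. clustering
print(labels)
```

On this toy matrix the first three samples and the last three samples end up in separate clusters. Keeping each stage as an independent function mirrors the step-by-step structure the abstract describes, so any single stage could be swapped for another method without touching the rest.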

Acknowledgments

We would like to acknowledge Dr. Davide Marini for his help, assistance and support in using the high-performance computing (HPC) systems operated by the Ligurian Cluster for Marine Technologies (DLTM - http://www.dltm.it).

Author information

Authors and Affiliations

Department of Informatics, Bioengineering, Robotics and System Engineering (DIBRIS), University of Genoa, 16146, Genoa, Italy
Samuele Fiorini, Federico Tomasi, Margherita Squillario & Annalisa Barla

Corresponding author

Correspondence to Samuele Fiorini.

Editor information

Editors and Affiliations

University of Cagliari, Cagliari, Italy
Massimo Bartoletti
University of Genova, Genoa, Italy
Annalisa Barla
University of Stirling, Stirling, UK
Andrea Bracciali
Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany
Gunnar W. Klau
Houston Methodist Research Institute, Houston, TX, USA
Leif Peterson
University of Udine, Udine, Italy
Alberto Policriti
University of Salerno, Fisciano, Italy
Roberto Tagliaferri

About this paper

Cite this paper

Fiorini, S., Tomasi, F., Squillario, M., Barla, A. (2019). Adenine: A HPC-Oriented Tool for Biological Data Exploration. In: Bartoletti, M., et al. Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2017. Lecture Notes in Computer Science, vol 10834. Springer, Cham. https://doi.org/10.1007/978-3-030-14160-8_6

DOI: https://doi.org/10.1007/978-3-030-14160-8_6
Published: 14 February 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-14159-2
Online ISBN: 978-3-030-14160-8
eBook Packages: Computer Science, Computer Science (R0)
