Abstract
Collecting large amounts of data, and analyzing them with efficient and effective techniques, is becoming increasingly important in many areas. One of the most important fields in which Big Data is taking on a fundamental role is the biomedical domain, partly because of the decreasing cost of acquiring and analyzing biomedical data. Furthermore, the emergence of more accessible technologies and the increasing speed of algorithms, thanks in part to parallelization techniques, are helping to make the application of Big Data in healthcare a fast-growing field.
This paper presents a novel framework, Biomedical Hadoop Image Processing Interface (BioHIPI), capable of storing biomedical image collections in a Distributed File System (DFS) so that they can be processed in parallel on a cluster of machines. The work is based on Apache Hadoop: it uses the Hadoop Distributed File System (HDFS) to store images, the MapReduce libraries to process them in parallel, and Yet Another Resource Negotiator (YARN) to run processes on the cluster.
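The MapReduce pattern that such a framework relies on can be illustrated with a minimal, single-process sketch. This is not BioHIPI's API and does not use Hadoop: the record layout, the `IMAGES` sample data, and the function names `map_image`, `shuffle`, `reduce_modality`, and `run_job` are all assumptions made for illustration. On a real cluster the records would be read from HDFS and the map and reduce steps would be scheduled by YARN across machines.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical stand-in for an image collection on a distributed file system.
# Each record is (image_id, modality, pixels); the values are made up.
IMAGES = [
    ("img-001", "MRI", [[0, 50], [100, 150]]),
    ("img-002", "MRI", [[10, 10], [10, 10]]),
    ("img-003", "CT", [[200, 200], [100, 100]]),
]


def map_image(record):
    """Map step: emit (modality, mean pixel intensity) for one image."""
    _image_id, modality, pixels = record
    flat = [value for row in pixels for value in row]
    yield modality, mean(flat)


def shuffle(pairs):
    """Group mapped values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_modality(modality, values):
    """Reduce step: average the per-image means for one modality."""
    return modality, mean(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence in a single process."""
    mapped = (pair for record in records for pair in map_image(record))
    grouped = shuffle(mapped)
    return dict(reduce_modality(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(IMAGES))
```

Because each call to `map_image` depends only on its own record, the map step can be distributed across machines without coordination; only the grouping by key requires data to move between nodes.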
Notes
1. Source code is available at https://github.com/memoclaudio/BioHipi.
Acknowledgments
Claudio Stamile is funded by an EU MC ITN TRANSACT 2012 (316679) project. Francesco Calimeri has been partially supported by the Italian Ministry for Economic Development (MISE) under project “PIUCultura – Paradigmi Innovativi per l’Utilizzo della Cultura” (n. F/020016/01-02/X27), and by the EU under project “Smarter Solutions in the Big Data World (S2BDW)” (n. F/050389/01-03/X32) funded within the call “HORIZON2020” PON I&C 2014-2020.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Calimeri, F., Caracciolo, M., Marzullo, A., Stamile, C. (2018). BioHIPI: Biomedical Hadoop Image Processing Interface. In: Nicosia, G., Pardalos, P., Giuffrida, G., Umeton, R. (eds) Machine Learning, Optimization, and Big Data. MOD 2017. Lecture Notes in Computer Science, vol 10710. Springer, Cham. https://doi.org/10.1007/978-3-319-72926-8_45
DOI: https://doi.org/10.1007/978-3-319-72926-8_45
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-72925-1
Online ISBN: 978-3-319-72926-8
eBook Packages: Computer Science (R0)