Abstract
Algorithm evaluation provides a means to characterize variability across image analysis algorithms, validate algorithms by comparing multiple results, and facilitate algorithm sensitivity studies. In pathology image analysis, the sizes of the images and of the analysis results pose significant challenges for algorithm evaluation. We present SparkGIS, a distributed, in-memory spatial data processing framework for querying, retrieving, and comparing large volumes of analytical image result data for algorithm evaluation. Our approach combines the in-memory distributed processing capabilities of Apache Spark with the efficient spatial query processing of Hadoop-GIS. An experimental evaluation of SparkGIS on heatmap computations, which compare nucleus segmentation results from multiple images and analysis runs, shows that SparkGIS is efficient and scalable, enabling algorithm evaluation and algorithm sensitivity studies on large datasets.
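To make the idea of a comparison heatmap concrete, the following is a minimal, single-process sketch, not the SparkGIS implementation. It assumes that each segmentation result is a rasterized set of foreground (x, y) pixels and that Jaccard (intersection over union) and Dice coefficients are the per-tile similarity measures. The framework described above instead relies on Spark and Hadoop-GIS spatial query processing. All names here (`similarity_heatmap`, `group_by_tile`, `square`, `tile_size`) are hypothetical. Grouping pixels by tile key loosely mirrors how a distributed job might partition work per tile.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Pixel = Tuple[int, int]
Tile = Tuple[int, int]


def group_by_tile(pixels: Iterable[Pixel], tile_size: int) -> Dict[Tile, Set[Pixel]]:
    """Assign each foreground pixel to the tile containing it (hypothetical helper)."""
    tiles: Dict[Tile, Set[Pixel]] = defaultdict(set)
    for x, y in pixels:
        tiles[(x // tile_size, y // tile_size)].add((x, y))
    return tiles


def similarity_heatmap(
    result_a: Set[Pixel], result_b: Set[Pixel], tile_size: int
) -> Dict[Tile, Tuple[float, float]]:
    """Per-tile (Jaccard, Dice) scores between two segmentation results.

    Only tiles that contain foreground in at least one result are scored,
    so the union (and the Dice denominator) is never zero.
    """
    tiles_a = group_by_tile(result_a, tile_size)
    tiles_b = group_by_tile(result_b, tile_size)
    heatmap: Dict[Tile, Tuple[float, float]] = {}
    for tile in set(tiles_a) | set(tiles_b):
        a = tiles_a.get(tile, set())
        b = tiles_b.get(tile, set())
        inter = len(a & b)
        union = len(a | b)
        jaccard = inter / union
        dice = 2 * inter / (len(a) + len(b))
        heatmap[tile] = (jaccard, dice)
    return heatmap


def square(x0: int, y0: int, size: int) -> Set[Pixel]:
    """Toy 'nucleus': a filled size x size square of foreground pixels."""
    return {(x, y) for x in range(x0, x0 + size) for y in range(y0, y0 + size)}


# Run A finds two nuclei; run B finds only the first one, shifted 2 pixels right.
run_a = square(0, 0, 4) | square(10, 0, 2)
run_b = square(2, 0, 4)
heatmap = similarity_heatmap(run_a, run_b, tile_size=8)
```

In this toy example, tile (0, 0) scores a Jaccard of 1/3 and a Dice of 0.5 (8 shared pixels out of 24 covered). Tile (1, 0) scores 0 on both because only run A found a nucleus there.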
Acknowledgments
This work was funded in part by HHSN261200800001E from the NCI, 1U24CA180924-01A1 from the NCI, 5R01LM011119-05 and 5R01LM009239-07 from the NLM.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Baig, F., Mehrotra, M., Vo, H., Wang, F., Saltz, J., Kurc, T. (2016). SparkGIS: Efficient Comparison and Evaluation of Algorithm Results in Tissue Image Analysis Studies. In: Wang, F., Luo, G., Weng, C., Khan, A., Mitra, P., Yu, C. (eds) Biomedical Data Management and Graph Online Querying. Big-O(Q) DMAH 2015. Lecture Notes in Computer Science, vol 9579. Springer, Cham. https://doi.org/10.1007/978-3-319-41576-5_10
DOI: https://doi.org/10.1007/978-3-319-41576-5_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41575-8
Online ISBN: 978-3-319-41576-5
eBook Packages: Computer Science, Computer Science (R0)