ABSTRACT
Spatial analysis of big data is a key component of Cyber-GIS. However, how to use existing cyberinfrastructure (e.g. large computing clusters) to perform parallel and distributed spatial analysis on big data remains a major challenge. Problems such as efficient spatial weights creation, spatial statistics, and spatial regression on big data still need investigation. In this research, we propose a MapReduce algorithm for creating contiguity-based spatial weights. The algorithm creates spatial weights from very large spatial datasets efficiently by using computing resources organized in the Hadoop framework. It follows the MapReduce paradigm: mappers are distributed across the computing cluster to find contiguous neighbors in parallel, and reducers then collect the results and generate the weights matrix. To test the performance of the algorithm, we design an experiment that creates a contiguity-based weights matrix from artificial spatial data with up to 190 million polygons using Elastic MapReduce, Amazon's Hadoop framework. The experiment demonstrates the scalability of this parallel algorithm, which uses large computing clusters to solve the problem of creating contiguity weights on big data.
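The mapper/reducer division described above can be illustrated with a minimal single-process sketch. The abstract does not spell out the key design, so this sketch assumes one common approach: mappers emit one (vertex, polygon id) pair per polygon vertex, and reducers treat all polygons that share a vertex as queen-contiguous neighbors. The function names (`map_polygon`, `shuffle`, `reduce_vertex`, `queen_weights`) are hypothetical, and the in-memory `shuffle` stands in for the grouping step that Hadoop would perform across a cluster.

```python
from collections import defaultdict
from itertools import combinations


def map_polygon(poly_id, vertices):
    """Mapper (assumed design): emit one (vertex, polygon id) pair per distinct vertex."""
    for vertex in set(vertices):
        yield vertex, poly_id


def shuffle(pairs):
    """Stand-in for the framework's shuffle phase: group values by key."""
    groups = defaultdict(set)
    for key, value in pairs:
        groups[key].add(value)
    return groups


def reduce_vertex(poly_ids):
    """Reducer: all polygons sharing one vertex are mutual neighbors."""
    for a, b in combinations(sorted(poly_ids), 2):
        yield a, b
        yield b, a


def queen_weights(polygons):
    """Build a neighbor list {polygon id: sorted neighbor ids} from {id: vertex list}."""
    pairs = (kv for pid, verts in polygons.items() for kv in map_polygon(pid, verts))
    neighbors = {pid: set() for pid in polygons}
    for poly_ids in shuffle(pairs).values():
        for a, b in reduce_vertex(poly_ids):
            neighbors[a].add(b)
    return {pid: sorted(ns) for pid, ns in neighbors.items()}


if __name__ == "__main__":
    # A 2x2 grid of unit squares plus one isolated square (illustrative data).
    grid = {
        0: [(0, 0), (1, 0), (1, 1), (0, 1)],
        1: [(1, 0), (2, 0), (2, 1), (1, 1)],
        2: [(0, 1), (1, 1), (1, 2), (0, 2)],
        3: [(1, 1), (2, 1), (2, 2), (1, 2)],
        4: [(5, 5), (6, 5), (6, 6), (5, 6)],
    }
    print(queen_weights(grid))
```

This sketch only detects contiguity when neighboring polygons store exactly matching vertex coordinates, and it covers the vertex-sharing (queen) criterion only. An edge-sharing (rook) variant would need edges rather than single vertices as keys.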
Index Terms
- A MapReduce algorithm to create contiguity weights for spatial analysis of big data