Definitions
Spatial big data is a spatio-temporal data that is too large or requires data-intensive computation that is too demanding for traditional computing architectures. Stream processing in this context is the processing of spatio-temporal data in motion. The data is observational; it is produced by sensors – moving or otherwise. Computations on the data are made as the data is produced or received. A distributed processing cluster is a networked collection of computers that communicate and process data in a coordinated manner. Computers in the cluster are coordinated to solve a common problem. A lambda architecture is a scalable, fault-tolerant data-processing architecture that is designed to handle large quantities of data by exploiting both stream and batch processing methods. Data partitioninginvolves physically dividing a dataset into separate data stores on a distributed processing cluster. This...
References
Abel DJ, Ooi BC, Tan K-L, Power R, Yu JX (1995) Spatial join strategies in distributed spatial DBMS. In: Advances in spatial databases – 4th international symposium, SSD’95. Lecture notes in computer science, vol 1619. Springer, Portland, pp 348–367
Aji A, Wang F, Vo H, Lee R, Liu Q, Zhang X, Saltz J (2013) Hadoop-GIS: a high performance spatial data warehousing system over mapreduce. Proc VLDB Endow 6(11):1009–1020
Alexander W, Copeland G (1988) Process and dataflow control in distributed data-intensive systems. In: Proceedings of the 1988 ACM SIGMOD international conference on management of data (SIGMOD ’88), pp 90–98. https://doi.org/10.1145/50202.50212
Apache (2006) Welcome to Apache Hadoop!. http://hadoop.apache.org. Accessed 26 Mar 2018
Brinkhoff T, Kriegel HP, Seeger B (1996) Parallel processing of spatial joins using r-trees. In: Proceedings of the 12th international conference on data engineering, New Orleans, Louisiana, pp 258–265
Chang F, Dean J, Ghemawat S, Hsieh W, Wallach DA, Burrows M, Chandra T, Fikes A, Gruber RE (2008) Bigtable: a distributed storage system for structured data. ACM Trans Comput Syst 26(2). https://doi.org/10.1145/1365815.1365816
Chang WY, Abu-Amara H, Sanford JF (2010) Transforming Enterprise Cloud Services. Springer, London, pp 55–56
Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51(1):107–113. https://doi.org/10.1145/1327452.1327492
DeWitt D, Gray J (1992) Parallel database systems: the future of high performance database systems. Commun ACM 35(6). https://doi.org/10.1145/129888.129894
DeWitt DJ, Gerber RH, Graefe G, Heytens ML, Kumar KB, Muralikrishna M (1986) GAMMA – a high performance dataflow database machine. In: Proceedings of the 12th international conference on very large data bases (VLDB ’86), Kyoto, Japan, pp 228–237
Du Z, Zhao X, Ye X, Zhou J, Zhang F, Liu R (2017) An effective high-performance multiway spatial join algorithm with spark. ISPRS Int J Geo-Information 6(4):96
Eldawy A, Mokbel MF (2015) SpatialHadoop: a mapreduce framework for spatial data. In: IEEE 31st international conference on data engineering (ICDE), Seoul, South Korea, pp 1352–1363
Ester M, Kriegel HP, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd international conference on knowledge discovery and data mining (KDD-96), Portland, Oregon, pp 226–231
Garillot F, Maas G (2018) Stream processing with apache spark: best practices for scaling and optimizing Apache spark. O’Reilly Media, Sebastopol. http://shop.oreilly.com/product/0636920047568.do
Gedik B, Andrade H, Wu K-L, Yu PS, Doo M (2008) SPADE: the system s declarative stream processing engine. In: Proceedings of the 2008 ACM SIGMOD international conference on management of data (SIGMOD ’08), pp 1123–1134. https://doi.org/10.1145/1376616.1376729
Ghemawat S, Gobioff H, Leung S (2003) The Google file system. In: Proceedings of the 19th ACM symposium on operating systems principles, Oct 2003, pp 29–43. https://doi.org/10.1145/945445.945450
Grossman M, Sarkar, V (2016) SWAT: a programmable, in-memory, distributed, high-performance computing platform. In: Proceedings of the 25th ACM international symposium on high-performance parallel and distributed computing (HPDC ’16). ACM, New York, pp 81–92. https://doi.org/10.1145/2907294.2907307
Hagedorn S, Götze P, Sattler KU (2017) The STARK framework for spatio-temporal data analytics on spark. In: Proceedings of the 17th conference on database systems for business, technology, and the web (BTW 2017), Stuttgart
Hassaan M, Elghandour I (2016) A real-time big data analysis framework on a CPU/GPU heterogeneous cluster: a meteorological application case study. In: Proceedings of the 3rd IEEE/ACM international conference on big data computing, applications and technologies (BDCAT ’16). ACM, New York, pp 168–177. https://doi.org/10.1145/3006299.3006304
Hong S, Choi W, Jeong W-K (2017) GPU in-memory processing using spark for iterative computation. In: Proceedings of the 17th IEEE/ACM international symposium on cluster, cloud and grid computing (CCGrid ’17), pp 31–41. https://doi.org/10.1109/CCGRID.2017.41
Hughes JN, Annex A, Eichelberger CN, Fox A, Hulbert A, Ronquest M (2015) Geomesa: a distributed architecture for spatio-temporal fusion. In: Proceedings of SPIE defense and security. https://doi.org/10.1117/12.2177233
Jacox EH, Samet H (2007) Spatial join techniques. ACM Trans Database Syst 32(1):7
Klein J, Buglak R, Blockow D, Wuttke T, Cooper B (2016) A reference architecture for big data systems in the national security domain. In: Proceedings of the 2nd international workshop on BIG data software engineering (BIGDSE ’16). https://doi.org/10.1145/2896825.2896834
Marz N, Warren J (2015) Big data: principles and best practices of scalable realtime data systems, 1st edn. Manning Publications, Greenwich
McInnes L, Healy J (2017) Accelerated hierarchical density based clustering. In: IEEE international conference on data mining workshops (ICDMW), New Orleans, Louisiana, pp 33–42
Mysore D, Khupat S, Jain S (2013) Big data architecture and patterns. IBM, White Paper, 2013. http://www.ibm.com/developerworks/library/bdarchpatterns1. Accessed 26 Mar 2018
NoSQL (2009) NoSQL definition. http://nosql-database.org. Accessed 26 Mar 2018
Pavlo A, Aslett M (2016) What’s really new with NewSQL? SIGMOD Rec 45(2):45–55. https://doi.org/10.1145/3003665.3003674
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Perrot M, Duchesnay É (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
Prasad S, McDermott M, Puri S, Shah D, Aghajarian D, Shekhar S, Zhou X (2015) A vision for GPU-accelerated parallel computation on geo-spatial datasets. SIGSPATIAL Spec 6(3):19–26. https://doi.org/10.1145/2766196.2766200
Sakr S, Liu A, Fayoumi AG (2013) The family of mapreduce and large-scale data processing systems. ACM Comput Surv 46(1):1. https://doi.org/10.1145/2522968.2522979
Sena B, Allian AP, Nakagawa EY (2017) Characterizing big data software architectures: a systematic mapping study. In: Proceeding of the 11th Brazilian symposium on software components, architectures, and reuse (SBCARS ’17). https://doi.org/10.1145/3132498.3132510
Shekhar S, Gunturi V, Evans MR, Yang KS. 2012. Spatial big-data challenges intersecting mobility and cloud computing. In: Proceedings of the eleventh ACM international workshop on data engineering for wireless and mobile access (MobiDE ’12), pp 1–6. https://doi.org/10.1145/2258056.2258058
Shvachko K, Kuang H, Radia S, Chansler R (2010) The Hadoop distributed file system. In: Proceedings of the 2010 IEEE 26th symposium on mass storage systems and technologies (MSST). https://doi.org/10.1109/MSST.2010.5496972
Sriharsha R (2015) Magellan: geospatial analytics on spark. https://hortonworks.com/blog/magellan-geospatial-analytics-in-spark/. Accessed June 2017
Tang M, Yu Y, Malluhi QM, Ouzzani M, Aref WG (2016) LocationSpark: a distributed in-memory data management system for big spatial data. Proc VLDB Endow 9(13):1565–1568. https://doi.org/10.14778/3007263.3007310
Whitman RT, Park MB, Ambrose SM, Hoel EG (2014) Spatial indexing and analytics on Hadoop. In: Proceedings of the 22nd ACM SIGSPATIAL international conference on advances in geographic information systems (SIGSPATIAL ’14), pp 73–82. https://doi.org/10.1145/2666310.2666387
Whitman RT, Park MB, Marsh BG, Hoel EG (2017) Spatio-temporal join on Apache spark. In: Hoel E, Newsam S, Ravada S, Tamassia R, Trajcevski G (eds) Proceedings of the 25th ACM SIGSPATIAL international conference on advances in geographic information systems (SIGSPATIAL’17). https://doi.org/10.1145/3139958.3139963
Xie D, Li F, Yao B, Li G, Zhou L, Guo M (2016) Simba: efficient in-memory spatial analytics. In: Proceedings of the 2016 international conference on management of data (SIGMOD ’16), pp 1071–1085. https://doi.org/10.1145/2882903.2915237
You S, Zhang J, Gruenwald L (2015) Large-scale spatial join query processing in Cloud. In: 2015 31st IEEE international conference on data engineering workshops, Seoul, 13–17 April 2015, pp 34–41
Yu J, Wu J, Sarwat M (2015) Geospark: a cluster computing framework for processing large-scale spatial data. In: Proceedings of the 23rd SIGSPATIAL international conference on advances in geographic information systems, Seattle, WA
Yuan Y, Salmi MF, Huai Y, Wang K, Lee R, Zhang X (2016) Spark-GPU: an accelerated in-memory data processing engine on clusters. In: Proceedings of the 2016 IEEE international conference on big data (Big Data 2016), Washington, DC, pp 273–283
Zaharia M, Chowdhury M, Franklin MJ, Shenker S, Stoica I (2010) Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX conference on hot topics in cloud computing (HotCloud’10), Boston, MA
Zhang S, Han J, Liu Z, Wang K, Xu Z (2009) SJMR: parallelizing spatial join with mapreduce on clusters. In: IEEE international conference on Cluster computing (CLUSTER’09), New Orleans, Louisiana, pp 1–8
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Section Editor information
Rights and permissions
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this entry
Cite this entry
Hoel, E.G. (2018). Architectures. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_216-1
Download citation
DOI: https://doi.org/10.1007/978-3-319-63962-8_216-1
Received:
Accepted:
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63962-8
Online ISBN: 978-3-319-63962-8
eBook Packages: Springer Reference MathematicsReference Module Computer Science and Engineering