Architectures

Hoel, Erik G.

doi:10.1007/978-3-319-63962-8_216-1

Synonyms

Apache Hadoop; Lambda architecture; Location analytic architecture

Definitions

Spatial big data is spatio-temporal data that is too large, or requires data-intensive computation that is too demanding, for traditional computing architectures. Stream processing in this context is the processing of spatio-temporal data in motion. The data is observational; it is produced by sensors, moving or otherwise. Computations on the data are made as the data is produced or received. A distributed processing cluster is a networked collection of computers that communicate and process data in a coordinated manner to solve a common problem. A lambda architecture is a scalable, fault-tolerant data-processing architecture designed to handle large quantities of data by exploiting both stream and batch processing methods. Data partitioning involves physically dividing a dataset into separate data stores on a distributed processing cluster. This...
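The definitions above can be made concrete with a small, single-process sketch. It is illustrative only and does not come from the entry: the names Observation, grid_cell, partition_of, and LambdaCounts are hypothetical, the uniform grid and the modulo rule are assumed partitioning choices, and plain in-memory Python objects stand in for a real cluster. The sketch keeps a batch view that is periodically recomputed from all stored observations and a speed view that is updated as each observation arrives, and it answers queries by combining the two, which is the core idea of a lambda architecture.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class Observation(NamedTuple):
    """A hypothetical sensor reading: a position plus a timestamp."""
    x: float
    y: float
    t: float


def grid_cell(obs: Observation, cell_size: float = 1.0) -> Tuple[int, int]:
    """Map an observation to the (column, row) of a uniform grid."""
    return (int(obs.x // cell_size), int(obs.y // cell_size))


def partition_of(cell: Tuple[int, int], num_partitions: int) -> int:
    """Assign a grid cell to one of num_partitions data stores.

    A simple modulo rule, assumed here for illustration; real systems
    often use space-filling curves or tree-based partitioning instead.
    """
    col, row = cell
    return (col * 31 + row) % num_partitions


class LambdaCounts:
    """Counts observations per grid cell using batch and speed views."""

    def __init__(self) -> None:
        self.master: List[Observation] = []  # append-only master dataset
        self.batch_view: Counter = Counter()  # recomputed by batch jobs
        self.speed_view: Counter = Counter()  # updated per observation

    def ingest(self, obs: Observation) -> None:
        """Stream path: store the observation and update the speed view."""
        self.master.append(obs)
        self.speed_view[grid_cell(obs)] += 1

    def run_batch(self) -> None:
        """Batch path: rebuild the batch view from the master dataset.

        Everything ingested so far is now covered by the batch view,
        so the speed view is reset.
        """
        self.batch_view = Counter(grid_cell(o) for o in self.master)
        self.speed_view.clear()

    def query(self, cell: Tuple[int, int]) -> int:
        """Serving path: combine the batch and speed views."""
        return self.batch_view[cell] + self.speed_view[cell]
```

In this sketch a query returns the same count whether or not a batch run has happened since the last observation arrived; only the split between the two views changes. In a distributed setting, partition_of would decide which data store holds each cell's observations.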

Author information

Authors and Affiliations

Environmental Systems Research Institute, Redlands, CA, USA
Erik G. Hoel

Corresponding author

Correspondence to Erik G. Hoel.

Editor information

Editors and Affiliations

Institute of Computer Science, University of Tartu, Tartu, Estonia
Sherif Sakr
School of Information Technologies, Building J12, University of Sydney, Sydney, Australia
Albert Zomaya

Section Editor information

No affiliation provided
Timos Sellis
No affiliation provided
Aamir Cheema

About this entry

Cite this entry

Hoel, E.G. (2018). Architectures. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_216-1

DOI: https://doi.org/10.1007/978-3-319-63962-8_216-1
Received: 28 April 2018
Accepted: 28 April 2018
Published: 11 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63962-8
Online ISBN: 978-3-319-63962-8
eBook Packages: Springer Reference Mathematics; Reference Module Computer Science and Engineering
