Clustering of Geospatial Big Data in a Distributed Environment

Triplet, Thomas; Foucher, Samuel

doi:10.1007/978-3-319-17885-1_1625

Author information

Authors and Affiliations

Computer Research Institute of Montreal, 405 avenue Ogilvy, H3N 1M3, Montreal, QC, Canada
Thomas Triplet & Samuel Foucher

Corresponding author

Correspondence to Thomas Triplet.

Editor information

Editors and Affiliations

University of Minnesota, Minneapolis, MN, USA
Shashi Shekhar
Management Science and Information Systems Department, Rutgers Business School, Rutgers, The State University of New Jersey, Newark, NJ, USA
Hui Xiong
Department of Management Sciences, Tippie College of Business, University of Iowa, Iowa City, IA, USA
Xun Zhou

Cite this entry

Triplet, T., Foucher, S. (2017). Clustering of Geospatial Big Data in a Distributed Environment. In: Shekhar, S., Xiong, H., Zhou, X. (eds) Encyclopedia of GIS. Springer, Cham. https://doi.org/10.1007/978-3-319-17885-1_1625

DOI: https://doi.org/10.1007/978-3-319-17885-1_1625
Published: 12 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17884-4
Online ISBN: 978-3-319-17885-1
eBook Packages: Computer Science; Reference Module Computer Science and Engineering