Clustering of Geospatial Big Data in a Distributed Environment

  • Reference work entry
Encyclopedia of GIS

Corresponding author

Correspondence to Thomas Triplet.

Copyright information

© 2017 Springer International Publishing AG

About this entry

Cite this entry

Triplet, T., Foucher, S. (2017). Clustering of Geospatial Big Data in a Distributed Environment. In: Shekhar, S., Xiong, H., Zhou, X. (eds) Encyclopedia of GIS. Springer, Cham. https://doi.org/10.1007/978-3-319-17885-1_1625
