Optimizing distance-based methods for large data sets

  • Original Article
  • Journal of Geographical Systems

Abstract

Distance-based methods for measuring the spatial concentration of industries have become increasingly popular in the spatial econometrics community. However, their computational complexity limits their use, since both their memory requirements and their running times are in \(\mathcal{O}(n^2)\). In this paper, we present an algorithm with constant memory requirements and a shorter running time, enabling distance-based methods to handle large data sets. We discuss three recent distance-based methods in spatial econometrics: the D&O-Index by Duranton and Overman (Rev Econ Stud 72(4):1077–1106, 2005), the M-function by Marcon and Puech (J Econ Geogr 10(5):745–762, 2010) and the Cluster-Index by Scholl and Brenner (Reg Stud (ahead-of-print):1–15, 2014). Finally, we present an alternative calculation for the latter index that allows the use of data sets with millions of firms.
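
To illustrate why the quadratic memory cost is avoidable, here is a minimal Python sketch. It is not the algorithm presented in the paper, which is not reproduced in this preview. It assumes a hypothetical helper, pairwise_distance_histogram, that tallies each pairwise distance into a fixed-width bin as soon as it is computed instead of storing an n × n distance matrix; bin_width and max_distance are illustrative parameters. Working memory beyond the input depends only on the number of bins, while the running time of this naive version remains quadratic.

```python
import math


def pairwise_distance_histogram(points, bin_width, max_distance):
    """Count point pairs per distance bin without storing a distance matrix.

    Hypothetical illustration only: memory use is O(number of bins),
    but the double loop still visits all n * (n - 1) / 2 pairs.
    """
    n_bins = int(math.ceil(max_distance / bin_width))
    counts = [0] * n_bins
    for i in range(len(points)):
        xi, yi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj = points[j]
            d = math.hypot(xi - xj, yi - yj)
            if d < max_distance:
                # Clamp the index to guard against floating-point edge cases.
                k = min(int(d // bin_width), n_bins - 1)
                counts[k] += 1
    return counts


# Three collinear points: two pairs at distance 5, one pair at distance 10.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
hist = pairwise_distance_histogram(points, bin_width=5.0, max_distance=15.0)
print(hist)
```

Each distance is discarded after it has been counted, so only the bin counts persist between iterations. Reducing the running time as well would require additional techniques, such as spatial indexing, which this sketch does not attempt.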

Notes

  1. Note that Eq. (2) is modified for the observation of an unmarked point pattern. See Marcon and Puech (2010, p. 749) for the original formula of the M-function.

  2. Note that the latest version of dbmss already includes computational improvements. See Sect. 4 for more information.

References

  • Baddeley A, Møller J, Waagepetersen RP (2000) Non- and semi-parametric estimation of interaction in inhomogeneous point patterns. Stat Neerl 54(3):329–350

  • Barlet M, Briant A, Crusson L (2013) Location patterns of service industries in France: a distance-based approach. Reg Sci Urban Econ 43(2):338–351

  • Duque JC, Aldstadt J, Velasquez E, Franco JL, Betancourt A (2011) A computationally efficient method for delineating irregularly shaped spatial clusters. J Geogr Syst 13(4):355–372

  • Duranton G, Overman H (2005) Testing for localization using micro-geographic data. Rev Econ Stud 72(4):1077–1106

  • Ellison G, Glaeser EL, Kerr WR (2010) What causes industry agglomeration? Evidence from coagglomeration patterns. Am Econ Rev 100(3):1195–1213

  • Espa G, Arbia G, Giuliani D et al (2010) Measuring industrial agglomeration with inhomogeneous K-function: the case of ICT firms in Milan (Italy). Artículo de trabajo 14:1–11

  • German Federal Ministry of Economics and Technology (2010) Möglichkeiten und Grenzen einer Verbesserung der Wettbewerbssituation der Automobilindustrie durch Abbau von branchenspezifischen Kosten aus Informationspflichten. BMBF, Stuttgart

  • Getis A, Ord JK (1992) The analysis of spatial association by use of distance statistics. Geogr Anal 24(3):189–206

  • Hjaltason GR, Samet H (1999) Distance browsing in spatial databases. ACM Trans Database Syst 24(2):265–318

  • Koh HJ, Riedel N (2014) Assessing the localization pattern of German manufacturing and service industries: a distance-based approach. Reg Stud 48(5):823–843

  • Kosfeld R, Eckey HF, Lauridsen J (2011) Spatial point pattern analysis and industry concentration. Ann Reg Sci 47(2):311–328

  • Marcon E, Traissac S, Lang G (2013) A statistical test for Ripley's K function rejection of Poisson null hypothesis. Int Sch Res Not 1:1–9

  • Marcon E, Lang G, Traissac S, Puech F (2015) dbmss: distance-based measures of spatial structures. http://CRAN.R-project.org/package=dbmss, R package version 2.1.2

  • Marcon E, Puech F (2012) A typology of distance-based measures of spatial concentration. In: Technical report. http://halshs.archives-ouvertes.fr/halshs-00679993/

  • Marcon E, Puech F (2003) Evaluating the geographic concentration of industries using distance-based methods. J Econ Geogr 3(4):409–428

  • Marcon E, Puech F (2010) Measures of the geographic concentration of industries: improving distance-based methods. J Econ Geogr 10(5):745–762

  • Miller HJ (2010) The data avalanche is here. Shouldn’t we be digging? J Reg Sci 50(1):181–201

  • Neill DB, Moore AW (2004) Rapid detection of significant spatial clusters. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 256–265

  • Openshaw S (1984) The modifiable areal unit problem. Institute of British Geographers, Norwich

  • Ripley BD (2005) Spatial statistics. Wiley, New Jersey

  • Sankaranarayanan J, Samet H, Varshney A (2007) A fast all nearest neighbor algorithm for applications involving large point-clouds. Comput Graph 31(2):157–174

  • Scholl T, Brenner T (2014) Detecting spatial clustering using a firm-level cluster index. Reg Stud. doi:10.1080/00343404.2014.958456

  • Venables WN, Ripley BD (2002) Modern applied statistics with S. Springer, New York

  • Vitali S, Napoletano M, Fagiolo G (2013) Spatial localization in manufacturing: a cross-country analysis. Reg Stud 47(9):1534–1554

Corresponding author

Correspondence to Tobias Scholl.

About this article

Cite this article

Scholl, T., Brenner, T. Optimizing distance-based methods for large data sets. J Geogr Syst 17, 333–351 (2015). https://doi.org/10.1007/s10109-015-0219-1

  • DOI: https://doi.org/10.1007/s10109-015-0219-1
