Abstract
Distance-based methods for measuring the spatial concentration of industries have become increasingly popular in the spatial econometrics community. However, their computational complexity limits their use, since both memory requirements and running times are in \({\mathcal {O}}(n^2)\). In this paper, we present an algorithm with constant memory requirements and a shorter running time, enabling distance-based methods to handle large data sets. We discuss three recent distance-based methods in spatial econometrics: the D&O-Index by Duranton and Overman (Rev Econ Stud 72(4):1077–1106, 2005), the M-function by Marcon and Puech (J Econ Geogr 10(5):745–762, 2010) and the Cluster-Index by Scholl and Brenner (Reg Stud (ahead-of-print):1–15, 2014). Finally, we present an alternative calculation for the Cluster-Index that allows the use of data sets with millions of firms.


References
Baddeley A, Møller J, Waagepetersen RP (2000) Non- and semi-parametric estimation of interaction in inhomogeneous point patterns. Stat Ned 3(54):329–350
Barlet M, Briant A, Crusson L (2013) Location patterns of service industries in France: a distance-based approach. Reg Sci Urban Econ 43(2):338–351
Duque JC, Aldstadt J, Velasquez E, Franco JL, Betancourt A (2011) A computationally efficient method for delineating irregularly shaped spatial clusters. J Geogr Syst 13(4):355–372
Duranton G, Overman H (2005) Testing for localization using micro-geographic data. Rev Econ Stud 72(4):1077–1106
Ellison G, Glaeser EL, Kerr WR (2010) What causes industry agglomeration? Evidence from coagglomeration patterns. Am Econ Rev 100(3):1195–1213
Espa G, Arbia G, Giuliani D et al (2010) Measuring industrial agglomeration with inhomogeneous K-function: the case of ICT firms in Milan (Italy). Artículo de trabajo 14:1–11
German Federal Ministry of Economics and Technology (2010) Möglichkeiten und Grenzen einer Verbesserung der Wettbewerbssituation der Automobilindustrie durch Abbau von branchenspezifischen Kosten aus Informationspflichten. BMBF, Stuttgart
Getis A, Ord JK (1992) The analysis of spatial association by use of distance statistics. Geogr Anal 24(3):189–206
Hjaltason GR, Samet H (1999) Distance browsing in spatial databases. ACM Trans Database Syst 24(2):265–318
Koh HJ, Riedel N (2014) Assessing the localization pattern of German manufacturing and service industries: a distance-based approach. Reg Stud 48(5):823–843
Kosfeld R, Eckey HF, Lauridsen J (2011) Spatial point pattern analysis and industry concentration. Ann Reg Sci 47(2):311–328
Marcon E, Traissac S, Lang G (2013) A statistical test for Ripley's function rejection of Poisson null hypothesis. Int Sch Res Not 1:1–9
Marcon E, Lang G, Traissac S, Puech F (2015) dbmss: distance-based measures of spatial structures. R package version 2.1.2. http://CRAN.R-project.org/package=dbmss
Marcon E, Puech F (2012) A typology of distance-based measures of spatial concentration. In: Technical report. http://halshs.archives-ouvertes.fr/halshs-00679993/
Marcon E, Puech F (2003) Evaluating the geographic concentration of industries using distance-based methods. J Econ Geogr 3(4):409–428
Marcon E, Puech F (2010) Measures of the geographic concentration of industries: improving distance-based methods. J Econ Geogr 10(5):745–762
Miller HJ (2010) The data avalanche is here. Shouldn’t we be digging? J Reg Sci 50(1):181–201
Neill DB, Moore AW (2004) Rapid detection of significant spatial clusters. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 256–265
Openshaw S (1984) The modifiable areal unit problem. Institute of British Geographers, Norwich
Ripley BD (2005) Spatial statistics. Wiley, New Jersey
Sankaranarayanan J, Samet H, Varshney A (2007) A fast all nearest neighbor algorithm for applications involving large point-clouds. Comput Graph 31(2):157–174
Scholl T, Brenner T (2014) Detecting spatial clustering using a firm-level cluster index. Reg Stud. doi:10.1080/00343404.2014.958456
Venables WN, Ripley BD (2002) Modern applied statistics with S. Springer, New York
Vitali S, Napoletano M, Fagiolo G (2013) Spatial localization in manufacturing: a cross-country analysis. Reg Stud 47(9):1534–1554
Cite this article
Scholl, T., Brenner, T. Optimizing distance-based methods for large data sets. J Geogr Syst 17, 333–351 (2015). https://doi.org/10.1007/s10109-015-0219-1
DOI: https://doi.org/10.1007/s10109-015-0219-1