Abstract
Census block groups are used in location selection to determine the average drive time to a proposed new store for all residents within a given radius. The United States census uses 220,334 block groups; however, the spatial distance between neighboring block groups in densely populated areas is small enough that multiple block groups can be clustered into a single unit. In this paper, we evaluate the efficiency and accuracy of drive time computations performed on clusters generated by our novel approach of constrained recursive reclustering, as run on three traditional clustering algorithms: affinity propagation, k-means, and mean shift. We compare our constrained recursive reclustering approach against drive times computed using the original census block groups, and using clusters obtained by traditional reclustering. Unlike traditional clustering, where clustering is performed in a single pass, our approach continues reclustering each new cluster until a user-specified stopping criterion is reached. We show that traditional clustering techniques generate sub-optimal clusters, with large spatial distances between the cluster centroid and cluster points, making them unusable for computing drive times. Our approach provides reductions of 81.2%, 83.4%, and 10.2% for affinity propagation, k-means, and mean shift, respectively, when run on 220,334 census block groups. Using 200 randomly sampled locations each from Lowe’s, CVS, and Walmart, we show that, compared to the original block groups, there is no statistically significant difference in drive time computations when using clusters generated by constrained recursive reclustering with affinity propagation for any of the three businesses, and with k-means for CVS and Walmart. While statistically significant differences are obtained with k-means for Lowe’s and with mean shift for all three businesses, the differences are negligible, with the mean difference for each location set being within 30 s.
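The recursive idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the stopping criterion, so the sketch assumes it is a maximum allowed distance between a cluster centroid and any of its points, uses a plain two-way k-means split as a stand-in for the three algorithms evaluated, and treats coordinates as planar rather than geographic. All function and parameter names (`recursive_recluster`, `max_dist`, and so on) are hypothetical.

```python
import math

def centroid(points):
    """Arithmetic mean of a list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def max_radius(points):
    """Largest distance from the cluster centroid to any cluster point."""
    c = centroid(points)
    return max(math.dist(c, p) for p in points)

def two_means(points, iterations=20):
    """Lloyd's k-means with k=2 (stand-in for the paper's base algorithms).

    Seeds are the first point and the point farthest from it, so the
    initial split is non-empty whenever the points are not all identical.
    """
    first = points[0]
    centers = [first, max(points, key=lambda p: math.dist(first, p))]
    groups = None
    for _ in range(iterations):
        new_groups = ([], [])
        for p in points:
            nearer = 0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
            new_groups[nearer].append(p)
        if not new_groups[0] or not new_groups[1]:
            break  # keep the last valid split
        groups = new_groups
        centers = [centroid(g) for g in groups]
    return list(groups) if groups else [points]

def recursive_recluster(points, max_dist):
    """Split a cluster again and again until every resulting cluster
    satisfies the assumed constraint: max centroid-to-point distance
    is at most max_dist."""
    if len(points) == 1 or max_radius(points) <= max_dist:
        return [points]
    parts = two_means(points)
    if len(parts) < 2:  # could not split further
        return [points]
    clusters = []
    for part in parts:
        clusters.extend(recursive_recluster(part, max_dist))
    return clusters

# Two tight groups of hypothetical block group centers, far apart.
block_groups = [(0, 0), (0, 1), (1, 0), (1, 1),
                (10, 10), (10, 11), (11, 10), (11, 11)]
clusters = recursive_recluster(block_groups, max_dist=1.0)
print(len(block_groups), "points reduced to", len(clusters), "clusters")
```

A single-pass clustering fixes the number of clusters up front and gives no bound on how far a point may lie from its centroid; the recursion above keeps splitting only where the assumed distance constraint is violated, which is why dense areas collapse into few units while sparse areas stay finely divided.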
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Gwinn, D., Helmick, J., Kholgade Banerjee, N., Banerjee, S. (2019). Comparison of Traditional and Constrained Recursive Clustering Approaches for Generating Optimal Census Block Group Clusters. In: Ragia, L., Grueau, C., Laurini, R. (eds) Geographical Information Systems Theory, Applications and Management. GISTAM 2018. Communications in Computer and Information Science, vol 1061. Springer, Cham. https://doi.org/10.1007/978-3-030-29948-4_2
DOI: https://doi.org/10.1007/978-3-030-29948-4_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29947-7
Online ISBN: 978-3-030-29948-4
eBook Packages: Computer Science, Computer Science (R0)