Comparison of Traditional and Constrained Recursive Clustering Approaches for Generating Optimal Census Block Group Clusters

Gwinn, Damon; Helmick, Jordan; Kholgade Banerjee, Natasha; Banerjee, Sean

doi:10.1007/978-3-030-29948-4_2

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1061)

Included in the following conference series:

International Conference on Geographical Information Systems Theory, Applications and Management

Abstract

Census block groups are used in location selection to determine the average drive time for all residents within a given radius to a proposed new store. The United States census uses 220,334 block groups; however, the spatial distance between neighboring block groups in densely populated areas is small enough that multiple block groups can be clustered into a single unit. In this paper, we evaluate the efficiency and accuracy of drive time computations performed on clusters generated by our novel approach of constrained recursive reclustering as run on three traditional clustering algorithms: affinity propagation, k-means, and mean shift. We compare our constrained recursive reclustering approach against drive times computed using the original census block groups and using clusters obtained by traditional clustering. Unlike traditional clustering, where clustering is performed in a single pass, our approach continues reclustering each new cluster until a user-specified stopping criterion is reached. We show that traditional clustering techniques generate sub-optimal clusters, with large spatial distances between the cluster centroid and cluster points, making them unusable for computing drive times. Our approach provides reductions of 81.2%, 83.4%, and 10.2% for affinity propagation, k-means, and mean shift, respectively, when run on 220,334 census block groups. Using 200 randomly sampled locations each from Lowe’s, CVS, and Walmart, we show that, compared to the original block groups, there is no statistically significant difference in drive time computations when using clusters generated by constrained recursive reclustering with affinity propagation for any of the three businesses, and with k-means for CVS and Walmart. While statistically significant differences are obtained with k-means for Lowe’s and with mean shift for all three businesses, the differences are negligible, with the mean difference for each location set being within 30 s.

Author information

Authors and Affiliations

Clarkson University, Potsdam, NY, USA
Damon Gwinn, Natasha Kholgade Banerjee & Sean Banerjee
MedExpress, Morgantown, WV, USA
Jordan Helmick

Corresponding author

Correspondence to Sean Banerjee.

Editor information

Editors and Affiliations

Technical University of Crete, Chania, Crete, Greece
Lemonia Ragia
Polytechnic Institute of Setúbal, Setúbal, Portugal
Cédric Grueau
Knowledge Systems Institute, Skokie, IL, USA
Robert Laurini

About this paper

Cite this paper

Gwinn, D., Helmick, J., Kholgade Banerjee, N., Banerjee, S. (2019). Comparison of Traditional and Constrained Recursive Clustering Approaches for Generating Optimal Census Block Group Clusters. In: Ragia, L., Grueau, C., Laurini, R. (eds) Geographical Information Systems Theory, Applications and Management. GISTAM 2018. Communications in Computer and Information Science, vol 1061. Springer, Cham. https://doi.org/10.1007/978-3-030-29948-4_2

DOI: https://doi.org/10.1007/978-3-030-29948-4_2
Published: 22 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29947-7
Online ISBN: 978-3-030-29948-4
eBook Packages: Computer Science, Computer Science (R0)
