Comparison of Traditional and Constrained Recursive Clustering Approaches for Generating Optimal Census Block Group Clusters

  • Conference paper
Geographical Information Systems Theory, Applications and Management (GISTAM 2018)

Abstract

Census block groups are used in location selection to determine the average drive time for all residents within a given radius to a proposed new store. The United States census uses 220,334 block groups; however, the spatial distance between neighboring block groups in densely populated areas is small enough that multiple block groups can be clustered into a single unit. In this paper, we evaluate the efficiency and accuracy of drive time computations performed on clusters generated by our novel approach of constrained recursive reclustering as run on three traditional clustering algorithms: affinity propagation, k-means, and mean shift. We compare our constrained recursive reclustering approach against drive times computed using the original census block groups and against drive times computed using clusters obtained by traditional clustering. Unlike traditional clustering, where clustering is performed in a single pass, our approach continues reclustering each new cluster until a user-specified stopping criterion is reached. We show that traditional clustering techniques generate sub-optimal clusters, with large spatial distances between the cluster centroid and cluster points, making them unusable for computing drive times. Our approach provides reductions of 81.2%, 83.4%, and 10.2% for affinity propagation, k-means, and mean shift, respectively, when run on 220,334 census block groups. Using 200 randomly sampled locations each from Lowe’s, CVS, and Walmart, we show that, compared to the original block groups, there is no statistically significant difference in drive time computations when using clusters generated by constrained recursive reclustering with affinity propagation for any of the three businesses, or with k-means for CVS and Walmart. While statistically significant differences are obtained with k-means for Lowe’s and with mean shift for all three businesses, the differences are negligible, with the mean difference for each location set being within 30 s.
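The recursive idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stopping criterion (a maximum centroid-to-point distance, `max_dist`), the split factor `k=2`, the planar Euclidean distance, and all function names are assumptions made for illustration, and a small hand-written k-means stands in for the library algorithms the paper evaluates.

```python
import math
import random


def centroid(points):
    """Arithmetic mean of a list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def max_radius(points):
    """Largest distance from the cluster centroid to any member point."""
    c = centroid(points)
    return max(math.dist(p, c) for p in points)


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd-style k-means; returns the non-empty groups."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    groups = [points]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Keep the old center when a group ends up empty.
        new_centers = [centroid(g) if g else centers[i]
                       for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return [g for g in groups if g]


def recursive_recluster(points, max_dist, k=2):
    """Recluster each cluster until every cluster satisfies the
    (assumed) stopping criterion max_radius <= max_dist."""
    if len(points) <= 1 or max_radius(points) <= max_dist:
        return [points]
    parts = kmeans(points, min(k, len(points)))
    if len(parts) < 2:
        return [points]  # no further split is possible
    result = []
    for part in parts:
        result.extend(recursive_recluster(part, max_dist, k))
    return result


# Hypothetical coordinates standing in for block group centers.
sample = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
for cluster in recursive_recluster(sample, max_dist=0.5):
    print(centroid(cluster), len(cluster))
```

A single-pass clustering would stop after the first call to `kmeans`; the recursive version keeps splitting any cluster whose members lie too far from its centroid, which is the property the abstract identifies as necessary for usable drive time estimates. Real block group coordinates are latitude and longitude pairs, so a geodesic distance would replace `math.dist` in practice.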

Author information

Corresponding author

Correspondence to Sean Banerjee.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Gwinn, D., Helmick, J., Kholgade Banerjee, N., Banerjee, S. (2019). Comparison of Traditional and Constrained Recursive Clustering Approaches for Generating Optimal Census Block Group Clusters. In: Ragia, L., Grueau, C., Laurini, R. (eds) Geographical Information Systems Theory, Applications and Management. GISTAM 2018. Communications in Computer and Information Science, vol 1061. Springer, Cham. https://doi.org/10.1007/978-3-030-29948-4_2

  • DOI: https://doi.org/10.1007/978-3-030-29948-4_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29947-7

  • Online ISBN: 978-3-030-29948-4

  • eBook Packages: Computer Science, Computer Science (R0)
